header

Categories::

Projects::

SAGE
M4RI
Code Snippets
ECrypt II
iliketotallyloveit

Stuff::

Junge Linke (de)
Battrock (de)

MiniMe::

BitBucket
Flickr
Sat, 27. Jun 2009

LQUP vs. PLUQ

At SD16 Clément Pernet and myself have been working on improving the asymptotically fast PLUQ factorisation over GF(2) in M4RI. As mentioned earlier, one of the main problems is that column swaps are pretty expensive compared to many other operations we do. Eventually, we settled for LQUP over PLUQ since it has fewer column swaps overall since it does not compress U. We also improved the base case both w.r.t. to sparse matrices and in general (more Gray code tables are used now) and the column swap performance overall (cf. SD 16 Wiki). The result is noticeable, but we are not quite there yet:

M4RI r284 vs. r292

There are still some places which could be improved so this should get better eventually. Also, we might have another strategy to deal with these sparse-ish/structured matrices. Anyway, the new PLUQ code is at least as fast as M4RI for the structured HFE examples on the M4RI website on my Core2Duo 2.33Ghz notebook (and of course much faster on random examples and on other platforms) The new code is available on BitBucket.

posted at: 15:22 :: permanent link

Thu, 18. Jun 2009

More OpenMP Experiments with M4RI

Motivated by a thread on [mpir-dev] I played around with OpenMP again today. The performance does not scale linearly … but hey it scales at all. I guess eventually I’ll have to get serious about this and sit down to make this proper. Anyway, here are the timings (on geom.math.washington.edu)

Computing the reduced row echelon form of an $n \times n$ random dense matrix over $\mathbf{F}_2$.
n M4RI
1 thread
PLUQ
1 thread
M4RI
4 threads
PLUQ
4 threads
M4RI
16 threads
PLUQ
16 threads
PLUQ 16 threads
cutoff=2048
10,000 1.72 0.85 1.03 0.86 0.58 0.80 0.77
16,384 13.75 5.76 4.78 4.23
20,000 27.02 5.45 7.35 5.48 3.27 3.68
32,000 112.74 21.96 30.51 22.02 13.78 13.91 12.95
64,000 227.80 157.03 104.94 75.95 66.54
100,000 1078.72 429.32 869.43 596.51 428.08 260.99 231.01

For some reason which I don’t understand yet is PLUQ slower for 16,384 than 20,000 on this machine. The code is on bitbucket.

posted at: 22:44 :: permanent link

Wed, 03. Jun 2009

Geometric XL

The paper by Sean Murphy and Maura Paterson has been around for quite some time now (same for my toy implementation). Gröbner basis algorithms and related methods like XL are algebraic in nature. In particular, their complexity is not invariant under a linear change of coordinates. For example consider Cyclic-6

    
    sage: P.<a,b,c,d,e,f,h> = PolynomialRing(GF(32003))
    sage: I = sage.rings.ideal.Cyclic(P,6).homogenize(h)
    sage: J = Ideal(I.groebner_basis())

The generators of J are a Gröbner basis and we can use this property to find a common root for these generators. Now, consider the same equations but permute the variables in the ring.

    sage: P.<a,b,c,d,e,f,h> = PolynomialRing(GF(32003),order='lex')
    sage: I = sage.rings.ideal.Cyclic(P,6).homogenize(h)
    sage: J = Ideal(I.groebner_basis())
    sage: R = PolynomialRing(GF(32003),P.ngens(),list(reversed(P.variable_names())),order='lex')
    sage: H = Ideal([R(str(f)) for f in J.gens()])

The generators of H do not form a Gröbner basis in R which is just P with its variables reversed. If we are only trying to solve a system of equations choosing the right permutation of variables might have a significant impact on the performance of our Gröbner basis algorithm:

    sage: t = cputime()
    sage: gb = H.groebner_basis('libsingular:std')
    sage: gb[-1].degree()
    19
    sage: cputime(t) # output random-ish
    25.36...

While in this example it is easy to see which variable permutation is the cheapest one, this is not necessarily the case in general. The GeometricXL algorithm is invariant under any linear change of coordinates and has the following property:

Let D be the degree reached by the XL algorithm to solve a given system of equations under the optimal linear change of coordinates. Then GeometricXL will also solve this system of equations for the degree D, without applying this optimal linear change of coordinates first.

The above behaviour holds under two assumptions:

To demonstrate this behaviour, consider the synthetic benchmark available here which is a Gröbner basis under a linear change of coordinates:

    sage: e,h = random_example(n=6)

e is the original easy system while h is the “rotated” system

    sage: e.basis_is_groebner()
    True

    sage: max([f.total_degree() for f in e.gens()])
    2

    sage: h.basis_is_groebner()
    False

    sage: max([f.total_degree() for f in h.gens()])
    2

GeometricXL recovers linear factors and thus candidates for common roots at D=2:

    sage: hH = h.homogenize()
    sage: f = GeometricXL(hH, D=2); f.factor(False)
    0.0...s -- 1. D: 2
    ...
    (-2684) 
    * (-1056*x5 - 2964*x4 - 177*x3 + 6206*x2 + 376*x1 + 6257*x0 + h) 
    * (2957*x5 - 792*x4 - 4323*x3 - 14408*x2 - 2750*x1 - 8823*x0 + h)

While any Gröbner basis algorithm would have to reach at least degree 64:

    sage: gb = h.groebner_basis('libsingular:slimgb')
    sage: gb[-1].degree()
    64

While my toy implementation is neither robust not efficient, the following table (using the same synthetic benchmark as above) should give some indication that for some problems GeometricXL is a good choice:

Synthetic Benchmark for GeometricXL Algorithm over GF(32003)
n GeometricXL degree GeometricXL time in seconds Singular 3-0-4 time in seconds Magma 2.14 time in seconds Gröbner basis degree
22 0.01 0.00 0.00 4
32 0.07 0.00 0.00 8
42 0.18 0.00 0.00 16
52 0.37 0.02 0.01 32
62 0.73 0.12 0.04 64
72 1.36 0.94 0.09 128
82 2.35 9.59 0.56 256
92 4.24117.43 3.79 512
102 7.34 28.681024
112 12.08 205.052048
122 20.03
132 35.53
142 53.98
152 88.36
162143.16

While the benchmark is synthetical and unrealistic, there are a few problems in multivariate quadratic public key cryptography which might be tackled using an idea similar to the GeometricXL algorithm. Also, note that constructing a GeometricMatrixF5 algorithm is straight forward by replacing the first step of the algorithm.

posted at: 14:10 :: permanent link

Wed, 20. May 2009

F4-Style F5 — Next Attempt

Since the F4-style F5 I mentioned a while ago wasn’t really that “F4-style” I’ve pushed my next attempt into the public “algebraic attack” bitbucket repository. This version swaps the two outer loops, i.e. it proceeds by degree in the outer loop instead of the index of the polynomials. Of course, we are talking about a toy implementation here to understand the algorithm and not an attempt to implement F5 efficiently.

posted at: 20:47 :: permanent link

SSH Paper Available Online

Our paper on plaintext recovery attacks against SSH is now available online both locally and on Kenny’s website. Btw. contrary to popular belief the paper is not on an implementation error in OpenSSH but on a design flaw in the SSHv2 specification which enables a variety of attacks against various implementations of the standard.

posted at: 20:43 :: permanent link

Tue, 07. Apr 2009

Funny

It might very well be just a conincidence, but superficially it looks like Mathematica is targeting Sage users with ads on Google.

Mathematica ad

Source: [sage-devel] … scroll down to see Harald Schilly debunk the whole thing. Still, it’s funny.

posted at: 13:08 :: permanent link

Sun, 22. Mar 2009

Windows Sage 0.3

As you might have heard, there is a Windows port of Sage in the making. While the current version doesn’t look like Sage proper, it already ships quite a few building blocks. Since, incidently the Information Security Group is now part of the “Microsoft Developer Academic Alliance” — which grants students free access to their products — I gave it a quick spin. See below.

screenshot of Windows Sage 3.0

Installation was straight forward once you have Visual Studio 2008 installed, just call build.bat and wait an hour or so.

posted at: 16:24 :: permanent link

Fri, 20. Mar 2009

M4RI API Change and Big Matrices

Finally, I found some time to work on M4RI again: I changed the internal representation of matrices to support more than one malloc() call per matrix, i.e. each matrix is split into blocks of some maximal size. This allows to deal with much bigger matrices under Linux because the kernel often won’t just give you 8GB mapped to consecutive addresses but it will give you 1GB eight times. This limitation bugged me quite some time now, since this also limited the kind of systems I could solve using PolyBoRi with M4RI enabled. The result is available at http://bitbucket.org/malb/m4ri/ but bear in mind that the API changed quite a bit for this (e.g., I renamed the packedmatrix to mzd_t) on the way).

64-bit Ubuntu, Xeon X7400 @2.66GHz
Matrix
Dimension
Memory
(expected)
M4RI/M4RI
(64-bit, 1 core)
M4RI/PLUQ
(64-bit, 1 core)
100,000 x 100,000 > 1.16 GB 1078.72 s 429.32 s
200,000 x 200,000 > 4.65 GB 2298.30 s
256,000 x 256,000 > 7.63 GB 8979.33 s 3709.25 s

The above table gives the time to compute the rank of random matrices of the given dimension using the given algorithms on http://geom.math.washington.edu.

posted at: 12:16 :: permanent link

Tue, 10. Mar 2009

A Simple Bitslice Implementation of PRESENT

I had a hunch the other day which required $2^{36}$ plaintext—ciphertext pairs to be tested. While the hunch turned out to be wrong, at least I got a simple bitslice implementation of PRESENT (in pure C99) out of it. The implementation is far from optimal:

Still, it is better than what I found on the web via a quick Google search. On my 2.33 Ghz Core2Duo Macbook Pro it seems to run at 28 cycles per byte for long messages (not counting the key schedule). For comparison, I think the current speed of AES in bit slice mode is like 10 cycles per byte on the Core2. If I need some performance in the future, I might sit down and improve the implementation (such that it doesn’t totally suck) but for now I have to attend to other projects.

posted at: 13:02 :: permanent link

Mon, 02. Mar 2009

Attacking Cryptographic Schemes Based on “Perturbation Polynomials”

“We show attacks on several cryptographic schemes that have recently been proposed for achieving various security goals in sensor networks. Roughly speaking, these schemes all use “perturbation polynomials” to add “noise” to polynomial-based systems that offer information-theoretic security, in an attempt to increase the resilience threshold while maintaining efficiency. We show that the heuristic security arguments given for these modified schemes do not hold, and that they can be completely broken once we allow even a slight extension of the parameters beyond those achieved by the underlying information-theoretic schemes. Our attacks apply to the key predistribution scheme of Zhang et al. (MobiHoc~2007), the access-control schemes of Subramanian et al. (PerCom~2007), and the authentication schemes of Zhang et~al. (INFOCOM~2008).”

That’s the abstract of a paper of Craig Gentry, Shai Halevi, Jonathan Katz and myself which just appeared on eprint. The source code for the experiments conducted in that work is available at bitbucket.

posted at: 12:31 :: permanent link

Mon, 02. Feb 2009

Sage hits Debian/unstable

As jzmer just pointed out on #sage-devel, Sage hit Debian/unstable today-ish. Of course, this also means that the wealth of math software Sage ships (e.g., Singular) is now also available in Debian/unstable. Well done Tim!

Update:Carl — also on IRC — points out that there are issues with the binary shipped with Debian as of now. Hopefully, this will get resolved quickly.

posted at: 00:30 :: permanent link

Wed, 28. Jan 2009

Semi-Regular Bits and Pieces

John Perry’s slides from his talk at Sage Days 12 in San Diego are available on the Sage wiki.

In my Diplomarbeit I had an algorithm for computing Gröbner bases of block ciphers called “Gröbner surfing”: “Instead of computing a reduced Gröbner basis for all rounds $rgb_F$ it computes the reduced Gröbner basis $rgb_{i+1}$ up to round $i + 1$ recursively as $$rgb_{i+1} = rgb(gb_i + round_{i+1})$$ with $rgb_0 = rgb(round_0)$ where $$rgb(round_i)$$ denotes any algorithm returning a reduced Gröbner basis for a given finite set of polynomials $round_i$.” This obviously is a tiny subset of what $F_5$ does: computing the Gröbner basis for $\langle f_i,\dots,f_m \rangle$ from the Gröbner basis of $\langle f_{i+1},\dots,f_m \rangle$. So nothing new here, move along.

I’m going to the Fast Software Encryption (FSE) 2009 workshop in Leuven to present our paper “Algebraic Techniques in Differential Cryptanalysis”. The full list of accepted papers is available online.

posted at: 11:46 :: permanent link

Sun, 25. Jan 2009

F4-Style F5

When I asked Jean-Charles Faugère a while back why he doesn’t publish his F4-style F5 he answered that there would be nothing to publish since it is straight forward. I don’t know about that but I was actually quite surprised how quickly John Perry and I could come up with an F4-style F5 here at Sage Days 12 (pictures). Btw. Till Stegers calls this F4.5 but I don’t know how Jean-Charles Faugère refers it.

I’ve uploaded the toy implementation to bitbucket. It seems to behave like the polynomial F5 implementation in the same file, so at least it is bug by bug compatible with that. Speaking of behaviour: When computing Cyclic-6 over $\mathbb{F}_{32003}$ w.r.t. degrevlex I noticed that my F4 implementation only goes up to degree 16 while my F5 implementations need to consider degree 18. Although no matrix is constructed for degree 18, the F4-style F5 does construct and eliminate a matrix for degree 17 … puzzling.

posted at: 22:45 :: permanent link

Fri, 16. Jan 2009

Bitslicing and the Method of the Four Russians over Larger Finite Fields

Tom Boothby’s and Robert Bradshaw’s paper on the “Method of the Four Russian” multiplication algorithm over $\mathbb{F}_3$, $\mathbb{F}_5$, $\mathbb{F}_7$, $\mathbb{F}_{2^2}$ and $\mathbb{F}_{2^3}$ is available as pre-print on the arXiv. If you’re into fast exact linear algebra I highly recommend reading it since it has some really nice ideas in it and is well written.

Abstract. “We present a method of computing with matrices over very small finite fields of size larger than 2. Specifically, we show how the Method of Four Russians can be efficiently adapted to these larger fields, and introduce a row-wise matrix compression scheme that both reduces memory requirements and allows one to vectorize element operations. We also present timings which confirm the efficiency of these methods and exceed the speed of the fastest implementations the authors are aware of.”

posted at: 13:14 :: permanent link

Mon, 12. Jan 2009

F5 Sans Rewriting

In his PhD thesis Justin Gash writes: “Rewritten gives us information to be used as an additional criterion for eliminating critical pairs. … In short, we could remove all discussion of rules and Rewritten and F5 would work fine. (But it would work much more slowly.)”

This motivated an implementation of F5SansRewriting which removes all mentions of Rewriting to make it easier to solely focus on the F5 criterion first. Of course the result is horribly inefficient. Cyclic-5 over $Z_{32003}$ takes 0.15 seconds using F5 and 81520.64 seconds using F5SansRewriting … well, at least it gives correct results and hopefully helps somehow.

posted at: 17:46 :: permanent link

Fri, 09. Jan 2009

“What Every Programmer Should Know About Memory”

is the title of an interesting paper by Ulrich Drepper of Rad Hat, Inc. The paper is — as you might have guessed — about the memory architecture in modern CPUs and what you — dear programmer — can do to improve performance of your code based on this knowledge. The paper has a section on tools for performance tuning which also discusses OProfile. OProfile is an interface to the CPU’s event counters (which vary across platforms). These counters can be queried to investigate the behaviour of a given piece of code on a given CPU. Intel’s Intel 64 and IA-32 Architectures Optimization Reference Manual suggests a list of event counters to look at. For instance the Intel engineers write:

Clocks Per Instruction Retired Ratio (CPI): CPU_CLK_UNHALTED.CORE / INST_RETIRED.ANY. The Intel Core microarchitecture is capable of reaching CPI as low as 0.25 in ideal situations. But most of the code has higher CPI The greater value of CPI for a given workload indicate it has more opportunity for code tuning to improve performance. The CPI is an overall metric, it does not provide specificity of what microarchitectural sub-system may be contributing to a high CPI value.”

To take it for a spin I chose the obvious target: the libM4RI library. Below is the CPI for bench_elimination 20000 20000 pluq.

== UNIFORMLY RANDOM ==
                        symbol  CPI  % of overall time
------------------------------------------------------
                 _mzd_mul_m4rm  0.94 (49.68)
              mzd_process_rows  1.76 (13.68)
                          .plt  0.62 (10.47)
        mzd_process_rows2_pluq  1.88 (10.07)
                mzd_make_table  0.76 ( 8.31)
                      mzd_copy  1.85 ( 1.80)
                _mzd_mul_naive  1.10 ( 1.38)
       mzd_apply_p_right_trans  0.99 ( 1.14)
     _mzd_trsm_lower_left_even  0.75 ( 0.94)
             mzd_apply_p_right  0.81 ( 0.93)
                   mzd_combine  3.17 ( 0.61)

Note that mzd_process_rows() and mzd_process_rows2_pluq() (which are both called from the PLUQ MMPF base case) have a rather high CPI compared to the rest of the code, indicating that I should sit down and implement using more than two tables (multiplication uses eight tables). I don’t understand the high CPI for mzd_copy() though. Half rank matrices behave quite differently, but surprisingly the CPI for mzd_apply_p_right_trans() is rather low.

== HALF RANK ==
                        symbol  CPI  % of overall time
------------------------------------------------------
                 _mzd_mul_m4rm  0.93 (40.01)
       mzd_apply_p_right_trans  0.79 (10.05)
             mzd_apply_p_right  0.78 ( 7.44)
                mzd_make_table  0.75 ( 7.26)
                mzd_find_pivot  1.44 ( 6.00)
                          .plt  0.68 ( 5.64)
                _mzd_pluq_mmpf  1.58 ( 4.58)
           _mzd_pluq_submatrix  0.90 ( 3.88)
                  mzd_col_swap  1.20 ( 3.58)
              mzd_process_rows  1.76 ( 2.85)
            mzd_row_add_offset  1.61 ( 2.24)
        mzd_process_rows2_pluq  1.88 ( 2.10)
                _mzd_mul_naive  1.19 ( 1.49)
                      mzd_copy  2.17 ( 1.32)
                   mzd_combine  3.31 ( 0.89)
     _mzd_trsm_lower_left_even  0.75 ( 0.25)
     _mzd_trsm_upper_left_even  0.67 ( 0.13)
               mzd_init_window  1.24 ( 0.07)

My preliminary understanding of those high .plt values is that compiling the benchmarketing software with -static would improve the timings (or that I should inline more?).

posted at: 14:02 :: permanent link

Wed, 07. Jan 2009

PhD Thesis on F5

May I point my dear reader’s attention to Justin Gash’s PhD thesis on F5. It is mainly referred to because it points out an error in the proof of termination in the original F5 paper. I started reading the thesis today (in preparation of SD12 in San Diego) and it already helped me a lot to understand the algorithm. As a consequence I added a bunch of comments from Justin’s thesis to my toy F5 implementation which should explain a bit what is going on.

However, since Justin’s pseudocode is not identical with John Perry’s pseudocode (on which my implementation is based) things don’t match up perfectly. For instance, I still need to figure out what the differences in critical_pair exactly come down to.

posted at: 19:03 :: permanent link

Mon, 05. Jan 2009

M4RI-20090105 Released

Sources are available for download. See release notes for details about what changed since the last version and some timings (1, 2, 3, 4) to get an idea of the performance.

posted at: 17:44 :: permanent link

Sat, 03. Jan 2009

New M4RI Release Coming Up

I will probably cut a new M4RI release within a week or so. The main changes are:

PLUQ/M4RI/Magma

On a related note: Marco Bodrato came up with a new Strassen-like sequence for multiplying and squaring matrices which provides a small (linear) speed-up for squaring. He also provides a drop-in strassen.c replacement for M4RI-20080521 which implements his new sequence.

posted at: 15:12 :: permanent link

Tue, 23. Dec 2008

M4RI’s and MMPF’s Sensitivity to Density

The last couple of days I’ve been working on improving libM4RI for sparse-ish matrices. The matrices I am talking about here are still represented as dense matrices but have non-uniformly distributed entries. While PLUQ factorisation is still very very (very very) slow for e.g. half rank matrices, things are getting better for M4RI matrix elimination.

r219 vs. r221

The updated sources are available on bitbucket. I will probably cut a new release very early next year.

posted at: 15:07 :: permanent link

Sat, 13. Dec 2008

Bit Bucket

About three weeks ago I discovered Bitbucket: a web service (with free and paid-for plans) for hosting Mercurial repositories. So far it has everything I would want:

I’m now hosting the main M4RI repository on bitbucket and so far it is a very smooth experience. Speaking of M4RI, it now contains an implementation of PLUQ factorisation inspired by Greg’s M4RI algorithm dubbed “Method of Many People Factorisation” in brilliantrussian.c. While this implementation seems to do its job, we still have iron out a couple of bugs in the recursive PLUQ factorisation codebase.

Finally, some random code snippets I posted on this blog are now also available on bitbucket (e.g., F4, F5, Matrix F5, DES equation system generators, Present equation system generators, an ANF to CNF converter).

posted at: 18:31 :: permanent link

Reactions to CPNI-957037: Vulnerability in SSH

Update:The US-Cert vulnerability note is VU#958563.

posted at: 17:39 :: permanent link

Thu, 27. Nov 2008

Talk on Matrix F5

“We will describe the ‘Matrix F5’ algorithm. Although not presented as an algorithm there, this algorithm appears on the first pages of Jean-Charles Faugere’s famous F5 paper to explain the main idea of F5 itself. It was later described in several French PhD theses but never formally published in English. The talk is titled ‘for the working cryptographer’ because we are going to start from the relatively well known XL algorithm and extend it to Matrix F5.”

posted at: 11:56 :: permanent link

Fri, 14. Nov 2008

CPNI-957037: Vulnerability in SSH

Abstract:Vulnerability found in the network protocol SSH (Secure Shell), that allows data to be exchanged using a secure channel between two networked devices. A design flaw in the SSH specification allows an attacker with control over the network to recover up to 32 bits of plaintext from an SSH-protected connection in the standard configuration. The success probability in recovering 32 plaintext bits is $2^{-18}$ when attacking the OpenSSH implementation of the SSH RFCs. A variant of the attack against the OpenSSH implementation verifiably recovers 14 plaintext bits with probability $2^{-14}$. The recovered bits come from an arbitrary, attacker-selected block of ciphertext. The success probabilities for other implementations are unknown (but are potentially much higher).

See also: SSH.com

Update: Added “verifiably” for the 14-bit attack to point out that this attack is strictly better than guessing 14 bits. CPNI will probably update the advisory too.

posted at: 18:03 :: permanent link

Wed, 12. Nov 2008

“Efficient Multiplication of Dense Matrices over GF(2)”

We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (GF(2)). In particular we present our implementation - in the M4RI library - of Strassen-Winograd matrix multiplication and the “Method of the Four Russians” multiplication (M4RM) and compare it against other available implementations. Good performance is demonstrated on on AMD’s Opteron and particulary good performance on Intel’s Core 2 Duo. The open-source M4RI library is available stand-alone as well as part of the Sage mathematics software.

In machine terms, addition in GF(2) is logical-XOR, and multiplication is logical-AND, thus a machine word of 64-bits allows one to operate on 64 elements of GF(2) in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over GF(2) are relatively cheap. In fact, in this paper, we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.”

Related News: My shiny new version of Magma 2.14-17 seems to perform better than Magma 2.14-14 for matrix multiplication over $\mathbf{F}_2$ on the Core 2 Duo. So I updated the performance data on the M4RI website. However, the changelog doesn’t mention any improvements in this area. Btw. searching for “Magma 2.14” returns the M4RI website first for me, which feels wrong on so many levels. Finally, M4RI is being packaged for Fedora Core.

posted at: 17:43 :: permanent link

Thu, 06. Nov 2008

Yet Another Talk on Sage

Today, I gave a talk on Sage to the ISG PhD seminar at Royal Holloway. I think it went alright, although people around here just don’t get excited about computation that much. Anyway, I’ve uploaded the slides and the demo (pdf, worksheet).

posted at: 15:40 :: permanent link

Thu, 30. Oct 2008

Algebraic Attacks against Block Ciphers

That’s the title of the 15 minute talk I gave yesterday at Cambridge to Part 3 students. It is basically a refurbished version of a talk I gave in March 2007 in Seattle. Also, I was asked to forward this:

Beyond Part III will take place over 16-18 April 2009 at the Centre for Mathematical Sciences, University of Cambridge.

Our aim is to develop a stronger network between mathematics graduate communities in the UK and internationally. We would like to emphasise that Beyond Part III is open to all mathematical researchers at the start of their career, whether or not they did Part III. The majority of participants are likely to be those who began a PhD in mathematics within the last five years, but this is a guideline only.

A dozen sessions covering different research areas will be run for one and a half days. Most participants will present work and each session will also include a keynote talk from a more senior researcher in that field. Beyond Part III will enable young mathematicians working in similar areas to form new links, and so plenty of social events are also planned. …”

posted at: 11:49 :: permanent link

Sun, 26. Oct 2008

Matrix F5

I finally fixed my Matrix $F_5$ implementation. I’ll also give a talk about Matrix $F_5$ to the PhD seminar at ISG on November 27th. I’ll the post the slides around then for those interested.

posted at: 19:58 :: permanent link

Mon, 20. Oct 2008

Bits and Pieces

I spent last week at Sage Days 10 (pictures and more pictures) in Nancy, France which turned out to be a very nice event. I (together with Simon King, Michael Brickenstein and Ludovic Perret) spent most of my time working on various toy implementations of F5 in order to understand the algorithm (better). We also conversed with John Perry who’s pseudcode, Singular code and description of F5 was incredibly helpful (and motivated me to work on this project in the first place, btw). My toy implementation of the polynomial version of F5 is available online and so is Simon King’s Cython implementation for Sage. John Perry now also provides a non-homogeneous version of F5 based on my Sage implementation.

I also implemented a toy version of F5/Matrix which indeed avoids a fair number of zero reductions and returns a Groebner basis if $d_{max}$ is big enough. I don’t think it does avoid as many reductions as the polynomial version which indicates a problem in my code. Note, that F5/Matrix is not described in detail in English literature (if you speak French and want to translate some short notes on this algorithms to English, please let me know).

I didn’t work on M4RI during Sage Days 10 but Clement Pernet and Jean-Guillaume Dumas did. We will probably release a new version with tried and tested TRSM code soon. Btw. I also gave a contributed talk about M4RI at Sage Days 10. My project for a train ride right after Sage Days 10 was to improve univariate polynomials over $\mathbb{F}_2$ in Sage to improve modular composition of polynomials.

I’m off to Santander on Wednesday for the 2nd Workshop on Mathematical Cryptology and once I’m back I’ll give a talk at the Graduate Studies Elsewhere Open Afternoon in Cambridge on “Algebraic Attacks against Block Ciphers”.

One last thing: The DES generator is broken due to a bug in Sage, a fix is available on Trac.

Update:fixed a bug in the F5/Matrix code and removed a nonsense statement about the rank of the matrices.

posted at: 19:04 :: permanent link

Thu, 18. Sep 2008

Equation System Generator for DES

I’ve uploaded an equation system generator for the “Data Encryption Standard” (DES) hoping to spare others the tedious work of writing one. As far as I know only equation systems but no flexible generators are floating around the net. I implemented three S-Box representations (following Courtois’ and Bard’s paper Algebraic Cryptanalysis of DES): fully cubic equations, quadratic equations with added variables (Matthew Kwan’s bitslice representation) and the former with pre-computed Gröbner bases.

# loading the file
sage: attach des.py

# two rounds only, see DES? for help on the constructor
sage: des = DES(Nr=2)

# default S-Box representation is opns_gb
sage: F,s = des.polynomial_system()
Pre-computing Groebner bases for S-Boxes, this might take a moment.

sage: %time gb = F.groebner_basis()
CPU times: user 15.39 s, sys: 0.03 s, total: 15.42 s
Wall time: 15.78 s

# verify correctness
sage: gb[:14]
[k55, k54, k52, k51 + 1, k50 + 1, k49, k48 + 1, k47 + 1, k46 + 1, k45 + 1, k44 + 1, k43, k42, k41 + 1]
sage: s[des.k[55]]
0
sage: s[des.k[51]]
1

Let me know if you run into any trouble.

posted at: 13:26 :: permanent link

Upcoming Workshops

I’m going to attend Sage Days 10 in Nancy, France (October 10-15, 2008). At SD10 I’ll give a contributed talk about matrix multiplication over $\mathbb{F}_2$, i.e. the M4RI library. I’m also going to attend the Second Workshop on Mathematical Cryptology in Santander, Spain (October 23-25, 2008).

posted at: 12:50 :: permanent link

Tue, 16. Sep 2008

New Papers on eCrypt

Ann Hibner Koblitz (Women and Gender Studies Program, Arizona State University), Neal Koblitz (Dept. of Mathematics, Univ. of Washington), and Afred Menezes (Dept. of Combinatorics & Optimization, Univ. of Waterloo) published an interesting history of elliptic curve cryptography (ECC) titled: Elliptic Curve Cryptography: The Serpentine Course of a Paradigm Shift. They briefly introduce the math but the main part is their presentation and discussion how ECC became widely accepted (which apparently is not just because researchers agreed on a scientific basis). I only skimmed it, but it seems certainly worth a read. Speaking of Neal Koblitz and Alfred Menezis: Here are their papers on “provable security” which caused a fair amount of controversy: Another Look at “Provable Security” and Another Look at “Provable Security” II

Itai Dinur and Adi Shamir put a pre-print of their much hyped and discussed paper Cube Attacks on Tweakable Black Box Polynomials on ePrint. See also David Wagner’s summary.

posted at: 13:01 :: permanent link

Fri, 22. Aug 2008

Gröbner Bases over $ZZ$

Vanilla Sage does not compute Gröbner bases over $ZZ$ by any definition. However, this feature has been requested several times. The earliest account I could find quickly is this post by Joe Wetherell. Below is a list of options for Gröbner bases over $ZZ$ in Sage.

  1. Singular’s upcoming release will feature Gröbner bases over rings. In fact, the feature is present in the current Singular release but not enabled by default. An SPKG with that functionality enabled can be found here. ring r = integers,(x,y),lp; declares a ring over the integers where some things work and some things don’t. Note that this ring declaration is not final, i.e. the name integers may change. Also, this SPKG has issues and crashes on me for some operations. We’re working on tracking that issue down.
  2. Macaulay2 has support for Gröbner bases over rings and a decent Sage interface supporting that functionality. Macaulay2 1.1 is available as an experimental SPKG. First, one needs to install boehm_gc-7.1.p0 and gdbm-1.8.3. Then, since the version in experimental didn’t compile for me, try my new SPKG.
  3. If you are lucky enough to have Magma installed, I.groebner_basis(“magma:GroebnerBasis”) does the job. If you don’t have Magma installed try magma_free.
  4. Ginv (also available as a optional package) also supports Gröbner bases over $ZZ$. However, for the example Joe gave in his e-mail it crashes on me and I’ve contacted upstream about it.
  5. Last and least: I have a toy implementation of the $d$-Gröbner basis algorithm from the Becker-Weispfenning. Don’t hold your breath, it is dead slow.

Hopefully, due to the upcoming Singular release the situation will improve soon and we’ll finally have Gröbner bases over $ZZ$ in Sage.

posted at: 14:39 :: permanent link

Mon, 18. Aug 2008

Parallel Matrix Elimination

I released a new version of M4RI today which contains a parallel implementation for matrix elimination. Below I reproduce some timings for this code to give a rough idea of the performance of this code.

64-bit Debian/GNU Linux, 2.6Ghz Opteron (Virtualised)
Matrix
Dimension
Magma 2.14-13
(64-bit, 1 core)
M4RI
(64-bit, 1 core)
M4RI
(64-bit, 4 cores)
10,000 x 10,000 3.283 2.509 1.064
16,384 x 16,384 11.204 10.741 3.918
20,000 x 20,000 16.911 19.776 7.216
32,000 x 32,000 57.761 86.071 32.420
64,000 x 64,000355.477 640.742 307.213

The examples hfe25_5, hfe30_5 and hfe35_5 from the M4RI website take 1.44, 9.29 and 51.56 seconds respectively.

Note that this is work in progress and that the algorithm still has worse complexity than the one implemented in Magma. Also note that the speed-up is far from linear and that the speed-up decreases with the size. This is probably because each thread falls out of L2 more often and the threads clog each other.

posted at: 22:56 :: permanent link

Mon, 11. Aug 2008

GCC 4.3 and -O3

I recently upgraded an Opteron server to Debian/Lenny to get GCC 4.3 for OpenMP reasons. It turns out that my code, namely matrix multiplication as implemented in the M4RI library, ran much slower than when compiled with GCC 4.1. For instance, to multiply two $20,000 \times 20,000$ random matrices took 18.38 seconds with GCC 4.1 but 21.00 seconds with GCC 4.3.1 and to multiply two $32,000 \times 32,000$ random matrices took 70.24 seconds with GCC 4.1 but 80.00 second with GCC 4.3.1. Eventually, I checked the highlevel changelog and found: “The -ftree-vectorize option is now on by default under -O3. In order to generate code for a SIMD extension, it has to be enabled as well: use -maltivec for PowerPC platforms and -msse/-msse2 for i?86 and x86_64.” However, we don’t use SSE2 on the Opteron since it is slower than the standard instruction set for this application. Passing -no-tree-vectorize to the compiler fixed the problem. However, to my surprise -O2 didn’t come with a speed penalty either, so I settled for this. The final timings on my Opteron server are:

64-bit Debian/GNU Linux, 2.6Ghz Opteron (Virtualised)
Matrix
Dimension
M4RI GCC 4.3
(64-bit, 4 cores)
M4RI GCC 4.3
(64-bit, 1 core)
M4RI GCC 4.1
(64-bit, 1 core)
Magma 2.14-13
(64-bit, 1 core)
20000x200006.3617.8118.3818.35
32000x3200026.6568.0170.2468.01

I suppose the moral of the story is: -O3 isn’t necessarily better than -O2 just because 3>2.

posted at: 16:00 :: permanent link

Fri, 18. Jul 2008

ISSAC

I’m going to this year’s International Symposium on Symbolic and Algebraic Computation (ISSAC) starting on Sunday. William is giving a plenary lecture on Sage titled “Can There be a Viable Free Open Source Alternative to Magma, Maple, Mathematica, and Matlab?” a.k.a. “Computer algebra community meet Sage; Sage meet the computer algebra community”.

posted at: 14:23 :: permanent link

Tue, 08. Jul 2008

Scapy and Sage

Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more. It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, tethereal, p0f, etc.). It also performs very well at a lot of other specific tasks that most other tools can’t handle, like sending invalid frames, injecting your own 802.11 frames, combining technics (VLAN hopping+ARP cache poisoning, VOIP decoding on WEP encrypted channel, …)”

At the end of the day Scapy is one (one!) Python file so it couldn’t be easier to use it from within Sage. As an example let’s assume we have sniffed an SSH connection establishment including a Diffie-Hellmann Group Exchange as described in RFC 4419. Scapy can do live packet capture and injection but that would require root privileges, so I’m working with a pcap file in this example:

from scapy import rdpcap, TCP, IP

SSH2_MSG_KEX_DH_GEX_GROUP = 31

# read packets
packets = [p[IP] for p in rdpcap("/home/malb/example.pcap") \
               if p[TCP] and len(p[TCP]) > 32]

# find correct package & payload
for packet in packets:
    try:
        pl = [ord(e) for e in packet[TCP].payload.load]
        if pl[5] == SSH2_MSG_KEX_DH_GEX_GROUP:
            break
    except AttributeError:
        pass

def get_uint(pl, length):
    # this is not as generic as it should be since it doesn't work
    # with negative numbers
    value = ZZ(0)
    for i in range(length):
        value += pl[i] * 2**(8*(length - i - 1))
    return value, pl[length:]

packet_length, pl = get_uint(pl, 4)
padlen, pl = get_uint(pl, 1)
packet_type, pl = get_uint(pl, 1)
assert(packet_type == SSH2_MSG_KEX_DH_GEX_GROUP)

# p
p_length, pl = get_uint(pl, 4)
p, pl = get_uint(pl, p_length)

# g
g_length, pl = get_uint(pl, 4)
g, pl = get_uint(pl, g_length)

assert(len(pl) == padlen)
assert(p.is_prime())

Zp = GF(p)
g = Zp(g)
e = g**ZZ.random_element(0,p)

e.log(g) # yeah, right ;-)

Happy hacking.

posted at: 17:34 :: permanent link

Fri, 20. Jun 2008

XOR for Fun and Profit

I just gave a talk on linear algebra over GF(2), optimisation techniques and applications to algebraic cryptanalysis. Slides are available online.

posted at: 22:02 :: permanent link

libM4RI in Debian Unstable

malb@XXX:~$ apt-cache search m4ri  
libm4ri-dev - Method of the Four Russians library, development files
libm4ri0 - Method of the Four Russians library, shared library

Big thanks to Tim for making that happen!

posted at: 06:37 :: permanent link

QOTD

“Mathematics is the art of reducing problems to linear algebra.” Bill Hart was introduced to this definition when he started his undergrad.

posted at: 02:12 :: permanent link

Valid XHTML 1.0 Strict Valid CSS! blosxom