“Efficient Multiplication of Dense Matrices over GF(2)”
“We describe an efficient implementation of a hierarchy of algorithms for multiplication of dense matrices over the field with two elements (GF(2)). In particular we present our implementation - in the M4RI library - of Strassen-Winograd matrix multiplication and the “Method of the Four Russians” multiplication (M4RM) and compare it against other available implementations. Good performance is demonstrated on AMD’s Opteron and particularly good performance on Intel’s Core 2 Duo. The open-source M4RI library is available stand-alone as well as part of the Sage mathematics software.
In machine terms, addition in GF(2) is logical-XOR, and multiplication is logical-AND, thus a 64-bit machine word allows one to operate on 64 elements of GF(2) in parallel: at most one CPU cycle for 64 parallel additions or multiplications. As such, element-wise operations over GF(2) are relatively cheap. In fact, in this paper, we conclude that the actual bottlenecks are memory reads and writes and issues of data locality. We present our empirical findings in relation to minimizing these and give an analysis thereof.”
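Here is a small pure-Python sketch of M4RM that makes the word-level argument concrete. Python integers stand in for machine words (bit j of a row is column j), so adding two rows is a single XOR. The function names and the block size `k` are illustrative choices, not M4RI's C API. The lookup table is filled incrementally rather than in Gray code order. This sketch shows the idea and is not meant to be fast.

```python
def naive_mul(A, B):
    """Row-by-row product over GF(2): row i of C is the XOR of the rows of B
    selected by the set bits of row i of A. Rows are ints; bit j = column j."""
    C = []
    for a in A:
        c, j = 0, 0
        while a:
            if a & 1:
                c ^= B[j]
            a >>= 1
            j += 1
        C.append(c)
    return C


def m4rm_mul(A, B, k=4):
    """Method of the Four Russians multiplication (M4RM) sketch.

    B is processed in blocks of k rows. For each block we precompute the XOR
    of every subset of its rows (2^k entries), so each row of A needs one
    table lookup per block instead of up to k row additions."""
    n = len(B)
    C = [0] * len(A)
    for start in range(0, n, k):
        block = B[start:start + k]
        width = len(block)
        table = [0] * (1 << width)
        for mask in range(1, 1 << width):
            low = mask & -mask                 # lowest set bit of mask
            j = low.bit_length() - 1           # index of that bit
            table[mask] = table[mask ^ low] ^ block[j]
        sel = (1 << width) - 1
        for i, a in enumerate(A):
            # the k bits of row i that belong to this block pick a table entry
            C[i] ^= table[(a >> start) & sel]
    return C
```

For example, with `A` and `B` given as lists of 64 random 64-bit integers, `m4rm_mul(A, B, k=8)` should agree with `naive_mul(A, B)`. A larger `k` means fewer lookups per row, but the table then has $2^k$ entries. This is where the memory and data-locality concerns from the abstract come in.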
Related News: My shiny new version of Magma 2.14-17 seems to perform better than Magma 2.14-14 for matrix multiplication over $\mathbf{F}_2$ on the Core 2 Duo. So I updated the performance data on the M4RI website. However, the changelog doesn’t mention any improvements in this area. Btw. searching for “Magma 2.14” returns the M4RI website first for me, which feels wrong on so many levels. Finally, M4RI is being packaged for Fedora Core.
Thu, 06. Nov 2008
Yet Another Talk on Sage
Today, I gave a talk on Sage to the ISG PhD seminar at Royal Holloway. I think it went alright, although people around here just don’t get excited about computation that much. Anyway, I’ve uploaded the slides and the demo (pdf, worksheet).
Sun, 26. Oct 2008
Matrix F5
I finally fixed my Matrix $F_5$ implementation. I’ll also give a talk about Matrix $F_5$ to the PhD seminar at ISG on November 27th and will post the slides around then for those interested.
Mon, 20. Oct 2008
Bits and Pieces
I spent last week at Sage Days 10 (pictures and more pictures) in Nancy, France, which turned out to be a very nice event. I (together with Simon King, Michael Brickenstein and Ludovic Perret) spent most of my time working on various toy implementations of F5 in order to understand the algorithm (better). We also conversed with John Perry, whose pseudocode, Singular code and description of F5 were incredibly helpful (and motivated me to work on this project in the first place, btw). My toy implementation of the polynomial version of F5 is available online and so is Simon King’s Cython implementation for Sage. John Perry now also provides a non-homogeneous version of F5 based on my Sage implementation.
I also implemented a toy version of F5/Matrix which indeed avoids a fair number of zero reductions and returns a Groebner basis if $d_{max}$ is big enough. I don’t think it avoids as many reductions as the polynomial version, which indicates a problem in my code. Note that F5/Matrix is not described in detail in the English literature (if you speak French and want to translate some short notes on this algorithm to English, please let me know).
I didn’t work on M4RI during Sage Days 10 but Clement Pernet and Jean-Guillaume Dumas did. We will probably release a new version with tried and tested TRSM code soon. Btw. I also gave a contributed talk about M4RI at Sage Days 10. My project for the train ride right after Sage Days 10 was to speed up univariate polynomial arithmetic over $\mathbb{F}_2$ in Sage in order to improve modular composition of polynomials.
I’m off to Santander on Wednesday for the 2nd Workshop on Mathematical Cryptology and once I’m back I’ll give a talk at the Graduate Studies Elsewhere Open Afternoon in Cambridge on “Algebraic Attacks against Block Ciphers”.
One last thing: The DES generator is broken due to a bug in Sage, a fix is available on Trac.
Update: fixed a bug in the F5/Matrix code and removed a nonsense statement about the rank of the matrices.
Fri, 22. Aug 2008
Gröbner Bases over $\mathbb{Z}$
Vanilla Sage does not compute Gröbner bases over $\mathbb{Z}$ by any definition. However, this feature has been requested several times. The earliest account I could find quickly is this post by Joe Wetherell. Below is a list of options for Gröbner bases over $\mathbb{Z}$ in Sage.
- Singular’s upcoming release will feature Gröbner bases over rings. In fact, the feature is present in the current Singular release but not enabled by default. An SPKG with that functionality enabled can be found here. ring r = integers,(x,y),lp; declares a ring over the integers where some things work and some things don’t. Note that this ring declaration is not final, i.e. the name integers may change. Also, this SPKG has issues and crashes on me for some operations. We’re working on tracking that issue down.
- Macaulay2 has support for Gröbner bases over rings and a decent Sage interface supporting that functionality. Macaulay2 1.1 is available as an experimental SPKG. First, one needs to install boehm_gc-7.1.p0 and gdbm-1.8.3. Then, since the version in experimental didn’t compile for me, try my new SPKG.
- If you are lucky enough to have Magma installed, I.groebner_basis("magma:GroebnerBasis") does the job. If you don’t have Magma installed, try magma_free.
- Ginv (also available as an optional package) also supports Gröbner bases over $\mathbb{Z}$. However, it crashes on me for the example Joe gave in his e-mail, and I’ve contacted upstream about it.
- Last and least: I have a toy implementation of the $d$-Gröbner basis algorithm from Becker and Weispfenning's book. Don’t hold your breath, it is dead slow.
Hopefully, with the upcoming Singular release the situation will improve soon and we’ll finally have Gröbner bases over $\mathbb{Z}$ in Sage.
Mon, 18. Aug 2008
Parallel Matrix Elimination
I released a new version of M4RI today which contains a parallel implementation of matrix elimination. Below are some timings (in seconds) to give a rough idea of its performance.
| Matrix Dimension | Magma 2.14-13 (64-bit, 1 core) | M4RI (64-bit, 1 core) | M4RI (64-bit, 4 cores) |
|---|---|---|---|
| 10,000 x 10,000 | 3.283 | 2.509 | 1.064 |
| 16,384 x 16,384 | 11.204 | 10.741 | 3.918 |
| 20,000 x 20,000 | 16.911 | 19.776 | 7.216 |
| 32,000 x 32,000 | 57.761 | 86.071 | 32.420 |
| 64,000 x 64,000 | 355.477 | 640.742 | 307.213 |
The examples hfe25_5, hfe30_5 and hfe35_5 from the M4RI website take 1.44, 9.29 and 51.56 seconds respectively.
Note that this is work in progress and that the algorithm still has worse complexity than the one implemented in Magma. Also note that the speed-up is far from linear and that, for the larger matrices, it decreases with the size. This is probably because each thread falls out of the L2 cache more often and the threads get in each other's way.
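The speed-ups behind that remark can be read off the table with a few lines of Python. Each speed-up is the 1-core M4RI time divided by the 4-core M4RI time from the table above. "Efficiency" here is the speed-up divided by the number of cores; it is an illustrative measure and not reported in the original timings.

```python
# (dimension, M4RI 1 core [s], M4RI 4 cores [s]) copied from the table above
timings = [
    (10000, 2.509, 1.064),
    (16384, 10.741, 3.918),
    (20000, 19.776, 7.216),
    (32000, 86.071, 32.420),
    (64000, 640.742, 307.213),
]


def speedups(rows, cores=4):
    """Return (dimension, speed-up, parallel efficiency) for each row."""
    out = []
    for dim, t1, tp in rows:
        s = t1 / tp
        out.append((dim, s, s / cores))
    return out


for dim, s, eff in speedups(timings):
    print(f"{dim:>6}: speed-up {s:.2f}, efficiency {eff:.0%}")
```

This gives speed-ups of roughly 2.4, 2.7, 2.7, 2.7 and 2.1. In other words, the speed-up peaks for the mid-sized matrices and drops off clearly for the largest one.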
Mon, 11. Aug 2008
GCC 4.3 and -O3
I recently upgraded an Opteron server to Debian/Lenny to get GCC 4.3 for OpenMP reasons. It turns out that my code, namely matrix multiplication as implemented in the M4RI library, ran noticeably slower than when compiled with GCC 4.1. For instance, multiplying two $20,000 \times 20,000$ random matrices took 18.38 seconds with GCC 4.1 but 21.00 seconds with GCC 4.3.1, and multiplying two $32,000 \times 32,000$ random matrices took 70.24 seconds with GCC 4.1 but 80.00 seconds with GCC 4.3.1. Eventually, I checked the high-level changelog and found: “The -ftree-vectorize option is now on by default under -O3. In order to generate code for a SIMD extension, it has to be enabled as well: use -maltivec for PowerPC platforms and -msse/-msse2 for i?86 and x86_64.” However, we don’t use SSE2 on the Opteron since it is slower than the standard instruction set for this application. Passing -fno-tree-vectorize to the compiler fixed the problem. To my surprise, -O2 didn’t come with a speed penalty either, so I settled for that. The final timings (in seconds) on my Opteron server are:
| Matrix Dimension | M4RI GCC 4.3 (64-bit, 4 cores) | M4RI GCC 4.3 (64-bit, 1 core) | M4RI GCC 4.1 (64-bit, 1 core) | Magma 2.14-13 (64-bit, 1 core) |
|---|---|---|---|---|
| 20000x20000 | 6.36 | 17.81 | 18.38 | 18.35 |
| 32000x32000 | 26.65 | 68.01 | 70.24 | 68.01 |
I suppose the moral of the story is: -O3 isn’t necessarily better than -O2 just because 3>2.
Tue, 08. Jul 2008
Scapy and Sage
“Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more. It can easily handle most classical tasks like scanning, tracerouting, probing, unit tests, attacks or network discovery (it can replace hping, 85% of nmap, arpspoof, arp-sk, arping, tcpdump, tethereal, p0f, etc.). It also performs very well at a lot of other specific tasks that most other tools can’t handle, like sending invalid frames, injecting your own 802.11 frames, combining technics (VLAN hopping+ARP cache poisoning, VOIP decoding on WEP encrypted channel, …)”
At the end of the day Scapy is one (one!) Python file, so it couldn’t be easier to use it from within Sage. As an example, let’s assume we have sniffed an SSH connection establishment including a Diffie-Hellman Group Exchange as described in RFC 4419. Scapy can do live packet capture and injection but that would require root privileges, so I’m working with a pcap file in this example:
from scapy import rdpcap, TCP, IP

SSH2_MSG_KEX_DH_GEX_GROUP = 31

# read packets, keeping only those with a TCP layer and a non-trivial payload
packets = [p[IP] for p in rdpcap("/home/malb/example.pcap")
           if p.haslayer(TCP) and len(p[TCP]) > 32]

# find correct packet & payload: byte 5 of an SSH binary packet is the message type
for packet in packets:
    try:
        pl = list(bytearray(packet[TCP].payload.load))
        if pl[5] == SSH2_MSG_KEX_DH_GEX_GROUP:
            break
    except (AttributeError, IndexError):
        pass
else:
    raise ValueError("no SSH2_MSG_KEX_DH_GEX_GROUP packet found")

def get_uint(pl, length):
    # this is not as generic as it should be since it doesn't work
    # with negative numbers
    value = ZZ(0)
    for i in range(length):
        value += pl[i] * 2**(8*(length - i - 1))
    return value, pl[length:]

packet_length, pl = get_uint(pl, 4)
padlen, pl = get_uint(pl, 1)
packet_type, pl = get_uint(pl, 1)
assert packet_type == SSH2_MSG_KEX_DH_GEX_GROUP
# p
p_length, pl = get_uint(pl, 4)
p, pl = get_uint(pl, p_length)
# g
g_length, pl = get_uint(pl, 4)
g, pl = get_uint(pl, g_length)
# whatever is left must be the padding
assert len(pl) == padlen
assert p.is_prime()
Zp = GF(p)
g = Zp(g)
e = g**ZZ.random_element(0,p)
e.log(g) # yeah, right ;-)
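The comment in `get_uint` notes that it cannot handle negative numbers. RFC 4251 defines the SSH `mpint` type as a `uint32` length followed by a big-endian two's-complement integer, so plain Python 3 can use `int.from_bytes(..., signed=True)` instead. The sketch below assumes an unencrypted binary packet without a MAC, as in the pcap example above. The helper names are hypothetical.

```python
import struct

SSH2_MSG_KEX_DH_GEX_GROUP = 31


def read_uint32(buf, off):
    """Read a big-endian uint32 at offset off; return (value, new offset)."""
    (value,) = struct.unpack_from(">I", buf, off)
    return value, off + 4


def read_mpint(buf, off):
    """Read an RFC 4251 mpint: uint32 length, then a big-endian
    two's-complement integer of that many bytes (zero has length 0)."""
    length, off = read_uint32(buf, off)
    value = int.from_bytes(buf[off:off + length], "big", signed=True)
    return value, off + length


def parse_gex_group(packet):
    """Extract (p, g) from an unencrypted SSH binary packet (no MAC)
    carrying SSH_MSG_KEX_DH_GEX_GROUP."""
    packet_length, off = read_uint32(packet, 0)
    if packet_length != len(packet) - 4:
        raise ValueError("length field does not match packet size")
    padding_length, msg_type = packet[off], packet[off + 1]
    off += 2
    if msg_type != SSH2_MSG_KEX_DH_GEX_GROUP:
        raise ValueError("not a KEX_DH_GEX_GROUP message")
    p, off = read_mpint(packet, off)
    g, off = read_mpint(packet, off)
    if len(packet) - off != padding_length:
        raise ValueError("unexpected trailing bytes")
    return p, g
```

Assume the TCP segment holds exactly that one SSH packet. Then calling `parse_gex_group(bytes(packet[TCP].payload.load))` should give the same `p` and `g` as the `get_uint` code.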
Happy hacking.
Fri, 20. Jun 2008
XOR for Fun and Profit
I just gave a talk on linear algebra over GF(2), optimisation techniques and applications to algebraic cryptanalysis. Slides are available online.
libM4RI in Debian Unstable
malb@XXX:~$ apt-cache search m4ri
libm4ri-dev - Method of the Four Russians library, development files
libm4ri0 - Method of the Four Russians library, shared library
Big thanks to Tim for making that happen!
Fri, 13. Jun 2008
Fraction Free Gauss-Jordan Errata
I’m at Sage’s dev1 right now and so I have the pleasure of meeting Arne Storjohann. In his thesis he presented a fraction-free, asymptotically fast matrix elimination algorithm which unfortunately has some typos in it. Below I replicate the correct algorithm as he explained it to me yesterday:
def GaussJordan(A, k=-1, d0=None):
    if d0 is None:
        d0 = A.base_ring()(1)
    n = A.nrows()
    m = A.ncols()
    I = MatrixSpace(A.base_ring(), n, n)(1)
    # if rows k+1, ..., n-1 are all zero there is nothing to eliminate
    for i in range(k+1, n):
        if any(A[i,j] for j in range(m)):
            break
    else:
        U, P, r, h, d = d0*I, I, 0, n-k, d0
        return (U, P, r, h, d)
    if m == 1:
        # base case: a single column, pivot on its first nonzero entry below row k
        i = min([i for i in range(k+1, n) if A[i,0] != 0])
        P = copy(I)
        P.swap_rows(i, k+1)
        r, h, d = 1, n-i, (P*A)[k+1,0]
        U = d*I
        for j in range(n):
            U[j,k+1] = -(P*A)[j,0]
        U[k+1,k+1] = d0
    else:
        # split the columns in half and recurse on each half
        m1, m2 = m//2, m - m//2
        A1 = A.matrix_from_columns(range(m1))
        B = A.matrix_from_columns(range(m1, m))
        U1, P1, r1, h1, d1 = GaussJordan(A1, k, d0)
        A2 = d0**(-1)*U1*P1*B
        U2, P2, r2, h2, d = GaussJordan(A2, k+r1, d1)
        U = d1**(-1) * U2*(P2*(U1 - d1*I) + d1*I)
        P, r, h, d = P2*P1, r1+r2, min(h1, h2), d
    return U, P, r, h, d
Note that this is not how one would actually implement this algorithm in practice: it is pseudo-code that happens to run in Sage. For a practical implementation check the IML library.
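Storjohann's algorithm is recursive and asymptotically fast. The underlying "fraction-free" idea is easier to see in classic, non-recursive Bareiss elimination. The sketch below applies it to the determinant of an integer matrix; this is Bareiss's textbook scheme, not the algorithm above. Every division by the previous pivot is exact, so all intermediate values stay integers and no fractions ever appear.

```python
def bareiss_det(M):
    """Determinant of a square integer matrix via fraction-free (Bareiss) elimination.

    After step k, entry (i, j) for i, j > k is a (k+2) x (k+2) minor of the
    (row-permuted) input, so every division by the previous pivot is exact."""
    A = [list(row) for row in M]      # work on a copy
    n = len(A)
    sign, prev = 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:
            # swap in a later row with a nonzero entry in column k
            for i in range(k + 1, n):
                if A[i][k] != 0:
                    A[k], A[i] = A[i], A[k]
                    sign = -sign
                    break
            else:
                return 0              # column k is zero on and below the diagonal
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                num = A[i][j] * A[k][k] - A[i][k] * A[k][j]
                assert num % prev == 0    # the division below is exact
                A[i][j] = num // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]
```

For instance, `bareiss_det([[1, 2], [3, 4]])` returns `-2`. The intermediate entries grow only like minors of the input, not like the products of pivots that naive cross-multiplication would produce.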
Tue, 13. May 2008
M4RI Website
I finally put together the website for the M4RI library. For those who don’t know M4RI:
“M4RI is a library for fast arithmetic with dense matrices over $\mathbb{F}_2$. It was started by Gregory Bard and is now maintained by Martin Albrecht and Gregory Bard. The name M4RI comes from the first implemented algorithm: The “Method of the Four Russians” inversion algorithm published by Gregory Bard. This algorithm in turn is named after the “Method of the Four Russians” multiplication algorithm which is probably better referred to as Kronrod’s method. M4RI is used by the Sage mathematics software and the PolyBoRi library. M4RI is available under the General Public License Version 2 or later (GPLv2+).
Features of the M4RI library include:
- basic arithmetic with dense matrices over $\mathbb{F}_2$ (addition, equality testing, stacking, augmenting, sub-matrices, randomisation),
- asymptotically fast $O(n^{\log_2 7})$ matrix multiplication via the “Method of the Four Russians” (M4RM) & Strassen-Winograd algorithm,
- asymptotically fast $O(n^3/\log_2 n)$ row echelon form computation and matrix inversion via the “Method of the Four Russians” (M4RI),
- support for the x86/x86_64 SSE2 instruction set where available, and
- support for Linux and OS X (GCC), support for Solaris (Sun Studio Express) and support for Windows (Visual Studio 2008 Express).”
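Here is a pure-Python sketch of the Strassen-Winograd schedule over $\mathbb{F}_2$, for reference. It uses 7 recursive products and 15 block additions, and addition and subtraction are both XOR. Matrices are square lists of 0/1 lists. The recursion falls back to a schoolbook product for small or odd dimensions, whereas the hierarchy described in the paper abstract hands small blocks to M4RM. The `cutoff` parameter and all function names are illustrative.

```python
def gf2_add(X, Y):
    """Entry-wise sum over GF(2); subtraction is the same operation."""
    return [[a ^ b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def gf2_naive_mul(X, Y):
    """Schoolbook product over GF(2) (entries are 0/1 ints)."""
    m, p = len(Y), len(Y[0])
    return [[sum(row[k] & Y[k][j] for k in range(m)) & 1 for j in range(p)]
            for row in X]


def split(M):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_winograd(A, B, cutoff=8):
    """Strassen-Winograd product over GF(2) of square 0/1 matrices."""
    n = len(A)
    if n <= cutoff or n % 2:
        return gf2_naive_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # 8 additions on the inputs
    S1 = gf2_add(A21, A22)
    S2 = gf2_add(S1, A11)
    S3 = gf2_add(A11, A21)
    S4 = gf2_add(A12, S2)
    T1 = gf2_add(B12, B11)
    T2 = gf2_add(B22, T1)
    T3 = gf2_add(B22, B12)
    T4 = gf2_add(T2, B21)
    # 7 recursive products
    P1 = strassen_winograd(A11, B11, cutoff)
    P2 = strassen_winograd(A12, B21, cutoff)
    P3 = strassen_winograd(S4, B22, cutoff)
    P4 = strassen_winograd(A22, T4, cutoff)
    P5 = strassen_winograd(S1, T1, cutoff)
    P6 = strassen_winograd(S2, T2, cutoff)
    P7 = strassen_winograd(S3, T3, cutoff)
    # 7 additions on the products
    C11 = gf2_add(P1, P2)
    U2 = gf2_add(P1, P6)
    U3 = gf2_add(U2, P7)
    C12 = gf2_add(gf2_add(U2, P5), P3)
    C21 = gf2_add(U3, P4)
    C22 = gf2_add(U3, P5)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Each recursion level trades one of eight block products for extra additions. Over $\mathbb{F}_2$ those additions are cheap XORs, which is why the recursion pays off once blocks are large.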
Performance-wise it is doing okay but not great. On Intel’s Core2Duo it seems to compare favourably to Magma 2.13. However, I don’t have access yet to Magma 2.14, which improves dense linear algebra over $\mathbb{F}_2$. On AMD’s Opteron, on the other hand, it is way behind Magma 2.13. This is possibly due to the 1MB L2 cache of the Opteron vs. the 4MB L2 cache of the Core2Duo.
Fri, 21. Mar 2008
A Cryptographic Tour and Todo List of Sage
Yesterday someone showed up on [sage-devel] and wrote: “I have been developing software and doing research in the areas of: mathematics, cryptography algorithms, encryption, and would like to contribute my time and effort to the Sage project. I would like any of you to get me started in the right direction, any info would be appreciated.”
This is the edited/polished version of my reply. I am posting it here in case anyone else wonders how to contribute to Sage for cryptographic research.
- David Kohel wrote an introductory book on cryptography. He uses Sage in the book and wrote a fair amount of code to make that happen. The relevant module is sage.crypto. For example, it implements a linear feedback shift register:
sage: FF = FiniteField(2)
sage: P.<x> = PolynomialRing(FF)
sage: E = LFSRCryptosystem(FF); E
LFSR cryptosystem over Finite Field of size 2
sage: IS = [ FF(a) for a in [0,1,1,1,0,1,1] ]
sage: g = x^7 + x + 1
sage: e = E((g,IS))
sage: B = BinaryStrings()
sage: m = B.encoding("THECATINTHEHAT")
sage: e(m)
0010001101111010111010101010001100000000110100010101011100001011110010010000011111100100100011001101101000001111
It seems the code needs more documentation and also some areas are not implemented yet, e.g. block ciphers.
- Sage ships PyCrypto, which implements many standard cryptographic algorithms. The docstring-level documentation is horrible:
sage: import Crypto.Cipher.IDEA
sage: Crypto.Cipher.IDEA?
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
It is meant for production code rather than for research, education or playing around, but maybe something could be done to provide easier access to it from within Sage. Here is an example of how to use it:
sage: from Crypto.Hash import MD5
sage: m = MD5.new()
sage: m.update('abc')
sage: m.digest()
'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
sage: m.hexdigest()
'900150983cd24fb0d6963f7d28e17f72'
- Finite fields are basic building blocks in cryptography. Sage uses several finite field implementations for prime fields and extension fields of various sizes. The FiniteField_ext_pari implementation for finite extension fields of order $\ge 2^{16}$ should be replaced by two implementations using NTL’s ZZ_pE and lzz_pE, depending on the size of the characteristic. This should be relatively straightforward because there is already an implementation for characteristic 2 using NTL’s GF2E. To get a feeling for the possible speed improvements:
sage: k.<a> = GF(next_prime(2^65)^27)
sage: e = a^30
sage: f = a^40
sage: %timeit e*f
1000 loops, best of 3: 557μs per loop
sage: c = ntl.ZZ_pEContext(ntl.ZZ_pX(list(k.polynomial()),k.characteristic()))
sage: e = c.ZZ_pE(list(e.polynomial()))
sage: f = c.ZZ_pE(list(f.polynomial()))
sage: %timeit e*f
10000 loops, best of 3: 154μs per loop
- Sage isn’t exactly kicking ass when it comes to elliptic and hyperelliptic curves over finite fields. As these are quite important in asymmetric cryptography it might be worth looking into this. John Cremona added to this: “That is fair. Apart from an SEA point-counting implementation — only over prime fields — the rest (for elliptic curves) is definitely only designed to work at sub-crypto field sizes.”
- Algebraic techniques have recently received some attention in the cryptanalysis of symmetric cryptographic primitives. In these attacks the cryptanalyst expresses the cipher as a large set of multivariate polynomial equations and attempts to solve the system. The most common case, over $\mathbb{F}_2$, is handled by PolyBoRi. This library is the backbone of BooleanPolynomialRing and friends. This class needs testing, documentation, extension and bugfixes. Basically, someone should sit down and add all the methods of MPolynomial[Ring]_libsingular to BooleanPolynomial[Ring] which make sense, add a ton of doctests and test the hell out of the library to make sure no SIGSEGVs surprise the user.
sage: F,s = sr.polynomial_system()
sage: R = F.ring()
sage: B = BooleanPolynomialRing(R.ngens(),R.variable_names(),R.term_order())
sage: F = [B(f) for f in F if B(f) != 0]
sage: F = mq.MPolynomialSystem(B,F); F
Polynomial System with 68 Polynomials in 36 Variables
sage: gb = F.groebner_basis()
sage: gb[-1]
k003
sage: s[R("k003")]
0
- The module sage.crypto.mq is also relevant for algebraic cryptanalysis in symmetric cryptography. It implements
- small scale AES equation system generators over $\mathbb{F}_2$ and $\mathbb{F}_{2^n}$
- a class to represent multivariate polynomial systems
- an S-box class to analyse … well … S-boxes.
sage: S = mq.SBox(7, 6, 0, 4, 2, 5, 1, 3); S
(7, 6, 0, 4, 2, 5, 1, 3)
sage: S.polynomials()
[x0*x2 + x1 + y1 + 1,
 x0*x1 + x1 + x2 + y0 + y1 + y2 + 1,
 x0*y1 + x0 + x2 + y0 + y2,
 x0*y0 + x0*y2 + x1 + x2 + y0 + y1 + y2 + 1,
 x1*x2 + x0 + x1 + x2 + y2 + 1,
 x0*y0 + x1*y0 + x0 + x2 + y1 + y2,
 x0*y0 + x1*y1 + x1 + y1 + 1,
 x1*y2 + x1 + x2 + y0 + y1 + y2 + 1,
 x0*y0 + x2*y0 + x1 + x2 + y1 + 1,
 x2*y1 + x0 + y1 + y2,
 x2*y2 + x1 + y1 + 1,
 y0*y1 + x0 + x2 + y0 + y1 + y2,
 y0*y2 + x1 + x2 + y0 + y1 + 1,
 y1*y2 + x2 + y0]
sage: S.difference_distribution_matrix()
[8 0 0 0 0 0 0 0]
[0 2 2 0 2 0 0 2]
[0 0 2 2 0 0 2 2]
[0 2 0 2 2 0 2 0]
[0 2 0 2 0 2 0 2]
[0 0 2 2 2 2 0 0]
[0 2 2 0 0 2 2 0]
[0 0 0 0 2 2 2 2]
- Univariate polynomials over $\mathbb{F}_2$ are still implemented via NTL’s ZZ_pX rather than GF2X.
Furthermore, there is gf2x, a drop-in replacement library for NTL’s GF2X, which is expected to be five times faster than NTL. However, a formal vote is needed to get it into Sage.
sage: k = GF(2)
sage: P.<x> = PolynomialRing(k)
sage: f = P([k.random_element() for _ in range(10000)])
sage: %timeit f**2
100 loops, best of 3: 11.1 ms per loop
sage: f = ntl.GF2X(f.list())
sage: %timeit f**2
100000 loops, best of 3: 2.02μs per loop
- At the end of the day everything boils down to linear algebra, so if one improves that, everybody wins. Sparse linear algebra over $\mathbb{F}_p$ is still too slow (Ralf-Philipp Weinmann did some work here wrapping code from eclib); there is no special implementation for sparse linear algebra over $\mathbb{F}_2$ (both blackbox and reduced echelon forms); dense linear algebra over $\mathbb{F}_2$ lacks Strassen multiplication/reduction; and dense linear algebra over $\mathbb{F}_{2^n}$ should probably get a specialised implementation. Note that up to a few thousand rows/columns Sage’s dense linear algebra over $\mathbb{F}_2$ is actually pretty fast.
sage: A = random_matrix(GF(2),7000,7000)
sage: time B = A.echelon_form()
CPU times: user 2.91 s, sys: 0.04 s, total: 2.96 s
Wall time: 3.01
sage: AM = A._magma_()
sage: t = magma.cputime()
sage: BM = AM.EchelonForm()
sage: magma.cputime(t)
2.6800000000000002
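The plain-Python sketch below shows what `S.difference_distribution_matrix()` in the S-box example above counts. Entry $(a, b)$ is the number of inputs $x$ with $S(x) \oplus S(x \oplus a) = b$, with rows indexed by the input difference $a$. For the S-box `(7, 6, 0, 4, 2, 5, 1, 3)` it should reproduce the $8 \times 8$ matrix printed by Sage above. The function name is hypothetical and not part of `sage.crypto.mq`.

```python
def difference_distribution_table(sbox, out_size=None):
    """DDT[a][b] = #{x : S(x) XOR S(x XOR a) = b} for an S-box given as a lookup table."""
    n = len(sbox)                          # number of inputs (a power of two)
    out_size = n if out_size is None else out_size
    table = [[0] * out_size for _ in range(n)]
    for a in range(n):                     # input difference
        for x in range(n):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


S = (7, 6, 0, 4, 2, 5, 1, 3)
for row in difference_distribution_table(S):
    print(row)
```

Every nonzero row has entries of at most 2. This means no input difference propagates to any particular output difference with probability above $2/8$, which is the best possible for a 3-bit S-box.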
I hope this list isn’t totally useless.
Sun, 16. Mar 2008
Yet Another Talk on Sage
I gave a brief talk yesterday at the Open Knowledge Conference (OKCon) here in London. The slides were also discussed on [sage-devel] last week. I have to admit that I underappreciated David Joyner’s comments about the expected audience. My impression is that the majority of the audience couldn’t care less about the actual mathematics implemented in Sage. I suppose we still made a good impression, but I had to skip most of the examples I care about due to time constraints and perceived lack of interest. After the talks I had some neat discussions with other participants, e.g. Gaël Varoquaux from the MayaVi2 project.
Thu, 28. Feb 2008
Plotting Timing Experiments
Like anyone else, I regularly need to run experiments to check how fast or slow a particular algorithm/implementation is for a given problem. The natural choice is to plot the data. This way you at least get a more or less pretty picture out of the tedious experience of waiting for the experiment to finish. I used to write crappy code to generate these pictures myself, and I could never bring myself to remember the appropriate commands for matplotlib and R. Today I sat down and learned the five lines of code necessary to get decent plots for my experiments. I’m putting examples here for no good reason except maybe to show off Sage’s new HNF code, which I use as a showcase.
First, let's compare how long it takes to compute the Hermite Normal Form of a random $n \times n$ matrix with (possibly negative) integer entries whose absolute value is bounded by $2^{16}$.
n = 10
b = 16
st = []
mt = []
x = [20*i for i in range(n)]
for i in range(n):
    A = random_matrix(ZZ, 20*i, 20*i, x=-2**b, y=2**b)
    t = cputime()
    E = A.echelon_form()
    st.append(cputime(t))
    AM = A._magma_()
    t = magma.cputime()
    EM = AM.EchelonForm()
    mt.append(magma.cputime(t))

import pylab
pylab.figure(1)
pylab.clf() # clear the figure first
# plot some data and add a legend
pylab.plot(x, st, label="Sage")
pylab.plot(x, mt, label="Magma")
pylab.legend() # draw the legend
pylab.title("HNF for Random Matrices with $%d$-bit Integer Entries: Sage vs. Magma"%b)
pylab.ylabel("execution time $t$ (s)") # label the axes
pylab.xlabel("n for n x n matrix")
pylab.savefig('foo.png', dpi=72) # fire!
Now let's use R to see how the runtimes vary for random $160 \times 160$ matrices with (possibly negative) integer entries whose absolute value is bounded by $2^{10}$.
b = 10
st = []
for i in range(500):
    A = random_matrix(ZZ, 160, 160, x=-2**b, y=2**b)
    t = cputime()
    E = A.echelon_form()
    st.append(cputime(t))

from rpy import r
r.png('histogram.png', width=640, height=480)
r.hist(st, r.seq(1.2,3.7,0.02), main="SAGE HNF Histogram", col="lightblue", prob=True, xlab="seconds")
r.lines(r.density(st, bw=0.05), col="black")
r.rug(st)
r.dev_off()
Neat, isn’t it? Btw. Pygments is also neat, thanks rpw.
Wed, 20. Feb 2008
Impressions from FSE 2008

If you don’t get it, don’t worry, it is not really funny.
Les Trophées du Libre 2007
Sage is among the finalists of this year’s “free software awards” competition in the science category. The other two finalists in that category are Giac/XCas (slides, session) and Getfem++. I am representing Sage in 25 minutes and I uploaded my slides and the demo worksheet (PDF).
Wed, 07. Nov 2007
$GF(2^n)$ arithmetic speed
Since version 2.8.10 Sage’s finite extension fields of characteristic 2 and degree $\ge 16$ are implemented via NTL’s GF2E rather than Pari. For some more or less random reason I timed how fast multiplying two random elements is now.
The red line shows the time it takes Magma 2.13-5 to multiply two random elements a million times for a given degree $n$. The green line shows the same calculation using Sage 2.8.12 with the default modulus and a Python loop. The blue line uses a Cython loop (== C loop) and the function good_modulus (see below) to generate a “good” modulus. The default modulus used by Sage is either the Conway polynomial or, if we don’t know the Conway polynomial, a random irreducible polynomial. I took the idea of using a “good” modulus from Michael Scott’s slides for his talk at the SPEED workshop. My attempt is not as sophisticated as his but naively searches for trinomials and pentanomials with low-degree terms.
def good_modulus(n):
    P = GF(2)['x']
    x = P.gen()
    # first try trinomials x^n + x^a + 1 with a as small as possible
    for a in range(1, n):
        f = x**n + x**a + 1
        if f.is_irreducible():
            return f
    # then pentanomials x^n + x^c + x^b + x^a + 1 with a < b < c <= N,
    # for N = 0, 10, 20, ...
    for N in range(0, n, 10):
        for a in range(1, N+1):
            for b in range(a+1, N+1):
                for c in range(b+1, N+1):
                    f = x**n + x**c + x**b + x**a + 1
                    if f.is_irreducible():
                        return f
    # fall back to default if nothing was found
    return GF(2**n, 'a').polynomial()
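One likely reason moduli found by good_modulus are "good" is cheap reduction. Since $x^n \equiv \sum x^{e_i}$, reducing a product modulo $x^n + \sum x^{e_i}$ needs only a few shifts and XORs, and each folding step lowers the degree by at least $n - \max e_i$. The sketch below encodes polynomials over $\mathbb{F}_2$ as Python integers (bit $i$ is the coefficient of $x^i$). The helper names are hypothetical, and this is not how NTL's GF2E is implemented.

```python
def clmul(a, b):
    """Carry-less (GF(2)[x]) product of polynomials encoded as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_sparse(c, n, low_terms):
    """Reduce c modulo x^n + sum(x^e for e in low_terms), all e < n.

    Uses x^n = sum(x^e): the part of degree >= n is shifted down and XORed
    back in, once per low term, until the degree drops below n."""
    mask = (1 << n) - 1
    while c >> n:
        hi = c >> n
        c &= mask
        for e in low_terms:
            c ^= hi << e
    return c


def gf2n_mul(a, b, n, low_terms):
    """Multiply two elements of GF(2^n) for a sparse irreducible modulus."""
    return reduce_sparse(clmul(a, b), n, low_terms)
```

As a check, take the AES field polynomial $x^8 + x^4 + x^3 + x + 1$. Then `gf2n_mul(0x57, 0x83, 8, (4, 3, 1, 0))` gives `0xc1`, the worked example from FIPS-197. With small low-degree terms the `while` loop typically runs only twice.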
Some comments:
- Up until $2^{15}$ we use Zech logarithms as they are implemented in Givaro. Magma uses Zech logarithms up to $2^{20}$ and we should do the same. If we use a Cython loop (i.e. remove the overhead of the loop) Sage’s arithmetic is as fast as Magma’s.
- I don’t know why there is that peak around $n=2$ for Magma. Bug? My bad?
- Magma scales quite nicely wordwise, as you would expect.
- Surprisingly enough we beat Magma starting at $2^{100}$ up until at least $2^{128}$ using the “good” moduli.
- What is going on with NTL between $2^{16}$ and $2^{64}$?
It seems we should internally (at least for large degrees) represent elements with respect to a “good” modulus even if we know the Conway polynomial.
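The first comment above mentions Zech logarithms. The idea is to fix a primitive element $\alpha$ and store each nonzero element by its discrete logarithm $i$, i.e. as $\alpha^i$. Multiplication then becomes addition of exponents modulo $2^n - 1$. Addition uses a table $Z$ with $\alpha^{Z(k)} = 1 + \alpha^k$, because $\alpha^i + \alpha^j = \alpha^{i + Z(j - i)}$. The sketch below builds such tables for a small field from a primitive polynomial. The names are hypothetical, and this is not Givaro's implementation. The tables have $2^n - 1$ entries, which is why the approach is only used for small fields such as those up to $2^{15}$ or $2^{20}$ mentioned above.

```python
def zech_tables(n, poly):
    """Build exp/log/Zech tables for GF(2^n) from a primitive polynomial.

    poly encodes the modulus with bit i = coefficient of x^i, e.g. 0b10011
    for x^4 + x + 1; alpha is the class of x."""
    q1 = (1 << n) - 1                 # order of the multiplicative group
    exp, log = [0] * q1, {}
    v = 1
    for i in range(q1):
        exp[i], log[v] = v, i         # exp[i] = alpha^i as a bit vector
        v <<= 1                       # multiply by alpha ...
        if v >> n:
            v ^= poly                 # ... and reduce modulo poly
    # alpha^zech[k] = 1 + alpha^k; undefined for k = 0 since 1 + 1 = 0
    zech = [None] + [log[exp[k] ^ 1] for k in range(1, q1)]
    return exp, log, zech


def zmul(i, j, q1):
    """Multiply elements given by their logs (None encodes zero)."""
    if i is None or j is None:
        return None
    return (i + j) % q1


def zadd(i, j, zech, q1):
    """alpha^i + alpha^j = alpha^(i + zech[j - i]) (None encodes zero)."""
    if i is None:
        return j
    if j is None:
        return i
    z = zech[(j - i) % q1]
    if z is None:                     # equal elements cancel in characteristic 2
        return None
    return (i + z) % q1
```

Both operations are then a couple of integer additions and at most one table lookup, independent of the bit representation of the elements.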
Thu, 01. Nov 2007
Yet Another Talk on Sage
May I draw the reader’s attention to the slides of my most recent talk about Sage for the ISG Student Seminar.
Fri, 05. Oct 2007
More Pictures/SAGE Days 5
I’ve uploaded my pictures from the “Tools for Cryptanalysis 2007” and “SAGE Days 5” workshops to flickr. At SD5 I
- worked on LLL (see wiki, trac, and update),
- found out that SAGE’s Strassen Echelonizer doesn’t require submatrices to be non-singular,
- generated kinda handy graphs of SAGE’s inheritance tree,
- and gave a talk about the status of commutative algebra in SAGE.
The SAGE inheritance tree in 3D:
The Sorry State of Sparse Linear Algebra over Finite Fields
By sparse linear algebra I actually only mean computing the (reduced) row echelon form. Surprisingly, there aren’t many implementations out there, at least not many that I am aware of.
- Apparently, MAGMA cannot compute the reduced row echelon form. Well, you can compute the nullspace, but: “Since the result will be given in the dense representation, both the nullity of A and the number of rows of A must both be reasonably small.” (MAGMA Documentation)
- LinBox actually does Gaussian elimination for you if you compute the rank using the NoReordering method. However, it kills the rows it doesn’t need anymore to be more memory friendly. Also, in my experiments the Gaussian elimination wasn’t much faster than SAGE’s for random sparse matrices over $GF(127)$. However, it can compute the rank more quickly by using “Symbolic Reordering” (paper). Added bonus: LinBox also does $GF(p^n)$.
- SAGE offers a sparse Gaussian elimination: “We use Gauss elimination, in a slightly intelligent way, in that we clear each column using a row with the minimum number of nonzero entries.”
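Here is a minimal sketch of the pivoting rule quoted from SAGE above. It works over $\mathbb{F}_p$ for prime $p$, with rows stored as `{column: value}` dictionaries. For each column, the pivot is the not-yet-used row with the fewest nonzero entries, which tends to limit fill-in. This illustrates the rule and is not SAGE's actual implementation; the function name is hypothetical.

```python
def sparse_rref_mod_p(rows, ncols, p):
    """Reduced row echelon form over GF(p), p prime.

    rows: list of dicts {column: value}; returns (pivot columns, pivot rows)."""
    rows = [{c: v % p for c, v in r.items() if v % p} for r in rows]
    used = set()
    pivots, pivot_row_of = [], {}
    for c in range(ncols):
        candidates = [i for i, r in enumerate(rows) if i not in used and c in r]
        if not candidates:
            continue
        # pick the sparsest candidate as pivot row to keep fill-in low
        i = min(candidates, key=lambda k: len(rows[k]))
        used.add(i)
        inv = pow(rows[i][c], p - 2, p)          # inverse via Fermat, p prime
        rows[i] = {k: v * inv % p for k, v in rows[i].items()}
        # clear column c in every other row (above and below)
        for j, r in enumerate(rows):
            if j == i or c not in r:
                continue
            f = r[c]
            for k, v in rows[i].items():
                nv = (r.get(k, 0) - f * v) % p
                if nv:
                    r[k] = nv
                else:
                    r.pop(k, None)
        pivots.append(c)
        pivot_row_of[c] = i
    return pivots, [rows[pivot_row_of[c]] for c in pivots]
```

For example, `sparse_rref_mod_p([{0: 1, 1: 2}, {0: 3, 1: 4}], 2, 7)` returns `([0, 1], [{0: 1}, {1: 1}])`. Choosing the sparsest row is only a heuristic; real implementations typically also consider the column counts (Markowitz-style pivoting).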
The other day I kinda liked the idea of applying William Stein’s integer matrix rational-echelonize-via-solve algorithm to this case. The (adapted) algorithm is as follows (most of it is due to William Stein):
- Compute $r = \operatorname{rank}(A)$. This is cheaper than Gaussian elimination because we can use “Symbolic Reordering”.
- Compute the pivot columns of the transpose $A^t$ of $A$. We can convince “Symbolic Reordering” to give us these as well.
- Let $B$ be the submatrix of $A$ consisting of the rows corresponding to the pivot columns found in the previous step. Note that, aside from zero rows at the bottom, $B$ and $A$ have the same reduced row echelon form.
- Compute the pivot columns of $B$. Again, we may do this using “Symbolic Reordering”.
- Let $C$ be the submatrix of $B$ of pivot columns. Let $D$ be the complementary submatrix of $B$ of all non-pivot columns. Use a solver (such as Wiedemann) to find the matrix $X$ such that $CX = D$, i.e., solve a bunch of linear systems of the form $Cx = v$, where the columns of $X$ are the solutions.
- Return the matrix $[I \mid X]$, where $I$ is the $r \times r$ identity matrix, with the columns put back in their original positions (the columns of $I$ go to the pivot columns, the columns of $X$ to the non-pivot columns).
This algorithm has the complexity of two “Symbolic Reordering” applications plus $n_{cols} - r$ applications of a matrix-vector solver. If “Symbolic Reordering” significantly outperforms Gaussian elimination (speed- and memory-wise), $n_{cols} - r$ is small and the solver is fast, this might outperform straightforward Gaussian elimination. The algorithm in SAGE notation is:
def echelon_form_via_solve(A):
    r = A.rank() # Step 1: Compute the rank
    if r == A.nrows():
        B = A
    else:
        # Steps 2 and 3: Extract out a submatrix of full rank.
        P = A.transpose().pivots()
        B = A.matrix_from_rows(P)
    # Step 4: Now we instead worry about computing the reduced row echelon form of B.
    pivots = B.pivots()
    # Step 5: Apply solver
    C = B.matrix_from_columns(pivots)
    pivots_ = set(pivots)
    non_pivots = [i for i in range(B.ncols()) if i not in pivots_]
    D = B.matrix_from_columns(non_pivots)
    # solve C*X = D; the plan above is to use a blackbox solver (e.g. Wiedemann) here
    X = C.solve_right(D)
    # Step 6: assemble [I | X] with the columns in their original positions
    R = A.parent()()
    for i in range(len(pivots)):
        R[i, pivots[i]] = 1
    for i in range(X.nrows()):
        for j in range(X.ncols()):
            R[i, non_pivots[j]] = X[i, j]
    return R
However, as we have to call the solver repeatedly (or find a good matrix-matrix solver), I lost interest in implementing this thing. What is missing is a hack to make LinBox return the pivot columns when performing InPlaceLinearPivoting, which is “Symbolic Reordering”. As a by-product of my attempts, some operations on sparse matrices over GF(p) are way faster now in SAGE (see #655). Also, if you, my dear reader, know about any fast implementation for that problem, please let me know.
Sun, 06. May 2007
Pretty Pictures
A nifty feature which is going to be in SAGE 2.5 is matrix “visualization”. That means SAGE produces an image for your matrix indicating which parts are sparse and which are dense. Though this is a trivial 20-liner, it is a pretty useful tool for discovering structure.
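Here is a minimal sketch of what such a visualization can do, for a matrix given as a list of rows. Entries are grouped into square blocks so the image is at most `maxsize` pixels per side. A pixel is black if its block contains any nonzero entry, and the result is a plain-text PBM image. This is not SAGE's implementation; the function name and the PBM output format are illustrative choices.

```python
def sparsity_pbm(M, maxsize=64):
    """Plain PBM (P1) picture of the nonzero pattern of M (a list of rows).

    The matrix is cut into bs x bs blocks so the image is at most maxsize
    pixels per side; a pixel is black (1) if its block has a nonzero entry."""
    nrows, ncols = len(M), len(M[0])
    bs = max(1, -(-max(nrows, ncols) // maxsize))   # ceiling division
    h, w = -(-nrows // bs), -(-ncols // bs)
    pixels = [[0] * w for _ in range(h)]
    for i, row in enumerate(M):
        for j, entry in enumerate(row):
            if entry:
                pixels[i // bs][j // bs] = 1
    lines = ["P1", f"{w} {h}"] + [" ".join(map(str, r)) for r in pixels]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.pbm` file gives an image most viewers can open. Dense blocks show up as black regions and sparse ones as mostly white.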
As an example, look at some matrices occurring during F4 when applied to Cyclic6 over $GF(127)$ or to $CTC_{3,3,3}$, as shown below.
I have uploaded to flickr a set of images for all matrices occurring during the F4 computations against that CTC instance. Also, I have uploaded some photos from the ECRYPT PhD Summerschool 2007, from which I am returning as I write this. Finally, I have uploaded the slides of a short talk about SAGE I gave at that Summerschool.
Wed, 14. Mar 2007
Westcoast Wrap-Up
In case anybody wonders, this is what I have done during my stay on the US west coast.
- I gave four talks in total on Pyrex/SageX, commutative algebra implementations, and algebraic attacks against block ciphers.
- I got started on linking against Singular from SAGE. I am far from being finished but I am planning to put out a pre-alpha-poke-an-eye-out release soon.
- I got started on incorporating Gregory Bard’s M4RI implementation in SAGE and optimized it a little. Going to be released soon.
- I wrote a thread manager which might under some highly unlikely circumstances end up in FLINT. The point of a thread manager is to make sure the overhead of creating a new thread is minimized.
- worked with others on “fast integer” creation in SAGE. That meant diving into Python and GMP internals to make the creation of new Integer objects fast.
- I helped a little on the IML and LinBox integration in SAGE.
- I wrote a SymbolicData wrapper for SAGE.
- I was a conference co-organizer for the first time.
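The idea behind a thread manager, paying the thread-creation cost once and then reusing the threads for many tasks, can be sketched as a fixed-size worker pool. This is a hypothetical Python illustration (`ThreadManager`, `submit` and `shutdown` are made-up names), not the actual thread manager mentioned above, which targeted FLINT and thus C. In Python one would normally just use `concurrent.futures.ThreadPoolExecutor`, and CPython's GIL limits the parallel speed-up for pure-Python work anyway.

```python
import queue
import threading


class ThreadManager:
    """Minimal fixed-size worker pool (hypothetical sketch).

    All threads are started once in __init__ and reused for every task, so
    thread creation is paid num_workers times in total, not once per task."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for t in self._workers:
            t.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut this worker down
                self._tasks.task_done()
                return
            func, args, box, done = item
            try:
                box.append((True, func(*args)))
            except Exception as exc:  # hand the error back to the caller
                box.append((False, exc))
            finally:
                done.set()
                self._tasks.task_done()

    def submit(self, func, *args):
        """Queue func(*args); return a callable that blocks for the result."""
        box, done = [], threading.Event()
        self._tasks.put((func, args, box, done))

        def wait():
            done.wait()
            ok, value = box[0]
            if not ok:
                raise value
            return value

        return wait

    def shutdown(self):
        """Send one sentinel per worker and wait for all of them to exit."""
        for _ in self._workers:
            self._tasks.put(None)
        for t in self._workers:
            t.join()
```

For example, `ThreadManager(4).submit(pow, 3, 2)()` runs `pow(3, 2)` on one of four pre-started threads and returns its result.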
Wed, 14. Feb 2007
Talks, Talks, Talks
- I gave a very brief talk on the state of affairs of commutative algebra in SAGE. The slides also include some preliminary examples and timings of the SAGE Singular bindings. Apparently, the Singular interpreter is pretty slow, so one could underappreciate Singular’s performance from benchmarks run through the Singular interpreter. But Singular itself is very fast, and Python seems to be a good interpreter frontend for it.
- I also gave a brief talk on SageX/Pyrex again (slides).
- I finally put the source code of my thesis online (i.e. no strange ps2txt twiddling anymore).
Thu, 08. Feb 2007
Random Bits and Pieces
- Jason Martin took the time to sit down with me and compare the GMPbench results with and without his patches. The results on my 2.33 GHz MacBook Pro (Debian Etch, AMD64, GMP 4.2.1) are: a GMPbench score of 5825 without his patches and 7235 with them. So you definitely want his patches if you have a Core 2 Duo processor idling in your machine (my last report on his patches was wrong).
- SAGE 2.1 is going to include the LinBox library for fast matrix operations over finite fields and ZZ. Using the ATLAS3 library (from Debian Etch, so not fine-tuned for my machine, which is the whole point of ATLAS), LinBox achieves impressive results: computing the reduced row echelon form of a dense random 2000x2000 matrix over GF(127) takes 4-5 seconds on my notebook.
- I managed to link against Singular today, i.e., I compiled Singular as a shared library. This means that eventually Singular’s superior polynomial arithmetic et al. can be used by SAGE directly. Still, it’s a long road.
- Gregory Bard is going to be at SAGE Days 3 in LA. Hopefully, we will manage then to include his M4RI algorithm in SAGE.
Thu, 14. Dec 2006
SAGE 1.5.0.2 released
This blog would be even more boring if every SAGE release were properly announced. But this release is a really big one. SAGE 1.5 features a total rewrite of much of the basic arithmetic to make it both faster and easier to understand. All matrix classes are now written in SageX/Pyrex/C, so they are much faster now. Givaro’s finite extension fields were included as the default implementation, which also means a significant speed improvement. SAGE also now has some graph theory support.
SAGE may be downloaded here and tried online here.
Wed, 11. Oct 2006
SAGE Days 2.2-2.4
SAGE Days 2 are over and it was great fun. We also made quite a bit of progress: addition of integers was used as a test bed for performance/architecture improvements, and as a result SAGE now seems to be only 50% slower than MAGMA for large integers (i.e., greater than word size), while for small integers SAGE is twice as slow or worse (but there is room for improvement). Python integers (up to word size), however, are way faster than MAGMA integers, unless we got all the benchmarks wrong. An alpha interface to Axiom was also written, a solution was found for an implementation problem with p-adic numbers (which I know nothing about), the SAGE notebook will no longer be a spammers’/cross-site-scripters’ heaven in the near future, and tons of other stuff got wrapped/implemented/discussed. Slides/mp3s should appear at sage.math.
Sun, 08. Oct 2006
SAGE Days 2.1
There have been lots of talks, including both of mine. J. Voight, btw., told me (and the audience) that he sat down with Allan Steel and asked him about all his secrets for making his F4 implementation so fricking fast. Allan Steel responded that his implementation was very slow at first and that all his speed-ups were due to implementation tricks, not mathematical ideas. So there might be hope for my implementation.
Elsewhere: We are putting together a comprehensive Wiki page on making Pyrex code fast which might be interesting to the general Pyrex audience and not only SAGE developers. We are focusing on SAGE, though.
Sat, 07. Oct 2006
SAGE Days 2.0
The SAGE Days 2 coding sprint session started today and people were working on:
- Singular’s build system, which has severe issues worth fixing. Also, we just found out that Singular isn’t as GPL as it claims to be, as it depends on omalloc, whose license is GPL-incompatible.
- fast ODE solvers and DFT using the GNU Scientific Library for numerical calculations.
- an interface for p-adic numbers which is non-trivial due to precision issues I know nothing about.
- fast exact linear algebra where some scratches and itches needed to be ironed out
- the arithmetic architecture, which means e.g. ensuring that once we are in Pyrex land we don’t have to call a Python function ever again (= speed)
- GAP’s interface
- automatic generation of interactive notebooks from docstring examples
- L-functions
- the SAGE graph theory package, mainly researching what’s out there
- distributed SAGE
- the transition from Pari to Givaro for finite extension fields (=speed)
- distribution and server hosting of SAGE
- a SAGE-Axiom interface
As far as I can see, the number of lines written per developer per hour is probably very low, but we had a couple of good discussions. Tomorrow there will be no coding sprint but many talks.
Also, I met Neal Koblitz today and discussed PhD opportunities, a bit of his current interests (Another Look at “Provable Security”, Pairing-Based Cryptography at High Security Levels), and heaps of other stuff with him, which was pretty cool. Elsewhere: Python 2.5 (included in SAGE 1.4) does return memory to the kernel.

