More OpenMP Experiments with M4RI
Motivated by a thread on [mpir-dev], I played around with OpenMP again today. The performance does not scale linearly … but hey, it scales at all. I guess eventually I’ll have to get serious about this and sit down to do it properly. Anyway, here are the timings (on geom.math.washington.edu):

| n | M4RI 1 thread | PLUQ 1 thread | M4RI 4 threads | PLUQ 4 threads | M4RI 16 threads | PLUQ 16 threads | PLUQ 16 threads cutoff=2048 |
|---|---|---|---|---|---|---|---|
| 10,000 | 1.72 | 0.85 | 1.03 | 0.86 | 0.58 | 0.80 | 0.77 |
| 16,384 | 13.75 | 5.76 | 4.78 | 4.23 | | | |
| 20,000 | 27.02 | 5.45 | 7.35 | 5.48 | 3.27 | 3.68 | |
| 32,000 | 112.74 | 21.96 | 30.51 | 22.02 | 13.78 | 13.91 | 12.95 |
| 64,000 | 227.80 | 157.03 | 104.94 | 75.95 | 66.54 | | |
| 100,000 | 1078.72 | 429.32 | 869.43 | 596.51 | 428.08 | 260.99 | 231.01 |

For some reason I don’t understand yet, PLUQ on one thread is slower for n = 16,384 than for n = 20,000 on this machine (5.76 vs. 5.45). The code is on Bitbucket.
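
To put a number on "does not scale linearly", one can compute speedup and parallel efficiency from any pair of timings in the table. The following is only a minimal sketch: the function names `speedup` and `efficiency` are made up for this illustration and are not part of M4RI.

```c
#include <assert.h>

/* Speedup of a multi-threaded run over the single-threaded run.
 * t1: wall time with 1 thread, tp: wall time with p threads
 * (both in the same unit, as read off the table). */
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: speedup divided by the number of threads.
 * A value of 1.0 would mean perfectly linear scaling. */
double efficiency(double t1, double tp, int threads) {
    assert(threads > 0);
    return speedup(t1, tp) / threads;
}
```

For the n = 32,000 row, `speedup(112.74, 13.78)` gives roughly 8.2 for M4RI on 16 threads (an efficiency of about 0.51), while `speedup(21.96, 13.91)` gives only about 1.6 for PLUQ.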

