04.06.2010Python weave
As part of my master thesis I have developed a data mining algorithm and implemented it in Python. Benchmarks revealed that most of the time is spend iterating over big arrays and calculating the statistical relevance of sub arrays. After some time of tweaking I came to think that maybe involving some C/C++ code might speed things up a bit…
I stumbled across weave, which lets you include inline C/C++ code in your Python script. The usage is quite simple. You basically provide the C/C++ code as a string and execute it. The first time this is done takes some time, since the compiler gets invoked, but afterwards your script simply calls the compiled code fragments. I have to admit, that this is not a fair and fully blown benchmark, but the first tests blew my mind =)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | from numpy import array from time import time import scipy.weave.inline_tools as inline_tools def weaveInline(arr): code = """ long int res = 0; for (int i = 0; i < Narr[0]; i++) { res += arr[i]; } return_val = res; """ return inline_tools.inline(code,['arr']) def nativePython(arr): res = 0 for i in xrange(len(arr)): res += arr[i] return res if __name__ == '__main__': checksum1 = 0 checksum2 = 0 f = open('benchmark.dat', 'w') f.write("#calls, arraysize, zero, native on python array, inline on numpy array\n") for calls in xrange(10,110,10): for arraysize in xrange(0,55000,5000): arr1 = range(1, arraysize) arr2 = array(range(1, arraysize)) tmp = time() for i in range(calls): checksum1 += nativePython(arr1) t1 = time() - tmp tmp = time() for i in range(calls): checksum2 += weaveInline(arr2) t2 = time() - tmp f.write("%d %d %f %f %f\n" %(calls, arraysize, 0.0, t1, t2)) f.write("\n") f.close() print checksum1 print checksum2 |
Basically I was interested in how much runtime is spend on the call of the function and how it evolves with the size of the array.
The first plot shows a comparison between Pythons native arrays and the C++ implementation. The second plot just illustrates that the C++ way doesn’t run in const time, but shows the same growth behaviour (only flatter). One thing about the implementation:
I did run the pure python version on numpy arrays too, but normal python lists seemed to perform best here.
After all keep in mind, that this is not a serious benchmark. I simply wanted to share this info =)
<3 gecko
Update
Here are the results of what Jörn ased for in one of the comments:
The first graph shows that Jörn’s code performs similar to the weave code, although the weave version is a tiny bit faster, as shown in the second graph. I also tried arrays, that were bigger than my CPU’s L1 cache, but no significant difference here.
An interesting thing though is that depending on the size of the array, sometimes a periodic up and down in the runtime of the native python version occurs. This might be caching related…





Hi Gecko,
can you perhaps also try a third algorithm?
This one:
import scipy as sc
def scipyPython(arr):
return arr.sum()
and call it with:
scipyPython(arr2)
Also I’d expect at least some caching effect in your graphs: http://igoro.com/archive/gallery-of-processor-cache-effects/
Jörn
June 4th, 2010 at 20:29
Nice one, thanks.
We used SWIG to use C++ code with Python when I was at the MPIfR, but weave seems to be much handier when you just want to implement a few lines in C++…
June 5th, 2010 at 11:15