Python weave

As part of my master thesis I have developed a data mining algorithm and implemented it in Python. Benchmarks revealed that most of the time is spend iterating over big arrays and calculating the statistical relevance of sub arrays. After some time of tweaking I came to think that maybe involving some C/C++ code might speed things up a bit…

I stumbled across weave, which lets you include inline C/C++ code in your Python script. The usage is quite simple. You basically provide the C/C++ code as a string and execute it. The first time this is done takes some time, since the compiler gets invoked, but afterwards your script simply calls the compiled code fragments. I have to admit, that this is not a fair and fully blown benchmark, but the first tests blew my mind =)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
from numpy import array
from time import time
import scipy.weave.inline_tools as inline_tools

def weaveInline(arr):
    code = """
    long int res = 0;
    for (int i = 0; i < Narr[0]; i++) {
        res += arr[i];
    }
    return_val = res;
    """

    return inline_tools.inline(code,['arr'])

def nativePython(arr):
    res = 0
    for i in xrange(len(arr)):
        res += arr[i]
    return res

if __name__ == '__main__':
    checksum1 = 0
    checksum2 = 0

    f = open('benchmark.dat', 'w')
    f.write("#calls, arraysize, zero, native on python array, inline on numpy array\n")
    for calls in xrange(10,110,10):
        for arraysize in xrange(0,55000,5000):

            arr1 = range(1, arraysize)
            arr2 = array(range(1, arraysize))

            tmp = time()
            for i in range(calls):
                checksum1 += nativePython(arr1)
            t1 = time() - tmp

            tmp = time()
            for i in range(calls):
                checksum2 += weaveInline(arr2)
            t2 =  time() - tmp

            f.write("%d %d %f %f %f\n" %(calls, arraysize, 0.0, t1, t2))
        f.write("\n")
    f.close()
    print checksum1
    print checksum2

Basically I was interested in how much runtime is spend on the call of the function and how it evolves with the size of the array.

pyvsweave onlyweave

The first plot shows a comparison between Pythons native arrays and the C++ implementation. The second plot just illustrates that the C++ way doesn’t run in const time, but shows the same growth behaviour (only flatter). One thing about the implementation:

I did run the pure python version on numpy arrays too, but normal python lists seemed to perform best here.

After all keep in mind, that this is not a serious benchmark. I simply wanted to share this info =)

<3  gecko

Update

Here are the results of what Jörn ased for in one of the comments:

all joernvsc

The first graph shows that Jörn’s code performs similar to the weave code, although the weave version is a tiny bit faster, as shown in the second graph. I also tried arrays, that were bigger than my CPU’s L1 cache, but no significant difference here.
An interesting thing though is that depending on the size of the array, sometimes a periodic up and down in the runtime of the native python version occurs. This might be caching related…