18.12.2009Python: Finding shared items
Today I investigated in how to find shared items in listes. My first approach was straight forward and I checked for each item in list1, if it occurs in list2. Over time this became a bit more elegant and turned into:
shared_items = dict( (s,0) for s in list1 if s in list2 )
But this takes quite a while and so I wondered, if there were better solutions. First thing I found, was that the numpy arrays seem to be much faster to iterate, but the second thing is much more performance boosting… It makes a huge difference for the runtime, if you use
shared_items = dict((s,0) for s in set(list1) and set(list2))
instead of
shared_items = dict( (s,0) for s in list1 if s in list2 )
EDIT:
Jörn posted in a comment, to just skip the bulls##t an go straight for:
shared_items = set(list1) & set(list2)
Have a look:

And here is the code, to give you an idea, of what I did…enjoy
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | #!/usr/bin/env python # Task: Find all shared elements in two arrays from numpy.random import randint from time import time print "Comparing two ways of finding shared elements" # Two arrays with size 10000, filled with # random values between 1 and 1000000 list1 = randint(1,1000000,10000) list2 = randint(1,1000000,10000) # Slow t1_start = time() shared1 = dict((s,0) for s in list1 if s in list2) t1_stop = time() # Fast t2_start = time() shared2 = dict((s,0) for s in set(list1) and set(list2)) t2_stop = time() # Joern fast: # Edited, after a comment from Joern... t3_start = time() shared3 = set(list1) & set(list2) t3_stop = time() # Results print "Slow algorithm: %.5f seconds." %(t1_stop - t1_start) print "Fast algorithm: %.5f seconds." %(t2_stop - t2_start) print "Joern algorithm: %.5f seconds." %(t3_stop - t3_start) print "Same result: ", shared1 == shared2 == shared3 |
Although not shown here both methods do not only work on numbers, but on strings, or anything else that is sortable too…

