Python: Finding shared items

Today I investigated in how to find shared items in listes. My first approach was straight forward and I checked for each item in list1, if it occurs in list2. Over time this became a bit more elegant and turned into:
shared_items = dict( (s,0) for s in list1 if s in list2 )
But this takes quite a while and so I wondered, if there were better solutions. First thing I found, was that the numpy arrays seem to be much faster to iterate, but the second thing is much more performance boosting… It makes a huge difference for the runtime, if you use
shared_items = dict((s,0) for s in set(list1) and set(list2))
instead of
shared_items = dict( (s,0) for s in list1 if s in list2 )
EDIT:
Jörn posted in a comment, to just skip the bulls##t an go straight for:
shared_items = set(list1) & set(list2)
Have a look:
runtime comparison runtime runtime Joern vs Gecko
And here is the code, to give you an idea, of what I did…enjoy :)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#!/usr/bin/env python
# Task: Find all shared elements in two arrays
from numpy.random import randint
from time import time

print "Comparing two ways of finding shared elements"
# Two arrays with size 10000, filled with
# random values between 1 and  1000000
list1 = randint(1,1000000,10000)
list2 = randint(1,1000000,10000)

# Slow
t1_start = time()
shared1 = dict((s,0) for s in list1 if s in list2)
t1_stop = time()


# Fast
t2_start = time()
shared2 = dict((s,0) for s in set(list1) and set(list2))
t2_stop = time()

# Joern fast:
# Edited, after a comment from Joern...
t3_start = time()
shared3 = set(list1) & set(list2)
t3_stop = time()


# Results
print "Slow algorithm:  %.5f seconds." %(t1_stop - t1_start)
print "Fast algorithm:  %.5f seconds." %(t2_stop - t2_start)
print "Joern algorithm: %.5f seconds." %(t3_stop - t3_start)
print "Same result:    ", shared1 == shared2 == shared3

Although not shown here both methods do not only work on numbers, but on strings, or anything else that is sortable too…

4 Responses to “Python: Finding shared items”

  1. Jörn says:

    Hiho,
    why don’t you use:
    sl1 = set(list1)
    sl2 = set(list2)
    shared = sl1 & sl2

    Jörn

  2. admin says:

    Didn’t occur to me :)
    I guess, my way of thinking was stuck in solving it, iterating over the items…
    Big thanks Jörn =D

    Edit:
    I put your hint in the implementation and also compared their runtimes.
    Yours is clear faster.

  3. Cristobal Jeannoel says:

    Mr or Mrs Webmaster i just had a alert from my antivirus when i opened your website do you know why this occured? Could it be from your ads or something? Thanks, really odd i pray it was harmless?

  4. admin says:

    Hi Cristobal,
    I’m not sure, how that came. I just checked for odd activity (users/processes/network…) on the server and on this page (xss in the comments/new users/newly uploaded stuff…) but couldn’t find anything. Maybe it’s just false alarm. Anyway, I’ll keep an eye on that. Thanks for noticing me.

    Cheers,
    Daniel

Leave a Reply