2

I have a list, which may/may not contain duplicate elements. Given another list/set of Elements, I need a list of all unique elements present in that list.

Input: 
input_list = ['android', 'ios', 'android', '', 'none', 'android', 'junk_os']
os_list = ['android', 'ios', 'windows']
Output:
output = ['android', 'ios']

What will be most pythonic and efficient way of doing this ? Length of given list may be around 10, while Os_list is fixed to 3. But this line will be iterated over 10000 times.

2
  • How that out put elements are unique elements? Commented Aug 12, 2015 at 9:42
  • Output elements are unique elements present only in given os_list Commented Aug 12, 2015 at 9:46

2 Answers 2

3

You could use sets to do this

set(os_list).intersection(input_list)

Edit Since the os_list is fixed you may as well have that stored as a set:

os_list = {'android', 'ios', 'windows'}

# then it's a little less work to do each time in the loop
output = os_list.intersection(input_list)

Profiling

# me
os_set = {'android', 'ios', 'windows'}
%timeit os_set.intersection(input_list)
# 1000000 loops, best of 3: 323 ns per loop

# vks
os_list = ['android', 'ios', 'windows']
%timeit [i for i in os_list if i in input_list]
# 1000000 loops, best of 3: 550 ns per loop

Using Padraic Cunningham's method you can avoid the function lookup and scrape a little more performance out of it. As a bonus it ends up looking like a function name that makes sense.

os_set = {'android', 'ios', 'windows'}
unique_valid_devices = os_set.intersection

%timeit output_list = unique_valid_devices(input_list)
1000000 loops, best of 3: 290 ns per loop
Sign up to request clarification or add additional context in comments.

10 Comments

time complexity of intersection ?
@SameerPidadi this is slower than the list method.
@vks not on my machine? 779 ns for your list method vs 324 ns for my intersection
Just seen your updated method (without adding the set component). Comes out as 536 ns on my machine, so still slightly slower.
For small input the regular list comp is always going to be fast, the intersection method call alone is going to is going to be costly in comparison to the list comp, adding times for when no elements in os_list are in input_list should show the set approach to definitely be faster, also adding si = os_set.intersection and si(os_list) should be faster again
|
0
List = ['android', 'ios', 'android', '', 'none', 'android']
OS_list = ['android', 'ios', 'windows']

y=set(List)
print [i for i in OS_list if i in y]

You can use set here which is O(1).

The fastest though is (for smaller List having upto 600-700 elements)

List = ['android', 'ios', 'android', '', 'none', 'android']
OS_list = ['android', 'ios', 'windows']

[i for i in OS_list if i in List]

Time check:

s1="""List = ['android', 'ios', 'android', '', 'none', 'android']
OS_list = ['android', 'ios', 'windows']

[i for i in OS_list if i in List]"""


s2="""List = ['android', 'ios', 'android', '', 'none', 'android']
OS_list = ['android', 'ios', 'windows']
set(List).intersection(OS_list)"""

print timeit.timeit(stmt=s1,number=1000)
print timeit.timeit(stmt=s2,number=1000)

Output:

0.000895947527903
0.00130528204537

9 Comments

What do you need the conversion to a set for?
@Falko if i in y there if you search in list its O(n) but if y is a set its O(1)
I see. But the conversion is O(n), I guess. (At least you have to look at each element once.)
timing 5 elements is not really an accurate reflection of the huge difference between using a set for lookups and a list, you are taking quadratic vs linear performance.
And those timings are not accurate. You shouldn't be profiling things that aren't part of the loop in question. We know os_list is a constant and you don't want to muddy the timings by putting the input_list creation in there.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.