
I have a list:

data_list = ['a.1','b.2','c.3']

And I want to retrieve only strings that start with strings from another list:

test_list = ['a.','c.']

a.1 and c.3 should be returned.

I suppose I could use a double for-loop:

for data in data_list:
    for test in test_list:
        if data.startswith(test):
            # do something with item
            pass

I was wondering if there is something more elegant, and perhaps more performant.
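For reference, here is a runnable version of that double loop that collects the matches into a list. The matches list and the break are additions for illustration: without the break, a string that starts with more than one prefix (for example, if test_list held both 'a' and 'a.') would be appended once per matching prefix.

```python
data_list = ['a.1', 'b.2', 'c.3']
test_list = ['a.', 'c.']

matches = []
for data in data_list:
    for test in test_list:
        if data.startswith(test):
            matches.append(data)
            break  # stop at the first matching prefix so each item is added only once

print(matches)  # ['a.1', 'c.3']
```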

  • Are the strings in test_list of arbitrary length? Commented Jun 9, 2013 at 0:03
  • @jh314 Yes, they could be any length. Commented Jun 9, 2013 at 0:34

5 Answers


str.startswith can also take a tuple (but not a list) of prefixes:

test_tuple=tuple(test_list)
for data in data_list:
    if data.startswith(test_tuple):
        ...

which means a simple list comprehension will give you the filtered list:

matching_strings = [x for x in data_list if x.startswith(test_tuple)]

or a call to filter:

import operator
f = operator.methodcaller('startswith', tuple(test_list))
# filter data_list (not test_list); on Python 3 filter returns an iterator, so wrap it in list()
matching_strings = list(filter(f, data_list))
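A quick way to check the tuple-versus-list point: passing a tuple of prefixes works, while passing a list raises TypeError. The try/except and the list_rejected flag are only there for demonstration.

```python
data_list = ['a.1', 'b.2', 'c.3']
test_list = ['a.', 'c.']

# A tuple of prefixes is accepted: True if the string starts with any of them.
print('a.1'.startswith(tuple(test_list)))  # True

# A list is rejected: startswith only accepts a str or a tuple of str.
list_rejected = False
try:
    'a.1'.startswith(test_list)
except TypeError as exc:
    list_rejected = True
    print('TypeError:', exc)
```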

6 Comments

That's cool! I didn't know startswith could take a tuple :)
Any reason to use x for x instead of lambda x:, or vice versa?
Well, there's no call to use a lambda with a list comprehension. But if you're asking about using a list comprehension instead of filter (which could have used a lambda in place of the function returned by methodcaller), then no, no particular reason to use one over the other, I think. I suspect each has similar performance, and I'll let others argue over which is more Pythonic :).
I've never used filter, lambdas, list comprehensions, or any(). Good stuff; I'll look into them more.
I think there's endless argument about filter and map vs list-comprehensions. Personally, I'll use whichever seems clearer to me. In this case, you're literally filtering the data, so I'd say filter is remarkably clear. :-)
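For comparison, the three spellings discussed in these comments (a list comprehension, filter with operator.methodcaller, and filter with a lambda) produce the same list. On Python 3, filter returns a lazy iterator, so it is wrapped in list() here.

```python
import operator

data_list = ['a.1', 'b.2', 'c.3']
test_tuple = ('a.', 'c.')

# 1. List comprehension
by_comprehension = [x for x in data_list if x.startswith(test_tuple)]

# 2. filter with a callable from operator.methodcaller:
#    methodcaller('startswith', test_tuple)(x) calls x.startswith(test_tuple)
by_methodcaller = list(filter(operator.methodcaller('startswith', test_tuple), data_list))

# 3. filter with a lambda doing the same call
by_lambda = list(filter(lambda x: x.startswith(test_tuple), data_list))

print(by_comprehension, by_methodcaller, by_lambda)
```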

Simply use filter with a lambda function and startswith:

data_list = ['a.1','b.2','c.3']
test_list = ('a.','c.')

result = filter(lambda x: x.startswith(test_list), data_list)

print(list(result))

Output:

['a.1', 'c.3']
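On Python 3, filter returns a lazy iterator rather than a list, which is why the result is passed to list() before printing. One consequence worth knowing: that iterator can only be consumed once.

```python
data_list = ['a.1', 'b.2', 'c.3']
test_list = ('a.', 'c.')

result = filter(lambda x: x.startswith(test_list), data_list)

first_pass = list(result)   # consumes the iterator
second_pass = list(result)  # already exhausted, so this is empty

print(first_pass)   # ['a.1', 'c.3']
print(second_pass)  # []
```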


Try the following:

for data in data_list:
    if any(data.startswith(test) for test in test_list):
        # do something with data
        pass

any() is a built-in that takes an iterable and returns True as soon as it reaches a truthy value (one for which bool(value) is True); if there is no such value, it returns False. In my example, I'm using a generator expression instead of building a list, which would be wasteful because any() can stop early without evaluating the remaining items.
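To see the short-circuiting, the sketch below records each prefix as it is tested. checked_prefixes is a hypothetical helper added only to log the calls; for 'a.1', any() stops after the first prefix because it already matched.

```python
data_list = ['a.1', 'b.2', 'c.3']
test_list = ['a.', 'c.']

def checked_prefixes(data, prefixes):
    """Hypothetical helper: return the prefixes that any() actually tested for data."""
    tested = []

    def check(prefix):
        tested.append(prefix)  # log every prefix the generator hands to any()
        return data.startswith(prefix)

    any(check(test) for test in prefixes)
    return tested

for data in data_list:
    print(data, checked_prefixes(data, test_list))
# Prints:
# a.1 ['a.']
# b.2 ['a.', 'c.']
# c.3 ['a.', 'c.']
```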

1 Comment

any() would be good if all that was required was to know if there was a match, but here we need to return the match as well; which is why I upvoted @chepner's answer.

Alternatively, break out regular expressions:

import re
# build a pattern that matches any of the strings we are interested in
pattern = re.compile('|'.join(map(re.escape, test_list)))
# filter by matches (pattern.match only matches at the start of each string)
print(list(filter(pattern.match, data_list)))

This probably pushes as much of the work as possible into C and may be more efficient than the other solutions, though it may be a bit tricky for the uninitiated to follow.
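The re.escape call matters here: . is a regex metacharacter, so without escaping, the prefix 'a.' would also match a string like 'ab1'. A small check, where 'ab1' is a made-up string added only to expose the difference:

```python
import re

test_list = ['a.', 'c.']
data_list = ['a.1', 'ab1', 'c.3']  # 'ab1' is hypothetical, not from the question

escaped = re.compile('|'.join(map(re.escape, test_list)))  # a\.|c\.
unescaped = re.compile('|'.join(test_list))                # a.|c.  ('.' matches any character)

with_escape = list(filter(escaped.match, data_list))
without_escape = list(filter(unescaped.match, data_list))

print(with_escape)     # ['a.1', 'c.3']
print(without_escape)  # ['a.1', 'ab1', 'c.3']
```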

2 Comments

If I change test_list to ['2','c.'] I would think this would give me b.2 as well as c.3, but it only gives me c.3. The regex produced is 2|c\., so I don't know why it didn't return b.2.
@AndyArismendi, match only matches at the beginning of strings.
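To illustrate that comment: pattern.match only tries the pattern at the start of each string, so 2|c\. cannot match 'b.2', where the 2 comes last. pattern.search scans the whole string, but then the test is no longer "starts with", so it only fits if substring matches are actually wanted.

```python
import re

data_list = ['a.1', 'b.2', 'c.3']
test_list = ['2', 'c.']

pattern = re.compile('|'.join(map(re.escape, test_list)))  # 2|c\.

anchored = list(filter(pattern.match, data_list))   # tries only at the start of each string
anywhere = list(filter(pattern.search, data_list))  # tries at every position

print(anchored)  # ['c.3']
print(anywhere)  # ['b.2', 'c.3']
```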

Check out filter and any in the Python docs.

>>> data_list = ['a.1','b.2','c.3']
>>> test_list = ['a.','c.']
>>> new_list = list(filter(lambda x: any(x.startswith(t) for t in test_list), data_list))
>>> new_list
['a.1', 'c.3']

Then you can do whatever you want with the stuff in your new_list.

As @Chepner points out, you can also supply a tuple of strings to startswith, so the above could also be written:

>>> data_list = ['a.1','b.2','c.3']
>>> test_tuple = ('a.','c.')
>>> new_list = list(filter(lambda x: x.startswith(test_tuple), data_list))
>>> new_list
['a.1', 'c.3']
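The any() version and the tuple version agree. A rough check on a slightly larger made-up input (the extra strings and the extra prefix 'c' are illustrative, not from the question), including strings that match more than one prefix, shows that neither version returns an item twice:

```python
data_list = ['a.1', 'b.2', 'c.3', 'a.2', 'ca.4', 'c.', '']
test_list = ['a.', 'c.', 'c']

# 'c.3' and 'c.' match both 'c.' and 'c'; each should still appear only once
with_any = list(filter(lambda x: any(x.startswith(t) for t in test_list), data_list))
with_tuple = list(filter(lambda x: x.startswith(tuple(test_list)), data_list))

print(with_any)    # ['a.1', 'c.3', 'a.2', 'ca.4', 'c.']
print(with_tuple)  # ['a.1', 'c.3', 'a.2', 'ca.4', 'c.']
```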

2 Comments

Something's wrong with the first one; it didn't return anything.
Sorry, you're right. I had just run filter(...) in PyScripter. Usually I see the output on the console, but this time it didn't show unless I added print.
