I have a nested for loop, something like:

count = 0
for x in df['text']:
    for i in x:
        if i in someList:
            count += 1

Here df['text'] is a Series of lists of words, such as ['word1', 'word2', 'etc'].
I know I can just use the for loop, but I want to convert it into a lambda function.
I tried:

df['in'] = df['text'].apply(lambda x: [count++ for i in x if i in someList])

but it is not valid syntax. How can I modify it to get what I want?
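The attempt fails because Python has no `++` operator, and a lambda body (like a list comprehension element) must be a single expression, so a statement such as `count += 1` cannot appear inside it. A minimal sketch, assuming hypothetical sample data (the `someList` contents and the words below are invented for illustration), replaces the counter with a `sum` over a generator expression so the lambda returns one count per row:

```python
import pandas as pd

# Hypothetical sample data; the question only says df['text'] holds lists of words.
someList = ['word1', 'word3']
df = pd.DataFrame({'text': [['word1', 'word2', 'etc'], ['word3', 'word1'], ['other']]})

# Instead of incrementing a counter, yield a 1 for every matching word and sum them.
# sum() of an empty generator is 0, so rows without matches get 0.
df['in'] = df['text'].apply(lambda x: sum(1 for i in x if i in someList))

print(df)
```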

  • Can you add some sample data and the expected output so we don't have to guess? It will be useful for future readers too. Commented Jun 28, 2019 at 14:33

3 Answers

I think you need to expand each list into columns and then use isin, since with pandas we usually try to avoid for loops.

df['in'] = pd.DataFrame(df['text'].tolist(), index=df.index).isin(someList).sum(axis=1)
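To see what the one-liner does step by step, here is a small sketch on hypothetical sample data (the letters are made up for illustration). Building a DataFrame from the lists gives one column per position, with shorter rows padded by missing values; since those are not in someList, isin marks them False, and summing across each row counts the matches:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
someList = ['A', 'B', 'C', 'D']
df = pd.DataFrame({'text': [['A', 'B'], ['E', 'G'], ['A', 'C', 'D', 'E']]})

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(df['text'].tolist(), index=df.index)

# True where a cell's word is in someList; padding is never in someList.
mask = wide.isin(someList)

# Summing the booleans across columns gives the number of matches per row.
df['in'] = mask.sum(axis=1)
print(df)
```

Keeping `index=df.index` matters when df does not have a default 0..n-1 index, so the counts line up with the right rows on assignment.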

You don't need any additional functions. Just create a sequence of ones (one per matching element) and sum it.

count = sum(1 for x in df['text'] for i in x if i in someList)
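The generator expression is the question's nested loop written inline: the `for` clauses appear in the same order as the nested loops, and the `if` clause filters the words. A sketch on hypothetical sample data, checking that both forms produce the same single total:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
someList = ['A', 'B', 'C', 'D']
df = pd.DataFrame({'text': [['A', 'B'], ['E', 'G'], ['A', 'C', 'D', 'E']]})

# The nested loop from the question, with an explicit counter.
count_loop = 0
for x in df['text']:
    for i in x:
        if i in someList:
            count_loop += 1

# The same loops as one generator expression: each match yields a 1.
count = sum(1 for x in df['text'] for i in x if i in someList)

print(count_loop, count)  # 5 5
```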

4 Comments

@OP: it is also possibly faster to use a set instead of someList.
Is the output a single value or a list?
It's one value; no new lists are created. A generator expression is passed to sum.
I should note that I have another column in my df, 'count', into which the count for each text is placed. I believe the provided code sums all of the counts, because every row of my 'count' column contains the same number, 60085.
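The last comment is explained by broadcasting: assigning a single number to a column repeats it on every row. A sketch on hypothetical sample data showing that effect, then a per-row count that also applies the set suggestion from the first comment (the name `lookup` is introduced here for illustration):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
someList = ['A', 'B', 'C', 'D']
df = pd.DataFrame({'text': [['A', 'B'], ['E', 'G'], ['A', 'C', 'D', 'E']]})

# A single total assigned to a column is broadcast to every row.
total = sum(1 for x in df['text'] for i in x if i in someList)
broadcast = df.assign(count=total)

# One count per row: sum inside each row instead of across the column.
# A set makes each membership test a hash lookup rather than a list scan.
lookup = set(someList)
df['count'] = [sum(word in lookup for word in words) for words in df['text']]
print(df)
```

`word in lookup` is a bool, and Python sums bools as 0/1, so no explicit `1 for ... if ...` is needed here.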

Setup

import numpy as np
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))

df

           text
0        [A, B]
1        [C, D]
2        [A, F]
3        [E, G]
4        [B, H]
5        [I, J]
6  [A, C, D, E]

Numpy and __contains__

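# Row number of every word once the lists are flattened: [0, 0, 1, 1, 2, ...]
i = np.arange(len(df)).repeat(df.text.str.len())
a = np.zeros(len(df), int)
# a[row] += 1 for every word found in someList
np.add.at(a, i, [*map(someList.__contains__, np.concatenate(df.text))])
df.assign(**{'in': a})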

           text  in
0        [A, B]   2
1        [C, D]   2
2        [A, F]   1
3        [E, G]   0
4        [B, H]   1
5        [I, J]   0
6  [A, C, D, E]   3
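Why `np.add.at` rather than `a[i] += ...`: with repeated indices, ordinary fancy-index addition is buffered, so each row keeps only one of its increments, while `np.add.at` is unbuffered and accumulates every occurrence. A sketch on the same setup comparing the two; `np.bincount` with `weights` is added as an equivalent, and is my substitution rather than part of the answer:

```python
import numpy as np
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))

# Row number of every flattened word, and whether that word is in someList.
i = np.arange(len(df)).repeat(df.text.map(len).to_numpy())
hits = np.array([w in someList for w in np.concatenate(df.text.tolist())])

# Buffered: duplicate indices in i are written once, so increments are lost.
buffered = np.zeros(len(df), int)
buffered[i] += hits

# Unbuffered: every occurrence of a row number adds to that row.
unbuffered = np.zeros(len(df), int)
np.add.at(unbuffered, i, hits)

# bincount sums the weights that fall into each bin (row number).
binned = np.bincount(i, weights=hits, minlength=len(df)).astype(int)
print(unbuffered, binned)  # [2 2 1 0 1 0 3] [2 2 1 0 1 0 3]
```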

map lambda and __contains__

df.assign(**{'in': df.text.map(lambda x: sum(map(someList.__contains__, x)))})

           text  in
0        [A, B]   2
1        [C, D]   2
2        [A, F]   1
3        [E, G]   0
4        [B, H]   1
5        [I, J]   0
6  [A, C, D, E]   3
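A further alternative, not from this answer and swapped in here as a sketch: `Series.explode` puts each word on its own row while keeping the original index label, so the matches can be grouped back by that label and summed. Empty lists become a single missing value, which isin marks False, so such rows still get 0:

```python
import pandas as pd

someList = [*'ABCD']
df = pd.DataFrame(dict(text=[*map(list, 'AB CD AF EG BH IJ ACDE'.split())]))

# One word per row, original index repeated; 1 where the word is in someList.
matches = df['text'].explode().isin(someList).astype(int)

# Group the 0/1 flags back to their original rows and add them up.
counts = matches.groupby(level=0).sum()

result = df.assign(**{'in': counts})
print(result)
```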
