I am trying to create a new variable from a list ('provider') that checks if some ids are present in another column in the data frame:

import pandas as pd

xx = {'provider_id': [1, 2, 30, 8, 8, 7, 9]}
xx = pd.DataFrame(data=xx)

ids = [8,9,30]
names = ["netflix", "prime","sky"]

for id_,name in zip(ids,names):
    provider = []
    if id_ in xx["provider_id"]:
       provider.append(name)
provider

expected result:

['netflix', 'prime', 'sky']

actual result:

['sky']

So the for loop keeps overwriting the result of name inside the loop? This behaviour seems weird to me, and I honestly don't know how to prevent it other than to write three individual if statements.

3 Answers

Your loop keeps initialising the list. Move the list outside the loop:

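provider = []  # created once, before the loop, so earlier matches are kept
for id_, name in zip(ids, names):
    # `in` on a Series tests the index labels, so compare against .values
    if id_ in xx["provider_id"].values:
        provider.append(name)
print(provider)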
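
One further thing worth checking in the loop: `in` applied to a pandas Series tests the index labels, not the stored values. Below is a minimal sketch using the question's data that compares the two membership checks. The variable names `col`, `in_index` and `in_values` are introduced here purely for illustration.

```python
import pandas as pd

xx = pd.DataFrame({"provider_id": [1, 2, 30, 8, 8, 7, 9]})
col = xx["provider_id"]

# `in` on a Series looks at the index labels (0..6 here), not the values.
in_index = 8 in col           # 8 is not an index label
in_values = 8 in col.values   # 8 is one of the stored values

ids = [8, 9, 30]
names = ["netflix", "prime", "sky"]

provider = []  # initialised once, outside the loop
for id_, name in zip(ids, names):
    if id_ in col.values:
        provider.append(name)

print(in_index, in_values)
print(provider)
```

With the default integer index, `8 in col` only happens to be true when the frame has at least nine rows, which can make the bug look data-dependent.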

Scrap the loops altogether and use the built-in pandas methods. On large frames this will typically be much faster than a Python loop.

df = pd.DataFrame({'ids': [8,9,30], 'names': ["netflix", "prime","sky"]})

cond = df.ids.isin(xx.provider_id)

df.loc[cond, 'names'].tolist()

['netflix', 'prime', 'sky']
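
Since every id in the example matches, the filtering itself is not visible above. As a rough sketch, here is the same `isin` approach with one extra lookup row that does not occur in `xx`. The `99` / `"other"` entry is a made-up value added only for illustration, and `mask` and `matched` are names introduced here.

```python
import pandas as pd

xx = pd.DataFrame({"provider_id": [1, 2, 30, 8, 8, 7, 9]})

# Lookup table; the 99 / "other" row is hypothetical and has no match in xx.
df = pd.DataFrame({"ids": [8, 9, 30, 99],
                   "names": ["netflix", "prime", "sky", "other"]})

# Boolean Series: True where the id occurs among the values of provider_id.
cond = df["ids"].isin(xx["provider_id"])

mask = cond.tolist()
matched = df.loc[cond, "names"].tolist()

print(mask)
print(matched)
```

Unlike the `in` operator, `isin` compares against the values of the Series passed to it, so the index labels play no part here.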

One way to make this more efficient is using sets and isin to find the matching ids in the dataframe, and then a list comprehension with zip to keep the corresponding names.

The error as @quamrana points out is that you keep resetting the list inside the loop.

s = set(xx.loc[xx["provider_id"].isin(ids), "provider_id"].tolist())
# {8, 9, 30}
[name for id_, name in zip(ids, names) if id_ in s]
# ['netflix', 'prime', 'sky']