
I am iterating over a pandas DataFrame using the itertuples() iterator. I would like to set a value in another column when a condition is True. That's easy. But I would also like to set a value in a further column based on the value I just set, and that is not working. I have to iterate a second time to do that, which is inefficient. How can I set multiple values in different columns within one iteration?

Here is some example code:

import pandas as pd

data = {
'Animal': ['cat', 'dog', 'dog', 'cat', 'bird', 'dog', 'cow'],
'Noise': ['muh', 'miau', 'wuff', 'piep', 'piep', 'miau', 'muh']
}

df = pd.DataFrame(data)
df.insert(loc=2, column='Match', value='')
df.insert(loc=3, column='Comment', value='')
for row in df.itertuples():
    if row.Animal == 'cat' and row.Noise == 'miau':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'dog' and row.Noise == 'wuff':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'bird' and row.Noise == 'piep':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'cow' and row.Noise == 'muh':
        df.set_value(index=row.Index, col='Match', value=True)

    # Why is this not getting applied to the 'Comment' column?
    if row.Match is True:
        df.set_value(index=row.Index, col='Comment', value='yeah')

Instead, I have to iterate a second time to fill the Comment column:

for row in df.itertuples():
    if row.Match is True:
        df.set_value(index=row.Index, col='Comment', value='yeah')

But with, e.g., 500000+ values this is very inefficient and time-consuming. So what is a better way to do something like that?

4
  • Why not do it in the same loop? It looks like you're always setting them at the same time Commented Mar 14, 2017 at 21:50
  • Also, consider using a single conditional and using or. That way, you don't need to keep repeating code Commented Mar 14, 2017 at 21:51
  • Because it is not working in the same loop. This is exactly the question why it is not possible in the same loop. ;-) Commented Mar 14, 2017 at 22:57
  • I meant you can do df.set_value(index=row.Index, col='Match', value=True) df.set_value(index=row.Index, col='Comment', value='yeah'). But the below is a much better answer Commented Mar 15, 2017 at 19:22

2 Answers

1

Consider your df

import numpy as np
import pandas as pd

data = {
'Animal': ['cat', 'dog', 'dog', 'cat', 'bird', 'dog', 'cow'],
'Noise': ['muh', 'miau', 'wuff', 'piep', 'piep', 'miau', 'muh']
}

df = pd.DataFrame(data)

I'd use an initially calculated dictionary defining what a match is. Then, use map to convert and test for equality. After that, I'd use assign to produce the desired columns.

matches = dict(cat='miau', dog='wuff', bird='piep', cow='muh')

match = df.Animal.map(matches) == df.Noise

df.assign(Match=match, Comment=np.where(match, 'yeah', ''))

  Animal Noise  Match Comment
0    cat   muh  False        
1    dog  miau  False        
2    dog  wuff   True    yeah
3    cat  piep  False        
4   bird  piep   True    yeah
5    dog  miau  False        
6    cow   muh   True    yeah
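
If further columns depend on earlier ones, the same approach can be chained. A minimal sketch, assuming a pandas version (0.23 or later) where callables passed to assign can refer to columns created earlier in the same call; the third column, Shout, is purely hypothetical and only illustrates the cascade:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Animal': ['cat', 'dog', 'dog', 'cat', 'bird', 'dog', 'cow'],
    'Noise': ['muh', 'miau', 'wuff', 'piep', 'piep', 'miau', 'muh'],
})

matches = dict(cat='miau', dog='wuff', bird='piep', cow='muh')

# Each callable receives the DataFrame as built so far, so a later
# column can be derived from one created earlier in the same call.
result = df.assign(
    Match=lambda d: d['Animal'].map(matches) == d['Noise'],
    Comment=lambda d: np.where(d['Match'], 'yeah', ''),
    # Hypothetical extra step, derived from Comment.
    Shout=lambda d: d['Comment'].str.upper(),
)
```

No explicit Python loop is involved, and each step reads the column produced by the step before it.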

To answer your specific question:
The row yielded by itertuples() is a namedtuple snapshot that is no longer attached to the DataFrame. So after you write True to the DataFrame with set_value, you cannot read that new value back from row. Instead, read it from the DataFrame with df.get_value:

for row in df.itertuples():
    if row.Animal == 'cat' and row.Noise == 'miau':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'dog' and row.Noise == 'wuff':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'bird' and row.Noise == 'piep':
        df.set_value(index=row.Index, col='Match', value=True)
    elif row.Animal == 'cow' and row.Noise == 'muh':
        df.set_value(index=row.Index, col='Match', value=True)

    # This should work
    if df.get_value(index=row.Index, col='Match') is True:
        df.set_value(index=row.Index, col='Comment', value='yeah')
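
Note that set_value and get_value were deprecated in pandas 0.21 and removed in pandas 1.0, so the loop above only runs on older versions. Here is a minimal sketch of the same single-pass idea on a current pandas, assuming the scalar accessor df.at as the replacement. Initializing Match to False and using a lookup dictionary instead of the if/elif chain are my own simplifications, not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['cat', 'dog', 'dog', 'cat', 'bird', 'dog', 'cow'],
    'Noise': ['muh', 'miau', 'wuff', 'piep', 'piep', 'miau', 'muh'],
})
df['Match'] = False   # assumption: boolean default instead of ''
df['Comment'] = ''

matches = {'cat': 'miau', 'dog': 'wuff', 'bird': 'piep', 'cow': 'muh'}

for row in df.itertuples():
    if matches.get(row.Animal) == row.Noise:
        df.at[row.Index, 'Match'] = True

    # row is a snapshot, so read the freshly written value from df itself.
    if df.at[row.Index, 'Match']:
        df.at[row.Index, 'Comment'] = 'yeah'
```

Both columns are filled in one pass, but for large frames a vectorized approach is still likely to be much faster than any row-by-row loop.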

5 Comments

This is definitely a more elegant way to populate the 'Match' and 'Comment' columns for this specific example than what I did. I had never heard of 'map' and 'assign'. Thanks for that idea. Anyway, the focus of the question is how to populate multiple columns/cells within one loop.
Imagine that the 'Match' column is calculated from some other columns in a more complicated way. I calculate the True/False result within an itertuples loop, and based on the result I would like to populate another cell, and based on that yet another cell in another column. But I don't want to iterate again for every result. Instead, it would be nice to do all the cascading calculations within one loop.
Yes, df.get_value() was the missing part for that specific question, thanks!
Anyway, are 'assign' and 'map' iterating internally too? So in your first solution for this specific example the table is getting looped twice, right?
@qfactor they are vectorized assignments. I wouldn't call anything in my proposed solution a "loop". Actually, my apologies. The map is a loop but an optimized one.
0

Instead of

    # Why is this not getting applied to the 'Comment' column?
    if row.Match is True:
        df.set_value(index=row.Index, col='Comment', value='yeah')

you can use this after the for loop.

df['Comment'] = df['Match'].apply(lambda x: 'yeah' if x == True else '')
