Set Column Value of Pandas Dataframe Based on Variable

Question

I have the following dataframe:

    col1   col2
0    a      7                    
1    b      3                  
2    c      1                  
3    d      6

I'm trying to add a new column to the dataframe, with the value equal to a variable x. This variable will depend on the values of col1 and col2. I have tried:

for row in df:
    row['col3'] = x

However I get the following error:

TypeError: 'tuple' object does not support item assignment

I looked into iterrows(), but I'm not sure it is the right approach. According to the documentation:

"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."

Edit - Additional Info:

What I'm trying to do is create a new dataframe with col3 being a string based on a pre-sorted order of the dataframe. For example, the following dataframe:

    col1   col2
0    a      7                    
1    b      3                  
2    c      1                  
3    d      6

Should become:

    col1   col2   col3
0    a      7      001              
1    b      3      002            
2    c      1      003            
3    d      6      004

Where col3 is a string in the format '000' (i.e. with leading zeros where applicable so that the string always contains 3 characters). There will never be more than 999 rows in the dataframe.

This is the code I have so far:

x = 1

for row in df:

    if x < 10:
        formatting = str('00' + str(x))
    elif x < 100:
        formatting = str('0' + str(x))
    else:
        formatting = str(str(x))

    x += 1

    row['col3'] = x

However, this seems to change the col3 values for all rows in the dataframe, instead of just the row in the loop. For example, after 4 loops the result is:

    col1   col2   col3
0    a      7      004              
1    b      3      004            
2    c      1      004            
3    d      6      004

You can apply a function to a dataframe using apply. pandas.pydata.org/pandas-docs/stable/reference/api/…
– Michael Gardner, Commented Aug 30, 2019 at 4:15
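
As a rough illustration of the apply suggestion in this comment, the sketch below builds col3 from the values of col1 and col2 in each row. The function make_label and its rule (col1 followed by the zero-padded col2) are hypothetical and only show the mechanics; they are not the numbering the question ultimately asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": list("abcd"), "col2": [7, 3, 1, 6]})


def make_label(row):
    # Hypothetical rule: col1 followed by col2 padded to 3 characters.
    return row["col1"] + str(row["col2"]).zfill(3)


# axis=1 passes each row (as a Series) to the function.
df["col3"] = df.apply(make_label, axis=1)
print(df)
```

Row-wise apply is convenient when the value really depends on several columns, but it is usually slower than a vectorised expression on large frames.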

jezrael · Accepted Answer · 2019-08-30 08:05:34Z

EDIT:

A better option here is to use Series.str.zfill on the index values converted to strings:

df['col3'] = (df.index + 1).astype('str').str.zfill(3)
print (df)
  col1  col2 col3
0    a     7  001
1    b     3  002
2    c     1  003
3    d     6  004

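If the index is not the default RangeIndex, create a helper Series that shares the DataFrame's index, so the assignment is not aligned on the old labels:

df['col3'] = pd.Series(np.arange(1, len(df) + 1), index=df.index).astype('str').str.zfill(3)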
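
A minimal sketch of the helper-Series idea on a frame whose index is no longer 0..n-1. It assumes the frame was sorted by col2 (the question does not say which column is used) and passes index= to the helper Series, because pandas aligns a Series on index labels when it is assigned as a column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": list("abcd"), "col2": [7, 3, 1, 6]})

# Assumption: sort by col2. The index labels are now out of order.
sorted_df = df.sort_values("col2")

# Give the counter the same index as the frame so that the values are
# matched by position in the sorted order, not by the original labels.
counter = pd.Series(np.arange(1, len(sorted_df) + 1), index=sorted_df.index)
sorted_df["col3"] = counter.astype(str).str.zfill(3)
print(sorted_df)
```

Without index=, the helper Series would carry labels 0..n-1 and each value would land on the row with the matching original label rather than on the row in that sorted position.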

edited Aug 30, 2019 at 8:05

answered Aug 30, 2019 at 4:10

4 Comments

Alan Over a year ago

Thanks jezrael. I tried this and it successfully creates col3 with a string value equal to the index + 1. However, since the dataframe will be sorted, the index will not always start at zero, so col3 ends up based on the original index rather than the new order after sorting. Do I need to reset the index after sorting?

jezrael Over a year ago

@Alan - If the index is not the default one, use the second solution.

jezrael Over a year ago

Or, more exactly, use df = df.reset_index(drop=True) and then the first solution.
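
A short sketch of that reset_index route, assuming the frame is sorted by col2 first (the sort column is an assumption, not stated in the thread):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": list("abcd"), "col2": [7, 3, 1, 6]})

# Sort, then discard the old labels so the index is 0..n-1 again.
df = df.sort_values("col2").reset_index(drop=True)

# With a fresh default index, the first solution numbers rows in sorted order.
df["col3"] = (df.index + 1).astype(str).str.zfill(3)
print(df)
```

drop=True keeps the old index from being added back as a column.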

Alan Over a year ago

Thank you :) I'd never previously used zfill - this will be a huge time saver.
