Pandas: Iterate over existing columns and create new columns based on conditionals

Question

The best version of a question that relates to my question is found here. But I'm running into a hiccup somewhere.

My dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

         KEY  RO_1  RO_2  RO_3  RO_4
0  100000003     1     1     1     1
1  100000009     1     0     1     4
2  100000009     4     0     1     1
3  100000009     1     0     1     1

I want to create 4 additional columns labeled 'Month1', 'Month2', through 'Month4', one for each RO_ column. Something simple like:

for i in range(3):
    df.loc[1,'Month'+str(i)] = 1 # '1' is just there as a place holder

Although I'm getting a warning message when I execute this code:

"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"

I want to combine this with conditionals to fill in each cell for each column and each row.

The code below creates only a single column and flags each row based on whether an RO_ column equals 4 or 1:

namelist = df.columns.tolist()
ROList = [s for s in namelist if "RO_" in s]
for col in ROList:
    for i in range(3):
        df['Month'] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0') 
df

I tried combining the two pieces of code, but I am missing a fundamental understanding of how to do this. Any help would be great.

Final expected result:

         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
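
A note on the warning quoted above: it usually shows up when df was itself produced by slicing or filtering another DataFrame, which the construction shown in the question would not do. The sketch below assumes a hypothetical parent frame called big and a hypothetical filter; taking an explicit .copy() gives an independent frame, so adding columns to it is unambiguous. Whether the warning is emitted at all also depends on the pandas version.

```python
import pandas as pd

# `big` and the filter below are hypothetical; they stand in for whatever
# produced `df` in the real code.
big = pd.DataFrame({'KEY': ['100000003', '100000009'],
                    'RO_1': [1, 4],
                    'RO_2': [1, 0]})

# An explicit copy makes `sub` independent of `big`.
sub = big[big['RO_1'] > 0].copy()

for i in range(1, 3):
    sub.loc[:, 'Month' + str(i)] = 1  # placeholder value, as in the question

print(sub.columns.tolist())
print(big.columns.tolist())  # the parent frame is left unchanged
```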

cs95 · Accepted Answer · 2018-02-08 21:29:25Z

2

Use filter + isin + rename, for a single pipelined transformation of your data.

v = (df.filter(regex='^RO_')    # select columns
      .isin([4, 1])             # check if the value is 4 or 1
      .astype(int)              # convert the `bool` result to `int`
      .rename(                  # rename columns
          columns=lambda x: x.replace('RO_', 'Month')
      ))

Or, for the sake of performance,

v = df.filter(regex='^RO_')\
          .isin([4, 1])\
          .astype(int) 
v.columns = v.columns.str.replace('RO_', 'Month')

Finally, concatenate the result with the original.

pd.concat([df, v], axis=1)

         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
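
For convenience, here is a self-contained sketch of the same pipeline with the imports and the sample frame from the question included, so it can be pasted and run as-is. The name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

v = (df.filter(regex='^RO_')     # keep only the RO_ columns
       .isin([4, 1])             # True where the value is 4 or 1
       .astype(int)              # bool -> 0/1
       .rename(columns=lambda x: x.replace('RO_', 'Month')))

# Put the new Month columns next to the original ones.
result = pd.concat([df, v], axis=1)
print(result)
```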

edited Feb 8, 2018 at 21:29

answered Feb 8, 2018 at 21:21


4 Comments

BENY Over a year ago

Yep, Without loop :-)

cs95 Over a year ago

@Wen It took a little time to figure out a non-loopy solution... but yours also answers the Q perfectly.

CandleWax Over a year ago

Curious about the advantage and disadvantage of doing without a loop.

cs95 Over a year ago

@MartyBobak A non-loopy solution is usually much faster. Disadvantage, less readability ;)
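
One way to check the speed claim on your own data is to time both styles. This is a rough sketch, not a benchmark: the random frame big, its size, and the helper names with_loop and without_loop are all assumptions made for illustration. With only a handful of columns the gap may be small, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 rows of values in 0..4.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 5, size=(10_000, 4)),
                   columns=['RO_1', 'RO_2', 'RO_3', 'RO_4'])

def with_loop(frame):
    # One np.where call per column, numbering the new columns from 1.
    out = pd.DataFrame(index=frame.index)
    for i, col in enumerate(frame.columns, start=1):
        out['Month' + str(i)] = np.where(frame[col].isin([4, 1]), 1, 0)
    return out

def without_loop(frame):
    # A single isin over the whole frame, then rename the columns.
    out = frame.isin([4, 1]).astype(int)
    out.columns = out.columns.str.replace('RO_', 'Month')
    return out

# Both styles should agree before their speed is compared.
same = bool((with_loop(big).to_numpy() == without_loop(big).to_numpy()).all())
print('same result:', same)
print('loop      :', timeit.timeit(lambda: with_loop(big), number=20))
print('vectorized:', timeit.timeit(lambda: without_loop(big), number=20))
```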

BENY · Accepted Answer · 2018-02-08 21:31:19Z

IIUC, use enumerate:

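namelist = df.columns.tolist()  # get_values() was deprecated and later removed from pandas
ROList = [s for s in namelist if "RO_" in s]
for i, col in enumerate(ROList):
    # number the new columns from 1; the values are the strings '1'/'0'
    df['Month' + str(i + 1)] = np.where(np.logical_or(df[col] == 4, df[col] == 1), '1', '0')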
df
Out[194]: 
         KEY  RO_1  RO_2  RO_3  RO_4 Month1 Month2 Month3 Month4
0  100000003     1     1     1     1      1      1      1      1
1  100000009     1     0     1     4      1      0      1      1
2  100000009     4     0     1     1      1      0      1      1
3  100000009     1     0     1     1      1      0      1      1
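
One detail worth noting about the loop above: because np.where is given '1' and '0', the Month columns hold strings rather than numbers. If numeric flags are wanted, passing 1 and 0 instead is enough. A small sketch on a reduced frame (the names hit, as_text and as_int are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'RO_1': [1, 1, 4, 1], 'RO_2': [1, 0, 0, 0]})

hit = df['RO_1'].isin([4, 1])  # same test as the logical_or in the loop

as_text = pd.Series(np.where(hit, '1', '0'), index=df.index)  # strings
as_int = pd.Series(np.where(hit, 1, 0), index=df.index)       # integers

print(as_text.dtype, as_int.dtype)
print(as_int.sum())  # summing only makes sense for the integer version
```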

Your logic seems to be changing 4 to 1:

df.assign(**df.loc[:, ROList]
            .mask(df.loc[:, ROList] == 4, 1)
            .rename(columns=dict(zip(ROList, range(1, len(ROList) + 1))))
            .add_prefix('Month'))
Out[15]: 
         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
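
The mask version matches the expected output here because the sample data only contains 0, 1 and 4. As a hedged illustration with hypothetical data that also contains 2 and 3, replacing 4 with 1 leaves those other values untouched, whereas an isin([4, 1]) test would flag them as 0:

```python
import pandas as pd

# Hypothetical values; the question's data only has 0, 1 and 4.
df = pd.DataFrame({'RO_1': [1, 2, 4], 'RO_2': [0, 4, 3]})

masked = df.mask(df == 4, 1)           # 4 -> 1, everything else unchanged
flagged = df.isin([4, 1]).astype(int)  # 1 if the value is 4 or 1, else 0

print(masked)
print(flagged)
```

Which of the two is right depends on what the other codes are supposed to mean, which the question does not say.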

Yilun Zhang · Accepted Answer · 2018-02-08 21:20:20Z

0

Seems like you are creating a new column for each existing RO_ column in your dataframe. You can do something like:

# only the RO_ columns; looping over all of df.columns would also create 'MonthKEY'
original_cols = [c for c in df.columns if c.startswith("RO_")]
for c in original_cols:
    cname = "Month" + c.split("_")[-1]
    df[cname] = df[c].apply(lambda x: 1 if (x == 1) or (x == 4) else 0)

answered Feb 8, 2018 at 21:20

