I used pd.get_dummies to one-hot encode my categorical variables for use as predictors. Columns with many unique values produced many new columns, and I am looking for a fast way to create interaction terms for them. (I only want interactions for a subset of my columns, so PolynomialFeatures() won't work... or will it?)
Here is what I am trying to do:
Step 1: Create a list of column names for each subset I want to multiply:
channel = [col for col in df if col.startswith('channel')]
quote = [col for col in df if col.startswith('quote')]
print(channel[:2])
Out: ['channel_A', 'channel_B']
Step 2: for loop:
cols = 'channel quote'.split()
for col in cols:
    for i in col:
        colname = 'value_X_'+i
        df[colname] = df['value_days']*df[i]+0
The problem is that the inner loop does not treat col as the list with that name: it treats it as a string and iterates over its characters (error = 'c'), as evidenced by:
for col in cols:
    for i in col:
        print(i)
Out[1]:
c
h
.
.
.
o
t
e
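To make the difference concrete, here is a minimal sketch in plain Python (no pandas) contrasting iteration over the name 'channel' with iteration over the list that name refers to. The dict `groups` and the quote column names are hypothetical, introduced only for illustration:

```python
channel = ['channel_A', 'channel_B']
quote = ['quote_X', 'quote_Y']  # hypothetical column names

# Iterating over the string 'channel' yields its characters.
chars = [i for i in 'channel']

# Hypothetical lookup from each name to the list it stands for.
groups = {'channel': channel, 'quote': quote}

# Iterating over the looked-up lists yields the column names.
names = [i for key in 'channel quote'.split() for i in groups[key]]
```

Here chars holds single letters, while names holds the full column names from both lists.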
Goal: My desired outcome is a new column that is named for the two columns that were multiplied and that holds their product.
For example, the first element in channel is channel_A, so I want to get a new column named value_X_channel_A and it should have values that are equivalent to the product of value_days*channel_A.
value_days | channel_A | value_X_channel_A
-------------------------------------------
5          | 5         | 25
This works perfectly fine if I just run the inner loop and replace col with channel.
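For reference, a minimal sketch of that working single-list version on a toy frame. The data values are made up for illustration; the real frame comes from pd.get_dummies:

```python
import pandas as pd

# Toy data, made up for illustration only.
df = pd.DataFrame({
    'value_days': [5, 3],
    'channel_A': [1, 0],
    'channel_B': [0, 1],
})

channel = [col for col in df if col.startswith('channel')]

# Iterate over the list itself, not over the string 'channel'.
for i in channel:
    df['value_X_' + i] = df['value_days'] * df[i]
```

This adds value_X_channel_A and value_X_channel_B to df, each holding the product of value_days and the corresponding dummy column.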
How can I get this to work?
Thanks in advance.
channel, but the actual column names are channel_A, channel_B, and so on. That's the reason for the inner loop: I need to loop through the list of column names starting with 'channel'.