Python: for loop iteration through objects, not strings

Question

I used pd.get_dummies to one-hot encode my categorical variables to be used as predictors. Some of my columns had many unique values, so I now have many new columns, and I am trying to find a fast way to create interaction terms for these. (I only want interactions for a subset of my columns, so PolynomialFeatures() won't work...or will it?)

Here is what I am trying to do:

Step 1: Create lists of column names for each of the subset I want to multiply:

channel = [col for col in df if col.startswith('channel')]
quote = [col for col in df if col.startswith('quote')]

print(channel[:2])
Out: ['channel_A', 'channel_B']

Step 2: for loop:

cols = 'channel quote'.split()
for col in cols:
    for i in col:
        colname = 'value_X_'+i
        df[colname] = df['value_days']*df[i]+0

The problem is that the inner loop does not treat col as the list with that name: it treats it as a string and iterates over its characters (error = 'c'), as evidenced by:

for col in cols:
    for i in col:
        print(i)

Out[1]: 
c
h
.
.
.
o
t
e

Goal: My desired outcome is a new column that is named for the two columns that were multiplied and that holds the values of their product.

For example, the first element in channel is channel_A, so I want to get a new column named value_X_channel_A and it should have values that are equivalent to the product of value_days*channel_A.

value_days | channel_A | value_X_channel_A
-------------------------------------------
5          |5          |25

This works perfectly fine if I just run the inner loop and replace col with channel.

How can I get this to work?

Thanks in advance.

Strings are objects too. You are iterating over the characters (i.e. c, h, a, n) inside a string (i.e. channel, quote), so what do you expect to get?
– Marcus.Aurelianus, Commented Jul 22, 2018 at 0:41
Thank you for clarifying that @Marcus.Aurelianus. I edited the question to address this.
– NLR, Commented Jul 22, 2018 at 0:55
Thanks for the suggestion, but that doesn't get the unique names of each column. That would just give me channel, but the actual column names are channel_A, channel_B, and so on. That's the reason for the inner loop: I need to loop through the list of column names starting with 'channel'.
– NLR, Commented Jul 22, 2018 at 1:06
OK, are you set on using for loops? Can't you just put the processing in a function and apply it to the dataframe?
– skrubber, Commented Jul 22, 2018 at 1:16
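
The suggestion in the last comment, putting the processing in a function, might look roughly like the sketch below. The function name add_interactions, its parameters, and the sample data are all made up for illustration; only the column naming scheme (value_days, channel_*, quote_*, value_X_*) comes from the question. It multiplies all matching columns in one vectorized step instead of looping.

```python
import pandas as pd

def add_interactions(df, base_col, prefixes, new_prefix='value_X_'):
    """Return a copy of df with base_col multiplied into every column
    whose name starts with one of the given prefixes (hypothetical helper)."""
    # str.startswith accepts a tuple, so one test covers all prefixes
    targets = [c for c in df.columns if c.startswith(tuple(prefixes))]
    # Multiply every target column by base_col, row by row, in one step
    products = df[targets].mul(df[base_col], axis=0).add_prefix(new_prefix)
    return pd.concat([df, products], axis=1)

# Made-up example data
df = pd.DataFrame({
    'value_days': [5, 3],
    'channel_A': [1, 0],
    'channel_B': [0, 1],
    'quote_A': [1, 1],
})

result = add_interactions(df, 'value_days', ['channel', 'quote'])
print(result.columns.tolist())
```

Because the function returns a new DataFrame rather than modifying its input, the original df is left untouched.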

ICW · Accepted Answer · 2018-07-22 01:24:37Z

1

Your question is worded in a way that is hard to understand (for me at least). If I'm right about what you want, you wish to multiply each column whose name starts with "channel" or "quote" by the column "value_days" stored in your df, and then store the result in a new column named value_X_{i}, where {i} is the name of the column that was multiplied. You're close, but your code is awkward. Use another data structure (a dictionary) to make the code straightforward and readable:

d = { 
    'quote' : [col for col in df if col.startswith('quote')],
    'channel' : [col for col in df if col.startswith('channel')]
}

for column_string, columns in d.items():
    for col_string in columns:
        colname = 'value_X_'+col_string
        df[colname] = df['value_days'] * df[col_string] + 0

Explanation:

d = ... - Creates a dictionary with two keys, 'quote' and 'channel', each mapped to the list of matching column names.

for column_string, columns in d.items(): - .items() returns a view of the dictionary's key/value pairs; we loop through it, binding each key to column_string and each list of column names to columns.

You can quickly see that something is wrong with your code by noticing that you create the variables channel and quote and set them to their corresponding values, but you never actually use either of those lists in your code.
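
To try the dictionary approach end to end, here is a minimal self-contained sketch. The DataFrame contents are invented for illustration (the first row mirrors the 5 * 5 = 25 example from the question), and the variable names groups, group_name, and col_name are my own choices rather than anything from the original code.

```python
import pandas as pd

# Made-up data; only the column naming scheme comes from the question
df = pd.DataFrame({
    'value_days': [5, 2],
    'channel_A': [5, 1],
    'channel_B': [0, 3],
    'quote_A': [1, 0],
})

# Map each prefix to the list of column names that start with it.
# The lists are built before the loop, so the new value_X_* columns
# added below are never picked up as inputs.
groups = {
    'quote': [col for col in df if col.startswith('quote')],
    'channel': [col for col in df if col.startswith('channel')],
}

for group_name, columns in groups.items():
    for col_name in columns:
        df['value_X_' + col_name] = df['value_days'] * df[col_name]

print(df[['value_days', 'channel_A', 'value_X_channel_A']])
```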

answered Jul 22, 2018 at 1:24

ICW


2 Comments

NLR Over a year ago

yeah, the poor wording of my question reflects how comfortable I am with coding. I need to get better at using dictionaries and this is a perfect example of how useful they can be. Thanks for taking the time to explain this!

ICW Over a year ago

@NLR glad I could help. Dictionaries are bread and butter in Python, they're extremely flexible and can be applied to solve endless numbers of problems.

Marcus.Aurelianus · Accepted Answer · 2018-07-22 02:36:00Z

1

Oh I see, in your loop you are iterating over the string 'channel' itself. To loop through the values in the channel variable, you first need to look the variable up by its name, which you can do with the vars function.

Example:

channel=['channel_A','channel_B']
quote=['quote_A','quote_B']

cols = 'channel quote'.split()

for col in cols:
    var=vars()[col]
    for ele in var:
        print(ele)

Output:

channel_A
channel_B
quote_A
quote_B

For your loop, change it to:

cols = 'channel quote'.split()
for col in cols:
    for i in vars()[col]:
        colname = 'value_X_'+i
        df[colname] = df['value_days']*df[i]+0
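
As a side-by-side illustration, the sketch below looks the lists up by name in two ways: through vars(), as in this answer, and through an explicit dictionary. The function names are invented for the example. Called with no argument, vars() behaves like locals(), so it only sees names defined in the scope where it is called; the explicit dictionary does not depend on scope.

```python
def columns_via_vars():
    channel = ['channel_A', 'channel_B']
    quote = ['quote_A', 'quote_B']
    found = []
    for name in 'channel quote'.split():
        # vars() with no argument acts like locals(): it maps the names
        # of this function's local variables to their current values
        found.extend(vars()[name])
    return found

def columns_via_dict():
    # Explicit mapping from name to list; no scope lookup involved
    groups = {
        'channel': ['channel_A', 'channel_B'],
        'quote': ['quote_A', 'quote_B'],
    }
    found = []
    for name in 'channel quote'.split():
        found.extend(groups[name])
    return found

print(columns_via_vars())
print(columns_via_dict())
```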

Feel free to ask if you are still not clear.

edited Jul 22, 2018 at 2:36

answered Jul 22, 2018 at 1:14

Marcus.Aurelianus


5 Comments

ICW Over a year ago

I feel as if it's bad practice to access variables in this way if it's not necessary. I could be wrong but that just seems like an awkward way to do it.

Marcus.Aurelianus Over a year ago

@YungGun, basically it is the data structure OP used.

ICW Over a year ago

I'm not sure what you mean @Marcus.Aurelianus

Marcus.Aurelianus Over a year ago

@YungGun, he uses a variable to hold all the column names that start with a given prefix: channel = [col for col in df if col.startswith('channel')] and quote = [col for col in df if col.startswith('quote')]

NLR Over a year ago

@Marcus.Aurelianus: Thank you for taking the time to answer this. Even if using vars() isn't ideal, I learned that this function exists, which is a big help in and of itself.
