Using: Python 2.7 and Pandas 0.11.0 on Mac OSX Lion

I'm trying to create an empty DataFrame and then populate it from another dataframe, based on a for loop.

I construct the DataFrame and then run the for loop as follows:

data = pd.DataFrame()
for item in cols_to_keep:
    if item not in dummies:
        data = data.join(df[item])

The result is an empty DataFrame: it has the headers of the columns I wanted to add from the other DataFrame, but no rows.

2 Answers

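That's because join defaults to a left join on the index. Your empty DataFrame has no index labels, so none of the rows from df survive the join and only the column headers get added.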

You can use a list comprehension to restrict the DataFrame to the columns you want:

df[[col for col in cols_to_keep if col not in dummies]]
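To make the difference concrete, here is a minimal sketch. The small df below is made up for illustration (it is not the asker's data), and it assumes a reasonably recent pandas; behaviour on 0.11.0 may differ in details.

```python
import pandas as pd

# Hypothetical sample data, standing in for the asker's df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "d": [7, 8, 9]})
cols_to_keep = ["a", "b", "d"]
dummies = ["d"]

# Default (left) join: only the caller's index is kept, and it is empty,
# so the result gains the column header but no rows.
left = pd.DataFrame().join(df["a"])

# An outer join uses the union of both indexes, so the rows come through.
outer = pd.DataFrame().join(df["a"], how="outer")

# Selecting the wanted columns directly avoids the join altogether.
selected = df[[col for col in cols_to_keep if col not in dummies]]

print(left.shape)       # zero rows, one column
print(outer.shape)      # three rows, one column
print(list(selected.columns))
```

Passing how="outer" would make the original loop work, but the direct selection is shorter and does not rebuild the frame once per column.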

What about just creating a new frame from the columns you know you want to keep, instead of creating an empty one first?

import pandas as pd
import numpy as np

# Example frame with four columns of random values
df = pd.DataFrame({'a': np.random.randn(5),
                   'b': np.random.randn(5),
                   'c': np.random.randn(5),
                   'd': np.random.randn(5)})
cols_to_keep = ['a', 'c', 'd']
dummies = ['d']
# Keep only the wanted columns that are not dummies
not_dummies = [x for x in cols_to_keep if x not in dummies]
data = df[not_dummies]
data

          a         c
0  2.288460  0.698057
1  0.097110 -0.110896
2  1.075598 -0.632659
3 -0.120013 -2.185709
4 -0.099343  1.627839
