Why are my pandas DataFrame columns Dataframes too, not Series?

Question

Updates 1 and 2 are at the end.

I read from here: get list from pandas dataframe column

Pandas DataFrame columns are Pandas Series when you pull them out

However this is not true in my case:

First part (building up the DataFrame from scraped JSON): because it contains business information I cannot show the full code, but basically it reads one row of data (stored in a Series) and appends it to the end of the DataFrame.

dfToWrite = pandas.DataFrame(columns=[lsHeader]) # Empty with column headers
for row in jsAdtoolJSON['rows']:
    lsRow = []
    for col in row['row']:
        lsRow.append((col['primary'])['value'])
    dfRow = pandas.Series(lsRow, index = dfToWrite.columns)
    dfToWrite = dfToWrite.append(dfRow, ignore_index = True)

Next part (check type): (Please ignore the functionality of the function)

def CalcMA(df: pandas.DataFrame, target: str, period: int, maname: str):
    print(type(df[target]))

Finally call the function: ("Raw_Impressions" is a column header)

CalcMA(dfToWrite, "Raw_Impressions", 5, "ImpMA5")

Python console shows:

<class 'pandas.core.frame.DataFrame'>

Additional Question: How to get a list from a Dataframe column if it's not a Series (in which case I can use tolist())?

Update 1: From here: Bokeh: AttributeError: 'DataFrame' object has no attribute 'tolist'

I figured out that I need to use .values.tolist(). However, that still doesn't explain why I'm getting another DataFrame, not a Series, when I pull out a column.
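As a small illustration of the workaround above, here is a sketch of what .values.tolist() returns when the selected column is itself a DataFrame, next to one way to get a flat list. The frame and its values are made up for the example; the nested list passed to columns= is only there to reproduce the "column is a DataFrame" situation.

```python
import pandas as pd

# Hypothetical data: the nested list in `columns` makes every column
# selection return a one-column DataFrame instead of a Series.
df = pd.DataFrame([[10, 1], [20, 2]], columns=[["Raw_Impressions", "CTR"]])

col = df["Raw_Impressions"]      # a DataFrame with a single column
nested = col.values.tolist()     # one inner list per row: [[10], [20]]
flat = col.iloc[:, 0].tolist()   # take the only column as a Series: [10, 20]

print(type(col).__name__)
print(nested)
print(flat)
```

If a flat list is what you want, selecting the single column by position first avoids the extra nesting.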

Update 2: I found out that df has a MultiIndex, which was very surprising:

MultiIndex(levels=[['COST_/CPM', 'CTR', 'ECPM/_ROI', 'Goal_Ratio', 'Hour_of_the_Day', 'IMP./Joins', 'Raw_Clicks_/_Unique_Clicks', 'Raw_Impressions', 'Unique_Goal_/_UniqueGoal_Forecasted_Value']], labels=[[4, 7, 5, 6, 1, 8, 3, 0, 2]])

I don't see the labels when printing the df or writing it to .csv; it looks like a normal DataFrame. I'm not sure where the labels came from.

jezrael · Accepted Answer · 2019-01-08 06:38:15Z

7

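I think you have duplicated column names, so selecting a column returns a DataFrame instead of a Series:

import pandas as pd

# Transposing turns the duplicated index labels 'a', 'a', 'b' into column names
df = pd.DataFrame([[1,2],[4,5], [7,8]], index=list('aab')).T
print (df)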
   a  a  b
0  1  4  7
1  2  5  8

print (df['a'])
   a  a
0  1  4
1  2  5

print (type(df['a']))
<class 'pandas.core.frame.DataFrame'>

print (df['b'])
0    7
1    8
Name: b, dtype: int64

print (type(df['b']))
<class 'pandas.core.series.Series'>
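To check whether duplicated names are really the cause, something like the following sketch can help. It reuses the toy 'a', 'a', 'b' columns from the example above; the variable names are only illustrative.

```python
import pandas as pd

# Same toy layout as above: two columns named 'a' and one named 'b'.
df = pd.DataFrame([[1, 4, 7], [2, 5, 8]], columns=list("aab"))

# False means at least one column label occurs more than once.
print(df.columns.is_unique)

# Labels that repeat an earlier label.
dupes = df.columns[df.columns.duplicated()].tolist()
print(dupes)  # ['a']

# Selecting by position always gives a Series, even with duplicated labels.
first_a = df.iloc[:, 0]

# Keep only the first occurrence of each label; df['a'] is a Series afterwards.
deduped = df.loc[:, ~df.columns.duplicated()]
print(type(deduped["a"]).__name__)
```

Whether dropping the later duplicates is appropriate depends on the data; renaming the columns may be the better choice if both hold meaningful values.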

EDIT:

Here the problem is different: the columns are a one-level MultiIndex. The solution is to assign the values of the first level back to the columns with get_level_values:

# Note: the `labels` argument was renamed to `codes` in pandas 0.24 and removed in 1.0
mux = pd.MultiIndex(levels=[['COST_/CPM', 'CTR', 'ECPM/_ROI', 'Goal_Ratio', 'Hour_of_the_Day',
                             'IMP./Joins', 'Raw_Clicks_/_Unique_Clicks', 'Raw_Impressions',
                             'Unique_Goal_/_UniqueGoal_Forecasted_Value']],
                    codes=[[4, 7, 5, 6, 1, 8, 3, 0, 2]])

df = pd.DataFrame([range(9)], columns=mux)
print (type(df['CTR']))
<class 'pandas.core.frame.DataFrame'>

df.columns = df.columns.get_level_values(0)
print (type(df['CTR']))
<class 'pandas.core.series.Series'>
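As for where the one-level MultiIndex may have come from: a plausible source is the constructor call in the question, columns=[lsHeader]. This is a guess that assumes lsHeader is already a flat list of strings (the question does not show it). Wrapping a list in another list makes pandas build a MultiIndex with a single level, as this sketch with a made-up header shows:

```python
import pandas as pd

# Hypothetical stand-in for lsHeader from the question.
header = ["Hour_of_the_Day", "Raw_Impressions", "CTR"]

# A list wrapped in another list is read as "one array per level".
nested = pd.DataFrame([[0, 120, 0.5]], columns=[header])

# Passing the flat list gives an ordinary Index.
flat = pd.DataFrame([[0, 120, 0.5]], columns=header)

print(isinstance(nested.columns, pd.MultiIndex), nested.columns.nlevels)
print(isinstance(flat.columns, pd.MultiIndex))

print(type(nested["CTR"]).__name__)  # DataFrame
print(type(flat["CTR"]).__name__)    # Series
```

If that is the cause, passing columns=lsHeader without the extra brackets avoids the problem at the source, so the get_level_values step is no longer needed.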

edited Jan 8, 2019 at 6:38

answered Jan 8, 2019 at 6:25

jezrael


6 Comments

Nicholas Humphrey Over a year ago

Thanks @jezrael, just printed out the columns and found out that actually I have a MultiIndex, does it cause the issue? (The levels do not have duplicates) It's very strange, because when I print(df) it doesn't show any of the labels. I'll update the question with labels.

jezrael Over a year ago

@NicholasHumphrey - yes, if MultiIndex then it means duplicated first level :)

jezrael Over a year ago

@NicholasHumphrey - Added a solution for your situation.

Nicholas Humphrey Over a year ago

thanks! I'll take a look in the morning but I think it will work. Now next step is to track down the source of MultiIndex...

jezrael Over a year ago

@NicholasHumphrey - yes, this kind of error is very unpleasant, especially because it is not visible when you print the DataFrame.

NiallJG · Answer · 2019-01-08 06:31:37Z

1

Each instance of pandas.core.frame.DataFrame is basically a two-dimensional array, so if you are getting this type you can still reach each column (which, if it is one-dimensional, will be of type pandas.core.series.Series) by iterating over df.columns.

df.columns gives you an iterable of column labels that you can loop through, selecting each column in turn to get its values.

You might also want to look at pandas.read_json or other similar package just to get the json directly into a pandas object which might be easier to manage
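As a rough sketch of that idea, the rows can be collected first and handed to the DataFrame constructor in one go, rather than appending one Series at a time (DataFrame.append was removed in pandas 2.0). The payload below is invented; its rows / row / primary / value nesting is only inferred from the loops in the question, and header stands in for lsHeader.

```python
import pandas as pd

# Hypothetical payload shaped like the structure the question iterates over.
payload = {
    "rows": [
        {"row": [{"primary": {"value": 0}}, {"primary": {"value": 120}}]},
        {"row": [{"primary": {"value": 1}}, {"primary": {"value": 95}}]},
    ]
}
header = ["Hour_of_the_Day", "Raw_Impressions"]  # stand-in for lsHeader

# One inner list per row, one value per cell.
records = [
    [cell["primary"]["value"] for cell in row["row"]]
    for row in payload["rows"]
]

# A flat list of names gives ordinary columns, so df[name] is a Series.
df = pd.DataFrame(records, columns=header)
print(df["Raw_Impressions"].tolist())  # [120, 95]
```

For JSON nested this deeply, pandas.read_json on its own may not produce the desired layout, so flattening the rows first as above is one option.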

edited Jan 8, 2019 at 6:31

anky


answered Jan 8, 2019 at 6:22

NiallJG


3 Comments

Nicholas Humphrey Over a year ago

Thanks @NiallJG, I managed to use df[target].values.tolist() to get a list from a column. But it still confuses me why df[target], in which target is just a string, does not represent a Series.

NiallJG Over a year ago

@jezrael's answer suggests that maybe there are duplicate columns. Try running print(df.columns) to see what the column headings are named; maybe two of them are the same string.

Nicholas Humphrey Over a year ago

thanks yeah I found out there is multiindex, I'll check the previous code to find the source of that.
