3

Update 1 and Update 2 are at the end.

I read this in "get list from pandas dataframe column":

Pandas DataFrame columns are Pandas Series when you pull them out

However this is not true in my case:

First part (building up the DataFrame from scraped JSON). Because it contains business info I cannot show the full code, but basically it reads one row of data (stored in a Series) and appends it to the end of the DataFrame.

dfToWrite = pandas.DataFrame(columns=[lsHeader]) # Empty with column headers
for row in jsAdtoolJSON['rows']:
    lsRow = []
    for col in row['row']:
        lsRow.append((col['primary'])['value'])
    dfRow = pandas.Series(lsRow, index = dfToWrite.columns)
    dfToWrite = dfToWrite.append(dfRow, ignore_index = True)

Next part (check type): (Please ignore the functionality of the function)

def CalcMA(df: pandas.DataFrame, target: str, period: int, maname: str):
    print(type(df[target]))

Finally call the function: ("Raw_Impressions" is a column header)

CalcMA(dfToWrite, "Raw_Impressions", 5, "ImpMA5")

Python console shows:

<class 'pandas.core.frame.DataFrame'>

Additional Question: How to get a list from a Dataframe column if it's not a Series (in which case I can use tolist())?

Update 1: From "Bokeh: AttributeError: 'DataFrame' object has no attribute 'tolist'":

I figured out that I need to use .values.tolist(). However, that still doesn't explain why I get another DataFrame, not a Series, when I pull out a column.
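A minimal sketch of the difference, for anyone landing here with the same symptom. The frame and the helper name `column_to_list` are made up for illustration; the duplicated label is just one way to make `df[label]` come back as a DataFrame. Calling `.values.tolist()` on a DataFrame gives one inner list per row, so it may need flattening:

```python
import pandas as pd

# Hypothetical frame with a duplicated label, so df["a"] is a DataFrame
df = pd.DataFrame([[1, 4, 7], [2, 5, 8]], columns=["a", "a", "b"])


def column_to_list(frame, label):
    """Return the values under `label` as a flat list.

    If the selection is a DataFrame (several columns match the label),
    only the first matching column is used.
    """
    selected = frame[label]
    if isinstance(selected, pd.DataFrame):
        selected = selected.iloc[:, 0]
    return selected.tolist()


nested = df["a"].values.tolist()   # nested: one inner list per row
flat_a = column_to_list(df, "a")   # flat list from the first "a" column
flat_b = column_to_list(df, "b")   # "b" is unique, so this is a plain Series
```

Taking only the first matching column is an assumption of this sketch; whether that is the right choice depends on why the selection is two-dimensional in the first place.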

Update 2: I found out that df has a MultiIndex, which surprised me:

MultiIndex(levels=[['COST_/CPM', 'CTR', 'ECPM/_ROI', 'Goal_Ratio', 'Hour_of_the_Day', 'IMP./Joins', 'Raw_Clicks_/_Unique_Clicks', 'Raw_Impressions', 'Unique_Goal_/_UniqueGoal_Forecasted_Value']], labels=[[4, 7, 5, 6, 1, 8, 3, 0, 2]])

I don't see the labels when printing the df or writing it to .csv; it looks like a normal DataFrame. I'm not sure where the labels came from.

2 Answers

7

I think you have duplicated column names, so selecting one of them returns a DataFrame instead of a Series:

import pandas as pd

df = pd.DataFrame([[1,2],[4,5], [7,8]], index=list('aab')).T
print (df)
   a  a  b
0  1  4  7
1  2  5  8

print (df['a'])
   a  a
0  1  4
1  2  5

print (type(df['a']))
<class 'pandas.core.frame.DataFrame'>

print (df['b'])
0    7
1    8
Name: b, dtype: int64

print (type(df['b']))
<class 'pandas.core.series.Series'>
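To check for this case, a short sketch using `Index.duplicated()`; the frame below is a made-up example, and keeping the first of each duplicated column is an assumption, not necessarily what your data needs:

```python
import pandas as pd

# Hypothetical frame with the label "a" used twice
df = pd.DataFrame([[1, 4, 7], [2, 5, 8]], columns=["a", "a", "b"])

# True for every repeat of a label after its first occurrence
dup_mask = df.columns.duplicated()
dup_labels = df.columns[dup_mask].tolist()

# Keep only the first column for each label
deduped = df.loc[:, ~dup_mask]
col_type = type(deduped["a"])
```

After dropping the repeats, `deduped["a"]` is a Series again.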

EDIT:

This is a different problem: the columns are a one-level MultiIndex. The solution is to assign the first level back to the columns with get_level_values:

# `codes=` was called `labels=` before pandas 0.24
mux = pd.MultiIndex([['COST_/CPM', 'CTR', 'ECPM/_ROI', 'Goal_Ratio', 'Hour_of_the_Day',
                      'IMP./Joins',  'Raw_Clicks_/_Unique_Clicks', 'Raw_Impressions',
                      'Unique_Goal_/_UniqueGoal_Forecasted_Value']],
                    codes=[[4, 7, 5, 6, 1, 8, 3, 0, 2]])

df = pd.DataFrame([range(9)], columns=mux)
print (type(df['CTR']))
<class 'pandas.core.frame.DataFrame'>

df.columns = df.columns.get_level_values(0)
print (type(df['CTR']))
<class 'pandas.core.series.Series'>
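As for where a one-level MultiIndex can come from: one likely source, assuming `lsHeader` in the question is already a list, is the extra brackets in `columns=[lsHeader]`. A list of lists is interpreted as arrays for a MultiIndex. The sketch below uses a made-up subset of headers and made-up values to show the difference:

```python
import pandas as pd

# Hypothetical subset of the headers from the question
ls_header = ["Hour_of_the_Day", "Raw_Impressions", "CTR"]

# The list wrapped in another list: columns become a one-level MultiIndex
wrapped = pd.DataFrame([[0, 100, 0.1]], columns=[ls_header])

# The list passed directly: columns are a plain Index
flat = pd.DataFrame([[0, 100, 0.1]], columns=ls_header)

wrapped_is_multi = isinstance(wrapped.columns, pd.MultiIndex)
flat_is_multi = isinstance(flat.columns, pd.MultiIndex)
wrapped_type = type(wrapped["Raw_Impressions"])
flat_type = type(flat["Raw_Impressions"])
```

If that is the cause, passing `columns=lsHeader` when the frame is created avoids the need to fix the columns afterwards.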

6 Comments

Thanks @jezrael, I just printed out the columns and found that I actually have a MultiIndex. Does that cause the issue? (The levels do not have duplicates.) It's very strange, because print(df) doesn't show any of the labels. I'll update the question with the labels.
@NicholasHumphrey - yes, if it's a MultiIndex then it behaves like a duplicated first level :)
@NicholasHumphrey - Added a solution for your situation.
Thanks! I'll take a look in the morning but I think it will work. Now the next step is to track down the source of the MultiIndex...
@NicholasHumphrey - yes, this kind of error is very unpleasant, especially because it is not visible when you print the DataFrame.
1

Each instance of pandas.core.frame.DataFrame is basically a two-dimensional table, so if you are getting this type you can still reach each column (which will be of type pandas.core.series.Series when the selection is one-dimensional) through df.columns.

df.columns gives you an iterable of column labels; loop over it and index the DataFrame with each label to get the values in that column.

You might also want to look at pandas.read_json or a similar reader to get the JSON directly into a pandas object, which might be easier to manage.
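Whether pandas.read_json fits depends on the shape of the JSON. As a sketch of a related alternative, the rows can be collected in a plain list and turned into a DataFrame once, instead of appending row by row. The names `ls_header` and `js_adtool` and all values below are hypothetical stand-ins shaped like the structure in the question:

```python
import pandas as pd

# Hypothetical stand-ins for lsHeader and jsAdtoolJSON from the question
ls_header = ["Hour_of_the_Day", "Raw_Impressions"]
js_adtool = {
    "rows": [
        {"row": [{"primary": {"value": 0}}, {"primary": {"value": 120}}]},
        {"row": [{"primary": {"value": 1}}, {"primary": {"value": 95}}]},
    ]
}

# One inner list per JSON row, one value per column
rows = [[col["primary"]["value"] for col in row["row"]]
        for row in js_adtool["rows"]]

# Pass the header list directly (no extra brackets) so columns are a plain Index
df = pd.DataFrame(rows, columns=ls_header)
impressions = df["Raw_Impressions"].tolist()
```

Building the frame in one call also avoids DataFrame.append, which is not available in recent pandas versions.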

3 Comments

Thanks @NiallJG, I managed to use df[target].values.tolist() to get a list from a column. But it still confuses me why df[target], where target is just a string, is not a Series.
@jezrael's answer suggests that there may be duplicate columns. Try running print(df.columns) to see what the column headings are named; maybe two of them are the same string.
Thanks, yeah, I found out there is a MultiIndex. I'll check the earlier code to find the source of it.
