
I have a list like this:

[[{'contributionScore': 0.841473400592804, 'variable': 'series_2'},
  {'contributionScore': 0.6113986968994141, 'variable': 'series_3'},
  {'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
  {'contributionScore': 0.5641148686408997, 'variable': 'series_4'},
  {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],

 [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
  {'contributionScore': 0.5188271403312683, 'variable': 'series_4'},
  {'contributionScore': 0.38711458444595337, 'variable': 'series_3'},
  {'contributionScore': 0.35055238008499146, 'variable': 'series_0'},
  {'contributionScore': 0.06044715642929077, 'variable': 'series_2'}]]

How can I obtain a dataframe with a column for each series?

I'd like to get a dataframe with contributionScore for each series.

Thanks!

2 Answers

I am a bit confused by the statement:

How can I obtain a dataframe with a column for each series?

If you meant a single column holding all the series data (the "variable" column), then Celius Stingher's answer should be good enough.

If you meant each series value as its own individual column, I will extend Celius's answer as follows:

import pandas as pd

## As already stated above: stack the inner lists into one long DataFrame
df = pd.concat([pd.DataFrame(x) for x in raw_list])
## Get a sorted list of the unique series names
series_list = sorted(df['variable'].unique())
## Build a dict mapping each series name to the list of its contributionScore values,
## then turn that dict into a DataFrame with one column per series
series_df = pd.DataFrame({series: list(df[df['variable'] == series]['contributionScore']) for series in series_list})

The output will look like this:

    series_0    series_1    series_2    series_3    series_4
0   0.138544    0.598553    0.841473    0.611399    0.564115
1   0.350552    1.131661    0.060447    0.387115    0.518827
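A self-contained sketch of the steps above, with raw_list filled in from the question's data (the name raw_list is taken from Celius's answer). It uses df.loc with a boolean mask, which selects the same values as the df[...][...] chain in the answer:

```python
import pandas as pd

# Data from the question: one inner list per group of scores
raw_list = [
    [{'contributionScore': 0.841473400592804, 'variable': 'series_2'},
     {'contributionScore': 0.6113986968994141, 'variable': 'series_3'},
     {'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
     {'contributionScore': 0.5641148686408997, 'variable': 'series_4'},
     {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
    [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
     {'contributionScore': 0.5188271403312683, 'variable': 'series_4'},
     {'contributionScore': 0.38711458444595337, 'variable': 'series_3'},
     {'contributionScore': 0.35055238008499146, 'variable': 'series_0'},
     {'contributionScore': 0.06044715642929077, 'variable': 'series_2'}],
]

# Long format: one row per (contributionScore, variable) record
df = pd.concat([pd.DataFrame(x) for x in raw_list])

# One column per series; each column holds that series' scores in order of appearance
series_list = sorted(df['variable'].unique())
series_df = pd.DataFrame(
    {s: list(df.loc[df['variable'] == s, 'contributionScore']) for s in series_list}
)
print(series_df)
```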

Note that this only works when every series has the same number of contribution scores (each series above has two).

If the series have different numbers of contribution scores, replace the statement that builds series_df with the following line:

## Turn each series and its contribution scores into a separate single-column DataFrame,
## then concatenate them side by side to accommodate the varying lengths of the series columns
series_df = pd.concat([pd.DataFrame({series: list(df[df['variable'] == series]['contributionScore'])}) for series in series_list], axis=1)

For example, if series_3 had three contribution scores, the result would look like this:

    series_0    series_1    series_2    series_3    series_4
0   0.138544    0.598553    0.841473    0.611399    0.564115
1   0.350552    1.131661    0.060447    0.387115    0.518827
2   NaN         NaN         NaN         1.200000    NaN

Here pd.concat lets us join DataFrames whose columns have different lengths, filling the gaps with NaN. Passing a dict of unequal-length lists directly to pd.DataFrame() is not possible; it raises a ValueError. The axis=1 parameter tells pd.concat to join the DataFrames in the list side by side, i.e. along the columns.
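A small sketch of that difference, using hypothetical scores in which series_3 has one more value than series_0 (the 1.2 mirrors the example table above and is made up):

```python
import pandas as pd

# Hypothetical scores: series_3 has one more value than series_0
scores = {
    'series_0': [0.138544, 0.350552],
    'series_3': [0.611399, 0.387115, 1.2],
}

# A dict of unequal-length lists cannot go straight into pd.DataFrame()
try:
    pd.DataFrame(scores)
except ValueError as err:
    print('pd.DataFrame failed:', err)

# Each single-column DataFrame gets its own 0..n-1 index; concatenating along
# axis=1 aligns those indexes and fills the missing positions with NaN
combined = pd.concat(
    [pd.DataFrame({name: vals}) for name, vals in scores.items()], axis=1
)
print(combined)
```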


2 Comments

this is great! thanks
@lucacanonico, I have added a caveat and a workaround for that case as well; please check it out, and mark the answer as accepted if it works for you!

You can build a DataFrame with pd.DataFrame(). Since each element of the outer list can be turned into a DataFrame itself, you can use a list comprehension and concatenate the results with pd.concat().

Let's say the list is called "raw_list":

import pandas as pd

df = pd.concat([pd.DataFrame(x) for x in raw_list])

The first rows of the output look like this:

   contributionScore  variable
0           0.841473  series_2
1           0.611399  series_3
2           0.598553  series_1
3           0.564115  series_4
4           0.138544  series_0
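The concatenated frame from the question's data has ten rows, and because each inner DataFrame keeps its own 0-based index, the row labels repeat. A sketch on a shortened, hypothetical raw_list (two records per inner list) showing the repeated labels and how ignore_index=True renumbers them:

```python
import pandas as pd

# Shortened, hypothetical version of raw_list: two inner lists of two records each
raw_list = [
    [{'contributionScore': 0.84, 'variable': 'series_2'},
     {'contributionScore': 0.14, 'variable': 'series_0'}],
    [{'contributionScore': 0.35, 'variable': 'series_0'},
     {'contributionScore': 0.06, 'variable': 'series_2'}],
]

# Each pd.DataFrame(x) gets its own index 0..len(x)-1, so concat repeats the labels
df = pd.concat([pd.DataFrame(x) for x in raw_list])
print(df.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True discards the per-piece indexes and numbers rows 0..n-1
df_flat = pd.concat([pd.DataFrame(x) for x in raw_list], ignore_index=True)
print(df_flat.index.tolist())  # [0, 1, 2, 3]
```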

EDIT:

Given the OP's comment, we should pivot each table before concatenating:

df = pd.concat([pd.DataFrame(x).pivot_table(columns='variable') for x in raw_list])

Outputting:

variable           series_0  series_1  series_2  series_3  series_4
contributionScore  0.138544  0.598553  0.841473  0.611399  0.564115
contributionScore  0.350552  1.131661  0.060447  0.387115  0.518827
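A self-contained sketch of the edited approach, using raw_list trimmed to two series for brevity (the trimming is only for illustration). The column name passed to pivot_table has to be 'variable', matching the dictionary key; ignore_index=True is an optional extra, not part of the answer, that numbers the rows 0, 1, ... instead of repeating contributionScore:

```python
import pandas as pd

# Trimmed copy of the question's raw_list (series_0 and series_1 only)
raw_list = [
    [{'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
     {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
    [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
     {'contributionScore': 0.35055238008499146, 'variable': 'series_0'}],
]

# pivot_table with only `columns` set groups by 'variable', aggregates the remaining
# numeric column (contributionScore) with the default mean, and returns one row
df = pd.concat(
    [pd.DataFrame(x).pivot_table(columns='variable') for x in raw_list],
    ignore_index=True,  # rows numbered 0, 1, ... instead of 'contributionScore' twice
)
print(df)
```

Because pivot_table aggregates with the mean by default, a series that appeared twice within the same inner list would be averaged into a single value rather than raising an error.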

2 Comments

Yes, but this way I append rows. I would like to get a dataframe with columns contributionScore, series_0, series_1, series_2, series_3, series_4.
Thanks, I didn't understand what your expected output was. Please remember to include it next time; it makes it easier for us to understand what's needed. Then it's as easy as pivoting the table. Please check the edit.
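Another option, assuming each inner list describes one row of the desired table: turn every inner list into a plain dict mapping series name to score and pass the list of dicts to pd.DataFrame(). This is a different technique from both answers above, sketched here on a trimmed copy of the question's data:

```python
import pandas as pd

# Trimmed copy of the question's raw_list (series_0 and series_1 only)
raw_list = [
    [{'contributionScore': 0.5985525250434875, 'variable': 'series_1'},
     {'contributionScore': 0.138543963432312, 'variable': 'series_0'}],
    [{'contributionScore': 1.1316605806350708, 'variable': 'series_1'},
     {'contributionScore': 0.35055238008499146, 'variable': 'series_0'}],
]

# Assumption: each inner list is one row. Map variable -> score within each row;
# pd.DataFrame on a list of dicts makes one column per key (missing keys become NaN)
rows = [{d['variable']: d['contributionScore'] for d in inner} for inner in raw_list]
wide = pd.DataFrame(rows).sort_index(axis=1)  # sort columns by series name
print(wide)
```

A series missing from one inner list simply shows up as NaN in that row, so this also copes with uneven data.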
