creating a new dataframe from array of values

Question

0    {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]}
1    {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}

I'm just learning pandas and somehow parsed some complex data into the form shown above. How can I create a new pandas DataFrame from the list stored under the key needed, ignoring the first (empty string) value and putting the other two values into two new columns named name and value?

Expected output (two columns with a numbered index):

0    {'name': 'PPP', 'value': 8.414448}
1    {'name': 'FFF', 'value': 7.414448}

Could you add an expected output to your post? It would be easier to understand what should be done.
– Arkadiusz, Commented Mar 14, 2021 at 19:56
You wrote that you want to see 2 values in 2 new pandas columns, but judging by the input and the output, you have a dataframe with two long strings in one column and want them converted to two other strings in one column, just with a different pattern.
– Arkadiusz, Commented Mar 14, 2021 at 20:14
@Arkadiusz I only need the values under the key needed. It's an array, and I want to create a new dataframe from it with each item in the array as a new column. As the first item in the array is an empty string, I want to ignore that and use just the last 2 values.
– Abhilash, Commented Mar 14, 2021 at 20:21

Georgina Skibinski · Accepted Answer · 2021-03-14 20:20:54Z

Assuming your Series has a regular schema, i.e. all rows have the same dict keys and the same level of nesting in the parts you're touching:

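# ds is the input Series of dicts
ds1 = ds.str["needed"].str[1:]  # pick the list under "needed", drop its first item
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])  # one column per remaining item
ds3 = pd.Series(ds2.to_dict("records"))  # back to one dict per row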

For the input in pd.Series format:

import pandas as pd

ds = pd.Series([{'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
{'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}])

Now to explain the steps:

ds1 - the way to interact with a list or dict stored in each row of a pandas Series is to invoke .str[key], where key can be a dict key, a list index, or a list slice.
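As a small illustration of that accessor (a sketch only; the Series name demo and the variable names are made up for this example, not part of the answer), each call below is applied row by row:

```python
import pandas as pd

# Hypothetical demo data, trimmed down from the question's input.
demo = pd.Series([
    {"needed": ["", "PPP", 8.414448]},
    {"needed": ["", "FFF", 7.414448]},
])

lists = demo.str["needed"]  # dict lookup per row: a Series of lists
second = lists.str[1]       # list index per row: the second item of each list
tail = lists.str[1:]        # list slice per row: everything after the first item

print(tail.to_list())
```

The slice form is what drops the leading empty string from every row.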

ds2 - breaks ds1 into columns with predefined names: to_list() turns the Series into a list of lists, and each inner list becomes one row of the new DataFrame.

ds3 - to_dict("records") converts your data frame into a list in which each row is represented by a single entry of the format {column1_name: column1_value_rowN, column2_name: column2_value_rowN, ...}
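Putting the three steps together with the sample input, a self-contained sketch might look like this. It spells the orient as "records" in full, since recent pandas releases no longer accept abbreviated orient names:

```python
import pandas as pd

# Sample input from the question: one dict per row.
ds = pd.Series([
    {"not_needed": "not_needed", "needed": ["", "PPP", 8.414448]},
    {"not_needed": "not_needed", "needed": ["", "FFF", 7.414448]},
])

# Step 1: take the list under "needed" and skip its first (empty) item.
ds1 = ds.str["needed"].str[1:]

# Step 2: spread the remaining two items into named columns.
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])

# Step 3 (optional): collapse each row back into a dict.
ds3 = pd.Series(ds2.to_dict("records"))

print(ds2)
print(ds3.to_list())
```

If a two-column DataFrame is all you need, ds2 is already the result and step 3 can be skipped.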

3 Comments

Abhilash Over a year ago

This works. Thank you for taking the time to explain this. Just one question: is it possible to ignore the first empty string when creating the dataframe?

Georgina Skibinski Over a year ago

No worries, that's exactly what .str[1:] is doing.

Abhilash Over a year ago

Yes indeed. Just figured that out. Thank you.
