0    {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]}
1    {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}

I'm just learning pandas and somehow parsed some complex data into a Series like the one above. How can I create a new pandas DataFrame from the list stored under the key needed, ignoring the first (empty string) item and putting the other two values into two new columns called name and value?

Expected output (two columns with a numbered index):

0    {'name': 'PPP', 'value': 8.414448}
1    {'name': 'FFF', 'value': 7.414448}
  • Could you add an expected output to your post? It would be easier to understand what should be done. Commented Mar 14, 2021 at 19:56
  • @Arkadiusz just updated. Sorry for the confusion. Commented Mar 14, 2021 at 20:00
  • You wrote you wanna see 2 values in 2 new pandas columns, but looking at the input and the output, it seems you have a dataframe with two long strings in one column and you would like to have them converted to other two strings in one column, but with different pattern. Commented Mar 14, 2021 at 20:14
  • @Arkadiusz I only need the values under the key needed. It's an array, and I want to create a new dataframe from it with each item in the array as a new column. Since the first item in the array is an empty string, I want to ignore it and use just the last 2 values. Commented Mar 14, 2021 at 20:21

1 Answer

Assuming your Series has a regular schema, i.e. every row has the same dict keys and the same level of nesting in the parts you're touching:

ds1 = ds.str["needed"].str[1:]
ds2 = pd.DataFrame(ds1.to_list(), columns = ["name", "value"])
ds3 = pd.Series(ds2.to_dict("records"))

Here ds is the input as a pd.Series:

import pandas as pd

ds = pd.Series([{'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
{'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}])

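Now to explain the steps:

ds1 - the way to reach into a list or dict stored in each row is .str[key], where key can be a dict key, a list index, or a slice. Here .str["needed"] picks out the list and .str[1:] drops its first element.

ds2 - breaks ds1 into columns with predefined names. This is already a DataFrame with the two columns name and value.

ds3 - to_dict("records") converts the DataFrame into a list in which each row is a single entry of the form {column1_name: column1_value_rowN, column2_name: column2_value_rowN, ...}. The full spelling "records" is used because recent pandas versions no longer accept the abbreviation "record".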
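Putting the three steps together with the sample input, as a minimal end-to-end sketch (it assumes, as above, that every row has a needed key holding a three-item list):

```python
import pandas as pd

# Sample input from the question: a Series of dicts.
ds = pd.Series([
    {"not_needed": "not_needed", "needed": ["", "PPP", 8.414448]},
    {"not_needed": "not_needed", "needed": ["", "FFF", 7.414448]},
])

# Step 1: take the list under "needed" in each row, then drop its first item.
ds1 = ds.str["needed"].str[1:]

# Step 2: spread the remaining two items into named columns.
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])

# Step 3 (only if you want one dict per row again): rows back to dicts.
ds3 = pd.Series(ds2.to_dict("records"))

print(ds2)
print(ds3)
```

If all you need is the two-column DataFrame, you can stop at ds2; ds3 is only there to match the dict-per-row layout shown in the question.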


3 Comments

This works. Thank you for taking the time to explain this. Just one question: is it possible to ignore the first empty string when creating the dataframe?
No worries, that's exactly what .str[1:] is doing
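To see the slicing in isolation, here is a small sketch that compares .str[1:] with an ordinary Python list slice applied row by row. The names sliced and by_hand are just for illustration:

```python
import pandas as pd

ds = pd.Series([
    {"not_needed": "not_needed", "needed": ["", "PPP", 8.414448]},
    {"not_needed": "not_needed", "needed": ["", "FFF", 7.414448]},
])

# .str[1:] slices the list in every row, so the leading "" is dropped.
sliced = ds.str["needed"].str[1:].to_list()

# The same thing written as a plain list comprehension, for comparison.
by_hand = [row["needed"][1:] for row in ds]

print(sliced)
print(by_hand)
```

Both give the same list of two-item lists, so the empty string never reaches the DataFrame constructor.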
yes indeed. Just figured that out. Thank you.
