Python multiple pivot from the same column

Question

I have a dataframe with just one column with content like:

view: meta_record_extract
dimension: e_filter
type: string
hidden: yes
sql: "SELECT * FROM files"
dimension: category
type: string
...

What I am trying to produce is a dataframe with columns and data like this:

view                | dimension | label | type   | hidden | sql
meta_record_extract | e_filter  | NaN   | string | yes    | "SELECT * FROM files"
NaN                 | category  | NaN   | string | ...

Splitting the string data like this:

df.header[0].split(': ')[0]

gives me the label with [0] or the value with [1], so I tried this:

df.pivot_table(df, columns = df.header.str.split(': ')[0], values = df.header.str.split(': ')[1])

but it did not work and raised an error.

Can anyone help me to achieve the result I need?

SeaBean · Accepted Answer · 2021-10-12 15:07:57Z


Use `str.findall()` + `map`, as follows:

`str.findall()` extracts each keyword and value pair into a list. We then map each list of keyword-value pairs to a dict, and `pd.DataFrame` turns those dicts into a dataframe.

(Assuming your column is labelled Col1):

df_extract = df['Col1'].str.findall(r'(\w+):\s*(.*)')

df_result = pd.DataFrame(map(dict, df_extract))

Result:

print(df_result)



                  view dimension    type hidden                    sql
0  meta_record_extract       NaN     NaN    NaN                    NaN
1                  NaN  e_filter     NaN    NaN                    NaN
2                  NaN       NaN  string    NaN                    NaN
3                  NaN       NaN     NaN    yes                    NaN
4                  NaN       NaN     NaN    NaN  "SELECT * FROM files"
5                  NaN  category     NaN    NaN                    NaN
6                  NaN       NaN  string    NaN                    NaN
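For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The sample rows are copied from the question, the column name `Col1` is the answer's assumption, and a list comprehension stands in for `map(dict, ...)` (it should behave the same way).

```python
import pandas as pd

# Sample data modelled on the question; the column name Col1 is an assumption
df = pd.DataFrame({"Col1": [
    "view: meta_record_extract",
    "dimension: e_filter",
    "type: string",
    "hidden: yes",
    'sql: "SELECT * FROM files"',
    "dimension: category",
    "type: string",
]})

# Each row becomes a list of (keyword, value) tuples,
# e.g. [("view", "meta_record_extract")]
df_extract = df["Col1"].str.findall(r"(\w+):\s*(.*)")

# dict() turns each list of tuples into {keyword: value}; one dict per input row
df_result = pd.DataFrame([dict(pairs) for pairs in df_extract])
print(df_result)
```

Each input line yields exactly one keyword, so every row of `df_result` has a single non-NaN cell. A line that does not match the pattern would produce an empty list and therefore a row of all NaN.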

Update

To compress the rows and minimize the NaNs, we can further use .apply() with .dropna(), as follows:

df_compressed = df_result.apply(lambda x:  pd.Series(x.dropna().values))

Result:

print(df_compressed)


                  view dimension    type hidden                    sql
0  meta_record_extract  e_filter  string    yes  "SELECT * FROM files"
1                  NaN  category  string    NaN                    NaN
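A small sketch of why the compression step works, using a hand-built stand-in for `df_result` (the frame below is an assumption that mirrors the earlier output, not data from the asker):

```python
import pandas as pd

# Stand-in for df_result: one keyword/value per row, everything else NaN
df_result = pd.DataFrame([
    {"view": "meta_record_extract"},
    {"dimension": "e_filter"},
    {"type": "string"},
    {"hidden": "yes"},
    {"sql": '"SELECT * FROM files"'},
    {"dimension": "category"},
    {"type": "string"},
])

# For every column: drop the NaN entries, then rebuild a Series from the bare
# values so the survivors are re-indexed from 0 upward.
df_compressed = df_result.apply(lambda col: pd.Series(col.dropna().values))
print(df_compressed)
```

The `.values` part matters: without it each column keeps its original index labels, pandas realigns the columns on those labels, and the frame comes back just as sparse as before.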

edited Oct 12, 2021 at 15:07

answered Oct 12, 2021 at 14:54


18 Comments

SeaBean Over a year ago

@RandyMcKay We can do that. But since some keywords can appear more than once and others only once, it's inevitable that some NaN will remain. Anyway, we can minimize that. Will edit the solution for that. Stay tuned.

RandyMcKay Over a year ago

amazing! Thank you a lot!

SeaBean Over a year ago

@RandyMcKay Sorry, I don't quite understand what you mean, especially the statement "I see the dimensions for the first view is below under the other views now". Can you elaborate? Is that related to data not in the sample data?

SeaBean Over a year ago

@RandyMcKay Let me clarify a bit more. Is the relative sequence within one particular column retained or shuffled? I mean within one particular column, not between columns. This kind of compression works column by column. It simply ignores the relative sequence between columns.

SeaBean Over a year ago

@RandyMcKay Let's consider only one column. Let's say view. For its values in df_result, assume 3 values in sequence: view1, view2, view3. You mean that after compression it becomes e.g. view1, view3, view2? Is that true? If true, it's weird. The sorted function I just gave you is a stable sort, which means it maintains the sequence.
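The column-by-column behaviour discussed in these comments can be seen in a small sketch. The data below is hypothetical (only the second dimension has a `hidden` entry), and the record-aware variant is an alternative not taken from the answer: it assumes every record starts with a `dimension` line and that no keyword repeats within a record.

```python
import pandas as pd

# Hypothetical data: only the second dimension carries a "hidden" line
df = pd.DataFrame({"Col1": [
    "view: meta_record_extract",
    "dimension: e_filter",
    "type: string",
    "dimension: category",
    "type: number",
    "hidden: yes",
]})

# Split every line into a key column and a value column
pairs = df["Col1"].str.extract(r"^(?P<key>\w+):\s*(?P<value>.*)$")

# Column-wise compression packs each column upward independently
wide = pd.DataFrame([{k: v} for k, v in zip(pairs["key"], pairs["value"])])
compressed = wide.apply(lambda col: pd.Series(col.dropna().values))
# "yes" now sits in row 0 beside e_filter, although it belongs to category

# Record-aware alternative: carry the view forward, then start a new
# record at every "dimension" line
is_view = pairs["key"] == "view"
pairs["view"] = pairs["value"].where(is_view).ffill()
fields = pairs[~is_view].copy()
fields["record"] = (fields["key"] == "dimension").cumsum()
aligned = (
    fields.set_index(["view", "record", "key"])["value"]
    .unstack("key")   # one column per keyword, one row per record
    .reset_index()
)
print(aligned)
```

Unlike the layout requested in the question, this variant repeats the view name on every row instead of leaving it NaN after the first, and `unstack` will raise an error if a keyword appears twice inside one record.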
