
I have a CSV file, shown below:

[image: the CSV file]

I am trying to create a column for the title and a separate column for each genre in genre_and_votes, so that the output looks something like this:

[image: the desired output]

My code is given below:

import pandas as pd
df = pd.read_csv("C:\\Users\\mysite\\Desktop\\practice\\book1.csv")
#print(df)
print(df['Title'].values, df['genre_and_votes'].values)

The code above creates a DataFrame, but I am not able to create columns for each genre and its votes. I am not sure how to do this and need help.

  • please provide your data as text Commented Sep 14, 2021 at 14:02
  • @mozway. I set up an MRE if you want. Commented Sep 14, 2021 at 14:18
  • @mozway I have provided the data in the code snippet. Commented Sep 14, 2021 at 14:19
  • @Corralien thanks! I provided another solution, I hope you'll like it ;) Commented Sep 14, 2021 at 14:28
  • @mozway thanks for your code. I am new to coding, so regex feels somewhat complicated for an amateur like me. Is there an easier way to solve this? Commented Sep 14, 2021 at 14:38

3 Answers

2

Use str.split and str.rsplit, then pivot the result and concatenate the new columns with your original dataframe:

Set up an MRE

df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
                   'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
                                       'Christian Fiction 114, Romance 16']})
print(df)


# Output
               title                          genre_and_votes
0       Inner Circle  Young adult 161, Mystery 45, Romance 32
1  A Time to Embrace        Christian Fiction 114, Romance 16

Code:

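# Split each row into "genre votes" items, one item per row (index is kept),
# then split each item on its last space and pivot genres into columns
out = df['genre_and_votes'].str.split(',').explode() \
                           .str.rsplit(' ', n=1, expand=True) \
                           .pivot(columns=0, values=1)

# Replace the original column with the new per-genre columns
df = pd.concat([df.drop(columns='genre_and_votes'), out], axis=1)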

Final output

>>> df
               title  Mystery  Romance Christian Fiction Young adult
0       Inner Circle       45       32               NaN         161
1  A Time to Embrace      NaN       16               114         NaN
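
A variant of the code above, as a sketch rather than part of the original answer: splitting on ',' alone leaves a leading space on every genre after the first one in each row, so this version strips that whitespace and converts the votes to numbers. The names items, parts, wide and result are only illustrative. Because not every title has every genre, the vote columns end up as floats with NaN for the missing combinations.

```python
import pandas as pd

df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
                   'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
                                       'Christian Fiction 114, Romance 16']})

# One "genre votes" item per row; the index still points at the source row
items = df['genre_and_votes'].str.split(',').explode().str.strip()

# Split on the last space only, so multi-word genres stay intact
parts = items.str.rsplit(' ', n=1, expand=True)
parts.columns = ['genre', 'votes']
parts['votes'] = pd.to_numeric(parts['votes'])

# One column per genre, aligned with df on the original row index
wide = parts.pivot(columns='genre', values='votes')
result = pd.concat([df.drop(columns='genre_and_votes'), wide], axis=1)
print(result)
```

With the whitespace stripped, the genre columns should come out in plain alphabetical order, since pivot sorts the column labels.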

4 Comments

@Corralien, I am getting the error below: File "G:\conda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile exec(compile(f.read(), filename, 'exec'), namespace) File "C:/Users/python/Desktop/book1.py", line 13, in <module> out = df['genre_and_votes'].str.split(',').explode() \ File "G:\conda3\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'Series' object has no attribute 'explode'
Are you using a version of pandas older than 0.25.0? Series.explode was added in 0.25.0, so you have to update, for example with conda update pandas.
@Corralien: it's 0.22.0
It could be hard to get help with an older version, and you will be limited without the newer features of pandas. Versions before 1.0.0 were subject to many changes that did not keep compatibility with older versions.
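
For pandas versions without Series.explode, or to avoid regex altogether, the same table can be built with plain Python string methods. This is a minimal sketch and not part of the answer above: the rows and result names are hypothetical, and it assumes every item has the form "genre votes" and that items are separated by commas with no missing values in genre_and_votes.

```python
import pandas as pd

df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
                   'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
                                       'Christian Fiction 114, Romance 16']})

rows = []
for title, text in zip(df['title'], df['genre_and_votes']):
    row = {'title': title}
    for item in text.split(','):
        # rsplit on the last space keeps multi-word genres together
        genre, votes = item.strip().rsplit(' ', 1)
        row[genre] = int(votes)
    rows.append(row)

# Genres that a title does not have become NaN in the resulting DataFrame
result = pd.DataFrame(rows)
print(result)
```

The column order may differ between pandas versions, so it is safer to select columns by name than by position.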
1

Here is a solution using extractall, a regex with named capturing groups, and pivot:

(df.join(df['genre_and_votes'].str.extractall(r'(?P<genre>[^,]+) (?P<value>\d+)').droplevel('match'))
   .pivot(index='title', columns='genre', values='value')
)

output:

genre              Mystery  Romance Christian Fiction Young adult
title                                                            
A Time to Embrace      NaN       16               114         NaN
Inner Circle            45       32               NaN         161
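
To see what the regex does, it can help to look at the intermediate result of extractall on its own. The sketch below uses the same MRE as the first answer. The leading \s* in the pattern, which swallows the space after each comma, and the final astype(float) are additions of this sketch and not part of the answer above; the matches and wide names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'title': ['Inner Circle', 'A Time to Embrace'],
                   'genre_and_votes': ['Young adult 161, Mystery 45, Romance 32',
                                       'Christian Fiction 114, Romance 16']})

# genre: everything before the last space of an item; value: the digits after it
pattern = r'\s*(?P<genre>[^,]+) (?P<value>\d+)'

# One row per match, indexed by (original row, match number)
matches = df['genre_and_votes'].str.extractall(pattern)
print(matches)

# Drop the match number so the rows line up with df again, then pivot
wide = (df.join(matches.droplevel('match'))
          .pivot(index='title', columns='genre', values='value')
          .astype(float))
print(wide)
```

extractall returns the captured values as strings, which is why the sketch converts them to numbers at the end.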


0

pandas has a "pivot" function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
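
As a minimal sketch of what pivot expects: it reshapes "long" data, with one row per (title, genre) pair, into a wide table. The long_df frame below is typed in by hand from the example data used in the other answers and is only an illustration; getting from the genre_and_votes strings to this shape still requires splitting them first.

```python
import pandas as pd

# Long format: one row per (title, genre) pair
long_df = pd.DataFrame({
    'title': ['Inner Circle'] * 3 + ['A Time to Embrace'] * 2,
    'genre': ['Young adult', 'Mystery', 'Romance', 'Christian Fiction', 'Romance'],
    'votes': [161, 45, 32, 114, 16],
})

# Wide format: one row per title, one column per genre
wide = long_df.pivot(index='title', columns='genre', values='votes')
print(wide)
```

pivot raises an error if the same (title, genre) pair appears more than once; in that case pivot_table with an aggregation function is the usual alternative.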
