
I'm working with a dataset available here: https://www.kaggle.com/datasets/lehaknarnauli/spotify-datasets?select=artists.csv. I want to extract the first element of each array in the genres column. For example, if a row contains ['pop', 'rock'], I'd like to extract 'pop'. I've tried several approaches, but none of them work and I don't know why.

Here is my code:

import pandas as pd

df = pd.read_csv('artists.csv')

# approach 1
df['top_genre'] = df['genres'].str[0]
# Error: 'str' object has no attribute 'str'

# approach 2
df = df.assign(top_genre = lambda x: df['genres'].str[0])
# The result is a single bracket '[' in each row. It seems index 0 refers to the first character of a string, not the first array element.

# approach 3
df['top_genre'] = df['genres'].apply(lambda x: '[]' if not x else x[0])
# The result is a single bracket '[' in each row. It seems index 0 refers to the first character of a string, not the first array element.

Why don't these approaches work, and how can I make this work?

2 Answers


Another way to do it:

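import json
# json.loads needs double-quoted strings; for cells like "['pop', 'rock']" use ast.literal_eval instead.
df["top_genre"] = df["genres"].apply(lambda x: None if x == '[]' else json.loads(x)[0])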
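A minimal sketch of why the parser choice matters, assuming the cells hold Python-style lists with single quotes, as in the question's ['pop', 'rock'] example. The two helper functions and the sample strings are hypothetical, written only for this comparison:

```python
import ast
import json

# Hypothetical sample cells shaped like the genres column:
# Python-style list literals with single quotes, plus an empty list.
samples = ["['pop', 'rock']", "[]"]

def first_genre_json(s):
    # json.loads only accepts double-quoted strings, so "['pop', 'rock']" raises
    try:
        items = json.loads(s)
    except json.JSONDecodeError:
        return "<not valid JSON>"
    return items[0] if items else None

def first_genre_literal(s):
    # ast.literal_eval parses Python literals, including single-quoted strings
    items = ast.literal_eval(s)
    return items[0] if items else None

print([first_genre_json(s) for s in samples])     # ['<not valid JSON>', None]
print([first_genre_literal(s) for s in samples])  # ['pop', None]
```

If the file actually stores JSON-style double-quoted lists, the json version works as written; otherwise literal_eval is the safer choice.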


Your genres column is not actually a list; it is a string that contains a list, such as "['a', 'b']". You have to parse each string back into a list object. eval would do that, but for safety it's better to use ast.literal_eval, which only evaluates Python literals (strings, numbers, lists, dicts and so on) and will not execute arbitrary code.
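A minimal sketch of what is happening, using a small hypothetical DataFrame in place of artists.csv (the cell values are assumed to be Python-style list strings, as in the question):

```python
import ast

import pandas as pd

# Hypothetical stand-in for the genres column as read_csv loads it:
# every cell is a str that merely looks like a list.
df = pd.DataFrame({"genres": [
    "['pop', 'rock']",
    "['deep acoustic pop', 'mississippi indie']",
    "[]",
]})

print(type(df.loc[0, "genres"]))     # <class 'str'>
print(df["genres"].str[0].tolist())  # ['[', '[', '['] (first character of each string)

# ast.literal_eval turns each string back into a real Python list
parsed = df["genres"].apply(ast.literal_eval)
print(type(parsed[0]))         # <class 'list'>
print(parsed.str[0].tolist())  # ['pop', 'deep acoustic pop', nan]
```

Once the cells are real lists, .str[0] indexes the list instead of the string, and empty lists come back as NaN.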

Using a converter while reading the dataset

One way is to apply a converter while loading the dataset, using the converters parameter of read_csv. The advantage of this method is that you can declare multiple transformations and type conversions in a single dictionary, which can be reused across many similar files at once if needed.

import pandas as pd
from ast import literal_eval

df = pd.read_csv('/path_to_data/artists.csv',
                 converters={'genres': literal_eval})
df['genres'].str[0]

0                        NaN
1                        NaN
2                        NaN
3                        NaN
4                        NaN
                 ...        
1104344                  NaN
1104345    deep acoustic pop
1104346                  NaN
1104347                  NaN
1104348                  NaN
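To illustrate the reuse point, here is a minimal sketch that applies one converters dict to two small in-memory CSVs. The inline CSV text, the name column and the str.strip converter are hypothetical, added only to show more than one conversion living in a single dict:

```python
import io
from ast import literal_eval

import pandas as pd

# One converters dict: each key is a column, each value a function
# applied to the raw string of every cell in that column.
converters = {
    "genres": literal_eval,  # "['pop', 'rock']" -> ['pop', 'rock']
    "name": str.strip,       # hypothetical second conversion
}

# Hypothetical in-memory stand-ins for several similarly shaped files.
CSV_A = """id,name,genres
1, Artist A ,"['pop', 'rock']"
2,Artist B,[]
"""
CSV_B = """id,name,genres
3,Artist C,"['deep acoustic pop', 'mississippi indie']"
"""

# Reuse the same dict for every file, then combine the results.
frames = [pd.read_csv(io.StringIO(text), converters=converters) for text in (CSV_A, CSV_B)]
df = pd.concat(frames, ignore_index=True)
df["top_genre"] = df["genres"].str[0]  # NaN where the list is empty
print(df[["name", "top_genre"]])
```

With real files you would pass paths instead of io.StringIO objects; the dict itself stays the same.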

Using the apply method on a column

Another way is to convert the strings with literal_eval after loading. This needs an extra line to overwrite the existing column, but it works just as well; it's just a bit redundant in my opinion.

import pandas as pd
from ast import literal_eval

df = pd.read_csv('/path_to_data/artists.csv')
df['genres'] = df['genres'].apply(literal_eval)
df['genres'].str[0]
0                        NaN
1                        NaN
2                        NaN
3                        NaN
4                        NaN
                 ...        
1104344                  NaN
1104345    deep acoustic pop
1104346                  NaN
1104347                  NaN
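One caveat with the apply route: literal_eval only accepts strings, so a genuinely empty CSV cell (which read_csv loads as NaN rather than "[]") would make it raise. A minimal sketch with a hypothetical guard helper, parse_genres, assuming such empty cells might exist (the sample frame is made up; the real file may not contain any):

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame; None mimics an empty CSV cell that read_csv
# would load as a missing value instead of the string "[]".
df = pd.DataFrame({"genres": ["['pop', 'rock']", "[]", None]})

def parse_genres(value):
    # Parse only strings; treat anything else (None/NaN) as an empty list
    return literal_eval(value) if isinstance(value, str) else []

df["genres"] = df["genres"].apply(parse_genres)
df["top_genre"] = df["genres"].str[0]  # 'pop', then NaN for the empty lists
print(df)
```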

6 Comments

Glad to help, feel free to mark it as accepted if it was helpful!
One more thing: even with this solution I still can't extract the first element easily. For example, with "deep acoustic pop", the element "deep" can't be retrieved using df['genres'].str[0]. I would have to use a more complex function for this. How can I convert the genres column to a list of arrays?
"deep acoustic pop" is the first genre in that row's list of genres: the row contains ['deep acoustic pop', 'mississippi indie'], so the code is working as expected. I double-checked. You can just use df['genres'].str[0] to get the first element (genre) from the list of genres in each row.
Do try it and let me know if there are any issues.
You can simply chain the str methods: df['genres'].str[0].str.split().str[0]. Prefer these over apply, which is generally slower than the str accessor methods. This works on rows that contain text; rows whose genre list was empty come through as NaN, and the str methods pass NaN along unchanged. If you'd rather not have NaN, run df['genres'].str[0] first, then .str.split().str[0] on that result, and fill the NaN values at the end. I'm AFK right now, so give me some time to share a full solution.
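A minimal sketch of the chained str approach on a hypothetical, already-parsed genres column (real lists, as produced by literal_eval). The "unknown" placeholder is an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical genres column that has already been parsed into real lists
genres = pd.Series([["deep acoustic pop", "mississippi indie"], ["pop", "rock"], []])

first_genre = genres.str[0]                  # whole first genre; NaN for the empty list
first_word = first_genre.str.split().str[0]  # first word of that genre; NaN stays NaN

# Fill the gaps only at the end, after the str chain has run
first_word_filled = first_word.fillna("unknown")
print(first_word_filled.tolist())  # ['deep', 'pop', 'unknown']
```

Filling earlier with an empty string would not help, because "".split() gives an empty list and .str[0] turns that back into NaN.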
