2

I have a Pandas dataframe where the columns are 'month' and 'year', and a 'value_list' which is a list of values - one for each day of the month. Something like this -

year month value_list
1990 2 [10, 20, 30, 40, ...... 290, 300]
1990 3 [110, 120, 130, 140, ...... 390, 400]

I want to unpack these into multiple rows - one for each day (new column), so that I get something like the following -

year month day value
1990 2 1 10
1990 2 2 20
1990 2 3 30
1990 2 4 40

and so on.

I tried using df.explode() but to no avail since the index gets reset or is set to a new single index. How can I automatically get the dates (essentially creating a year, month, date multiindex) while unpacking the lists?

3
  • 1
    df.explode('value_list') should work if df['value_list'] is a true list. If not working, try df['value_list']=df['value_list'].str.strip('[]').str.split(',') followed by df.explode('value_list') Commented Apr 9, 2021 at 2:19
  • Apologies if I wasn't very clear. df.explode does help with exploding/unpacking, but the second part where all the exploded rows have a different value in a new column (day) is what I am unable to get. What I have is a true list, but I see your point - thank you! Commented Apr 9, 2021 at 2:33
  • Ok, fine you clarified and there is a solution already. Keep coding Commented Apr 9, 2021 at 2:34

1 Answer 1

3

After you explode, you can create the day sequences using groupby-transform with cumcount:

df = df.explode('value_list').rename(columns={'value_list': 'value'})
df['day'] = df.groupby(['year', 'month']).transform('cumcount').add(1)

#    year  month value  day
# 0  1990      2    10    1
# 0  1990      2    20    2
# 0  1990      2    30    3
# 0  1990      2    40    4
# 0  1990      2   290    5
# 0  1990      2   300    6
# 1  1990      3   110    1
# 1  1990      3   120    2
# 1  1990      3   130    3
# 1  1990      3   140    4
# 1  1990      3   390    5
# 1  1990      3   400    6

Also as @wwnde commented, if value_list doesn't contain true lists but just strings that look like lists, convert them to lists before exploding:

df.value_list = df.value_list.str.strip('[]').str.split(r'\s*,\s*')
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.