Extracting data from a list and converting it from Python List to a DataFrame

Question

I have the list below. Could you help me extract only the highlighted area and turn it into a DataFrame?

list_a = ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Total', '2019', '', '', '', '919.69', '1043.26', '1158.34', '1245.30', '1112.40', '868.93', '513.33', '432.67', '244.26', '9160.74', '2020', '371.35', '463.13', '722.77', '865.92', '1252.37', '468.15', '', '', '', '', '', '', '4143.69', '', '', '', '', '', '', '', '', '', '', '', '', '', '24572.68', 'Mean value', '383.56', '452.64', '736.38', '915.75', '1112.23', '1186.21', '1266.01', '1101.05', '786.72', '568.93', '412.19', '276.59', '9198.28', 'Year portion', '4.17%', '4.92%', '8.01%', '9.96%', '12.09%', '12.90%', '13.76%', '11.97%', '8.55%', '6.19%', '4.48%', '3.01%', '100.00%', 'Yield expectations *', '274.04', '411.06', '668.98', '878.54', '1007.50', '1063.92', '1039.74', '943.02', '741.52', '515.84', '274.04', '241.80', '8060.00']

Pygirl · Accepted Answer · 2020-06-16 16:55:18Z

1

Try:

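import pandas as pd

n = 14  # each chunk holds one label, 12 monthly entries, and a total
# Split the flat list into chunks of n items; after .T each chunk is a column
df = pd.DataFrame([list_a[i:i + n] for i in range(0, len(list_a), n)]).T
new_header = df.iloc[0]  # the first row holds the column labels
df = df[1:]              # keep only the data rows
df.columns = new_header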

                2019     2020               Mean value  Year portion  Yield expectations *
1   January              371.35             383.56      4.17%         274.04
2   February             463.13             452.64      4.92%         411.06
3   March                722.77             736.38      8.01%         668.98
4   April       919.69   865.92             915.75      9.96%         878.54
5   May         1043.26  1252.37            1112.23     12.09%        1007.50
6   June        1158.34  468.15             1186.21     12.90%        1063.92
7   July        1245.30                     1266.01     13.76%        1039.74
8   August      1112.40                     1101.05     11.97%        943.02
9   September   868.93                      786.72      8.55%         741.52
10  October     513.33                      568.93      6.19%         515.84
11  November    432.67                      412.19      4.48%         274.04
12  December    244.26                      276.59      3.01%         241.80
13  Total       9160.74  4143.69  24572.68  9198.28     100.00%       8060.00
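
To see why the chunk-and-transpose step works, here is a minimal sketch on a made-up nine-item list (the name `flat` and its values are hypothetical, not from the question). It assumes, as in list_a, that the first chunk holds the row labels and every later chunk starts with its column label.

```python
import pandas as pd

# Hypothetical toy data: one label chunk followed by two data chunks, 3 items each.
flat = ['', 'Jan', 'Feb', '2019', '1.0', '2.0', '2020', '3.0', '4.0']
n = 3  # chunk width (14 for list_a)

# Slice the flat list into consecutive chunks of n items.
chunks = [flat[i:i + n] for i in range(0, len(flat), n)]

# Each chunk starts out as a row; transposing turns each chunk into a column.
toy = pd.DataFrame(chunks).T

# Promote the first row to the header, then drop it from the data.
toy.columns = toy.iloc[0]
toy = toy[1:]
print(toy)
```

Note that this only lines up when the list length is an exact multiple of n, which holds for list_a (98 items, n=14).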

Edit:

df1 = df[:-1].copy()  # drop the Total row; copy so the in-place drops act on an independent frame
df1.drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1, inplace=True)
df1 = df1.unstack().reset_index(name='value')
df1.set_index(pd.to_datetime(df1[0].astype(str) + '-' + df1['level_1'].astype(str)), inplace=True)
df1.drop([0, 'level_1'], axis=1, inplace=True)

            value
2019-01-01  
2019-02-01  
2019-03-01  
2019-04-01  919.69
2019-05-01  1043.26
2019-06-01  1158.34
2019-07-01  1245.30
2019-08-01  1112.40
2019-09-01  868.93
2019-10-01  513.33
2019-11-01  432.67
2019-12-01  244.26
2020-01-01  371.35
2020-02-01  463.13
2020-03-01  722.77
2020-04-01  865.92
2020-05-01  1252.37
2020-06-01  468.15
2020-07-01  
2020-08-01  
2020-09-01  
2020-10-01  
2020-11-01  
2020-12-01
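
The values in this frame are still strings, and the missing months are empty strings. If numbers are needed afterwards, one possible follow-up is `pd.to_numeric` with `errors='coerce'`. The sketch below uses a small hypothetical excerpt (`sample`) rather than the full frame.

```python
import pandas as pd

# Hypothetical excerpt of the reshaped frame: string values, '' for a missing month.
idx = pd.to_datetime(['2019-03-01', '2019-04-01', '2019-05-01'])
sample = pd.DataFrame({'value': ['', '919.69', '1043.26']}, index=idx)

# With errors='coerce', entries that cannot be parsed as numbers come out as NaN,
# so the empty string ends up as a missing value and the rest become floats.
sample['value'] = pd.to_numeric(sample['value'], errors='coerce')
print(sample)
```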

edited Jun 16, 2020 at 16:55

answered Jun 12, 2020 at 20:19

Pygirl


3 Comments

yeonhodev Over a year ago

What I wanted was to convert the list to a DataFrame like the table on the right side of my screenshot, but this at least solves the first step. Thanks!

Pygirl Over a year ago

@yeonhodev I have answered your second question; check now.

yeonhodev Over a year ago

Wooooow! It works like a magic!! haha You're amazing! Thanks a loooot!

yeonhodev · Accepted Answer · 2020-06-16 16:17:21Z

1

It's complicated, but it does what I wanted.

import datetime

import pandas as pd
from dateutil import relativedelta

count = 0
output = [[] for _ in range(14)]  # one bucket per position within a 14-item chunk

# The original post iterated over `result`; list_a from the question is assumed to be the same list.
for item in list_a:
    output[count % 14].append(item)
    count += 1

production_data = pd.DataFrame(output[1:], columns=output[0])

production_data2 = production_data.drop(12)
production_data2.drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1, inplace=True)

production_data3 = production_data2.transpose()
production_data4 = production_data3.values.tolist()

production_data_list = []
for item in production_data4:
    for element in item:
        production_data_list.append(element)

start_year = list(production_data2.columns.values)[0]
start_date_str = start_year + "-01-01"
start_date = datetime.datetime.strptime(start_date_str, '%Y-%m-%d')

nummonths = len(production_data_list)
date_list = []
for x in range(0, nummonths):
    date_list.append(start_date.date() + relativedelta.relativedelta(months=x))

combined_dict = dict(zip(date_list, production_data_list))
df = pd.DataFrame(combined_dict, index=[0]).transpose()

df.rename(columns={0: "list_a"}, inplace=True)
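
As a possible shortcut for the date loop above, `pd.date_range` with `freq='MS'` (month start) can generate the same run of first-of-month dates. This is only a sketch: `values` is a hypothetical stand-in for production_data_list, and the result is a DatetimeIndex of Timestamps rather than `datetime.date` objects.

```python
import pandas as pd

# Hypothetical stand-ins for start_year and production_data_list.
start_year = '2019'
values = ['', '', '', '919.69']

# freq='MS' yields month-start timestamps; periods sets how many to generate.
dates = pd.date_range(start=f'{start_year}-01-01', periods=len(values), freq='MS')

# Build the one-column frame directly, without the intermediate dict and transpose.
monthly = pd.DataFrame({'list_a': values}, index=dates)
print(monthly)
```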

answered Jun 16, 2020 at 16:17

yeonhodev


2 Comments

Pygirl Over a year ago

You can directly use unstack() for this task.

yeonhodev Over a year ago

Wow! I'm such a newbie in Python. Thanks so much for your followup comment! I will try it out!
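
For anyone unsure what the `unstack()` suggestion does, here is a small sketch on a made-up 2x2 frame (the name `wide` and its values are hypothetical). On a frame with a plain index, `unstack()` returns a Series indexed by (column label, row label), which is why the year columns come out stacked one after another.

```python
import pandas as pd

# Hypothetical 2x2 frame: rows are month numbers, columns are years.
wide = pd.DataFrame({'2019': ['a', 'b'], '2020': ['c', 'd']}, index=[1, 2])

# unstack() walks each column top to bottom in turn and returns a Series
# whose MultiIndex pairs the column label with the row label.
long = wide.unstack()
print(long)
```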
