I have the list below. Could you help me extract only the highlighted area and turn it into a DataFrame?

list_a = ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'Total', '2019', '', '', '', '919.69', '1043.26', '1158.34', '1245.30', '1112.40', '868.93', '513.33', '432.67', '244.26', '9160.74', '2020', '371.35', '463.13', '722.77', '865.92', '1252.37', '468.15', '', '', '', '', '', '', '4143.69', '', '', '', '', '', '', '', '', '', '', '', '', '', '24572.68', 'Mean value', '383.56', '452.64', '736.38', '915.75', '1112.23', '1186.21', '1266.01', '1101.05', '786.72', '568.93', '412.19', '276.59', '9198.28', 'Year portion', '4.17%', '4.92%', '8.01%', '9.96%', '12.09%', '12.90%', '13.76%', '11.97%', '8.55%', '6.19%', '4.48%', '3.01%', '100.00%', 'Yield expectations *', '274.04', '411.06', '668.98', '878.54', '1007.50', '1063.92', '1039.74', '943.02', '741.52', '515.84', '274.04', '241.80', '8060.00']

screenshot

2 Answers

Try:

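import pandas as pd

n = 14  # each row of the original table spans 14 items: a label, 12 months, and a total
# Split the flat list into rows of n items, then transpose so each year/summary row becomes a column
df = pd.DataFrame([list_a[i:i + n] for i in range(0, len(list_a), n)]).T
new_header = df.iloc[0]  # the first row holds the column labels
df = df[1:]
df.columns = new_header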

                2019     2020               Mean value  Year portion  Yield expectations *
1   January              371.35             383.56      4.17%         274.04
2   February             463.13             452.64      4.92%         411.06
3   March                722.77             736.38      8.01%         668.98
4   April       919.69   865.92             915.75      9.96%         878.54
5   May         1043.26  1252.37            1112.23     12.09%        1007.50
6   June        1158.34  468.15             1186.21     12.90%        1063.92
7   July        1245.30                     1266.01     13.76%        1039.74
8   August      1112.40                     1101.05     11.97%        943.02
9   September   868.93                      786.72      8.55%         741.52
10  October     513.33                      568.93      6.19%         515.84
11  November    432.67                      412.19      4.48%         274.04
12  December    244.26                      276.59      3.01%         241.80
13  Total       9160.74  4143.69  24572.68  9198.28     100.00%       8060.00
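
To make the chunk-and-transpose step easier to follow, here is a minimal sketch on a miniature list. The list `small`, its values, and the names `rows` and `wide` are made up for illustration; the only assumption carried over from the answer is that the flat list is a table read row by row with a fixed row length `n`.

```python
import pandas as pd

# Hypothetical miniature of list_a: a header row followed by two year rows,
# each n = 4 items long (label, two months, total).
small = ['', 'January', 'February', 'Total',
         '2019', '', '919.69', '919.69',
         '2020', '371.35', '463.13', '834.48']
n = 4

# Cut the flat list into consecutive rows of n items.
rows = [small[i:i + n] for i in range(0, len(small), n)]

# Transpose so that each original row (a year) becomes a column.
wide = pd.DataFrame(rows).T
wide.columns = wide.iloc[0]        # first row holds the labels '', '2019', '2020'
wide = wide[1:].set_index('')      # use the month names as the index
wide.index.name = 'month'

print(wide)
```

Setting the month names as the index is optional; it only works this directly here because the miniature has a single column labelled `''`, whereas the full list produces two.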

Edit:

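# Drop the 'Total' row and the non-year columns (the month-name column is also labelled '')
df1 = df[:-1].drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1)
# Stack the year columns into one long column; months are identified by row number (1-12)
df1 = df1.unstack().reset_index(name='value')
# Build a datetime index from "<year>-<month number>"
df1.set_index(pd.to_datetime(df1[0].astype(str) + '-' + df1['level_1'].astype(str)), inplace=True)
df1.drop([0, 'level_1'], axis=1, inplace=True)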

            value
2019-01-01  
2019-02-01  
2019-03-01  
2019-04-01  919.69
2019-05-01  1043.26
2019-06-01  1158.34
2019-07-01  1245.30
2019-08-01  1112.40
2019-09-01  868.93
2019-10-01  513.33
2019-11-01  432.67
2019-12-01  244.26
2020-01-01  371.35
2020-02-01  463.13
2020-03-01  722.77
2020-04-01  865.92
2020-05-01  1252.37
2020-06-01  468.15
2020-07-01  
2020-08-01  
2020-09-01  
2020-10-01  
2020-11-01  
2020-12-01  
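
Note that every entry in `value` is still a string, and the months without data are empty strings. If numeric values are needed, one possible follow-up is `pd.to_numeric` with `errors='coerce'`. The sketch below uses a three-row stand-in for the result above; the stand-in frame and the name `clean` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Stand-in for the long-format result: string values, '' where a month has no data.
idx = pd.to_datetime(['2019-03-01', '2019-04-01', '2019-05-01'])
df1 = pd.DataFrame({'value': ['', '919.69', '1043.26']}, index=idx)

# Convert to floats; anything that cannot be parsed (here the empty string) becomes NaN.
df1['value'] = pd.to_numeric(df1['value'], errors='coerce')

# Optionally keep only the months that actually have a value.
clean = df1.dropna()

print(df1.dtypes)
print(clean)
```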

3 Comments

What I wanted was to convert the list into a DataFrame like the table on the right side of my screenshot, but this at least solves the first step. Thanks!
@yeonhodev I have answered your second question; check now.
Wooooow! It works like a magic!! haha You're amazing! Thanks a loooot!

It's complicated, but it does what I wanted.

import datetime

import pandas as pd
from dateutil import relativedelta

# Distribute the flat list into 14 buckets: bucket k collects every item at position k (mod 14)
count = 0
output = [[] for _ in range(14)]

for item in list_a:
    output[count % 14].append(item)
    count += 1

# Bucket 0 holds the column labels; buckets 1-13 are the twelve month rows plus 'Total'
production_data = pd.DataFrame(output[1:], columns=output[0])

# Drop the 'Total' row (index 12) and every column that is not a year
production_data2 = production_data.drop(12)
production_data2.drop(['', 'Mean value', 'Year portion', 'Yield expectations *'], axis=1, inplace=True)

# One row per year, then flatten into a single list ordered by year and month
production_data3 = production_data2.transpose()
production_data4 = production_data3.values.tolist()

production_data_list = []
for item in production_data4:
    for element in item:
        production_data_list.append(element)

# The first remaining column label is the first year; start from January 1 of that year
start_year = list(production_data2.columns.values)[0]
start_date_str = start_year + "-01-01"
start_date = datetime.datetime.strptime(start_date_str, '%Y-%m-%d')

# One date per value, advancing one month at a time
nummonths = len(production_data_list)
date_list = []
for x in range(0, nummonths):
    date_list.append(start_date.date() + relativedelta.relativedelta(months=x))

# Pair each date with its value and build a single-column DataFrame indexed by date
combined_dict = dict(zip(date_list, production_data_list))
df = pd.DataFrame(combined_dict, index=[0]).transpose()

df.rename(columns={0: "list_a"}, inplace=True)

2 Comments

You can directly use unstack() for this task.
Wow! I'm such a newbie in Python. Thanks so much for your followup comment! I will try it out!
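
The `unstack()` suggestion from the comment could look roughly like the sketch below. It is one possible reading of that hint, not the commenter's own code. To keep it short, `sample` holds only the first three 14-item rows of `list_a` (header, 2019, 2020); the names `sample`, `table`, `monthly`, `series`, and `result` are introduced here for illustration.

```python
import pandas as pd

# First three 14-item rows of list_a: the header, 2019, and 2020.
sample = ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December', 'Total',
          '2019', '', '', '', '919.69', '1043.26', '1158.34', '1245.30',
          '1112.40', '868.93', '513.33', '432.67', '244.26', '9160.74',
          '2020', '371.35', '463.13', '722.77', '865.92', '1252.37', '468.15',
          '', '', '', '', '', '', '4143.69']
n = 14

# One row per year, one column per month; the year label becomes the index.
rows = [sample[i:i + n] for i in range(0, len(sample), n)]
table = pd.DataFrame(rows[1:], columns=rows[0]).set_index('')
monthly = table.drop(columns='Total')

# With months as rows and years as columns, unstack() returns one long
# Series indexed by (year, month), ordered year by year.
series = monthly.T.unstack()

# Replace the (year, month) pairs with real dates (first day of each month).
series.index = pd.to_datetime(
    [f'{year} {month}' for year, month in series.index], format='%Y %B'
)
result = series.to_frame('value')

print(result)
```

The `'%Y %B'` format relies on English month names, which matches the data here; a list in another language would need a different parsing step.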
