pandas python convert some columns into rows

Question

I am trying to convert the month columns of a DataFrame into rows. I scrape a website with the following code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = 'https://www.kemendag.go.id/id'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

table = soup.select('table')[1]
columns = table.find_all('thead')
column_ = []
for column in columns:
    cols = column.find_all('th')
    cols = [item.text.strip() for item in cols]
    column_.append([item for item in cols if item])
rows = table.find_all('tr')
output = []
for row in rows:
    x = row.find_all('td')
    x = [item.text.strip() for item in x]
    output.append([item for item in x if item])
df = pd.DataFrame(output, columns=column_)

and this is what the output looks like:

    Tahun   Jan     Feb     Mar     Apr     Mei     Jun     Jul     Ags     Sep     Okt     Nov  
0   2020    0.39    0.28    0.10    0.08    0.07    0.18    -0.10   -0.05   -0.05   0.07    0.28
1   2021    0.26    0.10    0.08    0.13    0.32    -0.16   0.08    0.00    0.00    0.00    0.00

What I would like is for it to look like:

Tahun Month Value
2020  Jan   0.39
2020  Feb   0.28
2020  Mar   0.10
2020  Apr   0.08
2020  Mei   0.07
2020  Jun   0.18
2020  Jul   -0.10
2020  Ags   -0.05
2020  Sep   -0.05
2020  Okt   0.07
2020  Nov   0.28
2021  Jan   0.26
2021  Feb   0.10
2021  Mar   0.08
2021  Apr   0.13
2021  Mei   0.32
2021  Jun   -0.16
2021  Jul   0.08
2021  Ags   0.00
2021  Sep   0.00
2021  Okt   0.00
2021  Nov   0.00

I have tried melt:

df.melt(id_vars=["Tahun"], 
        var_name="Month", 
        value_name="Value")

but it raises TypeError: only integer scalar arrays can be converted to a scalar index. Any idea? Thanks. I have also tried this:

print(
    df.set_index(["Tahun"])
    .stack()
    .reset_index(name="Value")
    .rename(columns={"level_2": "Month"})
    .sort_values("Month")
    .reset_index(drop=True)
)

and got the same error.

"but it got error" - what is the error?

– jezrael, Commented Sep 14, 2021 at 7:20

jezrael · Accepted Answer · 2021-09-14 07:29:01Z

1

The problem is that the columns are a MultiIndex, because column_ is a list containing a list:

print (df.columns)
MultiIndex([('Tahun',),
            (  'Jan',),
            (  'Feb',),
            (  'Mar',),
            (  'Apr',),
            (  'Mei',),
            (  'Jun',),
            (  'Jul',),
            (  'Ags',),
            (  'Sep',),
            (  'Okt',),
            (  'Nov',),
            (  'Des',)],
           )
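To see where the MultiIndex comes from, here is a minimal sketch that skips the scraping and uses a small made-up subset of the table (the names `names`, `data`, `nested` and `flat` are illustrative, not from the question). Passing a list wrapped in another list as `columns` is what `column_.append(...)` ends up doing:

```python
import pandas as pd

# Illustrative stand-ins for the scraped header and rows (assumed values).
names = ["Tahun", "Jan", "Feb"]
data = [["2020", "0.39", "0.28"], ["2021", "0.26", "0.10"]]

# A list inside a list, as produced by column_.append([...]).
nested = pd.DataFrame(data, columns=[names])
# A plain list of labels, as produced by column_[0] or column_.extend([...]).
flat = pd.DataFrame(data, columns=names)

print(isinstance(nested.columns, pd.MultiIndex))  # True
print(isinstance(flat.columns, pd.MultiIndex))    # False
```

With the nested form every column label is a one-element tuple such as `('Tahun',)`, so `id_vars=["Tahun"]` no longer matches a plain label.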

Possible solution:

df = pd.DataFrame(output, columns=column_[0])

Or:

column_ = []
for column in columns:
    cols = column.find_all('th')
    cols = [item.text.strip() for item in cols]
    column_.extend([item for item in cols if item])
...

df = pd.DataFrame(output, columns=column_)
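If the DataFrame has already been built and you would rather not rebuild it, a third option (not part of the fixes above, just a sketch on made-up sample data) is to flatten the existing one-level MultiIndex in place with `get_level_values(0)`:

```python
import pandas as pd

# Recreate the problem on illustrative data: columns given as a list of lists.
df = pd.DataFrame(
    [["2020", "0.39", "0.28"], ["2021", "0.26", "0.10"]],
    columns=[["Tahun", "Jan", "Feb"]],
)

# Replace the one-level MultiIndex with a flat Index holding the same labels.
df.columns = df.columns.get_level_values(0)

print(df.columns.tolist())  # ['Tahun', 'Jan', 'Feb']
```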

Once the columns are a flat Index, your solution with melt works well.
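For completeness, a small end-to-end sketch on made-up data shaped like the scraped lists (three months only; `long_df` is an illustrative name). It assumes the scraped cells are strings, so the values are converted with `pd.to_numeric`, and it re-sorts by year because `melt` returns all Jan rows first, then all Feb rows, and so on:

```python
import pandas as pd

# Illustrative stand-ins for the scraped lists (assumed values).
column_ = [["Tahun", "Jan", "Feb", "Mar"]]
output = [["2020", "0.39", "0.28", "0.10"], ["2021", "0.26", "0.10", "0.08"]]

# Use the inner list so the columns are a flat Index.
df = pd.DataFrame(output, columns=column_[0])

long_df = df.melt(id_vars=["Tahun"], var_name="Month", value_name="Value")
# Scraped text is str; convert so the values can be used numerically.
long_df["Value"] = pd.to_numeric(long_df["Value"])
# A stable sort by year keeps the months in their original column order.
long_df = long_df.sort_values("Tahun", kind="stable").reset_index(drop=True)

print(long_df)
```

The stable sort matters here: sorting by "Month" instead would order the month names alphabetically rather than by calendar position.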

1 Comment

ohai Over a year ago

Thanks for answering. Would you be able to help with stackoverflow.com/questions/69023522/selecting-the-dataframe as well?

haneulkim · Accepted Answer · 2021-09-14 08:03:29Z

0

It's pretty dirty but it gets the job done...

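import numpy as np

# Assumes df still has the one-level MultiIndex columns from the question
# (hence "level_0" after reset_index), two rows, and 12 month columns.
df_1 = pd.DataFrame(np.repeat(df.iloc[:, :1].values, 12, axis=0), columns=["Tahun"])
# Series.append was removed in pandas 2.0; pd.concat does the same job.
df_2 = (pd.concat([df.iloc[0][1:], df.iloc[1][1:]])
          .to_frame()
          .reset_index()
          .rename(columns={"level_0":"Month", 0:"Value"}))
final_df = pd.concat([df_1, df_2], axis=1)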
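The same manual idea can be written without hard-coding 12 or the two row positions. This is only a sketch on made-up data, and unlike the code above it assumes the columns are already a flat Index; `months` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Illustrative data with flat columns (assumed values, three months only).
df = pd.DataFrame({
    "Tahun": [2020, 2021],
    "Jan": [0.39, 0.26],
    "Feb": [0.28, 0.10],
    "Mar": [0.10, 0.08],
})

months = df.columns[1:]  # every column after "Tahun"

final_df = pd.DataFrame({
    # Repeat each year once per month column.
    "Tahun": np.repeat(df["Tahun"].to_numpy(), len(months)),
    # Cycle through the month names once per row.
    "Month": np.tile(months.to_numpy(), len(df)),
    # Row-major flattening: all months of row 0, then all months of row 1.
    "Value": df[months].to_numpy().ravel(),
})

print(final_df)
```

Because the repeat count comes from `len(months)` and the tile count from `len(df)`, the result stays aligned if the table gains a month or a year.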
