I am attempting to convert DataFrame columns into rows. I scrape a website with the following code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = 'https://www.kemendag.go.id/id'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

table = soup.select('table')[1]
columns = table.find_all('thead')
column_ = []
for column in columns:
    cols = column.find_all('th')
    cols = [item.text.strip() for item in cols]
    column_.append([item for item in cols if item])
rows = table.find_all('tr')
output = []
for row in rows:
    x = row.find_all('td')
    x = [item.text.strip() for item in x]
    output.append([item for item in x if item])
df = pd.DataFrame(output, columns=column_)

and this is what the output looks like:

    Tahun   Jan     Feb     Mar     Apr     Mei     Jun     Jul     Ags     Sep     Okt     Nov  
0   2020    0.39    0.28    0.10    0.08    0.07    0.18    -0.10   -0.05   -0.05   0.07    0.28
1   2021    0.26    0.10    0.08    0.13    0.32    -0.16   0.08    0.00    0.00    0.00    0.00    

What I would like is for it to look like:

Tahun Month Value
2020  Jan   0.39
2020  Feb   0.28
2020  Mar   0.10
2020  Apr   0.08
2020  Mei   0.07
2020  Jun   0.18
2020  Jul   -0.10
2020  Ags   -0.05
2020  Sep   -0.05
2020  Okt   0.07
2020  Nov   0.28
2021  Jan   0.26
2021  Feb   0.10
2021  Mar   0.08
2021  Apr   0.13
2021  Mei   0.32
2021  Jun   -0.16
2021  Jul   0.08
2021  Ags   0.00
2021  Sep   0.00
2021  Okt   0.00
2021  Nov   0.00

The problem is that when I tried

df.melt(id_vars=["Tahun"], 
        var_name="Month", 
        value_name="Value")

it raised TypeError: only integer scalar arrays can be converted to a scalar index. Any idea? Thanks. I have also tried this:

print(
    df.set_index(["Tahun"])
    .stack()
    .reset_index(name="Value")
    .rename(columns={"level_2": "Month"})
    .sort_values("Month")
    .reset_index(drop=True)
)

and got the same error.

1
  • but it got error - what is error? Commented Sep 14, 2021 at 7:20

2 Answers

1

The problem is that the columns are a MultiIndex, because column_ is a list containing one inner list:

print (df.columns)
MultiIndex([('Tahun',),
            (  'Jan',),
            (  'Feb',),
            (  'Mar',),
            (  'Apr',),
            (  'Mei',),
            (  'Jun',),
            (  'Jul',),
            (  'Ags',),
            (  'Sep',),
            (  'Okt',),
            (  'Nov',),
            (  'Des',)],
           )
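To see where the MultiIndex comes from, here is a minimal sketch with made-up data (the names `data`, `nested` and `flat` are hypothetical and not from the question). Passing a list that holds one inner list as `columns` makes pandas treat the inner list as the labels of one index level, so the result is a one-level MultiIndex rather than a plain Index.

```python
import pandas as pd

# Two sample rows, shortened to two months for brevity (illustrative values)
data = [["2020", "0.39", "0.28"], ["2021", "0.26", "0.10"]]

nested = [["Tahun", "Jan", "Feb"]]  # shape produced by column_.append([...])
flat = ["Tahun", "Jan", "Feb"]      # shape produced by column_[0] or .extend([...])

df_nested = pd.DataFrame(data, columns=nested)
df_flat = pd.DataFrame(data, columns=flat)

# Only the nested variant ends up with MultiIndex columns
is_multi_nested = isinstance(df_nested.columns, pd.MultiIndex)
is_multi_flat = isinstance(df_flat.columns, pd.MultiIndex)
print(is_multi_nested, is_multi_flat)
```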

Possible solution:

df = pd.DataFrame(output, columns=column_[0])

Or:

column_ = []
for column in columns:
    cols = column.find_all('th')
    cols = [item.text.strip() for item in cols]
    column_.extend([item for item in cols if item])
...

df = pd.DataFrame(output, columns=column_)

With either fix, the melt call from the question works as expected.
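As a rough end-to-end sketch, assuming the columns are already flat: the frame below is hand-built with three months instead of being scraped, and `months` and `long_df` are hypothetical names. Note that melt returns the rows grouped by month, so an extra sort (here via an ordered categorical, which is one option among several) is needed to get the year-then-month order shown in the question.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]  # shortened; the real table has more month columns
df = pd.DataFrame(
    [["2020", "0.39", "0.28", "0.10"], ["2021", "0.26", "0.10", "0.08"]],
    columns=["Tahun"] + months,
)

# Wide to long: one row per (Tahun, Month) pair
long_df = df.melt(id_vars=["Tahun"], var_name="Month", value_name="Value")

# Scraped cells are strings, so convert the values to numbers
long_df["Value"] = pd.to_numeric(long_df["Value"])

# An ordered categorical makes the sort follow calendar order, not alphabetical order
long_df["Month"] = pd.Categorical(long_df["Month"], categories=months, ordered=True)
long_df = long_df.sort_values(["Tahun", "Month"]).reset_index(drop=True)
print(long_df)
```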


1 Comment

Thanks for answering. Would you mind helping with stackoverflow.com/questions/69023522/selecting-the-dataframe as well?
0

It's pretty dirty but it gets the job done...

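import numpy as np
import pandas as pd

# Repeat each year once per month column (12 months, Jan to Des)
df_1 = pd.DataFrame(np.repeat(df.iloc[:, :1].values, 12, axis=0), columns=["Tahun"])
# Series.append was removed in pandas 2.0; pd.concat does the same job
df_2 = (pd.concat([df.iloc[0][1:], df.iloc[1][1:]])
          .to_frame()
          .reset_index()
          .rename(columns={"level_0": "Month", 0: "Value"}))
final_df = pd.concat([df_1, df_2], axis=1)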
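For comparison, a shorter route to the same shape might be set_index plus stack, which keeps the rows in year-then-month order without hard-coding the number of months. This is only a sketch on a hand-built frame with a flat column index and two months; `long_df` is a hypothetical name, and the values stay as strings as they would after scraping.

```python
import pandas as pd

df = pd.DataFrame(
    [["2020", "0.39", "0.28"], ["2021", "0.26", "0.10"]],
    columns=["Tahun", "Jan", "Feb"],
)

long_df = (
    df.set_index("Tahun")
    .rename_axis(columns="Month")  # name the column axis so stack labels the new level
    .stack()                       # month columns become an inner index level
    .reset_index(name="Value")     # back to ordinary columns: Tahun, Month, Value
)
print(long_df)
```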
