I am scraping data from a website and loading it into pandas data frames. With the following code, I can already get the individual data frames. I want to combine them all into one large data frame using append.

import requests
import re
import pandas as pd
from urllib.parse import unquote
from json import loads
from bs4 import BeautifulSoup

# Download URL
url = "https://riwayat-file-covid-19-dki-jakarta-jakartagis.hub.arcgis.com/"
req = requests.get(url)

# Get encoded JSON from HTML source
encoded_data = re.search(r'window\.__SITE="(.*)"', req.text).groups()[0]

# Decode and load as dictionary
json_data = loads(unquote(encoded_data))

# Get the HTML source code for the links
html_src = json_data["site"]["data"]["values"]["layout"]["sections"][1]["rows"][0]["cards"][0]["component"]["settings"]["markdown"]

# Parse it using BeautifulSoup
soup = BeautifulSoup(html_src, 'html.parser')

# Get links
links = soup.find_all('a')

# For each link...
link_list = []
id_list = []
date_list = []
dataframe_csv = []

for link in links:
    if "2021" in link.text:
        link_list.append(link.text+" - "+link.attrs['href'])

link_list.remove("31 Januari 2021 Pukul 10.00 - https://drive.google.com/file/d/1vd1tToQbx3A420KMDA63aKviLjgGPJMd/view?usp=sharing")

for i in link_list:
    id_list.append(i.split("/")[5])
    date_list.append(i.split("/")[0][:-21])
    
for ID in id_list:
    dataframe_csv.append("https://docs.google.com/spreadsheets/d/"+ID+"/export?format=csv")

I want to combine all of these data frames in a loop. On every iteration, I want to remove the row at index 0 and add a new column called Date. The code is as follows:

date_num = 0
df_total = pd.DataFrame()

for i in dataframe_csv:
    df = pd.read_csv(i)
    df = df.drop(index=df.index[0], axis=0, inplace=True)
    df = df.assign(Date = date_list[date_num])
    
    date_num += 1
    
    df_total.append(df,ignore_index=True)

The problem is, I get an error like this:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-ef67f0a87a8e> in <module>
      5     df = pd.read_csv(i)
      6     df = df.drop(index=df.index[0], axis=0, inplace=True)
----> 7     df = df.assign(Date = date_list[date_num])
      8 
      9     date_num += 1

AttributeError: 'NoneType' object has no attribute 'assign'

1 Answer

With inplace=True, drop modifies the dataframe directly and returns None, so assigning the result back to df sets df to None. Either remove inplace=True:

date_num = 0
df_total = pd.DataFrame()

for i in dataframe_csv:
    df = pd.read_csv(i)
    df = df.drop(index=df.index[0], axis=0)
    df = df.assign(Date = date_list[date_num])
    
    date_num += 1
    
    df_total.append(df,ignore_index=True)

Or keep inplace=True and do not assign the result back:

date_num = 0
df_total = pd.DataFrame()

for i in dataframe_csv:
    df = pd.read_csv(i)
    df.drop(index=df.index[0], axis=0, inplace=True)
    df = df.assign(Date = date_list[date_num])
    
    date_num += 1
    
    df_total.append(df,ignore_index=True)

As mentioned in the documentation of drop:

inplace : bool, default False
     If False, return a copy. Otherwise, do operation inplace and return None.
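
To see the two return values side by side, here is a minimal sketch on a small made-up frame. The column names `a` and `b` and the values are placeholders for illustration, not columns from the Jakarta data:

```python
import pandas as pd

# Toy frame standing in for one downloaded CSV (hypothetical data).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
dropped = df.drop(index=df.index[0])

# With inplace=True, drop mutates the frame it is called on and returns None.
other = df.copy()
result = other.drop(index=other.index[0], inplace=True)

print(type(dropped).__name__, len(dropped), len(df))  # DataFrame 2 3
print(result, len(other))                             # None 2
```

Assigning `result` back to the original name is what produces the `'NoneType' object has no attribute 'assign'` error on the next line of the loop.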

4 Comments

+Docs for reference: pandas.pydata.org/pandas-docs/stable/reference/api/… "inplace : bool, default False. If False, return a copy. Otherwise, do operation inplace and return None." :)
@h4z3 Added it in :)
I tried deleting "inplace = True", and now there is no more error. But, the data frame "df_total" has no value inside it. I tried df_total.shape and it returns (0,0) meaning the dataframe is still empty.
@ChristianEvanBudiawan That is a separate problem elsewhere in your code; this answer solves the error you asked about.
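
On the empty df_total raised in the comments: DataFrame.append also returns a new object rather than modifying df_total, so calling it without keeping the result leaves df_total empty. DataFrame.append was also removed in pandas 2.0. One common pattern is to collect the frames in a list and call pd.concat once at the end. A minimal sketch, assuming small in-memory CSV strings (`csv_sources`) and a two-entry `date_list` as hypothetical stand-ins for the real export URLs and dates:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the downloaded CSV files and their dates.
csv_sources = [
    "name,value\nheader,ignored\nA,1\nB,2\n",
    "name,value\nheader,ignored\nC,3\n",
]
date_list = ["1 Januari 2021", "2 Januari 2021"]

frames = []
for source, date in zip(csv_sources, date_list):
    df = pd.read_csv(io.StringIO(source))
    df = df.drop(index=df.index[0])  # remove the row at index 0
    df = df.assign(Date=date)        # tag every remaining row with its date
    frames.append(df)                # list.append does modify the list in place

# Concatenate once; the result must be assigned to a name.
df_total = pd.concat(frames, ignore_index=True)
print(df_total.shape)  # (3, 3)
```

With the real data, `io.StringIO(source)` would be replaced by each URL in dataframe_csv. Using zip also removes the need for the separate date_num counter.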
