Python: Scraping .json, feeding into pandas DataFrame and output as excel

Question

Good morning/afternoon/evening to everyone reading this.

So as you will see from the snippet below I am not a programmer and I'm really struggling to make this code give me the desired output.
Quick rundown:

Trying to get detailed company information from a stock exchange
I'm calling an API to retrieve the data based on a pandas dataframe that contains the company codes
I then use these codes to call each individual firms webpage and save them to a list
I then tried to use .get with the appropriate keyword such as ('direktur') to save the relevant information to a second list (fdir). That gets fed into another pandas Dataframe and then output as excel using df.to_excel()

Now what I'm hoping to get for the output excel files would look something like this:

[ [1]

However what I'm currently getting as output looks like this:

This is what fdir[0] returns:

  [{'Nama': 'Santosa', 'Jabatan': 'PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'Joko Supriyono', 'Jabatan': 'WAKIL PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'M. Hadi Sugeng Wahyudiono', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Bambang Wijanarko', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Rujito Purnomo', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Handoko Pranoto', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Mario Casimirus Surung Gultom', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}]

So if anyone has an idea of how to get from what I have to the intended output I would be very grateful! Thanks for taking the time, complete code is below:

import requests
import pandas as pd ; import xlsxwriter
import json
import time
# gets broad data of main page of the stock exchange
sxov = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')

data = sxov.json() # save the request as .json file
df = pd.DataFrame.from_dict(data['data']) #creates DataFrame based on the data (.json) file
# removes unecessary columns from df
df.drop(["BAE", "DataID",   "Divisi",   "EfekEmiten_EBA",   "EfekEmiten_ETF",   "EfekEmiten_Obligasi",  "EfekEmiten_SPEI",  "EfekEmiten_Saham",
"Fax",  "JenisEmiten", "KodeDivisi", "Logo",    "NPKP",     "NPWP", "PapanPencatatan", "Status", "TanggalPencatatan", "id"
], axis=1, inplace=True)

cdate = time.strftime ("%Y%m%d") # creating string-variable w/ current date year|month|day
df.to_excel(f"{cdate}StockExchange_Overview.xlsx") # outputs DataFrame as Excel file

list_of_json = []
for nested_json in data['data'] :
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten='+nested_json['KodeEmiten']).json())

#create empty lists for directors
fdir = []

for i in range (len(list_of_json)) :
    fdir.append(list_of_json[i].get('Direktur'))
    i += 1


print (fdir[0])
# create writer object that converts DataFrame to {currentdate}Firm Details.xlsx
writer = pd.ExcelWriter(f'{cdate}Firm Details.xlsx', engine = 'xlsxwriter')

#creating the dataframes
dfdir = pd.DataFrame([fdir])
dfdir = (dfdir.T)

dfdir.to_excel(writer, sheet_name = 'Directors')

writer.save()

B Troy · Accepted Answer · 2019-02-28 02:47:11Z

1

I suspect that json_normalize will help. It'll transfer your json file to a flat table structure.
documentation_link

eg:

from pandas.io.json import json_normalize
with open('example_1.json') as data_file:    
    d = json.load(data_file)
    df = json_normalize(d)

answered Feb 28, 2019 at 2:47

B Troy

1911 silver badge6 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Nick Over a year ago

@B Troy, Gonna accept your answer as it does work for a single entry (fdir[0]), so thank you for that!. However if i feed all items in the list into the .json file it doesn't work anymore because json.normalize(d) only sees the list objects. Any idea on how to circumenvent that without creating tons of individual files?

Collectives™ on Stack Overflow

Python: Scraping .json, feeding into pandas DataFrame and output as excel

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related