Good morning/afternoon/evening to everyone reading this.

As you will see from the snippet below, I am not a programmer, and I'm really struggling to make this code produce the desired output.
Quick rundown:

  • I'm trying to get detailed company information from a stock exchange
  • I'm calling an API to retrieve the data, based on a pandas DataFrame that contains the company codes
  • I then use these codes to request each individual firm's page and save the responses to a list

  • I then tried to use .get with the appropriate key, such as 'Direktur', to save the relevant information to a second list (fdir). That list gets fed into another pandas DataFrame and is then written to Excel using df.to_excel()

What I'm hoping the output Excel file will look like is something like this:

[image: intended output]

However what I'm currently getting as output looks like this:

[image: current output]

This is what fdir[0] returns:

  [{'Nama': 'Santosa', 'Jabatan': 'PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'Joko Supriyono', 'Jabatan': 'WAKIL PRESIDEN DIREKTUR', 'Afiliasi': False}, {'Nama': 'M. Hadi Sugeng Wahyudiono', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Bambang Wijanarko', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Rujito Purnomo', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Handoko Pranoto', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}, {'Nama': 'Mario Casimirus Surung Gultom', 'Jabatan': 'DIREKTUR', 'Afiliasi': False}]

So if anyone has an idea of how to get from what I have to the intended output I would be very grateful! Thanks for taking the time, complete code is below:

import requests
import pandas as pd
import xlsxwriter  # not called directly; it backs engine='xlsxwriter' below
import json
import time
# get the overview data for all listed companies from the stock exchange
sxov = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')

data = sxov.json()  # parse the JSON response body into a Python dict
df = pd.DataFrame.from_dict(data['data'])  # build a DataFrame from the list of company records
# remove unnecessary columns from df
df.drop([
    "BAE", "DataID", "Divisi", "EfekEmiten_EBA", "EfekEmiten_ETF",
    "EfekEmiten_Obligasi", "EfekEmiten_SPEI", "EfekEmiten_Saham",
    "Fax", "JenisEmiten", "KodeDivisi", "Logo", "NPKP", "NPWP",
    "PapanPencatatan", "Status", "TanggalPencatatan", "id",
], axis=1, inplace=True)

cdate = time.strftime("%Y%m%d")  # current date as a string: year, month, day
df.to_excel(f"{cdate}StockExchange_Overview.xlsx")  # write the DataFrame to an Excel file

# request the detail record for each company code and keep the parsed JSON
list_of_json = []
for nested_json in data['data']:
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten=' + nested_json['KodeEmiten']).json())

# collect each company's list of directors (None if the key is missing)
fdir = []

for company in list_of_json:
    fdir.append(company.get('Direktur'))

print(fdir[0])
# create a writer object that saves to '{cdate}Firm Details.xlsx'
writer = pd.ExcelWriter(f'{cdate}Firm Details.xlsx', engine='xlsxwriter')

# create the DataFrame: one row per company, each cell holding that company's whole list
dfdir = pd.DataFrame([fdir])
dfdir = dfdir.T

dfdir.to_excel(writer, sheet_name='Directors')

writer.close()  # writer.save() was removed in pandas 2.0

1 Answer

I suspect that json_normalize will help. It'll transform your JSON data into a flat table structure.
documentation_link

e.g.:

import json
import pandas as pd

with open('example_1.json') as data_file:
    d = json.load(data_file)
df = pd.json_normalize(d)  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
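
If the data is already in memory, as with fdir[0] in the question, writing it to a file first is probably not needed. A minimal sketch, assuming pandas 1.0 or newer and using the first three entries of the fdir[0] output shown in the question (the variable names `directors` and `df_single` are made up for illustration):

```python
import pandas as pd

# Sample copied from the fdir[0] output in the question (first three entries).
directors = [
    {'Nama': 'Santosa', 'Jabatan': 'PRESIDEN DIREKTUR', 'Afiliasi': False},
    {'Nama': 'Joko Supriyono', 'Jabatan': 'WAKIL PRESIDEN DIREKTUR', 'Afiliasi': False},
    {'Nama': 'M. Hadi Sugeng Wahyudiono', 'Jabatan': 'DIREKTUR', 'Afiliasi': False},
]

# Each dict becomes one row; each key becomes one column.
df_single = pd.json_normalize(directors)
print(df_single)
```

For flat dicts like these, `pd.DataFrame(directors)` should give the same table; json_normalize mainly pays off when the dicts contain nested dicts.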
1 Comment

@B Troy, I'm going to accept your answer, as it does work for a single entry (fdir[0]), so thank you for that! However, if I feed all the items in the list into the .json file, it doesn't work anymore, because json_normalize(d) only sees the list objects. Any idea how to circumvent that without creating tons of individual files?
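
One possible way around the problem described in this comment is to flatten the list of lists in plain Python before building the DataFrame, tagging every director with the company code so the rows stay distinguishable. This is only a sketch under assumptions: the codes 'AAA1', 'BBB2' and 'CCC3' and the name 'Example Person' are made-up placeholders, and `codes` stands in for the KodeEmiten values that line up with `fdir` in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data. `codes` plays the role of
# the KodeEmiten values, `fdir` the per-company 'Direktur' lists.
codes = ['AAA1', 'BBB2', 'CCC3']
fdir = [
    [
        {'Nama': 'Santosa', 'Jabatan': 'PRESIDEN DIREKTUR', 'Afiliasi': False},
        {'Nama': 'Joko Supriyono', 'Jabatan': 'WAKIL PRESIDEN DIREKTUR', 'Afiliasi': False},
    ],
    [
        {'Nama': 'Example Person', 'Jabatan': 'DIREKTUR', 'Afiliasi': False},  # made-up entry
    ],
    None,  # .get('Direktur') returns None when the key is missing
]

# Build one flat list of rows: one dict per director, plus the company code.
rows = []
for code, directors in zip(codes, fdir):
    for person in directors or []:  # treat None as an empty list
        rows.append({'KodeEmiten': code, **person})

df_all = pd.DataFrame(rows)
print(df_all)
```

The resulting `df_all` has one row per director, so passing it to `to_excel` with the existing writer should give one director per spreadsheet row instead of a whole list per cell.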