Can I create multiple dataframes in a loop?

I have a long list of web-scraped information that I want to turn into multiple dataframes, but I'm not sure whether this is possible.

Below is my original web scraping code:

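import json
import requests
import pandas as pd

# A list (unlike a set) keeps its order, so result_json[0] is always the Gini indicator.
indicator = ['SI.POV.GINI?date=2000:2020','SL.UEM.TOTL.ZS?date=2000:2020','NE.IMP.GNFS.ZS?date=2000:2020','NE.EXP.GNFS.ZS?date=2000:2020']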

url_list = []
for i in indicator:
    url = "http://api.worldbank.org/v2/countries/all/indicators/%s&format=json&per_page=5000" % i
    url_list.append(url)

result_list = []
for i in url_list:
    response = requests.get(i)
    print(response)
    result_list.append(response.content)

result_json = []
for i in range(len(result_list)):
    result_json.append(json.loads(result_list[i]))

result_json

If not, I've also tried doing it manually, but I'm getting an error:

gini_df = pd.DataFrame.from_dict(result_json[0])
gini_df

AttributeError: 'list' object has no attribute 'keys'

1 Answer

Creating multiple dataframes in a loop is straightforward. You can append your dataframes to a list or store them in a dictionary under specific keys. Here's some example code using a list:

import numpy as np
import pandas as pd

df_list = []
for i in range(10):
    df = pd.DataFrame(np.random.rand(3,3), columns=['a', 'b', 'c'])
    df_list.append(df)

print(df_list[5])

          a         b         c
0  0.361910  0.521254  0.763633
1  0.030419  0.098978  0.929679
2  0.304616  0.563361  0.326490
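
The dictionary variant mentioned above might look roughly like this. It is only a sketch: the key names are hypothetical labels picked for illustration, and the data is random, as in the list example.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, chosen only for illustration.
names = ['gini', 'unemployment', 'imports', 'exports']

df_dict = {}
for name in names:
    # One 3x3 dataframe of random numbers per label.
    df_dict[name] = pd.DataFrame(np.random.rand(3, 3), columns=['a', 'b', 'c'])

print(df_dict['gini'])
```

A dictionary is handy when each dataframe has a natural name, because you can look it up by key instead of remembering its position in a list.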

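For your task, the issue is parsing and flattening a hierarchical data structure into a flat format pandas can understand. Each World Bank response is a two-element list, with the records in the second element, and some fields (such as indicator and country) are nested dictionaries. For example: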

import json
import requests
import pandas as pd

indicator = {'SI.POV.GINI?date=2000:2020'}
url_list = []
for i in indicator:
    url = "http://api.worldbank.org/v2/countries/all/indicators/%s&format=json&per_page=5000" % i
    url_list.append(url)

result_list = []
for i in url_list:
    response = requests.get(i)
    print(response)
    result_list.append(response.content)

result_json = []
for i in range(len(result_list)):
    result_json.append(json.loads(result_list[i]))

columns = ['indicator_id', 'indicator_value','country_id', 'country_value','countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal']
data = {}
for i in columns:
    data[i] = []

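for page in result_json:
    # Each response is [metadata, records]; the records are at index 1.
    for record in page[1]:
        for col in columns:
            if col in record:
                # Top-level field such as 'date' or 'obs_status'.
                data[col].append(record[col])
                continue
            # Nested field: 'indicator_id' means record['indicator']['id'].
            parent, _, child = col.partition('_')
            nested = record.get(parent)
            if isinstance(nested, dict):
                data[col].append(nested.get(child, ''))
            else:
                data[col].append('')
df = pd.DataFrame(data)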
print(df)

Note that my example code only runs on one of your indicators. If you want to loop over every indicator, build the dataframe inside the loop and append each one to a list, as in the first example.
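
As a rough sketch of that per-indicator loop, the snippet below builds one dataframe per indicator and stores it in a dictionary. It swaps the manual flattening for `pd.json_normalize`, which is a different technique from the code above. To keep it runnable offline, it uses small hand-written payloads that only mimic the `[metadata, records]` shape; the country, values and the helper name `records_to_df` are made up for illustration.

```python
import pandas as pd

def records_to_df(payload):
    """Flatten one [metadata, records] payload into a dataframe (hypothetical helper)."""
    records = payload[1] or []  # the records sit in the second element
    # sep='_' turns record['indicator']['id'] into a column named 'indicator_id'.
    return pd.json_normalize(records, sep='_')

def make_record(code, label, value):
    # Made-up sample record; real responses would come from json.loads(response.content).
    return {'indicator': {'id': code, 'value': label},
            'country': {'id': 'XX', 'value': 'Exampleland'},
            'countryiso3code': 'XXX', 'date': '2020', 'value': value,
            'unit': '', 'obs_status': '', 'decimal': 1}

sample_payloads = {
    'SI.POV.GINI': [{'page': 1}, [make_record('SI.POV.GINI', 'Gini index', 30.0)]],
    'SL.UEM.TOTL.ZS': [{'page': 1}, [make_record('SL.UEM.TOTL.ZS', 'Unemployment', 5.0)]],
}

# One dataframe per indicator, keyed by indicator code.
df_by_indicator = {code: records_to_df(payload)
                   for code, payload in sample_payloads.items()}

print(df_by_indicator['SI.POV.GINI'])
```

With real data you would replace `sample_payloads` with the parsed responses, one per indicator, and keep the rest of the loop unchanged.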

4 Comments

Thanks! I've just edited my question. Will your comment still apply?
Your error is unrelated to your question. You are getting an error because the JSON data is not in a format the from_dict method can understand.
Yeah, actually I'm also surprised by the error, because when I ran the code for each indicator it worked fine. The JSON output was converted to a DataFrame even though the format is a list.
I tried gini_df = pd.DataFrame(result_json[0]), but the result was: 'list' object has no attribute 'keys'
