
I'm working on JSON data from this API call: https://api.nfz.gov.pl/app-umw-api/agreements?year=2022&branch=01&productCode=01.0010.094.01&page=1&limit=10&format=json&api-version=1.2

This is page 1, but there are 49 pages in total, so part of my code deals (successfully) with pagination. I don't want to save this JSON to a file and, if I can avoid it, I don't really want to import the 'json' package, but I will if necessary.

A variation of this code works correctly if I pull the entire ['data']['agreements'] dictionary (or is it a list...). But I don't want that; I want the individual parameters among the 'attributes' of each 'agreement'. In my code below I'm trying to pull the 'provider-name' attribute, and I would like to get a list of all the provider names, without any other data.

But I keep getting the "list indices must be integers or slices, not str" error on line 18 of the code (the line "for attributes1 in all_data ['data']['agreements']:"). I've tried many ways to get at this data, which is nested within a list nested within a dictionary, etc., such as splitting it further into another 'for' loop, but with no success.
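For reference, here is a minimal sketch of the nesting involved, using a hand-made dictionary shaped like the API response. The values are made up; only the keys `meta`, `count`, `data`, `agreements`, `attributes` and `provider-name` come from the code in this question. The point it shows is that `response['data']['agreements']` is a list, so it has to be iterated (or indexed with an integer) before `['attributes']` can be used:

```python
# A hand-made stand-in for one page of the API response (values are placeholders).
response = {
    "meta": {"count": 2},
    "data": {
        "agreements": [  # a list of agreement objects, not a dict
            {"id": "a1", "attributes": {"provider-name": "PROVIDER A", "amount": 100}},
            {"id": "a2", "attributes": {"provider-name": "PROVIDER B", "amount": 250}},
        ]
    },
}

agreements = response["data"]["agreements"]
print(type(agreements).__name__)  # list

# Index the list with an integer to reach a single agreement...
first_name = agreements[0]["attributes"]["provider-name"]

# ...or loop over it to collect the attribute from every agreement.
provider_names = [a["attributes"]["provider-name"] for a in agreements]
print(first_name)
print(provider_names)
```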

import requests
import math
import pandas as pd


baseurl = 'https://api.nfz.gov.pl/app-umw-api/agreements?year=2022&branch=01&productCode=01.0010.094.01&page=1&limit=10&format=json&api-version=1.2'

def main_request(baseurl, x):
    r = requests.get(baseurl + f'&page={x}')
    return r.json()

def get_pages(response):
    return math.ceil(response['meta']['count'] / 10)

def get_names(response):
    providerlist = []
    all_data = response['data']['agreements']
    for attributes1 in all_data ['data']['agreements']:
        item = attributes1['attributes']['provider-name']
        providers = {
            'page1': item,
        }

    providerlist.append(providers)
    return providerlist

mainlist = []
data = main_request(baseurl, 1)
for x in range(1,get_pages(data)+1):
    mainlist.extend(get_names(main_request(baseurl, x)))

mydataframe = pd.DataFrame(mainlist)

print(mydataframe)
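A corrected version of the code above might look like the sketch below. It changes three things: `get_names()` iterates over `response['data']['agreements']` directly (indexing that list again with `['data']` is what raises the error), the collected value is kept for every agreement rather than only the last one, and the page number goes through the `params` argument of `requests.get` instead of being appended to a URL that already contains `page=1`. Like the original, it assumes `meta.count` is the total number of agreements across all pages; the names `fetch_page`, `collect_provider_names`, `API_URL`, `BASE_PARAMS` and `LIMIT` are hypothetical.

```python
import math

API_URL = "https://api.nfz.gov.pl/app-umw-api/agreements"
LIMIT = 10  # records per page, matching limit=10 in the original URL

# Query parameters from the original URL, minus 'page' (set per request below).
BASE_PARAMS = {
    "year": 2022,
    "branch": "01",
    "productCode": "01.0010.094.01",
    "limit": LIMIT,
    "format": "json",
    "api-version": "1.2",
}


def fetch_page(page):
    """Download one page of results and return the decoded JSON."""
    import requests  # third-party; imported here so the pure helpers below work without it

    r = requests.get(API_URL, params={**BASE_PARAMS, "page": page}, timeout=30)
    r.raise_for_status()
    return r.json()


def get_pages(response, limit=LIMIT):
    # Assumes meta.count is the total number of agreements across all pages.
    return math.ceil(response["meta"]["count"] / limit)


def get_names(response):
    # response['data']['agreements'] is a list: iterate over it directly.
    return [a["attributes"]["provider-name"] for a in response["data"]["agreements"]]


def collect_provider_names():
    # Reuse page 1 for both the page count and its names (the original fetched it twice).
    first = fetch_page(1)
    names = get_names(first)
    for page in range(2, get_pages(first) + 1):
        names.extend(get_names(fetch_page(page)))
    return names
```

With pandas imported as pd, `pd.DataFrame({"provider-name": collect_provider_names()})` would then give a one-column DataFrame holding only the provider names.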
  • The simple solution is that you need to use integers to index lists. If you index with something other than an integer because you expected something other than a list, you need to figure out why that something is a list and not what you expect it to be. – Commented Jan 14, 2023 at 20:54
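The comment's point can be reproduced in isolation: indexing a list with a string raises exactly the error from the question, while an integer index works. The two-element list below is a made-up stand-in for `response['data']['agreements']`.

```python
agreements = [
    {"attributes": {"provider-name": "A"}},
    {"attributes": {"provider-name": "B"}},
]

try:
    agreements["data"]  # a str index on a list, like all_data['data'] in the question
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # list indices must be integers or slices, not str

# An integer index (or a for loop over the list) is what a list expects.
print(agreements[1]["attributes"]["provider-name"])  # B
```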

1 Answer


To load the data from the JSON into a DataFrame, you can use the following example:

import requests
import pandas as pd


api_url = "https://api.nfz.gov.pl/app-umw-api/agreements?year=2022&branch=01&productCode=01.0010.094.01&page={}&limit=10&format=json&api-version=1.2"

all_data = []
for page in range(1, 5): # <-- increase page numbers here
    data = requests.get(api_url.format(page)).json()

    for a in data["data"]["agreements"]:
        all_data.append({"id": a["id"], **a["attributes"], "link": a["links"]['related']})

df = pd.DataFrame(all_data)
print(df.head().to_markdown(index=False))  # to_markdown() requires the optional 'tabulate' package

Prints:

| id | code | technical-code | origin-code | service-type | service-name | amount | updated-at | provider-code | provider-nip | provider-regon | provider-registry-number | provider-name | provider-place | year | branch | link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 75f1b5a0-34d1-d827-8970-89b6b593be86 | 0113/3202010/01/2022/01 | 0113/3202010/01/2022/01 | 0113/3202010/01/2022/01 | 01 | Podstawowa Opieka Zdrowotna | 14583.7 | 2022-07-11T20:04:39 | 3202010 | 8851039259 | 89019398100026 | 000000001951-W-02 | NZOZ PRAKTYKA LEKARZA RODZINNEGO JAN WOLAŃCZYK | JEDLINA-ZDRÓJ | 2022 | 01 | https://api.nfz.gov.pl/app-umw-api/agreements/75f1b5a0-34d1-d827-8970-89b6b593be86?format=json&api-version=1.2 |
| 1840cf6e-10ba-33a1-81f1-9f58c613d705 | 0113/3302665/01/2022/01 | 0113/3302665/01/2022/01 | 0113/3302665/01/2022/01 | 01 | Podstawowa Opieka Zdrowotna | 1479 | 2022-08-03T20:00:22 | 3302665 | 9281731555 | 390737391 | 000000023969-W-02 | NZOZ "MEDICA" | PĘCŁAW | 2022 | 01 | https://api.nfz.gov.pl/app-umw-api/agreements/1840cf6e-10ba-33a1-81f1-9f58c613d705?format=json&api-version=1.2 |
| 954eb365-e232-fd29-10f7-c8af21c07470 | 0113/3402005/01/2022/01 | 0113/3402005/01/2022/01 | 0113/3402005/01/2022/01 | 01 | Podstawowa Opieka Zdrowotna | 1936 | 2022-09-02T20:01:17 | 3402005 | 6121368883 | 23106871400021 | 000000002014-W-02 | PRZYCHODNIA OGÓLNA TSARAKHOV OLEG | BOLESŁAWIEC | 2022 | 01 | https://api.nfz.gov.pl/app-umw-api/agreements/954eb365-e232-fd29-10f7-c8af21c07470?format=json&api-version=1.2 |
| 7dd72607-ab9f-7217-87b9-8e4ed2bc5537 | 0113/3202025/01/2022/01 | 0113/3202025/01/2022/01 | 0113/3202025/01/2022/01 | 01 | Podstawowa Opieka Zdrowotna | 0 | 2022-04-14T20:01:42 | 3202025 | 8851557014 | 891487450 | 000000002063-W-02 | "PRZYCHODNIA LEKARSKA ZDROWIE BIELAK, PIEC I SZYMANIAK SPÓŁKA PARTNERSKA" | NOWA RUDA | 2022 | 01 | https://api.nfz.gov.pl/app-umw-api/agreements/7dd72607-ab9f-7217-87b9-8e4ed2bc5537?format=json&api-version=1.2 |
| bb60b21d-38da-1f2e-a7fd-5a45453e7370 | 0113/3102115/01/2022/01 | 0113/3102115/01/2022/01 | 0113/3102115/01/2022/01 | 01 | Podstawowa Opieka Zdrowotna | 414 | 2022-10-18T20:01:17 | 3102115 | 8941504470 | 93009444900038 | 000000001154-W-02 | PRAKTYKA LEKARZA RODZINNEGO WALDEMAR CHRYSTOWSKI | WROCŁAW | 2022 | 01 | https://api.nfz.gov.pl/app-umw-api/agreements/bb60b21d-38da-1f2e-a7fd-5a45453e7370?format=json&api-version=1.2 |
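As an alternative to spreading `a["attributes"]` by hand, pandas' `json_normalize` can flatten the nested agreement objects itself: nested keys become dotted column names such as `attributes.provider-name`. This is a sketch on two made-up records shaped like the agreements the answer iterates over (keys `id`, `attributes` and `links.related` as used in the answer's code; the values and example URLs are placeholders).

```python
import pandas as pd

# Two made-up records shaped like data["data"]["agreements"] from the API.
agreements = [
    {"id": "a1", "attributes": {"provider-name": "PROVIDER A", "amount": 100.0},
     "links": {"related": "https://example.invalid/a1"}},
    {"id": "a2", "attributes": {"provider-name": "PROVIDER B", "amount": 250.5},
     "links": {"related": "https://example.invalid/a2"}},
]

# Nested dicts are flattened into columns joined with "." (the default separator).
df = pd.json_normalize(agreements)

# Optional: drop the "attributes." prefix to get column names like the answer's.
df.columns = df.columns.str.replace("attributes.", "", regex=False)
print(df[["id", "provider-name", "amount"]])
```

In the answer's loop, the same call could be applied to each page's `data["data"]["agreements"]` and the resulting frames joined with `pd.concat`.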

3 Comments

Andrej, thank you very much for your help. That's neat code; I'll learn from it. But I'm still stuck on part of my initial question: what if I don't want all the 'attributes' but only a couple of them, e.g. only 'amount' and 'provider-code'? I'm trying things like all_data.append({**a["attributes"]['amount']}) and that's not working...
@MichaelWiz Then construct the DataFrame as I've shown in the answer and filter it for the columns you want. For example, df = df[['id', 'code']] gives you a DataFrame with only those two columns.
Great stuff. I hadn't thought of that. Works beautifully. Thank you!
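The attempt in the first comment fails because `**` can only unpack a mapping, and `a["attributes"]['amount']` is a single number. If you would rather keep only some attributes while building the rows, instead of filtering the DataFrame afterwards as the reply suggests, a dict comprehension over the wanted keys does it. The records below are made-up stand-ins for the API's agreements, and `WANTED` is a hypothetical name.

```python
WANTED = ["amount", "provider-code"]  # attribute keys to keep

# Made-up stand-ins for data["data"]["agreements"].
agreements = [
    {"id": "a1", "attributes": {"amount": 100.0, "provider-code": "111", "provider-name": "A"}},
    {"id": "a2", "attributes": {"amount": 250.5, "provider-code": "222", "provider-name": "B"}},
]

all_data = []
for a in agreements:
    # Unpack a dict built from only the wanted keys, not a single value.
    row = {"id": a["id"], **{k: a["attributes"][k] for k in WANTED}}
    all_data.append(row)

print(all_data)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(all_data)`, exactly as in the answer.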
