I am working with an API to pull back data using Python. My functions work fine, but I am repeating myself over and over, and there is probably a more efficient way to structure this.

Each function first gets the number of results, then calls the API again to bring back exactly that number of records.

First function:

def get_categories():
    headers = {"Authorization": "Bearer " + access_token} # auth plus token
    response = requests.get("https://api.destination.com/categories", headers=headers) # response
    data = json.loads(response.text) # load the json data
    records = str(data['totalResults']) # get number of results for next call
    response = requests.get("https://api.destination.com/categories?$skip=0&$top="+records, headers=headers)
    all_data = json.loads(response.text) # load the json data
    list_of_dict = all_data['resources'] # get rid of all but lists of dictionaries
    df = pd.DataFrame.from_records(list_of_dict) # create dataframe
    df['links'] = df['links'].str[0].str['href'] # just grab the links(key) items
    return df # return the final dataframe

Second function:

def get_groups():
    headers = {"Authorization": "Bearer " + access_token} # auth plus token
    response = requests.get("https://api.destination.com/groups", headers=headers) # response
    data = json.loads(response.text) # load the json data
    records = str(data['totalResults']) # get number of results
    response = requests.get("https://api.destination.com/groups?$skip=0&$top="+records, headers=headers)
    all_data = json.loads(response.text) # load the json data
    list_of_dict = all_data['resources']  # get rid of all but lists of dictionaries
    df = pd.DataFrame.from_records(list_of_dict) # create dataframe
    df['links'] = df['links'].str[0].str['href'] # just grab the links(key) items
    return df # return the final dataframe

There are 3 more functions, such as one for users, that do the same thing. The only difference between them, as you can see, is the URL (https://api.destination.com/categories vs https://api.destination.com/groups), and the number of records returned will be different for each. Is there a way to combine these into one function and choose the endpoint when calling it?

2 Answers

It looks like you already know how to write functions, so just take it one step further and abstract away everything the functions have in common.

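import pandas as pd
import requests

BASE_URL = "https://api.destination.com/{}"

def make_headers():
    # access_token is assumed to be defined elsewhere, as in the question
    return {"Authorization": "Bearer " + access_token}

def make_params(recs):
    # requests builds the query string from this dict, so recs can stay an int
    return {'$skip': 0, '$top': recs}

def make_df(data):
    # keep only the list of record dictionaries
    list_of_dict = data['resources']
    df = pd.DataFrame.from_records(list_of_dict)
    # reduce each 'links' entry to the href of its first link
    df['links'] = df['links'].str[0].str['href']
    return df

def process(endpoint):
    headers = make_headers()
    url = BASE_URL.format(endpoint)
    # first call: find out how many records exist
    resp = requests.get(url, headers=headers)
    data = resp.json()
    records = data['totalResults']

    # second call: request all of them at once
    params = make_params(records)
    resp = requests.get(url, headers=headers, params=params)
    all_data = resp.json()
    return make_df(all_data)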

Then you can call it like the following:

process('groups')
process('categories')

You can break it up further, but you get the idea.
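
As one way to break it up further, here is a minimal sketch that separates the two-step fetch from the HTTP call itself, so the count-then-fetch logic can be exercised without a network connection. The names fetch_resources and get_json are hypothetical and not part of the code above; get_json stands in for whatever wraps requests.get(...).json(), and the fake API exists only for the demonstration.

```python
def fetch_resources(endpoint, get_json):
    """Return every record for `endpoint` using the count-then-fetch pattern.

    `get_json(endpoint, params)` is a hypothetical callable that performs the
    request and returns the decoded JSON body as a dict.
    """
    # First call: no paging parameters, only used to read the total count.
    total = get_json(endpoint, None)["totalResults"]
    # Second call: ask for all `total` records in a single page.
    page = get_json(endpoint, {"$skip": 0, "$top": total})
    return page["resources"]


def fake_get_json(endpoint, params):
    """Stand-in for the real API (assumption: without params it returns
    a short default page plus the total count)."""
    records = [{"id": i, "kind": endpoint} for i in range(5)]
    skip = 0 if params is None else params["$skip"]
    top = 2 if params is None else params["$top"]
    return {"totalResults": len(records), "resources": records[skip:skip + top]}


rows = fetch_resources("groups", fake_get_json)
print(len(rows))  # 5
```

In real use, get_json could be a small function that calls requests.get with the authorization headers and returns resp.json(), and the returned list could be passed to pd.DataFrame.from_records.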

2 Comments

I was getting a TypeError: Can't convert 'int' object to str implicitly, but I just changed it to records = str(data['totalResults']) and it works like a charm. @gold_cy
My most recent edit should get around that, since we build the payload in its own function and then let requests handle all the escaping. Glad it helps you, cheers.
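
The TypeError in the first comment comes from concatenating an int onto a URL string; passing a dict of parameters avoids it because the values are converted while the query string is encoded. A minimal sketch of the difference, using the standard library's urllib.parse.urlencode as a stand-in (an assumption for illustration, since requests performs its own encoding of params; the value 25 is made up):

```python
from urllib.parse import urlencode

total = 25  # hypothetical value of data['totalResults'], an int

# Manual concatenation fails because str + int is not allowed.
try:
    url = "https://api.destination.com/groups?$skip=0&$top=" + total
except TypeError as exc:
    print("concatenation failed:", type(exc).__name__)

# Encoding a dict converts the int for us (and percent-encodes the '$').
query = urlencode({"$skip": 0, "$top": total})
print(query)  # %24skip=0&%24top=25
```

Note that the '$' comes out percent-encoded here; a server that decodes query strings in the usual way should treat %24skip the same as $skip, but that is worth checking against the API in question.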

You can just add a parameter to the function. As an example, take this one:

def get_categories():
    headers = {"Authorization": "Bearer " + access_token} # auth plus token
    response = requests.get("https://api.destination.com/categories", headers=headers) # response
    data = json.loads(response.text) # load the json data
    records = str(data['totalResults']) # get number of results for next call
    response = requests.get("https://api.destination.com/categories?$skip=0&$top="+records, headers=headers)
    all_data = json.loads(response.text) # load the json data
    list_of_dict = all_data['resources'] # get rid of all but lists of dictionaries
    df = pd.DataFrame.from_records(list_of_dict) # create dataframe
    df['links'] = df['links'].str[0].str['href'] # just grab the links(key) items
    return df # return the final dataframe

You can just refactor to:

def get_elements(element):
    if element is None:
        return 'not found'  # early exit when no element is given (returns a string, not a DataFrame)
    headers = {"Authorization": "Bearer " + access_token} # auth plus token
    response = requests.get("https://api.destination.com/{}".format(element), headers=headers) # response
    data = json.loads(response.text) # load the json data
    records = str(data['totalResults']) # get number of results for next call
    response = requests.get("https://api.destination.com/{}?$skip=0&$top={}".format(element,records), headers=headers)
    all_data = json.loads(response.text) # load the json data
    list_of_dict = all_data['resources'] # get rid of all but lists of dictionaries
    df = pd.DataFrame.from_records(list_of_dict) # create dataframe
    df['links'] = df['links'].str[0].str['href'] # just grab the links(key) items
    return df # return the final dataframe

2 Comments

if element == None: return 'not found'? Firstly, never do == None, always do is None. Next, shouldn't that be an exception? But if you're checking for an invalid element, why do you only check for None?
Sorry, I have already corrected that. I wrote the code from memory and did not test it in a terminal.
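
Following the suggestion in the first comment, here is a minimal sketch of validating the element and raising an exception instead of returning a string. The VALID_ELEMENTS set and the build_url helper are hypothetical; the three names in the set are the ones mentioned in the question, and the real API may expose others.

```python
BASE_URL = "https://api.destination.com/{}"

# Assumption: only these endpoints are supported (names taken from the question).
VALID_ELEMENTS = {"categories", "groups", "users"}


def build_url(element):
    """Return the URL for `element`, or raise ValueError if it is not known."""
    if element not in VALID_ELEMENTS:
        raise ValueError("unknown element: {!r}".format(element))
    return BASE_URL.format(element)


print(build_url("groups"))  # https://api.destination.com/groups

try:
    build_url(None)
except ValueError as exc:
    print(exc)  # unknown element: None
```

Raising here means a typo in the element name fails immediately with a clear message, rather than surfacing later as a KeyError on 'totalResults' or as a string where a DataFrame was expected.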
