0

I have the following json that I extracted using request with python and json.loads. The whole json basically repeats itself with changes in the ID and names. It has a lot of information but I`m just posting a small sample as an example:

"status":"OK",
   "statuscode":200,
   "message":"success",
   "apps":[
      {
         "id":"675832210",
         "title":"AGED",
         "desc":"No annoying ads&easy to play",
         "urlImg":"https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
         "urlImgWide":"https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
         "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
         "androidPackage":"com.agedstudio.freecell",
         "revenueType":"cpi",
         "revenueRate":"0.10",
         "categories":"Card",
         "idx":"2",
         "country":[
            "CH"
         ],
         "cityInclude":[
            "ALL"
         ],
         "cityExclude":[
            
         ],
         "targetedOSver":"ALL",
         "targetedDevices":"ALL",
         "bannerId":"675832210",
         "campaignId":"495181210",
         "campaignType":"network",
         "supportedVersion":"",
         "storeRating":"4.3",
         "storeDownloads":"10000+",
         "appSize":"34603008",
         "urlVideo":"",
         "urlVideoHigh":"",
         "urlVideo30Sec":"https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
         "urlVideo30SecHigh":"https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
         "offerId":"5825774"
      },

I dont need all that data, just a few like 'title', 'country', 'revenuerate' and 'urlApp' but I dont know if there is a way to extract only that. My solution so far was to make the json a dataframe and then drop the columns, however, I wanted to find an easier solution.

My ideal final result would be to have a dataframe with selected keys and arrays Does anybody know an easy solution for this problem?

Thanks

1
  • id = your_json['apps'][0]['id'] where your_json is your parsed json using json.loads, and then you access to attributes using the square brackets (note the [0] index since 'apps' is an array) Commented May 28, 2021 at 21:35

3 Answers 3

1

I assume you have that data as a dictionary, let's call it json_data. You can just iterate over the apps and write them into a list. Alternatively, you could obviously also define a class and initialize objects of that class.

EDIT: I just found this answer: https://stackoverflow.com/a/20638258/6180150, which tells how you can convert a list of dicts like from my sample code into a dataframe. See below adaptions to the code for a solution.

json_data = {
    "status": "OK",
    "statuscode": 200,
    "message": "success",
    "apps": [
        {
            "id": "675832210",
            "title": "AGED",
            "desc": "No annoying ads&easy to play",
            "urlImg": "https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
            "urlImgWide": "https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
            "urlApp": "https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
            "androidPackage": "com.agedstudio.freecell",
            "revenueType": "cpi",
            "revenueRate": "0.10",
            "categories": "Card",
            "idx": "2",
            "country": [
                "CH"
            ],
            "cityInclude": [
                "ALL"
            ],
            "cityExclude": [

            ],
            "targetedOSver": "ALL",
            "targetedDevices": "ALL",
            "bannerId": "675832210",
            "campaignId": "495181210",
            "campaignType": "network",
            "supportedVersion": "",
            "storeRating": "4.3",
            "storeDownloads": "10000+",
            "appSize": "34603008",
            "urlVideo": "",
            "urlVideoHigh": "",
            "urlVideo30Sec": "https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
            "urlVideo30SecHigh": "https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
            "offerId": "5825774"
        },
    ]
}

filtered_data = []
for app in json_data["apps"]:
    app_data = {
        "id": app["id"],
        "title": app["title"],
        "country": app["country"],
        "revenueRate": app["revenueRate"],
        "urlApp": app["urlApp"],
    }
    filtered_data.append(app_data)

print(filtered_data)

# Output
d = [
    {
        'id': '675832210',
        'title': 'AGED',
        'country': ['CH'],
        'revenueRate': '0.10',
        'urlApp': 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='
    }
]

d = pd.DataFrame(filtered_data)
print(d)

# Output
          id title country revenueRate urlApp
0  675832210  AGED    [CH]        0.10 https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=
Sign up to request clarification or add additional context in comments.

4 Comments

thanks for your reply, that is very helpful but how can I put the result in a dataframe? Cause the result is no longer a json, is a text
Sorry I'm not familiar with dataframes. Maybe there is a way to transform lists to dataframes, then you can just forward the list. I hope you find a good solution, maybe my idea helps :)
@MarianaRodrigues I've just updated my answer with a way to transform the list of dicts into a dataframe after filtering out the desired fields.
Awesome, thank you so much. In the case where the json is nested, like this: "payouts":{ "total":2, "entries":[ { "network_offer_payout_revenue_id":3217, "custom_payout_overwrite":false, "network_custom_payout_revenue_setting_id":0} How that for app... would work in this case?
0

if your endgame is dataframe, just load the dataframe and take the columns you want:

setting the json to data

df = pd.json_normalize(data['apps'])

yields

          id title                          desc                                             urlImg  ... urlVideoHigh                                      urlVideo30Sec                                  urlVideo30SecHigh  offerId
0  675832210  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...  ...               https://cdn.test.com/banner/video/video-675832...  https://cdn.test.com/banner/video/video-675832...  5825774

[1 rows x 28 columns]

then if you want certain columns:

df_final = df[['title', 'desc', 'urlImg']]

  title                          desc                                             urlImg
0  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...

2 Comments

Yes, that is what Im doing already, but Im looking for a solution to do before creating the dataframe. I have several Json structures similar to this one when I have to join all together so if I have a way to select only what is necessary before creating a df it would be easier
why not collect all the jsons into a list, then read that into dataframe? or create individual dataframes and concatenate them? It's not necessarily easier to what you are asking. the answer by schilli is the approach your asking for but you still aren't getting what you want. you are going to have to do some sort of transformations or collections of the individual json structures.
0

use a dictionary comprehension to extract a dictionary of key/value pairs you want

import json

json_string="""{
     "id":"675832210",
     "title":"AGED",
     "desc":"No annoying ads&easy to play",
     "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
     "revenueRate":"0.10",
     "categories":"Card",
     "idx":"2",
     "country":[
        "CH"
     ],
     "cityInclude":[
        "ALL"
     ],
     "cityExclude":[
        
     ]
  }"""

 json_dict = json.loads(json_string)

 filter_fields=['title','country','revenueRate','urlApp']    

 dict_result = { key: json_dict[key]  for key in json_dict if key in filter_fields}

 json_elements = []
 for key in dict_result:
     json_elements.append((key,json_dict[key]))

 print(json_elements)

output:

 [('title', 'AGED'), ('urlApp', 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='), ('revenueRate', '0.10'), ('country', ['CH'])]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.