Pull 2 columns from a JSON file into a pandas DataFrame

Question

    {"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 7708, "username": "im_theantitrump"}
    {"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 25772, "username": "bishyoucray2"}

My JSON file looks like the sample above. How can I create a pandas DataFrame with the "date" and "replies_count" columns, without duplicates and in ascending date order? My current code drops one of the header names and mixes up the date order: df['date'].value_counts()

What does your expected output look like for these two entries?
– Henry Ecker ♦, Commented Jul 1, 2021 at 21:18

Henry Ecker · Accepted Answer · 2021-07-01 21:05:43Z

Use pd.read_json with lines=True then select the desired columns:

import pandas as pd

df = pd.read_json('test.json', lines=True)[['date', 'replies_count']]

df:

        date  replies_count
0 2016-12-30           7708
1 2016-12-30          25772

test.json:

 {"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 7708, "username": "im_theantitrump"}
 {"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", "timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}
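As a side note on the snippet in the question: `df['date'].value_counts()` returns a Series that counts how often each date occurs, ordered by frequency rather than by date, which may be why one header disappears and the dates look shuffled. A minimal sketch, assuming the two sample records sit in an in-memory string (the `raw` variable is only for illustration and stands in for test.json):

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the two sample lines (stands in for test.json)
raw = (
    '{"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", '
    '"timezone": "-0400", "replies_count": 7708, "username": "im_theantitrump"}\n'
    '{"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", '
    '"timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}\n'
)

df = pd.read_json(io.StringIO(raw), lines=True)[['date', 'replies_count']]

# value_counts() counts rows per date: the result is a Series (one value column),
# sorted by count in descending order, not by date
counts = df['date'].value_counts()

# Sorting the index afterwards puts the dates in ascending order
counts_by_date = counts.sort_index()
```

Here `counts` holds the number of rows per date, not the `replies_count` values, so it is probably not what the question is after.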


3 Comments

Corralien Over a year ago

I had just learned json_normalize, so I forgot about read_json :-) lol +1 (and gg for your 15k)

Henry Ecker Over a year ago

Thank you... I'm unsure if this is correct since there's the "without duplicates and in ascending date order?" part that this answer doesn't address, but I don't know. Thank you for that as well (likewise congrats on your 5, 6, and 7k)!

Corralien Over a year ago

Maybe you can add .groupby('date').sum().sort_index(ascending=True)
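The suggestion in the comment above can be chained onto the `read_json` call from the answer. A minimal sketch, assuming "without duplicates" means one row per date with `replies_count` summed (the question does not confirm this), and reading the sample lines from an in-memory string rather than a file (the `raw` variable is only for illustration):

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the two sample lines (stands in for test.json)
raw = (
    '{"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", '
    '"timezone": "-0400", "replies_count": 7708, "username": "im_theantitrump"}\n'
    '{"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", '
    '"timezone": "-0400", "replies_count": 25772, "username": "bishyoucray2"}\n'
)

result = (
    pd.read_json(io.StringIO(raw), lines=True)[['date', 'replies_count']]
    .groupby('date')             # one group per distinct date
    .sum()                       # add up replies_count within each date
    .sort_index(ascending=True)  # dates are now the index; sort them ascending
)
```

Chaining `.reset_index()` at the end turns `date` back into a regular column if that is preferred.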

Corralien · Accepted Answer · 2021-07-01 21:15:10Z

Use json_normalize:

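>>> import json
>>> import pandas as pd
# records = json.load(open('data.json'))  # works only if the file holds a single JSON array
# For a file with one JSON object per line, as in the question:
# records = [json.loads(line) for line in open('data.json')]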
>>> records
[
  {"id": 814984317021495298, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 7708, "username": "im_theantitrump"},
  {"id": 814984316195311616, "date": "2016-12-30", "time": "18:59:37", "timezone": 
    "-0400", "replies_count": 25772, "username": "bishyoucray2"}
]


# Simple extraction of the 2 columns
>>> pd.json_normalize(records)[['date', 'replies_count']]

         date  replies_count
0  2016-12-30           7708
1  2016-12-30          25772


# Without duplicate dates (replies summed per date), sorted by date in ascending order
>>> pd.json_normalize(records)[['date', 'replies_count']] \
      .groupby('date').sum().sort_index(ascending=True)

            replies_count
date
2016-12-30          33480
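If "without duplicates" instead means keeping a single row per date rather than summing, `drop_duplicates` followed by `sort_values` is one alternative. This is a sketch under that assumption, not something the question confirms; the third record below is made up purely to make the sort visible:

```python
import pandas as pd

records = [
    {"date": "2016-12-30", "replies_count": 7708},
    {"date": "2016-12-30", "replies_count": 25772},
    {"date": "2016-12-29", "replies_count": 10},  # hypothetical extra row, not from the question
]

df = pd.json_normalize(records)[['date', 'replies_count']]

# json_normalize leaves the dates as strings; convert them so sorting is chronological
df['date'] = pd.to_datetime(df['date'])

deduped = (
    df.drop_duplicates(subset='date', keep='first')  # keep the first row seen for each date
      .sort_values('date')                           # ascending by default
      .reset_index(drop=True)
)
```

With `keep='first'` the row that survives for a repeated date depends on the input order, so the `groupby` version is likely the safer choice when every reply count should be reflected in the result.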
