
There are lots of questions about JSON-to-CSV conversion in Python, but unfortunately none of them solved my problem.

I have some simple JSON data in a file. The raw data is on a single line; I've reformatted it here to make it easier to read:

{
    "t_id":"80740185.1558980000000.120184.121164",
    "s_id":"80740185",
    "pt_slot":"null:null",
    "ch_id":1,"o_id":121164,"c_id":120184,
    "msg_type":1,
    "amd":"{
                \"msg\":\" some Bengali text\",
                \"mask\":\"1GB_OFFER\",
                \"ec\":\"1\",
                \"time-out\":\"0\",
                \"validity\":\"30052019 000000\"
           }",
    "time":1558960217731,
    "dlr":"1",
    "msisdn":"xxxxx",
    "entity":1
}

**After loading it with json.load, the data looks like this:**

{
    u't_id': u'80740185.1558980000000.120184.121164', 
    u'c_id': 120184, 
    u'msg_type': 1, 
    u'dlr': u'1', 
    u'msisdn': u'xxxxxxxx', 
    u'amd': u'{
                "msg":" \u0986\u099c \u09b0\u09be\u09a4 \u09e7\u09e8\u099f\u09be\u09b0 \u09ae\u09a7\u09cd\u09af\u09c7 *21291*609# \u09a1\u09be\u09df\u09be\u09b2\u09c7 \u0995\u09bf\u09a8\u09c1\u09a8 \u09e7\u099c\u09bf\u09ac\u09bf \u09ef\u099f\u09be\u0995\u09be\u09a4\u09c7 (\u09e9\u09a6\u09bf\u09a8)",
                "mask":"1GB_OFFER",
                "ec":"1",
                "time-out":"0",
                "validity":"30052019 000000"
               }', 
    u'entity': 1, 
    u's_id': u'80740185', 
    u'ch_id': 1, 
    u'time': 1558960217731, 
    u'pt_slot': u'null:null', 
    u'o_id': 121164
}

I'm trying to convert the JSON data above to CSV, but I get the error below.

Here is my code:

#!/usr/bin/python

import json
import csv

def write_sms_dat_to_csv_file():
    f = csv.writer(open('csv_data.txt','wb+'),delimiter = '|')
    with open('test.dat') as fh:
            data = json.load(fh)

    for dt in data:
            f.writerow([dt['c_id'],dt['msisdn'],dt["amd"]["mask"]])

if __name__=="__main__":
    write_sms_dat_to_csv_file()

Error message:

Traceback (most recent call last):
  File "./sms_data_read.py", line 16, in <module>
    write_sms_dat_to_csv_file()
  File "./sms_data_read.py", line 13, in write_sms_dat_to_csv_file
    f.writerow([dt['c_id'],dt['msisdn'],dt['amd']['mask']])
TypeError: string indices must be integers

Replacing the for loop with the statement below gives the same error:

f.writerow([data['c_id'],data['msisdn'],data['amd']["mask"]])
  • Is your test.dat a single JSON record, i.e. a single dictionary? If so, the issue is that for dt in data iterates over the keys in that dict, and strings (the keys) can only be indexed by ints. Commented May 29, 2019 at 18:59
  • The JSON encoding isn't correct. amd has ended up as a string. Commented May 29, 2019 at 19:00
  • amd is holding a string, and you're trying to index that long string by ["mask"]. Commented May 29, 2019 at 19:00
  • I don't have 2.7 handy, but does import ast and then f.writerow([dt['c_id'],dt['msisdn'],ast.literal_eval(dt['amd'])['mask']]) work? Commented May 29, 2019 at 19:01
  • Why are you doing for dt in data:? Just do data["c_id"] directly. Commented May 29, 2019 at 19:02

3 Answers

1

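It looks like the problem is that the value under the key 'amd', which you're trying to access as a dictionary, is actually a string. You can convert it to a real dictionary with ast.literal_eval:

import ast

# dt['amd'] holds a string such as '{"msg": "...", "mask": "1GB_OFFER", ...}'
sub_dict = ast.literal_eval(dt['amd'])
mask = sub_dict['mask']  # '1GB_OFFER'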
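ast.literal_eval works here because this particular amd string contains only string values, so it is also a valid Python dict literal. Since the string is really JSON, json.loads is the more general choice: JSON keywords such as true, false and null are not Python literals. A small sketch comparing the two, assuming a shortened amd value and a made-up second string with a boolean field (both for illustration only):

```python
import ast
import json

# Shortened stand-in for the real dt['amd'] value (assumed for illustration)
amd_string = '{"msg": " some Bengali text", "mask": "1GB_OFFER", "ec": "1", "time-out": "0", "validity": "30052019 000000"}'

via_ast = ast.literal_eval(amd_string)  # works: only strings inside
via_json = json.loads(amd_string)       # works: it is valid JSON
print(via_ast["mask"], via_json["mask"])  # 1GB_OFFER 1GB_OFFER

# Hypothetical string with a JSON boolean: json.loads accepts it,
# but ast.literal_eval rejects 'true' because it is not a Python literal
other = '{"mask": "1GB_OFFER", "active": true}'
print(json.loads(other))
try:
    ast.literal_eval(other)
    literal_eval_ok = True
except ValueError as exc:
    literal_eval_ok = False
    print("ast.literal_eval failed:", exc)
```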

0

The problem is with the loop. json.load returns a dictionary, and iterating over a dictionary with a for...in loop iterates over its keys. You are treating dt as a dictionary in the body of the loop, but it is actually a string: one of the keys of data.

There is a second problem. In the raw JSON, the value of amd is itself a quoted string that happens to contain JSON, so json.load leaves data["amd"] as a string instead of parsing it into a dictionary. You can get around this by parsing that string separately with json.loads (json.load expects a file object, not a string). Putting both of these things together, you should be able to replace the loop with

amd = json.loads(data["amd"])
f.writerow([data['c_id'],data['msisdn'],amd["mask"]])

to get the result you're looking for.
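Both problems can be reproduced in isolation. A minimal sketch, assuming test.dat holds a single JSON object; the record below is shortened and built inline rather than read from a file:

```python
import json

# Shortened stand-in for the single record in test.dat; "amd" is a JSON string
raw = '{"t_id": "80740185.1558980000000.120184.121164", "c_id": 120184, "msisdn": "xxxxx", "amd": "{\\"mask\\": \\"1GB_OFFER\\"}"}'
data = json.loads(raw)

# Iterating over a dict yields its keys, so each dt is a str such as "t_id"
keys_seen = [dt for dt in data]
print(keys_seen)  # ['t_id', 'c_id', 'msisdn', 'amd']

# Indexing a str with a str is what raises the original TypeError
try:
    keys_seen[0]['c_id']
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# data["amd"] is still a str after json.loads on the outer document
print(type(data["amd"]).__name__)  # str

# Correct handling: index the dict directly and decode the nested "amd" string
amd = json.loads(data["amd"])
row = [data["c_id"], data["msisdn"], amd["mask"]]
print(row)  # [120184, 'xxxxx', '1GB_OFFER']
```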

3 Comments

I've tried this... it still doesn't work. I've edited the question; kindly check.
@Leon I gave you a suggestion in the comments that you haven't replied to
@roganjosh .. trying... give me 10 minutes.
0

Something is a bit odd about your source JSON encoding (the amd value is a JSON object stored inside a string), but if the structure is consistent with what you've provided, you just need to parse the value in dt['amd'] with json.loads as well:

$ python
Python 3.7.2 (default, Dec 27 2018, 07:35:06) 
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json_string = '''
... [
...     {
...         "t_id": "80740185.1558980000000.120184.121164",
...         "s_id": "80740185",
...         "pt_slot": "null:null",
...         "ch_id": 1,
...         "o_id": 121164,
...         "c_id": 120184,
...         "msg_type": 1,
...         "amd": "{\\"msg\\": \\" some Bengali text\\", \\"mask\\": \\"1GB_OFFER\\", \\"ec\\": \\"1\\", \\"time-out\\": \\"0\\", \\"validity\\": \\"30052019 000000\\"}",
...         "time": 1558960217731,
...         "dlr": "1",
...         "msisdn": "xxxxx",
...         "entity": 1
...     }
... ]
... '''
>>> json_data = json.loads(json_string)
>>> for row in json_data:
...     row['amd'] = json.loads(row['amd'])
...     # Write row to CSV
... 
>>> from pprint import pprint
>>> pprint(json_data)
[{'amd': {'ec': '1',
          'mask': '1GB_OFFER',
          'msg': ' some Bengali text',
          'time-out': '0',
          'validity': '30052019 000000'},
  'c_id': 120184,
  'ch_id': 1,
  'dlr': '1',
  'entity': 1,
  'msg_type': 1,
  'msisdn': 'xxxxx',
  'o_id': 121164,
  'pt_slot': 'null:null',
  's_id': '80740185',
  't_id': '80740185.1558980000000.120184.121164',
  'time': 1558960217731}]

Edited to provide a full working example.
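The REPL loop above leaves the CSV writing as a comment. A minimal sketch of that step, assuming Python 3, the pipe delimiter from the question's code, and the three columns it selects. records_to_csv is a hypothetical helper name; it accepts either a single object or a list of objects, since it is not clear which one test.dat actually contains:

```python
import csv
import io
import json


def records_to_csv(json_text, delimiter="|"):
    """Hypothetical helper: turn the question's JSON into delimited CSV text."""
    data = json.loads(json_text)
    # Accept either a single record (dict) or a list of records
    records = [data] if isinstance(data, dict) else data

    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter)
    for row in records:
        amd = row["amd"]
        # "amd" arrives as a JSON-encoded string; decode it only if needed
        if isinstance(amd, str):
            amd = json.loads(amd)
        writer.writerow([row["c_id"], row["msisdn"], amd["mask"]])
    return out.getvalue()


sample = '[{"c_id": 120184, "msisdn": "xxxxx", "amd": "{\\"mask\\": \\"1GB_OFFER\\"}"}]'
print(records_to_csv(sample), end="")  # 120184|xxxxx|1GB_OFFER
```

To write to a file instead of a string buffer in Python 3, open it with open('csv_data.txt', 'w', newline='') and pass that handle to csv.writer; the question's 'wb+' mode is the Python 2 idiom.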

5 Comments

Loading inside the for loop doesn't work either... the same error pops up.
Could you provide the value of your dt variable at the time of the error (i.e., provide the offending row)?
This would not work because dt takes on the keys of the dictionary data.
Yes, the for loop won't work. When the error occurs, dt holds only "t_id".
Updated to provide a better description of what I was suggesting. I think the actual data is in a list (per "this is only one row"), so the iteration is over a list and not over a dict.
