I'm trying to build a python script that imports json files into a MongoDB. This part of my script keeps jumping to the except ValueError for larger json files. I think it has something to do with parsing the json file line by line because very small json files seem to work.

import json
import pymongo
from pymongo import MongoClient

def read(jsonFiles):
    client = MongoClient('mongodb://localhost:27017/')
    db = client[args.db]

    counter = 0
    for jsonFile in jsonFiles:
        with open(jsonFile, 'r') as f:
            for line in f:
                # load valid lines (should probably use rstrip)
                if len(line) < 10: continue
                try:
                    db[args.collection].insert(json.loads(line))
                    counter += 1
                except pymongo.errors.DuplicateKeyError as dke:
                    if args.verbose:
                        print "Duplicate Key Error: ", dke
                except ValueError as e:
                    if args.verbose:
                        print "Value Error: ", e

                # friendly log message
                if 0 == counter % 100 and 0 != counter and args.verbose: print "loaded line:", counter
                if counter >= args.max:
                    break

I'm getting the following error message:

Value Error:  Extra data: line 1 column 10 - line 2 column 1 (char 9 - 20)
Value Error:  Extra data: line 1 column 8 - line 2 column 1 (char 7 - 18)
    The file is probably not in a valid json format. Commented Jul 20, 2016 at 6:57

2 Answers

Look at this example:

import json

s = """{ "data": { "one":1 } },{ "1": { "two":2 } }"""
json.loads(s)

It will produce the "Extra data" error like in your json file:

ValueError: Extra data: line 1 column 24 - line 1 column 45 (char 23 - 44)

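This is because the string is not a single valid JSON document. It contains two independent objects (Python dicts) separated by a comma, so the parser reads the first one and reports everything after it as extra data. Perhaps this helps you find the error in your JSON file.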
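To make the point concrete, here is a minimal sketch that reproduces the error and shows one way to turn the same text into valid JSON. Wrapping the text in square brackets is only an illustration, assuming the objects really are separated by commas; the names `error` and `docs` are mine, not from the question.

```python
import json

# Two top-level objects separated by a comma: not a single JSON document.
s = '{ "data": { "one":1 } },{ "1": { "two":2 } }'

try:
    json.loads(s)
    error = None
except ValueError as e:  # on Python 3, json.JSONDecodeError subclasses ValueError
    error = str(e)

# Wrapping the same text in [ ] makes it one valid JSON array of two objects.
docs = json.loads("[" + s + "]")
```

The exact wording of the message differs between Python versions, but it starts with "Extra data" in both cases.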

You can find more information in this post.

1 Comment

OK, so it looks like I need to define multiple dicts (my JSON file is pretty large and has five levels of indentation at some points), dump the dicts, wrap them in a list, and dump the list. How will this look in my code?
Figured it out. Looks like breaking it up into lines was the mistake. Here's what the final code looks like.

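counter = 0
for jsonFile in jsonFiles:
    with open(jsonFile) as f:
        # parse the whole file as one JSON document instead of line by line
        data = f.read()
        jsondata = json.loads(data)
        try:
            db[args.collection].insert(jsondata)
            counter += 1
        except pymongo.errors.DuplicateKeyError as dke:
            if args.verbose:
                print "Duplicate Key Error: ", dke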
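If some files hold one pretty-printed document and others hold one document per line, a small helper can try the whole file first and fall back to line-by-line parsing. This is only a sketch under that assumption; `parse_documents` is a hypothetical name, and it deliberately leaves out the MongoDB calls so it can be tried on plain strings.

```python
import json


def parse_documents(text):
    """Return a list of documents parsed from the text of one JSON file.

    First try the whole text as a single JSON value. If that fails,
    assume one JSON document per line and skip blank lines.
    """
    try:
        data = json.loads(text)
    except ValueError:
        # Fallback: one document per line. A malformed line still raises.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level array is treated as many documents, anything else as one.
    return data if isinstance(data, list) else [data]
```

Each returned document could then be passed to the collection one at a time. Note that in PyMongo 3 and later `insert` is deprecated in favour of `insert_one` and `insert_many`.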
