
I initially dumped a file which contained a particular sentence using:

with open(labelFile, "wb") as out:
    json.dump(result, out, indent=4)

This sentence within the JSON looks like:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .", 

I then proceeded to load this in via:

with open(sys.argv[1]) as sentenceFile:
    sentenceFile = json.loads(sentenceFile.read())

I then process it and write it out to a CSV using:

with open(sys.argv[2], 'wb') as csvfile:
    fieldnames = ['x', 'y', 'z']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for sentence in sentence2locations2values:
        # Python 2: the csv module works on byte strings, so encode as UTF-8
        sentence = unicode(sentence['parsedSentence']).encode("utf-8")
        writer.writerow({'x': sentence})

When the CSV file was opened in Excel for Mac, the sentence looked like this:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

I then moved this from Excel for Mac to Google Sheets, where it appears as:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

Note, very slightly different, the  has replaced the Ã.

I then labelled it and brought it back into Excel for Mac, at which point it reverted to:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

How do I read in the CSV, which contains a sentence like:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .

to a value which is:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating 45,000 per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in Hong Kong are granted a `` one way permit '' .", 

so that it matches what was in the original JSON dump shown at the start of this question?

EDIT

I checked and see that the encoding that maps \u00c3 to Ã, the form shown in Google Sheets, is actually Latin 8.
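A minimal sketch of why the same cell can look different in two programs: the bytes on disk stay the same, but each program may assume a different encoding when displaying them. The choice of `latin-1` and `mac_roman` here is an assumption for illustration; it is not confirmed which encodings Excel for Mac or Google Sheets actually applied.

```python
# Sketch: one UTF-8 byte sequence, displayed under different assumed encodings.
text = u"\u00c3 cents \u00c2 $"          # fragment of the sentence from the JSON
raw = text.encode("utf-8")               # the bytes written to the CSV

as_utf8 = raw.decode("utf-8")            # the intended reading
as_latin1 = raw.decode("latin-1")        # a reader that assumes Latin-1
as_mac_roman = raw.decode("mac_roman")   # a reader that assumes Mac Roman

print(repr(as_utf8))
print(repr(as_latin1))
print(repr(as_mac_roman))
```

Only the UTF-8 reading gives back the original text; the other two produce different mojibake from identical bytes.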

EDIT

I ran enca and see that the original dumped file contains only 7-bit ASCII characters, while my CSV is in Unicode. So do I need to load it as Unicode and convert to 7-bit ASCII?
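The ASCII-only result is expected: `json.dump` and `json.dumps` default to `ensure_ascii=True`, which writes every non-ASCII character as a `\uXXXX` escape. A small sketch, using a shortened stand-in for the real sentence, shows that the file is ASCII on disk yet loads back as the same non-ASCII text, so no conversion to ASCII should be needed on the CSV side:

```python
import json

# Shortened stand-in for the sentence in the question
sentence = u"population growth \u00c3 cents \u00c2 $"

dumped = json.dumps([sentence], indent=4)   # ensure_ascii=True by default

# Every character in the dumped text is plain ASCII...
is_ascii = all(ord(ch) < 128 for ch in dumped)

# ...yet loading it restores the original non-ASCII characters.
restored = json.loads(dumped)[0]

print(is_ascii, restored == sentence)
```

The comparison that matters is therefore between the decoded CSV text and the value returned by `json.load`, not between the raw bytes of the two files.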

  • Reading it as a normal file instead of using the CSV classes should do the trick. (Commented Aug 30, 2016 at 14:04)
  • Can you post a solution or example? (Commented Aug 30, 2016 at 14:05)

1 Answer


I figured out the solution to this: decode each field read from the CSV file using the file's encoding (identified as UTF-8), after which the sentence matches the original one. So:

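import csv
import sys

fieldnames = ("x", "y", "z")

# Use a context manager so the file is closed when reading is done
with open(sys.argv[1], 'rb') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames)
    next(reader)  # skip the header row

    for i, row in enumerate(reader):
        # Python 2: csv yields byte strings, so decode them to unicode
        row['x'] = row['x'].decode("utf-8")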
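The code in this answer is Python 2. As a sketch only, the same idea in Python 3 would pass the encoding to `open` instead of decoding each field by hand. The helper name `read_sentences` and the `utf-8-sig` default (which also tolerates a leading byte order mark) are assumptions for illustration, and the round trip below uses a shortened stand-in sentence.

```python
import csv
import os
import tempfile


def read_sentences(path, encoding="utf-8-sig"):
    """Return the 'x' column of a CSV file as decoded text (hypothetical helper)."""
    fieldnames = ("x", "y", "z")
    with open(path, newline="", encoding=encoding) as csvfile:
        reader = csv.DictReader(csvfile, fieldnames)
        next(reader)  # skip the header row
        return [row["x"] for row in reader]


# Round trip: write a sentence as UTF-8, read it back, and compare.
original = u"population growth \u00c3 cents \u00c2 $"
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["x", "y", "z"])
        writer.writeheader()
        writer.writerow({"x": original})
    loaded = read_sentences(path)
finally:
    os.remove(path)

print(loaded[0] == original)
```

If the value read back equals the value loaded from the JSON file, the two sources can be matched directly.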

One very strange thing: every time I edited the CSV file in Excel for Mac and saved it, Excel seemed to convert it to a different encoding. I warn other users about this, as it is a huge headache.
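One way to cope with a file whose encoding may have changed between saves is to try strict encodings first and fall back to a permissive one. This is a sketch under assumptions: the function name `decode_with_fallback` and the candidate list are hypothetical, and Mac Roman is only a guess at what Excel for Mac might write.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "mac_roman")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns a (text, encoding_name) pair. Strict encodings go first;
    a permissive single-byte encoding is kept as the last resort.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 is accepted by the first candidate.
good_text, good_enc = decode_with_fallback(u"\u00c3".encode("utf-8"))

# A lone 0xC3 byte followed by a space is not valid UTF-8, so the fallback is used.
bad_text, bad_enc = decode_with_fallback(b"\xc3 cents")

print(good_enc, bad_enc)
```

Recording which encoding was used makes it easier to spot rows that went through a lossy save.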
