I initially dumped a file which contained a particular sentence using:
with open(labelFile, "wb") as out:
    json.dump(result, out, indent=4)
This sentence within the JSON looks like:
"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .",
I then proceeded to load this in via:
with open(sys.argv[1]) as sentenceFile:
    sentenceFile = json.loads(sentenceFile.read())
processed it, and then wrote it out to a CSV using:
with open(sys.argv[2], 'wb') as csvfile:
    fieldnames = ['x', 'y', 'z']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for sentence in sentence2locations2values:
        sentence = unicode(sentence['parsedSentence']).encode("utf-8")
        writer.writerow({'x': sentence})
When the CSV file is opened in Excel for Mac, the sentence looks like this:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
I then moved this from Excel for Mac to Google Sheets, where it appears as:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
Note, very slightly different, the  has replaced the Ã.
I then labelled it and brought it back into Excel for Mac, at which point it reverted to:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
How do I read in the CSV, which contains a sentence like:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
to a value which is:
"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating 45,000 per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in Hong Kong are granted a `` one way permit '' .",
so that it matches what was in the original JSON dump at the start of this question?
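One way to approach this, as a sketch rather than a definitive fix: decode the CSV as UTF-8 when reading it, since that is how the writer above encoded it, and compare the decoded text with what json.loads returns. The code assumes Python 3 (the snippets above are Python 2). The file name labelled.csv, the helper read_column, and the shortened sample sentence are made up for illustration. If the spreadsheet round trip re-saved the file in another encoding, the encoding argument would have to change accordingly.

```python
import csv
import json
import os
import tempfile

def read_column(path, column, encoding="utf-8"):
    """Return the values of one CSV column as decoded text (hypothetical helper)."""
    # newline="" is what the csv docs recommend when opening files in Python 3.
    with open(path, newline="", encoding=encoding) as csvfile:
        return [row[column] for row in csv.DictReader(csvfile)]

# Shortened stand-in for the sentence as it appears in the JSON dump.
json_text = '"growth \\u00c3 cents \\u00c2 $ \\u00c2 `` a daily quota"'
original = json.loads(json_text)  # the \uXXXX escapes become real characters

# Write a UTF-8 CSV, mimicking what the Python 2 writer above produces.
path = os.path.join(tempfile.mkdtemp(), "labelled.csv")
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["x", "y", "z"])
    writer.writeheader()
    writer.writerow({"x": original})

# Read it back and compare with the value loaded from JSON.
sentences = read_column(path, "x")
matches = sentences[0] == original
print(matches)
```

Comparing the decoded strings, rather than the raw text of the two files, sidesteps the difference between the escaped form in the JSON file and the literal characters in the CSV.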
EDIT
I checked this and saw that the encoding of \u00c3 to Ã, the format in Google Sheets, is actually Latin 8.
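To illustrate why the same character can look different from one program to the next: the code point U+00C3 is one byte in Latin-1 but two bytes in UTF-8, and a program that reads UTF-8 bytes with a single-byte encoding shows two characters instead of one. This sketch assumes Python 3. It uses Latin-1 only as an example of a single-byte encoding and makes no claim about which encoding Excel or Google Sheets actually picked.

```python
char = "\u00c3"  # the character Ã

latin1_bytes = char.encode("latin-1")  # one byte
utf8_bytes = char.encode("utf-8")      # two bytes

# Reading the UTF-8 bytes with the wrong (single-byte) codec yields two characters.
misread = utf8_bytes.decode("latin-1")

# Reversing the same wrong step recovers the original character.
repaired = misread.encode("latin-1").decode("utf-8")

print(latin1_bytes, utf8_bytes, len(misread), repaired == char)
```

The reversal only works when the wrong codec is known and every byte survived the trip, which is not guaranteed after several spreadsheet imports and exports.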
EDIT
I ran enca and saw that the original dumped file is in 7-bit ASCII characters, while my CSV is in Unicode. So do I need to load it as Unicode and convert it to 7-bit ASCII?
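A likely explanation for the enca result, offered as a sketch: json.dump was called above without ensure_ascii, so its default of ensure_ascii=True applies, and the json module writes every non-ASCII character as a \uXXXX escape. The file on disk is then pure ASCII even though the data is not. Under that reading no separate conversion to ASCII is needed, because dumping the decoded text again reproduces the escaped form. Python 3 is assumed here, and the sample string is a shortened stand-in for the real sentence.

```python
import json

text = "growth \u00c3 cents \u00c2 $"  # decoded text, as read from a UTF-8 CSV

escaped = json.dumps(text)                      # default: ensure_ascii=True
literal = json.dumps(text, ensure_ascii=False)  # keeps the characters as-is

escaped.encode("ascii")  # succeeds: only 7-bit characters remain
round_trip = json.loads(escaped) == text

print(escaped)
print(round_trip)
```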