Read a mixed-encoding JSON file in Python

Question

I have a JSON file with the following contents:

{
"2ndStrike": {
    "SECONDSTKE_FIGHT_BUTTON": "攻撃を続ける",
    "SECONDSTKE_RESOURCE_DESC": "残り資源",
    "SECONDSTKE_RESOURCE_REM1": "残りの資源を得るため小隊を修理し戦闘を続けろ：",
    "SECONDSTKE_RESOURCE_REM2": "悪名を高めるためにも戦い続け、この基地を破壊しろ！",
    "SECONDSTKE_SURR_BUTTON": "降伏",
    "SECONDSTKE_TITLE": "敗北"
},
"AccountManagementUI": {
    "CHOOSE_BASE_AGE_{x}": "{x} 日目",
    "CHOOSE_BASE_CC_LEVEL_{x}": "CC レベル {x}",
    "CHOOSE_BASE_CONFIRM_MESSAGE": "本当にこれから全てのデバイスでこの基地を使用しますか？",
    "CHOOSE_BASE_CONTINUE_BUTTON": "続ける",
    "CHOOSE_BASE_DESCRIPTION": "この{social_network}アカウントには2つの基地が存在してます。基地の数は一人のプレイヤーにつき一つに限定されています。基地を選択するか、キャンセルしてください。",
    "CHOOSE_BASE_LEVEL_{x}": "レベル {x}",
    "CHOOSE_BASE_LOCKED_BUTTON": "基地の選択",
    "CHOOSE_BASE_PANEL_TITLE": "アクティブな基地の選択"
}
}

I want to extract all the unique non-English characters that occur in this file. Could anyone tell me how to do that?

Taku · Accepted Answer · 2017-04-01 04:53:08Z


You can still use json.load; it handles these strings the same way it handles plain ASCII strings.

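import json
# Open the file explicitly as UTF-8 (assumed encoding) so the result does not depend on the platform default.
with open("yourfilename.json", encoding="utf-8") as f:
    data = json.load(f)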

If the data will not print on your screen, that is a separate issue, typically related to the terminal's encoding or font.
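Once the file is loaded, the strings live inside nested dictionaries. The following is a minimal sketch, not part of the original answer, of one way to walk the decoded structure and collect every key and value as a string. The helper names `load_json_utf8` and `iter_strings` are hypothetical, and the file is assumed to be UTF-8.

```python
import json


def load_json_utf8(path):
    """Load a JSON file, assuming (not verifying) that it is UTF-8 encoded."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_strings(obj):
    """Yield every string key and string value in a decoded JSON structure."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from iter_strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_strings(item)


# Small in-memory sample taken from the question, so no file is needed here.
sample = json.loads(
    '{"2ndStrike": {"SECONDSTKE_TITLE": "敗北", "SECONDSTKE_SURR_BUTTON": "降伏"}}'
)
print(list(iter_strings(sample)))
```

With a real file, `iter_strings(load_json_utf8("yourfilename.json"))` would produce the same kind of sequence.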

If you only want to count how many times each character occurs, you can do this:

import collections
import re

# Read the file as UTF-8 text and count each character outside the ASCII range (\x00-\x7F).
with open("/users/apple/desktop/me.txt", encoding="utf-8") as f:
    counted = collections.Counter(re.findall(r'[^\x00-\x7F]', f.read()))
print(counted)

Output:

Counter({'の': 10, 'を': 8, '基': 7, '地': 7, 'る': 5, 'し': 5, 'に': 5, '続': 4, 'け': 4, 'こ': 4, 'て': 4, 'す': 4, 'め': 3, 'い': 3, 'レ': 3, 'ル': 3, 'か': 3, 'ま': 3, 'つ': 3, '。': 3, '選': 3, '択': 3, '残': 2, 'り': 2, '資': 2, '源': 2, 'た': 2, '戦': 2, 'ろ': 2, '、': 2, 'ベ': 2, 'れ': 2, 'イ': 2, 'ア': 2, 'ン': 2, 'は': 2, '一': 2, 'さ': 2, '攻': 1, '撃': 1, '得': 1, '小': 1, '隊': 1, '修': 1, '理': 1, '闘': 1, '：': 1, '悪': 1, '名': 1, '高': 1, 'も': 1, '破': 1, '壊': 1, '！': 1, '降': 1, '伏': 1, '敗': 1, '北': 1, '日': 1, '目': 1, '本': 1, '当': 1, 'ら': 1, '全': 1, 'デ': 1, 'バ': 1, 'ス': 1, 'で': 1, '使': 1, '用': 1, '？': 1, 'カ': 1, 'ウ': 1, 'ト': 1, 'が': 1, '存': 1, '在': 1, '数': 1, '人': 1, 'プ': 1, 'ヤ': 1, 'ー': 1, 'き': 1, '限': 1, '定': 1, 'キ': 1, 'ャ': 1, 'セ': 1, 'く': 1, 'だ': 1, 'ク': 1, 'テ': 1, 'ィ': 1, 'ブ': 1, 'な': 1})
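The question asks for the unique characters rather than their counts. As a hedged sketch using the same regular expression, the duplicates can be dropped while keeping first-seen order. The function name `unique_non_ascii` is an assumption introduced for illustration.

```python
import re

# Same character class as in the answer: anything outside the ASCII range.
NON_ASCII = re.compile(r'[^\x00-\x7F]')


def unique_non_ascii(text):
    """Return the distinct non-ASCII characters in text, in first-seen order."""
    # dict.fromkeys removes duplicates and preserves insertion order.
    return list(dict.fromkeys(NON_ASCII.findall(text)))


sample = '"SECONDSTKE_RESOURCE_DESC": "残り資源", "SECONDSTKE_TITLE": "敗北"'
print(unique_non_ascii(sample))
```

If order does not matter, `set(NON_ASCII.findall(text))` gives the same characters, and the keys of the `Counter` shown above are also exactly this unique set.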

edited Apr 1, 2017 at 4:53

answered Apr 1, 2017 at 4:24

Taku


11 Comments

Asad Mahmood Over a year ago

How do I find the unique occurrences of non-English letters in it?

Taku Over a year ago

What do you mean by that?

Taku Over a year ago

So it doesn't have anything to do with json? Then just use the re module to search your file

Taku Over a year ago

An easy one is re.findall('[^\x00-\x7F]', data, re.UNICODE), which returns a list of all the characters that are not in the standard ASCII range.

Taku Over a year ago

And if you want to see how many times each character appeared, you can just do collections.Counter(re.findall('[^\x00-\x7F]', data, re.UNICODE))
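Note that the pattern above matches every non-ASCII character, so full-width punctuation such as 。 and ！ is counted too. If only letters are wanted, one possible variation (an assumption about what "letters" should mean here, not something from the answer) is to filter with `str.isalpha()`. The function name `count_non_ascii_letters` is hypothetical.

```python
import collections


def count_non_ascii_letters(text):
    """Count characters that are outside ASCII and that Python treats as alphabetic."""
    return collections.Counter(
        ch for ch in text if ord(ch) > 0x7F and ch.isalpha()
    )


sample = "基地。基地！ CC level"
print(count_non_ascii_letters(sample))
```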
