I have a JSON file with the following contents:

{
"2ndStrike": {
    "SECONDSTKE_FIGHT_BUTTON": "攻撃を続ける",
    "SECONDSTKE_RESOURCE_DESC": "残り資源",
    "SECONDSTKE_RESOURCE_REM1": "残りの資源を得るため小隊を修理し戦闘を続けろ:",
    "SECONDSTKE_RESOURCE_REM2": "悪名を高めるためにも戦い続け、この基地を破壊しろ!",
    "SECONDSTKE_SURR_BUTTON": "降伏",
    "SECONDSTKE_TITLE": "敗北"
},
"AccountManagementUI": {
    "CHOOSE_BASE_AGE_{x}": "{x} 日目",
    "CHOOSE_BASE_CC_LEVEL_{x}": "CC レベル {x}",
    "CHOOSE_BASE_CONFIRM_MESSAGE": "本当にこれから全てのデバイスでこの基地を使用しますか?",
    "CHOOSE_BASE_CONTINUE_BUTTON": "続ける",
    "CHOOSE_BASE_DESCRIPTION": "この{social_network}アカウントには2つの基地が存在してます。基地の数は一人のプレイヤーにつき一つに限定されています。基地を選択するか、キャンセルしてください。",
    "CHOOSE_BASE_LEVEL_{x}": "レベル {x}",
    "CHOOSE_BASE_LOCKED_BUTTON": "基地の選択",
    "CHOOSE_BASE_PANEL_TITLE": "アクティブな基地の選択"
}
}

I want to extract all the unique non-English characters that occur in this file. Could anyone tell me how to do that?

1 Answer

You can still use json.load; it handles these strings the same way as plain ASCII strings.

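import json
# Open as UTF-8 explicitly so the Japanese text decodes the same way on every platform
with open("yourfilename.json", encoding="utf-8") as f:
    data = json.load(f)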

If the data does not print correctly on screen, that is a separate issue.
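Once the file is parsed, the non-ASCII characters can be collected from the string values alone, leaving out the keys and the JSON punctuation. The following is only a minimal sketch: the helper names `iter_strings` and `count_non_ascii_values` are made up for illustration, and the inline sample stands in for the real file.

```python
import json
import re
from collections import Counter

# Matches any single character outside the 7-bit ASCII range
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def iter_strings(node):
    """Yield every string value found in nested dicts and lists (keys are skipped)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)

def count_non_ascii_values(data):
    """Count non-ASCII characters across all string values of parsed JSON data."""
    counts = Counter()
    for text in iter_strings(data):
        counts.update(NON_ASCII.findall(text))
    return counts

# Small stand-in for the real file; in practice, pass the result of json.load
sample = json.loads(
    '{"2ndStrike": {"SECONDSTKE_SURR_BUTTON": "降伏", "SECONDSTKE_TITLE": "敗北"}}'
)
counts = count_non_ascii_values(sample)
print(counts)
```

Walking the parsed structure only matters if the keys could contain non-ASCII text; for a file like the one in the question, scanning the raw text gives the same characters.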

If you only want to count how many times each character occurs, you can do this:

import re, collections
# Read the file as UTF-8 text and count every character outside the ASCII range
with open("/users/apple/desktop/me.txt", encoding="utf-8") as f:
    counted = collections.Counter(re.findall(r'[^\x00-\x7F]', f.read()))
print(counted)

Output:

Counter({'の': 10, 'を': 8, '基': 7, '地': 7, 'る': 5, 'し': 5, 'に': 5, '続': 4, 'け': 4, 'こ': 4, 'て': 4, 'す': 4, 'め': 3, 'い': 3, 'レ': 3, 'ル': 3, 'か': 3, 'ま': 3, 'つ': 3, '。': 3, '選': 3, '択': 3, '残': 2, 'り': 2, '資': 2, '源': 2, 'た': 2, '戦': 2, 'ろ': 2, '、': 2, 'ベ': 2, 'れ': 2, 'イ': 2, 'ア': 2, 'ン': 2, 'は': 2, '一': 2, 'さ': 2, '攻': 1, '撃': 1, '得': 1, '小': 1, '隊': 1, '修': 1, '理': 1, '闘': 1, ':': 1, '悪': 1, '名': 1, '高': 1, 'も': 1, '破': 1, '壊': 1, '!': 1, '降': 1, '伏': 1, '敗': 1, '北': 1, '日': 1, '目': 1, '本': 1, '当': 1, 'ら': 1, '全': 1, 'デ': 1, 'バ': 1, 'ス': 1, 'で': 1, '使': 1, '用': 1, '?': 1, 'カ': 1, 'ウ': 1, 'ト': 1, 'が': 1, '存': 1, '在': 1, '数': 1, '人': 1, 'プ': 1, 'ヤ': 1, 'ー': 1, 'き': 1, '限': 1, '定': 1, 'キ': 1, 'ャ': 1, 'セ': 1, 'く': 1, 'だ': 1, 'ク': 1, 'テ': 1, 'ィ': 1, 'ブ': 1, 'な': 1})
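If only the unique characters are needed, without the counts, the keys of the Counter (or a plain set) are enough. A minimal sketch, assuming the text has already been read into a string; `non_ascii_counts` and the `sample` string are illustrative names, not part of the original code:

```python
import re
from collections import Counter

# Matches any single character outside the 7-bit ASCII range
NON_ASCII = re.compile(r'[^\x00-\x7F]')

def non_ascii_counts(text):
    """Return a Counter of every non-ASCII character in text."""
    return Counter(NON_ASCII.findall(text))

# Two entries from the question, used here in place of the full file
sample = '"SECONDSTKE_SURR_BUTTON": "降伏", "SECONDSTKE_TITLE": "敗北"'
counts = non_ascii_counts(sample)

# Iterating a Counter yields its keys, i.e. each distinct character once
unique_chars = sorted(counts)
print(unique_chars)
```

Note that this pattern matches anything outside ASCII, so Japanese punctuation such as 。 and 、 is included along with the letters.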

11 Comments

How do I find the unique occurrences of non-English letters in it?
What do you mean by that?
So it doesn't have anything to do with json? Then just use the re module to search your file
An easy way is re.findall('[^\x00-\x7F]', data, re.UNICODE), which returns a list of all the characters that are not in the standard ASCII range.
And if you want to see how many times each character appears, you can just do collections.Counter(re.findall('[^\x00-\x7F]', data, re.UNICODE))