Assume I have a list that contains other lists, some of them nested, for example:
l = [['a'],['a','b'],['c'],['d',['a','b'],'f']]
If I try to decode it like this:
l = [x.decode('UTF8') for x in l]
I get an error like: AttributeError: 'list' object has no attribute 'decode'
(The list "l" is built from tokenized text in which every word is wrapped in its own list. I have tried many solutions to get around the decode problem, but I still can't print non-ASCII characters.) My actual code:
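from itertools import groupby
from nltk.tokenize import word_tokenize  # assuming NLTK's tokenizer

with open(path, "r") as myfile:
    text = myfile.read()
text = word_tokenize(text)
# Wrap every token in its own list.
d = [[item] if not isinstance(item, list) else item for item in text]
# Merge each run of capitalized tokens into one sublist; keep the rest as they are.
arr = sum(([[x[0] for x in g]] if k else list(g)
           for k, g in groupby(d, key=lambda x: x[0][0].isupper())),
          [])
# This is the line that fails: each x is a list, not a string.
arr = [x.decode('UTF8') for x in arr]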
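The AttributeError comes from the last line: `.decode` is a string method, and every element of `arr` is itself a list. One possible way around it, sketched here under the assumption that the innermost items are UTF-8 byte strings, is to recurse into each sublist and decode only the leaves. The helper name `decode_nested` and the sample data are hypothetical, not part of the code above.

```python
def decode_nested(obj, encoding='utf-8'):
    """Recursively decode byte strings inside arbitrarily nested lists."""
    if isinstance(obj, list):
        # Descend into sublists instead of calling .decode on the list itself.
        return [decode_nested(item, encoding) for item in obj]
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    # Already text (or some other type): leave it untouched.
    return obj


# Hypothetical sample: UTF-8 encoded tokens in a nested list.
nested = [[b'\xc3\x87anakkale'], [b'\xc3\xa7ok', [b'g\xc3\xbczel']], [b'.']]
decoded = decode_nested(nested)
```

Items that are already text pass through unchanged, so the same call should be harmless on a list that was decoded earlier.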
INPUT (my text file):
Çanakkale çok güzel bir şehirdir. Çok beğendik.
OUTPUT:
[[u'\xc7anakkale'], [u'\xe7ok'], [u'g\xfczel'], [u'bir'], [u'\u015fehirdir'], [u'.'], [u'\xe7ok'], [u'be\u011fendik'], [u'.']]
My desired output is still a list, but with the characters displayed exactly as they appear in my input.
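The `\xc7`-style escapes are how Python 2 displays a list: printing a list shows the `repr` of each element, and the `repr` of a unicode string escapes non-ASCII characters. The strings themselves are intact. A minimal sketch of one way to get a readable display is to build the bracketed string by hand and print that instead of the list. `format_nested` is a hypothetical helper, and it assumes every leaf is already a unicode string.

```python
def format_nested(obj):
    """Render nested lists of text as one string without escaping non-ASCII."""
    if isinstance(obj, list):
        return u'[' + u', '.join(format_nested(item) for item in obj) + u']'
    # Leaf: assumed to be unicode text; quote it the way a list display would.
    return u"'" + obj + u"'"


# A few tokens from the output above, written with escapes so the source stays ASCII.
arr = [[u'\xc7anakkale'], [u'\xe7ok'], [u'be\u011fendik'], [u'.']]
formatted = format_nested(arr)
```

Printing `formatted` on a UTF-8 capable terminal should then show `[['Çanakkale'], ['çok'], ['beğendik'], ['.']]`. In Python 3, printing the list directly already displays these characters unescaped.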