I have a pandas data frame with two columns ('no1' & 'no2'), some of the values contain Chinese characters, some do not.
no1 no2
Paul Pogba 贝克汉姆
Gianluigi Buffon 莱奥内尔・梅西
莱奥内尔・梅西 莱奥内尔・梅西
Cristiano Ronaldo 莱奥内尔・梅西
STEVE HARRIS zinedine zidane
Cristiano Ronaldo Gianluigi Buffon
I would like to add a column which has the value of 1 if either of the two columns has a string with a Chinese character in it, and 0 if neither do. The function looks like this:
def find_china_symbols(text):
counter = 0
if isinstance(text,str):
for char in text:
if ord(char) > 10000:
counter += 1
if counter > 0:
return True
else:
return False
else:
return False
Previously I have used np.where to create this column (as below) but it doesn't work in this case. Why does it not?
df["Chinese"] = np.where(find_china_symbols(df["no1"]) | find_china_symbols(df["no2"]),1,0)
Ideally, this would be the result:
no1 no2 Chinese
Paul Pogba 贝克汉姆 1
Gianluigi Buffon 莱奥内尔・梅西 1
莱奥内尔・梅西 莱奥内尔・梅西 1
Cristiano Ronaldo 莱奥内尔・梅西 1
STEVE HARRIS zinedine zidane 0
Cristiano Ronaldo Gianluigi Buffon 0