1

I have a pandas data frame with two columns ('no1' & 'no2'), some of the values contain Chinese characters, some do not.

no1                     no2
Paul Pogba              贝克汉姆
Gianluigi Buffon        莱奥内尔・梅西
莱奥内尔・梅西           莱奥内尔・梅西
Cristiano Ronaldo       莱奥内尔・梅西
STEVE HARRIS            zinedine zidane
Cristiano Ronaldo       Gianluigi Buffon

I would like to add a column which has the value of 1 if either of the two columns has a string with a Chinese character in it, and 0 if neither do. The function looks like this:

def find_china_symbols(text):
    counter = 0
    if isinstance(text,str):
        for char in text:
            if ord(char) > 10000:
                counter += 1
        if counter > 0:
            return True
        else:
            return False
    else:
        return False

Previously I have used np.where to create this column (as below) but it doesn't work in this case. Why does it not?

df["Chinese"] = np.where(find_china_symbols(df["no1"]) | find_china_symbols(df["no2"]),1,0)

Ideally, this would be the result:

no1                     no2                  Chinese
Paul Pogba              贝克汉姆              1
Gianluigi Buffon        莱奥内尔・梅西         1
莱奥内尔・梅西           莱奥内尔・梅西         1
Cristiano Ronaldo       莱奥内尔・梅西         1
STEVE HARRIS            zinedine zidane       0
Cristiano Ronaldo       Gianluigi Buffon      0

1 Answer 1

2

I'd approach it like this with applymap

def find_china_symbols(text):
  return any(map(lambda char: ord(char) > 1000, text))

df['Chinese'] = df.applymap(find_china_symbols).any(1).astype(int)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.