
I am importing a table with some text in it into a pandas DataFrame. One of the strings contains the text 'NF-κB', i.e. the Greek letter kappa (some of the text in the tables also contains alphas, betas, etc.).

When I read in the table using:

pd.read_table('table_processed.txt', sep='\t')

The kappa character is converted to '\xce\xba', so that part of the string now reads 'NF-\xce\xbaB' when viewed in IPython.

Is there any way to preserve the encoding during the import, so that the kappa character is displayed correctly when the string is viewed as part of the DataFrame?

Thanks in advance

1 Answer

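Straight from the docs: pass an encoding argument to the reader. The '\xce\xba' seen in the question is the UTF-8 byte sequence for κ, so the file is most likely UTF-8 and the call would be pd.read_table('table_processed.txt', sep='\t', encoding='utf-8').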

http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data
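A minimal Python 3 sketch of that fix for the question's case. The real table_processed.txt isn't available, so the sketch builds a hypothetical in-memory stand-in (the column names gene and description are made up for illustration) and assumes, as the question's '\xce\xba' suggests, that the file is UTF-8:

```python
import io

import pandas as pd

# Hypothetical stand-in for 'table_processed.txt': tab-separated text saved as UTF-8.
raw = "gene\tdescription\nNFKB1\tNF-κB subunit\n".encode("utf-8")
assert b"\xce\xba" in raw  # in UTF-8, κ (U+03BA) is stored as the two bytes 0xCE 0xBA

# Tell pandas how to decode the bytes. For the real file this would be:
# pd.read_table('table_processed.txt', sep='\t', encoding='utf-8')
df = pd.read_table(io.BytesIO(raw), sep="\t", encoding="utf-8")

print(df.loc[0, "description"])  # NF-κB subunit
```

The docs' own example, which decodes Latin-1 data: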

In [1079]: data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

In [1080]: df = pd.read_csv(StringIO(data), encoding='latin-1')

In [1081]: df

      word  length
0  Träumen       7
1    Grüße       5

In [1082]: df['word'][1]
u'Gr\xfc\xdfe'
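That session is Python 2, as the u'' prefix on the output shows. In Python 3 a plain 'Tr\xe4umen' literal is already decoded text, so to reproduce the example the sample has to be a bytes literal wrapped in io.BytesIO. A minimal sketch, assuming Python 3 and a recent pandas:

```python
import io

import pandas as pd

# The docs' sample as raw Latin-1 bytes: \xe4 = ä, \xfc = ü, \xdf = ß.
data = b"word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5"

# encoding tells read_csv how to turn those bytes into text.
df = pd.read_csv(io.BytesIO(data), encoding="latin-1")

print(df)
print(repr(df["word"][1]))  # 'Grüße'
```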