Importing text with pandas/python

Question

I am importing a table with some text in it into a pandas dataframe. One of the strings contains the text 'NF-κB' - i.e. the 'kappa' character (some of the text in the tables also contains alphas and betas etc.).

When I read in the table using:

pd.read_table('table_processed.txt', sep='\t')

The kappa character is converted to '\xce\xba' so that part of the string now reads 'NF-\xce\xbaB' when viewed in iPython.

Is there any way to preserve the string encoding during the import, so that the kappa character is displayed as such when the string is viewed as part of the dataframe?

Thanks in advance

Jeff · Accepted Answer · 2013-06-29 13:48:19Z


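Straight from the docs: pass an encoding to the reader. The bytes '\xce\xba' are the UTF-8 encoding of κ, so encoding='utf-8' is the likely match for this file. The docs example below uses 'latin-1' because its data is latin-1 encoded.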

http://pandas.pydata.org/pandas-docs/dev/io.html#dealing-with-unicode-data
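A minimal Python 3 sketch of the question's case, assuming table_processed.txt was saved as UTF-8. The in-memory file and its gene/description columns are hypothetical stand-ins for the real table. pd.read_table(..., encoding='utf-8') should behave the same as read_csv with sep='\t'; the last part shows that picking the wrong codec does not raise an error here, it just produces mojibake.

```python
import io

import pandas as pd

# 'κ' (U+03BA) encodes to the two bytes 0xCE 0xBA in UTF-8,
# which is exactly the '\xce\xba' seen in the question.
assert "κ".encode("utf-8") == b"\xce\xba"

# Hypothetical stand-in for table_processed.txt: a tab-separated
# table stored as UTF-8 bytes (column names are made up).
raw = "gene\tdescription\nNF-κB\ttranscription factor\n".encode("utf-8")

# Declaring the file's encoding lets pandas decode the bytes into text.
df = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="utf-8")
print(df.loc[0, "gene"])  # NF-κB

# Decoding the same bytes as latin-1 maps 0xCE -> 'Î' and 0xBA -> 'º'.
wrong = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="latin-1")
print(wrong.loc[0, "gene"])  # NF-ÎºB
```

If the output looks like the second case, the encoding argument does not match how the file was actually saved.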

In [1079]: data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

In [1080]: df = pd.read_csv(StringIO(data), encoding='latin-1')

In [1081]: df

      word  length
0  Träumen       7
1    Grüße       5

In [1082]: df['word'][1]
u'Gr\xfc\xdfe'
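The session above is Python 2. A rough Python 3 equivalent, assuming only pandas and the standard library: the literal is first encoded to latin-1 bytes, because a Python 3 str is already decoded text and encoding= would otherwise have nothing to decode.

```python
import io

import pandas as pd

# Same data as the docs example, turned into latin-1 bytes
# (ä = 0xE4, ü = 0xFC, ß = 0xDF in latin-1).
data = "word,length\nTräumen,7\nGrüße,5".encode("latin-1")

# The encoding argument tells pandas how to decode those bytes.
df = pd.read_csv(io.BytesIO(data), encoding="latin-1")
print(df)
print(repr(df["word"][1]))  # 'Grüße'
```

In Python 3 the repr shows 'Grüße' directly rather than the u'Gr\xfc\xdfe' escapes from the Python 2 session.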
