I have a DataFrame that has columns such as ID, Name, Specification, Time.

I read the file like this:

mc = pd.read_csv("C:\\data.csv", sep = ",", header = 0, dtype = str)

When I checked my column names using

mc.columns.values

I found that the ID column name had a strange character in front of it, like this:

['\ufeffID', 'Name', 'Specification', 'Time']

After this, I renamed that column to ID like this:

mc.columns.values[0] = "ID"

When I checked this using

mc.columns.values 

I got my result as,

array(['ID', 'Name', 'Specification', 'Time'], dtype=object)

Then, I checked with,

"ID" in mc.columns.values

It returned True.

Then I tried,

mc["ID"]

I got this error:

KeyError: 'ID'

How can I get the values of the ID column and get rid of the strange character in front of the column name? Any help would be appreciated. Thank you in advance.

  • Can you try passing encoding='utf-16' like I suggested? Additionally, you can confirm what the real column names are by printing them using mc.columns.tolist(). Commented Aug 4, 2016 at 18:28
  • This is related: stackoverflow.com/a/38316355/2285236 Commented Aug 4, 2016 at 18:31

1 Answer

That's a UTF-16 BOM. Pass encoding='utf-16' to read_csv; see: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

mc = pd.read_csv("C:\\data.csv", sep=",", header=0, dtype=str, encoding='utf-16')

The above should work. To be specific, FE FF is the BOM for UTF-16 big-endian.
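To see how a BOM can end up as '\ufeff' at the start of the first header field, here is a minimal sketch using only Python's standard library. The header string is a made-up stand-in for the first line of the CSV. The 'utf-8-sig' part is an assumption: it only applies if the file turns out to be UTF-8 with a BOM rather than UTF-16.

```python
import codecs

# Made-up stand-in for the first line of the CSV file.
header = "ID,Name,Specification,Time\n"

# Case 1 (assumption): the file is UTF-8 with a BOM in front.
utf8_bom_bytes = codecs.BOM_UTF8 + header.encode("utf-8")
decoded_plain = utf8_bom_bytes.decode("utf-8")    # BOM survives as '\ufeff'
decoded_sig = utf8_bom_bytes.decode("utf-8-sig")  # BOM is stripped

# Case 2: the file is UTF-16; encoding with "utf-16" writes a BOM,
# and decoding with "utf-16" consumes it again.
utf16_bytes = header.encode("utf-16")
decoded_utf16 = utf16_bytes.decode("utf-16")

print(repr(decoded_plain.split(",")[0]))
print(repr(decoded_sig.split(",")[0]))
print(repr(decoded_utf16.split(",")[0]))
```

Whichever codec matches the file is the one to pass as the encoding argument of read_csv; 'utf-8-sig' is the counterpart of the second decode above if the file is UTF-8 with a BOM.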

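Also, you should use rename rather than overwriting the value in the NumPy array, because writing into mc.columns.values bypasses the Index and label lookups such as mc["ID"] can keep failing: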

mc.rename(columns={mc.columns[0]: "ID"}, inplace=True)

This should work correctly.
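As a rough sketch of the rename approach, the snippet below builds a small DataFrame by hand whose first column name starts with '\ufeff', renames it, and then selects the column. The row values are made up for illustration. It also shows an alternative not mentioned above: removing the character from all column names with the string accessor.

```python
import pandas as pd

# Hand-built stand-in for the CSV; the row values are made up for illustration.
mc = pd.DataFrame(
    [["1", "Widget", "Small", "10:00"]],
    columns=["\ufeffID", "Name", "Specification", "Time"],
)

# rename builds new column labels, so label lookups see the new name.
renamed = mc.rename(columns={mc.columns[0]: "ID"})
ids = renamed["ID"].tolist()

# Alternative: remove the BOM character from every column name.
stripped = mc.copy()
stripped.columns = stripped.columns.str.replace("\ufeff", "", regex=False)

print(renamed.columns.tolist())
print(stripped.columns.tolist())
```

The rename call here returns a new DataFrame instead of using inplace=True; either form should give the same column names.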
