Python automatically converting specific data to date format and losing data

Question

I'm cleaning a CSV file with pandas, mainly removing special characters such as '/', '#', etc. The file has 7 columns (none of which are dates).

In some of the columns, there's just numerical data such as '11/6/1980'.

I've noticed that directly after reading the csv file,

df = pd.read_csv('report16.csv', encoding='ANSI')

this data becomes '11/6/80', and after cleaning it becomes '11 6 80' (the same result appears in the output file). So wherever the data contains '/', it's being interpreted as a date, and Python is dropping the first 2 digits of what it takes to be the year.

Data	Expected result	Actual Result
11/6/1980	11 6 1980	11 6 80
12/8/1983	12 8 1983	12 8 83

Both actual results are wrong: in the Actual Result column, I'm losing 2 digits from the last group of digits.

The data looks like this

Org Name	Code	Code copy
ABC	11/6/1980	11/6/1980
DEF	12/8/1983	12/8/1983
GH	11/5/1987	11/5/1987

OrgName,    Code,   Code copy
ABC,    11/6/1980,  11/6/1980
DEF,    12/8/1983,  12/8/1983
GH, 11/5/1987,  11/5/1987
KC,      9000494,          9000494

It's worth mentioning that the column contains other data such as '900490', strings, etc., but in these instances there aren't any problems.

What could be done to not allow this conversion?

Welcome to SO! You will find help here, provided you ask questions in the way we are used to. If you show us the code you use, along with data exhibiting the problem (in fact, if you provide a minimal reproducible example), you could get far more relevant and detailed answers. If you do not really understand what a minimal reproducible example is, please read How to Ask...
– Serge Ballesta, Commented Mar 5, 2021 at 12:54
I cannot reproduce. Please see my (non) answer below for more details.
– Serge Ballesta, Commented Mar 5, 2021 at 13:30
Still not a usable format here. As the problem occurs when the csv file is read, you should show the file not in a spreadsheet format but as raw text, like what you get with type file.csv in a Windows console (CMD) window, or cat file.csv on a Unix-like system. Or what you see when you open it in a simple text editor like Windows Notepad, vi on a Unix-like system, or Notepad++.
– Serge Ballesta, Commented Mar 5, 2021 at 13:47

Serge Ballesta · Accepted Answer · 2021-03-05 13:25:32Z

1

Not an answer, but comments do not allow well-presented code and data.

Here is what I call a minimal reproducible example:

Content of the sample.csv file:

Data,Expected result,Actual Result
11/6/1980,11 6 1980,11 6 80
12/8/1983,12 8 1983,12 8 83

Code:

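import pandas as pd

df = pd.read_csv('sample.csv')  # default settings, no date parsing requested
print(df)
s = df['Data'].str.replace('/', ' ')  # same cleaning step as in the question
# True only if every cleaned value matches the expected result
print((df['Expected result'] == s).all())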

It gives:

        Data Expected result Actual Result
0  11/6/1980       11 6 1980       11 6 80
1  12/8/1983       12 8 1983       12 8 83
True

This proves that read_csv has correctly read the file and has not changed anything.

PLEASE SHOW THE CONTENT OF YOUR CSV FILE AS TEXT, along with enough code to reproduce your problem.
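
For readers who want to do that check from Python rather than with type or cat, here is a minimal sketch. The helper name show_raw_lines is hypothetical, and the sample file is written to a temporary directory only so that the snippet is self-contained; with a real file you would pass its path and encoding instead.

```python
import tempfile
from itertools import islice
from pathlib import Path

import pandas as pd


def show_raw_lines(path, n=5, encoding="utf-8"):
    """Return up to the first n lines of a text file, without pandas involved."""
    with open(path, encoding=encoding) as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Illustrative content modelled on the question's data (not the OP's real file).
sample_text = (
    "OrgName,Code,Code copy\n"
    "ABC,11/6/1980,11/6/1980\n"
    "KC,9000494,9000494\n"
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text(sample_text, encoding="utf-8")

    raw = show_raw_lines(csv_path)   # what is actually stored on disk
    df = pd.read_csv(csv_path)       # what pandas makes of it

for line in raw:
    print(repr(line))
print(df["Code"].tolist())
```

If the raw lines already show '11/6/80', the digits were lost before pandas ever opened the file, for example by whatever program last saved it.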


3 Comments

Serge Ballesta Over a year ago

This is not an answer and will probably be deleted in a while... It should be anyway ;-)

jim Over a year ago

This looks like an answer to me. A good answer. Shows OP a very good first step in problem solving: break the problem into pieces that can be proved/disproved ("proves that read_csv has correctly read the file"). Also shows OP how to get better support by providing the core of the problem: the text file. Well done.

Serge Ballesta Over a year ago

@jim Thank you for the support :-) . I said that it is not an answer precisely for those reasons: it explains to the OP how to present a question, and what they should check on their own system before asking here, or at least what they should show if they need explanations. All things that should normally go into comments :-(

AMD_Uz · Accepted Answer · 2021-03-05 13:16:32Z

0

How about trying a string operation? First select the column that you would like to modify, then replace "/" or "#" with whitespace: column.str.replace("/", " "). I hope this works!
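
As a rough sketch of that idea, several special characters can be replaced in one pass with a regular-expression character class. The column name and the sample values below are made up for illustration; only '/' and '#' come from the question.

```python
import pandas as pd

# Illustrative data: the 'AB#12' value is invented to exercise the '#' case.
df = pd.DataFrame({"Code": ["11/6/1980", "AB#12", "9000494"]})

# [/#] matches either a slash or a hash; each match becomes a single space.
cleaned = df["Code"].str.replace(r"[/#]", " ", regex=True)

print(cleaned.tolist())  # ['11 6 1980', 'AB 12', '9000494']
```

This only helps if the values are still intact after reading; it cannot bring back digits that were already lost.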


1 Comment

qnt003 Over a year ago

I tried this; however, the problem becomes apparent upon reading the csv file. So immediately after the file is read, the values become dates and 2 digits of the 'year' part are lost. What I've done until now is replace the '/' with ' * ' in Excel and re-read the file, and that worked. However, I'd like to know if there's a workaround for this in Python.

jim · Accepted Answer · 2021-03-05 17:48:24Z

0

The date conversion is not strictly a Python issue. You are using pandas read_csv.

Try explicitly declaring the separator instead of relying on the default.

df = pd.read_csv('report16.csv', encoding='ANSI', sep=',')
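
A related sketch, under the assumption that the goal is to keep every value exactly as written in the file: passing dtype=str asks read_csv to skip type inference altogether, and skipinitialspace=True drops the padding after each comma seen in the question's raw sample. As far as I know, read_csv already defaults to sep=',' and only parses dates when parse_dates is given, so this is a way to rule pandas out rather than a confirmed fix. The text is read from an in-memory buffer here only to keep the snippet self-contained.

```python
import io

import pandas as pd

# Raw text copied from the question's comma-separated sample.
raw_text = (
    "OrgName,    Code,   Code copy\n"
    "ABC,    11/6/1980,  11/6/1980\n"
    "DEF,    12/8/1983,  12/8/1983\n"
    "KC,      9000494,          9000494\n"
)

df = pd.read_csv(
    io.StringIO(raw_text),
    sep=",",                # explicit separator, as suggested above
    dtype=str,              # keep every column as text, no type inference
    skipinitialspace=True,  # ignore the spaces that follow each comma
)

print(df["Code"].tolist())  # ['11/6/1980', '12/8/1983', '9000494']
```

If the four-digit years survive here but not with the real file, the raw content of that file is the next thing to inspect.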

