I'm cleaning a CSV file with pandas, mainly removing special characters such as '/' and '#'. The file has 7 columns (none of which are dates).
Some of the columns contain numeric-looking data such as '11/6/1980'.
I've noticed that directly after reading the CSV file,

```python
df = pd.read_csv('report16.csv', encoding='ANSI')
```

this data becomes '11/6/80', and after cleaning it becomes '11 6 80' (the same result appears in the output file). So wherever the data contains '/', it seems to be interpreted as a date and the first two digits of the year are dropped.
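One way to rule pandas out as the culprit is to force every column to be read as plain text with `dtype=str`; pandas does not parse dates unless asked, and with string dtypes the values pass through untouched. Below is a minimal sketch using an in-memory sample standing in for `report16.csv` (the file contents here are assumed from the question):

```python
import io
import pandas as pd

# Hypothetical sample standing in for report16.csv
raw = "OrgName,Code,Code copy\nABC,11/6/1980,11/6/1980\nKC,9000494,9000494\n"

# dtype=str forces every column to be read as text, so nothing is
# reinterpreted as a number or a date by pandas
df = pd.read_csv(io.StringIO(raw), dtype=str)

# Replace special characters such as '/' and '#' with a space
cleaned = df["Code"].str.replace(r"[/#]", " ", regex=True)
print(cleaned.tolist())  # ['11 6 1980', '9000494']
```

If the year still comes out truncated after reading this way, the two digits were already missing in the file on disk rather than being removed by pandas.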
| Data | Expected result | Actual Result |
|---|---|---|
| 11/6/1980 | 11 6 1980 | 11 6 80 |
| 12/8/1983 | 12 8 1983 | 12 8 83 |
Both of the above actual results are wrong: in the Actual Result column I'm losing the first two digits of the year.
The data looks like this:

| Org Name | Code | Code copy |
|---|---|---|
| ABC | 11/6/1980 | 11/6/1980 |
| DEF | 12/8/1983 | 12/8/1983 |
| GH | 11/5/1987 | 11/5/1987 |
| KC | 9000494 | 9000494 |
It's worth mentioning that the column also contains other data such as '900490', strings, etc., but in those cases there aren't any problems.
What can be done to prevent this conversion?
First, check what the file actually contains on disk: type `file.csv` in a Windows console (CMD) window, or `cat file.csv` on a Unix-like system. Alternatively, open it in a plain text editor such as Windows Notepad, vi on a Unix-like system, or Notepad++.
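The same check can be done from Python by reading the raw bytes directly, bypassing pandas and any editor. A small sketch, using a hypothetical sample file in place of the real `report16.csv`:

```python
import os
import tempfile

# Hypothetical sample standing in for report16.csv
sample = b"OrgName,Code,Code copy\r\nABC,11/6/1980,11/6/1980\r\n"
path = os.path.join(tempfile.gettempdir(), "report16_sample.csv")
with open(path, "wb") as f:
    f.write(sample)

# Read the raw bytes back to see exactly what is stored on disk
with open(path, "rb") as f:
    raw_lines = f.readlines()
for line in raw_lines:
    print(repr(line))
```

If the years are already two digits in this raw view, the truncation happened before pandas ever saw the file (for example, if the CSV was saved from a spreadsheet program that reformatted the values as dates).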