0

My csv file looks like this:

"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
...

"K","Anna","Works "fine""

The expected output(dataframe):

City,Name,Comment

A,Jay,'Like it'

B,Rosy,'Well, good'

...

K,Anna,'Works "fine"'

I am trying to read it by doing this :

df=pd.read_csv("test.csv", sep=',', engine='python',encoding='utf8', quoting=csv.QUOTE_ALL)

And it is giving error like this :

ParserError: unexpected end of data

I need to have all the data. So I can not skip any lines with error_bad_lines=True.

How I can fix this issue?

UPDATE: It turns out my original CSV file is missing a quote at the end of the file. I solved the problem by identifying the errors in the file and modifying them.

9
  • Try to do quoting=csv.QUOTE_NONE Commented Aug 6, 2021 at 16:17
  • No luck. I will get an error like this ParserError: Expected 578 fields in line 96, saw 580. (Just a heads up. My csv file has hundreds of columns) Commented Aug 6, 2021 at 16:19
  • @Claire_L Is double quote after "fine"" intentionally added? "Works " ends after white space? Commented Aug 6, 2021 at 16:36
  • @Xitiz Yes. This string has double quotes in it. Commented Aug 6, 2021 at 16:42
  • @Claire_L Expected output for that provided input? Commented Aug 6, 2021 at 16:52

1 Answer 1

1

I believe the trick is to preprocess and then read the data


import re
from io import StringIO
import pandas as pd

data = """
"City","Name","Comment"
"A","Jay","Like it"
"B","Rosy","Well, good"
"K","Anna","Works "fine""
"""

data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', data, flags=re.M)

x = pd.read_csv(StringIO(data), escapechar='\\')

print(x)

Outputs

   City  Name       Comment
0     A   Jay       Like it
1     B  Rosy    Well, good
2     K  Anna  Works "fine"

And in theory this should work the same with the file

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    print(df)

Edit : It outputs as following

  City  Name       Comment
0    A   Jay       Like it
1    B  Rosy    Well, good
2    K  Anna  Works "fine"

From kayoz answer

Edit 2: and the last column is easy to change with a lambda or similar function.

df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")

Link to the original

TL;DR

import re
from io import StringIO
import pandas as pd

with open('test.csv', 'r') as f:
    data = re.sub('(?<!^)"(?!,")(?<!,")(?!$)', '\\"', f.read(), flags=re.M)
    df = pd.read_csv(StringIO(data), escapechar='\\')
    df['Comment'] = df['Comment'].apply(lambda x: "'" + str(x) + "'")
    print(df)
  City  Name  Comment
0    A   Jay  'Like it'
1    B  Rosy  'Well, good'
2    K  Anna  'Works "fine"'
Sign up to request clarification or add additional context in comments.

1 Comment

If you found this answer helpful please let me know.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.