I have a dataset that looks like this when opened in WordPad:
"state","industry","2000","2005"
"A","art,music",2934,2454
"B","farm",3949,2343
I want to read it into Python so that it looks like this:
| "state" | "industry" | "2000" | "2005" |
|---|---|---|---|
| "A" | "art,music" | 2934 | 2454 |
| "B" | "farm" | 3949 | 2343 |
I tried the code below.
df = pd.read_csv(os.path.join(path, filename), engine='python', sep=',' , quoting=3)
This raises the error "ParserError: Expected 6 fields in line 8, saw 8".
df = pd.read_csv(os.path.join(path, filename), engine='python', sep='",' , quoting=3)
This puts all the numbers in the same cell.
I have read a lot of posts asking similar questions, but my case is a bit different from theirs because 1) my data contains commas inside double-quoted fields and 2) the employment numbers are not quoted.
How can I handle it? Help appreciated!
quoting=3 tells pandas that nothing in the CSV file is quoted, which isn't the case for this file. Use quoting=0 instead. See stackoverflow.com/a/63357614/494134
state (5 characters). But you want quotes around them? The Python literal would be '"state"' (7 characters).
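The quoting=0 suggestion above can be tried with a minimal sketch like the one below. It assumes the file content is exactly the sample shown in the question; the `raw` string and `io.StringIO` are stand-ins for the real file and are not part of the original post. `csv.QUOTE_MINIMAL` is the named constant for 0 and is the pandas default, so omitting `quoting` altogether should behave the same way.

```python
import csv
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real file
# (assumption: the actual file has the same layout).
raw = '''"state","industry","2000","2005"
"A","art,music",2934,2454
"B","farm",3949,2343
'''

# quoting=csv.QUOTE_MINIMAL (0): double-quoted fields are recognised, so the comma
# inside "art,music" is kept as data rather than treated as a separator.
df = pd.read_csv(io.StringIO(raw), sep=",", quoting=csv.QUOTE_MINIMAL)

print(df)
print(df.columns.tolist())  # column names come back without the surrounding quotes
print(df.dtypes)            # the unquoted year columns are parsed as integers
```

With this setting the quotes are treated as delimiters of the field rather than part of the value, so the column is named state rather than "state". If the literal quote characters are really wanted in the values, they would have to be added back after reading.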