Start by passing the appropriate parameters for this case:
- sep='[|,]' - there are two separators (a pipe char and a comma),
so define them as a regex.
- skipinitialspace=True - your source text contains extra spaces (after
separators), so you should drop them.
- engine='python' - to suppress the warning about falling back to the
'python' engine (regex separators are not supported by the C engine).
The above options alone allow you to call read_csv without errors, but the
downside (for now) is that the double quotes remain.
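A minimal sketch of this first step (assuming your input is held in a string
named txt, as in the question):

import io
import pandas as pd

# Read with a regex separator; spaces after separators are dropped,
# but the double quotes are still part of the values at this point.
df = pd.read_csv(io.StringIO(txt), sep='[|,]', skipinitialspace=True,
                 engine='python')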
To eliminate them, at least from the data rows, another trick is needed:
Define a converter (lambda) function:
cnv = lambda txt: txt.replace('"', '')
and apply it to all source columns.
In your case you have 5 columns, so to keep the code concise,
you can use a dictionary comprehension:
{ i: cnv for i in range(5) }
So the whole code can be:
df = pd.read_csv(io.StringIO(txt), sep='[|,]', skipinitialspace=True,
                 engine='python', converters={i: cnv for i in range(5)})
and the result is:
"column1" "column2" "column3" "column4" "column5"
0 123 sometext this somedata 8 inches hello
But remember that now all columns are of string type, so you should
convert the required columns to numbers.
An alternative is to pass a second converter for the numeric columns,
one that returns a number instead of a string.
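A sketch of both options (treating the first column as the numeric one is an
assumption based on the sample data):

# Option 1: convert after reading.
df[df.columns[0]] = pd.to_numeric(df[df.columns[0]])

# Option 2: a second converter that strips the quotes and returns an int,
# assigned to column 0 while the remaining columns keep the string converter.
cnv_num = lambda txt: int(txt.replace('"', ''))
converters = {0: cnv_num, **{i: cnv for i in range(1, 5)}}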
To get proper column names (without the double quotes), you can pass two
additional parameters (the complete call is sketched below the list):
- skiprows=1 - to omit the initial header line,
- names=["column1", "column2", "column3", "column4", "column5"] - to
define the column list on your own.
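Putting everything together (again assuming txt holds your input):

import io
import pandas as pd

# Strip double quotes from every value; skiprows/names give clean headers.
cnv = lambda txt: txt.replace('"', '')
df = pd.read_csv(io.StringIO(txt), sep='[|,]', skipinitialspace=True,
                 engine='python', converters={i: cnv for i in range(5)},
                 skiprows=1,
                 names=['column1', 'column2', 'column3', 'column4', 'column5'])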