How to specify custom parser in Pandas.read_csv? [duplicate]

Question

I need to open a file.csv in Pandas. For that, I can use pd.read_csv('file.csv').

The problem is, the file is not properly formatted:

a b   c
1 2   5
3 4   6

The first delimiter is 1 space and the second delimiter is 3 spaces.

I couldn't find a way to do that in the pandas documentation.

I can preprocess the file beforehand, wrap the result in a StringIO, and open that with pandas, but it seems hackish to me.

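from io import StringIO
import pandas as pd

with open('file.csv', 'r') as f:
    # Collapse the 3-space delimiter into a single space
    text = f.read().replace('   ', ' ')
# read_csv defaults to a comma separator, so pass the space explicitly
df = pd.read_csv(StringIO(text), sep=' ')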

How can I do that with pandas directly?

A regex argument to sep invokes the Python parser, which is slower. I'm not sure if delim_whitespace does the same, but it is definitely more idiomatic.
– cs95, Commented Jun 11, 2019 at 20:07
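On the engine question raised in this comment: as far as I understand the pandas documentation, `\s+` is the one multi-character separator that does not force the Python engine, so it should still work when the C engine is requested explicitly. A small check, assuming the sample data from the question is inlined as a string (the names `raw`, `df_c`, and `df_py` are just for illustration):

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined for the sake of the example.
raw = "a b   c\n1 2   5\n3 4   6\n"

# Request the C engine explicitly; a separator that required the Python
# engine would be rejected here instead of silently falling back.
df_c = pd.read_csv(StringIO(raw), sep=r"\s+", engine="c")

# Parse the same text with the Python engine for comparison.
df_py = pd.read_csv(StringIO(raw), sep=r"\s+", engine="python")

print(df_c)
print(df_py)
```

If both calls succeed and print the same frame, the whitespace pattern is not costing you the faster parser on your pandas version.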

Quang Hoang · Accepted Answer · 2019-06-11 19:26:14Z

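Did you try pd.read_csv('file.csv', sep=r'\s+')? The pattern \s+ treats any run of whitespace as a single delimiter, and the raw string keeps Python from interpreting the backslash as an escape.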
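A minimal sketch of that suggestion, using the sample from the question inlined through StringIO so it runs without a file on disk (the variable names `raw` and `df` are illustrative only):

```python
from io import StringIO

import pandas as pd

# Sample data from the question: one space between the first two columns,
# three spaces between the second and third.
raw = "a b   c\n1 2   5\n3 4   6\n"

# sep=r"\s+" splits on any run of whitespace, so both gaps are handled alike.
df = pd.read_csv(StringIO(raw), sep=r"\s+")

print(df.columns.tolist())
print(df)
```

With a real file, the same call would be pd.read_csv('file.csv', sep=r'\s+'), with no preprocessing step.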

1 Comment

Dammit, I need to learn regex. Many thanks