
I need to open file.csv in pandas. Normally I would use pd.read_csv('file.csv').

The problem is, the file is not properly formatted:

a b   c
1 2   5
3 4   6

The first delimiter is 1 space and the second delimiter is 3 spaces.

I couldn't find anything in the pandas documentation on how to handle that.

I can preprocess the file beforehand, wrap the result in a StringIO, and open that with pandas, but it seems hackish to me.

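from io import StringIO
import pandas as pd

with open('file.csv', 'r') as f:
    # Collapse the three-space delimiter into a single space
    text = f.read().replace('   ', ' ')
# The data is space-separated, so the default comma separator will not work
df = pd.read_csv(StringIO(text), sep=' ')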

How can I do that with pandas directly?

  • Use delim_whitespace=True instead. Commented Jun 11, 2019 at 19:28
  • What's the benefit over sep='\s+'? Commented Jun 11, 2019 at 19:31
  • A regex argument to sep invokes the Python parser, which is slower. I'm not sure whether delim_whitespace does the same, but it is definitely more idiomatic. Commented Jun 11, 2019 at 20:07
  • Indeed more idiomatic. Thank you. Commented Jun 11, 2019 at 20:12
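
On the parser question raised in these comments: as far as the pandas documentation describes it, '\s+' is treated as a special case, and only other multi-character separators are interpreted as regular expressions that force the Python engine. Recent pandas releases (2.2 onward, to my knowledge) also mark delim_whitespace as deprecated in favor of sep='\s+'. The sketch below is one way to check the engine behavior; the inline sample string stands in for file.csv and is an assumption for illustration.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for file.csv (assumed contents, copied from the question)
sample = "a b   c\n1 2   5\n3 4   6\n"

# Requesting engine="c" explicitly means pandas cannot silently fall back
# to the Python parser; if '\s+' were unsupported there, this would fail.
df_c = pd.read_csv(StringIO(sample), sep=r"\s+", engine="c")

# Same read with the Python engine, for comparison
df_py = pd.read_csv(StringIO(sample), sep=r"\s+", engine="python")

print(df_c)
print(df_py)
```

If both calls succeed and print the same frame, the whitespace separator does not need the slower parser on your installed version.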

1 Answer

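Did you try pd.read_csv('file.csv', sep=r'\s+')? The pattern \s+ matches any run of one or more whitespace characters, so it covers both the single space and the three spaces. The raw-string prefix keeps Python from treating \s as an escape sequence.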
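
A minimal sketch of this suggestion applied to the sample from the question. The data is fed in through a StringIO only so the snippet runs without a file on disk; with the real file you would pass 'file.csv' directly.

```python
from io import StringIO

import pandas as pd

# Sample rows from the question: one space, then three spaces, as delimiters
sample = "a b   c\n1 2   5\n3 4   6\n"

# Treat any run of whitespace as a single delimiter
df = pd.read_csv(StringIO(sample), sep=r"\s+")

print(df.columns.tolist())
print(df)
```

This should yield three columns named a, b, and c with two rows of integers, with no preprocessing step.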


1 Comment

Dammit, I need to learn regex. Many thanks
