
I am trying to read an unstructured CSV file without a header using pandas. The number of columns differs from row to row, and there is no clear upper limit on the number of columns. Right now the maximum is 10, but it may grow to around 15.

Example CSV file content:

a;b;c
a;b;c;d;e;;;f
a;;
a;b;c;d;e;f;g;h;;i
a;b;
....

Here is how I read it using Python Pandas:

pd.DataFrame(pd.read_csv(path, sep=";", header=None, usecols=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                            names=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
                                            nrows=num_of_rows + 1))

However, this produces the following warning: "FutureWarning: Defining usecols with out of bounds indices is deprecated and will raise a ParserError in a future version." I don't want my code to stop working in the future because of this.

Is there a way to read such an unstructured CSV file with pandas (or any other equally fast library) in a future-proof way?

1 Answer

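You can read each line as a single field and split it afterwards:

import pandas as pd

# Pick a separator that never appears in the data, so each line is read as one field
df = (pd.read_csv('data.csv', sep='@', header=None).squeeze('columns')
        .str.split(';', expand=True).fillna(''))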

df.columns = [chr(65+c) for c in df.columns]  # or whatever you want
print(df)

# Output
   A  B  C  D  E  F  G  H I  J
0  a  b  c                    
1  a  b  c  d  e        f     
2  a                          
3  a  b  c  d  e  f  g  h    i
4  a  b                       
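
A different route, sketched here as an assumption rather than as part of the answer above, is to scan the file once for the widest row and then pass that many column names to read_csv via names, without usecols. The helper names count_max_fields and read_ragged_csv are hypothetical, and the sketch assumes the file is small enough to be read twice and that the sample rows from the question are representative.

```python
import csv
import os
import tempfile

import pandas as pd


def count_max_fields(path, sep=";"):
    """Return the largest number of fields found on any line of the file."""
    with open(path, newline="") as handle:
        return max((len(row) for row in csv.reader(handle, delimiter=sep)), default=0)


def read_ragged_csv(path, sep=";"):
    """Read a headerless CSV whose rows have different lengths (hypothetical helper)."""
    width = count_max_fields(path, sep)
    # Supplying `width` column names tells the parser how many columns to
    # expect, so shorter rows are padded instead of raising an error.
    frame = pd.read_csv(path, sep=sep, header=None,
                        names=list(range(width)), dtype=str)
    # Missing and empty fields become empty strings, as in the answer above.
    return frame.fillna("")


# Demo using the sample rows from the question, written to a temporary file.
sample = "a;b;c\na;b;c;d;e;;;f\na;;\na;b;c;d;e;f;g;h;;i\na;b;\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    demo_path = tmp.name

df = read_ragged_csv(demo_path)
os.remove(demo_path)
print(df.shape)
```

Because the width is measured from the file itself, nothing needs to change when the number of columns grows from 10 to 15. If a second pass over the file is too costly, a fixed, generous number of names should work the same way, at the price of some all-empty trailing columns.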

Update

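Another possibility, if runs of empty fields should be collapsed (note that this shifts the later values to the left):

df = (pd.read_csv('data.csv', sep='@', header=None).squeeze('columns')
        .str.replace(r';{2,}', ';', regex=True)  # regex=True is required since pandas 2.0
        .str.split(';', expand=True).fillna(''))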
df.columns = [chr(65+c) for c in df.columns]  # or whatever you want
print(df)

# Output
   A  B  C  D  E  F  G  H  I
0  a  b  c                  
1  a  b  c  d  e  f         
2  a                        
3  a  b  c  d  e  f  g  h  i
4  a  b                     
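
To make the difference between the two variants concrete, here is a minimal sketch in plain Python that applies the same split and the same regular expression to the second sample row. It only mirrors what the pandas string methods do per row; the variable names are illustrative.

```python
import re

row = "a;b;c;d;e;;;f"

# First variant: a plain split keeps the empty fields, so "f" stays at index 7.
kept = row.split(";")

# Second variant: runs of two or more semicolons collapse to one before the
# split, so the empty fields vanish and "f" moves left to index 5.
collapsed = re.sub(r";{2,}", ";", row).split(";")

print(kept.index("f"), collapsed.index("f"))  # prints: 7 5
```

Which variant is appropriate depends on whether the position of a value carries meaning. If column H must always hold the same kind of data, the first variant is the safer choice.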

2 Comments

Adding 65 to each char doesn't seem to be effective tbh, but it solves the problem I mentioned. So thank you.
Glad to read that :-) Use what you want to rename columns, it was just for demo.
