Pandas: read_csv (read multiple tables in a single file)

Question

I have a file (example shown below) that contains multiple CSV tables. This file is uploaded to a database, and I would like to do some operations on it. I was thinking of using pandas to read each table into a separate DataFrame with the read_csv function. However, going through the documentation, I didn't see an option to specify a subset of lines to read/parse. Is this possible? If not, are there other alternatives?

Sample file:

TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
...

...

I can do an initial pass through the file to determine the start/end lines of each table. However, one of the read_csv arguments is "filepath_or_buffer", and I am not certain what the 'buffer' part means. Is it a list of strings, one big string, or something else? What can I use for a buffer? Can someone point me to a small example that uses read_csv with a buffer? Thanks for any ideas.

It is possible to read this type of file with read.csv using the skip and nrow arguments. The first step is to run readLines and find the gap between tables. It helps if there is some consistency.
– Richard Telford, Commented Apr 25, 2016 at 16:36

MaxU - stand with Ukraine · Accepted Answer · 2016-04-25 17:40:30Z

UPDATE:

If you want to skip specific lines, for example [0,1,5,16,57,58,59], you can pass a list of 0-based line numbers to skiprows:

df = pd.read_csv(filename, header=None, 
                 names=['col1','col2','col3'], skiprows=[0,1,5,16,57,58,59])

To skip the first two lines and read the following 100 lines, you can use the skiprows and nrows parameters, as @Richard Telford mentioned in the comment:

df = pd.read_csv(filename, header=None, names=['col1','col2','col3'],
                 skiprows=2, nrows=100)
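
As a rough sketch of how skiprows and nrows could be applied to the layout shown in the question: make one pass to locate each table, then call read_csv once per table. The helper find_tables, the SAMPLE text and the temporary file are made up for illustration. The code assumes each table is a name line, then a header line, then data rows, with blank lines between tables.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample that mirrors the layout in the question.
SAMPLE = """\
TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
"""


def find_tables(path):
    """Map table name -> (0-based header line index, number of data rows)."""
    with open(path) as fh:
        lines = [line.rstrip("\n") for line in fh]
    tables = {}
    i = 0
    while i < len(lines):
        if not lines[i].strip():      # skip blank separator lines
            i += 1
            continue
        name = lines[i].strip()       # assumed: first line of a block is its name
        header = i + 1                # assumed: the column header comes next
        end = header + 1
        while end < len(lines) and lines[end].strip():
            end += 1                  # advance to the blank line or end of file
        tables[name] = (header, end - header - 1)
        i = end
    return tables


# Write the sample to a temporary file so read_csv can be given a path.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(SAMPLE)
path = tmp.name

try:
    frames = {}
    for name, (header_line, n_rows) in find_tables(path).items():
        # Skip everything before the header line, then read only this table's rows.
        frames[name] = pd.read_csv(path, skiprows=header_line, nrows=n_rows)
finally:
    os.remove(path)

for name, frame in frames.items():
    print(name)
    print(frame)
```

Here the header line of each table is left for read_csv to parse, so header=None and names are not needed.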

Here is a small example of a "buffer", which can be any file-like object, such as io.StringIO wrapping a string:

import io
import pandas as pd

data = """\
        Name
0  JP2015121
1    US14822
2    US14358
3  JP2015539
4  JP2015156
"""
# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col=0)
print(df)

The same without a header:

data = """\
0  JP2015121
1    US14822
2    US14358
3  JP2015539
4  JP2015156
"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col=0,
                 header=None, names=['Name'])
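
The same buffer idea could be used for the multi-table file in the question. This is only a sketch: read_tables and SAMPLE are hypothetical names, and it assumes tables are separated by blank lines, with the table name on the first line of each block and the column header on the second. Each block is wrapped in io.StringIO and handed to read_csv.

```python
import io
import re

import pandas as pd

# Hypothetical sample that mirrors the layout in the question.
SAMPLE = """\
TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
"""


def read_tables(text):
    """Split text on blank lines and parse each block into its own DataFrame."""
    tables = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        # The first line of the block is taken as the table name, the rest is CSV.
        name, _, body = block.partition("\n")
        tables[name.strip()] = pd.read_csv(io.StringIO(body))
    return tables


tables = read_tables(SAMPLE)
for name, table in tables.items():
    print(name, table.shape)
```

For a real file, the text could come from open(filename).read(). This loads the whole file into memory, so for very large files the skiprows/nrows approach may be preferable.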
