
I have a file (example shown below) that contains multiple CSV tables. This file is uploaded to a database, and I would like to do some operations on it first. For that, I was thinking of using pandas to read each table into a separate dataframe with the read_csv function. However, going through the documentation, I didn't see an option to specify a subset of lines to read/parse. Is this possible? If not, are there other alternatives?

Sample file:

TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
...

...

I can do an initial pass through the file to determine the start/end lines of each table. However, one of read_csv's arguments is "filepath_or_buffer", and I am not totally certain what the 'buffer' part is. Is it a list of strings, one big string, or something else? What can I use for a buffer? Can someone point me to a small example that uses read_csv with a buffer? Thanks for any ideas.

    It is possible to read this type of file with read.csv using the skip and nrow arguments. First step is to run readLines and find the gap between tables. Helps if there is some consistency. Commented Apr 25, 2016 at 16:36

1 Answer


UPDATE:

If you want to skip specific lines, for example [0,1,5,16,57,58,59], you can pass a list of 0-based line numbers to skiprows:

df = pd.read_csv(filename, header=None, 
                 names=['col1','col2','col3'], skiprows=[0,1,5,16,57,58,59])

To skip the first two lines and read the following 100 lines, you can use the skiprows and nrows parameters, as @Richard Telford mentioned in the comment:

df = pd.read_csv(filename, header=None, names=['col1','col2','col3'],
                 skiprows=2, nrows=100)
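
Applied to the sample file from the question, the same two parameters can be driven by an initial pass that locates each table. The following is only a sketch under assumptions: the helper name find_tables is hypothetical, every table is assumed to start with a name line followed by a header line, tables are assumed to be separated by blank lines, and the sample text is written to a temporary file just to make the example runnable.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample mirroring the layout shown in the question.
sample = """TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
"""


def find_tables(path):
    """Return {table_name: (header_line_index, number_of_data_rows)}."""
    with open(path) as fh:
        lines = [line.rstrip("\n") for line in fh]
    tables = {}
    i = 0
    while i < len(lines):
        if not lines[i].strip():      # skip blank separator lines
            i += 1
            continue
        name = lines[i].strip()       # first non-blank line is the table name
        header = i + 1                # the column header follows the name
        end = header
        while end < len(lines) and lines[end].strip():
            end += 1                  # advance to the next blank line or EOF
        tables[name] = (header, max(end - header - 1, 0))
        i = end
    return tables


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tables.csv")
    with open(path, "w") as fh:
        fh.write(sample)
    # skiprows=<int> skips that many raw lines from the top of the file,
    # so the first line read is the table's own header row.
    frames = {
        name: pd.read_csv(path, skiprows=header, nrows=nrows)
        for name, (header, nrows) in find_tables(path).items()
    }

for name, frame in frames.items():
    print(name)
    print(frame)
```

This reads the file once to find the boundaries and then once per table, which is simple but may be slow for very large files.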

Here is a small example for "buffer":

import io
import pandas as pd

data = """\
        Name
0  JP2015121
1    US14822
2    US14358
3  JP2015539
4  JP2015156
"""
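# sep=r'\s+' splits on runs of whitespace; it replaces delim_whitespace=True,
# which is deprecated in recent pandas versions.
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)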
print(df)

The same without a header row:

data = """\
0  JP2015121
1    US14822
2    US14358
3  JP2015539
4  JP2015156
"""
# No header row in the data, so the column name is supplied via names.
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0,
                 header=None, names=['Name'])
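
To tie the buffer idea back to the multi-table file: a "buffer" here is a file-like object with a read() method, such as an open file handle or io.StringIO, not a list of strings. One possible approach, sketched below, is to read the whole file as text, split it on blank lines, and wrap each table's text in its own io.StringIO. The sample text and the variable names are hypothetical, and the sketch assumes each block starts with a table name line followed by a header line.

```python
import io
import re

import pandas as pd

# Hypothetical sample mirroring the layout shown in the question;
# in practice this text would come from open(filename).read().
text = """TABLE_1
col1,col2
val1,val2
val3,val4

TABLE_2
col1,col2,col3,col4
val1,val2,val3,val4
"""

frames = {}
# Split on one or more blank lines, so each block holds exactly one table.
for block in re.split(r"\n\s*\n", text.strip()):
    # First line is the table name; the remainder is ordinary CSV text.
    name, _, body = block.partition("\n")
    # io.StringIO turns the string into a file-like buffer for read_csv.
    frames[name.strip()] = pd.read_csv(io.StringIO(body))

for name, frame in frames.items():
    print(name)
    print(frame)
```

This needs only a single pass over the file, at the cost of holding the whole text in memory.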