python pandas read text file, skip particular lines

Question

I am trying to read a text file using pd.read_csv:

df = pd.read_csv('filename.txt', delimiter = "\t")

My text file (see below) has a few lines of text before the dataset I need to import begins. How do I skip the lines before the dataset headers? I don't want to use any solution that involves counting the number of lines I need to skip because I have to do this for multiple (similar, not same) text files. Any help is appreciated!

Note: I cannot upload the text file as it is confidential

========================================= 
hello 123
========================================= 
Dir: /x/y/z/RTchoice/release001/data 
Date: 17-Mar-2020 10:0:08 
Output File: /a/b/c/filename.txt 
N: 2842
-----------------------------------------
Subject col1    col2    col3    
001 10.00000    1.00000 3.00000 
002 11.00000    2.00000 4.00000

Use the skiprows argument: pd.read_csv('filename.txt', delimiter='\t', skiprows=8)
– piRSquared, Commented Mar 24, 2021 at 20:39
I don't want to use any solution that involves counting the number of lines I need to skip, because I have to do this for multiple (similar, not identical) text files. Is there a way to count the rows I need to skip without opening the text files? Thanks!
– akhosla, Commented Mar 24, 2021 at 20:42
Then you have to identify the line somehow. Is it the '---------' that separates the header from the data? You tell me. You can't just craft magic. There has to be some logic to it.
– piRSquared, Commented Mar 24, 2021 at 20:44
Yeah, I get that. I was just wondering if there is a more efficient way, but maybe not. So I guess I could find the index of the last '---------' line and use that index in the skiprows argument.
– akhosla, Commented Mar 24, 2021 at 20:52
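
The idea in the last comment can be sketched as follows: scan the lines once for the last one made up only of dashes, then pass the number of lines up to and including it as skiprows. This is a minimal sketch, not code from the thread. The helper name `find_header_row` and its `pattern` parameter are hypothetical, and it assumes the column header sits directly below the last dash-only line.

```python
import io
import re

import pandas as pd


def find_header_row(lines, pattern=r"^-{3,}\s*$"):
    """Return the 0-based index of the line after the last separator line.

    Hypothetical helper: assumes the column header is the line directly
    below the last line consisting only of dashes.
    """
    last = None
    for i, line in enumerate(lines):
        if re.match(pattern, line):
            last = i  # keep overwriting so the last match wins
    if last is None:
        raise ValueError("no separator line found")
    return last + 1


sample = """=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1    col2    col3
001 10.00000    1.00000 3.00000
002 11.00000    2.00000 4.00000
"""

# Number of leading lines to skip so that the header row comes first.
skip = find_header_row(sample.splitlines())
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=skip)
print(df)
```

For a file on disk, the same helper could be fed an open file object (`with open(path) as f: skip = find_header_row(f)`) before calling `pd.read_csv(path, sep=r"\s+", skiprows=skip)`. The scan still reads each file once, but nothing has to be counted by hand.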

piterbarg · Accepted Answer · 2021-03-24 20:57:36Z

2

Here is an attempt to 'craft magic'. The idea is to call read_csv with increasing skiprows values until the result parses into a DataFrame with more than one column.

import pandas as pd
from io import StringIO
data = StringIO(
'''
========================================= 
hello 123
========================================= 
Dir: /x/y/z/RTchoice/release001/data 
Date: 17-Mar-2020 10:0:08 
Output File: /a/b/c/filename.txt 
N: 2842
-----------------------------------------
Subject col1    col2    col3    
001 10.00000    1.00000 3.00000 
002 11.00000    2.00000 4.00000
''')

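# Try successive skiprows values until the data parses into more than one column.
for n in range(1000):
    try:
        data.seek(0)  # rewind the buffer before each attempt
        df = pd.read_csv(data, delimiter=r"\s+", skiprows=n)  # raw string for the regex
    except Exception:
        print(f'skiprows = {n} failed (exception)')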
    else:
        if len(df.columns) == 1: # do not let it get away with a single-column df
            print(f'skiprows = {n} failed (single column)')
        else:   
            break
print('\n', df)

output:


skiprows = 0 failed (exception)
skiprows = 1 failed (exception)
skiprows = 2 failed (exception)
skiprows = 3 failed (exception)
skiprows = 4 failed (exception)
skiprows = 5 failed (exception)
skiprows = 6 failed (exception)
skiprows = 7 failed (exception)
skiprows = 8 failed (single column)

    Subject  col1  col2  col3
0        1  10.0   1.0   3.0
1        2  11.0   2.0   4.0
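
For several similar files, the loop above could be wrapped in a function with a bounded search and an explicit failure. The following is a sketch under assumptions, not part of the original answer: `read_table_auto_skip`, `max_skip`, and `min_columns` are hypothetical names, and the function takes the file contents as a string so that each attempt starts from a fresh buffer.

```python
import io

import pandas as pd


def read_table_auto_skip(text, max_skip=50, min_columns=2):
    """Try increasing skiprows values until a multi-column frame parses.

    Hypothetical wrapper around the trial-and-error loop; returns the
    DataFrame together with the skiprows value that worked.
    """
    for n in range(max_skip + 1):
        try:
            df = pd.read_csv(io.StringIO(text), sep=r"\s+", skiprows=n)
        except Exception:
            continue  # this offset does not yield a parseable table
        if len(df.columns) >= min_columns:
            return df, n
    raise ValueError(f"no usable header found with skiprows <= {max_skip}")


sample = """=========================================
hello 123
=========================================
Dir: /x/y/z/RTchoice/release001/data
Date: 17-Mar-2020 10:0:08
Output File: /a/b/c/filename.txt
N: 2842
-----------------------------------------
Subject col1    col2    col3
001 10.00000    1.00000 3.00000
002 11.00000    2.00000 4.00000
"""

df, n = read_table_auto_skip(sample)
print(n)
print(df)
```

A header block whose lines happen to parse into a consistent multi-column table would be accepted too early, so this heuristic is worth checking against each new file layout.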

4 Comments

piterbarg Over a year ago

this is dark magic, I agree.. but was inspired by your comment!

piRSquared Over a year ago

I got the reference (-:

piterbarg Over a year ago

see if it works in 'real life' for you! I myself would describe this as more 'hacky' than 'beautiful' but thanks :-)

akhosla Over a year ago

yup, it worked! I had never encountered try/except/else before so this was also a good learning opportunity :]
