Removing some parts of a text file in Python

Question

I have a very big text file and I want to filter out some lines. Each block starts with an identifier line, which is followed by many lines of numbers (one number per line), like this example:

fixedStep ch=GL000219.1 start=52818 step=1
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
1.000000
1.000000
fixedStep ch=M start=52959 step=1
1.000000
1.000000

This line is an identifier: fixedStep ch=GL000219.1 start=52818 step=1. I want to filter out all identifier lines containing ch=GL000219.1 or ch=GL000320.1, together with the number lines that follow them, and keep the other identifiers and the lines below them. Each identifier occurs multiple times in the file. The expected output looks like this:

fixedStep ch=M start=52959 step=1
1.000000
1.000000

I have tried this code:

l = ["ch=GL000219.1", "ch=GL000320.1"]  # the real list has more identifiers to remove

with open('file.txt', 'r') as f:
    with open('outfile.txt', 'w') as outfile:
        good_data = True
        for line in f:
            if line.startswith('fixedStep'):
                for i in l:
                    good_data = i not in line
            if good_data:
                outfile.write(line)

My code does not return what I want. Do you know how to modify it?

Add a break under good_data = i not in line as soon as it becomes False. good_data is overwritten on every pass of the inner loop, so it can take several values for a single line, and only the value for the last i counts.
– roganjosh, Commented Jul 26, 2017 at 12:33
There are a few changes you need to make, if I understand your question correctly. What did you try?
– roganjosh, Commented Jul 26, 2017 at 12:39
If I skip the list and try the identifiers one at a time, it works perfectly for each of them, but doing that for all of them takes a lot of time. I would like to handle all identifiers at once.
– john, Commented Jul 26, 2017 at 12:42
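
To illustrate the comment about overwriting, here is a small sketch of why the inner loop misbehaves: the flag is reassigned on every pass, so only the last entry of l decides the outcome. The function names last_value_wins and keep_block are made up for this illustration and do not appear in the thread.

```python
# Identifiers to drop, as in the question.
l = ["ch=GL000219.1", "ch=GL000320.1"]


def last_value_wins(line):
    # Mirrors the loop in the question: good_data is overwritten each pass.
    good_data = True
    for i in l:
        good_data = i not in line
    return good_data


def keep_block(line):
    # Keep the block only if no unwanted identifier occurs in the header.
    return not any(i in line for i in l)


header = "fixedStep ch=GL000219.1 start=52818 step=1"
print(last_value_wins(header))  # True: the second entry overwrote the first result
print(keep_block(header))       # False: the header is rejected as intended
```

Checking all entries at once (or breaking out of the loop on the first match) gives one decision per header line instead of one per list entry.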

gushitong · Accepted Answer · 2017-07-26 13:16:20Z

1

The problem is the way good_data is updated:

good_data = i not in line

Inside the for loop it is reassigned for every entry in l, so only the last entry decides its value.

You can write it like this:

l = ["ch=GL000219.1", "ch=GL000320.1"]
flag = False  # whether the current block should be written

with open('file.txt', 'r') as f, open('outfile.txt', 'w') as outfile:
    for line in f:
        if line.strip().startswith("fixedStep"):
            # New block: keep it only if no unwanted identifier is in the header
            flag = all(i not in line for i in l)
        if flag:
            outfile.write(line)
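
A variation on the same idea, offered as a sketch rather than as the answer's own method: wrapping the logic in a generator makes it easy to try on a short list of lines before running it on the large file. It also compares the whole ch= field against a set instead of using substring tests, so that ch=GL000219.1 would not accidentally match a hypothetical ch=GL000219.10. The names filter_blocks and excluded are invented here, and the sketch assumes every header line has the form shown in the question.

```python
def filter_blocks(lines, excluded):
    """Yield lines, skipping every block whose header names an excluded ch."""
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.lstrip().startswith("fixedStep"):
            # Collect the key=value fields of the header into a dict.
            fields = dict(
                part.split("=", 1) for part in line.split()[1:] if "=" in part
            )
            keep = fields.get("ch") not in excluded
        if keep:
            yield line


sample = [
    "fixedStep ch=GL000219.1 start=52818 step=1\n",
    "1.000000\n",
    "fixedStep ch=GL000320.1 start=52959 step=1\n",
    "1.000000\n",
    "fixedStep ch=M start=52959 step=1\n",
    "1.000000\n",
    "1.000000\n",
]
excluded = {"GL000219.1", "GL000320.1"}
result = list(filter_blocks(sample, excluded))
print("".join(result), end="")
```

With real files, the generator can be passed straight to outfile.writelines(filter_blocks(f, excluded)) inside the with block, which still reads the input one line at a time.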

edited Jul 26, 2017 at 13:16

answered Jul 26, 2017 at 12:49


4 Comments

john Over a year ago

It removes every line below the identifiers, even the ones that I am interested in.

gushitong Over a year ago

@john What do you mean by "removes every line"? I didn't understand.

john Over a year ago

Every identifier has some lines below it (as in the example). I would like to remove the identifiers that I am not interested in, along with the lines that follow them. There are also identifiers that I am interested in, and I want to keep them and the corresponding lines below them, as in the example.

gushitong Over a year ago

@john I understand. I updated the code; is that what you want?

JerryLong · Accepted Answer · 2017-07-26 12:43:50Z

0

You need to split the string (the content of the text file) into lines after you read it from the file. Using

print(f)

after reading into f, you will find that it is a single string, not a list of lines.

If the text file has Unix line endings, use

f = f.split("\n")

to convert the string to a list; then you can loop over it line by line.
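
As a small sketch of the difference described here: read() returns the whole file as one string, while iterating over the file object (as the code in the question does) already yields one line at a time. io.StringIO stands in for an open file so that the example is self-contained, and str.splitlines() is shown as an alternative to split("\n") that also copes with Windows line endings.

```python
import io

text = "fixedStep ch=M start=52959 step=1\n1.000000\n1.000000\n"

# read() gives a single string.
whole = io.StringIO(text).read()
print(type(whole).__name__)  # str

# splitlines() turns it into a list of lines without the line endings.
lines = whole.splitlines()
print(len(lines))  # 3

# Iterating over the file object yields lines directly, endings included.
iterated = list(io.StringIO(text))
print(iterated[1] == "1.000000\n")  # True
```

For a very big file, iterating line by line avoids holding the whole content in memory at once.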

answered Jul 26, 2017 at 12:43

