How to extract specific rows from an input text file and print them in python?

Question

I have a text file containing the transition lines of FeII emission. The columns are n_high, n_low, wavelength and intensity (given as logI in the file), where n_high and n_low are the upper and lower levels of each transition. The rows start at 2 --> 1 and run through 371 --> 1, then continue with 3 --> 2 through 371 --> 2, and so on until the last chunk, 371 --> 370.

The input file looks like:

#n_hi n_lo WL(A) logI
2   1   259811.86   1.158
3   1   149730.41   -2.054
4   1   115894.98   -2.134
5   1   102320.80   -2.389
6   1   53387.13    0.256
7   1   41138.69    -0.277
8   1   35226.70    -1.585
9   1   32068.36    -1.741
10  1   12566.77    2.323
.
.
.
.
369 1   1069.66 1.461
370 1   1065.75 -7.901
371 1   1065.64 -8.011
3   2   353390.47   0.759
4   2   209224.17   -2.390
5   2   168797.89   -2.607
.
.
.
370 369 291200.84   -10.337
371 369 283465.88   -10.436
371 370 10672868.00 -12.012

There are 68635 rows in total.
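The row count follows from that ordering: every pair of levels with 1 ≤ n_lo < n_hi ≤ 371 appears exactly once, so there are 371 × 370 / 2 = 68635 rows, the same pairs the nested loops in the pseudocode walk through. A quick check in Python; transition_pairs is only an illustrative helper, not something taken from the file:

```python
import math


def transition_pairs(n_levels=371):
    """Yield (n_hi, n_lo) in the file's order: n_lo = 1 with n_hi = 2..371,
    then n_lo = 2 with n_hi = 3..371, and so on up to (371, 370)."""
    for n_lo in range(1, n_levels):
        for n_hi in range(n_lo + 1, n_levels + 1):
            yield n_hi, n_lo


pairs = list(transition_pairs())
print(len(pairs), math.comb(371, 2))  # 68635 68635
print(pairs[0], pairs[-1])            # (2, 1) (371, 370)
```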

The task is to select only the transitions whose wavelength lies within a given range, say [x1, x2], and write each matching row in full to another file.

So far, I have sketched an algorithm for this:

for n_low from 1 to 370:
  for n_hi from n_low+1 to 371:
    if x2 <= wavelength <= x1:
      print this row to file
    else:
      exit

I'd like to implement this in Python.
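A minimal sketch of that pseudocode in plain Python, assuming whitespace-separated columns as in the sample above. The nested loops over n_low and n_hi are not needed, because every row already carries its own level indices; a single pass over the lines is enough, and a row outside the range is simply skipped so the scan carries on. The function name select_rows is hypothetical, and the bounds are sorted so they can be passed in either order:

```python
def select_rows(x1, x2, input_path, output_path):
    """Copy every data row whose wavelength (third column) lies between x1 and x2.

    The bounds may be given in either order; both ends are inclusive.
    """
    low, high = sorted((x1, x2))
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            fields = line.split()
            # Skip blank lines and the "#n_hi n_lo WL(A) logI" header line.
            if not fields or fields[0].startswith("#"):
                continue
            wavelength = float(fields[2])
            if low <= wavelength <= high:
                dst.write(line)  # copy the row unchanged
            # A row outside the range is simply not written; the loop goes on.
```

For example, select_rows(1000, 25000, 'feii_lines.txt', 'selected.txt'), where both file names are placeholders.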

You'd be much better off loading all of this into pandas than mucking about with a csv reader. In pandas you can filter much more easily and consistently. – boot-scootin, Jan 18, 2017 at 14:08

user783836 · Accepted Answer · 2017-01-18 14:43:37Z

3

If you want to use plain Python, something like the function below should work. It splits each line on any run of whitespace, so it handles both tab- and space-separated columns:

def filter_wavelength(x1, x2, input_path, output_path):
    with open(output_path, 'w') as output_file:
        with open(input_path) as input_file:
            for line in input_file:
                try:
                    # split() with no argument splits on any run of whitespace
                    tokens = line.split()
                    wave_length = float(tokens[2])
                    if x1 <= wave_length <= x2:
                        output_file.write(line)
                except (IndexError, ValueError) as e:
                    # header, blank or malformed lines end up here
                    print(str(e))

Call it like so:

filter_wavelength(1,2,'path/to/input', 'path/to/output')
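Catching only the exceptions a non-data line can raise (ValueError from float() on the header, IndexError from a short or blank line) keeps genuine bugs from being silently printed and skipped. A sketch that also separates the filtering from the file handling, so it can be tried on an in-memory io.StringIO; matching_lines is a hypothetical name:

```python
import io


def matching_lines(lines, x1, x2):
    """Yield the lines whose third whitespace-separated field is a number in [x1, x2]."""
    for line in lines:
        try:
            wave_length = float(line.split()[2])
        except (IndexError, ValueError):
            continue  # header, blank or malformed line: not a data row
        if x1 <= wave_length <= x2:
            yield line


sample = io.StringIO(
    "#n_hi n_lo WL(A) logI\n"
    "2   1   259811.86   1.158\n"
    "10  1   12566.77    2.323\n"
    "369 1   1069.66 1.461\n"
)
for line in matching_lines(sample, 1000, 25000):
    print(line, end="")
```

With real files, output_file.writelines(matching_lines(input_file, x1, x2)) inside the two with blocks does the same job as the loop in the answer.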

answered Jan 18, 2017 at 14:43


furas · Accepted Answer · 2017-01-18 15:21:24Z

You can use the powerful pandas library.

I use io.StringIO to simulate a file containing the data, but you should pass your filename instead of f.

data = '''2   1   259811.86   1.158
3   1   149730.41   -2.054
4   1   115894.98   -2.134
5   1   102320.80   -2.389
6   1   53387.13    0.256
7   1   41138.69    -0.277
8   1   35226.70    -1.585
9   1   32068.36    -1.741
10  1   12566.77    2.323
369 1   1069.66 1.461
370 1   1065.75 -7.901
371 1   1065.64 -8.011
3   2   353390.47   0.759
4   2   209224.17   -2.390
5   2   168797.89   -2.607
370 369 291200.84   -10.337
371 369 283465.88   -10.436
371 370 10672868.00 -12.012'''

import pandas as pd

# simulate file
import io 
f = io.StringIO(data)

# use your filename instead of `f`
# read the data using runs of whitespace as separators,
# assign the column names 'n_hi', 'n_lo', 'WL(A)', 'logI'
# and ignore '#' comments, such as the header line of the real file
df = pd.read_csv(f, names=['n_hi', 'n_lo', 'WL(A)', 'logI'], sep=r'\s+', comment='#')

#print(df)

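# get rows with 1000 <= WL(A) <= 25000 (between() includes both ends by default)
selected = df[df['WL(A)'].between(1000, 25000)]
print(selected)

# index=False keeps pandas' row index out of the output file
selected.to_csv('result.csv', sep=' ', header=False, index=False)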
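The boolean-mask style also makes it easy to combine the wavelength range with conditions on the level columns. A sketch on a few sample rows; keeping only n_lo == 1 is an arbitrary illustrative condition:

```python
import io

import pandas as pd

data = io.StringIO(
    "2   1   259811.86   1.158\n"
    "10  1   12566.77    2.323\n"
    "369 1   1069.66 1.461\n"
    "3   2   353390.47   0.759\n"
    "4   2   209224.17   -2.390\n"
)
df = pd.read_csv(data, sep=r"\s+", names=["n_hi", "n_lo", "WL(A)", "logI"])

# Each condition is a boolean Series and & combines them element-wise.
# The parentheses around the == comparison are required because
# & binds more tightly than == in Python.
selected = df[df["WL(A)"].between(1000, 260000) & (df["n_lo"] == 1)]
print(selected)
```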

Seif · Accepted Answer · 2017-01-18 15:23:47Z

1

You don't need to care about n_hi and n_lo if your only concern is WL(A). Try this:

def extract_wave_lengths(x1, x2, input_file, output_file):
    with open(input_file, 'r') as ifile, open(output_file, 'w') as ofile:
        next(ifile)  # Skip header
        for line in ifile:
            parts = line.split()  # columns: n_hi, n_lo, WL(A), logI
            wave_length = float(parts[2])
            # x1 is the upper bound and x2 the lower bound, as in the question
            if x2 <= wave_length <= x1:
                ofile.write(line)

You can then call it this way:

extract_wave_lengths(100000, 5000, "/path/to/input/file", "/path/to/output/file")

edited Jan 18, 2017 at 15:23

answered Jan 18, 2017 at 14:37


1 Comment

furas Over a year ago

wave_length is text, so you have to convert it to float (or int) before comparing x2 <= wave_length <= x1.
