Efficient way of extracting data into a matrix or numpy array in python

Question

I would like to use Python to extract the numeric data from a .txt file, discarding the text lines it contains.

I have a file, say ABC.txt as follows:

STEP = 1

22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
STEP = 2

22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
STEP = 3

22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000

Disregarding the 'STEP = ' lines and the blank line that follows each of them, I want to store all the numeric data in a NumPy array.

I tried the following script, which works:

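import numpy as np

with open("ABC.txt", "r") as f:
    lines = f.readlines()

# 3 steps x 8 data rows = 24 rows, 2 columns
data = np.zeros([24, 2])

kk = 0

for ii in range(3):
    # step ii occupies lines 10*ii .. 10*ii+9: the "STEP" header, a blank line,
    # then 8 data lines at indices 10*ii+2 .. 10*ii+9
    for jj in range(10*ii + 2, 10*ii + 9 + 1):
        data[kk, :] = np.fromstring(lines[jj], dtype=float, sep=' ')
        kk = kk + 1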

Is there a more direct way of doing this?
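One possibly more direct route, not used in the answers below, is to let NumPy's own text reader drop the header lines. The sketch below assumes every non-numeric line starts with "STEP": passing comments="STEP" makes np.loadtxt treat everything from "STEP" to the end of the line as a comment, and it skips lines left empty (as well as the blank lines already in the file). SAMPLE is a hypothetical stand-in for ABC.txt so the snippet runs on its own.

```python
import io

import numpy as np

# Hypothetical stand-in for ABC.txt: two steps with two data rows each.
SAMPLE = """STEP = 1

22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
STEP = 2

22.530183726628522 0.0000000000000000
22.530183726628522 0.0000000000000000
"""

# Everything from "STEP" to the end of a line is treated as a comment;
# lines that end up empty, and the existing blank lines, are skipped.
data = np.loadtxt(io.StringIO(SAMPLE), comments="STEP")
print(data.shape)  # (4, 2)

# With the real file this would be: data = np.loadtxt("ABC.txt", comments="STEP")
```

If every step has the same number of rows (8 in ABC.txt), data.reshape(-1, 8, 2) then gives one 8 x 2 block per step.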

@MadPhysicist I think the two loops are iterating over the right line numbers.
– bb1, Commented Feb 17, 2021 at 3:43
A common way is to read the file line by line. If a line has data, split it and append it to a list; np.array(alist, dtype=float) will convert the list of lists to a numeric array. The STEP lines can be ignored or used to start a new group.
– hpaulj, Commented Feb 17, 2021 at 3:55
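The line-by-line approach described in the comment above can be sketched as follows, here using each STEP line to start a new group. read_steps and SAMPLE are hypothetical names; the sketch assumes the file begins with a STEP header and that every step has the same number of rows, so that np.array can build a regular 3-D array.

```python
import io

import numpy as np

# Hypothetical stand-in for ABC.txt.
SAMPLE = """STEP = 1

1.0 0.5
2.0 0.5
STEP = 2

3.0 0.5
4.0 0.5
"""


def read_steps(f):
    """Return one list of split rows per STEP block (hypothetical helper).

    Assumes the first non-blank line is a STEP header.
    """
    steps = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("STEP"):
            steps.append([])  # a STEP header starts a new group
        else:
            steps[-1].append(line.split())  # numeric fields, still strings
    return steps


steps = read_steps(io.StringIO(SAMPLE))
# dtype=float converts the numeric strings while building the array.
arr = np.array(steps, dtype=float)
print(arr.shape)  # (2, 2, 2)

# All rows in one (n, 2) array, as in the question.
flat = arr.reshape(-1, 2)
```

If the steps can have different lengths, keep steps as a list and convert each group separately with np.array(group, dtype=float) instead.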

bb1 · Accepted Answer · 2021-02-17 03:53:16Z

1

You can try this:

import re
import numpy as np

with open("ABC.txt") as f:
    s = f.read()

# get a list of all lines of the text file which start with a digit
# (re.M makes ^ match at the start of every line, not only at the start of s)
lines = re.findall(r"^\d.*", s, re.M)

# split every line at whitespace and convert
# the resulting substrings into floats
numlist = [list(map(float, line.split())) for line in lines]

# convert the resulting list of lists of floats into a numpy array
data = np.array(numlist)

edited Feb 17, 2021 at 3:53

answered Feb 17, 2021 at 3:37

bb1



Mechanician Over a year ago

Thank you! But could you please explain what we are doing here?

bb1 Over a year ago

I added comments to the code. Let me know if this is sufficient.
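The role of the re.M flag in the answer's pattern can be seen on a small string (text here is a made-up sample, not the real file): without it, ^ only matches at the very start of the string, which begins with "S"; with it, ^ matches at the start of every line, and .* stops at the newline.

```python
import re

# Made-up sample in the same layout as ABC.txt.
text = "STEP = 1\n\n1.5 0.0\n2.5 0.0\nSTEP = 2\n\n3.5 0.0\n"

# Without re.M, ^ anchors only at the start of the whole string.
without_m = re.findall(r"^\d.*", text)
print(without_m)  # []

# With re.M, ^ anchors at the start of each line; "." does not match "\n".
with_m = re.findall(r"^\d.*", text, re.M)
print(with_m)  # ['1.5 0.0', '2.5 0.0', '3.5 0.0']
```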

popeye · Accepted Answer · 2021-02-17 04:03:02Z

1

Alternatively, if you don't have access to external libraries and still want to perform this task, you can do the following:

with open("ABC.txt", "r") as f:
    lines = f.readlines()

arr = list()

for line in lines:
    if line[0].isdecimal():  # keep only lines that begin with a digit
        arr.append([float(x) for x in line.split()])  # convert each field to float

The above can also be written as a list comprehension; both give the same result:

arr1 = [[float(x) for x in line.split()] for line in lines if line[0].isdecimal()]
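Both the line[0].isdecimal() test and the ^\d regex only keep lines whose first character is a digit, so a data line starting with "-" (a negative value) or with leading spaces would be silently dropped. That does not happen with the sample file shown, but if it might, one pure-Python alternative is to try to parse every non-blank line and skip the ones that fail. parse_numeric_rows is a hypothetical helper and the input lines are made up for illustration.

```python
def parse_numeric_rows(lines):
    """Keep every line whose whitespace-separated fields all parse as floats.

    Hypothetical helper: header lines such as "STEP = 1" fail to parse and are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(v) for v in fields])
        except ValueError:
            continue  # non-numeric line, e.g. "STEP = 1"
    return rows


lines = ["STEP = 1\n", "\n", "-1.5 2.0e-3\n", "  4.0 5.0\n", "STEP = 2\n", "\n", "6.0 7.0\n"]

rows = parse_numeric_rows(lines)
print(rows)  # [[-1.5, 0.002], [4.0, 5.0], [6.0, 7.0]]

# The first-character test keeps only "6.0 7.0\n" from the same input.
kept_by_isdecimal = [line for line in lines if line[0].isdecimal()]
```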

answered Feb 17, 2021 at 4:03

popeye

