1

Let's say I have a file that looks like this:

text a

bla bla

1 2 3   
4 5 6

text b

bla

7 8 9
10 11 12

text c

bla bla bla

13 14 15
16 17 18

I am trying to extract only the number arrays and place them into a numpy array:

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]])

I tried using np.genfromtxt('test.txt', usecols=[0,1,2], invalid_raise=False), which returns:

array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [  7.,   8.,   9.],
       [ 10.,  11.,  12.],
       [ nan,  nan,  nan],
       [ 13.,  14.,  15.],
       [ 16.,  17.,  18.]])

But it doesn't create sub-arrays, and it converts the text into NaNs. Is there a better way of doing this?

6
  • Why is the 1 in the first line not included? Commented Apr 27, 2018 at 21:52
  • @chrisz: Because it's part of the text "text 1". I'm just interested in the number arrays after the "bla" Commented Apr 27, 2018 at 22:00
  • Read the lines as ordinary text, and pass the array lines to genfromtxt Commented Apr 27, 2018 at 22:17
  • or filter the bad rows out after parsing and reshape the rest. Commented Apr 27, 2018 at 22:54
  • @hpaulj: Thanks for your comment. Reading the text file with np.loadtxt raises an exception. Are you suggesting reading the file outside numpy? I don't understand the filtering out after parsing bit, could you maybe provide an answer to this post? Commented Apr 27, 2018 at 22:59
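
The two suggestions in the comments above (read the lines as ordinary text, keep only the numeric ones, then reshape) could be combined roughly as follows. This is only a sketch: the helper name `numeric_rows` and the in-memory `sample` list are made up for illustration, and the final reshape assumes every block holds exactly two rows of three integers, as in the question.

```python
import numpy as np

def numeric_rows(lines):
    """Yield a list of ints for each line made up solely of integers."""
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        try:
            yield [int(tok) for tok in tokens]
        except ValueError:
            continue  # skip text lines such as "text a" or "bla bla"

# Stand-in for the file from the question; with a real file you could pass
# the open file handle instead, e.g. numeric_rows(open('test.txt')).
sample = [
    "text a", "", "bla bla", "", "1 2 3   ", "4 5 6", "",
    "text b", "", "bla", "", "7 8 9", "10 11 12", "",
    "text c", "", "bla bla bla", "", "13 14 15", "16 17 18",
]

rows = np.array(list(numeric_rows(sample)))  # shape (6, 3)
arr = rows.reshape(-1, 6)                    # assumes 2 rows of 3 per block
print(arr)
```

Because the filtering happens before NumPy sees the data, no NaN rows are produced and the result keeps an integer dtype.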

2 Answers

1

You could use itertools.groupby, along the lines of:

>>> import itertools
>>> import numpy as np
>>> 
>>> content = """text a
... 
... bla bla
... 
... 1 2 3   
... 4 5 6
... 
... text b
... 
... bla
... 
... 7 8 9
... 10 11 12
... 
... text c
... 
... bla bla bla
... 
... 13 14 15
... 16 17 18"""
>>> 
>>> import io
>>> filelike = io.StringIO(content)

# you may want to refine this test
>>> allowed_characters = set('0123456789 ')
>>> def isnumeric(line):
...     return set() < set(line.strip()) <= allowed_characters
... 
>>> [np.genfromtxt(gr) for k, gr in itertools.groupby(filelike, isnumeric) if k]
[array([[1., 2., 3.],
       [4., 5., 6.]]), array([[ 7.,  8.,  9.],
       [10., 11., 12.]]), array([[13., 14., 15.],
       [16., 17., 18.]])]
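
The list above holds one 2-D float array per block, whereas the question asks for a single integer array with one row per block. A possible follow-up step, sketched here under the assumption that every block contains the same number of values, is to flatten each block and stack the results. The `dtype=int` argument and the `np.stack` call are additions for illustration, not part of the session above.

```python
import io
import itertools
import numpy as np

content = """text a

bla bla

1 2 3
4 5 6

text b

bla

7 8 9
10 11 12

text c

bla bla bla

13 14 15
16 17 18"""

allowed_characters = set('0123456789 ')

def isnumeric(line):
    # True for non-empty lines made up only of digits and spaces
    return set() < set(line.strip()) <= allowed_characters

filelike = io.StringIO(content)

# Parse each run of numeric lines as integers and flatten it to 1-D.
blocks = [
    np.genfromtxt(list(group), dtype=int).ravel()
    for is_num, group in itertools.groupby(filelike, isnumeric)
    if is_num
]

# Stacking requires all blocks to have the same length (assumed here).
arr = np.stack(blocks)
print(arr)
```

If the blocks can differ in size, keeping the plain list of arrays is probably the safer choice, since `np.stack` raises an error on mismatched shapes.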

0

You'll likely have to resort to a bit of "manual" parsing. Assuming the file has the form given, here's one solution (there are surely others):

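import numpy as np

filename = 'test.txt'  # the file from the question

def parser(fname):
    # Each block spans 7 lines. With the layout shown in the question, the
    # two numeric rows sit at 0-based offsets 4 and 5 within a block;
    # adjust these offsets if your file is laid out differently.
    with open(fname) as fh:
        for i, line in enumerate(fh):
            p = i % 7
            if p not in (4, 5):
                continue
            yield line.rstrip()

# Join the numeric lines into one string, parse it, then reshape so that
# each row holds the six values of one block.
a = ' '.join(parser(filename))
arr = np.fromstring(a, dtype=int, sep=' ')
arr = arr.reshape((-1, 6))
print(arr)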
