
I need to extract some data from a .dat file, which I usually do with

import numpy as np
file = np.loadtxt('blablabla.dat')

Here my data are not separated by a specific delimiter but have predefined field widths (numbers of digits), and some lines have no value in some columns. Here is a sample to make it clear:

 3  0  36  0  0 0  0   0    0  0         99. 
-2  0   0  0  0 0  0   0    0  0         99. 
 2  0   0  0  0 0  0   0    0  0 .LA.0?.  3. 
 5  0   0  0  0 2  4   0    0  0 .SAS7?. 99. 
-5  0   0  0  0 0  0   0    0  0         99. 
99  0   0  0  0 0  0   0    0  0 .S..3*.  3.5

My little code above gets the error:

# Convert each value according to its column and store
ValueError: Wrong number of columns at line 3

Does someone have an idea about how to collect this kind of data?

  • By the way, I have the format of the file, which for the given example is: I2 / I3 / I2 / I2 / I1 / I2 / I3 / I4 / I2 / A7 / F4.1 (Commented Feb 29, 2016 at 12:09)
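As an aside, Fortran-style format descriptors like these can be translated mechanically into a list of field widths. A minimal sketch (the helper name `fortran_widths` is made up for illustration; note these widths cover only the fields themselves, not any blank padding between them in the actual file, so they may need adjusting before being fed to a fixed-width reader):

```python
import re

def fortran_widths(spec):
    """Turn a Fortran-style format spec like 'I2 / A7 / F4.1'
    into a list of field widths (the digits after I/A/F)."""
    widths = []
    for field in spec.split('/'):
        # match the type letter (I, A, or F) followed by the width
        m = re.match(r'\s*([IAF])(\d+)', field.strip())
        if m:
            widths.append(int(m.group(2)))
    return widths

print(fortran_widths('I2 / I3 / I2 / I2 / I1 / I2 / I3 / I4 / I2 / A7 / F4.1'))
# [2, 3, 2, 2, 1, 2, 3, 4, 2, 7, 4]
```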

2 Answers


numpy.genfromtxt seems to be what you want; you can specify field widths for each column, and it treats missing data as NaN.

For this case:

import numpy as np
data = np.genfromtxt('blablabla.dat',delimiter=[2,3,4,3,3,2,3,4,5,3,8,5])

If you want to keep information in the string part of the file, you could read twice and specify the usecols parameter:

import numpy as np
number_data = np.genfromtxt('blablabla.dat',delimiter=[2,3,4,3,3,2,3,4,5,3,8,5],\
                            usecols=(0,1,2,3,4,5,6,7,8,9,11))
string_data = np.genfromtxt('blablabla.dat',delimiter=[2,3,4,3,3,2,3,4,5,3,8,5],\
                            usecols=(10),dtype=str)
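To check the idea without the actual file, here is a self-contained run of the same two-pass approach on an inline copy of the sample (the `delimiter` widths below are one reading of the sample's layout and would need verifying against the real file; `autostrip=True` is added so the string field comes back without padding):

```python
import io
import numpy as np

# inline copy of the sample rows for demonstration
sample = (
    " 3  0  36  0  0 0  0   0    0  0         99. \n"
    "-2  0   0  0  0 0  0   0    0  0         99. \n"
    " 2  0   0  0  0 0  0   0    0  0 .LA.0?.  3. \n"
)
widths = [2, 3, 4, 3, 3, 2, 3, 4, 5, 3, 8, 5]

# first pass: the numeric columns (skipping the string column 10)
numbers = np.genfromtxt(io.StringIO(sample), delimiter=widths,
                        usecols=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11))

# second pass: only the string column
strings = np.genfromtxt(io.StringIO(sample), delimiter=widths,
                        usecols=(10,), dtype=str, autostrip=True)

print(numbers.shape)  # (3, 11)
print(strings)
```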

2 Comments

I think it should be usecols=(10,)
I finally got my data extracted, thanks for the help. My file is too big to use exactly the same method (the delimiter array is 375 entries long), but the usecols option helps!

What you essentially need is to get the list of empty "column" positions that serve as delimiters. This will get you started:

In [108]: table = ''' 3  0  36  0  0 0  0   0    0  0         99. 
   .....: -2  0   0  0  0 0  0   0    0  0         99. 
   .....:  2  0   0  0  0 0  0   0    0  0 .LA.0?.  3. 
   .....:  5  0   0  0  0 2  4   0    0  0 .SAS7?. 99. 
   .....: -5  0   0  0  0 0  0   0    0  0         99. 
   .....: 99  0   0  0  0 0  0   0    0  0 .S..3*.  3.5'''.split('\n')

In [110]: max_row_len = max(len(row) for row in table)

In [111]: from functools import reduce  # reduce lives in functools on Python 3

In [117]: spaces = reduce(lambda res, row: res.intersection(idx for idx, c in enumerate(row) if c == ' '), table, set(range(max_row_len)))

This code starts from the set of all character positions in the longest row, and reduce keeps only the positions that hold a space in every row.
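One way to take this the rest of the way (a sketch, not part of the original answer): turn the all-blank positions into (start, stop) slices and cut each row into fields with them.

```python
from functools import reduce  # reduce lives in functools on Python 3

table = [
    ' 3  0  36  0  0 0  0   0    0  0         99. ',
    '-2  0   0  0  0 0  0   0    0  0         99. ',
    ' 2  0   0  0  0 0  0   0    0  0 .LA.0?.  3. ',
]
max_row_len = max(len(row) for row in table)

# positions that are blank in every row
spaces = reduce(
    lambda res, row: res.intersection(
        idx for idx, c in enumerate(row) if c == ' '),
    table, set(range(max_row_len)))

# collapse the blank positions into (start, stop) slices for the data fields
fields = []
start = 0
for pos in sorted(spaces) + [max_row_len]:
    if pos > start:
        fields.append((start, pos))
    start = pos + 1

rows = [[row[a:b].strip() for a, b in fields] for row in table]
print(rows[0])
```

Missing string fields come out as empty strings, so every row ends up with the same number of columns.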

1 Comment

Thanks Volcano, I didn't use what you propose, but your code does work if someone wants to understand more.
