
I have a file with this form:

label1, value1, value2, value3,
label2, value1, value2, value3,
...

I want to read it with NumPy's loadtxt function so that each label is stored together with its values. The final result should be an array of arrays, where each inner array holds the label and an array of features, like this:

array([[label1, [value1, value2, value3]],
       [label2, [value1, value2, value3]]])

I tried the following, but it did not work:

c = StringIO(u"text.txt")
np.loadtxt(c,
   dtype={'samples': ('label', 'features'), 'formats': ('s9',np.float)},
   delimiter=',', skiprows=0)

Any ideas?


2 Answers

3

The most modern and versatile way to do this is to use pandas, whose parser has many more options and handles labels.

Suppose your file contains:

A,7,5,1
B,4,2,7

Then:

In [29]: import pandas as pd
In [30]: df=pd.read_csv('data.csv',sep=',',header=None,index_col=0)

In [31]: df
Out[31]: 
   1  2  3
0         
A  7  5  1
B  4  2  7

You can now easily convert it to a structured array:

In [32]: a=df.T.to_records(index=False)
Out[32]: 
rec.array([(7, 4), (5, 2), (1, 7)], 
          dtype=[('A', '<i8'), ('B', '<i8')])

In [33]: a['A']
Out[33]: array([7, 5, 1], dtype=int64)

With loadtxt you would have to do a lot of low-level operations manually.
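If the goal is a label plus a plain 2-D feature array rather than a structured array, a minimal sketch along these lines may be enough. It assumes the same two-line file as above, fed through io.StringIO as a stand-in for data.csv; the names csv_text, labels, features and by_label are illustrative only.

```python
import io

import pandas as pd

# Assumption: in-memory stand-in for the contents of data.csv
csv_text = "A,7,5,1\nB,4,2,7\n"

# First column becomes the index (the labels), the rest are the values
df = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

labels = df.index.to_numpy()   # one label per row
features = df.to_numpy()       # 2-D array, one row of values per label

# Optional: look up a feature row by its label
by_label = {label: row for label, row in zip(labels, features)}
print(by_label["B"])
```

Keeping the labels in the index and the numbers in a single 2-D array avoids mixing strings and numbers in one NumPy array.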


2

You are on the right track with defining the dtype. You are just missing the field shape.

I'll demonstrate:

A 'text' file - a list of lines (bytes in Py3):

In [95]: txt=b"""label1, 12, 23.2, 232
   ....: label2, 23, 2324, 324
   ....: label3, 34, 123, 2141
   ....: label4, 0, 2, 3
   ....: """

In [96]: txt=txt.splitlines()

A dtype with 2 fields, one with strings, the other with floats (3 for 'field shape'):

In [98]: dt=np.dtype([('label','U10'),('values', 'float',(3))])

In [99]: data=np.genfromtxt(txt,delimiter=',',dtype=dt)

In [100]: data
Out[100]: 
array([('label1', [12.0, 23.2, 232.0]), ('label2', [23.0, 2324.0, 324.0]),
       ('label3', [34.0, 123.0, 2141.0]), ('label4', [0.0, 2.0, 3.0])], 
      dtype=[('label', '<U10'), ('values', '<f8', (3,))])

In [101]: data['label']
Out[101]: 
array(['label1', 'label2', 'label3', 'label4'], 
      dtype='<U10')

In [103]: data['values']
Out[103]: 
array([[  1.20000000e+01,   2.32000000e+01,   2.32000000e+02],
       [  2.30000000e+01,   2.32400000e+03,   3.24000000e+02],
       [  3.40000000e+01,   1.23000000e+02,   2.14100000e+03],
       [  0.00000000e+00,   2.00000000e+00,   3.00000000e+00]])

With this definition, the numeric values can be accessed as a 2d array. Sub-arrays like this are underappreciated.
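As a rough, self-contained sketch of how the result might be used, the snippet below reads two of the sample lines from an io.StringIO object, takes a column-wise statistic over the 2d values, and builds the (label, features) pairs the question asks for. The variable names text and pairs are illustrative, and encoding="utf-8" is passed on the assumption of a NumPy version recent enough to accept that argument.

```python
import io

import numpy as np

# Assumption: a shortened copy of the sample text from this answer
text = "label1, 12, 23.2, 232\nlabel2, 23, 2324, 324\n"

# One string field plus one sub-array field holding 3 floats
dt = np.dtype([('label', 'U10'), ('values', 'f8', (3,))])
data = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=dt,
                     encoding="utf-8")

# The sub-array field behaves like an ordinary (n_rows, 3) float array
column_means = data['values'].mean(axis=0)

# Pair each label with its own row of features
pairs = [(rec['label'], rec['values']) for rec in data]
print(pairs[0])
```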

The dtype could have been specified with the dictionary syntax, but I'm more familiar with the list-of-tuples form.

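Similar dtype specs (note that the comma-string form names the fields f0 and f1, 'f' means float32, and 'S10' stores bytes rather than unicode):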

np.dtype("U10, (3,)f")
np.dtype({'names':['label','values'], 'formats':['S10','(3,)f']})
np.genfromtxt(txt,delimiter=',',dtype='S10,(3,)f')
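These spellings have the same two-field layout but are not byte-for-byte identical. A small sketch for comparing them, with the variable names explicit, from_string and from_dict chosen only for illustration:

```python
import numpy as np

# List-of-tuples form used earlier in this answer
explicit = np.dtype([('label', 'U10'), ('values', 'f8', (3,))])

# Comma-string form: fields are auto-named f0, f1 and 'f' is float32
from_string = np.dtype("U10, (3,)f")

# Dictionary form: named fields, but a 10-byte string and float32 values
from_dict = np.dtype({'names': ['label', 'values'],
                      'formats': ['S10', '(3,)f']})

for dt in (explicit, from_string, from_dict):
    print(dt.names, dt.itemsize)
```

Checking names and itemsize like this is a quick way to see whether two specs can be swapped, for example before using view.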

===============================

I think that this txt, if parsed with dtype=None, would produce:

In [30]: y
Out[30]: 
array([('label1', 12.0, 23.2, 232.0), ('label2', 23.0, 2324.0, 324.0),
       ('label3', 34.0, 123.0, 2141.0), ('label4', 0.0, 2.0, 3.0)], 
      dtype=[('f0', '<U10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])

This could be converted to the sub-array form with

y.view(dt)

This works as long as the underlying data representation (seen as a flat list of bytes) is compatible (here 10 unicode characters (40 bytes), and 3 floats, per record).
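A minimal sketch of that conversion, building the flat array by hand instead of parsing it so the byte layout is explicit. The names flat_dt, sub_dt and z are illustrative:

```python
import numpy as np

# Flat layout: one string field followed by three separate float fields
flat_dt = np.dtype([('f0', 'U10'), ('f1', 'f8'), ('f2', 'f8'), ('f3', 'f8')])

# Sub-array layout: the same bytes, with the floats grouped into one field
sub_dt = np.dtype([('label', 'U10'), ('values', 'f8', (3,))])

y = np.array([('label1', 12.0, 23.2, 232.0),
              ('label2', 23.0, 2324.0, 324.0)], dtype=flat_dt)

# The view is only meaningful when both records occupy the same bytes
assert flat_dt.itemsize == sub_dt.itemsize

z = y.view(sub_dt)   # no copy, just a reinterpretation of the buffer
print(z['values'])
```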

4 Comments

That's very useful for me, but I get the error "size of tuple must match number of fields." My actual txt file has the same form as the posted example, except that there is a label and 22 other values, so my code was: txt=StringIO(u"dataset.txt") dt=np.dtype([('label','U10'),('features', 'float',(22))]) data=np.genfromtxt(txt,delimiter=',',dtype=dt)
Yes, the total number of fields, named or in sub-arrays, needs to match the number of columns in the file, or in your usecols parameter.
Yes, I noticed that; however, I don't know why it does not work and still shows me that error.
Try a dtype=None to see what sort of dtype it deduces from the data. That might help you correct your definition. Within limits you might even be able to translate from that dtype to yours with astype or view.
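Two details in the thread may be relevant here, offered as a hedged sketch rather than a diagnosis. First, StringIO(u"dataset.txt") wraps the literal string "dataset.txt", not the file's contents, so the parser sees a single column; passing the path itself lets genfromtxt open the file. Second, the sample in the question ends each line with a comma, which adds an empty extra column; usecols is one way to keep the column count in line with the dtype. The temporary file, the 3-feature layout and the name n_features below are assumptions for illustration.

```python
import os
import tempfile

import numpy as np

n_features = 3  # assumption: the real file would use 22
dt = np.dtype([('label', 'U10'), ('features', 'f8', (n_features,))])

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file with a trailing comma on every line
    path = os.path.join(tmp, "dataset.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("label1, 1, 2, 3,\nlabel2, 4, 5, 6,\n")

    # Pass the path directly (not StringIO of the file name) and
    # keep only the label column plus the n_features value columns
    data = np.genfromtxt(path, delimiter=',', dtype=dt,
                         usecols=range(n_features + 1), encoding="utf-8")

print(data['label'])
print(data['features'])
```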
