
I can't find a way to read a given file into a NumPy matrix. I tried np.loadtxt() but couldn't get the data out of it.

My file looks like this: it has 9 columns, and every field except the first is a float.

M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19

I also tried reading the data into a list and then converting that to a NumPy matrix, but that failed too.

  • A numpy matrix can only hold one type of data, not both strings and floats: stackoverflow.com/questions/6999617/… Commented Sep 9, 2016 at 14:27
  • @StefanS In that case, is there a way to convert a 2D list of homogeneous type to a numpy matrix? Commented Sep 9, 2016 at 14:35
  • What was wrong with loadtxt? An error, or results you don't understand? What do you expect or want? How is the first column supposed to be handled? Commented Sep 9, 2016 at 16:26
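Regarding the comment asking about a homogeneous 2D list: once the label column is split off, every remaining row is purely numeric, and np.array turns the list of lists into a 2D float array. A minimal sketch, assuming the two sample lines are held in a Python string; a real file could be iterated line by line in the same way.

```python
import numpy as np

# The two sample rows from the question, held in memory for illustration.
text = """M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19"""

labels = []
rows = []
for line in text.splitlines():
    if not line.strip():
        continue  # skip blank lines
    fields = line.strip().split(",")
    labels.append(fields[0])                     # non-numeric first column
    rows.append([float(x) for x in fields[1:]])  # the 8 numeric columns

# Every inner list now holds only floats, so this is a plain 2D float array.
matrix = np.array(rows)
print(labels)        # ['M', 'F']
print(matrix.shape)  # (2, 8)
```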

2 Answers


You might want to consider using pandas instead: it's much better suited to heterogeneous data (columns of different types), and its read_csv function will read your data file and convert it directly into something you can work with.

You can give each column a name with the names argument; if you don't, read_csv will interpret the first data row as the column headings.

>>> import pandas as pd
>>> data = pd.read_csv("/tmp/data.txt",
                 names=['sex', 'one', 'two', 'three', 'four',
                        'five', 'six', 'seven', 'eight'])
>>> print(data)
  sex    one   two  three    four    five     six  seven  eight
0   M  0.475  0.37  0.125  0.5095  0.2165  0.1125  0.165      9
1   F  0.550  0.44  0.150  0.8945  0.3145  0.1510  0.320     19
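Since the original goal was a NumPy array, a DataFrame like this one can be split into the label column and a plain float array with DataFrame.to_numpy() (pandas 0.24 and later). A sketch using the same column names, with io.StringIO standing in for /tmp/data.txt so it runs without the file:

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for /tmp/data.txt.
csv_text = """M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
"""

names = ['sex', 'one', 'two', 'three', 'four',
         'five', 'six', 'seven', 'eight']
data = pd.read_csv(io.StringIO(csv_text), names=names)

# Labels as a 1D array; all numeric columns cast to one float64 2D array.
labels = data['sex'].to_numpy()
features = data[names[1:]].to_numpy(dtype=float)
print(features.shape)  # (2, 8)
```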

With your sample as a list of lines:

In [1]: txt=b"""
   ...: M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
   ...: F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
   ...: """
In [2]: txt=txt.splitlines()

genfromtxt can load it with dtype=None:

In [16]: data = np.genfromtxt(txt, delimiter=',', dtype=None)
In [17]: data
Out[17]: 
array([(b'M', 0.475, 0.37, 0.125, 0.5095, 0.2165, 0.1125, 0.165, 9),
       (b'F', 0.55, 0.44, 0.15, 0.8945, 0.3145, 0.151, 0.32, 19)], 
      dtype=[('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8'), ('f8', '<i4')])
In [18]: data['f0']
Out[18]: 
array([b'M', b'F'], 
      dtype='|S1')
In [19]: data['f3']
Out[19]: array([ 0.125,  0.15 ])
In [20]: 

The result is a 1-D structured array (here with 2 elements) with many fields, which are accessed by name. The first field is inferred to be a string, the rest floats, except the last, which is an integer.
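If a regular 2D float array is wanted from a structured result like this, numpy.lib.recfunctions.structured_to_unstructured (NumPy 1.16+) can stack the numeric fields into one. A sketch assuming the same sample lines; it passes encoding=None, so the label field comes back as str rather than bytes, which differs slightly from the session above.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

text = """M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
"""

# dtype=None lets genfromtxt infer a type per column (str, floats, int).
data = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=None,
                     encoding=None)

# Every field except the first ('f0', the label) is numeric.
numeric_fields = list(data.dtype.names[1:])
features = rfn.structured_to_unstructured(data[numeric_fields], dtype=float)

print(data['f0'])      # ['M' 'F']
print(features.shape)  # (2, 8)
```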

I could be more specific about the dtype and define a field that spans multiple columns:

In [21]: data=np.genfromtxt(txt,delimiter=',',dtype=['S3','8float'])
In [22]: data
Out[22]: 
array([(b'M', [0.475, 0.37, 0.125, 0.5095, 0.2165, 0.1125, 0.165, 9.0]),
       (b'F', [0.55, 0.44, 0.15, 0.8945, 0.3145, 0.151, 0.32, 19.0])], 
      dtype=[('f0', 'S3'), ('f1', '<f8', (8,))])
In [23]: data['f1']
Out[23]: 
array([[  0.475 ,   0.37  ,   0.125 ,   0.5095,   0.2165,   0.1125,
          0.165 ,   9.    ],
       [  0.55  ,   0.44  ,   0.15  ,   0.8945,   0.3145,   0.151 ,
          0.32  ,  19.    ]])

The f1 field is a 2d array of shape (2,8).

np.loadtxt will also work, but its dtype interpretation isn't as flexible. Passing the dtype from the genfromtxt result produces the same array:

 datal=np.loadtxt(txt,delimiter=',',dtype=data.dtype)
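loadtxt can also skip the label column entirely with usecols, which gives a plain float array with no structured dtype; a second call can read just the labels as strings. A sketch under the same assumption of in-memory sample lines:

```python
import io

import numpy as np

text = """M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
"""

# Columns 1..8 are numeric; loadtxt's default dtype is float.
features = np.loadtxt(io.StringIO(text), delimiter=',',
                      usecols=list(range(1, 9)))

# Column 0 holds the labels; read it separately as strings.
labels = np.loadtxt(io.StringIO(text), delimiter=',', usecols=0, dtype=str)

print(features.shape)  # (2, 8)
print(labels)          # ['M' 'F']
```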

pandas also has a good CSV reader that is faster and more flexible. It's a good choice if you are already working with pandas.
