I am importing data using numpy.genfromtxt, and I would like to add a field of values derived from some of those within the dataset. As this is a structured array, it seems that the simplest and most efficient way of adding a new column is numpy.lib.recfunctions.append_fields(). I found a good description of this library here.

Is there a way of doing this without copying the array, perhaps by forcing genfromtxt to create an empty column to which I can append derived values?

  • The first parameter to genfromtxt can be a generator, within which you can create an empty column on each line of your file while you're reading it in. Commented Apr 10, 2013 at 5:29
  • mtadd, I've just run into this problem again, and I'm wondering if you could illustrate what you are referring to in an answer. Thanks! Commented Apr 8, 2014 at 19:53

2 Answers

Here's a simple example that uses a generator to add a field to a data file as genfromtxt reads it.

Our example data file will be data.txt with the contents:

1,11,1.1
2,22,2.2
3,33,3.3

Reading it directly gives:

In [19]: np.genfromtxt('data.txt',delimiter=',')
Out[19]:
array([[  1. ,  11. ,   1.1],
       [  2. ,  22. ,   2.2],
       [  3. ,  33. ,   3.3]])

If we make a generator such as:

def genfield():
    # Prepend a placeholder "0" field to every line of the file.
    with open('data.txt') as f:
        for line in f:
            yield '0,' + line

which prepends a comma-delimited 0 to each line of the file, then:

In [22]: np.genfromtxt(genfield(),delimiter=',')
Out[22]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])

You can do the same thing inline with a generator expression:

In [26]: np.genfromtxt(('0,'+line for line in open('data.txt')),delimiter=',')
Out[26]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])
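
To get back to the original question (a structured array with a derived field, filled without a second copy), the same generator trick can be combined with the names argument. The following is only a minimal sketch: the field names a, b, c and derived, the helper with_placeholder, and the in-memory list standing in for data.txt are all assumptions made for illustration.

```python
import numpy as np

# Stand-in for the lines of data.txt (assumed content, same as the example above).
lines = ["1,11,1.1", "2,22,2.2", "3,33,3.3"]

def with_placeholder(rows, placeholder="0"):
    """Append a placeholder field to each row so genfromtxt allocates the column."""
    for row in rows:
        yield row.rstrip("\n") + "," + placeholder

# Hypothetical field names; dtype=float gives every named field a float type.
arr = np.genfromtxt(
    with_placeholder(lines),
    delimiter=",",
    names=["a", "b", "c", "derived"],
    dtype=float,
)

# Fill the pre-allocated field in place; no new array is created for the column.
arr["derived"] = arr["a"] + arr["b"]
print(arr)
```

The placeholder only reserves space in the dtype, so the derived values can have any float value; a different dtype for that field would need an explicit dtype specification instead of dtype=float.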

1 Comment

Brilliant. If only genfromtxt could take a regex for the delimiter, it would now be a perfect tool for me.

I was trying to make genfromtxt read this:

11,12,13,14,15
21,22,
31,32,33,34,35
41,42,43,,45

using:

import numpy as np
print(np.genfromtxt('tmp.txt', delimiter=',', filling_values='0'))

but it did not work, because the second line has fewer fields than the others. I had to change the input, adding commas to represent the empty columns:

11,12,13,14,15
21,22,,,
31,32,33,34,35
41,42,43,,45

then it worked, returning:

[[ 11.  12.  13.  14.  15.]
 [ 21.  22.   0.   0.   0.]
 [ 31.  32.  33.  34.  35.]
 [ 41.  42.  43.   0.  45.]]
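
If editing the file by hand is not an option, the padding can probably be done on the fly with a generator, as in the other answer. This is a rough sketch, not a tested recipe for every file: pad_rows is a hypothetical helper, the expected column count is passed in by the caller, and the list raw stands in for the lines of tmp.txt.

```python
import numpy as np

# Stand-in for the lines of tmp.txt (the ragged input shown above).
raw = ["11,12,13,14,15", "21,22,", "31,32,33,34,35", "41,42,43,,45"]

def pad_rows(rows, ncols, delimiter=","):
    """Yield each row with empty fields appended until it has ncols fields."""
    for row in rows:
        fields = row.rstrip("\n").split(delimiter)
        fields += [""] * (ncols - len(fields))  # no-op when the row is already full
        yield delimiter.join(fields)

# Empty fields are treated as missing and replaced by filling_values.
padded = np.genfromtxt(pad_rows(raw, 5), delimiter=",", filling_values=0)
print(padded)
```

Rows that already have the full number of fields pass through unchanged, so the generator is safe to apply to the whole file.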

3 Comments

Thanks Saullo. What I am actually looking for is to have an additional row, that does not exist in the data file that I am reading in.
@shootingstars to add additional rows you can use np.vstack((a, np.zeros((num_rows, a.shape[1]))))
My problem is that I call this with one of the fields being a datetime object, which prevents both the stack and numpy.lib.recfunctions.append_fields from merging the arrays.
