
I have the following data in geo.dat

id  lon  lat inhab  name
 1   9.  45.   100  Ciriè
 2  10.  45.    60  Acquanegra

and I get it in a ndarray

import numpy as np
data = np.genfromtxt('geo.dat', dtype=None, names=True)

so far, so good, I have a data structure that I can address by column name

print(data['name'][1]) #>>> Acquanegra

Next step, and question: I have a function that takes as input two vectors of geographical coordinates (data['lon'] and data['lat'], of course) and returns two arrays x and y of projected positions on a map (this works OK).

I can live with separate vectors x and y, but I'd like to augment data with two new columns, data['x'] and data['y']. My naive attempt

data['x'], data['y'] = convert(data['lon'], data['lat'])

raised a ValueError: no field of name x, teaching me that data has some traits of a dictionary but is not a dictionary.

Is it possible to do what I want? Thanks in advance.

Please note that .hstack() doesn't work with structured arrays (a.k.a. record arrays), and most previous answers work only for homogeneous arrays (the exception is the one mentioned by Warren in the comment below).


PS I'd prefer not to use pandas.

  • See stackoverflow.com/questions/25427197/… Commented Jul 4, 2017 at 13:19
  • @WarrenWeckesser AH! I suspected that the key was to manipulate the .dtype of the array, but I would never have devised all the steps involved... I've upvoted your answer, of course. May I ask whether "structured array" in my title is terminologically correct? If so, I'd like to answer my own question by summarizing your answer and linking to it, because I feel that the title of the question you answered is a bit generic. Commented Jul 4, 2017 at 13:28
  • But then you would be creating a duplicate question, and stackoverflow frowns on that. It would be better to edit the title of the other question. In fact, I'll do that right now... Commented Jul 4, 2017 at 13:31
  • The key point about hstack or other concatenate functions is that dtype fields are not an axis (even though there are some similarities in data layout). reshape also doesn't work across that axis/field boundary. Commented Jul 4, 2017 at 16:45
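To illustrate the point made in the last comment, here is a minimal sketch. The array is a hypothetical in-memory stand-in for the numeric columns of geo.dat, not the output of genfromtxt. It shows that a structured array has a single axis, with the fields living in the dtype, which is why assigning to an undeclared field fails.

```python
import numpy as np

# Hypothetical stand-in for the first four columns of geo.dat
data = np.array(
    [(1, 9.0, 45.0, 100), (2, 10.0, 45.0, 60)],
    dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8'), ('inhab', 'i8')],
)

# One axis of length 2: the fields are part of the dtype, not of the shape
print(data.shape)        # (2,)
print(data.dtype.names)  # ('id', 'lon', 'lat', 'inhab')

# Assigning to a field that the dtype does not declare raises ValueError
try:
    data['x'] = np.zeros(2)
    assignment_failed = False
except ValueError:
    assignment_failed = True

print(assignment_failed)
```

Since there is no second axis to stack along, adding a column means building a new array with an extended dtype.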

1 Answer


You can use np.lib.recfunctions:

import numpy.lib.recfunctions as rfn

data = rfn.append_fields(data, ['x', 'y'], [x, y])
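
A self-contained sketch of the call above, under some assumptions: the array is built in memory instead of being read from geo.dat, and convert() is a hypothetical placeholder for the asker's projection function. The usemask=False argument, mentioned in the comments, asks append_fields for a plain ndarray instead of a masked array.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical in-memory stand-in for the contents of geo.dat
data = np.array(
    [(1, 9.0, 45.0, 100, 'Ciriè'), (2, 10.0, 45.0, 60, 'Acquanegra')],
    dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8'),
           ('inhab', 'i8'), ('name', 'U16')],
)

def convert(lon, lat):
    # Placeholder projection (an assumption), not the asker's real function
    return lon * 100.0, lat * 100.0

x, y = convert(data['lon'], data['lat'])

# usemask=False requests a plain ndarray rather than a masked array
data = rfn.append_fields(data, ['x', 'y'], [x, y], usemask=False)

print(data.dtype.names)
print(data['x'])
```

Note that append_fields returns a new array, so the result has to be rebound to data; the original array is not modified in place.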

4 Comments

Works. Sort of. The array returned by rfn.append_fields() is a collection of masked arrays, while previously both the columns of data and the arrays returned by convert() were non-masked arrays. Could you mention in your answer the origin and the (possible) implications of this unexpected behaviour? Further, you may want to add your answer to the question mentioned by Warren in a comment on my question, because I feel that my question is going to be closed...
To comment on my comment, the optional argument usemask=False could take care of my issue...
rfn.append_fields performs the same kind of action as @Warren's link - make a new array of desired dtype and size and copy fields by name. It is more general in that it allows for missing data, can make masked_arrays and can make recarrays.
recfunctions are a bit buggy, e.g. stackoverflow.com/questions/42364725/…; stackoverflow.com/questions/44769632/…. They aren't heavily used, and interact with other array subclasses.
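
For reference, the approach described in the comments (make a new array with the desired dtype and copy the fields by name) might be sketched as follows. The helper add_fields and the sample array are hypothetical names introduced here for illustration. This is not the recfunctions implementation and it does not handle missing data or masks.

```python
import numpy as np

def add_fields(base, names, arrays):
    """Return a copy of structured array `base` with extra fields appended.

    Hypothetical helper: a sketch of the "new dtype, copy by name" idea.
    """
    arrays = [np.asarray(a) for a in arrays]
    # Field specs of the existing array, followed by specs for the new fields
    old_spec = [(name, base.dtype[name]) for name in base.dtype.names]
    new_spec = [(name, a.dtype) for name, a in zip(names, arrays)]
    out = np.empty(base.shape, dtype=old_spec + new_spec)
    # Copy the existing fields, then the new ones, field by field
    for name in base.dtype.names:
        out[name] = base[name]
    for name, a in zip(names, arrays):
        out[name] = a
    return out

# Hypothetical stand-in for the numeric columns of geo.dat
data = np.array(
    [(1, 9.0, 45.0), (2, 10.0, 45.0)],
    dtype=[('id', 'i8'), ('lon', 'f8'), ('lat', 'f8')],
)
augmented = add_fields(data, ['x', 'y'], [data['lon'] * 100.0, data['lat'] * 100.0])
print(augmented.dtype.names)
```

The result is an ordinary (non-masked) structured ndarray, which avoids the masked-array surprise at the cost of writing the copy loop yourself.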
