1

I have a numpy array full of objects (dtype=object) of the cftime class.

In [1]: a
Out[1]: 
array([cftime.DatetimeNoLeap(2000, 1, 1, 11, 29, 59, 999996, 5, 1),
       cftime.DatetimeNoLeap(2000, 1, 2, 11, 29, 59, 999996, 6, 2),
       cftime.DatetimeNoLeap(2000, 1, 3, 11, 29, 59, 999996, 0, 3)],
      dtype=object)

In [2]: type(a[0])
Out[2]: cftime._cftime.DatetimeNoLeap

Each of these objects has an attribute month.

a[0].month
Out[66]: 1

I'd like to get a new numpy array with the same shape, filled with this attribute for each element of the original array. Something like b = a.month, but that obviously fails, because a is a numpy array and has no month attribute. How can I achieve this result?

PS: of course I could do this with a plain Python loop, but I'd like to follow a fully numpy approach:

b=np.zeros_like(a, dtype=int)
for i in range(a.size):
    b[i] = a[i].month
4
  • Not a numpy answer but short of that you should use a loop/list comprehension. You can create a list by saying list = [ele] * n , but the elements all reference the same memory space - modifying any of them will affect the others. Loop/list comprehension avoids this. Commented Jan 15, 2019 at 13:36
  • Why the object array instead of a list? It's not any faster or easier. Commented Jan 15, 2019 at 16:30
  • Not my choice. This is how I get the data from a preliminary call to the num2date function of the cftime package. Commented Jan 16, 2019 at 9:46
  • cftime is written in cython (Python compiled to c (as much as possible)). So make sure you use its own functionality as much as possible. Commented Jan 16, 2019 at 17:06

2 Answers

4

You can use np.vectorize to map a function over every element of the array. In this case, define a lambda that extracts the month of each entry, lambda x: x.month:

np.vectorize(lambda x: x.month)(a)
array([1, 1, 1])
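A slightly more defensive variant, as a rough sketch: passing otypes tells np.vectorize the output dtype up front, so it does not have to probe the first element and it also works on empty input. The data here uses plain datetime.date objects as a stand-in for the cftime objects (that substitution is my assumption; anything with a month attribute behaves the same way).

```python
import datetime
import numpy as np

# Stand-in data: datetime.date objects instead of cftime objects (assumption).
a = np.array([datetime.date(2000, 1, 1),
              datetime.date(2000, 2, 2),
              datetime.date(2000, 3, 3)], dtype=object)

# otypes fixes the output dtype, so no trial call on the first element is needed.
get_month = np.vectorize(lambda x: x.month, otypes=[int])

b = get_month(a)                                  # same shape as a, integer dtype
empty = get_month(np.array([], dtype=object))     # empty input is fine with otypes set
```

This is still a Python-level loop under the hood, so it is a convenience rather than a speed-up.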

6 Comments

Using np.frompyfunc might be faster. vectorize uses it, but tends to be slower.
Thanks for your comment, will give it a look :-)
Did it help @Onturenio ? Don't forget to upvote/accept the answer if it did, thanks!
I tried it and it worked, but I also read about the fact that vectorize is pretty much a wrapper for a loop, so not really a numpy-performance approach. But I acknowledge that your solution works, so I'll accept it as solved. Still, I'll try to research other options that might be faster, perhaps the frompyfunc is the way to proceed.
Yes, that's right. I do not think this can be vectorized; after all, numpy is not really a tool for working with datetime objects or similar. So you'll have to use something similar to a map in standard Python, and in numpy you can use either vectorize or frompyfunc, as @hpaulj suggests.
2

I don't have cftime installed, so will demonstrate with regular datetime objects.

First make an array of datetime objects - the lazy way using numpy's own datetime dtype:

In [599]: arr = np.arange('2000-01-11','2000-12-31',dtype='datetime64[D]')
In [600]: arr.shape
Out[600]: (355,)

Make an object dtype array from that:

In [601]: arrO = arr.astype(object)

and a list of datetimes as well:

In [602]: alist = arr.tolist()

Timing for regular list comprehension:

In [603]: timeit [d.month for d in alist]
20.1 µs ± 62.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A list comprehension on an object dtype array is usually a bit slower than on a list (but faster than a list comprehension on a numeric dtype array):

In [604]: timeit [d.month for d in arrO]
30.7 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

frompyfunc - here it's slower; other times I've seen it 2x faster than a list comprehension:

In [605]: timeit np.frompyfunc(lambda x: x.month, 1,1)(arrO)
51 µs ± 32.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
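One thing to keep in mind with frompyfunc: the result is always an object dtype array, so an extra astype is needed if you want integers. A minimal sketch, rebuilding the same arrO as above:

```python
import numpy as np

# Same object-dtype array of datetime.date objects as in the timings above.
arrO = np.arange('2000-01-11', '2000-12-31', dtype='datetime64[D]').astype(object)

# frompyfunc(func, nin, nout): one input, one output.
get_month = np.frompyfunc(lambda x: x.month, 1, 1)

months_obj = get_month(arrO)       # dtype=object, holds Python ints
months = months_obj.astype(int)    # convert to an integer dtype
```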

vectorize is (nearly) always slower than frompyfunc (even though it uses frompyfunc for the actual iteration):

In [606]: timeit np.vectorize(lambda x: x.month, otypes=[int])(arrO)
76.7 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Here are samples of the arrays and list:

In [607]: arr[:5]
Out[607]: 
array(['2000-01-11', '2000-01-12', '2000-01-13', '2000-01-14',
       '2000-01-15'], dtype='datetime64[D]')
In [608]: arrO[:5]
Out[608]: 
array([datetime.date(2000, 1, 11), datetime.date(2000, 1, 12),
       datetime.date(2000, 1, 13), datetime.date(2000, 1, 14),
       datetime.date(2000, 1, 15)], dtype=object)
In [609]: alist[:5]
Out[609]: 
[datetime.date(2000, 1, 11),
 datetime.date(2000, 1, 12),
 datetime.date(2000, 1, 13),
 datetime.date(2000, 1, 14),
 datetime.date(2000, 1, 15)]

frompyfunc and vectorize are best used when you want the generality of broadcasting and multidimensional arrays. For 1d arrays, a list comprehension is nearly always better.
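To illustrate the multidimensional case, here is a small sketch with a 2d object array (the 2x3 grid is just made-up sample data). frompyfunc keeps the input shape automatically, while the list comprehension has to flatten and reshape by hand:

```python
import numpy as np

# Six consecutive days spanning a month boundary, arranged as a 2x3 object array.
days = np.arange('2000-01-30', '2000-02-05', dtype='datetime64[D]')
grid = days.astype(object).reshape(2, 3)

# frompyfunc preserves the shape of its input.
get_month = np.frompyfunc(lambda x: x.month, 1, 1)
by_ufunc = get_month(grid).astype(int)

# The list comprehension works on a flat view and must be reshaped afterwards.
by_list = np.array([d.month for d in grid.ravel()]).reshape(grid.shape)
```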

To be fairer to frompyfunc, I should return an array from the list comprehension:

In [610]: timeit np.array([d.month for d in arrO])
50.1 µs ± 36.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
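Another way to go straight from the iteration to an array, without the intermediate list, is np.fromiter. I have not timed it here, so treat this as an alternative spelling rather than a speed claim:

```python
import numpy as np

# Same object-dtype array of datetime.date objects as above.
arrO = np.arange('2000-01-11', '2000-12-31', dtype='datetime64[D]').astype(object)

# fromiter builds a 1d array from a generator; count lets it preallocate.
months = np.fromiter((d.month for d in arrO), dtype=int, count=arrO.size)
```

Note that fromiter only produces 1d output, so it fits the same cases as the list comprehension.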

To get the best speed with dates in numpy, use the datetime64 dtype instead of object dtype. This makes more use of compiled numpy code.

In [611]: timeit arr = np.arange('2000-01-11','2000-12-31',dtype='datetime64[D]'
     ...: )
3.16 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [616]: arr.astype('datetime64[M]')[::60]
Out[616]: 
array(['2000-01', '2000-03', '2000-05', '2000-07', '2000-09', '2000-11'],
      dtype='datetime64[M]')
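The datetime64[M] values above are year-month stamps rather than month numbers. If you want the 1 to 12 month number, one way (a sketch, relying on datetime64[M] being stored as months since 1970-01) is to cast to int and take the remainder:

```python
import numpy as np

arr = np.arange('2000-01-11', '2000-12-31', dtype='datetime64[D]')

# datetime64[M] cast to int gives whole months elapsed since 1970-01.
months_since_epoch = arr.astype('datetime64[M]').astype(int)

# Month number in 1..12; the modulo also behaves for dates before 1970.
month = months_since_epoch % 12 + 1
```

This stays entirely in compiled numpy code, with no per-element Python call.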

1 Comment

Thanks for your comprehensive answer. Just what I needed as I am dealing with datetime64.
