
I have the following array of type <class 'numpy.ndarray'>:

array([20181010, 20181031, 20181116, 20181012, 20181005, 20181008,
       20181130, 20181011, 20181005, 20181116])

How can I convert its elements from their current type <class 'numpy.int64'> to datetime in numpy? I want a fast way, and my understanding is that using a loop or list comprehension, or converting this numpy.array to pandas or to a list, will be slower.

Please correct me if I am wrong.

P.S. This question may have been answered somewhere, but I could not find a single solution which works.
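One pure-NumPy route is sketched here as an assumption. It is not taken from this thread, and it has not been benchmarked against the pandas approach. The idea is to split each YYYYMMDD integer with integer arithmetic and build datetime64 values directly, which skips the string conversion entirely. The names years, months, days and dates are illustrative only.

```python
import numpy as np

arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005,
                20181008, 20181130, 20181011, 20181005, 20181116])

# Split each YYYYMMDD integer into year, month and day with integer arithmetic.
years = arr // 10000
months = (arr // 100) % 100
days = arr % 100

# datetime64[M] counts whole months since 1970-01, so build that count directly,
# widen to day resolution (first of the month), then add the remaining days.
month_index = (years - 1970) * 12 + (months - 1)
dates = month_index.astype('datetime64[M]').astype('datetime64[D]') \
    + (days - 1).astype('timedelta64[D]')
```

This approach does not validate its input. For example, an impossible value such as 20180231 silently rolls over into the following month instead of raising an error.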

  • @Nixon, The answers there, though, aren't ideal. Surely there's a better target? Commented Dec 10, 2018 at 17:44
  • I think @Sebastian's answer is superior to the duplicate, https://stackoverflow.com/questions/27103044/converting-datetime-string-to-datetime-in-numpy-python, even though both use `pd.to_datetime`. Commented Dec 10, 2018 at 20:31
  • @hpaulj I tried the duplicate and I tried Sebastian's answer, and I share your view. That is why I upvoted and accepted it (and why I could not get a full answer from the question marked as the duplicate). Commented Dec 10, 2018 at 20:33

1 Answer

pandas has a better concept of what can be considered a date:

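import numpy as np
import pandas as pd

arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005, 
                20181008, 20181130, 20181011, 20181005, 20181116])
# Turn the integers into strings such as '20181010', which pandas parses as
# YYYYMMDD dates; .values returns a plain NumPy datetime64 array.
result = pd.to_datetime(arr.astype(str)).values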
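An optional variation, which is not part of the original answer, is to pass pandas' format argument so the YYYYMMDD layout is stated explicitly and does not have to be inferred. Here is a minimal sketch, assuming a reasonably recent pandas:

```python
import numpy as np
import pandas as pd

arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005,
                20181008, 20181130, 20181011, 20181005, 20181116])

# format='%Y%m%d' pins the expected layout: 4-digit year, 2-digit month, 2-digit day.
result = pd.to_datetime(arr.astype(str), format='%Y%m%d').values
```

With an explicit format, an entry that does not match the layout raises an error instead of being parsed by some other guessed layout.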

Profiling this over 10,000,000 entries with IPython's %%prun cell magic:

%%prun import numpy as np; import pandas as pd
lst = [20181010, 20181031, 20181116, 20181012, 20181005, 
       20181008, 20181130, 20181011, 20181005, 20181116]*1000000
arr = np.array(lst)
arr_str = arr.astype(str)
pd.to_datetime(arr_str).values

produces the following profile (top entries, sorted by internal time):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    8.977    8.977    8.977    8.977 {method 'astype' of 'numpy.ndarray' objects}
        1    4.394    4.394    4.394    4.394 {built-in method pandas._libs.tslib.array_to_datetime}
        2    2.344    1.172    2.344    1.172 {built-in method pandas._libs.algos.ensure_object}
        4    0.918    0.229    0.918    0.229 {built-in method numpy.core.multiarray.array}
        1    0.313    0.313    7.053    7.053 datetimes.py:106(to_datetime)
...

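The string conversion (astype, about 9 s) costs more than pd.to_datetime itself (about 7 s cumulative), and no function in the profile is called once per element. It's efficient enough.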
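The %%prun magic only exists in IPython and Jupyter. In a plain Python script, the standard-library cProfile and pstats modules produce a similar table. The sketch below is an illustration, not the original measurement: it uses 100,000 entries instead of 10,000,000 so that it runs quickly.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

lst = [20181010, 20181031, 20181116, 20181012, 20181005,
       20181008, 20181130, 20181011, 20181005, 20181116] * 10_000
arr = np.array(lst)

profiler = cProfile.Profile()
profiler.enable()
arr_str = arr.astype(str)                # integer -> string conversion
result = pd.to_datetime(arr_str).values  # string -> datetime64 parsing
profiler.disable()

# Print the five entries with the largest internal time, like the prun table.
pstats.Stats(profiler).sort_stats('tottime').print_stats(5)
```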


5 Comments

In terms of time, how does the conversion to pandas play out? Also, I want to have np.array as a final result. Will this be costly?
No, pandas is also vectorized like numpy; see my edit.
pandas does a lot of stuff with Python iterations. "Vectorized" is a slippery term. To get speed, the underlying code needs to be compiled, without a lot of repetitive calls to Python classes and objects.
In the prun I don't see any function that is called 10,000,000 times, as one would expect if 10,000,000 datetime objects were being created. This leads me to believe that pandas goes through some level of compilation.
This actually produces a datetime64[ns] dtype array, which may be what you want. But if you actually want datetime.date objects, you'll need an extra step: .astype('datetime64[D]').tolist()
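The extra step from the last comment can be checked against a per-element loop, which is the kind of Python-level iteration the comments contrast with pandas. The sketch below uses illustrative helper names (pandas_to_dates and loop_to_dates are hypothetical). It asserts that both routes give the same datetime.date objects and prints their timings without assuming which is faster. Note that .tolist() itself creates one Python object per element, so that step is not compiled-speed either.

```python
import datetime
import timeit

import numpy as np
import pandas as pd

arr = np.array([20181010, 20181031, 20181116, 20181012, 20181005,
                20181008, 20181130, 20181011, 20181005, 20181116] * 1000)


def pandas_to_dates(a):
    # Parse in bulk, truncate to day resolution, then build datetime.date objects.
    return pd.to_datetime(a.astype(str), format='%Y%m%d').values.astype('datetime64[D]').tolist()


def loop_to_dates(a):
    # One Python-level strptime call per element.
    return [datetime.datetime.strptime(str(x), '%Y%m%d').date() for x in a]


dates = pandas_to_dates(arr)
assert dates == loop_to_dates(arr)

print('pandas:', timeit.timeit(lambda: pandas_to_dates(arr), number=5))
print('loop:  ', timeit.timeit(lambda: loop_to_dates(arr), number=5))
```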
