3

I have a bigger dataframe than what I'm showing here, but what I'm trying to do is: wherever there is a certain value in a series (or, even better, the whole dataframe), change that value to None. I need these to be None so that when I write the dataframe to a database they are recognised as null.

series = ['2014/06/05 13:03:56', '2014/07/23 13:03:56', None, '2014/08/21 13:03:56']
data = pd.DataFrame(series)


0   2014/06/05 13:03:56
1   2014/07/23 13:03:56
2   None
3   2014/08/21 13:03:56

data = pd.to_datetime(data[0], errors='coerce')

data
0   2014-06-05 13:03:56
1   2014-07-23 13:03:56
2                   NaT
3   2014-08-21 13:03:56
Name: 0, dtype: datetime64[ns]

data = data.map(str)

data
0    2014-06-05 13:03:56
1    2014-07-23 13:03:56
2                    NaT
3    2014-08-21 13:03:56
Name: 0, dtype: object

data.replace(to_replace='NaT', value=None)
0    2014-06-05 13:03:56
1    2014-07-23 13:03:56
2    2014-07-23 13:03:56
3    2014-08-21 13:03:56
Name: 0, dtype: object

In the example above, when I try to replace 'NaT', the dataframe actually fills the slot with the preceding value, not None. That won't help, as it needs to be None. On the actual dataframe I'm working with, this usually throws a TypeError telling me I can't replace with None using method 'pad'. I'm using a datetime series here, but really I'll need this for more than just datetime series. It seems like it should be basic pandas functionality, but I can't find an answer.

Thanks, Colin

  • 2
    That NaT isn't a string, it's a special "not a time" value, similar to NaN for floats. So, just change to_replace='NaT' to to_replace=pd.NaT and it'll do what you're trying to do. But I don't think that what you're trying to do is what you actually want. Commented Apr 24, 2015 at 8:56
  • 1
    @abarnert that will convert the dtype to object and the datetimes will be converted to some int values: In [506]: df.replace(pd.NaT, 'None') Out[506]: 0 1401973436000000000 1 1406120636000000000 2 None 3 1408626236000000000 Name: 0, dtype: object Commented Apr 24, 2015 at 8:57
  • 2
    @EdChum: That's why I said "I don't think what you're trying to do is what you actually want". Commented Apr 24, 2015 at 9:01
  • 2
    @EdChum: Also, your code replaces the NaT with the string 'None', which causes a different problem than replacing it with an actual None would. Not that either one is what he actually wants, but… Pandas tries to deal with None values in a semi-intelligent way; sometimes they get converted to NaN/NaT/0, sometimes they mean "repeat the last value", etc. But 'None' doesn't have any special meaning; it's just "some value that I don't know what to do with, I'd better switch to object". Commented Apr 24, 2015 at 9:07
  • 2
    @EdChum: I don't know if it's better or worse in general… but since it's exactly the behavior he got by accident and wants to know how to get rid of, I think I'd have to agree with you that it's worse in this user's case. :) Commented Apr 24, 2015 at 9:31

2 Answers

5

First, the reason your code isn't working is that those NaT values aren't the string 'NaT', they're the value pd.NaT. But since I don't think fixing that would actually give you what you want, let's ignore that for the moment.

A Pandas DataFrame, like the NumPy ndarray it's built on, is a compact array of typed, low-level values. That's what makes it small, fast, and type-safe. But that inherently means it can only store values of the specified type. And None is not a value of the datetime64[ns] type.

More specifically, a datetime64[ns] can only hold 64-bit integers that represent datetimes as nanoseconds since the epoch, and None is not a 64-bit integer.

Pandas does have a special value to deal with this, called NaT, for "Not a Time"; that's the closest thing to None that you can get in any kind of datetime field (just like the more familiar NaN for floats). And that's what you've already got.
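A minimal sketch of that point (using `errors='coerce'`, the keyword that replaced the older `coerce=True`):

```python
import pandas as pd

series = ['2014/06/05 13:03:56', '2014/07/23 13:03:56',
          None, '2014/08/21 13:03:56']
data = pd.to_datetime(pd.Series(series), errors='coerce')

# The missing entry is the pd.NaT singleton, not the string 'NaT',
# which is why matching on the string 'NaT' finds nothing
print(data[2] is pd.NaT)   # True
print(pd.isna(data[2]))    # True
```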


Meanwhile, Pandas has special support for None in various different places that tries to do what you often want—storing NaN/NaT/0, or repeating the last value, or various other things. But when that's not what you want, that isn't much help.


If you actually need to have None, the only way to do that is to store normal boxed-up Python objects instead of typed low-level values, which you do by using dtype=object. Then you can stick any Python value into any element, including None, of course.
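A tiny sketch of that: with dtype=object, pandas boxes ordinary Python objects, so a real None survives untouched:

```python
import pandas as pd

# An object-dtype series holds boxed Python objects, so None stays None
s = pd.Series([pd.Timestamp('2014-06-05 13:03:56'), None], dtype=object)
print(s[1] is None)   # True
print(s.dtype)        # object
```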

But doing that largely defeats the purpose of using Pandas and NumPy. You're probably better off using NaT, and changing the rest of your code—either to expect NaT where you're currently expecting None, or to wrap the DataFrame in something that converts NaT values to None during extraction or printout.
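One way to sketch that wrapper, converting NaT (and NaN) to None only at the point where rows leave the DataFrame; `rows_for_db` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

df = pd.DataFrame({'ts': pd.to_datetime(
    ['2014/06/05 13:03:56', None, '2014/08/21 13:03:56'],
    errors='coerce')})

def rows_for_db(frame):
    """Yield tuples with NaT/NaN swapped for None, e.g. for a DB driver."""
    for row in frame.itertuples(index=False):
        yield tuple(None if pd.isna(v) else v for v in row)

rows = list(rows_for_db(df))
print(rows[1][0] is None)   # True
```

This keeps the DataFrame itself in fast, typed datetime64 form; the conversion to None happens only in the extracted tuples handed to the database layer.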


For completeness, if you want to get crazy, there's nothing stopping you from defining an optional datetime64[ns] dtype that's just like datetime64[ns] except that it uses the special value reserved for NaT to mean None instead. Or it could even reserve another special value, or a whole bit, to mean None while leaving NaT alone. Either way, this would be a lot of work, it would completely break any operation that depends on datetime arithmetic (d - NaT == NaT for any d, but d - None is a TypeError for any d…), and ultimately it's no better than the wrapper solution for any purpose I can think of…


2 Comments

Guys, first of all thank you for taking some time over this and interpreting what I wanted from what I actually said. I do actually need None instead of the other signifiers of missing data. If I try to pass NaT, the database throws up an error. I'm using pyodbc (code.google.com/p/pyodbc/wiki/DataTypes). Fair to say I don't know enough about the underlying structure of the dataframe and the datatypes I'm working with, and I appreciate the effort to spell it out for me here. I'm going to try to wrap the dataframe in something that converts NaT (& some others) to None as it's passed.
@ColinO'Brien: I don't know pyodbc (I haven't used ODBC in any language since last century…), but most Python database interfaces I've used provide a way to convert unknown data types at the database API side, which may be easier than doing it at the Pandas side.
0

In the end, this worked for my needs. I don't think map(str) was working in the code I gave in the question, but it works below, where I apply it to only one field instead of the whole dataframe.

def change_date_to_string(field):
    # Stringify the column, then turn the 'NaT' strings into real None
    data[field] = data[field].map(str)
    data.loc[data[field] == 'NaT', field] = None

change_date_to_string(field='usr_datetime')
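For comparison, a similar result is possible without the str() round-trip, keeping real Timestamp objects in the non-null rows. A sketch using the astype(object)/where idiom on a toy frame (the column name usr_datetime is borrowed from the snippet above):

```python
import pandas as pd

data = pd.DataFrame({'usr_datetime': pd.to_datetime(
    ['2014/06/05 13:03:56', None], errors='coerce')})

# Cast to object first, then put a real None (not the string 'None'
# or 'NaT') wherever the value is missing
col = data['usr_datetime']
data['usr_datetime'] = col.astype(object).where(col.notna(), None)
print(data['usr_datetime'][1] is None)   # True
```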

