1

I am trying to set the dtype on two of my columns, but it is not working. I want to set [trans_typ] to 'category' and [date] to date.time. There is also an index [date] that I have already set to date.time but I want to set the first column to date.time as well.

import numpy as np
import pandas as pd
import glob

df = pd.read_csv('/home/jayaramdas/anaconda3/cf_data', low_memory=False, \
                 parse_dates = True) 
df.set_index(pd.to_datetime(df['date']), inplace=True)                 


df['trans_typ'].astype('category')

pd.to_datetime(df['date'])
df.dtypes

My output

date          object
cmte_id       object
trans_typ     object
amount       float64
fec_id        object
cand_id       object
dtype: object

This is my data output from a print (df)

             date    cmte_id trans_typ  amount     fec_id    cand_id
date                                                                     
2007-08-15  2007-08-15  C00112250       24K    2000  C00431569  P00003392
2007-09-26  2007-09-26  C00119040       24K    1000  C00367680  H2FL05127
2007-09-26  2007-09-26  C00119040       24K    1000  C00140715  H2MD05155
2007-07-20  2007-07-20  C00346296       24K    1000  C00434571  H8CA37137

2 Answers 2

1

You can use:

#if you need copy of column date to index
df.set_index(df['date'], inplace=True) 
print df
                 date    cmte_id trans_typ entity_typ state  employer  \
date                                                                    
2007-08-15 2007-08-15  C00112250       24K        ORG    DC       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    FL       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    MD       NaN   
2011-02-25 2011-02-25  C00478404       24K        COM    MN       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-22 2011-02-22  C00140855       24K        CCM    MD       NaN   
2011-02-28 2011-02-28  C00093963       24K        CCM    ND       NaN   

            occupation  amount     fec_id    cand_id  
date                                                  
2007-08-15         NaN    2000  C00431569  P00003392  
2007-09-26         NaN    1000  C00367680  H2FL05127  
2007-09-26         NaN    1000  C00140715  H2MD05155  
2011-02-25         NaN    2400  C00326629  H8MN06047  
2011-02-01         NaN    1000  C00373464  H2OH17109  
2011-02-01         NaN    1000  C00289983  H4KY01040  
2011-02-22         NaN    2500  C00140715  H2MD05155  
2011-02-28         NaN    1000  C00474619  H0ND00135 


#convert column trans_typ to category
#column date is datetime, no converted
df['trans_typ'] = df['trans_typ'].astype('category')
print df
                 date    cmte_id trans_typ entity_typ state  employer  \
date                                                                    
2007-08-15 2007-08-15  C00112250       24K        ORG    DC       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    FL       NaN   
2007-09-26 2007-09-26  C00119040       24K        CCM    MD       NaN   
2011-02-25 2011-02-25  C00478404       24K        COM    MN       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-01 2011-02-01  C00140855       24K        CCM    DC       NaN   
2011-02-22 2011-02-22  C00140855       24K        CCM    MD       NaN   
2011-02-28 2011-02-28  C00093963       24K        CCM    ND       NaN   

            occupation  amount     fec_id    cand_id  
date                                                  
2007-08-15         NaN    2000  C00431569  P00003392  
2007-09-26         NaN    1000  C00367680  H2FL05127  
2007-09-26         NaN    1000  C00140715  H2MD05155  
2011-02-25         NaN    2400  C00326629  H8MN06047  
2011-02-01         NaN    1000  C00373464  H2OH17109  
2011-02-01         NaN    1000  C00289983  H4KY01040  
2011-02-22         NaN    2500  C00140715  H2MD05155  
2011-02-28         NaN    1000  C00474619  H0ND00135
print df.dtypes
date          datetime64[ns]
cmte_id               object
trans_typ           category
entity_typ            object
state                 object
employer             float64
occupation           float64
amount                 int64
fec_id                object
cand_id               object
dtype: object

Or:

#if you DONT need copy of column date to index
df.set_index('date', inplace=True) 
print df
              cmte_id trans_typ entity_typ state  employer  occupation  \
date                                                                     
2007-08-15  C00112250       24K        ORG    DC       NaN         NaN   
2007-09-26  C00119040       24K        CCM    FL       NaN         NaN   
2007-09-26  C00119040       24K        CCM    MD       NaN         NaN   
2011-02-25  C00478404       24K        COM    MN       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-22  C00140855       24K        CCM    MD       NaN         NaN   
2011-02-28  C00093963       24K        CCM    ND       NaN         NaN   

            amount     fec_id    cand_id  
date                                      
2007-08-15    2000  C00431569  P00003392  
2007-09-26    1000  C00367680  H2FL05127  
2007-09-26    1000  C00140715  H2MD05155  
2011-02-25    2400  C00326629  H8MN06047  
2011-02-01    1000  C00373464  H2OH17109  
2011-02-01    1000  C00289983  H4KY01040  
2011-02-22    2500  C00140715  H2MD05155  
2011-02-28    1000  C00474619  H0ND00135 
df['trans_typ'] = df['trans_typ'].astype('category')
print df
              cmte_id trans_typ entity_typ state  employer  occupation  \
date                                                                     
2007-08-15  C00112250       24K        ORG    DC       NaN         NaN   
2007-09-26  C00119040       24K        CCM    FL       NaN         NaN   
2007-09-26  C00119040       24K        CCM    MD       NaN         NaN   
2011-02-25  C00478404       24K        COM    MN       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-01  C00140855       24K        CCM    DC       NaN         NaN   
2011-02-22  C00140855       24K        CCM    MD       NaN         NaN   
2011-02-28  C00093963       24K        CCM    ND       NaN         NaN   

            amount     fec_id    cand_id  
date                                      
2007-08-15    2000  C00431569  P00003392  
2007-09-26    1000  C00367680  H2FL05127  
2007-09-26    1000  C00140715  H2MD05155  
2011-02-25    2400  C00326629  H8MN06047  
2011-02-01    1000  C00373464  H2OH17109  
2011-02-01    1000  C00289983  H4KY01040  
2011-02-22    2500  C00140715  H2MD05155  
2011-02-28    1000  C00474619  H0ND00135 

print df.dtypes
cmte_id         object
trans_typ     category
entity_typ      object
state           object
employer       float64
occupation     float64
amount           int64
fec_id          object
cand_id         object
dtype: object

print df.index
DatetimeIndex(['2007-08-15', '2007-09-26', '2007-09-26', '2011-02-25',
               '2011-02-01', '2011-02-01', '2011-02-22', '2011-02-28'],
              dtype='datetime64[ns]', name=u'date', freq=None)
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you, this is helpful. Unfortunately I still have the problem mentioned at (stackoverflow.com/questions/35939558/…). If you can shine a light on that it would be great!
Ok, give me a time, I try it.
0

I just used df['date'] = df['date'].astype('datetime64') and it works!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.