Pandas/python and working with a column, in a dataframe, with a date

Question

I am currently working on a Python/Pandas data science project for fun. The data I am looking at has a Date column where each date looks like this: 2016-07-16. The column's dtype is object. What I want to do is go through each date and pull data from across that row. Some rows may share the same date because two separate attacks occurred on that date. (I am looking at terrorism data.) This is what I have so far:

dates = []
start = 0
while start < 300:
    date = data.iat[start, 1]
    dates.append(date)
    start += 1

This gets me ALMOST what I want. However, I have a problem: the start variable is set to 0, but I cannot simply stop at 365 because, as I said, each date may have multiple attacks, so one year may have something like 400 attacks. Is there a way to end the data collection at 2016-12-31 or 2017-01-01, for example? Basically, is there a way to quickly determine the number of attacks per year, year after year? Thank you for any help!

Oh I will say that I was trying something like:

newDate = pd.to_datetime(startdate) + pd.DateOffset(days=1)

or

data['Date']) + timedelta(days=1)

to add one day to the date until the end of the year. That did not give me what I wanted, and there could be more than one entry per day.

To explain further, I could have something like this:

Date            Deaths     Country 
2002-01-01         2         India 
2002-01-02         0         Pakistan
2001-01-02         1         France

The data has about 20,000 points and I need to find a way to stop it at the end of each year. That is my main issue. I cannot go to 365 because there may be multiple terrorist attacks on the same date around the world.

It's not very clear what you want to achieve. Could you post a small reproducible sample of the input data (5-7 rows) and the desired output?
– MaxU - stand with Ukraine, Commented Sep 19, 2016 at 19:45
I have tried to add a little more. Just know that the data starts at 2002-01-01 and runs until 2016-07-23 or so, and every terrorist attack gets a record. I know my explanations are not that good. I did make some progress, but the main issue is detecting when the year ends, since the data flows from one year into the next.
– ravenUSMC, Commented Sep 19, 2016 at 19:55
Assuming this is your question: "is there a way to quickly determine the number of attacks, per year for year after year?", I would create a column holding each date's year and then use built-in pandas functionality like groupby and count.
– Leo, Commented Sep 19, 2016 at 19:56
Are you after selecting rows/data for specific year(s), or do you want to calculate statistics like attacks per year (or per year and country)?
– MaxU - stand with Ukraine, Commented Sep 19, 2016 at 19:57
Leo, you have my question right and I think you are on the right track. I really like the idea of creating a column with just the date's year. I have to research how to do that: I need a function that takes a date like 2002-01-01 and simply returns 2002 in that new column. But you have me thinking now! Thank you.
– ravenUSMC, Commented Sep 19, 2016 at 20:00

MaxU - stand with Ukraine · Accepted Answer · 2016-09-19 21:15:53Z

1

IMO there is no need to add a new column:

In [132]: df
Out[132]:
        Date  Deaths   Country
0 2002-01-01       2     India
1 2002-01-02       0  Pakistan
2 2001-01-02       1    France

In [217]: df.groupby(df.Date.dt.year)['Deaths'].sum()
Out[217]:
Date
2001    1
2002    2
Name: Deaths, dtype: int64

or:

In [218]: df.groupby(pd.Grouper(freq='AS', key='Date'))['Deaths'].sum()
Out[218]:
Date
2001-01-01    1
2002-01-01    2
Freq: AS-JAN, Name: Deaths, dtype: int64

In [219]: df.groupby(pd.Grouper(freq='A', key='Date'))['Deaths'].sum()
Out[219]:
Date
2001-12-31    1
2002-12-31    2
Freq: A-DEC, Name: Deaths, dtype: int64

and you can always access different parts (year, month, day, weekday, hour, etc.) of your DateTime column:

In [137]: df.Date.dt.year
Out[137]:
0    2002
1    2002
2    2001
Name: Date, dtype: int64

In [138]: df.Date.dt.
df.Date.dt.ceil             df.Date.dt.freq             df.Date.dt.microsecond      df.Date.dt.strftime         df.Date.dt.weekday
df.Date.dt.date             df.Date.dt.hour             df.Date.dt.minute           df.Date.dt.time             df.Date.dt.weekday_name
df.Date.dt.day              df.Date.dt.is_month_end     df.Date.dt.month            df.Date.dt.to_period        df.Date.dt.weekofyear
df.Date.dt.dayofweek        df.Date.dt.is_month_start   df.Date.dt.nanosecond       df.Date.dt.to_pydatetime    df.Date.dt.year
df.Date.dt.dayofyear        df.Date.dt.is_quarter_end   df.Date.dt.normalize        df.Date.dt.tz
df.Date.dt.days_in_month    df.Date.dt.is_quarter_start df.Date.dt.quarter          df.Date.dt.tz_convert
df.Date.dt.daysinmonth      df.Date.dt.is_year_end      df.Date.dt.round            df.Date.dt.tz_localize
df.Date.dt.floor            df.Date.dt.is_year_start    df.Date.dt.second           df.Date.dt.week
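
The examples above sum the Deaths column, while the question also asks for the number of attacks per year. Here is a minimal sketch of that count, assuming each row is one attack and reusing the three sample rows from the question. The names attacks_per_year and rows_2002 are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["2002-01-01", "2002-01-02", "2001-01-02"],
    "Deaths": [2, 0, 1],
    "Country": ["India", "Pakistan", "France"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Assumption: one row per attack, so the group size is the attack count for that year.
attacks_per_year = df.groupby(df["Date"].dt.year).size()

# Rows belonging to a single year, selected with a boolean mask.
rows_2002 = df[df["Date"].dt.year == 2002]

print(attacks_per_year)
print(rows_2002)
```

Because the grouping key is the year itself, there is no need to know in advance how many rows a year contains or where it ends.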

edited Sep 19, 2016 at 21:15

answered Sep 19, 2016 at 20:01

MaxU - stand with Ukraine


4 Comments

ravenUSMC Over a year ago

MaxU, what does dt.year do? I am just curious. When I use it I get the following error message: AttributeError: Can only use .dt accessor with datetimelike values. However, the key that you provided is almost exactly what I am looking for! I just need to figure out this dt.year thing and that error message and I may have it! Thank you!

MaxU - stand with Ukraine Over a year ago

@MikeCuddy, I guess your Date column is of string (object) dtype, so you first have to convert it to datetime dtype: df.Date = pd.to_datetime(df.Date)

ravenUSMC Over a year ago

Yup, the one thing that I forgot to do is convert! But that helped a lot, and thank you for your help! Yeah, I think you really did get it for me, and I learned how to work better with dates! Thank you!

MaxU - stand with Ukraine Over a year ago

@MikeCuddy, you are welcome! :) One hint for future questions: always try to post sample data set(s) and the desired resulting data set when asking Pandas/NumPy questions. It makes it a lot easier for the SO community to understand what you want to achieve... ;)
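
The conversion discussed in the comments can be sketched as follows. This is only an illustration: the sample values, including the deliberately bad "not a date" entry, are assumptions and not taken from the real data set. The format string matches the 2016-07-16 style shown in the question.

```python
import pandas as pd

# Hypothetical sample: two valid dates and one bad value.
df = pd.DataFrame({"Date": ["2016-07-16", "2016-12-31", "not a date"]})

# The column holds strings here, so df["Date"].dt would raise AttributeError.
print(df["Date"].dtype)

# errors="coerce" turns values that do not match the format into NaT
# instead of raising an exception.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")

print(df["Date"].dtype)
print(df["Date"].dt.year)
```

Checking df["Date"].isna().sum() after the conversion shows how many rows failed to parse, which is worth doing before grouping by year.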

Yulia · Accepted Answer · 2016-09-19 21:02:04Z

0

Another way of dealing with the problem is through a dictionary:

# Get the column with the dates
dates = df.iloc[:, 0].values
year_attacks = {}
for date in dates:
    # Get the year from the date (the text before the first '-')
    year = str(date).split('-')[0]
    # If the year is already in the dictionary, increase its attack count by 1
    if year in year_attacks:
        year_attacks[year] += 1
    # Otherwise create a new key
    else:
        year_attacks[year] = 1
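
The same tally can also be written with collections.Counter from the standard library, which handles the "increment or create the key" step itself. This is a sketch rather than part of the original answer, and it assumes the Date column has already been converted to datetime, so the keys are integers rather than strings.

```python
from collections import Counter

import pandas as pd

# Sample rows copied from the question, with Date already parsed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2002-01-01", "2002-01-02", "2001-01-02"]),
    "Deaths": [2, 0, 1],
})

# tolist() yields plain Python ints; Counter counts how often each year occurs.
year_attacks = Counter(df["Date"].dt.year.tolist())

print(year_attacks)
```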

answered Sep 19, 2016 at 21:02

Yulia
