Say we have this data:

import pandas as pd

list1, list2, list3 = [1, 2, 3, 4], [1990, 1990, 1990, 1991], [2009, 2009, 2009, 2009]
df = pd.DataFrame(list(zip(list1, list2, list3)), columns=['Index', 'Y0', 'Y1'])

> df

Index  Y0          Y1
1      1990        2009
2      1990        2009
3      1990        2009
4      1991        2009

For each year, I want to count how many rows ("Index") have that year within their Y0 to Y1 interval, excluding Y0 itself (so a row counts for a year when Y0 < year <= Y1).

So say we start at the first available year, 1990:

How many rows do we count? 0, because no row has a Y0 earlier than 1990.

1991:

  • Three (row 1, 2, 3)

1992:

  • Four (row 1, 2, 3, 4)

...

2009:

  • Four (row 1, 2, 3, 4)

So I want to end up with a dataframe that says:

Count  Year
0      1990
3      1991
4      1992
...    ...
4      2009
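
Put as a condition, the walkthrough above counts a row for a given year when Y0 < year <= Y1. A minimal plain-Python sketch of that rule, only to pin down the expected numbers (the helper name `rows_in_year` is made up for illustration and is not part of pandas):

```python
# Sample intervals from the table above: (Y0, Y1) for rows 1 to 4.
intervals = [(1990, 2009), (1990, 2009), (1990, 2009), (1991, 2009)]

def rows_in_year(year, intervals):
    """Count intervals with Y0 < year <= Y1 (Y0 itself is excluded)."""
    return sum(1 for y0, y1 in intervals if y0 < year <= y1)

for year in (1990, 1991, 1992, 2009):
    print(year, rows_in_year(year, intervals))
# prints:
# 1990 0
# 1991 3
# 1992 4
# 2009 4
```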

My attempt:

df['Y0'] = pd.to_datetime(df['Y0'], format='%Y')
df['Y1'] = pd.to_datetime(df['Y1'], format='%Y')

# Group by the interval between Y0 and Y1 
df = df.groupby([df['Y0'].dt.year, df['Y1'].dt.year]).agg({'count'})
df.columns = ['count', 'Y0 count', 'Y1 count']

# sum the total
df_sum = pd.DataFrame(df.groupby(df.index)['count'].sum())

But the result doesn't look right.

Appreciate any help.

2 Answers

1

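You could do:

import numpy as np

# Assumes Y0 and Y1 are still integer years, as in the sample data (not converted with pd.to_datetime)
min_year = df[['Y0', 'Y1']].values.min()
max_year = df[['Y0', 'Y1']].values.max()
year_range = np.arange(min_year, max_year + 1)
# Compare every row's Y0 and Y1 against every year at once, then count the rows per year
counts = ((df[['Y0']].values < year_range) & (year_range <= df[['Y1']].values)).sum(axis=0)
o = pd.DataFrame({"counts": counts, 'year': year_range})
print(o)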
    counts  year
0        0  1990
1        3  1991
2        4  1992
3        4  1993
4        4  1994
5        4  1995
6        4  1996
7        4  1997
8        4  1998
9        4  1999
10       4  2000
11       4  2001
12       4  2002
13       4  2003
14       4  2004
15       4  2005
16       4  2006
17       4  2007
18       4  2008
19       4  2009
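
For anyone wondering why the comparison works without a loop: `df[['Y0']].values` has shape (4, 1) and `year_range` has shape (20,), so NumPy broadcasts them into a (4, 20) boolean grid with one row per data row and one column per year. Summing over axis 0 then gives the count for each year. A small sketch with the same sample data (the names `y0`, `y1` and `inside` are mine, chosen only for this illustration):

```python
import numpy as np

y0 = np.array([[1990], [1990], [1990], [1991]])   # shape (4, 1)
y1 = np.array([[2009], [2009], [2009], [2009]])   # shape (4, 1)
year_range = np.arange(1990, 2010)                # shape (20,)

# Broadcasting pairs every row with every year: shape (4, 20).
inside = (y0 < year_range) & (year_range <= y1)

# Column sums = number of rows whose interval covers each year.
counts = inside.sum(axis=0)

print(inside.shape)   # (4, 20)
print(counts[:3])     # [0 3 4]
```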

0

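The following should do the job:

counts = []
years = []

def count_in_interval(year):
    # Count the rows whose interval satisfies Y0 < year <= Y1
    n = 0
    for i in range(len(df)):
        if df['Y0'].iloc[i] < year <= df['Y1'].iloc[i]:
            n += 1
    return n

# The years are hard-coded to match the sample data (1990 through 2009)
for i in range(1990, 2010):
    counts.append(count_in_interval(i))
    years.append(i)

result = pd.DataFrame(list(zip(counts, years)), columns=['Count', 'Year'])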
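
If the years should not be hard-coded, one possible variation derives the bounds from the data and lets pandas do the per-year comparison. This is only a sketch, assuming Y0 and Y1 are integer years and that the range should run from the smallest Y0 to the largest Y1:

```python
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4],
                   'Y0': [1990, 1990, 1990, 1991],
                   'Y1': [2009, 2009, 2009, 2009]})

# Take the year bounds from the data instead of hard-coding them.
years = range(int(df['Y0'].min()), int(df['Y1'].max()) + 1)

# For each year, count rows with Y0 < year <= Y1 using a boolean mask.
counts = [int(((df['Y0'] < year) & (year <= df['Y1'])).sum()) for year in years]

result = pd.DataFrame({'Count': counts, 'Year': list(years)})
print(result.head(3))
```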
