0

I have a DataFrame called 'mydata', and if I do

len(mydata.loc['2015-9-2'])

It counts the number of rows in mydata that have that date, and returns a number like

1067

I have another DataFrame called 'yourdata' which looks something like

     timestamp
51   2015-06-22
52   2015-06-23
53   2015-06-24
54   2015-06-25
43   2015-07-13

Now I want use each date in yourdata so instead of typing in each date

len(mydata.loc['2015-9-2'])

I can iterate through 'yourdata' using them like

len(mydata.loc[yourdata['timestamp']])

and produce a new DataFrame with the results or just add a new column to yourdata with the result for each date, but I'm lost as how to do this?

The following does not work

yourdata['result'] = len(mydata.loc[yourdata['timestamp']])

neither does this

yourdata['result'] = len(mydata.loc[yourdata.iloc[:,-3]])

this does work

yourdata['result'] = len(mydata.loc['2015-9-2'])

buts that no good as I want to use the date in each row not some fixed date.

Edit: first few rows of mydata

    timestamp            BPM
 0  2015-08-30 16:48:00   65
 1  2015-08-30 16:48:10   65
 2  2015-08-30 16:48:15   66
 3  2015-08-30 16:48:20   67
 4  2015-08-30 16:48:30   70

2 Answers 2

1
import numpy as np
import pandas as pd

mydata = pd.DataFrame({'timestamp': ['2015-06-22 16:48:00']*3 +
                                    ['2015-06-23 16:48:00']*2 +
                                    ['2015-06-24 16:48:00'] +
                                    ['2015-06-25 16:48:00']*4 +
                                    ['2015-07-13 16:48:00',
                                     '2015-08-13 16:48:00'],
                       'BPM': [65]*8 + [70]*4})
mydata['timestamp'] = pd.to_datetime(mydata['timestamp'])
print(mydata)

#     BPM           timestamp
# 0    65 2015-06-22 16:48:00
# 1    65 2015-06-22 16:48:00
# 2    65 2015-06-22 16:48:00
# 3    65 2015-06-23 16:48:00
# 4    65 2015-06-23 16:48:00
# 5    65 2015-06-24 16:48:00
# 6    65 2015-06-25 16:48:00
# 7    65 2015-06-25 16:48:00
# 8    70 2015-06-25 16:48:00
# 9    70 2015-06-25 16:48:00
# 10   70 2015-07-13 16:48:00
# 11   70 2015-08-13 16:48:00

yourdata = pd.Series(['2015-06-22', '2015-06-23', '2015-06-24',
                      '2015-06-25', '2015-07-13'], name='timestamp')
yourdata = pd.to_datetime(yourdata).to_frame()
print(yourdata)

# 0   2015-06-22
# 1   2015-06-23
# 2   2015-06-24
# 3   2015-06-25
# 4   2015-07-13

result = (mydata.set_index('timestamp').resample('D')
                .size().loc[yourdata['timestamp']]
                .reset_index())
result.columns = ['timestamp', 'result']
print(result)

#    timestamp  result
# 0 2015-06-22       3
# 1 2015-06-23       2
# 2 2015-06-24       1
# 3 2015-06-25       4
# 4 2015-07-13       1
Sign up to request clarification or add additional context in comments.

8 Comments

Thanks but this gives the error: TypeError: 'str' object is not callable
@RunnerBean: try now.
no difference, I cant even see how youve change the code it looks the same as before
With the second one I get this error: ValueError: Cannot index with multidimensional key
for the first approach i get the error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects
|
0

I think you need value_counts, but first convert to dates by dt.date, convert to to_datetime and last use join:

print (yourdata.join(pd.to_datetime(mydata.timestamp.dt.date)
                       .value_counts()
                       .rename('len'), on='timestamp'))

Sample:

print (mydata)
             timestamp  BPM
0  2015-06-23 16:48:00   65
1  2015-06-23 16:48:10   65
2  2015-06-23 16:48:15   66
3  2015-06-23 16:48:20   67
4  2015-06-22 16:48:30   70

print (yourdata)
     timestamp
51  2015-06-22
52  2015-06-23
53  2015-06-24
54  2015-06-25
43  2015-07-13

#if dtype not datetime
mydata['timestamp'] = pd.to_datetime(mydata['timestamp'])
yourdata['timestamp'] = pd.to_datetime(yourdata['timestamp'])

print (yourdata.join(pd.to_datetime(mydata.timestamp.dt.date)
                       .value_counts()
                       .rename('len'), on='timestamp'))
    timestamp  len
51 2015-06-22  1.0
52 2015-06-23  4.0
53 2015-06-24  NaN
54 2015-06-25  NaN
43 2015-07-13  NaN

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.