
I have a dataframe (reportingDatesDf) whose head looks like this:

           unique_stock_id reporting_type
date                                     
2015-01-28  BBG.MTAA.STM.S         2014:A
2015-01-28  BBG.MTAA.STM.S        2014:S2
2015-01-28  BBG.MTAA.STM.S        2014:Q4
2014-10-29  BBG.MTAA.STM.S        2014:C3
2014-10-29  BBG.MTAA.STM.S        2014:Q3

I am trying to reduce the dataframe to only the entries between two dates, using the following line:

reportingDatesDf = reportingDatesDf[(reportingDatesDf.index >= startDate) and (reportingDatesDf.index <= endDate)]

The dataframe is created from a CSV using the following code:

import pandas as pd

def getReportingDatesData(rawStaticDataPath,startDate,endDate):
    pattern = 'ReportingDates' + '.csv'
    staticPath = rawStaticDataPath

    with open(staticPath + pattern, 'rt') as f:
        reportingDatesDf = pd.read_csv(f,
                header=None,
                usecols=[0,1,2],
                parse_dates=[1],
                dayfirst=True,
                index_col=[1],
                names=['unique_stock_id','date','reporting_type'])
        #print(reportingDatesDf.head())
        print('reportingDatesDf.index',reportingDatesDf)
        reportingDatesDf = reportingDatesDf[(reportingDatesDf.index >= startDate) and (reportingDatesDf.index <= endDate)]

However, I get the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Why has this happened? I am using similar code elsewhere which works.

Even better than the answers below, you can just use slicing: reportingDatesDf.loc[startDate:endDate] (assuming the index is datetime dtype and sorted). Commented Dec 1, 2024 at 15:35
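A minimal sketch of the slicing approach, using the five sample rows from the question rebuilt by hand. The DataFrame construction and the bounds 2014-11-01 and 2015-12-31 are illustrative assumptions. The sample index is in descending order, so it is sorted first; in recent pandas versions, label slicing on an unsorted DatetimeIndex with bounds that are not in the index raises a KeyError.

```python
import pandas as pd

# Hypothetical rebuild of the question's sample head.
df = pd.DataFrame(
    {
        "unique_stock_id": ["BBG.MTAA.STM.S"] * 5,
        "reporting_type": ["2014:A", "2014:S2", "2014:Q4", "2014:C3", "2014:Q3"],
    },
    index=pd.DatetimeIndex(
        ["2015-01-28", "2015-01-28", "2015-01-28", "2014-10-29", "2014-10-29"],
        name="date",
    ),
)

# Sort so that slicing with bounds that are not in the index is allowed.
df = df.sort_index()

# Both ends of a label slice are inclusive, like >= startDate and <= endDate.
subset = df.loc["2014-11-01":"2015-12-31"]
print(subset)
```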

3 Answers


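Python's and doesn't broadcast. It can't, because it has to short-circuit: a and b first evaluates bool(a) to decide whether b needs to be looked at at all, and there's no good answer for making that short-circuiting broadcast. A boolean array with more than one element has no single truth value, which is exactly the ValueError you're seeing.

If you need an elementwise and, use & instead, and keep each comparison in its own parentheses, because & binds more tightly than >= and <=.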
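To see the difference concretely, here is a small sketch. It uses a DatetimeIndex built from the question's two sample dates plus one hypothetical earlier date, and assumed bounds start and end:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2015-01-28", "2014-10-29", "2014-07-30"])
start, end = pd.Timestamp("2014-09-01"), pd.Timestamp("2015-12-31")

left = idx >= start   # numpy boolean array, one value per element
right = idx <= end

# `and` must call bool(left) to decide whether to short-circuit,
# and a multi-element array refuses to pick a single truth value.
try:
    _ = left and right
except ValueError as exc:
    error_message = str(exc)

# `&` combines the two arrays element by element instead.
mask = left & right
print(mask)  # [ True  True False]
```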


It looks to me like in the line

reportingDatesDf = reportingDatesDf[(reportingDatesDf.index >= startDate) and (reportingDatesDf.index <= endDate)]

the variable

reportingDatesDf.index

is an array, so the comparison

(reportingDatesDf.index >= startDate)

produces an array of booleans, and combining two such arrays with and is ambiguous. You need to specify whether you are checking that all of the values in the array satisfy the condition, or only that some of them do. Editing your code to the following

reportingDatesDf = reportingDatesDf[any(reportingDatesDf.index >= startDate) and any(reportingDatesDf.index <= endDate)]

or

reportingDatesDf = reportingDatesDf[all(reportingDatesDf.index >= startDate) and all(reportingDatesDf.index <= endDate)]

should fix the issue.


Thanks Tim. I get a KeyError: True error when I implement the code. Does this mean I'm possibly missing some data somewhere?
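The KeyError: True in the comment above does not point to missing data. any() and all() collapse each boolean array to a single Python bool, so the indexing expression becomes df[True], which pandas treats as a lookup for a column labelled True. A small sketch, with a hypothetical two-row frame and assumed bounds:

```python
import pandas as pd

df = pd.DataFrame(
    {"reporting_type": ["2014:Q4", "2014:Q3"]},
    index=pd.DatetimeIndex(["2015-01-28", "2014-10-29"], name="date"),
)
start, end = pd.Timestamp("2014-01-01"), pd.Timestamp("2015-12-31")

# any()/all() reduce each boolean array to one Python bool,
# so the whole expression evaluates to just True (or False).
flag = any(df.index >= start) and any(df.index <= end)

# df[True] is a column lookup; no column is labelled True.
try:
    df[flag]
except KeyError as exc:
    missing_key = exc.args[0]

# Keeping one boolean per row (combined with &) filters rows instead.
rows = df[(df.index >= start) & (df.index <= end)]
```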

Try changing this:

reportingDatesDf = reportingDatesDf[(reportingDatesDf.index >= startDate) and (reportingDatesDf.index <= endDate)]

to this:

reportingDatesDf = reportingDatesDf[(reportingDatesDf.index >= startDate) & (reportingDatesDf.index <= endDate)]

In other words, use the proper operator ;)
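Putting this fix back into the question's loader, a sketch might look like the following. It assumes ReportingDates.csv has no header and stores dates day-first as dd/mm/yyyy. The CSV_TEXT string (built from the question's sample rows), the function name filter_reporting_dates and the date bounds are all hypothetical, and io.StringIO stands in for the opened file.

```python
import io

import pandas as pd

# Hypothetical stand-in for ReportingDates.csv: no header row,
# columns are stock id, day-first date, reporting type.
CSV_TEXT = """BBG.MTAA.STM.S,28/01/2015,2014:A
BBG.MTAA.STM.S,28/01/2015,2014:S2
BBG.MTAA.STM.S,28/01/2015,2014:Q4
BBG.MTAA.STM.S,29/10/2014,2014:C3
BBG.MTAA.STM.S,29/10/2014,2014:Q3
"""


def filter_reporting_dates(f, start_date, end_date):
    """Hypothetical helper: same read_csv call as the question, fixed filter."""
    df = pd.read_csv(
        f,
        header=None,
        usecols=[0, 1, 2],
        parse_dates=[1],
        dayfirst=True,
        index_col=[1],
        names=["unique_stock_id", "date", "reporting_type"],
    )
    # Each comparison yields one boolean per row; & combines them elementwise.
    # The parentheses matter because & binds more tightly than >= and <=.
    mask = (df.index >= start_date) & (df.index <= end_date)
    return df[mask]


result = filter_reporting_dates(
    io.StringIO(CSV_TEXT), pd.Timestamp("2014-11-01"), pd.Timestamp("2015-12-31")
)
print(result)
```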
