I am writing a quick and dirty data sanitization script, and I need to check that the price columns are correctly ordered relative to each other (for example, that hi is never below op, lo, or cl).

The dataframe looks like this:

                    dt      op      hi        lo      cl       vol         adj        prev
1986-01-02  1986-01-02  177.00  177.00  177.0000  177.00   75.8732         0.0         NaN
1986-01-03  1986-01-03  176.00  176.00  176.0000  176.00   75.4447         0.0  1986-01-02
1986-01-06  1986-01-06  172.00  172.00  172.0000  172.00   73.7299         0.0  1986-01-03
1986-01-07  1986-01-07  167.00  167.00  167.0000  167.00   71.5868         0.0  1986-01-06
1986-01-09  1986-01-09  168.00  168.00  168.0000  168.00   72.0153         0.0  1986-01-07
...                ...     ...     ...       ...     ...       ...         ...         ...
2020-09-14  2020-09-14  102.20  105.60  101.6500  104.70  104.7000   9720916.0  2020-09-11
2020-09-15  2020-09-15  106.45  110.70  106.4500  109.25  109.2500  15923105.0  2020-09-14
2020-09-16  2020-09-16  107.95  112.55  107.9500  112.10  112.1000  15399144.0  2020-09-15
2020-09-17  2020-09-17  110.40  112.85  110.0500  112.00  112.0000   6737225.0  2020-09-16
2020-09-18  2020-09-18  111.50  111.75  109.3923  110.75  110.7500  25308704.0  2020-09-17

I want to create a mask like this:

mask = df[(df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)]

However, when I try to select from df using df[mask], I get the error message:

ValueError: Boolean array expected for the condition, not object

This is what I want to do:

  1. Set a boolean flag holding the result of the test above
  2. Convert the boolean to an int (0 or 1)
  3. Sum the column of ints to see whether the total is nonzero

How do I set the flag in a column in the dataframe based on my test condition?

  • Aren't you already creating the data that you need, since you have df before all those conditions? Commented Dec 7, 2020 at 19:02
  • You are doing dataset[dataset]. Commented Dec 7, 2020 at 19:03
  • df[df] gives "ValueError: Must pass DataFrame with boolean values only". I might be wrong, but that's what I see. Commented Dec 7, 2020 at 19:03

1 Answer

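Wrapping the condition in df[...] makes mask a filtered DataFrame rather than a boolean Series, which is why df[mask] fails. The mask should be the boolean expression itself: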

mask = (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl) & (df.lo <= df.op) & (df.lo <= df.cl)

Then select the matching rows with df[mask].
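
The three steps from the question (flag, convert to int, sum) could be sketched as follows. This is a minimal example under assumptions: the three sample rows are made up, with the last one deliberately violating hi >= cl, and the column name "ok" is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the last row deliberately has cl above hi.
df = pd.DataFrame(
    {
        "op": [177.00, 102.20, 111.50],
        "hi": [177.00, 105.60, 111.75],
        "lo": [177.00, 101.65, 109.39],
        "cl": [177.00, 104.70, 112.00],
    }
)

# Boolean Series: True where the row's prices are consistently ordered.
mask = (
    (df.hi >= df.op) & (df.hi >= df.lo) & (df.hi >= df.cl)
    & (df.lo <= df.op) & (df.lo <= df.cl)
)

# Steps 1 and 2: store the flag as 0/1 in a new column ("ok" is a hypothetical name).
df["ok"] = mask.astype(int)

# Step 3: count the rows that pass and the rows that fail the check.
n_ok = int(df["ok"].sum())
n_bad = int((~mask).sum())

# Rows that fail the check, for inspection.
bad_rows = df[~mask]
```

Summing ~mask rather than mask gives the number of bad rows directly, which is usually the figure a sanity check cares about. Note that comparisons involving NaN evaluate to False, so rows with missing prices would be counted as failures.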
