Comparing floats in Python

Question

I am having trouble comparing floats within conditional statements in Python. I have a dataset that looks like this:

         CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates  
0                        7        UNK              4  
1                        3     #NAME?           -243  
2                        1        UNK           -591  
3                        0  Exec Code            -93  
4                        0        UNK            -77

What I am trying to accomplish is to loop through the CVSS Score column (type float) and add a Class Number column to each row: 1 if the score is in the range 0 <= score < 6, 2 if it is in the range 6 <= score < 7.5, and 3 if it is in the range 7.5 <= score < 10. If done correctly, this is what it should look like:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            1  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            2  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3

My code right now looks like this:

data = pd.read_csv('tag_SA.txt', sep='|')
for score in data['CVSS Score']:
    if 0.0 < score < 6.0:
        data["Class Number"] = 1
    elif(6 <= score < 7.5):
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3

and the output I am getting is this:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            3  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            3  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3

So it is just going to the else statement and treating the other conditions as false. Is there something I am missing with float comparisons in Python? Any help would be appreciated.

You are not assigning your values when a condition is met correctly. You are overwriting every row when each condition is met. Look into using numpy.select
– It_is_Chris, Commented Sep 13, 2021 at 20:23
Try to assign the value via loc: data.loc[idx, "Class Number"] = 3
– Sprizgola, Commented Sep 13, 2021 at 20:24
This has nothing to do with comparing floats in Python. You really must learn the basics of the pandas / numpy API. You should start with the official user guide. You shouldn't even be approaching it this way to begin with.
– juanpa.arrivillaga, Commented Sep 13, 2021 at 20:30
data["Class Number"] = 3 assigns the value 3 to that column IN EVERY ROW.
– Tim Roberts, Commented Sep 13, 2021 at 20:43
@gmorton You are almost there. for score in data['CVSS Score']: should be for n, score in enumerate(data['CVSS Score']): if ...: data["Class Number"][n] = ....
– Guimoute, Commented Sep 13, 2021 at 21:09
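
The numpy.select suggestion from the first comment could look roughly like the sketch below. The small frame is a stand-in built from the five sample scores in the question (the real data would come from the CSV), and the names conditions and choices are only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in frame using the five sample scores from the question.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})
score = data["CVSS Score"]

# One boolean mask per class; np.select picks the first mask that matches.
conditions = [
    (score >= 0) & (score < 6),    # class 1
    (score >= 6) & (score < 7.5),  # class 2
]
choices = [1, 2]

# Rows matching no mask fall through to the default, class 3.
data["Class Number"] = np.select(conditions, choices, default=3)
print(data["Class Number"].tolist())  # [1, 3, 2, 3, 3]
```

Note that the default also catches anything outside the listed ranges (negative scores or NaN, for example), so it may be worth validating the column first if that matters.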

MaxNoe · Accepted Answer · 2021-09-13 21:26:55Z

1

Your problem is not about comparing floats; it is that you are overwriting the whole column of the dataframe each time you assign.
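
To see the overwrite in isolation, here is a minimal sketch on a made-up three-row frame (the scores are taken from the question's sample, the frame itself is only for illustration). Assigning a scalar to a column sets it in every row, so after the loop only the value chosen for the last score survives.

```python
import pandas as pd

# Hypothetical mini-frame with three scores from the question's sample.
data = pd.DataFrame({"CVSS Score": [5.9, 7.0, 7.8]})

# Assigning a scalar to a column sets that value in every row.
data["Class Number"] = 1
after_scalar = data["Class Number"].tolist()

# Same pattern as the loop in the question: each pass overwrites the
# whole column, so the result depends only on the last score (7.8).
for score in data["CVSS Score"]:
    if 0.0 <= score < 6.0:
        data["Class Number"] = 1
    elif 6.0 <= score < 7.5:
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3
after_loop = data["Class Number"].tolist()

print(after_scalar)  # [1, 1, 1]
print(after_loop)    # [3, 3, 3]
```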

You need to set only those rows where the condition is fulfilled; see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html. You should probably also go over https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html.

Using what is documented there:

import pandas as pd

data = pd.read_csv('tag_SA.txt', sep='|')

# Default class; rows matching a mask below are overwritten
data['Class Number'] = 3

# 0 <= score < 6
mask = (0.0 <= data['CVSS Score']) & (data['CVSS Score'] < 6.0)
data.loc[mask, 'Class Number'] = 1

# 6 <= score < 7.5
mask = (6.0 <= data['CVSS Score']) & (data['CVSS Score'] < 7.5)
data.loc[mask, 'Class Number'] = 2

You can also use pandas.cut like this:

# Left-closed bins [0, 6), [6, 7.5), [7.5, inf), matching the ranges in the question
bins = [0, 6, 7.5, float('inf')]
# codes start at 0, so add 1
data['Class Number'] = pd.cut(data['CVSS Score'], bins, right=False).cat.codes + 1


1 Comment

gmorton Over a year ago

Thank you for the helpful links, they are proving to be very beneficial. I appreciate your assistance.

Mike Sukmanowsky · Accepted Answer · 2021-09-13 21:23:55Z

0

Try using the apply method of a Series and assign the result to a new column named Class Number.

In your case, it'll look something like:

import pandas as pd

data = pd.DataFrame({'CVSS Score': [1, 2, .5, 6.2, 6.3, 9, 19, 6.1, 2, .5]})

def classify_cvss_score(score):
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2

    return 3

data['Class Number'] = data['CVSS Score'].apply(classify_cvss_score)


3 Comments

Marius Wallraff Over a year ago

While correct and probably fast enough for your dataset, note that apply is pretty slow compared to using numpy functions on the whole column (vectorized computation).
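
A rough way to check this claim on your own machine is sketched below. The random scores, the array size, and the helper names with_apply and with_select are all assumptions made for the comparison; actual timings will vary with hardware and library versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic scores in [0, 10); the size is arbitrary, chosen for illustration.
rng = np.random.default_rng(0)
scores = pd.Series(rng.uniform(0, 10, size=100_000))

def classify(score):
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2
    return 3

def with_apply():
    # Calls the Python function once per element.
    return scores.apply(classify)

def with_select():
    # Vectorized: the first matching condition wins, otherwise the default.
    return np.select([scores < 6, scores < 7.5], [1, 2], default=3)

# Both approaches should agree before their timings are compared.
same = bool((with_apply().to_numpy() == with_select()).all())
print("same result:", same)

print("apply :", timeit.timeit(with_apply, number=3))
print("select:", timeit.timeit(with_select, number=3))
```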

Mike Sukmanowsky Over a year ago

Definitely true and good for anyone new to Pandas/Numpy to be aware of, though in this case, I'm not sure there's a choice? Can you do conditional logic like this via any built-in numpy function?

MaxNoe Over a year ago

Sure you can, but this is even simpler, as this is actually just np.digitize
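
For reference, a minimal sketch of the np.digitize approach mentioned here, using the five sample scores from the question as a stand-in frame. The name edges is illustrative. With the default right=False, np.digitize returns 0 for values below the first edge, 1 for values in [6, 7.5), and 2 for values at or above 7.5, so adding 1 gives classes 1 to 3.

```python
import numpy as np
import pandas as pd

# Stand-in frame using the five sample scores from the question.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})

# Interior bin edges only: score < 6 -> 1, [6, 7.5) -> 2, score >= 7.5 -> 3
edges = [6.0, 7.5]
data["Class Number"] = np.digitize(data["CVSS Score"].to_numpy(), edges) + 1

print(data["Class Number"].tolist())  # [1, 3, 2, 3, 3]
```

Unlike an explicit 0 <= score check, this places anything below 6 (including negative values) in class 1, which is fine only if the scores are already known to be valid.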
