I am having trouble comparing floats within conditional statements in Python. I have a dataset that looks like this:

         CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates  
0                        7        UNK              4  
1                        3     #NAME?           -243  
2                        1        UNK           -591  
3                        0  Exec Code            -93  
4                        0        UNK            -77 

What I am trying to accomplish is to loop through the CVSS Score column (type float) and add a new column, Class Number, to each row: 1 if the score is in the range 0 <= score < 6, 2 if it is in the range 6 <= score < 7.5, and 3 if it is in the range 7.5 <= score < 10. If done correctly, this is what it should look like:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            1  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            2  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3 

My code right now looks like this:

data = pd.read_csv('tag_SA.txt', sep='|')
for score in data['CVSS Score']:
    if 0.0 < score < 6.0:
        data["Class Number"] = 1
    elif(6 <= score < 7.5):
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3

and the output I am getting is this:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            3  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            3  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3 

So it is just going to the else statement and treating the other conditions as false. Is there something I am missing about float comparisons in Python? Any help would be appreciated.

5 Comments

  • You are not assigning your values correctly when a condition is met. You are overwriting every row each time a condition is met. Look into using numpy.select. Commented Sep 13, 2021 at 20:23
  • Try to assign the value via loc: data.loc[idx, "Class Number"] = 3 Commented Sep 13, 2021 at 20:24
  • This has nothing to do with comparing floats in Python. You really must learn the basics of the pandas / numpy API. You should start with the official user guide. You shouldn't even be approaching it this way to begin with. Commented Sep 13, 2021 at 20:30
  • data["Class Number"] = 3 assigns the value 3 to that column IN EVERY ROW. Commented Sep 13, 2021 at 20:43
  • @gmorton You are almost there. for score in data['CVSS Score']: should be for n, score in enumerate(data['CVSS Score']): and each branch should then assign with if ...: data["Class Number"][n] = .... Commented Sep 13, 2021 at 21:09
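
The numpy.select suggestion from the first comment could look roughly like the sketch below. The five-row frame is a hypothetical stand-in for the file in the question (it only reuses the scores shown there), and the boundaries follow the ranges the question describes rather than the ones in the posted loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that reuses the five scores shown in the question.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})
score = data["CVSS Score"]

# Conditions are evaluated in order; for each row the first match wins.
conditions = [
    (score >= 0) & (score < 6),
    (score >= 6) & (score < 7.5),
]
choices = [1, 2]

# Rows that match neither condition get the default value, 3.
data["Class Number"] = np.select(conditions, choices, default=3)

print(data["Class Number"].tolist())  # [1, 3, 2, 3, 3]
```

Because the conditions are built from whole columns, no explicit Python loop is needed and each row gets its own value.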

2 Answers


Your problem is not about comparing floats. It is that you overwrite the whole column of the DataFrame every time you assign.
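
To see the overwriting in isolation, here is a minimal sketch on a hypothetical three-row frame (the name df and its values are made up for illustration). Assigning a scalar to a column sets every row, so in the question's loop only the assignment made for the last score survives; that score is 7.8, which is why every row ends up as 3.

```python
import pandas as pd

# Hypothetical three-row frame, only to show the assignment semantics.
df = pd.DataFrame({"CVSS Score": [5.9, 7.0, 9.8]})

# Assigning a scalar to a column sets that value in every row.
df["Class Number"] = 1
after_first = df["Class Number"].tolist()

# A later scalar assignment replaces the whole column again.
df["Class Number"] = 3
after_second = df["Class Number"].tolist()

print(after_first)   # [1, 1, 1]
print(after_second)  # [3, 3, 3]
```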

You need to set only those rows where the condition is fulfilled; see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html. You should probably also go over https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html.

Using what is documented there:

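import pandas as pd

data = pd.read_csv('tag_SA.txt', sep='|')

# Default class; rows matching one of the masks below are overwritten
data['Class Number'] = 3

# Class 1: 0 <= score < 6
mask = (0.0 <= data['CVSS Score']) & (data['CVSS Score'] < 6.0)
data.loc[mask, 'Class Number'] = 1

# Class 2: 6 <= score < 7.5
mask = (6.0 <= data['CVSS Score']) & (data['CVSS Score'] < 7.5)
data.loc[mask, 'Class Number'] = 2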

You can also use pandas.cut like this:

# right=False gives the half-open bins [0, 6), [6, 7.5), [7.5, inf)
bins = [0, 6, 7.5, float('inf')]
# codes start at 0, add 1 if needed
data['Class Number'] = pd.cut(data['CVSS Score'], bins, right=False).cat.codes + 1

1 Comment

Thank you for the helpful links, they are proving to be very beneficial. I appreciate your assistance.

Try using the apply method of a Series and assign the result to a new column named Class Number.

In your case, it'll look something like:

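import pandas as pd

data = pd.DataFrame({'CVSS Score': [1, 2, .5, 6.2, 6.3, 9, 19, 6.1, 2, .5]})

def classify_cvss_score(score):
    # Map a single score to its class, using the ranges from the question
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2
    # Any score outside the two ranges above falls through to class 3
    return 3

data['Class Number'] = data['CVSS Score'].apply(classify_cvss_score)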

3 Comments

While correct and probably fast enough for your dataset, note that apply is pretty slow compared to using numpy functions on the whole column (vectorized computation).
Definitely true and good for anyone new to Pandas/Numpy to be aware of, though in this case, I'm not sure there's a choice? Can you do conditional logic like this via any built-in numpy function?
Sure you can, but this is even simpler, as this is actually just np.digitize
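
As a rough illustration of the np.digitize idea from the last comment, assuming the class boundaries from the question (6 and 7.5) and a hypothetical frame holding the five scores shown there:

```python
import numpy as np
import pandas as pd

# Hypothetical sample that reuses the five scores shown in the question.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})

# With the default right=False, np.digitize returns, for each value,
# the number of bin edges that are less than or equal to it:
#   score < 6        -> 0
#   6 <= score < 7.5 -> 1
#   score >= 7.5     -> 2
# Adding 1 turns those indices into class numbers 1, 2, 3.
data["Class Number"] = np.digitize(data["CVSS Score"], bins=[6, 7.5]) + 1

print(data["Class Number"].tolist())  # [1, 3, 2, 3, 3]
```

Note that this sketch places every score below 6, including negative ones, in class 1, so it assumes the scores have already been validated as non-negative.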
