Calculating a new column in Python with Pandas and different Input and output types

Question

This is the third and hopefully last of these questions. Based on this and this question, how would I calculate a new column in Pandas where the inputs are an integer and a range and the output is a string?

This is my original definition as I had it in ArcPy:

def Gefaehrdestufe(staok_klasse, nFK):
    x = ""
    if staok_klasse == 1:
        if nFK in range(0, 36):
            x = "Geringes Risiko"
        elif nFK in range(36, 51):
            x = "Geringes Risiko"
        elif nFK in range (51, 66):
            x = "Geringes Risiko"
        elif nFK in range(66, 86):
            x = "Gering bis mäßig"
        elif nFK >= 86:
            x = "Mäßig"
    elif staok_klasse == 2:
        if nFK in range(0, 36):
            x = "Geringes Risiko"
        elif nFK in range(36, 51):
            x = u"Gering bis mäßig"
        elif nFK in range (51, 66):
            x = u"Gering bis mäßig"
        elif nFK in range(66, 86):
            x = u"Mäßig"
        elif nFK >= 86:
            x = u"Mäßig hoch"
    return x

I have tried with .apply():

df_joined["Gef_Stufe"] = df_joined["StaokKlass", "nFK"].apply(Gefaehrdestufe)

and with the method mentioned in one of my other questions:

st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(0,36))
st_1_nfk_36_51 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(36, 51))
st_1_nfk_51_66 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(51, 66))
st_1_nfk_66_85 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(66, 85))
st_1_nfk_85_x = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] >= 86)      
df_joined.loc[st_1_nfk_0_36, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_36_51, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_51_66, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_66_85, "Gef_stufe"] = u"Gering bis mäßig"
df_joined.loc[st_1_nfk_85_x, "Gef_stufe"] = u"Mäßig"

also with this style:

st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] > 0) & (df_joined["nFK_Proz"] < 36)

But none worked.

EDIT:

So I have updated my code following @EdChum's suggestions, but I keep getting this error: exceptions.TypeError: invalid type comparison. For testing purposes I took out the first half of the condition, (df_joined["StaokKlass"] == "1"), and the code then runs without error, but it does not give me the desired (or any) output. So the problem is definitely in this part, but I cannot figure out why. I have tried with and without brackets and get the same error every time. df_joined.info() confirms that the column df_joined["StaokKlass"] is an integer and nFK_Proz is a float.

st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(0,36))))
st_1_nfk_36_51 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(36, 51))))
st_1_nfk_51_66 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(51, 66))))
st_1_nfk_66_85 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(66, 85))))
st_1_nfk_85_x = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(86,1000))))

df_joined.loc[st_1_nfk_0_36, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_36_51, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_51_66, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_66_85, "Gef_stufe"] = u"Gering bis mäßig"
df_joined.loc[st_1_nfk_85_x, "Gef_stufe"] = u"Mäßig"
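A likely contributor to the error above, given that df_joined.info() reports StaokKlass as int64, is that the masks compare an integer column with the string "1". The following is only a minimal sketch on a made-up three-row frame (the data is hypothetical; only the column names and dtypes follow the question). The behaviour depends on the pandas version: older releases may raise a TypeError for this comparison, while recent ones return False for every row, so the try/except covers both cases.

```python
import pandas as pd

# Hypothetical sample; only the column names and dtypes mirror the question.
df = pd.DataFrame({"StaokKlass": [1, 2, 1], "nFK_Proz": [12.5, 40.0, 90.0]})

try:
    # int64 column compared with a string: no row can match.
    str_mask = df["StaokKlass"] == "1"
    str_matches = int(str_mask.sum())
except TypeError:
    # Some older pandas releases reject the comparison outright.
    str_matches = 0

# int64 column compared with an integer: rows of class 1 match.
int_mask = df["StaokKlass"] == 1
int_matches = int(int_mask.sum())

print(str_matches, int_matches)
```

Comparing against the integer 1 instead of "1" is the change that also appears in the working line of the accepted answer.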

So question 1: How do I have to change the first condition so that it is accepted? And question 2: I want Python to create a new column df_joined["Gef_Stufe"] that holds the string labels (preferably with the Unicode characters).

One more thing: I would like the last condition to be >= 86 instead of range(86, 1000). The range would do the job because the values will never be that high, but out of curiosity, for learning purposes, and for cleaner code, I would like to know how to do that.
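One way to express "86 and higher" without an artificial upper limit is a plain >= comparison, or a binning call whose last edge is infinity. The sketch below uses pandas.cut with the class 1 bands from the function at the top of the question; the sample values are hypothetical, and this is a swapped-in technique rather than anything the thread itself uses.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the column name follows the question.
df = pd.DataFrame({"nFK_Proz": [10.0, 65.9, 66.0, 86.0, 250.0]})

# Half-open bins [0, 66), [66, 86), [86, inf): the last one has no upper limit.
bins = [0, 66, 86, np.inf]
labels = ["Geringes Risiko", "Gering bis mäßig", "Mäßig"]

# right=False makes each bin include its left edge and exclude its right edge.
df["Gef_stufe"] = pd.cut(
    df["nFK_Proz"], bins=bins, labels=labels, right=False
).astype(str)

# The same open-ended test written as a plain comparison.
high = df["nFK_Proz"] >= 86

print(df)
```

This covers a single class only; the other classes would need their own bin edges and labels.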

EDIT 2:

Here the output for df_joined.info() and df_joined.dtypes:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 63117 entries, 0 to 63116
Data columns (total 38 columns):
OBJECTID      63117 non-null int64
FORSTAMT      63117 non-null int64
REVIER        63117 non-null int64
ABTEILUNG     63117 non-null int64
LAND          63117 non-null object
VEG           63117 non-null int64
Ortsname      63117 non-null object
DWD_ID        63117 non-null object
ForstortID    63117 non-null object
nFK_staok     63117 non-null int64
Wald_Typ      63117 non-null object
Datum         63117 non-null datetime64[ns]
nFK           63117 non-null int64
NS            63117 non-null int64
NV            63117 non-null float64
NS_Prog_1     63117 non-null int64
NS_Prog_2     63117 non-null int64
NS_Prog_3     63117 non-null int64
FET           63117 non-null int64
NS_Cap        63117 non-null int64
NS_Cap_P1     63117 non-null int64
NS_Cap_P2     63117 non-null int64
NS_Cap_P3     63117 non-null int64
Monat         63117 non-null object
Saison        63117 non-null object
IVbest        63117 non-null float64
NVbest        63117 non-null float64
nFK_140       63117 non-null float64
NV_Prog_1     63117 non-null float64
NV_Prog_2     63117 non-null float64
NV_Prog_3     63117 non-null float64
IV_Prog_1     63117 non-null float64
IV_Prog_2     63117 non-null float64
IV_Prog_3     63117 non-null float64
nFK_Prog      63117 non-null float64
nFK_ges       63117 non-null float64
nFK_Proz      63117 non-null float64
StaokKlass    63117 non-null int64

dtypes: datetime64[ns](1), float64(13), int64(17), object(7)
memory usage: 17.1+ MB

The df_joined["StaokKlass"] column consists of integers from 1 to 6, and each class is then divided into ranges from 0 to 36, to 55 and so on (that is df_joined["Gef_stufe"]).

The in operator won't work with arrays; use isin: df_joined["nFK_Proz"].isin(list(range(0,36)))
– EdChum, Commented Sep 27, 2016 at 14:43
That gets me this error: exceptions.TypeError: invalid type comparison. What about the parts > 86? Can those work, or is there another way to tell Python "86 and higher"?
– Kai, Commented Sep 27, 2016 at 14:56
That should work unless your dtype is not numeric. What does df_joined.info() show?
– EdChum, Commented Sep 27, 2016 at 14:59
It shows me this: nFK 63117 non-null int64 StaokKlass 63117 non-null int64 dtypes: datetime64[ns](1), float64(13), int64(17), object(7) memory usage: 17.1+ MB
– Kai, Commented Sep 27, 2016 at 15:03
I just found that the problem must be in the part (df_joined["StaokKlass"] == "1") of my code. When I remove that part for test purposes, the program runs through.
– Kai, Commented Sep 27, 2016 at 18:13
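A caveat on the isin suggestion: nFK_Proz is float64 according to df_joined.info(), and isin tests for exact membership, so values such as 12.5 would not match any integer in the range. A small sketch with hypothetical values, comparing isin against plain comparisons:

```python
import pandas as pd

# Hypothetical float values like those in nFK_Proz (float64 in the question).
s = pd.Series([12.0, 12.5, 35.9, 36.0])

# isin() checks exact membership, so only whole numbers 0..35 match.
by_isin = s.isin(list(range(0, 36)))

# Comparisons cover every value in the half-open interval [0, 36).
by_compare = (s >= 0) & (s < 36)

print(by_isin.tolist())
print(by_compare.tolist())
```

Series.between is another option, but by default it includes both endpoints, so between(0, 36) and between(36, 51) would both claim the value 36.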

Kai · Accepted Answer · 2016-10-01 10:58:08Z

1

Found the solution! The problem was a wrong bracket in the first expression of the conditions.

I had st_1_nfk_0_36 = (df_joined["StaokKlass"]) == 1 & (df_joined["nFK_Proz"].between(0,36))

when it should have been

st_1_nfk_0_36 = (df_joined["StaokKlass"] == 1) & (df_joined["nFK_Proz"].between(0,36))

So now it works fine and I get my wanted output! Thanks so much! Hopefully the final column goes without problems :-D
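The bracket matters because & binds more tightly than == in Python. Without parentheses around the comparison, the expression is read as StaokKlass == (1 & mask) rather than (StaokKlass == 1) & mask. A minimal sketch of the parenthesised form with a few of the conditions, on hypothetical rows (the column names follow the question; the mask names st_1_low, st_1_mid and st_2_top are made up for illustration):

```python
import pandas as pd

# Hypothetical rows; column names follow the question.
df = pd.DataFrame({
    "StaokKlass": [1, 1, 2, 2],
    "nFK_Proz": [20.0, 70.0, 20.0, 90.0],
})

# Each comparison gets its own parentheses before being combined with &.
st_1_low = (df["StaokKlass"] == 1) & (df["nFK_Proz"] < 66)
st_1_mid = (df["StaokKlass"] == 1) & (df["nFK_Proz"] >= 66) & (df["nFK_Proz"] < 86)
st_2_top = (df["StaokKlass"] == 2) & (df["nFK_Proz"] >= 86)

# Rows not covered by any mask stay NaN in the new column.
df.loc[st_1_low, "Gef_stufe"] = "Geringes Risiko"
df.loc[st_1_mid, "Gef_stufe"] = "Gering bis mäßig"
df.loc[st_2_top, "Gef_stufe"] = "Mäßig hoch"

print(df)
```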

piRSquared · Accepted Answer · 2016-09-27 16:18:06Z

0

Option 1

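# For each class: (label, exclusive upper limit for nFK), in ascending order.
# The last tuple has no limit and catches every value above the others.
data = {
    1: [
        (u'Geringes Risiko', 66),
        (u'Gering bis mäßig', 86),
        (u'Mäßig',),
    ],
    2: [
        (u'Geringes Risiko', 51),
        (u'Gering bis mäßig', 66),
        (u'Mäßig', 86),
        (u'Mäßig hoch',),
    ],
}

def Gefaehrdestufe(staok_klasse, nFK):
    # Return the first label whose upper limit lies above nFK.
    for group in data[staok_klasse][:-1]:
        if nFK < group[1]:
            return group[0]
    # No limit matched, so fall back to the open-ended last label.
    return data[staok_klasse][-1][0]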

1 Comment

Kai Over a year ago

Interesting solution, but for me as quite a Python beginner quite ununderstandable. I will look into it. Is that meant for Pandas?
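On the question of whether this is meant for Pandas: the function itself is plain Python, but it can be applied row by row with DataFrame.apply and axis=1. The sketch below repeats the lookup table so that it runs on its own; the sample rows are hypothetical, and the lambda wrapper is an assumption about how the two columns would be passed in, not something the answer spells out.

```python
import pandas as pd

# Thresholds as in the answer: (label, exclusive upper limit);
# the final entry of each class has no upper limit.
data = {
    1: [("Geringes Risiko", 66), ("Gering bis mäßig", 86), ("Mäßig",)],
    2: [("Geringes Risiko", 51), ("Gering bis mäßig", 66),
        ("Mäßig", 86), ("Mäßig hoch",)],
}

def Gefaehrdestufe(staok_klasse, nFK):
    for label, upper in data[staok_klasse][:-1]:
        if nFK < upper:
            return label
    return data[staok_klasse][-1][0]

# Hypothetical rows; column names follow the question.
df = pd.DataFrame({
    "StaokKlass": [1, 1, 2, 2],
    "nFK_Proz": [20.0, 90.0, 55.0, 86.0],
})

# axis=1 passes one row at a time, so both columns reach the function.
# int() is used because a mixed int/float row arrives as floats.
df["Gef_stufe"] = df.apply(
    lambda row: Gefaehrdestufe(int(row["StaokKlass"]), row["nFK_Proz"]),
    axis=1,
)

print(df)
```

The table only covers classes 1 and 2, so rows with any other class would raise a KeyError until their thresholds are added. Row-wise apply is also slower than boolean masks on a frame of this size, though it keeps the rules in one place.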
