This is the third and hopefully last question of this type. Based on this and this question, how would I calculate a new column in pandas where the inputs are an integer and a range and the output is a string?
This is my original definition as I had it in ArcPy:
def Gefaehrdestufe(staok_klasse, nFK):
    x = ""
    if staok_klasse == 1:
        if nFK in range(0, 36):
            x = "Geringes Risiko"
        elif nFK in range(36, 51):
            x = "Geringes Risiko"
        elif nFK in range(51, 66):
            x = "Geringes Risiko"
        elif nFK in range(66, 86):
            x = "Gering bis mäßig"
        elif nFK >= 86:
            x = "Mäßig"
    elif staok_klasse == 2:
        if nFK in range(0, 36):
            x = "Geringes Risiko"
        elif nFK in range(36, 51):
            x = u"Gering bis mäßig"
        elif nFK in range(51, 66):
            x = u"Gering bis mäßig"
        elif nFK in range(66, 86):
            x = u"Mäßig"
        elif nFK >= 86:
            x = u"Mäßig hoch"
    return x
I have tried with .apply():
df_joined["Gef_Stufe"] = df_joined["StaokKlass", "nFK"].apply(Gefaehrdestufe)
and with the method mentioned in one of my other questions:
st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(0,36))
st_1_nfk_36_51 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(36, 51))
st_1_nfk_51_66 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(51, 66))
st_1_nfk_66_85 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] in range(66, 85))
st_1_nfk_85_x = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] >= 86)
df_joined.loc[st_1_nfk_0_36, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_36_51, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_51_66, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_66_85, "Gef_stufe"] = u"Gering bis mäßig"
df_joined.loc[st_1_nfk_85_x, "Gef_stufe"] = u"Mäßig"
also with this style:
st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"] > 0) & (df_joined["nFK_Proz"] < 36)
But none worked.
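For the .apply() attempt, a row-wise call might look like the sketch below. It assumes the function takes the two column values as separate arguments; the sample frame and the shortened stand-in function (class 1 only) are hypothetical and only there to keep the example self-contained.

```python
import pandas as pd


def gefaehrdestufe(staok_klasse, nfk):
    """Shortened stand-in for Gefaehrdestufe, covering class 1 only."""
    if staok_klasse == 1:
        if 0 <= nfk < 66:
            return "Geringes Risiko"
        if 66 <= nfk < 86:
            return "Gering bis mäßig"
        if nfk >= 86:
            return "Mäßig"
    return ""


# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({"StaokKlass": [1, 1, 1, 2], "nFK": [10, 70, 90, 10]})

# axis=1 passes each row to the lambda, which unpacks the two columns.
# Without axis=1, apply() would pass whole columns instead of rows.
df["Gef_Stufe"] = df.apply(
    lambda row: gefaehrdestufe(row["StaokKlass"], row["nFK"]), axis=1
)
print(df)
```

Note that selecting two columns needs a list, df[["StaokKlass", "nFK"]], rather than df["StaokKlass", "nFK"], which pandas treats as a single tuple key.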
EDIT:
So I have updated my code following @EdChum's suggestion, but I keep getting this error: exceptions.TypeError: invalid type comparison.
For testing purposes I took out the first half of the condition, (df_joined["StaokKlass"] == "1"), and the code then runs without error; however, it does not give me the desired (or any) output. So the problem is definitely in this part, but I cannot figure out why. I have tried with and without brackets, but I get the same error every time.
df_joined.info() confirms that the column df_joined["StaokKlass"] is an integer and nFK_Proz is a float.
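A possible cause of the error is that StaokKlass is an int64 column but is compared with the string "1". The sketch below, on hypothetical sample data with the same dtypes, compares against the integer 1 instead. It also uses comparison operators for the range, on the assumption that non-integer values should be matched too: isin(list(range(0, 36))) only matches whole numbers, and nFK_Proz is a float column.

```python
import pandas as pd

# Hypothetical sample data mirroring the dtypes reported by df_joined.info().
df = pd.DataFrame({
    "StaokKlass": [1, 1, 2],          # int64
    "nFK_Proz": [35.5, 40.0, 12.0],   # float64
})

# Compare against the integer 1, not the string "1".
is_class_1 = df["StaokKlass"] == 1

# Half-open interval [0, 36) via comparison operators; this also catches
# non-integer values such as 35.5, which isin(range(0, 36)) would miss.
in_0_36 = (df["nFK_Proz"] >= 0) & (df["nFK_Proz"] < 36)

# Rows that do not match the mask are left as NaN in the new column.
df.loc[is_class_1 & in_0_36, "Gef_stufe"] = "Geringes Risiko"
print(df)
```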
st_1_nfk_0_36 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(0,36))))
st_1_nfk_36_51 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(36, 51))))
st_1_nfk_51_66 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(51, 66))))
st_1_nfk_66_85 = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(66, 85))))
st_1_nfk_85_x = (df_joined["StaokKlass"] == "1") & (df_joined["nFK_Proz"].isin(list(range(86,1000))))
df_joined.loc[st_1_nfk_0_36, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_36_51, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_51_66, "Gef_stufe"] = "Geringes Risiko"
df_joined.loc[st_1_nfk_66_85, "Gef_stufe"] = u"Gering bis mäßig"
df_joined.loc[st_1_nfk_85_x, "Gef_stufe"] = u"Mäßig"
So question 1: How do I have to change the first condition so that it is accepted? And
question 2: I want Python to create a new column df_joined["Gef_Stufe"] that holds the string labels (preferably with the unicode characters).
One more thing: I would like the last condition to be >= 86 instead of range(86, 1000). The range would do the job, because the values will never be that high, but out of curiosity, for learning purposes and for clean code, I would like to know how to accomplish that.
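One way the open-ended ">= 86" condition and the string column could be written is with numpy.select, which takes a list of boolean masks and a matching list of labels. This is only a sketch on hypothetical sample data, covering classes 1 and 2 with the labels from the original function; in Python 3 the u prefix is not needed because all strings are unicode.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "StaokKlass": [1, 1, 1, 2, 2],
    "nFK_Proz": [20.0, 70.5, 120.0, 40.0, 90.0],
})

klass = df["StaokKlass"]
nfk = df["nFK_Proz"]

# Each mask pairs a class with a half-open range; the last mask of each
# class is open-ended and needs no upper bound.
conditions = [
    (klass == 1) & (nfk >= 0) & (nfk < 66),
    (klass == 1) & (nfk >= 66) & (nfk < 86),
    (klass == 1) & (nfk >= 86),
    (klass == 2) & (nfk >= 0) & (nfk < 36),
    (klass == 2) & (nfk >= 36) & (nfk < 66),
    (klass == 2) & (nfk >= 66) & (nfk < 86),
    (klass == 2) & (nfk >= 86),
]
labels = [
    "Geringes Risiko",
    "Gering bis mäßig",
    "Mäßig",
    "Geringes Risiko",
    "Gering bis mäßig",
    "Mäßig",
    "Mäßig hoch",
]

# Rows matching none of the masks get the default, an empty string.
df["Gef_stufe"] = np.select(conditions, labels, default="")
print(df)
```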
EDIT 2:
Here is the output of df_joined.info() and df_joined.dtypes:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 63117 entries, 0 to 63116
Data columns (total 38 columns):
OBJECTID      63117 non-null int64
FORSTAMT      63117 non-null int64
REVIER        63117 non-null int64
ABTEILUNG     63117 non-null int64
LAND          63117 non-null object
VEG           63117 non-null int64
Ortsname      63117 non-null object
DWD_ID        63117 non-null object
ForstortID    63117 non-null object
nFK_staok     63117 non-null int64
Wald_Typ      63117 non-null object
Datum         63117 non-null datetime64[ns]
nFK           63117 non-null int64
NS            63117 non-null int64
NV            63117 non-null float64
NS_Prog_1     63117 non-null int64
NS_Prog_2     63117 non-null int64
NS_Prog_3     63117 non-null int64
FET           63117 non-null int64
NS_Cap        63117 non-null int64
NS_Cap_P1     63117 non-null int64
NS_Cap_P2     63117 non-null int64
NS_Cap_P3     63117 non-null int64
Monat         63117 non-null object
Saison        63117 non-null object
IVbest        63117 non-null float64
NVbest        63117 non-null float64
nFK_140       63117 non-null float64
NV_Prog_1     63117 non-null float64
NV_Prog_2     63117 non-null float64
NV_Prog_3     63117 non-null float64
IV_Prog_1     63117 non-null float64
IV_Prog_2     63117 non-null float64
IV_Prog_3     63117 non-null float64
nFK_Prog      63117 non-null float64
nFK_ges       63117 non-null float64
nFK_Proz      63117 non-null float64
StaokKlass    63117 non-null int64
dtypes: datetime64[ns](1), float64(13), int64(17), object(7)
memory usage: 17.1+ MB
The df_joined["StaokKlass"] column consists of integers from 1 to 6. Each class is then divided into ranges of nFK_Proz (from 0 to 36, and so on), and each combination of class and range gives one risk level; that is df_joined["Gef_stufe"].
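Since every class uses the same range boundaries, the lookup could also be expressed as binning plus a per-class table. The sketch below assumes the boundaries 0, 36, 51, 66 and 86 from the original function and uses pandas.cut with half-open bins; the sample data and the names bins, stufen and nFK_bin are hypothetical, and only classes 1 and 2 are filled in because the labels for classes 3 to 6 are not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "StaokKlass": [1, 2, 3],
    "nFK_Proz": [50.9, 66.0, 10.0],
})

# Half-open bins [0, 36), [36, 51), [51, 66), [66, 86), [86, inf).
bins = [0, 36, 51, 66, 86, np.inf]

# labels=False returns the bin index (0 to 4), or NaN if out of range.
df["nFK_bin"] = pd.cut(df["nFK_Proz"], bins=bins, right=False, labels=False)

# One list of labels per class, indexed by bin number.
stufen = {
    1: ["Geringes Risiko", "Geringes Risiko", "Geringes Risiko",
        "Gering bis mäßig", "Mäßig"],
    2: ["Geringes Risiko", "Gering bis mäßig", "Gering bis mäßig",
        "Mäßig", "Mäßig hoch"],
}

# Classes without a table entry, or values outside all bins, get "".
df["Gef_stufe"] = [
    stufen[k][int(b)] if k in stufen and pd.notna(b) else ""
    for k, b in zip(df["StaokKlass"], df["nFK_bin"])
]
print(df)
```

Adding a further class would then only mean adding one more entry to the table rather than five more masks.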
in won't work with arrays; use isin: df_joined["nFK_Proz"].isin(list(range(0,36)))
exceptions.TypeError: invalid type comparison. What about the parts > 86? Can those work, or is there another way to tell Python "86 and higher"?
What does df_joined.info() show?
nFK 63117 non-null int64 StaokKlass 63117 non-null int64 dtypes: datetime64[ns](1), float64(13), int64(17), object(7) memory usage: 17.1+ MB