Nested if using lambda in Python

Question

I have a dataset like this:

Build_year Max_cnt_year   b1920  b1945 b1975 b1995 
NaN        120            120    35    45    70    
0          67             35     67    21    34    
1921       145            39     67    22    145   
...

Desired output:

Build_year Max_cnt_year   b1920  b1945 b1975 b1995 year_build1
NaN        120            120    35    45    70    1920
0          67             35     67    21    34    1945
1921       145            39     67    22    145   1921
...

I want to compare Max_cnt_year against the values of b1920, b1945, b1975 and b1995 and assign the year of the column it matches. This applies only when Build_year is missing or not a plausible year; if Build_year > 1500, the Build_year value itself should be kept.

I am trying this code unsuccessfully:

def mapper(item):
    max_val = df_all['max_cnt_year']
    comp_val = df_all['build_year']
    for comp in comp_val:
        if comp < 1500 or comp is None:
            if max_val == df_all['b1920']:
                return 1920
            elif max_val == df_all['b1945']:
                return 1945
            elif max_val == df_all['b1970']:
                return 1970
            elif max_val == df_all['b1995']:
                return 1995
            else:
                return 2005
        else:
            return comp_val

df_all['build_year1'] = map(mapper, df_all)

I have modified the data a bit to replicate the problem. The actual dataset looks like this:

  max_cnt_year  build_year  build_count_before_1920  build_count_1921-1945  \
0         246.0         NaN                      1.0                    0.0   
1         304.0         NaN                      0.0                    0.0   
2         108.0         NaN                      0.0                   52.0   
3         278.0         NaN                     23.0                  181.0   
4          86.0         1945                    14.0                   45.0   

   build_count_1946-1970  build_count_1971-1995  
0                  246.0                   63.0  
1                  304.0                   21.0  
2                   44.0                  108.0  
3                  278.0                  131.0  
4                   86.0                    8.0

Why doesn't your lambda use its x argument? And why do you want a huge lambda like that? Why not just write a proper def function?
– PM 2Ring, Commented Apr 29, 2017 at 11:37
Create a function, e.g. def assign_year(), put the logic for assigning a value to year_build1 in its body, and return year_build1.
– Gustav Rasmussen, Commented Apr 29, 2017 at 11:45

Zohaib Ijaz · Accepted Answer · 2017-04-29 12:04:40Z

1

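You can write a function that handles a single row and apply it to the DataFrame row by row.

def mapper(row):
    # row is a single DataFrame row (a Series), so these comparisons are scalar
    max_val = row['max_cnt_year']
    years = ['1920', '1945', '1975', '1995']
    for year in years:
        if max_val == row['b' + year]:
            return int(year)

Then pass this function to DataFrame.apply with axis=1. (The built-in map would iterate over the column labels of df_all, not over its rows.)

df_all['build_year1'] = df_all.apply(mapper, axis=1)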
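
For a self-contained illustration of the row-wise idea, here is a minimal sketch that also covers the Build_year > 1500 condition from the question. The names pick_year, YEAR_COLUMNS and FALLBACK_YEAR are hypothetical, the column names follow the sample table in the question, and the 2005 fallback is taken from the else branch of the asker's code. Plain dicts stand in for DataFrame rows so the example runs without pandas.

```python
# Hypothetical helper names; columns follow the sample table in the question.
YEAR_COLUMNS = {"b1920": 1920, "b1945": 1945, "b1975": 1975, "b1995": 1995}
FALLBACK_YEAR = 2005  # assumption: reuses the else-branch value from the question


def pick_year(row):
    """Return Build_year if it is above 1500, else the year of the matching column."""
    build_year = row["Build_year"]
    # NaN > 1500 evaluates to False; None has to be ruled out before comparing
    if build_year is not None and build_year > 1500:
        return int(build_year)
    # Otherwise return the year of the first column equal to Max_cnt_year
    for column, year in YEAR_COLUMNS.items():
        if row[column] == row["Max_cnt_year"]:
            return year
    return FALLBACK_YEAR


rows = [
    {"Build_year": float("nan"), "Max_cnt_year": 120,
     "b1920": 120, "b1945": 35, "b1975": 45, "b1995": 70},
    {"Build_year": 0, "Max_cnt_year": 67,
     "b1920": 35, "b1945": 67, "b1975": 21, "b1995": 34},
    {"Build_year": 1921, "Max_cnt_year": 145,
     "b1920": 39, "b1945": 67, "b1975": 22, "b1995": 145},
]
print([pick_year(r) for r in rows])  # [1920, 1945, 1921]
```

Because the function only indexes the row by column name, it should also work on a DataFrame with these columns via df.apply(pick_year, axis=1).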

answered Apr 29, 2017 at 12:04

Zohaib Ijaz


7 Comments

user2542275 Over a year ago

Where does it assign the value of build_year to build_year1 when build_year > 1500?

Zohaib Ijaz Over a year ago

@user2542275 We are here to help you, not to give a ready-made solution. The idea is to give you a hint.

user2542275 Over a year ago

Thanks. Following your approach, I updated my code, but can you please point out the possible error in it? It is creating some junk values.

Zohaib Ijaz Over a year ago

Can you paste your data here, not in tabular form but as it looks in Python?

user2542275 Over a year ago

I added the actual data to the problem statement.


MaxU - stand with Ukraine · Accepted Answer · 2017-04-29 13:00:46Z

0

Here is my ugly Pandas solution - it will parse a year from the column name:

DF

In [57]: df
Out[57]:
   Build_year  Max_cnt_year  b1920  b1945  b1975  b1995
0         NaN           120    120     35     45     70
1         0.0            67     35     67     21     34
2      1921.0           145     39     67     22    145

Solution:

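import numpy as np

# Keep Build_year where it is above 1500; mark all other rows with -1
df['year_build1'] = np.where(df['Build_year'] > 1500, df['Build_year'], -1)

# For the marked rows, find the first column from 'b1920' onward that equals
# Max_cnt_year and strip the 'b' prefix from its name to get the year
df.loc[df['year_build1']==-1, 'year_build1'] = \
    df.loc[df['year_build1']==-1] \
      .apply(lambda x: x.loc['b':].eq(x['Max_cnt_year']).idxmax().replace('b',''),
             axis=1)

df['year_build1'] = df['year_build1'].astype(int)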

Result:

In [156]: df
Out[156]:
   Build_year  Max_cnt_year  b1920  b1945  b1975  b1995  year_build1
0         NaN           120    120     35     45     70         1920
1         0.0            67     35     67     21     34         1945
2      1921.0           145     39     67     22    145         1921

In [157]: df.dtypes
Out[157]:
Build_year      float64
Max_cnt_year      int64
b1920             int64
b1945             int64
b1975             int64
b1995             int64
year_build1       int32
dtype: object
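
The same logic can also be written without a row-wise apply. The following is only a sketch under assumptions, not the code from this answer: it assumes the count columns are exactly those named b plus four digits, and it rebuilds the sample DataFrame so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Build_year": [np.nan, 0, 1921],
    "Max_cnt_year": [120, 67, 145],
    "b1920": [120, 35, 39],
    "b1945": [35, 67, 67],
    "b1975": [45, 21, 22],
    "b1995": [70, 34, 145],
})

# Select the count columns by name pattern (assumption: 'b' + four digits)
counts = df.filter(regex=r"^b\d{4}$")

# Compare every count column with Max_cnt_year row by row;
# idxmax(axis=1) returns the label of the first matching column
matched = counts.eq(df["Max_cnt_year"], axis=0).idxmax(axis=1)
matched_year = matched.str.lstrip("b").astype(int)

# Keep Build_year where it is above 1500 (NaN compares as False),
# otherwise use the year parsed from the matching column name
df["year_build1"] = np.where(
    df["Build_year"] > 1500, df["Build_year"], matched_year
).astype(int)

print(df["year_build1"].tolist())  # [1920, 1945, 1921]
```

One caveat with idxmax on a boolean row: when no column matches, it still returns the first label (b1920 here), so rows without a match may need a separate check such as counts.eq(df["Max_cnt_year"], axis=0).any(axis=1).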

edited Apr 29, 2017 at 13:00

answered Apr 29, 2017 at 12:16

MaxU - stand with Ukraine


1 Comment

user2542275 Over a year ago

Thanks, but for the 3rd row year_build1 should be 1921, because Build_year > 1500. How can I include that in your code?
