
I have a dataset like this:

Build_year Max_cnt_year   b1920  b1945 b1975 b1995 
NaN        120            120    35    45    70    
0          67             35     67    21    34    
1921       145            39     67    22    145   
...

Desired output:

Build_year Max_cnt_year   b1920  b1945 b1975 b1995 year_build1
NaN        120            120    35    45    70    1920
0          67             35     67    21    34    1945
1921       145            39     67    22    145   1921
...

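I want to compare max_cnt_year against the values of b1920, b1945, b1975 and b1995 and, when it matches one of them, assign the corresponding year. The lookup should only apply when Build_year is missing or not a real year. If Build_year > 1500, year_build1 should simply be Build_year.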

I am trying this code unsuccessfully:

    def mapper(item):
        max_val = df_all['max_cnt_year']
        comp_val = df_all['build_year']
        for comp in comp_val:
            if comp < 1500 or comp is None:
                if max_val == df_all['b1920']:
                    return 1920
                elif max_val == df_all['b1945']:
                    return 1945
                elif max_val == df_all['b1970']:
                    return 1970
                elif max_val == df_all['b1995']:
                    return 1995
                else:
                    return 2005
            else:
                return comp_val

    df_all['build_year1'] = map(mapper, df_all)

I simplified the data above to reproduce the problem. The actual dataset looks like this:

  max_cnt_year  build_year  build_count_before_1920  build_count_1921-1945  \
0         246.0         NaN                      1.0                    0.0   
1         304.0         NaN                      0.0                    0.0   
2         108.0         NaN                      0.0                   52.0   
3         278.0         NaN                     23.0                  181.0   
4          86.0         1945                    14.0                   45.0   

   build_count_1946-1970  build_count_1971-1995  
0                  246.0                   63.0  
1                  304.0                   21.0  
2                   44.0                  108.0  
3                  278.0                  131.0  
4                   86.0                    8.0  
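A vectorized sketch for this actual column layout, using numpy.select to pick the first bucket whose count equals max_cnt_year. The bucket-to-year mapping (each bucket labeled by its upper bound: 1920, 1945, 1970, 1995) and the 2005 default are assumptions borrowed from the question's code; rows whose build_year is above 1500 keep their own value.

```python
import numpy as np
import pandas as pd

# The five rows shown above
df = pd.DataFrame({
    'max_cnt_year': [246.0, 304.0, 108.0, 278.0, 86.0],
    'build_year': [np.nan, np.nan, np.nan, np.nan, 1945],
    'build_count_before_1920': [1.0, 0.0, 0.0, 23.0, 14.0],
    'build_count_1921-1945': [0.0, 0.0, 52.0, 181.0, 45.0],
    'build_count_1946-1970': [246.0, 304.0, 44.0, 278.0, 86.0],
    'build_count_1971-1995': [63.0, 21.0, 108.0, 131.0, 8.0],
})

# Assumed mapping from bucket column to a representative (upper-bound) year
bucket_years = {
    'build_count_before_1920': 1920,
    'build_count_1921-1945': 1945,
    'build_count_1946-1970': 1970,
    'build_count_1971-1995': 1995,
}

# One boolean condition per bucket; np.select takes the first condition that is True
conditions = [df['max_cnt_year'] == df[col] for col in bucket_years]
from_buckets = np.select(conditions, list(bucket_years.values()), default=2005)

# Keep a real build year (> 1500); NaN > 1500 is False, so missing years use the bucket
df['year_build1'] = np.where(df['build_year'] > 1500, df['build_year'], from_buckets).astype(int)
print(df['year_build1'].tolist())  # [1970, 1970, 1995, 1970, 1945]
```

Because np.select stops at the first True condition, a tie between two buckets resolves to the earlier year in bucket_years.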
  • What is the structure of your data in Python code? Commented Apr 29, 2017 at 11:36
  • Why doesn't your lambda use its x argument? But why do you want a huge lambda like that? Why not just write a proper def function? Commented Apr 29, 2017 at 11:37
  • Create a function: def assign_year(): # logic for assigning a value to the year_build1 variable, then return year_build1. Commented Apr 29, 2017 at 11:45
  • Updated my code, but it doesn't seem to work either. Commented Apr 30, 2017 at 8:58

2 Answers


You can create a function and then apply it to each row.

    def mapper(row):
        # Compare the row's max count with each bucket column in turn
        years = ['1920', '1945', '1975', '1995']
        for year in years:
            if row['max_cnt_year'] == row['b' + year]:
                return int(year)
        return None  # no bucket matched

And then pass this function to DataFrame.apply with axis=1, so that it is called once per row:

    df_all['build_year1'] = df_all.apply(mapper, axis=1)
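Why the built-in map(mapper, df_all) from the question misbehaves: iterating over a DataFrame yields its column labels, not its rows, and in Python 3 map returns a lazy iterator rather than a column of values. The toy frame below is hypothetical data that illustrates both points and the row-wise alternative, DataFrame.apply with axis=1.

```python
import pandas as pd

# Hypothetical two-row frame
df_all = pd.DataFrame({'max_cnt_year': [120, 67], 'b1920': [120, 35]})

# Iterating a DataFrame gives column labels, so map() would call mapper('max_cnt_year'), ...
print(list(df_all))  # ['max_cnt_year', 'b1920']

# The built-in map is lazy in Python 3: this is an iterator, not a list of results
lazy = map(len, df_all)
print(type(lazy).__name__)  # map

# apply(..., axis=1) instead passes each row to the function as a Series
per_row = df_all.apply(lambda row: row['max_cnt_year'] == row['b1920'], axis=1)
print(per_row.tolist())  # [True, False]
```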

7 Comments

Where does it assign build_year to build_year1 when build_year > 1500?
@user2542275 We are here to help you, not to give a ready-made solution. The idea is to give you a hint.
Thanks. Along your lines I updated my code, but can you please point out the possible error in it? It is producing some junk values.
Can you paste your data here, not in tabular form but as it looks in Python?
Added the actual data to the problem statement.
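Building on the hint above, a minimal row-wise sketch that also covers the case raised in the first comment (keep Build_year when it is greater than 1500). It assumes the sample column names from the question (Build_year, Max_cnt_year, b1920 to b1995, with b1975 rather than the b1970 used in the question's code) and reuses the question's 2005 fallback. pd.notna is used because a missing year is NaN, not None.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df_all = pd.DataFrame({
    'Build_year':   [np.nan, 0, 1921],
    'Max_cnt_year': [120, 67, 145],
    'b1920':        [120, 35, 39],
    'b1945':        [35, 67, 67],
    'b1975':        [45, 21, 22],
    'b1995':        [70, 34, 145],
})

def mapper(row):
    """Return a build year for one row (passed in as a Series)."""
    build_year = row['Build_year']
    # A real build year (> 1500) is kept as-is
    if pd.notna(build_year) and build_year > 1500:
        return int(build_year)
    # Otherwise pick the first bucket whose count equals the row's max count
    for year in (1920, 1945, 1975, 1995):
        if row['Max_cnt_year'] == row[f'b{year}']:
            return year
    return 2005  # fallback taken from the question's code

df_all['year_build1'] = df_all.apply(mapper, axis=1)
print(df_all['year_build1'].tolist())  # [1920, 1945, 1921]
```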

Here is my (ugly) Pandas solution; it parses the year from the column name:

DF

In [57]: df
Out[57]:
   Build_year  Max_cnt_year  b1920  b1945  b1975  b1995
0         NaN           120    120     35     45     70
1         0.0            67     35     67     21     34
2      1921.0           145     39     67     22    145

Solution:

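    import numpy as np

    # Keep Build_year where it is a real year; mark all other rows with -1
    df['year_build1'] = np.where(df['Build_year'] > 1500, df['Build_year'], -1)

    # For the marked rows, take the first bucket column equal to Max_cnt_year and strip
    # the leading 'b' from its label. x.loc['b':] slices from the first label that sorts
    # at or after 'b' (this works because the column labels here happen to be sorted).
    df.loc[df['year_build1']==-1, 'year_build1'] = \
        df.loc[df['year_build1']==-1] \
          .apply(lambda x: x.loc['b':].eq(x['Max_cnt_year']).idxmax().replace('b',''),
                 axis=1)

    df['year_build1'] = df['year_build1'].astype(int)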
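A variant of the same idea that does not depend on the column labels being sorted (which x.loc['b':] quietly relies on) and avoids the row-wise apply: select the bucket columns with a regex, compare them against Max_cnt_year in one vectorized step, and keep Build_year where it exceeds 1500. The ^b\d{4}$ pattern is an assumption about how the bucket columns are named. As in the answer, a row with no matching bucket would silently get the first bucket's year, because idxmax on an all-False row returns the first label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Build_year':   [np.nan, 0.0, 1921.0],
    'Max_cnt_year': [120, 67, 145],
    'b1920':        [120, 35, 39],
    'b1945':        [35, 67, 67],
    'b1975':        [45, 21, 22],
    'b1995':        [70, 34, 145],
})

# Bucket columns such as b1920, b1945, ... (naming pattern assumed)
buckets = df.filter(regex=r'^b\d{4}$')

# True where a bucket equals the row's Max_cnt_year; idxmax(axis=1) gives the first such label
first_match = buckets.eq(df['Max_cnt_year'], axis=0).idxmax(axis=1)
bucket_year = first_match.str[1:].astype(int)  # 'b1920' -> 1920

# mask() swaps in Build_year where it is > 1500; NaN > 1500 is False, so NaN rows keep the bucket year
df['year_build1'] = bucket_year.mask(df['Build_year'] > 1500, df['Build_year']).astype(int)
print(df['year_build1'].tolist())  # [1920, 1945, 1921]
```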

Result:

In [156]: df
Out[156]:
   Build_year  Max_cnt_year  b1920  b1945  b1975  b1995  year_build1
0         NaN           120    120     35     45     70         1920
1         0.0            67     35     67     21     34         1945
2      1921.0           145     39     67     22    145         1921

In [157]: df.dtypes
Out[157]:
Build_year      float64
Max_cnt_year      int64
b1920             int64
b1945             int64
b1975             int64
b1995             int64
year_build1       int32
dtype: object

1 Comment

Thanks, but for the 3rd row year_build1 should be 1921, because build_year > 1500. How can I include that in your code?
