Creating a new dataframe column based on row values from multiple columns

Question

This is my first question on StackOverflow, so I apologize if the formatting isn't perfect.

I've concatenated multiple dataframes and now I can't figure out how to create a new column, df["Population"], based on values from other columns: df["2013 Pop"], df["2014 Pop"], etc. For example, if the event occurred in 2014, meaning df["Year"] == 2014, I want to take the population from the df["2014 Pop"] column and put it into the new df["Population"] column. I know I'm explaining this badly; I'm just frustrated over something I feel I should be able to do easily. Here's a summary of the dataframe and what I've tried so far.

d = {
"Year" : [2013, 2014, 2015, ...],
"State" : ["Louisiana", "Texas", "California", ...],
"City" : ["New Orleans", "Dallas", "Sacramento", ...],
"Number Killed" : [4, 6, 2, 4],
"Safety Grade" : ["A", "B", "C", "D", ...],
"2013 Pop" : [421329, 232321, 2454543, ...],
"2014 Pop" : [454545, 655654, 3421342, ...],
"2015 Pop" : [142314, 454355, 4324323, ...],
"Incident Date(datetime dtype)" : ["12-29-2014", "3-12-2017", ...]
}

df = pd.DataFrame(d)

I've tried mapping, loc, apply, and I just can't find a solution. I think I'm on the right track with defining a function with conditionals but I'm getting thrown an error.

def categorise(row):
  if row["Year"] == 2014:
    return df["2014 Pop"]
  elif row["Year"] == 2015:
    return df["2015 Pop"]
  elif row["Year"] == 2016:
    return df["2016 Pop"]
  elif row["Year"] == 2017:
    return df["2017 Pop"]
  else:
    return "NONE"

When I try this:

df["Population"] = df.apply(lambda row : categorise(row), axis = 1)

I get the ValueError "Wrong number of items passed 3609 (the length of the df), placement implies 1".

Does anyone have a suggestion for how to create the df["Population"] column based on my poorly worded question?

Ynjxsjmh · Accepted Answer · 2022-06-06 16:47:59Z

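You should change df to row in your categorise function. Returning df["2014 Pop"] hands back the entire column for every row, which is why pandas complains about the number of items; row["2014 Pop"] returns only that row's value.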

def categorise(row):
  if row["Year"] == 2014:
    return row["2014 Pop"]
  elif row["Year"] == 2015:
    return row["2015 Pop"]
  elif row["Year"] == 2016:
    return row["2016 Pop"]
  elif row["Year"] == 2017:
    return row["2017 Pop"]
  else:
    return "NONE"

df["Population"] = df.apply(categorise, axis = 1)
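If the columns all follow the "<year> Pop" naming pattern from the question, the if/elif chain can probably be collapsed into a single lookup that builds the column name from the row's year. This is only a sketch on made-up data: the function name population_for_year and the toy values are my own, and it returns NaN rather than "NONE" for years without a matching column so the result stays numeric.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the question's real dataset).
df = pd.DataFrame({
    "Year": [2014, 2015, 2013],
    "2014 Pop": [100, 200, 300],
    "2015 Pop": [110, 210, 310],
})

def population_for_year(row):
    # Build the column label from this row's year, e.g. "2014 Pop".
    col = f"{int(row['Year'])} Pop"
    # row is a Series whose index holds the column names.
    return row[col] if col in row.index else np.nan

df["Population"] = df.apply(population_for_year, axis=1)
print(df["Population"].tolist())  # [100.0, 210.0, nan]
```

The third row has Year 2013 and there is no "2013 Pop" column in the toy frame, so it falls through to NaN.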

Or use np.select, which evaluates the conditions on whole columns at once instead of row by row

df["Population"] = np.select(
    [df["Year"] == 2014,
     df["Year"] == 2015,
     df["Year"] == 2016,
     df["Year"] == 2017,
     ],
    [df["2014 Pop"],
     df["2015 Pop"],
     df["2016 Pop"],
     df["2017 Pop"],
     ],
    default='NONE'
)
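One caveat with the string default: the population columns are integers, so mixing them with 'NONE' means NumPy has to find a common dtype. As far as I know, older NumPy versions turn the whole result into strings and newer ones may raise a TypeError about a missing common dtype. A possible variant, sketched on made-up data, uses np.nan as the default and builds the condition and choice lists from a list of years (the years list and the toy values are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the question's real dataset).
df = pd.DataFrame({
    "Year": [2014, 2015, 2013],
    "2014 Pop": [100, 200, 300],
    "2015 Pop": [110, 210, 310],
})

years = [2014, 2015]  # assumed: one "<year> Pop" column exists per entry

# One boolean mask and one source column per year, in matching order.
conditions = [df["Year"] == y for y in years]
choices = [df[f"{y} Pop"] for y in years]

# Rows that match no condition get NaN, so the column stays numeric.
df["Population"] = np.select(conditions, choices, default=np.nan)
print(df["Population"].tolist())  # [100.0, 210.0, nan]
```

Keeping the column numeric means later arithmetic, such as a per-capita rate, works without converting strings back to numbers.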

Or with pd.factorize

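# idx: for each row, the position of its year in cols; cols: the unique years
idx, cols = pd.factorize(df['Year'])
# Keep only the "<year> Pop" columns and relabel them with the integer year
pop = df.filter(like='Pop').rename(columns=lambda x: int(x.split(' ')[0]))
out = pop.reindex(cols, axis=1).to_numpy()[np.arange(len(pop)), idx]
df["Population"] = out  # years without a matching column come out as NaN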
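To make the factorize approach easier to follow, here is the same idea run step by step on a small made-up frame (the values and the intermediate name wide are mine). Note that filter(like='Pop') would also pick up an existing "Population" column, so this sketch assumes it runs before that column is created.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the question's real dataset).
df = pd.DataFrame({
    "Year": [2014, 2015, 2014, 2013],
    "2014 Pop": [100, 200, 300, 400],
    "2015 Pop": [110, 210, 310, 410],
})

# idx gives each row's position in cols; cols holds the unique years
# in order of first appearance: idx = [0, 1, 0, 2], cols = [2014, 2015, 2013].
idx, cols = pd.factorize(df["Year"])

# Keep the "<year> Pop" columns and relabel them with the integer year.
pop = df.filter(like="Pop").rename(columns=lambda x: int(x.split(" ")[0]))

# Reorder the columns to match cols; 2013 has no source column, so it is NaN.
wide = pop.reindex(cols, axis=1)

# Pick, for every row i, the value in column idx[i].
out = wide.to_numpy()[np.arange(len(pop)), idx]

df["Population"] = out
print(df["Population"].tolist())  # [100.0, 210.0, 300.0, nan]
```

The last line of the lookup is plain NumPy integer indexing: row positions paired with column positions, one pair per row.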

edited Jun 6, 2022 at 16:47

answered Jun 6, 2022 at 16:38

1 Comment

Nickguild1993 Over a year ago

Using np.select() worked perfectly, thank you so much!
