
This is my first question on Stack Overflow, so I apologize if the formatting isn't perfect.

I've concatenated multiple dataframes, and now I can't figure out how to create a new column, df["Population"], whose value comes from one of several other columns (df["2013 Pop"], df["2014 Pop"], etc.) depending on the year. For example, if the event occurred in 2014 (df["Year"] == 2014), I want to take that row's value from df["2014 Pop"] and put it in df["Population"]. I know I'm explaining this badly; I'm just frustrated because I feel like this should be easy. Here's a summary of the dataframe and what I've tried so far.

import pandas as pd

d = {
    "Year" : [2013, 2014, 2015, ...],
    "State" : ["Louisiana", "Texas", "California", ...],
    "City" : ["New Orleans", "Dallas", "Sacramento", ...],
    "Number Killed" : [4, 6, 2, 4],
    "Safety Grade" : ["A", "B", "C", "D", ...],
    "2013 Pop" : [421329, 232321, 2454543, ...],
    "2014 Pop" : [454545, 655654, 3421342, ...],
    "2015 Pop" : [142314, 454355, 4324323, ...],
    "Incident Date" : ["12-29-2014", "3-12-2017", ...],  # datetime dtype in the real data
}
df = pd.DataFrame(d)

I've tried map, .loc, and apply, and I just can't find a solution. I think I'm on the right track with defining a function with conditionals, but it raises an error.

def categorise(row):
  if row["Year"] == 2014:
    return df["2014 Pop"]
  elif row["Year"] == 2015:
    return df["2015 Pop"]
  elif row["Year"] == 2016:
    return df["2016 Pop"]
  elif row["Year"] == 2017:
    return df["2017 Pop"]
  else:
    return "NONE"

When I try this:

df["Population"] = df.apply(lambda row : categorise(row), axis = 1)

I get ValueError: "Wrong number of items passed 3609, placement implies 1" (3609 is the length of the df).

Does anyone have a suggestion for how to create the df["Population"] column?


Answer

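You should change df to row in your categorise function. Inside the function, df["2014 Pop"] is the entire column (all 3609 values), not the value for the current row, so apply gets a whole Series back for each row and the result cannot be stored in the single Population column. row["2014 Pop"] returns just that row's value: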
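To see the difference, here is a minimal sketch on a hypothetical three-row frame (values borrowed from the question's summary, every row a 2014 event). Returning df[...] from apply produces one column per row of df, while returning row[...] produces one value per row:

```python
import pandas as pd

# Hypothetical three-row frame; every row is a 2014 event.
df = pd.DataFrame({
    "Year": [2014, 2014, 2014],
    "2014 Pop": [454545, 655654, 3421342],
})

# Buggy: each call returns the whole df column (a Series of length len(df)),
# so apply expands the results into a len(df) x len(df) DataFrame.
buggy = df.apply(lambda row: df["2014 Pop"], axis=1)

# Fixed: each call returns the scalar from the current row.
fixed = df.apply(lambda row: row["2014 Pop"], axis=1)

print(buggy.shape)     # (3, 3)
print(fixed.tolist())  # [454545, 655654, 3421342]
```

With 3609 rows, the buggy version yields 3609 columns, which is where "Wrong number of items passed 3609, placement implies 1" comes from.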

def categorise(row):
  if row["Year"] == 2014:
    return row["2014 Pop"]
  elif row["Year"] == 2015:
    return row["2015 Pop"]
  elif row["Year"] == 2016:
    return row["2016 Pop"]
  elif row["Year"] == 2017:
    return row["2017 Pop"]
  else:
    return "NONE"

df["Population"] = df.apply(categorise, axis = 1)
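If every population column follows the "<year> Pop" naming shown in the question, the if/elif chain can be replaced by building the column name from the year. This is a minimal sketch under that naming assumption; population_for_year is a hypothetical name, the frame is made up, and NaN (rather than "NONE") is returned for years with no column so the result stays numeric:

```python
import numpy as np
import pandas as pd

def population_for_year(row):
    # Assumes population columns are named "<year> Pop" (e.g. "2014 Pop").
    # int() guards against Year being upcast to float when the row is built.
    col = f"{int(row['Year'])} Pop"
    # NaN instead of a string keeps the Population column numeric.
    return row[col] if col in row.index else np.nan

# Hypothetical data; 2016 has no population column.
df = pd.DataFrame({
    "Year": [2014, 2015, 2016],
    "City": ["New Orleans", "Dallas", "Sacramento"],
    "2014 Pop": [454545, 655654, 3421342],
    "2015 Pop": [142314, 454355, 4324323],
})

df["Population"] = df.apply(population_for_year, axis=1)
print(df[["Year", "City", "Population"]])
```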

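Or use np.select, which evaluates the conditions on whole columns at once instead of row by row:

import numpy as np

df["Population"] = np.select(
    [df["Year"] == 2014,
     df["Year"] == 2015,
     df["Year"] == 2016,
     df["Year"] == 2017,
     ],
    [df["2014 Pop"],
     df["2015 Pop"],
     df["2016 Pop"],
     df["2017 Pop"],
     ],
    # NaN keeps the column numeric; a string default such as 'NONE'
    # can force the whole result to a string dtype
    default=np.nan
)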
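To avoid hard-coding one condition per year, the two lists passed to np.select can be generated from the column names. A sketch, again assuming the "<year> Pop" naming, on a small hypothetical frame (2013 has no population column here, so that row falls through to the default):

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Year": [2014, 2015, 2013],
    "2014 Pop": [454545, 655654, 3421342],
    "2015 Pop": [142314, 454355, 4324323],
})

# Every column that looks like "<4-digit year> Pop".
pop_cols = [c for c in df.columns if re.fullmatch(r"\d{4} Pop", c)]

# One condition and one matching choice per population column.
conditions = [df["Year"] == int(c.split()[0]) for c in pop_cols]
choices = [df[c] for c in pop_cols]

df["Population"] = np.select(conditions, choices, default=np.nan)
print(df)
```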

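Or, fully vectorized, with pd.factorize and NumPy indexing:

import numpy as np
import pandas as pd
idx, cols = pd.factorize(df['Year'])  # idx: code per row, cols: unique years
pop = df.filter(regex=r'^\d{4} Pop$').rename(columns=lambda x: int(x.split(' ')[0]))  # strict regex skips a "Population" column
df['Population'] = pop.reindex(cols, axis=1).to_numpy()[np.arange(len(pop)), idx]  # row i takes column idx[i]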
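A small worked example of the factorize approach on a hypothetical frame, where one row has a year (2016) with no population column. reindex adds that year as an all-NaN column, so the row ends up with NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Year": [2014, 2015, 2014, 2016],
    "2014 Pop": [454545, 655654, 3421342, 232321],
    "2015 Pop": [142314, 454355, 4324323, 421329],
})

# idx -> [0, 1, 0, 2]; cols -> [2014, 2015, 2016] (order of first appearance)
idx, cols = pd.factorize(df["Year"])

# Population columns renamed to integer years: 2014, 2015
pop = df.filter(regex=r"^\d{4} Pop$").rename(columns=lambda x: int(x.split(" ")[0]))

# Columns reordered to match cols; 2016 becomes an all-NaN column.
wide = pop.reindex(cols, axis=1).to_numpy()

# For row i, take the value in column idx[i].
df["Population"] = wide[np.arange(len(df)), idx]
print(df)
```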

Comment

Using np.select() worked perfectly, thank you so much!
