Pandas New Column Calculation Based on Existing Columns Values

Question

Please see my source table above:

I am interested in calculating a new column "Group" based on a list of lots of interest:

The value of the "Group" column depends on whether the lot in the "Lot" column of the source table exists in the lots of interest. If it does not, the value in the "Group" column is copied from the "LOT_VIRTUAL_LINE" cell.

Desired output:

Instead of including images (which can't be copied and pasted), please include plain text (which can).
– DSM, Commented Dec 8, 2015 at 3:30

Alexander · Accepted Answer · 2015-12-08 06:56:23Z

1

Because this question is tagged Pandas, I assume we are talking about DataFrames and Series rather than plain lists. You can use loc to locate the rows and columns that match your criteria (e.g. whether each element in the LOT column is in the series of lots of interest, which isin tests).

import pandas as pd

df = pd.DataFrame({'LOT': ['A1111', 'A2222', 'A3333', 'B1111', 'B2222', 'B3333'],
                   'LOT_VIRTUAL_LINE': ['AAA'] * 3 + ['BBB'] * 3})
s = pd.Series(['A1111', 'B2222'], name='Lots Of Interest')
# or... df2 = pd.read_csv('file_path/file_name.csv')

# Value of 'GROUP' defaults to 'LOT_VIRTUAL_LINE'.
df['GROUP'] = df.LOT_VIRTUAL_LINE

# But gets overwritten by 'LOT' if it is in the 'Lots of Interest' series.
mask = df.LOT.isin(s)
# or... mask = df.LOT.isin(df2['Lots of Interest'])  # Whatever the column name is.
df.loc[mask, 'GROUP'] = df.loc[mask, 'LOT']

# Confirm results.
>>> df
     LOT LOT_VIRTUAL_LINE  GROUP
0  A1111              AAA  A1111
1  A2222              AAA    AAA
2  A3333              AAA    AAA
3  B1111              BBB    BBB
4  B2222              BBB  B2222
5  B3333              BBB    BBB
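
As a possible variant of the two-step assignment above, the default and the override can be combined in one expression with Series.where, which keeps values where the mask is True and takes the fallback elsewhere. This is only a sketch on the same sample data; the name lots_of_interest is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'LOT': ['A1111', 'A2222', 'A3333', 'B1111', 'B2222', 'B3333'],
    'LOT_VIRTUAL_LINE': ['AAA'] * 3 + ['BBB'] * 3,
})
# Assumed name for the series of lots of interest.
lots_of_interest = pd.Series(['A1111', 'B2222'], name='Lots Of Interest')

# True for rows whose LOT appears in the lots of interest.
mask = df['LOT'].isin(lots_of_interest)

# Keep LOT where the mask is True; otherwise fall back to LOT_VIRTUAL_LINE.
df['GROUP'] = df['LOT'].where(mask, df['LOT_VIRTUAL_LINE'])
print(df)
```

Both forms should give the same GROUP column; the loc version is arguably easier to extend when more than one override rule is needed.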

edited Dec 8, 2015 at 6:56

answered Dec 8, 2015 at 6:31

Alexander


2 Comments

Felix Over a year ago

Thank you, Alexander. Can you please show how to read a CSV file containing one column into a Pandas Series? Thanks

Alexander Over a year ago

Can you please show me the first two and last two lines of the output from pd.read_csv('full_path_to_filename'), where full_path_to_filename is just that?
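
For reference, one way to get a Series from a one-column CSV is to read it with read_csv (which returns a DataFrame) and then select the single column. The sketch below uses an in-memory string in place of a real file, and the header name LotsOfInterest is an assumption; substitute the actual path and column name.

```python
import io

import pandas as pd

# Hypothetical stand-in for a one-column CSV file on disk.
csv_text = "LotsOfInterest\nA1111\nB2222\n"

# read_csv returns a DataFrame; selecting its only column yields a Series.
lots = pd.read_csv(io.StringIO(csv_text))['LotsOfInterest']
print(type(lots).__name__)
print(lots.tolist())
```

The resulting Series can be passed directly to isin.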

Stefan · Accepted Answer · 2015-12-08 04:08:13Z

1

Assuming we have a list named lots_of_interest, for instance as the result of read_csv(path).loc[:, 'lots_of_interest'].tolist():

# Each x['LOT'] is a plain string here, so test membership with `in` rather than .isin().
df['Group'] = df.apply(lambda x: x['LOT'] if x['LOT'] in lots_of_interest else x['LOT_VIRTUAL_LINE'], axis=1)

edited Dec 8, 2015 at 4:08

answered Dec 8, 2015 at 3:47

Stefan

43.1k13 gold badges80 silver badges84 bronze badges

5 Comments

Felix Over a year ago

Thank you a lot, Stefan. And what if lots_of_interest is a DataFrame read from a CSV file?

Felix Over a year ago

pd.read_csv("C:/Users/fdoktorm/Documents/Databases/SQL Pathfinder/Startup Integration Toolkit/Examples For Development/LotsOfInterest.csv").loc[:, 'LotsOfInterest'].to_list() gives an error: (type(self).__name__, name)) AttributeError: 'Series' object has no attribute 'to_list'

Stefan Over a year ago

Sorry, it's .tolist().

Felix Over a year ago

Thank you. Now the lambda function row raises the following error: AttributeError: ("'str' object has no attribute 'isin'", u'occurred at index 0')

Felix Over a year ago

Fixed by using the condition x['SPC_LOT'] in LotsOfInterest instead. Thank you a lot, Stefan. Great help.
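
The row-wise approach discussed in this thread can be sketched as a self-contained example. Inside apply with axis=1, each row['LOT'] is a plain Python string, so membership is tested with the in operator. The sample data, the lots_df frame standing in for the CSV contents, and the conversion to a set are all assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'LOT': ['A1111', 'A2222', 'A3333', 'B1111', 'B2222', 'B3333'],
    'LOT_VIRTUAL_LINE': ['AAA'] * 3 + ['BBB'] * 3,
})

# Hypothetical stand-in for the DataFrame that read_csv would return.
lots_df = pd.DataFrame({'LotsOfInterest': ['A1111', 'B2222']})

# A set gives fast membership tests; a list would also work.
lots_of_interest = set(lots_df['LotsOfInterest'])

# Use LOT when it is a lot of interest, otherwise LOT_VIRTUAL_LINE.
df['Group'] = df.apply(
    lambda row: row['LOT'] if row['LOT'] in lots_of_interest
    else row['LOT_VIRTUAL_LINE'],
    axis=1,
)
print(df)
```

On large frames a vectorized isin mask is usually faster than apply, but the row-wise form is easy to read and adapt.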
