1

Searched multiple threads on here for an answer but nothing is quite like what I am looking to do. I am trying to make a boolean column in a dataframe where one of the parameters is based on metal and associated cost maximum, see Dict below.

Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
               'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}

The dict is used to complete the below formula:

df_Cost['Total Cost'] >= Cost_ranges['Cost Maximum']

i.e. df_Cost['Metal'] contains to value 'Cu', it calls the 'Cost Maximum' of 350 from the dict and uses this for the boolean expression hence formula would read:

df_Cost[Total Cost] >= 350

I need it to apply to all rows in a dataframe. I have been using df.eval() but need an extra layer of processing to match up the right limit per metal.

I have tried using df.eval(), df.query(), df.loc and df.apply() but keep getting 'TypeError: 'Series' objects are mutable, thus they cannot be hashed' or 'ValueError: ('Lengths must match to compare', (9999,), (16,))' for each solution.

Look forward to the responses.

0

2 Answers 2

1

Depending on your actual data, you could do something like:

import numpy as np
import pandas as pd

Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
           'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}

N = 20
d = pd.DataFrame({'Metals': np.random.choice(["Cu", "Pb", "Zn", "Ni"], N),
                  'Cost': np.random.random(N) * 1000})

d.merge(pd.DataFrame(Cost_ranges).astype({'Cost Maximum': float}),
        on = "Metals", how = "left")\
    .eval('want = Cost > `Cost Maximum`')

#    Metals        Cost  Cost Maximum   want
# 0      Cu  297.386007         350.0  False
# 1      Pb   55.570657         200.0  False
# 2      Pb   91.803336         200.0  False
# 3      Cu  916.273995         350.0   True
# 4      Zn  796.383326         200.0   True
# 5      Pb  112.504581         200.0  False
Sign up to request clarification or add additional context in comments.

Comments

0

Assuming your df_Cost looks a little like this (with potentially more rows and columns):

>>> df_Cost
   Total Cost Metal
0         315    Cu
1         420    Cu

The easiest way is to use a dictionary to translate from Metal to max cost. Then you need to make Cost_ranges into a dictionary mapping metal name to cost:

>>> cost_lookup = dict(zip(Cost_ranges['Metals'], Cost_ranges['Cost Maximum']))
>>> cost_lookup
{'Cu': 350, 'Pb': 200, 'Zn': 200, 'Ni': 500, 'Mo': 800, 'Co': 1000, 'Sn': 250, 'U3O8': 2500, 'Ag': 30, 'Au': 2500, 'Pt': 500, 'Pd': 1000, 'Rh': 6000, 'Os': 2500, 'Ru': 2500, 'Ir': 2500}
>>> df_Cost['Metal'].map(cost_lookup)
0    350
1    350
Name: Metal, dtype: int64
>>> df_Cost['Total Cost'] >= df_Cost['Metal'].map(cost_lookup)
0    False
1     True
dtype: bool

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.