
I am working with Python and pandas. I have a dataset of lab analysis results where I am dealing with multiple parameters and detection limits (dl). Many of the samples are reported as below the dl (e.g. <dl, such as <4).

For example:

import pandas as pd

df=pd.DataFrame([['<4','88.72','<0.09'],['<1','5','<0.09'],['2','17.6','<0.09']], columns=['var_1','var_2','var_3'])
df

My goal is to replace all <dl with dl/2 as a float value.

I can do this for one column pretty easily.


df['var_3'] = df.var_3.str.replace('<', '').astype(float)
df['var_3'] = df['var_3'].apply(lambda x: x / 2 if x == 0.09 else x)
df

but this requires me to look up the dl and input it manually.

I would like to streamline it so that it applies across all variables, with one or more detection limits per variable, since I have many variables and the detection limits will not always be constant from data frame to data frame.
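For context, a minimal sketch of one way to generalize the single-column approach, assuming every flagged value is a string starting with '<' and everything else parses as a number (the helper name is illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

def halve_below_dl(s: pd.Series) -> pd.Series:
    """Convert a column of strings to floats, halving '<dl' entries."""
    below = s.str.startswith('<')           # flag censored values
    vals = s.str.lstrip('<').astype(float)  # strip the flag, parse numbers
    return vals.where(~below, vals / 2)     # halve only the flagged ones

out = df.apply(halve_below_dl)
```

Because the detection limit is read out of each cell rather than hard-coded, this handles different limits per variable and per data frame.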

I found something similar in R but not sure how to apply it in python. Any solutions would be appreciated.

Update

So the

df=df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval) 

works well with dataframes whose columns contain only numbers. I assume that is a limitation of the eval function. For some reason I can get the code to work on smaller dataframes, but after I concatenate them the code will not work on the larger dataframe, and I get this error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/9_/w2qcdj_x2x5852py8xl6b0sh0000gn/T/ipykernel_9403/3946462310.py in <module>
----> 1 MS=MS.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
   8738             kwargs=kwargs,
   8739         )
-> 8740         return op.apply()
   8741 
   8742     def applymap(

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply(self)
    686             return self.apply_raw()
    687 
--> 688         return self.apply_standard()
    689 
    690     def agg(self):

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply_standard(self)
    810 
    811     def apply_standard(self):
--> 812         results, res_index = self.apply_series_generator()
    813 
    814         # wrap results

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply_series_generator(self)
    826             for i, v in enumerate(series_gen):
    827                 # ignore SettingWithCopy here in case the user mutates
--> 828                 results[i] = self.f(v)
    829                 if isinstance(results[i], ABCSeries):
    830                     # If we have a view on v, we need to make a copy because

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    351         eng = ENGINES[engine]
    352         eng_inst = eng(parsed_expr)
--> 353         ret = eng_inst.evaluate()
    354 
    355         if parsed_expr.assigner is None:

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in evaluate(self)
     78 
     79         # make sure no names in resolvers and locals/globals clash
---> 80         res = self._evaluate()
     81         return reconstruct_object(
     82             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in _evaluate(self)
    119         scope = env.full_scope
    120         _check_ne_builtin_clash(self.expr)
--> 121         return ne.evaluate(s, local_dict=scope)
    122 
    123 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    821 
    822     # Create a signature
--> 823     signature = [(name, getType(arg)) for (name, arg) in
    824                  zip(names, arguments)]
    825 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in <listcomp>(.0)
    821 
    822     # Create a signature
--> 823     signature = [(name, getType(arg)) for (name, arg) in
    824                  zip(names, arguments)]
    825 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in getType(a)
    703     if kind == 'U':
    704         raise ValueError('NumExpr 2 does not support Unicode as a dtype.')
--> 705     raise ValueError("unknown type %s" % a.dtype.name)
    706 
    707 

ValueError: unknown type object
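The final ValueError suggests the numexpr engine that pd.eval uses by default received object-dtype data, which typically happens when the larger, concatenated frame contains cells that are neither '<dl' flags nor plain numeric strings (stray text, blanks, etc.). A hedged sketch for locating such offending cells before eval (the sample frame and 'n.d.' value are assumptions for illustration):

```python
import pandas as pd

# a frame with one bad cell, standing in for the concatenated data
ms = pd.DataFrame({'var_1': ['<4', 'n.d.', '2']})

# strip the '<' flag, then try to parse everything as numbers;
# cells that fail to parse become NaN under errors='coerce'
parsed = ms.apply(lambda s: pd.to_numeric(s.str.lstrip('<'), errors='coerce'))

# keep only the cells that were non-empty but failed to parse
bad = ms[parsed.isna() & ms.notna()].dropna(how='all')
print(bad)
```

Cleaning (or dropping) the reported cells should let the replace-then-eval approach run on the concatenated frame as well.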
  • What is the expected dataframe please? Commented Feb 13, 2022 at 21:55

1 Answer


Use replace instead of str.replace, then eval all the expressions:

>>> df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)
   var_1  var_2  var_3
0    2.0  88.72  0.045
1    0.5   5.00  0.045
2    2.0  17.60  0.045

\1 will be replaced by the first capture group (.*).
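To see why this works, it may help to look at the intermediate result of the replace step on its own: each '<dl' string becomes a small division expression, which pd.eval then computes.

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

# '<4' -> '4/2', '<0.09' -> '0.09/2'; plain numeric strings are untouched
step1 = df.replace(r'<(.*)', r'\1/2', regex=True)
print(step1)
```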

Update

Alternative:

out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot(index='index', columns='variable', values='value') \
         .rename_axis(index=None, columns=None)

Output:

>>> out
  var_1  var_2  var_3
0   2.0  88.72  0.045
1   0.5      5  0.045
2     2   17.6  0.045
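Note that in this variant the unflagged cells stay as strings (the 5 and 17.6 above print without a consistent float format), so a final numeric conversion may be wanted. A minimal self-contained sketch adding that step:

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot(index='index', columns='variable', values='value') \
         .rename_axis(index=None, columns=None)

# unflagged cells are still strings; convert every column to numbers
out = out.apply(pd.to_numeric)
```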

Update

Alternative using melt to flatten your dataframe and pivot to reshape it back to its original form:

df1 = df.melt(ignore_index=False)
m = df1['value'].str.startswith('<')
df1['value'] = df1['value'].mask(~m).str[1:].astype(float).div(2) \
                           .fillna(df1['value']).astype(float)
df1 = df1.reset_index().pivot_table(values='value', index='index', columns='variable') \
         .rename_axis(index=None, columns=None)

Output:

>>> df1
   var_1  var_2  var_3
0    2.0  88.72  0.045
1    0.5   5.00  0.045
2    2.0  17.60  0.045

4 Comments

The first code you wrote didn't work for my larger dataset, but this update seems to have worked perfectly. Thanks for dealing with my horrible formatting. I really appreciate it.
What's wrong with your large dataset?
So with smaller datasets I am able to use the df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval) line of code. When I apply it to larger datasets that have been concatenated, it does not work, even though I can get it to work on the individual groups before using pd.concat. I am not sure of the best way to show the error message I am receiving.
@A_Miller I updated my answer with an alternative method. Can you check it please? Update your post with your error message. For your sample, which I suppose is a subset of your data, my solution works, so it should solve your problem...
