
I am working with Python and pandas. I have a dataset of lab analysis results where I am dealing with multiple parameters and detection limits (dl). Many of the samples are reported as below the dl (e.g. <dl, such as <4).

For example:

import pandas as pd

df=pd.DataFrame([['<4','88.72','<0.09'],['<1','5','<0.09'],['2','17.6','<0.09']], columns=['var_1','var_2','var_3'])
df

My goal is to replace all <dl with dl/2 as a float value.

I can do this for one column pretty easily.


df['var_3'] = df.var_3.str.replace('<', '').astype(float)
df['var_3'] = df['var_3'].apply(lambda x: x / 2 if x == 0.09 else x)
df

but this requires me to look up the dl and input it manually.

I would like to streamline it so that it applies across all variables, with one or more detection limits per variable, since I have many variables and the detection limits will not always be constant from data frame to data frame.
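For context, a minimal sketch of one way to generalize the single-column approach, assuming every flagged value is a string starting with '<' and everything else parses as a number (the helper name is illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

def halve_below_dl(s: pd.Series) -> pd.Series:
    """Convert a column of strings to floats, halving '<dl' entries."""
    below = s.str.startswith('<')           # flag censored values
    vals = s.str.lstrip('<').astype(float)  # strip the flag, parse numbers
    return vals.where(~below, vals / 2)     # halve only the flagged ones

out = df.apply(halve_below_dl)
```

Because the detection limit is read out of each cell rather than hard-coded, this handles different limits per variable and per data frame.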

I found something similar in R but not sure how to apply it in python. Any solutions would be appreciated.

Update

So the

df=df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval) 

works well with dataframes whose columns contain only numbers. I assume that is a limitation of the eval function. For some reason I can get the code to work on smaller dataframes, but after I concatenate them the code will not work on the larger dataframe, and I get this error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/9_/w2qcdj_x2x5852py8xl6b0sh0000gn/T/ipykernel_9403/3946462310.py in <module>
----> 1 MS=MS.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwargs)
   8738             kwargs=kwargs,
   8739         )
-> 8740         return op.apply()
   8741 
   8742     def applymap(

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply(self)
    686             return self.apply_raw()
    687 
--> 688         return self.apply_standard()
    689 
    690     def agg(self):

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply_standard(self)
    810 
    811     def apply_standard(self):
--> 812         results, res_index = self.apply_series_generator()
    813 
    814         # wrap results

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/apply.py in apply_series_generator(self)
    826             for i, v in enumerate(series_gen):
    827                 # ignore SettingWithCopy here in case the user mutates
--> 828                 results[i] = self.f(v)
    829                 if isinstance(results[i], ABCSeries):
    830                     # If we have a view on v, we need to make a copy because

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    351         eng = ENGINES[engine]
    352         eng_inst = eng(parsed_expr)
--> 353         ret = eng_inst.evaluate()
    354 
    355         if parsed_expr.assigner is None:

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in evaluate(self)
     78 
     79         # make sure no names in resolvers and locals/globals clash
---> 80         res = self._evaluate()
     81         return reconstruct_object(
     82             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

~/opt/anaconda3/lib/python3.9/site-packages/pandas/core/computation/engines.py in _evaluate(self)
    119         scope = env.full_scope
    120         _check_ne_builtin_clash(self.expr)
--> 121         return ne.evaluate(s, local_dict=scope)
    122 
    123 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    821 
    822     # Create a signature
--> 823     signature = [(name, getType(arg)) for (name, arg) in
    824                  zip(names, arguments)]
    825 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in <listcomp>(.0)
    821 
    822     # Create a signature
--> 823     signature = [(name, getType(arg)) for (name, arg) in
    824                  zip(names, arguments)]
    825 

~/opt/anaconda3/lib/python3.9/site-packages/numexpr/necompiler.py in getType(a)
    703     if kind == 'U':
    704         raise ValueError('NumExpr 2 does not support Unicode as a dtype.')
--> 705     raise ValueError("unknown type %s" % a.dtype.name)
    706 
    707 

ValueError: unknown type object
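The final ValueError suggests the numexpr engine that pd.eval uses by default received object-dtype data, which typically happens when the larger, concatenated frame contains cells that are neither '<dl' flags nor plain numeric strings (stray text, blanks, etc.). A hedged sketch for locating such offending cells before eval (the sample frame and 'n.d.' value are assumptions for illustration):

```python
import pandas as pd

# a frame with one bad cell, standing in for the concatenated data
ms = pd.DataFrame({'var_1': ['<4', 'n.d.', '2']})

# strip the '<' flag, then try to parse everything as numbers;
# cells that fail to parse become NaN under errors='coerce'
parsed = ms.apply(lambda s: pd.to_numeric(s.str.lstrip('<'), errors='coerce'))

# keep only the cells that were non-empty but failed to parse
bad = ms[parsed.isna() & ms.notna()].dropna(how='all')
print(bad)
```

Cleaning (or dropping) the reported cells should let the replace-then-eval approach run on the concatenated frame as well.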
  • What is the expected dataframe please? Commented Feb 13, 2022 at 21:55

1 Answer


Use replace instead of str.replace, then eval all the expressions:

>>> df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval)
   var_1  var_2  var_3
0    2.0  88.72  0.045
1    0.5   5.00  0.045
2    2.0  17.60  0.045

\1 will be replaced by the first capture group (.*).
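To see why this works, it may help to look at the intermediate result of the replace step on its own: each '<dl' string becomes a small division expression, which pd.eval then computes.

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

# '<4' -> '4/2', '<0.09' -> '0.09/2'; plain numeric strings are untouched
step1 = df.replace(r'<(.*)', r'\1/2', regex=True)
print(step1)
```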

Update

Alternative:

out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot(index='index', columns='variable', values='value') \
         .rename_axis(index=None, columns=None)

Output:

>>> out
  var_1  var_2  var_3
0   2.0  88.72  0.045
1   0.5      5  0.045
2     2   17.6  0.045
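Note that in this variant the unflagged cells stay as strings (the 5 and 17.6 above print without a consistent float format), so a final numeric conversion may be wanted. A minimal self-contained sketch adding that step:

```python
import pandas as pd

df = pd.DataFrame(
    [['<4', '88.72', '<0.09'], ['<1', '5', '<0.09'], ['2', '17.6', '<0.09']],
    columns=['var_1', 'var_2', 'var_3'],
)

out = df.melt(ignore_index=False)
m = out['value'].str.startswith('<')
out.loc[m, 'value'] = out.loc[m, 'value'].str.strip('<').astype(float) / 2
out = out.reset_index().pivot(index='index', columns='variable', values='value') \
         .rename_axis(index=None, columns=None)

# unflagged cells are still strings; convert every column to numbers
out = out.apply(pd.to_numeric)
```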

Update

Alternative using melt to flatten your dataframe and pivot to reshape it back to its original form:

df1 = df.melt(ignore_index=False)
m = df1['value'].str.startswith('<')
df1['value'] = df1['value'].mask(~m).str[1:].astype(float).div(2) \
                           .fillna(df1['value']).astype(float)
df1 = df1.reset_index().pivot_table(values='value', index='index', columns='variable') \
         .rename_axis(index=None, columns=None)

Output:

>>> df1
   var_1  var_2  var_3
0    2.0  88.72  0.045
1    0.5   5.00  0.045
2    2.0  17.60  0.045

4 Comments

The first code you wrote didn't work for my larger dataset, but this update seems to have worked perfectly. Thanks for dealing with my horrible formatting. I really appreciate it.
What's wrong with your large dataset?
So with smaller datasets I am able to use the df.replace(r'<(.*)', r'\1/2', regex=True).apply(pd.eval) line of code. When I apply it to larger datasets that have been concatenated, it does not work, even though I can get it to work on the individual groups before using pd.concat. I am not sure of the best way to show the error message I am receiving.
@A_Miller I updated my answer with an alternative method. Can you check it please? Update your post with your error message. For your sample, which I suppose is a subset of your data, my solution works, so it should solve your problem...
