I want to loop through a column and convert it into a processed series.
Below is an example data frame with two rows and four columns:

import pandas as pd
from rapidfuzz import process as process_rapid
from rapidfuzz import utils as rapid_utils

data = [['r/o ac. nephritis.  /.  nephrotic syndrome', ' ac. nephritis.  /.  nephrotic syndrome',1,'ac   nephritis      nephrotic syndrome'], [ 'sternocleidomastoid contracture','sternocleidomastoid contracture',0,"NA"]]
 

# Create the pandas DataFrame

df_diagnosis = pd.DataFrame(data, columns = ['diagnosis_name', 'diagnosis_name_edited','is_spell_corrected','spell_corrected_value'])

I want to use the spell_corrected_value column when the is_spell_corrected column is more than 1. Otherwise, I want to use diagnosis_name_edited.

At the moment, I have the following code, which uses the diagnosis_name_edited column directly. How do I turn this into an if-else/lambda check on the is_spell_corrected column?

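# Generator that normalizes each diagnosis string with rapidfuzz's default_process
unmapped_diag_series = (rapid_utils.default_process(d) for d in df_diagnosis['diagnosis_name_edited'].astype(str))
# Materialize the generator as a pandas Series
unmapped_processed_diagnosis = pd.Series(unmapped_diag_series)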

Thank you.

1 Answer

If I understand you correctly, try this fast solution using numpy.where:

import numpy as np

df_diagnosis['new_column'] = np.where(df_diagnosis['is_spell_corrected'] > 1, df_diagnosis['spell_corrected_value'], df_diagnosis['diagnosis_name_edited'])

1 Comment

You got it right. Thank you. One slight change, if I may suggest: np.where(df_diagnosis['is_spell_corrected'] > 0, df_diagnosis['spell_corrected_value'], df_diagnosis['diagnosis_name_edited']) That is, I want to check whether is_spell_corrected is greater than 0.
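For anyone landing here later, this is a minimal sketch of how the `> 0` comparison might be wired back into the series from the question. It reuses the question's sample data. The names `chosen` and `chosen_series` are made up for illustration and do not come from the original posts.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
data = [
    ['r/o ac. nephritis.  /.  nephrotic syndrome',
     ' ac. nephritis.  /.  nephrotic syndrome',
     1,
     'ac   nephritis      nephrotic syndrome'],
    ['sternocleidomastoid contracture',
     'sternocleidomastoid contracture',
     0,
     'NA'],
]
df_diagnosis = pd.DataFrame(
    data,
    columns=['diagnosis_name', 'diagnosis_name_edited',
             'is_spell_corrected', 'spell_corrected_value'],
)

# Take the spell-corrected text where the flag is set (> 0),
# otherwise fall back to the edited name.
chosen = np.where(
    df_diagnosis['is_spell_corrected'] > 0,
    df_diagnosis['spell_corrected_value'],
    df_diagnosis['diagnosis_name_edited'],
)

# np.where returns a plain NumPy array, so wrap it in a Series that
# shares the DataFrame's index (hypothetical name, for illustration).
chosen_series = pd.Series(chosen, index=df_diagnosis.index).astype(str)
print(chosen_series.tolist())
```

With that in place, the generator from the question should only need `chosen_series` in place of `df_diagnosis['diagnosis_name_edited'].astype(str)`, so that `rapid_utils.default_process` runs on the selected value for each row.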
