
I developed a program that needs to calibrate more than 1 million data points, and I want to vectorize it to improve performance.

I have a dataframe with the columns ['time', 'raw'] and I want to create a new column with the calibrated data.

I have another dataframe that holds the calibration data, with the columns ['calibration_name', 'raw_value', 'calibrated_value'].

I wrote a function that retrieves the calibrated_value, and I can run it with the apply method:

import pandas as pd

def calibrate(value, calibration):
    # Re-reads the whole calibration file for every single value
    df_calibrations = pd.read_csv('calibration_data.csv', usecols=['calibration_name', 'raw_value', 'calibrated_value'])
    y_out = df_calibrations.loc[df_calibrations ['calibration_name'] == value]['calibrated_value'].iloc[0]
    return y_out


df = pd.read_csv('data_to_calibrate.csv', usecols=['time', 'raw'])
calibration = 'calibration_name'
df['eng'] = df['raw'].apply(calibrate, calibration=calibration)

My code works, but I want to improve its performance, so I tried to vectorize it like this:

df['eng'] = calibrate(df['raw'], calibration)

However, I get this error:

('Lengths must match to compare', (11,), (7630,))
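A likely reason for this error: `==` between a pandas Series and an array compares element by element, so both sides must have the same length; it does not look up each value in the other column. The minimal sketch below reproduces the same kind of message with made-up data (4 calibration rows against 3 raw values); the helper name `reproduce_length_error` is hypothetical, and the exact wording may vary between pandas versions.

```python
import numpy as np
import pandas as pd


def reproduce_length_error():
    """Compare a 4-row column with a 3-element array and return the error text."""
    calibration_names = pd.Series(["XXXX01"] * 4)  # hypothetical: 4 calibration rows
    raw_values = np.array([1, 3, 2])              # hypothetical: 3 rows of raw data
    try:
        # Element-wise comparison: pandas requires both sides to have equal length
        _ = calibration_names == raw_values
    except ValueError as exc:
        return str(exc)
    return None


print(reproduce_length_error())
```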

I can't find a way to vectorize this line:

y_out = df_calibrations.loc[df_calibrations ['calibration_name'] == value]['calibrated_value'].iloc[0]

Is there a way to do so?

data_to_calibrate.csv:

time,raw
1571348671638000000,1
1571348676493000000,3
1571348681180000000,2

calibration_data.csv:

calibration_name,raw_value,calibrated_value
XXXX01,0,A
XXXX01,1,B
XXXX01,2,C
XXXX01,3,D
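One vectorized approach, sketched under the assumption that each (calibration_name, raw_value) pair appears only once: load the calibration table once, keep the rows for the chosen calibration, turn them into a Series indexed by raw_value, and pass that Series to `Series.map`. The inline CSV strings and the `calibrate_column` helper are hypothetical stand-ins for the two files above.

```python
import io

import pandas as pd

# Hypothetical stand-ins for data_to_calibrate.csv and calibration_data.csv
DATA_CSV = """time,raw
1571348671638000000,1
1571348676493000000,3
1571348681180000000,2
"""
CALIBRATION_CSV = """calibration_name,raw_value,calibrated_value
XXXX01,0,A
XXXX01,1,B
XXXX01,2,C
XXXX01,3,D
"""


def calibrate_column(raw, df_calibrations, calibration):
    """Map every raw value to its calibrated value in one vectorized call."""
    # Keep only the rows of the requested calibration
    table = df_calibrations[df_calibrations["calibration_name"] == calibration]
    # Series indexed by raw_value -> calibrated_value (assumes raw_value is unique here)
    lookup = table.set_index("raw_value")["calibrated_value"]
    # Raw values missing from the table become NaN
    return raw.map(lookup)


df = pd.read_csv(io.StringIO(DATA_CSV))
df_calibrations = pd.read_csv(io.StringIO(CALIBRATION_CSV))
df["eng"] = calibrate_column(df["raw"], df_calibrations, "XXXX01")
print(df)
```

Reading the calibration file once, outside the per-row path, likely matters as much as the vectorization itself, since the apply version re-reads the CSV for every value.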
  • Can you use merge on the two dataframes instead of apply? This looks really inefficient. Commented Aug 30, 2021 at 12:42
  • How would I use merge? The correspondence between the raw and calibrated values is in a different file. Commented Aug 30, 2021 at 12:50
  • Can you share a sample of both datasets? It seems like you only have to merge on calibration_name; it should be relatively easy. Commented Aug 30, 2021 at 12:52
  • I have added them to the question so they are more readable. In the example, the new column of data_to_calibrate.csv should be B, D, C. Commented Aug 30, 2021 at 12:57

1 Answer

By merging on the common column, you can perform all the necessary business logic in a vectorized manner:

data_to_calibrate = data_to_calibrate.merge(calibration_data, how='left', left_on='raw', right_on='raw_value')

# After the left merge, calibrated_value holds the looked-up value (NaN where raw has no match)
data_to_calibrate['eng'] = data_to_calibrate['calibrated_value']
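A self-contained sketch of the merge approach, using the sample data from the question inlined with `io.StringIO`. As an assumption beyond the answer, it first filters calibration_data to a single calibration_name so that each raw value matches at most one row; `validate="many_to_one"` (a standard `DataFrame.merge` option) raises if that assumption does not hold, instead of silently duplicating rows.

```python
import io

import pandas as pd

data_to_calibrate = pd.read_csv(io.StringIO(
    "time,raw\n"
    "1571348671638000000,1\n"
    "1571348676493000000,3\n"
    "1571348681180000000,2\n"
))
calibration_data = pd.read_csv(io.StringIO(
    "calibration_name,raw_value,calibrated_value\n"
    "XXXX01,0,A\nXXXX01,1,B\nXXXX01,2,C\nXXXX01,3,D\n"
))

# Assumption: restrict to one calibration so the merge is many-to-one
table = calibration_data.loc[
    calibration_data["calibration_name"] == "XXXX01", ["raw_value", "calibrated_value"]
]

# how='left' keeps every data row in its original order
result = data_to_calibrate.merge(
    table, how="left", left_on="raw", right_on="raw_value", validate="many_to_one"
)
# Drop the duplicated key column and name the new column 'eng'
result = result.drop(columns="raw_value").rename(columns={"calibrated_value": "eng"})
print(result)
```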

1 Comment

Thank you. I actually used a different method: I converted the calibration dataframe into a dictionary and mapped the data with df['eng'] = df.raw.map(df2)
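A minimal sketch of the dictionary approach from this comment, using the sample values from the question. Building the dict with `dict(zip(...))` is an assumption about how the mapping was created, and the raw value 7 is made up to show that values missing from the dict come out as NaN.

```python
import pandas as pd

# Sample calibration rows for XXXX01 from the question
calibration = pd.DataFrame({
    "raw_value": [0, 1, 2, 3],
    "calibrated_value": ["A", "B", "C", "D"],
})
# Raw value 7 is hypothetical, added to show an unmatched lookup
df = pd.DataFrame({"raw": [1, 3, 2, 7]})

# Plain dict: raw_value -> calibrated_value
mapping = dict(zip(calibration["raw_value"], calibration["calibrated_value"]))

# Vectorized lookup; keys not present in the dict become NaN
df["eng"] = df["raw"].map(mapping)
print(df)
```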
