I developed a program that needs to calibrate more than 1 million data points, and I want to vectorize it to improve performance.
I have a DataFrame with the columns ['time', 'raw'], and I want to create a new column with the calibrated data.
I have another DataFrame that holds the calibration data, organized in the columns ['calibration_name', 'raw_value', 'calibrated_value'].
I wrote a function that retrieves the calibrated_value, and I can use the apply method to run it on every raw value:
import pandas as pd

def calibrate(value, calibration):
    # Note: the calibration file is re-read on every call
    df_calibrations = pd.read_csv('calibration_data.csv', usecols=['calibration_name', 'raw_value', 'calibrated_value'], skipinitialspace=True)
    # Pick the row of the requested calibration whose raw_value equals the input value
    y_out = df_calibrations.loc[(df_calibrations['calibration_name'] == calibration) & (df_calibrations['raw_value'] == value), 'calibrated_value'].iloc[0]
    return y_out

df = pd.read_csv('data_to_calibrate.csv', usecols=['time', 'raw'], skipinitialspace=True)
calibration = 'XXXX01'  # name of the calibration to apply
df['eng'] = df['raw'].apply(calibrate, calibration=calibration)
The code works, but I want to improve performance, so I tried to vectorize the call:
df['eng'] = calibrate(df['raw'], calibration)
However, I get this error:
('Lengths must match to compare', (11,), (7630,))
I can't figure out how to vectorize this line:
y_out = df_calibrations.loc[(df_calibrations['calibration_name'] == calibration) & (df_calibrations['raw_value'] == value), 'calibrated_value'].iloc[0]
Is there a way to do so?
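A minimal sketch of one vectorized approach, assuming each raw_value appears at most once per calibration_name. The error appears to come from the boolean mask comparing a calibration column (11 rows) with the whole raw column (7630 rows) element by element, which pandas cannot do for different lengths. Instead of comparing, you can filter the calibration table once, turn it into a Series indexed by raw_value, and let `Series.map` look up every raw value in a single call. The helper name `calibrate_vectorized` is hypothetical, and the calibration table is passed in so it is read once rather than on every call.

```python
import pandas as pd


def calibrate_vectorized(raw: pd.Series, df_calibrations: pd.DataFrame, calibration: str) -> pd.Series:
    """Hypothetical helper: map every raw value to its calibrated value."""
    # Keep only the rows belonging to the requested calibration
    table = df_calibrations[df_calibrations["calibration_name"] == calibration]
    # Lookup Series: index = raw_value, values = calibrated_value
    # (assumes raw_value is unique within one calibration)
    lookup = table.set_index("raw_value")["calibrated_value"]
    # Series.map looks up all raw values at once; unmatched values become NaN
    return raw.map(lookup)


# Small in-memory example mirroring the sample data
df_calibrations = pd.DataFrame({
    "calibration_name": ["XXXX01"] * 4,
    "raw_value": [0, 1, 2, 3],
    "calibrated_value": ["A", "B", "C", "D"],
})
df = pd.DataFrame({
    "time": [1571348671638000000, 1571348676493000000, 1571348681180000000],
    "raw": [1, 3, 2],
})
df["eng"] = calibrate_vectorized(df["raw"], df_calibrations, "XXXX01")
print(df["eng"].tolist())  # ['B', 'D', 'C']
```

If raw_value can repeat within a calibration, `set_index` produces a non-unique index and `map` will raise, so the table would need to be deduplicated first.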
data_to_calibrate.csv:
time, raw
1571348671638000000, 1
1571348676493000000, 3
1571348681180000000, 2
calibration_data.csv:
calibration_name, raw_value, calibrated_value
XXXX01, 0, A
XXXX01, 1, B
XXXX01, 2, C
XXXX01, 3, D
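An alternative sketch using `DataFrame.merge`, run on the sample files above. They are inlined as strings so the example is self-contained, and the third calibration column is assumed to be calibrated_value, as described in the question. A left merge on raw == raw_value joins the whole table in one vectorized operation and keeps the data rows in their original order; `skipinitialspace=True` strips the blanks after the commas in the samples.

```python
import io

import pandas as pd

DATA_CSV = """time, raw
1571348671638000000, 1
1571348676493000000, 3
1571348681180000000, 2
"""

CAL_CSV = """calibration_name, raw_value, calibrated_value
XXXX01, 0, A
XXXX01, 1, B
XXXX01, 2, C
XXXX01, 3, D
"""

df = pd.read_csv(io.StringIO(DATA_CSV), skipinitialspace=True)
df_cal = pd.read_csv(io.StringIO(CAL_CSV), skipinitialspace=True)

# Restrict the calibration table to the calibration we want to apply
table = df_cal.loc[df_cal["calibration_name"] == "XXXX01", ["raw_value", "calibrated_value"]]

# Left merge: every data row is kept, in order, and gets its calibrated value
out = df.merge(table, how="left", left_on="raw", right_on="raw_value")
out = out.drop(columns="raw_value").rename(columns={"calibrated_value": "eng"})
print(out["eng"].tolist())  # ['B', 'D', 'C']
```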
It should be relatively easy.