I know a duplicate question exists, but after such a long time, are there any new methods to achieve the same goal?
import pandas as pd
from joblib import Parallel, delayed

def process(name):
    temp_df = df[name]  # select a single column
    return temp_df.apply(another_function)

results = Parallel(n_jobs=-2)(delayed(process)(name) for name in df.columns)
The dataframe seems to be copied into each worker process, which is not feasible for a large dataframe. Is there any method or package to fix this?
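One common workaround, sketched below under the assumption that the per-column work is independent: pass each column (a Series) into the worker explicitly instead of referencing the dataframe from inside the function, so only that column is pickled and shipped to the worker rather than the whole dataframe. The doubling function here is a placeholder for `another_function`.

```python
import pandas as pd
from joblib import Parallel, delayed

def process(col):
    # Receives a single column (Series); only this column is
    # serialized and sent to the worker, not the whole dataframe.
    return col.apply(lambda x: x * 2)  # placeholder for another_function

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Pass df[name] as an argument; the workers never see the full df.
results = Parallel(n_jobs=2)(delayed(process)(df[name]) for name in df.columns)
```

This reduces the per-task payload from the whole dataframe to one column, but each column is still copied once; it does not give true zero-copy sharing.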
Shared-memory approaches only work for uniform numeric data (np.intXX and np.floatXX types), not for non-uniform dataframes or ones containing strings/objects. They also force you to use NumPy arrays rather than dataframes, which is inconvenient. CPython multithreading is fundamentally limited by the GIL.
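To illustrate the numeric-only limitation, here is a minimal sketch using the standard-library `multiprocessing.shared_memory` module (Python 3.8+): a fixed-width float array can be placed in shared memory and reattached by name without copying, precisely because its buffer layout is uniform; object/string columns have no such fixed layout.

```python
import numpy as np
from multiprocessing import shared_memory

# Shared memory only works for fixed-width numeric dtypes
# (np.intXX / np.floatXX); object/string columns cannot be mapped.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Copy the data into a shared-memory block once.
shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
view[:] = data

# A worker process would reattach by name, with no further copy.
attached = shared_memory.SharedMemory(name=shm.name)
arr = np.ndarray(data.shape, dtype=data.dtype, buffer=attached.buf)
total = float(arr.sum())  # use the data before releasing the buffer

del arr, view
attached.close()
shm.close()
shm.unlink()
```

Note that you work with raw NumPy arrays here; any dataframe wrapper must be rebuilt around the shared buffer in each process.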