I profiled the function. I have one solution, though it may only work for data that resembles the sample database you provided.
Timer unit: 1e-07 s
Total time: 0.629038 s
File: C:\Users\@user@\AppData\Roaming\Python\Python37\site-packages\scipy\stats\morestats.py
Function: boxcox at line 948
Line # Hits Time Per Hit % Time Line Contents
==============================================================
948 def boxcox(x, lmbda=None, alpha=None):
949 r"""
950 Return a dataset transformed by a Box-Cox power transformation.
951
952 Parameters
953 ----------
954 x : ndarray
955 Input array. Must be positive 1-dimensional. Must not be constant.
956 lmbda : {None, scalar}, optional
957 If `lmbda` is not None, do the transformation for that value.
958
959 If `lmbda` is None, find the lambda that maximizes the log-likelihood
960 function and return it as the second output argument.
961 alpha : {None, float}, optional
962 If ``alpha`` is not None, return the ``100 * (1-alpha)%`` confidence
963 interval for `lmbda` as the third output argument.
964 Must be between 0.0 and 1.0.
965
966 Returns
967 -------
968 boxcox : ndarray
969 Box-Cox power transformed array.
970 maxlog : float, optional
971 If the `lmbda` parameter is None, the second returned argument is
972 the lambda that maximizes the log-likelihood function.
973 (min_ci, max_ci) : tuple of float, optional
974 If `lmbda` parameter is None and ``alpha`` is not None, this returned
975 tuple of floats represents the minimum and maximum confidence limits
976 given ``alpha``.
977
978 See Also
979 --------
980 probplot, boxcox_normplot, boxcox_normmax, boxcox_llf
981
982 Notes
983 -----
984 The Box-Cox transform is given by::
985
986 y = (x**lmbda - 1) / lmbda, for lmbda > 0
987 log(x), for lmbda = 0
988
989 `boxcox` requires the input data to be positive. Sometimes a Box-Cox
990 transformation provides a shift parameter to achieve this; `boxcox` does
991 not. Such a shift parameter is equivalent to adding a positive constant to
992 `x` before calling `boxcox`.
993
994 The confidence limits returned when ``alpha`` is provided give the interval
995 where:
996
997 .. math::
998
999 llf(\hat{\lambda}) - llf(\lambda) < \frac{1}{2}\chi^2(1 - \alpha, 1),
1000
1001 with ``llf`` the log-likelihood function and :math:`\chi^2` the chi-squared
1002 function.
1003
1004 References
1005 ----------
1006 G.E.P. Box and D.R. Cox, "An Analysis of Transformations", Journal of the
1007 Royal Statistical Society B, 26, 211-252 (1964).
1008
1009 Examples
1010 --------
1011 >>> from scipy import stats
1012 >>> import matplotlib.pyplot as plt
1013
1014 We generate some random variates from a non-normal distribution and make a
1015 probability plot for it, to show it is non-normal in the tails:
1016
1017 >>> fig = plt.figure()
1018 >>> ax1 = fig.add_subplot(211)
1019 >>> x = stats.loggamma.rvs(5, size=500) + 5
1020 >>> prob = stats.probplot(x, dist=stats.norm, plot=ax1)
1021 >>> ax1.set_xlabel('')
1022 >>> ax1.set_title('Probplot against normal distribution')
1023
1024 We now use `boxcox` to transform the data so it's closest to normal:
1025
1026 >>> ax2 = fig.add_subplot(212)
1027 >>> xt, _ = stats.boxcox(x)
1028 >>> prob = stats.probplot(xt, dist=stats.norm, plot=ax2)
1029 >>> ax2.set_title('Probplot after Box-Cox transformation')
1030
1031 >>> plt.show()
1032
1033 """
1034 2 153.0 76.5 0.0 x = np.asarray(x)
1035 2 55.0 27.5 0.0 if x.ndim != 1:
1036 raise ValueError("Data must be 1-dimensional.")
1037
1038 2 41.0 20.5 0.0 if x.size == 0:
1039 return x
1040
1041 2 168219.0 84109.5 2.7 if np.all(x == x[0]):
1042 raise ValueError("Data must not be constant.")
1043
1044 2 67990.0 33995.0 1.1 if any(x <= 0):
1045 raise ValueError("Data must be positive.")
1046
1047 2 47.0 23.5 0.0 if lmbda is not None: # single transformation
1048 1 161912.0 161912.0 2.6 return special.boxcox(x, lmbda)
1049
1050 # If lmbda=None, find the lmbda that maximizes the log-likelihood function.
1051 1 5891911.0 5891911.0 93.7 lmax = boxcox_normmax(x, method='mle')
1052 1 29.0 29.0 0.0 y = boxcox(x, lmax)
1053
1054 1 15.0 15.0 0.0 if alpha is None:
1055 1 8.0 8.0 0.0 return y, lmax
1056 else:
1057 # Find confidence interval
1058 interval = _boxcox_conf_interval(x, lmax, alpha)
1059 return y, lmax, interval
As you can see, almost all of the time (93.7%) is spent computing lambda in the boxcox_normmax call. Making that lambda search more efficient is probably a research topic of its own, but here is a quick fix if your data resembles my sample dataframe.
In:
%%timeit
for column in df.columns:
    stats.boxcox(df[column])
Out:
1min 59s ± 2.46 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you print the results, every lambda in my example turns out to be close to the same value, about 0.71:
(array([24.3917785 , 26.59336098, 28.42778141, ..., 0.89856936,
20.09087919, 25.34396741]), 0.7191265555780741)
(array([32.21200102, 21.42019555, 17.61088955, ..., 2.37689377,
11.43847546, 4.80996571]), 0.7186638451956158)
(array([32.04912451, 5.34619315, 2.37388797, ..., 28.25042847,
21.65944327, 35.68435748]), 0.717094388354133)
(array([24.12034838, 20.22153029, 17.46007125, ..., 9.20987077,
27.79432177, 28.38850624]), 0.7152672101519897)
(array([24.43175536, 29.97646547, 15.44100467, ..., 29.67889106,
33.75136616, 18.01618903]), 0.719690932457849)
(array([31.3006977 , 14.24153427, 8.80686258, ..., 27.74442602,
29.54262716, 5.35448321]), 0.7182204065752503)
(array([ 0.89885059, 33.21042971, 7.41516615, ..., 26.66002733,
32.05761174, 2.37938055]), 0.719960442157806)
(array([18.75921571, 20.15657425, 32.38744267, ..., 32.09731377,
34.95687043, 33.82390653]), 0.7203446358711867)
(array([30.61614136, 16.82387108, 23.61599906, ..., 26.74368558,
26.43727409, 26.43727409]), 0.7171650690241015)
(array([32.03243895, 18.46213843, 15.23999702, ..., 33.70140582,
34.52403407, 7.82011755]), 0.7141993257439302)
(array([16.39388107, 20.23652878, 25.38777257, ..., 27.81775952,
3.63937585, 12.98507701]), 0.7155422854115251)
(array([23.84209605, 24.47289056, 32.48229038, ..., 29.90484308,
11.37093225, 6.8765052 ]), 0.7158092390730108)
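To put a number on how tightly those lambdas cluster, here is a minimal sketch that summarizes the twelve values printed above. The variable names are my own, and the figures only describe this sample, not your full database.

```python
import numpy as np

# The twelve lambdas from the output above, copied as-is.
lambdas = np.array([
    0.7191265555780741, 0.7186638451956158, 0.717094388354133,
    0.7152672101519897, 0.719690932457849, 0.7182204065752503,
    0.719960442157806, 0.7203446358711867, 0.7171650690241015,
    0.7141993257439302, 0.7155422854115251, 0.7158092390730108,
])

mean_lambda = lambdas.mean()
std_lambda = lambdas.std(ddof=1)  # sample standard deviation
# Largest distance of any single lambda from the mean, relative to the mean.
max_rel_dev = np.max(np.abs(lambdas - mean_lambda)) / mean_lambda

print(f"mean={mean_lambda:.4f}  std={std_lambda:.4f}  "
      f"max relative deviation={max_rel_dev:.2%}")
```

If the relative deviation is small for your columns as well, reusing a single lambda is a reasonable approximation; if it is not, this shortcut probably does not apply.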
You can take the mean of the lambdas from the first N columns and use it for the rest. After running the whole database, no lambda was off from that value by more than ± the standard deviation. Pass it as the second argument (lmbda) to stats.boxcox, which skips the lambda search entirely.
In:
%%timeit
for column in df.columns:
    stats.boxcox(df[column], 0.715)
Out:
3.67 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
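The whole shortcut can be sketched as one helper. This is only an illustration under assumptions: the function name `boxcox_with_shared_lambda`, the parameter `n_estimate`, and the synthetic gamma data are mine, not part of the original problem. It estimates lambda with stats.boxcox_normmax on a few columns and reuses the mean for every column.

```python
import numpy as np
import pandas as pd
from scipy import stats


def boxcox_with_shared_lambda(df, n_estimate=3):
    """Fit lambda on the first n_estimate columns, then reuse their mean."""
    # The slow step: maximum-likelihood lambda search, run on a few columns only.
    sample_lambdas = [
        stats.boxcox_normmax(df[col].to_numpy(), method="mle")
        for col in df.columns[:n_estimate]
    ]
    shared_lambda = float(np.mean(sample_lambdas))

    # The fast step: with lmbda given, boxcox only applies the transform.
    transformed = {
        col: stats.boxcox(df[col].to_numpy(), lmbda=shared_lambda)
        for col in df.columns
    }
    return pd.DataFrame(transformed, index=df.index), shared_lambda, sample_lambdas


# Synthetic, strictly positive data standing in for the real dataframe.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(
    rng.gamma(shape=5.0, scale=2.0, size=(1000, 6)),
    columns=[f"c{i}" for i in range(6)],
)
demo_out, demo_lambda, demo_sample = boxcox_with_shared_lambda(demo_df)
print(demo_lambda, demo_sample)
```

Printing `sample_lambdas` next to the shared value is a cheap sanity check: if the per-column estimates disagree noticeably, a single lambda is probably not appropriate for that data.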
As I said, it would be easier to investigate this with either the original database or a backup from the company. Good luck!