A pandas DataFrame has the limitation of fixed integer datatypes (int64). (Edit: this limitation no longer exists.) NumPy arrays don't have this limitation; we can use np.int8, for example, and different float sizes are available as well.
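For concreteness, here is a minimal sketch (the values are made up) showing that both NumPy and current pandas accept reduced-size dtypes:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3], dtype=np.int8)         # 1 byte per element
ser = pd.Series([1, 2, 3], dtype="int8")         # pandas accepts the same reduced dtypes
halves = np.array([0.5, 1.5], dtype=np.float16)  # half-precision floats

print(arr.dtype, ser.dtype, halves.dtype)        # int8 int8 float16
```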
Will scikit-learn performance generally improve on large datasets if we first convert the DataFrame to a raw NumPy array with reduced-size datatypes (e.g. np.float64 down to np.float16)? If so, does this potential performance boost only come into play when memory is limited?
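The kind of conversion I have in mind looks roughly like this (df here is a hypothetical DataFrame with default float64 columns; the memory comparison just shows the motivation):

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with default float64 columns.
df = pd.DataFrame(np.random.default_rng(0).random((1_000_000, 20)))

X64 = df.to_numpy()                   # float64, ~160 MB
X16 = df.to_numpy(dtype=np.float16)   # float16, ~40 MB

print(X64.nbytes / 1e6, X16.nbytes / 1e6)  # array sizes in MB
```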
It seems that very high float precision often matters less in ML than the computational cost driven by dataset size and model complexity.
For context, I'm considering applying ensemble learners like RandomForestRegressor to large datasets (4-16 GB, tens of millions of records with roughly 10-50 features each). However, I'm most interested in the general case.
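In other words, the workflow I'm weighing is roughly the following (synthetic data, smaller than my real case; whether scikit-learn keeps or upcasts the reduced dtype internally is exactly what I'm unsure about):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in for a much larger dataset: many rows, ~20 features.
X = rng.random((100_000, 20)).astype(np.float32)  # downcast from the default float64
y = rng.random(100_000).astype(np.float32)

model = RandomForestRegressor(n_estimators=10, n_jobs=-1)
model.fit(X, y)  # does the smaller input dtype actually save time or memory here?
```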