I have a random forest model where X_train, X_test, y_train and y_test are NumPy arrays of shape (1, n) for the training arrays and (1, m) for the test arrays, i.e. the input consists of only one feature.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

model_1 = RandomForestRegressor(n_estimators=50, random_state=42)
model_1.fit(X_train.reshape(-1, 1), y_train.reshape(-1, 1))
print(model_1.score(X_test.reshape(-1, 1), y_test.reshape(-1, 1)))

This fits the training data without problems and gives a score of around 0.95 on the test data. But now I want to predict for

future = np.array([int(i) for i in range(len(X)+1,len(X)+11)])

so future is

array([155, 156, 157, 158, 159, 160, 161, 162, 163, 164])

I did this:

model_1.predict(future.reshape(-1, 1))

But in the output, every value is the same:

array([2985.02, 2985.02, 2985.02, 2985.02, 2985.02, 2985.02, 2985.02,
       2985.02, 2985.02, 2985.02])

Can somebody tell me why all the predictions are the same number? This happens not just for 10 future values but even for 100. Is there any other way to predict the results manually?

  • What happens when you try predicting on the test data? I mean, what is the result of model_1.predict(X_test.reshape(-1,1))? Commented Aug 15, 2020 at 17:16
  • @büşraçelik For X_test, the output is not all the same number. Commented Aug 15, 2020 at 18:34
  • Could you put a small example (perhaps with m and n being less than 5 or less than 10) for {X|y}_train and {X|y}_test that would show the problem? I don't know if that would involve restructuring your whole model, but if it doesn't, it would help to be able to reproduce your problem. Commented Aug 15, 2020 at 19:30

1 Answer

I don't have the means to run the code, but it sounds like the random-number-generator seed isn't being changed. Often, the kind of repeatability you've described is desired, as in this SO situation: it helps when testing certain things. In that example, the OP is concerned because the results are not reproducible.

The first thing to look at, I think, is random_state=42. You might be able to tell from there whether the same random seed is used each time.
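To check whether the seed matters here, one option is to refit the model with a few different random_state values and compare the predictions for the future inputs. The sketch below does that on synthetic data; the data, the seeds tried, and the names rng and predictions_by_seed are assumptions for illustration, not taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the question's data (an assumption): one feature, 1..154.
rng = np.random.default_rng(0)
X = np.arange(1, 155)
y = 20.0 * X + rng.normal(scale=5.0, size=X.size)

# Same construction as in the question: the ten values after the last index.
future = np.arange(len(X) + 1, len(X) + 11)

# Refit with several seeds and keep the predictions for comparison.
predictions_by_seed = {}
for seed in (0, 1, 42):
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X.reshape(-1, 1), y)
    predictions_by_seed[seed] = model.predict(future.reshape(-1, 1))
    print(seed, predictions_by_seed[seed])
```

If the ten values still come out identical for every seed, the seed is unlikely to be the cause, and it would be worth checking how the future inputs compare with the range covered by X_train.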

As for predicting the result (if that means predicting the "same number" you're getting each time), you'll need to find the pseudorandom number generator (PRNG; Wikipedia article linked here).

Actually, that article has a nice description of what you might be running into:

The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed.
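A minimal sketch of what that quote means in practice, using NumPy's generator as an example. The helper name draw and the seed values are made up for illustration; this is not how any particular library seeds its models internally.

```python
import numpy as np


def draw(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    return np.random.default_rng(seed).random(n)


same_a = draw(42)
same_b = draw(42)
other = draw(7)

# Two generators started from the same seed produce the same sequence.
print(np.array_equal(same_a, same_b))
# A different seed gives a different sequence.
print(np.array_equal(same_a, other))
```

So anything that depends only on the seed will repeat exactly from run to run as long as the seed is fixed.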

You'll need to look through the source code. Hopefully, someone commented their code well enough that it will be easy to find. Look for words such as seed, generator, and possibly other words from the Wikipedia article.

Once again, without being able to try things out or being able to see the source code, I can't tell you this is the actual problem. However, it reminds me of a simulation we messed up during grad school. The goal was to run a particle-collision simulator for something on the order of 10^12 events; we didn't reset the seed, so we had about 10^12 identical simulations. That didn't help with the statistics we were trying to do.
