
I am reproducing some simple 10-arm bandit experiments from Sutton and Barto's book Reinforcement Learning: An Introduction. Some of these take significant computation time, so I tried to take advantage of my multicore CPU.

Here is the function I need to run 2000 times. Each run has 1000 sequential steps that incrementally improve the reward:

import numpy as np

def foo(eps): # need an (unused) argument to use pool.map()
    # initialising
    # the true values of the actions
    q = np.random.normal(0, 1, size=10)
    # the estimated values
    q_est = np.zeros(10)
    # the counter of how many times each of the 10 actions was chosen
    n = np.zeros(10)

    rewards = []
    for i in range(1000):
        # choose an action based on its estimated value
        a = np.argmax(q_est)
        # get the normally distributed reward 
        rewards.append(np.random.normal(q[a], 1)) 
        # increment the chosen action counter
        n[a] += 1 
        # update the estimated value of the action
        q_est[a] += (rewards[-1] - q_est[a]) / n[a] 
    return rewards

I execute this function 2000 times to get a (2000, 1000) array:

reward = np.array([foo(0) for _ in range(2000)])

Then I plot the mean reward across 2000 experiments:

import matplotlib.pyplot as plt
plt.plot(np.arange(1000), reward.mean(axis=0))

(Figure: sequential plot)

This matches the expected result (it looks the same as in the book). But when I run it in parallel, the average reward has a much greater standard deviation:

import multiprocessing as mp
with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, [0]*2000))
plt.plot(np.arange(1000), reward_p.mean(axis=0))

(Figure: parallel plot)

I suppose this is due to the parallelization of the loop inside foo. As I reduce the number of cores allocated to the task, the reward plot approaches the expected shape.

Is there a way to take advantage of multiprocessing here while still getting correct results?

UPD: I ran the same code on Windows 10, and there the sequential and parallel results turned out to be the same! What could be the reason?

Ubuntu 20.04, Python 3.8.5, jupyter

Windows 10, Python 3.7.3, jupyter

  • I can't reproduce it. Works the same on 8 cores. Commented Nov 1, 2020 at 14:55
  • @Marcin Hmm... I've just executed it on a two-core machine and got different results. Commented Nov 1, 2020 at 15:22
  • I'm doing it on Python 3.8.5 on Windows via Jupyter. Commented Nov 1, 2020 at 15:24
  • I'm on Ubuntu 20.04, Python 3.8.5 via Jupyter. I'll try it on Windows right now. Commented Nov 1, 2020 at 15:32
  • Wow, Windows shows the same graph after parallel execution as after sequential! I wonder what the reasons are... Commented Nov 1, 2020 at 15:40

1 Answer

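As we found out, the behavior differs between Windows and Ubuntu. This is probably because multiprocessing uses a different default start method on each, as described in the Python documentation: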

spawn: The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows and macOS.

fork: The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.
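If fork is the cause, the parallel result should contain exact duplicates, because workers that start from the same random state replay the same runs. Here is a minimal sketch of that check. count_distinct_runs is a hypothetical helper name, and the small arrays are made-up data standing in for reward and reward_p from the question.

```python
import numpy as np

def count_distinct_runs(reward):
    """Return how many rows of a (runs, steps) reward array are distinct."""
    reward = np.asarray(reward)
    return np.unique(reward, axis=0).shape[0]

# Made-up data: 3 independent runs of 5 steps each.
rng = np.random.default_rng(0)
independent = rng.normal(size=(3, 5))

# The same 3 runs stacked twice, mimicking two workers that
# started from an identical random state and replayed the same runs.
duplicated = np.vstack([independent, independent])

n_independent = count_distinct_runs(independent)  # every row is unique
n_duplicated = count_distinct_runs(duplicated)    # only half the rows are unique
```

Applied to the arrays in the question, I would expect count_distinct_runs(reward) to give 2000, and a much smaller number for reward_p would point to shared random state.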

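With fork, each worker inherits a copy of the parent's NumPy global random state, so the workers draw identical random sequences and the runs are not independent. With spawn, each worker starts a fresh interpreter in which NumPy seeds itself anew. Try adding this line to your code before creating the pool: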

mp.set_start_method('spawn')
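An alternative that does not depend on the start method is to give every run its own random generator. The sketch below is an assumption about how foo could be restructured, not the code from the question. run_one and run_parallel are hypothetical names, and it relies on NumPy's SeedSequence.spawn and default_rng (available from NumPy 1.17) to derive independent streams.

```python
import multiprocessing as mp
import numpy as np

def run_one(seed_seq, n_steps=1000, n_arms=10):
    """One greedy bandit run that draws only from its own generator."""
    rng = np.random.default_rng(seed_seq)
    # the true values of the actions
    q = rng.normal(0, 1, size=n_arms)
    # the estimated values and the per-action counters
    q_est = np.zeros(n_arms)
    n = np.zeros(n_arms)

    rewards = np.empty(n_steps)
    for i in range(n_steps):
        # choose the action with the highest estimated value
        a = np.argmax(q_est)
        # get the normally distributed reward
        rewards[i] = rng.normal(q[a], 1)
        # incremental sample-average update of the estimate
        n[a] += 1
        q_est[a] += (rewards[i] - q_est[a]) / n[a]
    return rewards

def run_parallel(n_runs=2000, n_workers=None, root_seed=None):
    """Run n_runs independent experiments in a process pool."""
    # One child SeedSequence per run, so no two runs share a stream,
    # whether the workers are forked or spawned.
    seeds = np.random.SeedSequence(root_seed).spawn(n_runs)
    with mp.Pool(n_workers) as pool:
        return np.array(pool.map(run_one, seeds))
```

Calling run_parallel(2000) should return a (2000, 1000) array like reward_p. When this is run as a script on a platform that uses spawn, the call belongs under an if __name__ == "__main__": guard.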

2 Comments

That worked! Thank you so much Marcin, I've been struggling with this for two days! Pity I can't upvote your answer (need 15 reputation). Got a ~7-fold speed advantage.
No prob, I used to have a problem with forking a long time ago, so I kinda knew what to look for, hahah
