
I am running my Dask scheduler on an HPC system where a pre-built module provides everything needed to run Dask, and I load that module in a Jupyter notebook. I would like to run some processing tasks that use Dask together with modules that are not available in Dask's base environment. For that I have created a custom environment using conda. Is there an easy way to link this new conda environment to the Dask client before running my tasks?

I have tried using

from dask.distributed import Client, LocalCluster
client = Client(scheduler_file=schedule_json)
print(client)
client.upload_file('condaenvfile.tar')

I have also tried client.run(os.system, 'conda install -c conda-forge package -y'), but I still get a "module not found" message.
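For reference, something like the following (same scheduler file as above) should show which interpreter the workers are actually running; if it is the HPC module's Python rather than my conda environment's, I assume that explains why the conda install is not visible to the workers:

import sys
from dask.distributed import Client

client = Client(scheduler_file=schedule_json)  # same scheduler file as above

# Ask each worker which Python executable it is running;
# the result is a dict of {worker address: path to that worker's python}.
print(client.run(lambda: sys.executable))

# Compare the package versions seen by the client, scheduler and workers.
print(client.get_versions(check=False))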


Let me make my problem clearer, so I can figure out whether there are other alternatives for handling such issues.

import dask
import skimage
from dask.distributed import Client

client = Client(scheduler_file=schedule_json)


def myfunc(param):
    # process param using skimage here
    ...


r = []
for param in param_list:          # param_list: the list of inputs to process
    myres = dask.delayed(myfunc)(param)
    r.append(myres)

allres = dask.compute(*r)

In the above example, the Dask module runs in an HPC environment over which I have no control; I can only load it. I have my own conda environment inside my user profile, and I have to run some processing that uses scikit-learn (and other modules) on the Dask workers. What would be a workaround for such an issue?

1 Answer


Once Dask is running you can't switch out the underlying Python environment. Instead, you should build an environment with all the libraries and dependencies you need and run the scheduler and workers from that newly created env. To help with creating such an environment I would recommend conda-pack. You can also modify an existing environment, but I would not recommend it. If you care deeply about this issue you might be interested in https://github.com/dask/distributed/issues/3111
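As a rough sketch of that workflow (the environment name my_env and the paths are placeholders, and this assumes you can launch the workers yourself on the compute nodes), you pack the environment once and then start the workers from the unpacked copy:

# Pack the conda environment (run once, from a node where the env exists).
# conda-pack also has a CLI: conda pack -n my_env -o my_env.tar.gz
import conda_pack
conda_pack.pack(name="my_env", output="my_env.tar.gz")

# On the compute nodes (shell, not Python), unpack and start the workers
# from the packed environment so they can see your libraries:
#   mkdir -p my_env && tar -xzf my_env.tar.gz -C my_env
#   source my_env/bin/activate
#   dask-worker --scheduler-file /path/to/scheduler.json

The client in the notebook connects with the same scheduler file as before; the important part is that the worker processes are the ones started from the packed environment.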


4 Comments

Thanks @quasiben for your suggestion. My question now is: what is the purpose of client.run() or client.upload_file()? Aren't they meant for exactly this use case? I am a bit confused. I have also tried to explain my problem more explicitly above, in case there are other alternatives.
client.run can be useful for all sorts of things -- any time you want to run something once on every worker. client.upload_file is used for adding a small library to an existing Dask process, but it is not robust (see the sketch after these comments). Still, nothing here will swap out an entire environment, only modify an existing one.
Thanks for the clarification. So client.upload_file() adds a library only temporarily, for the duration of the Dask process; do we need write access to the environment where Dask is running?
I believe it uploads to a temporary directory, not to the env from which Dask is running. I'd recommend reading through distributed.readthedocs.io/en/latest/… and stackoverflow.com/questions/39295200/…
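For completeness, a minimal sketch of what client.upload_file is suited for: shipping a small, self-contained local module (my_helpers.py and my_helpers.process are hypothetical names here) to all connected workers so that tasks can import it. It does not install anything into the workers' environment:

from dask.distributed import Client

client = Client(scheduler_file=schedule_json)

# Send a single local .py file to every connected worker;
# it lands in a temporary directory on each worker and becomes importable.
client.upload_file("my_helpers.py")

def task(x):
    import my_helpers              # import inside the task, on the worker
    return my_helpers.process(x)   # hypothetical function in that file

future = client.submit(task, 42)
print(future.result())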
