
I am running my Dask scheduler on an HPC system where a pre-built module provides everything needed to run Dask, and I load that module in a Jupyter notebook. I would like to run some processing tasks that use Dask together with modules that are not available in Dask's base environment. For that I have created a custom environment using conda. Is there an easy way to link this new conda environment to the Dask client before running my tasks?

I have tried using

from dask.distributed import Client, LocalCluster
client = Client(scheduler_file=schedule_json)
print(client)
client.upload_file('condaenvfile.tar')

I have also tried client.run(os.system, 'conda install -c conda-forge package -y'), but I still get a "module not found" message.
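For reference, something like the following (same scheduler file as above) should show which interpreter the workers are actually running; if it is the HPC module's Python rather than my conda environment's, I assume that explains why the conda install is not visible to the workers:

import sys
from dask.distributed import Client

client = Client(scheduler_file=schedule_json)  # same scheduler file as above

# Ask each worker which Python executable it is running;
# the result is a dict of {worker address: path to that worker's python}.
print(client.run(lambda: sys.executable))

# Compare the package versions seen by the client, scheduler and workers.
print(client.get_versions(check=False))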


Let me make my problem clearer, so I can figure out whether there are other alternatives for handling such issues.

import dask
import skimage
from dask.distributed import Client

client = Client(scheduler_file=schedule_json)


def myfunc(param):
    # process param using skimage here
    ...


r = []
for param in param_list:          # param_list: the list of inputs to process
    myres = dask.delayed(myfunc)(param)
    r.append(myres)

allres = dask.compute(*r)

In the above example, the Dask module runs in an HPC environment over which I have no control; I can only load it. I have my own conda environment inside my user profile, and I have to run some processing that uses scikit-learn (and other modules) on the Dask workers. What would be a workaround for such an issue?

1 Answer


Once Dask is running you can't switch out the underlying Python environment. Instead, you should build an environment with all the libraries and dependencies you need and run the scheduler and workers from that newly created env. To help with creating such an environment I would recommend conda-pack. You can also modify an existing environment, but I would not recommend it. If you care deeply about this issue you might be interested in https://github.com/dask/distributed/issues/3111
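As a rough sketch of that workflow (the environment name my_env and the paths are placeholders, and this assumes you can launch the workers yourself on the compute nodes), you pack the environment once and then start the workers from the unpacked copy:

# Pack the conda environment (run once, from a node where the env exists).
# conda-pack also has a CLI: conda pack -n my_env -o my_env.tar.gz
import conda_pack
conda_pack.pack(name="my_env", output="my_env.tar.gz")

# On the compute nodes (shell, not Python), unpack and start the workers
# from the packed environment so they can see your libraries:
#   mkdir -p my_env && tar -xzf my_env.tar.gz -C my_env
#   source my_env/bin/activate
#   dask-worker --scheduler-file /path/to/scheduler.json

The client in the notebook connects with the same scheduler file as before; the important part is that the worker processes are the ones started from the packed environment.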


4 Comments

Thanks @quasiben for your suggestion. My question now is: what is the purpose of client.run() or client.upload_file()? Aren't they meant for exactly this use case? I am a bit confused. I have also tried to explain my problem more explicitly above, in case there are other alternatives.
client.run can be useful for all sorts of things -- any time you want to run something once on every worker. client.upload_file is used for adding a small library to an existing Dask process, but it is not robust (see the sketch after these comments). Still, nothing here will swap out an entire environment, only modify an existing one.
Thanks for the clarification. So client.upload_file() adds a library only temporarily, for the duration of the Dask process; do we need write access to the environment where Dask is running?
I believe it uploads to a temporary directory, not to the env from which Dask is running. I'd recommend reading through distributed.readthedocs.io/en/latest/… and stackoverflow.com/questions/39295200/…
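For completeness, a minimal sketch of what client.upload_file is suited for: shipping a small, self-contained local module (my_helpers.py and my_helpers.process are hypothetical names here) to all connected workers so that tasks can import it. It does not install anything into the workers' environment:

from dask.distributed import Client

client = Client(scheduler_file=schedule_json)

# Send a single local .py file to every connected worker;
# it lands in a temporary directory on each worker and becomes importable.
client.upload_file("my_helpers.py")

def task(x):
    import my_helpers              # import inside the task, on the worker
    return my_helpers.process(x)   # hypothetical function in that file

future = client.submit(task, 42)
print(future.result())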
