
I have a large data set (185GB) that I plan to run some machine learning algorithms on. The data is on a local computer with limited computational power. I have access to a remote cluster where I can run my computationally expensive algorithms; it has 1TB of memory and is quite fast. However, for some reason I only have 2GB(!) of disk storage on the remote server.

I can connect to the cluster via SSH. Is there any way, in Python, to load the data set into RAM over SSH?

Any general tips on how to tackle this problem are much appreciated.


1 Answer


You may want to use paramiko so that you can open an SSH connection from within Python. You can then run commands that write your data to stdout and read it from that stream. This works better than copying the files over because the data never touches the 2GB disk. If the data is in files, you can simply use paramiko to cat them and read the contents from the stream.
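A minimal sketch of that streaming approach, run from the cluster. The hostname, username, file path, and chunk size are placeholders you would replace with your own values:

    import paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Connect back to the machine that actually holds the 185GB of data.
    client.connect("data-host.example.com", username="user")

    # Run a command whose stdout is the raw data; nothing is written to disk.
    stdin, stdout, stderr = client.exec_command("cat /path/to/dataset.csv")

    # Read the stream in chunks so the data goes straight into memory.
    chunks = []
    while True:
        chunk = stdout.read(64 * 1024 * 1024)  # 64 MB at a time
        if not chunk:
            break
        chunks.append(chunk)
    data = b"".join(chunks)

    client.close()

From there you can parse `data` (or feed the chunks incrementally into whatever ML library you are using) without ever needing 185GB of scratch space on the server.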
