I want to use a Docker image with Apache Spark on Ubuntu 18.04.

The more popular image on Docker Hub has Spark 1.6. The second image has a more recent version, Spark 2.2.

Neither image has numpy installed, but the basic examples in the main Spark MLlib guide require it.

I tried to install numpy by adding this line to the original Dockerfile for the Spark 2.2 image, without success:

RUN apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

How do I set up the container to use the OS's numpy installation? What is the procedure? Is this even the right direction?

Edit: OS is Ubuntu 18.04

  • pip install numpy? Commented May 30, 2019 at 4:43
  • @atline That doesn't work in the Dockerfile, i.e. RUN pip install numpy. Commented May 30, 2019 at 6:14
  • What's the error when you say it doesn't work? Commented May 30, 2019 at 6:31
  • Fully works on my side, see answer. Commented May 30, 2019 at 6:35

1 Answer

Dockerfile:

FROM p7hb/docker-spark

RUN apt-get update && apt-get install -y python-numpy

Build command:

docker build -t my_image .

Run container:

docker run -it --rm my_image /bin/bash

Check numpy:

root@55ce4c59122c:~# python
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print(numpy.__version__)
1.12.1
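
The interactive check above can also be scripted, which is handy if you want to repeat it after each rebuild. This is only a minimal sketch: the helper name module_version is made up for illustration, and it assumes you run it with the same python interpreter that Spark uses inside the container. It sticks to syntax that works on both Python 2.7 and Python 3.

```python
import importlib


def module_version(name):
    """Return the module's version string, "unknown" if the module
    has no __version__ attribute, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # "numpy" is the package the question is about; any module name works here.
    version = module_version("numpy")
    if version is None:
        print("numpy is NOT importable by this interpreter")
    else:
        print("numpy %s is importable" % version)
```

Saved under a name of your choice (for example check_numpy.py, a hypothetical filename) and run with python inside the container, it reports whether numpy is visible to that interpreter.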