I'm trying to install scikit-learn (sklearn) on top of a Docker image (FROM astronomerinc/ap-airflow:master-1.10.5-onbuild). The base image ships with the following environment:

  • Alpine Linux v3.10 (kernel 4.9.93-linuxkit-aufs)
  • Python 3.7.3
  • numpy==1.17.2
  • pandas==0.25.1
  • pandas-gbq==0.11.0
  • ...

I had scipy==1.3.1 in my requirements.txt, and pip installed it without issues. However, when I added scikit-learn to requirements.txt and rebuilt the image, the build failed with an error saying a numpy header is missing:

    creating build/temp.linux-x86_64-3.7
    creating build/temp.linux-x86_64-3.7/sklearn
    creating build/temp.linux-x86_64-3.7/sklearn/svm
    creating build/temp.linux-x86_64-3.7/sklearn/svm/src
    creating build/temp.linux-x86_64-3.7/sklearn/svm/src/libsvm
    compile options: '-I/usr/lib/python3.7/site-packages/numpy/core/include -c'
    g++: sklearn/svm/src/libsvm/libsvm_template.cpp
    ar: adding 1 object files to build/temp.linux-x86_64-3.7/liblibsvm-skl.a
    running build_ext
    customize UnixCCompiler
    customize UnixCCompiler using build_ext
    resetting extension 'sklearn.svm.liblinear' language from 'c' to 'c++'.
    customize UnixCCompiler
    customize UnixCCompiler using build_ext
    building 'sklearn.__check_build._check_build' extension
    compiling C sources
    C compiler: gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC

    creating build/temp.linux-x86_64-3.7/sklearn/__check_build
    compile options: '-I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c'
    gcc: sklearn/__check_build/_check_build.c
    gcc -shared -Wl,--as-needed -Wl,--as-needed build/temp.linux-x86_64-3.7/sklearn/__check_build/_check_build.o -L/usr/lib -Lbuild/temp.linux-x86_64-3.7 -lpython3.7m -o build/lib.linux-x86_64-3.7/sklearn/__check_build/_check_build.cpython-37m-x86_64-linux-gnu.so
    building 'sklearn.cluster._dbscan_inner' extension
    compiling C++ sources
    C compiler: g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC

    creating build/temp.linux-x86_64-3.7/sklearn/cluster
    compile options: '-I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c'
    g++: sklearn/cluster/_dbscan_inner.cpp
    sklearn/cluster/_dbscan_inner.cpp:652:10: fatal error: numpy/arrayobject.h: No such file or directory
     #include "numpy/arrayobject.h"
              ^~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    error: Command "g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/lib/python3.7/site-packages/numpy/core/include -I/usr/include/python3.7m -c sklearn/cluster/_dbscan_inner.cpp -o build/temp.linux-x86_64-3.7/sklearn/cluster/_dbscan_inner.o -MMD -MF build/temp.linux-x86_64-3.7/sklearn/cluster/_dbscan_inner.o.d" failed with exit status 1
    ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o8ktwf40/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o8ktwf40/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-p6ejlhi_/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
    WARNING: You are using pip version 19.2.1, however version 19.2.3 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    The command '/bin/sh -c pip install --no-cache-dir -q -r requirements.txt' returned a non-zero code: 1


Several things I've tried:

  • upgrading pip
  • specifying an older version of scikit-learn
  • "explicitly" installing py3-numpy

Unfortunately, none of them worked. This post recommends setting the include path manually, but that isn't the answer I'm looking for.

Insights? Any help is appreciated!

  • @LinPy thanks. In my case there is no include folder inside /usr/lib/python3.7/site-packages/numpy/core/. Also, running find /usr/lib/python3.7/site-packages/numpy/ -name arrayobject.h returns no results. Any pointers? Commented Sep 24, 2019 at 9:30
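
The same check can be done from Python, which avoids hard-coding the site-packages path. This is only a minimal sketch: has_numpy_header is a hypothetical helper name (not part of numpy), and it assumes the header lives at numpy/arrayobject.h under the directory reported by numpy.get_include(), which is the directory the build passes to the compiler with -I.

```python
from pathlib import Path


def has_numpy_header(include_dir: str) -> bool:
    """Return True if numpy/arrayobject.h exists under include_dir."""
    return (Path(include_dir) / "numpy" / "arrayobject.h").is_file()


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not importable in this interpreter")
    else:
        # get_include() reports the directory that extension builds add with -I
        include_dir = numpy.get_include()
        print(f"numpy {numpy.__version__} include dir: {include_dir}")
        print("arrayobject.h present:", has_numpy_header(include_dir))
```

If this prints False inside the container, numpy itself is installed but its C headers are not, which would match the fatal error in the build log.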

1 Answer

I suggest installing py-numpy-dev in your Dockerfile:

    RUN apk add py-numpy-dev

2 Comments

  • Awesome. RUN apk update && apk add py3-numpy-dev@edge-community to the rescue. Thanks a lot!
  • As of now (May 2021), instead of edge, it works with RUN echo "https://dl-cdn.alpinelinux.org/alpine/latest-stable/community" >> /etc/apk/repositories && apk update && apk add py3-numpy-dev
