
It's my first time using Apache Spark with Python (PySpark), and I was trying to run the Quick Start examples, but when I run this line:

>>> textFile = spark.read.text("README.md")

it gives me the following error (I'm pasting just the first part because I think it's the most important):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/daniele/Scaricati/spark/python/pyspark/sql/readwriter.py", line 311, in text
    return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
  File "/home/daniele/Scaricati/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/daniele/Scaricati/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
  File "/home/daniele/Scaricati/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.text.
: java.lang.reflect.InaccessibleObjectException: Unable to make field private transient java.lang.String java.net.URI.scheme accessible: module java.base does not "opens java.net" to unnamed module @779d0812
at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335)

Can someone help me solve this? Sorry if my post isn't very clear, it's my first one on this forum. Thanks to everyone who tries to help, Daniele.

  • What is your Java version? Support for Java 7 was removed as of Spark 2.2.0. Commented Nov 7, 2017 at 22:49
  • openjdk version "9-Ubuntu" OpenJDK Runtime Environment (build 9-Ubuntu+0-9b161-1) OpenJDK 64-Bit Server VM (build 9-Ubuntu+0-9b161-1, mixed mode) Commented Nov 8, 2017 at 0:21
  • Can you check if you get the same error when reading another text file (use the full absolute path to make sure it is correct)? Also try loading a Parquet file. If the error persists, there might be a problem with your Spark/Hadoop installation. Commented Nov 8, 2017 at 7:07
  • I tried to read another file (using the full path) and got the same error; I don't know what a Parquet file is. To install Spark I did roughly this: installed the latest Java version ($ sudo apt-get install openjdk-9-jre), then downloaded Apache Spark ("Pre-built for Apache Hadoop 2.7 and later"). Did I skip something important? Commented Nov 8, 2017 at 10:35
  • I think the problem comes from the fact that Spark does not support Java 9 (it probably will in Spark 3.x). Try installing Java 8 instead and setting all the necessary environment variables (JAVA_HOME, JRE_HOME); a quick check of what Spark currently sees is sketched below. Commented Nov 8, 2017 at 11:22
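
A quick way to confirm the diagnosis from the comments above is to print the JAVA_HOME that Spark will inherit and the version of whatever java binary is first on the PATH. This is a minimal sketch (note that java -version writes to stderr, hence the redirect):

    import os
    import subprocess

    # JAVA_HOME may be unset; Spark then falls back to the `java` found on PATH.
    print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))

    # `java -version` prints to stderr, so merge stderr into the captured output.
    out = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
    print(out.decode())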

1 Answer

The issue is that your Spark version and your Java version are incompatible. To resolve it, do the following:

  1. Check your PySpark version (the startup banner prints the Spark version):

    pyspark

  2. Check which Java version your PySpark version requires (for example, PySpark 2.4.6 needs Java 8: https://spark.apache.org/docs/2.4.6/)

  3. Check which Java versions are installed (the command below is macOS-specific):

    /usr/libexec/java_home -V

  4. If the required Java version is not installed, install it (e.g. brew install adoptopenjdk8 on macOS with Homebrew)

  5. Change your JAVA_HOME to point to the correct version (a Python alternative is sketched after this list). Example:

    export JAVA_HOME="/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"

  6. Confirm the version:

    java -version
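
If you cannot (or would rather not) export JAVA_HOME in your shell profile, the same switch from step 5 can be made inside Python, as long as it happens before the SparkSession (and with it the JVM) is created. The path below is just the macOS AdoptOpenJDK 8 location used above; substitute whatever /usr/libexec/java_home -V reports on your machine:

    import os

    # Must run before the first SparkSession is created; once the JVM has
    # been launched, changing JAVA_HOME has no effect.
    os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("quickstart").getOrCreate()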

After this you should be able to run the quick-start commands as expected:

textFile = spark.read.text("README.md")
textFile.show()
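
If the read succeeds, the remaining quick-start calls from the Spark docs should work too, for example:

    textFile.count()   # number of rows, i.e. lines in README.md
    textFile.first()   # first row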