
I'm trying to read a PostgreSQL table available in an Azure cloud subscription using PySpark, but I'm getting the error below. I'm aware that when we use the load function, we should include the format as well. But since this PostgreSQL instance lives in a different Azure subscription, I don't have access to the PostgreSQL database at all. If that is the case, how do I infer the schema? Or is there a better way to read the data from Databricks?

df = spark.read.option("url", "jdbc:postgresql://{hostname}:5432&user={username}&password={xxxxx}&sslmode=require").option("dbtable", {tablename}).load()

Error:

 ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:

Py4JJavaError: An error occurred while calling o1169.load.
: org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:211)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$8.apply(DataSource.scala:211)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:210)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:421)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)



1 Answer

The error occurs because, with no format specified, the code implicitly assumes the data source is Parquet. Whatever JDBC options you define are simply ignored by that format.

In other words, the structured query does not load data over JDBC at all.

That's what this part of the error says (pretty much):

org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;

If you want to read from a JDBC data source, you should include format("jdbc") in the query:

spark.read.format("jdbc")
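For reference, here is a minimal sketch of the corrected read. The hostname, database, table, and credentials are hypothetical placeholders (the original post elides them), and the user/password are passed as separate options rather than baked into the URL:

```python
# Hypothetical connection details -- substitute your own.
hostname = "myserver.postgres.database.azure.com"
database = "mydb"
jdbc_url = f"jdbc:postgresql://{hostname}:5432/{database}"

# The key fix: declare the format explicitly so Spark uses the JDBC
# data source instead of defaulting to Parquet.
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", "public.mytable")
#       .option("user", "myuser")
#       .option("password", "mypassword")
#       .load())
```

With `format("jdbc")` declared, Spark fetches the schema from the database itself, so there is nothing to infer manually.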

2 Comments

Hello @jacekLaskowski, many thanks for the response. When I try to use spark.read.format("jdbc") I'm stuck with the error message: org.postgresql.util.PSQLException: FATAL: SSL connection is required. Please specify SSL options and retry
Well, that's another issue then. Please read JDBC To Other Databases in the official docs.
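As a pointer for that follow-up error: the PostgreSQL JDBC driver accepts SSL settings as query parameters on the connection URL, so one hedged sketch (host and database here are hypothetical) is to append them before passing the URL to the reader:

```python
# Hypothetical host/database; the ssl parameters address the
# "FATAL: SSL connection is required" error from the comment above.
base_url = "jdbc:postgresql://myserver.postgres.database.azure.com:5432/mydb"
ssl_url = base_url + "?ssl=true&sslmode=require"

# spark.read.format("jdbc").option("url", ssl_url)...
```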
