
Thanks for your time.

I have a DataFrame in PySpark on Databricks that reads JSON. The data from the source does not always have the same structure; sometimes the 'emailAddress' field is missing, which causes the error "org.apache.spark.sql.AnalysisException: cannot resolve ...".

I have tried to solve this by wrapping the select in a try/except block:

try:
    df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                             "customVariables.Id_Cliente", "responseSet", "emailAddress")
except ValueError:
    pass

But it does not work; it returns the same error I mentioned above.

I have also tried another alternative, without results:

# s_fields holds the field names extracted earlier from the JSON schema
if 'Id_Cliente' in s_fields:
    try:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                                 "customVariables.Id_Cliente", "responseSet", "emailAddress")
    except ValueError:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                                 "customVariables.Id_Cliente", "responseSet")

Can anyone help me with an idea to handle this situation? I need to stop the execution of my notebook when it does not find the field in the structure; otherwise (when it does find the emailAddress field), processing should continue.

Thanks in advance.

Regards.

1 Answer


You're catching ValueError, while the exception actually raised is AnalysisException; that's why it doesn't work.

from pyspark.sql.utils import AnalysisException

try:
    df.select('xyz')
except AnalysisException:
    # the referenced column does not exist; handle it here
    print(123)
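
Applied to the select from the question, the fix might look like the sketch below. This is a minimal sketch, assuming df_json is the DataFrame from the question; re-raising on failure is one way to stop the notebook run when emailAddress is missing, which is the behavior the question asks for.

from pyspark.sql.utils import AnalysisException

try:
    df_json = df_json.select(
        "responseID", "surveyID", "surveyName", "timestamp",
        "customVariables.Id_Cliente", "responseSet", "emailAddress",
    )
except AnalysisException as e:
    # A referenced column (e.g. emailAddress) is missing from the source JSON:
    # re-raise to stop the notebook execution, as the question requires.
    raise RuntimeError(f"Required field not found in source JSON: {e}")

Since emailAddress is a top-level field, an alternative that avoids exception handling entirely is to check the schema up front with df_json.columns and only include the field in the select when it is present.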