
Thanks for your time.

I have a DataFrame in PySpark on Databricks that reads JSON. The data from the source does not always have the same structure; sometimes the 'emailAddress' field is missing, which causes the error "org.apache.spark.sql.AnalysisException: cannot resolve ...".

I have tried to solve this by wrapping the select in a try/except block:

try:
    df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                             "customVariables.Id_Cliente", "responseSet", "emailAddress")
except ValueError:
    pass

But it does not work; it returns the same error I mentioned above.

I have also tried another alternative, without results:

# s_fields holds the field names extracted earlier from the JSON schema
if 'Id_Cliente' in s_fields:
    try:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                                 "customVariables.Id_Cliente", "responseSet", "emailAddress")
    except ValueError:
        df_json = df_json.select("responseID", "surveyID", "surveyName", "timestamp",
                                 "customVariables.Id_Cliente", "responseSet")

Can anyone help me with an idea to handle this situation? I need to stop the execution of my notebook when it does not find the field in the structure; otherwise (when it does find the emailAddress field), processing should continue.

Thanks in advance.

Regards.

1 Answer


You're catching ValueError, while the exception actually raised is AnalysisException; that's why it doesn't work.

from pyspark.sql.utils import AnalysisException

try:
    df.select('xyz')
except AnalysisException:
    # the referenced column does not exist; handle it here
    print(123)
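
Applied to the select from the question, the fix might look like the sketch below. This is a minimal sketch, assuming df_json is the DataFrame from the question; re-raising on failure is one way to stop the notebook run when emailAddress is missing, which is the behavior the question asks for.

from pyspark.sql.utils import AnalysisException

try:
    df_json = df_json.select(
        "responseID", "surveyID", "surveyName", "timestamp",
        "customVariables.Id_Cliente", "responseSet", "emailAddress",
    )
except AnalysisException as e:
    # A referenced column (e.g. emailAddress) is missing from the source JSON:
    # re-raise to stop the notebook execution, as the question requires.
    raise RuntimeError(f"Required field not found in source JSON: {e}")

Since emailAddress is a top-level field, an alternative that avoids exception handling entirely is to check the schema up front with df_json.columns and only include the field in the select when it is present.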