
I would like to run the following query, passing the Python list concepts as a parameter to the UDF has_any_concept.

The following variable is defined in the environment:

concepts
['CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE',
 'CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE',
 'CREATININE_QUANTITATIVE_SERUM_OBSTYPE']

This is the query with the values hard-coded instead of passed as parameters:

(spark.sql("""
select 
   
   resultCode.standard.primaryDisplay                                           as display
   
   from results 
   WHERE has_any_concept(resultCode, array("CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE","CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE","CREATININE_QUANTITATIVE_SERUM_OBSTYPE"))
   
   LIMIT 3
""".format(concepts = concepts))\
   .toPandas()
)
display
0   Creatinine [Mass/volume] in Serum or Plasma
1   Creatinine [Mass/volume] in Serum or Plasma
2   Creatinine [Mass/volume] in Serum or Plasma

This also works, substituting each element of the list individually:

(spark.sql("""
select 
   
   resultCode.standard.primaryDisplay                                           as display,
   ontologicalCategoryAliases                                                   as category
   
   from results 
   WHERE has_any_concept(resultCode, array("{concepts[0]}","{concepts[1]}","{concepts[2]}"))
   
   LIMIT 3
""".format(concepts = concepts))\
   .toPandas()
)
display     category
0   Creatinine [Mass/volume] in Serum or Plasma     [LABS_OBSTYPE]
1   Creatinine [Mass/volume] in Serum or Plasma     [LABS_OBSTYPE]
2   Creatinine [Mass/volume] in Serum or Plasma     [LABS_OBSTYPE]

This does not work, substituting the whole list at once:

(spark.sql("""
select 
   
   resultCode.standard.primaryDisplay                                           as display,
   ontologicalCategoryAliases                                                   as category
   
   from results 
   WHERE has_any_concept(resultCode, array({concepts}))
   
   LIMIT 3
""".format(concepts = [''' "{concept}"   '''.format(concept = concept) for concept in concepts]))\
   .toPandas()
)
ParseException: '\nmismatched input \'from\' expecting <EOF>(line 7, pos 3)\n\n== SQL ==\n\nselect \n   \n   resultCode.standard.primaryDisplay                                           as display,\n   ontologicalCategoryAliases                                                   as category\n   \n   from results \n---^^^\n   WHERE has_any_concept(resultCode, array([\' "CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE"   \', \' "CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE"   \', \' "CREATININE_QUANTITATIVE_SERUM_OBSTYPE"   \']))\n   AND normalizedValue.typedValue.type = "NUMERIC" \n   AND interpretation.standard.primaryDisplay NOT IN (\'Not applicable\', \'Normal\')\n   \n   LIMIT 10\n'

I did not write the UDF has_any_concept.

1 Answer

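You can't pass a Python list directly to the array function in SQL. str.format inserts the list's Python representation, square brackets included, so the query ends up containing array(['...', '...']), as the SQL echoed in the ParseException shows. Join the quoted items into a single comma-separated string instead.

If you're using Python 3.6+, the code can look a little cleaner with f-strings: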

spark.sql(
    f"""
    select 
       resultCode.standard.primaryDisplay as display,
       ontologicalCategoryAliases as category
    from results 
    WHERE has_any_concept(resultCode, array({", ".join([f"'{x}'" for x in concepts])}))
    LIMIT 3
    """
).toPandas()
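
To see the difference without a Spark session, here is a minimal sketch that builds both versions of the array(...) fragment as plain strings. The helper name sql_array_literal and the validation pattern are hypothetical, not part of Spark or of the has_any_concept UDF; the pattern assumes concept aliases contain only letters, digits, and underscores, as the three in the question do.

```python
import re

# Assumption: concept aliases contain only letters, digits, and underscores,
# like the three examples in the question. Anything else is rejected rather
# than escaped, so no quoting rules of the SQL dialect are relied upon.
_CONCEPT_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def sql_array_literal(values):
    """Render a list of strings as an array('a', 'b', ...) SQL fragment.

    Hypothetical helper for illustration only.
    """
    for value in values:
        if not _CONCEPT_PATTERN.fullmatch(value):
            raise ValueError(f"unexpected characters in concept: {value!r}")
    quoted = ", ".join(f"'{value}'" for value in values)
    return f"array({quoted})"


concepts = [
    "CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE",
    "CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE",
    "CREATININE_QUANTITATIVE_SERUM_OBSTYPE",
]

# Formatting the list itself inserts its Python repr, square brackets included.
broken = "array({concepts})".format(concepts=concepts)

# Joining the quoted items yields a plain comma-separated argument list.
fixed = sql_array_literal(concepts)

# The query text that would then be handed to spark.sql(...).
query = f"""
select
   resultCode.standard.primaryDisplay as display,
   ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, {fixed})
LIMIT 3
"""

print(broken)
print(fixed)
```

Printing the two fragments side by side makes the stray brackets in the first one easy to spot. Rejecting unexpected characters is a deliberately conservative choice here: it keeps arbitrary text out of the query string, at the cost of refusing concept names that would need escaping.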