
I created a DataFrame in Spark, and when I find the max date I want to save it to a variable. I'm just trying to figure out how to get the result, which is a string, and save it to a variable.

code so far:

sqlDF = spark.sql("SELECT MAX(date) FROM account")
sqlDF.show()

what the result looks like:

+--------------------+
|           max(date)|
+--------------------+
|2018-04-19T14:11:...|
+--------------------+

thanks

3 Answers


Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:

spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]

I believe this will return a datetime object, but you can either convert it to a string in your driver code or do SELECT CAST(MAX(date) AS string) instead.
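As a sketch of the driver-side conversion mentioned above (assuming the TIMESTAMP column comes back as a datetime.datetime; the helper name is hypothetical):

```python
from datetime import datetime

# Hypothetical helper: turn the value returned by
# spark.sql(...).first()["maxDate"] into an ISO-8601 string.
# Assumes Spark handed back a datetime.datetime for a TIMESTAMP
# column; anything else is passed through str().
def max_date_to_string(value):
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value)
```

With a value like datetime(2018, 4, 19, 14, 11), this yields "2018-04-19T14:11:00", matching the truncated output shown in the question.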


Try something like this:

from pyspark.sql.functions import max as max_

# get last partition from all deltas
alldeltas = sqlContext.read.json(alldeltasdir)
last_delta = alldeltas.agg(max_("ingest_date")).collect()[0][0]

last_delta will contain a single value: in this example, the maximum value of the ingest_date column in the dataframe.

1 Comment

Your_max_date = spark.sql("SELECT MAX(date) FROM account").collect()[0][0]
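collect() returns a list of Row objects, and a pyspark Row supports positional indexing like a tuple, so [0][0] picks out the first column of the first row. A plain-Python stand-in illustrating just the indexing (the data here is made up):

```python
# Stand-in for what collect() returns: a list with one row of one column.
# pyspark Rows index positionally like tuples, so the same [0][0] applies.
collected = [("2018-04-19T14:11:00",)]
last_delta = collected[0][0]  # first row, first column
print(last_delta)  # 2018-04-19T14:11:00
```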

Assuming that sqlDF is a pandas dataframe and the value you want to get is at index 0:

max_date = str(sqlDF.at[0, 'max(date)'])

1 Comment

That doesn't work in Spark. I got sqlDF.first() to show the first row, but I just want the value.
