
I want to replicate the query below using pyspark DataFrame functions instead of a SQL query.

spark.sql("select date from walmart_stock order by high desc limit 1").show()

Link to dataset

1 Answer

Here is the equivalent code if you start from the linked CSV file; each DataFrame function mirrors a clause of the SQL query. Note the inferSchema option, which parses the numbers directly into doubles so the ordering is correct (sorting the default string type would not behave as expected). Another way would be to cast the column after reading the CSV.

from pyspark.sql import functions as f

(spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("walmart_stock.csv")
    .orderBy(f.col("High").desc())  # orderBy has no desc= keyword; use Column.desc()
    .limit(1)
    .select("Date")
    .show())

which yields

+----------+
|      Date|
+----------+
|2015-11-13|
+----------+

2 Comments

I created a table named walmart_stock and I am working with it. Yes, from your code I got my answer. Thank you.
Is there any way to automate this in Python, so that it works for any given generic SQL query?
