
I want to replicate the query below using pyspark DataFrame functions instead of a SQL query.

spark.sql("select date from walmart_stock order by high desc limit 1").show()

Link to dataset

1 Answer

Here is the equivalent code if you start from the linked CSV file; each DataFrame function mirrors a clause of the SQL query. Note the inferSchema option, which parses the numbers directly into doubles so the ordering is correct (sorting the default string type would not behave as expected). Another way would be to cast the column after reading the CSV.

from pyspark.sql import functions as f

(spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("walmart_stock.csv")
    .orderBy(f.col("High").desc())  # orderBy has no desc= keyword; use Column.desc()
    .limit(1)
    .select("Date")
    .show())

which yields

+----------+
|      Date|
+----------+
|2015-11-13|
+----------+

2 Comments

I created a table named walmart_stock and I am working with it. Yes, from your code I got my answer. Thank you.
Is there any way to automate this in Python, so that it works for any given generic SQL query?
