Create the DataFrame First
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window as W

spark = SparkSession.builder.getOrCreate()

df_b = spark.createDataFrame(
    [("A", "2020-08-05"), ("B", "2020-08-01"), ("B", "2020-09-20"),
     ("B", "2020-12-31"), ("C", "2020-05-10")],
    ["col1", "col2"])
_w = W.partitionBy("col1").orderBy("col2")  # order within each group by date, not by the partition key
df_b = df_b.withColumn("rn", F.row_number().over(_w))
The logic here is to pick the second element of each group whenever the group has more than one row. To do that, we first assign a row number within every group (ordered by date), then keep the first row of every group with a single row, and the first two rows of every group with more than one row.
case = F.expr("""
CASE WHEN rn = 1 THEN 1
     WHEN rn = 2 THEN 1
END""")
df_b = df_b.withColumn('case_condition', case)
df_b = df_b.filter(F.col("case_condition") == F.lit(1))  # the CASE yields an integer, so compare to 1, not "1"
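The row-number-and-filter step can be sanity-checked with a plain-Python sketch (no Spark needed; the hypothetical `rows` list mirrors the toy data, and sorting by the tuple mirrors `partitionBy("col1").orderBy("col2")`):

```python
from itertools import groupby

# Same toy rows as df_b (assumed input for illustration).
rows = [("A", "2020-08-05"), ("B", "2020-08-01"), ("B", "2020-09-20"),
        ("B", "2020-12-31"), ("C", "2020-05-10")]

kept = []
# groupby on col1 after sorting plays the role of the window partition;
# enumerate plays the role of row_number().
for key, group in groupby(sorted(rows), key=lambda r: r[0]):
    for rn, (col1, col2) in enumerate(group, start=1):
        if rn <= 2:  # same effect as the CASE expression + filter
            kept.append((col1, col2, rn))

print(kept)
# [('A', '2020-08-05', 1), ('B', '2020-08-01', 1),
#  ('B', '2020-09-20', 2), ('C', '2020-05-10', 1)]
```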
Intermediate Output
+----+----------+---+--------------+
|col1| col2| rn|case_condition|
+----+----------+---+--------------+
| B|2020-08-01| 1| 1|
| B|2020-09-20| 2| 1|
| C|2020-05-10| 1| 1|
| A|2020-08-05| 1| 1|
+----+----------+---+--------------+
Now, finally, just take the last element of every group. One caveat: F.last in a groupBy aggregation is not guaranteed deterministic in Spark, so F.max is used here instead; since each group now holds at most its first two dates, the max is exactly the last one in order --
df = df_b.groupBy("col1").agg(F.max("col2").alias("col2")).orderBy("col1")
df.show()
+----+----------+
|col1| col2|
+----+----------+
| A|2020-08-05|
| B|2020-09-20|
| C|2020-05-10|
+----+----------+
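As a quick plain-Python check of that final step (the hypothetical `kept` dict mirrors the filtered rows above, grouped by col1): since each group retains at most its first two dates in ascending order, taking the max is the same as taking the last one, and it reproduces the expected output.

```python
# Rows surviving the rn <= 2 filter, grouped by col1 (assumed for illustration).
kept = {"A": ["2020-08-05"],
        "B": ["2020-08-01", "2020-09-20"],
        "C": ["2020-05-10"]}

# max of at most two ascending dates == the last (second) one when it exists.
result = {k: max(v) for k, v in kept.items()}
print(result)
# {'A': '2020-08-05', 'B': '2020-09-20', 'C': '2020-05-10'}
```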