I'm trying to create an index column in a DataFrame with PySpark, using `Window` and the `row_number` function.
For example:
Original DataFrame:
Note: the data is arbitrary.
| Coldata |
|---|
| A |
| B |
| C |
| D |
| E |
| F |
| G |
| H |
| I |
Expected DataFrame:
| Coldata | index |
|---|---|
| A | 1 |
| B | 1 |
| C | 1 |
| D | 2 |
| E | 2 |
| F | 2 |
| G | 3 |
| H | 3 |
| I | 3 |
My code at the moment is:
from pyspark.sql import Window
from pyspark.sql.functions import row_number
w = Window.orderBy("Coldata")
df_expected = df.withColumn("index", row_number().over(w))
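But this returns 1, 2, 3, 4, 5, ... (a new number for every row) instead of repeating each index value for three consecutive rows.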
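
One possible way to get the expected output, sketched under the assumption that every group should contain exactly 3 rows (as in the expected table), is to divide the row number by the group size and round up. The helper names `group_index` and `add_group_index` and the `group_size` parameter are hypothetical, introduced only for illustration. The PySpark imports are placed inside the second function so the arithmetic can be checked without a Spark session.

```python
import math


def group_index(row_number: int, group_size: int = 3) -> int:
    """Map a 1-based row number to a 1-based group index.

    With group_size=3 (an assumption taken from the expected table),
    rows 1-3 map to 1, rows 4-6 map to 2, and so on.
    """
    return math.ceil(row_number / group_size)


def add_group_index(df, order_col: str = "Coldata", group_size: int = 3):
    """Hypothetical helper: add an "index" column that repeats each value
    for `group_size` consecutive rows, ordered by `order_col`."""
    # Imported here so group_index() above can be tried without Spark installed.
    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # No partitionBy: Spark moves all rows to a single partition for this window.
    w = Window.orderBy(order_col)
    # Same arithmetic as group_index(), expressed on columns:
    # ceil(row_number / group_size), cast back to an integer type.
    return df.withColumn(
        "index",
        F.ceil(F.row_number().over(w) / group_size).cast("int"),
    )
```

Usage would look like `df_expected = add_group_index(df)`. Since the window has no partitioning, this is probably only reasonable for data that fits comfortably on one executor.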