I would like to create a column of sequential numbers in a PySpark DataFrame, starting from a specified number. For instance, I want to add a column A to my DataFrame df that starts at 5 and increments by one for each row, so 5, 6, 7, ..., 4 + length(df).
Is there a simple solution using PySpark methods?
One option:

```
df = df.rdd.zipWithIndex().toDF(cols + ["index"]).withColumn("index", f.col("index") + 5)
```

where `cols = df.columns` and `f` refers to `pyspark.sql.functions`. But you should ask yourself why you're doing this, because almost surely there's a better way: DataFrames are inherently unordered, so this operation is not efficient.

Alternatively, if your DataFrame has a column with unique values you can order by, you can use a window function:

```
max(id) + spark_func.row_number().over(Window.orderBy(unique_field_in_my_df))
```