Well, there are two options I can think of. If a unique id per row is all you need, you can simply apply the monotonically_increasing_id function. This creates a unique, but not consecutive, id for each row.
import pyspark.sql.functions as F
from pyspark.ml.feature import StringIndexer
l = [
('nameone', ),
('nametwo', ),
('nameone', )
]
columns = ['Name']
df = spark.createDataFrame(l, columns)
# use 'Name' instead of 'uniqueId' to overwrite the column instead of adding a new one
df = df.withColumn('uniqueId', F.monotonically_increasing_id())
df.show()
Output:
+-------+----------+
|   Name|  uniqueId|
+-------+----------+
|nameone|         0|
|nametwo|8589934592|
|nameone|8589934593|
+-------+----------+
If you want to assign the same id to rows that share the same value for Name, use a StringIndexer:
indexer = StringIndexer(inputCol="Name", outputCol="StringIndex")
df = indexer.fit(df).transform(df)
df.show()
Output:
+-------+----------+-----------+
|   Name|  uniqueId|StringIndex|
+-------+----------+-----------+
|nameone|         0|        0.0|
|nametwo|8589934592|        1.0|
|nameone|8589934593|        0.0|
+-------+----------+-----------+