
I have an RDD like this:

rdd = sc.parallelize(['a','b','a','c','d','b','e'])

I want to create a map (dictionary) from each unique value to an index.

The output should be a map of (key, value) pairs like:

{'a':0, 'b':1, 'c':2,'d':3,'e':4}

It's super easy to do in plain Python, but I don't know how to do this in Spark.
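
For reference, this is a rough sketch of the plain-Python version, assuming the data fits in a local list:

values = ['a', 'b', 'a', 'c', 'd', 'b', 'e']
index_map = {v: i for i, v in enumerate(sorted(set(values)))}
print(index_map)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

How do I get the same result when the data lives in an RDD?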


2 Answers


What you are looking for is zipWithIndex.

So for your example (the sortBy is only there so that 'a' maps to 0, and so on):

rdd = sc.parallelize(['a','b','a','c','d','b','e'])

print(rdd.distinct().sortBy(lambda x: x).zipWithIndex().collectAsMap())
# {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
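
If you then need to apply this mapping back to the original RDD, one option (a sketch, not part of the snippet above) is to broadcast the small dictionary and look each value up:

value_to_index = rdd.distinct().sortBy(lambda x: x).zipWithIndex().collectAsMap()
index_map = sc.broadcast(value_to_index)                # ship the small dict to the executors
print(rdd.map(lambda x: index_map.value[x]).collect())  # [0, 1, 0, 2, 3, 1, 4]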




If you can accept gaps, this should do the trick:

rdd.zipWithIndex().reduceByKey(min).collectAsMap()
# {'b': 1, 'c': 3, 'a': 0, 'e': 6, 'd': 4}

Otherwise (much more expensive):

(rdd
    .zipWithIndex()
    .reduceByKey(min)
    .sortBy(lambda x: x[1])
    .keys()
    .zipWithIndex()
    .collectAsMap())
# {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

2 Comments

This looks fine, but what do you mean by gaps?
The first solution behaves like a sparse rank; the second (which gives the result you seem to expect) behaves more like a dense rank. Note that e is mapped to 6, its index in the original input, not 4, its index in the RDD after dropping duplicates.
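
To make the gaps concrete, this is what zipWithIndex produces on the original input before reduceByKey(min) keeps each value's first position:

print(rdd.zipWithIndex().collect())
# [('a', 0), ('b', 1), ('a', 2), ('c', 3), ('d', 4), ('b', 5), ('e', 6)]
# reduceByKey(min) then keeps a -> 0, b -> 1, c -> 3, d -> 4, e -> 6, hence the gaps.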
