Map unique strings to integers in Python [duplicate]

I would just use enumerate(sorted(set(l))), since the questioner didn't specify an alphabetical sort but their desired output has one.

Also, you could use a dict comprehension: d = {k: v for v, k in enumerate(sorted(set(l)))}

Whether this works depends on whether the OP wants just "a number", as described, or in fact the first index + 1, as shown in their output.

[3, 2, 3, 1, 0, 2] is not the result OP wanted, am I missing something here?

The answerer didn't sort the list or 1-index the mapping. The following will use the same approach and give the same output: d = {k: v+1 for v, k in enumerate(sorted(set(L)))}, then Lnew = [d[x] for x in L].
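A minimal sketch of the sorted, 1-indexed mapping discussed in these comments, assuming the sample list L used elsewhere on this page. The helper name sorted_codes is made up for illustration.

```python
def sorted_codes(words):
    """Map each distinct word to its 1-based rank in sorted order."""
    # Deduplicate first, then sort: a set has no defined order,
    # so sorting must be the last step before enumerating.
    return {word: rank for rank, word in enumerate(sorted(set(words)), start=1)}


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
d = sorted_codes(L)
Lnew = [d[x] for x in L]
print(Lnew)  # [1, 2, 1, 3, 4, 2]
```

Passing start=1 to enumerate avoids the separate v+1 step.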


Colonel Beauvel · Accepted Answer · 2017-04-04 09:41:25Z

4

x = sorted(set(L))  # sort so the numbering is deterministic; set order is arbitrary
dic = dict(zip(x, range(1, len(x) + 1)))

>>> [dic[v] for v in L]
[1, 2, 1, 3, 4, 2]

edited Apr 4, 2017 at 9:41

answered Apr 4, 2017 at 9:33


4 Comments

zmbq Over a year ago

And, of course, use x.index(v)+1 if you want the first word to have the number 1

Code only, not the desired output, and list.index is O(n) per call

This has quadratic runtime, unfortunately; it can be done in O(n).

Douglas Leeder Over a year ago

sorted(set(L)) and [x.index(v)+1... to get the output the questioner wanted.
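To illustrate the complexity point raised in these comments: each list.index call scans the list, so calling it once per element is quadratic in the worst case, while a dict built once gives average constant-time lookups. A rough sketch, with hypothetical helper names codes_via_index and codes_via_dict:

```python
def codes_via_index(words):
    """Quadratic in the worst case: list.index scans `unique` for every element."""
    unique = sorted(set(words))
    return [unique.index(w) + 1 for w in words]


def codes_via_dict(words):
    """One dict build after the sort, then average O(1) lookups per element."""
    lookup = {w: i for i, w in enumerate(sorted(set(words)), start=1)}
    return [lookup[w] for w in words]


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
print(codes_via_index(L))  # [1, 2, 1, 3, 4, 2]
print(codes_via_dict(L))   # [1, 2, 1, 3, 4, 2]
```

Both give the same result; the difference only matters when there are many distinct strings.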

ᴀʀᴍᴀɴ · Accepted Answer · 2017-04-04 09:42:33Z

4

You can use a map dictionary:

d = {'apple':1, 'bat':2, 'car':3, 'pet':4}
L = ['apple','bat','apple','car','pet','bat']
[d[x] for x in L] # [1, 2, 1, 3, 4, 2]

To create the map dictionary automatically, you can use defaultdict(int) with a counter.

from collections import defaultdict
d = defaultdict(int)
co = 1
for x in L:
    if not d[x]:
        d[x] = co
        co+=1
d # defaultdict(<class 'int'>, {'pet': 4, 'bat': 2, 'apple': 1, 'car': 3})

Or, as @Stuart mentioned, you can create the dictionary with d = dict(zip(set(L), range(len(L)))). Note that this numbers the words from 0 and in arbitrary set order.
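A more compact variant of the counter idea, sketched here as an alternative rather than as part of the original answer: give defaultdict a factory that hands out the next integer. It assumes the same sample list L.

```python
from collections import defaultdict
from itertools import count

L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# The default factory is called only for keys that are missing,
# so each new word receives the next integer: 1, 2, 3, ...
d = defaultdict(count(1).__next__)
Lnew = [d[x] for x in L]

print(Lnew)     # [1, 2, 1, 3, 4, 2]
print(dict(d))  # {'apple': 1, 'bat': 2, 'car': 3, 'pet': 4}
```

One caveat: any later lookup of an unseen key silently assigns it a new number, so convert to a plain dict once the mapping is complete if that is not wanted.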

edited Apr 4, 2017 at 9:42

answered Apr 4, 2017 at 9:31


4 Comments

BuggerNot Over a year ago

I have a lot of strings. So copying manually in the code is not feasible.

ᴀʀᴍᴀɴ Over a year ago

@Mustafa I edited the answer to show how you can automate the dictionary creation.

Stuart Over a year ago

You could make the map automatically with d = dict(zip(set(L), range(len(L))))

@Mustafa You need to define the mapping between strings and integers somewhere?

timgeb · Accepted Answer · 2017-04-04 09:41:57Z

2

You'd use a hashmap in Python, too, but we call it a dict.

>>> L = ['apple','bat','apple','car','pet','bat']
>>> idx = 1
>>> seen_first = {}
>>>
>>> for word in L:
...     if word not in seen_first:
...         seen_first[word] = idx
...         idx += 1
... 
>>> [seen_first[word] for word in L]
[1, 2, 1, 3, 4, 2]

edited Apr 4, 2017 at 9:41

answered Apr 4, 2017 at 9:32


4 Comments

+1 for the most obvious and sensible answer; but how about {x: len(L)-i for i, x in enumerate(L[::-1])} to build the dict?

@Chris_Rands I just realized OP does not want to go by index + 1, but give the first unique word the number 1, the second unique word the number 2, and so on. (I edited my answer accordingly.)

I now think what they actually want (based on the top answer) is this stackoverflow.com/questions/42350029/… but frankly the question is not clear and should be closed IMO

@Chris_Rands yeah I'm confused now.
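The two readings of the question discussed in these comments give different numbers on the sample list. A small sketch to make the difference concrete; the variable names first_index and appearance_rank are illustrative only.

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# Reading 1: 1-based position of the first occurrence of each word.
first_index = {}
for i, word in enumerate(L, start=1):
    first_index.setdefault(word, i)  # keeps the earliest position only

# Reading 2: rank among distinct words, in order of first appearance.
appearance_rank = {}
for word in L:
    if word not in appearance_rank:
        appearance_rank[word] = len(appearance_rank) + 1

print([first_index[w] for w in L])      # [1, 2, 1, 4, 5, 2]
print([appearance_rank[w] for w in L])  # [1, 2, 1, 3, 4, 2]
```

Only the second reading reproduces the [1, 2, 1, 3, 4, 2] shown in the answer above.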

Harsha Biyani · Accepted Answer · 2017-04-04 09:38:44Z

0

You can try:

>>> L = ['apple','bat','apple','car','pet','bat']
>>> l_dict = dict(zip(set(L), range(len(L))))
>>> print(l_dict)  # set iteration order is arbitrary, so the exact numbers can vary between runs
{'pet': 0, 'car': 1, 'bat': 2, 'apple': 3}
>>> [l_dict[x] for x in L]
[3, 2, 3, 1, 0, 2]
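Because set(L) has no defined order, the numbering above is not reproducible. If a stable result is wanted, one option (an alternative sketch, not part of this answer) is to deduplicate with dict.fromkeys, which preserves first-seen order on Python 3.7+:

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# dict.fromkeys keeps the first occurrence of each word in insertion order
# (guaranteed for dicts since Python 3.7), unlike set(L).
unique_in_order = list(dict.fromkeys(L))
l_dict = {word: i for i, word in enumerate(unique_in_order)}

print(l_dict)                  # {'apple': 0, 'bat': 1, 'car': 2, 'pet': 3}
print([l_dict[x] for x in L])  # [0, 1, 0, 2, 3, 1]
```

This keeps the zero-based numbering of the answer; pass start=1 to enumerate for one-based numbers.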

answered Apr 4, 2017 at 9:38



Ash Ketchum · Accepted Answer · 2017-04-04 09:32:11Z

-2

Lnew = []
for s in L:
    Lnew.append(hash(s))  # hash(s) is an int derived from the string; not guaranteed unique, and str hashes change between runs unless PYTHONHASHSEED is fixed
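If hash-style integers are acceptable but need to be the same on every run, a digest from hashlib can stand in for the built-in hash. This is a swapped-in technique, not what the answer proposes, and the helper name stable_id is hypothetical.

```python
import hashlib


def stable_id(text):
    """Return a 64-bit int taken from the SHA-256 digest of the UTF-8 text."""
    digest = hashlib.sha256(text.encode('utf-8')).digest()
    # Truncating to 8 bytes keeps the numbers manageable; collisions
    # become possible in principle, though very unlikely for small inputs.
    return int.from_bytes(digest[:8], 'big')


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
Lnew = [stable_id(s) for s in L]
```

Equal strings always map to the same integer, but the values are large and not consecutive, so a dict-based numbering is still the better fit when small codes like 1, 2, 3 are wanted.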

answered Apr 4, 2017 at 9:32


4 Comments