15

I have a list, let's say L = ['apple','bat','apple','car','pet','bat'].

I want to convert it into Lnew = [1,2,1,3,4,2].

Every unique string is associated with a number.

I have a Java solution using a HashMap, but I don't know how to do the same thing with a hashmap in Python. Please help.

Comments
  • What have you tried? Commented Apr 4, 2017 at 9:27
  • A dict in Python works like a hashmap. Commented Apr 4, 2017 at 9:31
  • @RaminNietzsche, I can't speak for Java's hashmap, but Python's dicts don't give the integer indexes the questioner wants, especially not alphabetically sorted ones (which was not specifically requested, but was evident in their desired output). Commented Apr 4, 2017 at 9:34
  • How do you work out the number to associate with a string? Commented Apr 4, 2017 at 9:35
  • @RaminNietzsche, still, you've got the right idea; you can use a dict to create a mapping this way: d = {k: v for v, k in enumerate(sorted(set(L)))} and then Lnew = [d[x] for x in L]. Commented Apr 4, 2017 at 9:40

6 Answers

20

Here's a quick solution:

l = ['apple','bat','apple','car','pet','bat']

Create a dict that maps all unique strings to integers:

d = dict([(y,x+1) for x,y in enumerate(sorted(set(l)))])

Map each string in the original list to its respective integer:

print([d[x] for x in l])
# [1, 2, 1, 3, 4, 2]
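A minimal Python 3 sketch of the same idea, assuming the alphabetical numbering shown in the question is what is wanted: a dict comprehension replaces dict([...]), and enumerate's start=1 argument replaces the x+1.

```python
l = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# Number the unique strings in alphabetical order, starting at 1.
d = {word: i for i, word in enumerate(sorted(set(l)), start=1)}

# Replace each string with its number.
lnew = [d[x] for x in l]
print(lnew)  # [1, 2, 1, 3, 4, 2]
```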

7 Comments

I would just add enumerate(sorted(set(l))), since the questioner didn't specify an alphabetical sort, but their desired output has it.
Also, you could use a dict comprehension: d = {k: v for v, k in enumerate(sorted(set(l)))}
Whether this works depends on whether the OP wants just "a number" as described or in fact the first index+1 as shown in their output; also use a dict comprehension
[3, 2, 3, 1, 0, 2] is not the result the OP wanted; am I missing something here?
The answerer didn't sort the list or 1-index the mapping. The following will use the same approach and give the same output: d = {k: v+1 for v, k in enumerate(sorted(set(L)))}, then Lnew = [d[x] for x in L].
4
x = sorted(set(L))  # sort, because the iteration order of a set is arbitrary
dic = dict(zip(x, range(1, len(x) + 1)))

>>> [dic[v] for v in L]
[1, 2, 1, 3, 4, 2]

4 Comments

And, of course, use x.index(v)+1 if you want the first word to have the number 1
Code only, not the desired output, and list.index is O(n) per call
Unfortunately this has quadratic runtime; it can be done in O(n).
Use sorted(set(L)) and [x.index(v)+1... to get the output the questioner wanted.
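The quadratic-runtime comment can be addressed without list.index. A sketch assuming Python 3.7+, where dicts keep insertion order: dict.fromkeys(L) drops duplicates while keeping each string's first occurrence in order, so numbering its keys takes O(n) overall. This numbers strings by first appearance rather than alphabetically; for this particular L the two orders happen to coincide.

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# dict.fromkeys drops duplicates but keeps first-occurrence order (Python 3.7+).
unique_in_order = dict.fromkeys(L)

# Assign 1, 2, 3, ... to the unique strings in that order.
dic = {word: i for i, word in enumerate(unique_in_order, start=1)}

Lnew = [dic[v] for v in L]  # one O(1) dict lookup per element
print(Lnew)  # [1, 2, 1, 3, 4, 2]
```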
4

You can use a dictionary as a map:

d = {'apple':1, 'bat':2, 'car':3, 'pet':4}
L = ['apple','bat','apple','car','pet','bat']
[d[x] for x in L] # [1, 2, 1, 3, 4, 2]

To create the mapping dictionary automatically, you can use defaultdict(int) with a counter.

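from collections import defaultdict

d = defaultdict(int)
co = 1  # next number to hand out
for x in L:
    # d[x] on a missing key inserts 0, which is falsy, so a new word gets co
    if not d[x]:
        d[x] = co
        co += 1
d  # defaultdict(<class 'int'>, {'apple': 1, 'bat': 2, 'car': 3, 'pet': 4})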

Or, as @Stuart mentioned, you can create the dictionary with d = dict(zip(set(L), range(len(L)))), although the numbers then start at 0 and follow the arbitrary iteration order of the set.
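A more compact variant of the defaultdict-with-a-counter idea, offered as a sketch that assumes first-appearance numbering is acceptable: give the defaultdict a default_factory that returns the next value of itertools.count(1), so looking up an unseen string assigns it the next integer automatically.

```python
from collections import defaultdict
from itertools import count

L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# When a missing key is looked up, default_factory is called with no
# arguments; count(1).__next__ returns 1, 2, 3, ... on successive calls.
d = defaultdict(count(1).__next__)

Lnew = [d[x] for x in L]
print(Lnew)     # [1, 2, 1, 3, 4, 2]
print(dict(d))  # {'apple': 1, 'bat': 2, 'car': 3, 'pet': 4}
```

One caveat of this design: any later lookup of an unknown key, even one meant only as a read, silently assigns it a new number, so converting to a plain dict with dict(d) once the mapping is built may be safer.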

4 Comments

I have a lot of strings, so writing the mapping by hand in the code is not feasible.
@Mustafa I edited the answer to show how to automate the dictionary creation.
You could make the map automatically with d = dict(zip(set(L), range(len(L))))
@Mustafa You need to define the mapping between strings and integers somewhere?
2

You'd use a hashmap in Python, too, but we call it a dict.

>>> L = ['apple','bat','apple','car','pet','bat']
>>> idx = 1
>>> seen_first = {}
>>>
>>> for word in L:
...     if word not in seen_first:
...         seen_first[word] = idx
...         idx += 1
... 
>>> [seen_first[word] for word in L]
[1, 2, 1, 3, 4, 2]
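The same first-appearance loop can be written with dict.setdefault, which returns the stored value for an existing key or inserts the given default for a new one. In this sketch the default len(seen_first) + 1 is computed before each call, so a new word receives the next unused number while a repeated word keeps its original one. The default expression is evaluated on every call, even for known words, but here it is only a cheap len().

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

seen_first = {}
# setdefault returns the stored number for a known word; for a new word it
# stores len(seen_first) + 1 (the next unused number) and returns it.
Lnew = [seen_first.setdefault(word, len(seen_first) + 1) for word in L]
print(Lnew)  # [1, 2, 1, 3, 4, 2]
```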

4 Comments

+1 for the most obvious and sensible answer; but how about {x:len(L)-i for i,x in enumerate(L[::-1])} to build the dict?
@Chris_Rands I just realized OP does not want to go by index + 1, but give the first unique word the number 1, the second unique word the number 2, and so on. (I edited my answer accordingly.)
I now think what they actually want (based on the top answer) is this stackoverflow.com/questions/42350029/… but frankly the question is not clear and should be closed IMO
@Chris_Rands yeah I'm confused now.
0

You can try:

>>> L = ['apple','bat','apple','car','pet','bat']
>>> l_dict = dict(zip(set(L), range(len(L))))
>>> print(l_dict)
{'pet': 0, 'car': 1, 'bat': 2, 'apple': 3}
>>> [l_dict[x] for x in L]
[3, 2, 3, 1, 0, 2]


-2
Lnew = []
for s in L:
    Lnew.append(hash(s))  # hash(s) returns an int derived from the string (not guaranteed unique)

4 Comments

From the question, I think they're looking for 1-based integers, not the very long integers hash() gives.
Consider providing an explanation of your code.
hash does not return a unique int for each string. Hash collisions are possible.
The general approach here is fine if you explain that this is lossy encoding (the mapping is not guaranteed to be 1:1 and may not be fully reversible). The bigger issue is that the built-in hash function is not consistent across runs. hashlib with blake2s, reduced to an int, would be better.
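A sketch of the last comment's suggestion, assuming a stable but lossy encoding is acceptable. hashlib.blake2s produces the same digest in every run (unlike the built-in hash() for strings, which is randomized per process unless PYTHONHASHSEED is fixed), and int.from_bytes turns the digest into an integer. The helper name stable_hash and the 8-byte digest_size are hypothetical choices for illustration; collisions remain possible, and the results are large integers, not the 1, 2, 3, ... the question asks for.

```python
import hashlib

def stable_hash(s: str) -> int:
    # Hypothetical helper: blake2s with an 8-byte digest (size chosen arbitrarily).
    digest = hashlib.blake2s(s.encode("utf-8"), digest_size=8).digest()
    # Interpret the 8 digest bytes as an unsigned big-endian integer.
    return int.from_bytes(digest, "big")

L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
Lnew = [stable_hash(s) for s in L]
# Equal strings map to equal integers, and the values are the same on every run.
print(Lnew)
```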
