0

let's say I have a list:

a = [['a','b'],['a','c'],['d','e'],['c','a']]

I need it to be

a = [[1,2],[1,3],[4,5],[3,1]]

I've tried to change the value using a counter but it does not work

4
  • Do you have a mapping given, or do you just need to map the same string to the same integer? Commented Feb 26, 2017 at 3:46
  • I just need to map the same string to the same integer Commented Feb 26, 2017 at 3:47
  • Is it always in a sub-array form? Commented Feb 26, 2017 at 3:48
  • yes it is a list of lists Commented Feb 26, 2017 at 3:49

3 Answers 3

3

This will work if you you have a list of lists and no other nesting. It will not only work for strings of any length, but for all hashable types. It also has the benefit of always using contiguous indexes starting at 0 (or 1) no matter which strings are used:

> from collections import defaultdict
> d = defaultdict(lambda: len(d))  # use 'len(d) + 1' if you want 1-based counting
> a = [[d[s] for s in x] for x in a]
> a
[[0, 1], [0, 2], [3, 4], [2, 0]]
# [[1, 2], [1, 3], [4, 5], [3, 1]]

It uses a defaultdict that always returns its current size for unknown items, thus always producing a unique integer.

Sign up to request clarification or add additional context in comments.

Comments

0

You could also use Gensims Dictionary using the following:

from gensim.corpora import Dictionary
a = [['a','b'],['a','c'],['d','e'],['c','a']]

# create Dictionary object
dic = Dictionary(a)

# map letters to tokens

def mapper(l):
    # take in list and return numeric representation using Dic above
    return map(lambda x: dic.token2id[x], l)

a2 = map(lambda x: mapper(x), a) # run mapper on each sublist to get letter id's
a2 # [[0, 1], [0, 2], [4, 3], [2, 0]]

And if you wanted to convert to id's with counts (bag of words) you can use:

map(lambda x: dic.doc2bow(x), a) 
# [[(0, 1), (1, 1)], [(0, 1), (2, 1)], [(3, 1), (4, 1)], [(0, 1), (2, 1)]]

Comments

0

If you're looking to expand this to multi-character strings, Python provides a builtin hash function that has a low chance of collisions:

a = [['a','b'],['a','c'],['d','e'],['c','a']]

b = [[hash(f), hash(s)] for f,s in a]

b
Out[11]: 
[[-351532772472455791, 5901277274144560781],
[-351532772472455791, 791873246212810409],
[3322017449728496367, 3233520255652808905],
[791873246212810409, -351532772472455791]]

If the strings are single characters, I'd define a function that translates them, then do the same list comprehension:

def to_int(char):
    return ord(char) - ord('a') + 1

b = [[to_int(f), to_int(s)] for f,s in a]

b
Out[14]: [[1, 2], [1, 3], [4, 5], [3, 1]]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.