Replace numbers in an array with letters

Question

I'm trying to convert numbers within an array to letters. So '001' would be changed to 'A', '002' to 'B', all the way to '025' to 'Y'.

So far I've tried using a dictionary to replace the values, but that doesn't seem to work. Using np.place doesn't work either, since it behaves like a single if/else condition and I have many more values to map than that.

import numpy as np

Polymer_data = Polymer_data.sort_values(['ID'])
for i in Polymer_data.ID:
    # split each ID on '-' into its four three-digit codes
    first_arr = np.array(i.split('-'))
    print(first_arr)

A small sample of the data in the array

['001' '001' '001' '021']
['001' '001' '001' '022']
['006' '009' '019' '016']
['006' '009' '019' '017']
['019' '025' '001' '025']
['019' '025' '002' '022']
['025' '013' '025' '025']
['025' '014' '017' '025']
['025' '014' '020' '025']
['025' '015' '022' '025']
['025' '015' '025' '025']
['025' '017' '017' '025']
['025' '017' '017' '025']

So the data above should be converted to

['A' 'A' 'A' 'U']
['A' 'A' 'A' 'V']
['F' 'I' 'S' 'P']
['F' 'I' 'S' 'Q']
['S' 'Y' 'A' 'Y']
['S' 'Y' 'B' 'V']
['Y' 'M' 'Y' 'Y']
['Y' 'N' 'Q' 'Y']
['Y' 'N' 'T' 'Y']
['Y' 'O' 'V' 'Y']
['Y' 'O' 'Y' 'Y']
['Y' 'Q' 'Q' 'Y']
['Y' 'Q' 'Q' 'Y']

Edit: Formatting on the code

Also, in terms of the array structure: each row is a sequence of four values between '001' and '025', and this is repeated until all permutations are accounted for, so the full array has over 180000 rows.
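The answers below operate on a single 2-D array rather than printing one row at a time. A minimal sketch of building that array, assuming the IDs are '-'-separated strings like the rows shown above (the `ids` list is a hypothetical stand-in for `Polymer_data.ID`):

```python
import numpy as np

# Hypothetical stand-in for Polymer_data.ID (IDs of the form 'NNN-NNN-NNN-NNN')
ids = ['001-001-001-021', '006-009-019-016', '025-017-017-025']

# Split every ID once and stack the pieces into one (n, 4) array of 3-character strings
a = np.array([s.split('-') for s in ids])
print(a.shape, a.dtype)  # (3, 4) <U3
```

With the real DataFrame, the same list comprehension can run over `Polymer_data.ID` directly; pandas' `Series.str.split('-', expand=True)` is another way to get the four columns.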

Please provide a clearer structure of your lists so I can update my answer and give you a complete answer. – marc, Commented Jul 16, 2019 at 13:52
Is your original array an array of strings containing three-digit integers, or an array of integers? – Linuxios, Commented Jul 16, 2019 at 13:55
Your for loop appears to be empty. Remember that statements in a for loop have to be indented in Python. – jjramsey, Commented Jul 16, 2019 at 13:55
@marc I've updated the post with more info on the list structure. – C.Y, Commented Jul 16, 2019 at 14:18

yatu · Accepted Answer · 2019-07-16 13:53:14Z

3

The way I would do this is by creating a dictionary that maps integers to letters, and using it to map the values in the array with np.vectorize and dict.get:

import numpy as np
from string import ascii_uppercase

d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'...
# a is the (n, 4) array of strings: '001' -> 1 -> index 0 -> 'A'
np.vectorize(d.get)(a.astype(int)-1)

array([['A', 'A', 'A', 'U'],
       ['A', 'A', 'A', 'V'],
       ['F', 'I', 'S', 'P'],
       ['F', 'I', 'S', 'Q'],
       ['S', 'Y', 'A', 'Y'],
       ['S', 'Y', 'B', 'V'],
       ['Y', 'M', 'Y', 'Y'],
       ['Y', 'N', 'Q', 'Y'],
       ['Y', 'N', 'T', 'Y'],
       ['Y', 'O', 'V', 'Y'],
       ['Y', 'O', 'Y', 'Y'],
       ['Y', 'Q', 'Q', 'Y'],
       ['Y', 'Q', 'Q', 'Y']], dtype='<U1')
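Since the question mentions trying a dictionary first, a minimal variant (an assumption, not part of the answer above) keys the dictionary on the zero-padded strings themselves, so no integer conversion is needed. Using `d.__getitem__` instead of `d.get` makes an unexpected code raise `KeyError` rather than quietly yielding `None`:

```python
from string import ascii_uppercase

import numpy as np

# Keys are the zero-padded codes themselves: '001' -> 'A', ..., '026' -> 'Z'
d = {f'{i:03d}': ch for i, ch in enumerate(ascii_uppercase, start=1)}

a = np.array([['001', '001', '001', '021'],
              ['006', '009', '019', '016']])

# d.__getitem__ raises KeyError for a code that is not in the dictionary
out = np.vectorize(d.__getitem__)(a)
print(out.tolist())  # [['A', 'A', 'A', 'U'], ['F', 'I', 'S', 'P']]
```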

answered Jul 16, 2019 at 13:53

yatu

Nils Werner · Accepted Answer · 2019-07-16 14:21:31Z

1

If speed is an issue, you can vectorize this operation by creating a mapping array and casting your array to real integers first.

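import numpy as np
import string

# lookup array: index 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z' (avoids shadowing the built-in map)
lookup = np.array(list(string.ascii_uppercase))
data = data.astype(int)  # '001' -> 1, ..., '025' -> 25

# subtract 1 so that 1 -> index 0 -> 'A'
lookup[data - 1]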
# array([['A', 'A', 'A', 'U'],
#        ['A', 'A', 'A', 'V'],
#        ['F', 'I', 'S', 'P'],
#        ['F', 'I', 'S', 'Q'],
#        ['S', 'Y', 'A', 'Y'],
#        ['S', 'Y', 'B', 'V'],
#        ['Y', 'M', 'Y', 'Y'],
#        ['Y', 'N', 'Q', 'Y'],
#        ['Y', 'N', 'T', 'Y'],
#        ['Y', 'O', 'V', 'Y'],
#        ['Y', 'O', 'Y', 'Y'],
#        ['Y', 'Q', 'Q', 'Y'],
#        ['Y', 'Q', 'Q', 'Y']], dtype='<U1')
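One caveat with indexing: NumPy treats negative indices as counting from the end, so a stray '000' becomes index -1 and silently maps to 'Z', while '027' raises IndexError. A sketch with an explicit range check (the helper `to_letters` is hypothetical, not part of the answer):

```python
import string

import numpy as np

LOOKUP = np.array(list(string.ascii_uppercase))  # index 0 -> 'A', ..., 25 -> 'Z'


def to_letters(data):
    # hypothetical helper: '001'..'026' -> 'A'..'Z', rejecting anything outside that range
    nums = np.asarray(data).astype(int)
    if nums.min() < 1 or nums.max() > len(LOOKUP):
        raise ValueError('codes must be between 001 and 026')
    return LOOKUP[nums - 1]


print(LOOKUP[np.array([0]) - 1])                              # ['Z'], the silent wrap-around
print(to_letters([['001', '025'], ['006', '009']]).tolist())  # [['A', 'Y'], ['F', 'I']]
```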

answered Jul 16, 2019 at 14:21

Nils Werner

marc · Accepted Answer · 2019-07-17 11:56:36Z

I would suggest using "chr()" like so:

def numToChar(num):
    # chr(65) is 'A', so adding 64 maps 1 -> 'A', 2 -> 'B', ...
    asciiInt = int(num) + 64
    return chr(asciiInt)

a = '002'
print(numToChar(a))  # prints B

EDIT:

Supposing your data looks like this:

arr = ['001', '001', '001', '021', '001', '001', '001', '022', '006', '009', '019', '016', '006', '009', '019', '017', '019', '025', '001', '025', '019', '025', '002', '022', '025', '013', '025', '025', '025', '014', '017', '025']


def numToChar(num):
    # chr(65) is 'A', so adding 64 maps 1 -> 'A', 2 -> 'B', ...
    return chr(int(num) + 64)


# replace every code in the flat list with its letter
arr = [numToChar(n) for n in arr]

print(arr)
# Would print ['A', 'A', 'A', 'U', 'A', 'A', 'A', 'V', 'F', 'I', 'S', 'P', 'F', 'I', 'S', 'Q', 'S', 'Y', 'A', 'Y', 'S', 'Y', 'B', 'V', 'Y', 'M', 'Y', 'Y', 'Y', 'N', 'Q', 'Y']
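The constant 64 is `ord('A') - 1`. A sketch that spells that out and applies the same idea to the question's rows kept as nested lists, preserving the four-per-row structure (the names `OFFSET`, `num_to_char`, and `rows` are hypothetical):

```python
# 64 == ord('A') - 1, so code 1 maps to 'A' and 26 to 'Z'
OFFSET = ord('A') - 1


def num_to_char(num):
    return chr(int(num) + OFFSET)


rows = [['001', '001', '001', '021'],
        ['006', '009', '019', '016']]

# convert each code while keeping the row structure
converted = [[num_to_char(n) for n in row] for row in rows]
print(converted)  # [['A', 'A', 'A', 'U'], ['F', 'I', 'S', 'P']]
```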

neko · Accepted Answer · 2019-07-16 13:56:24Z

0

You can use the string module to create a mapping between integers and ASCII characters:

import string

alphabet = string.ascii_uppercase

numbers = ["001", "0022", "003", "005"]
letters = [alphabet[int(number)-1] for number in numbers]
print(letters)

Returns

['A', 'V', 'C', 'E']

answered Jul 16, 2019 at 13:56

neko

Divakar · Accepted Answer · 2019-07-16 16:36:29Z

0

We could convert to an int/uint dtype, add 64 to shift each number to its ASCII code point ('A' is 65), and then simply view the result as a string dtype. The view step is virtually free at runtime, so this should be quite efficient:

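import numpy as np

# a is the input array (dtype <U3)
def convert_to_char_typeconv(a):
    # parse '001'..'025' to ints, add 64 (1 -> 65 -> 'A'), reinterpret the uint32 codes as 1-char strings
    return (a.astype(np.uint32)+64).view('U1')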
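Why the view is nearly free: NumPy stores 'U' strings as UCS-4, four bytes per character, which is exactly the size of a uint32, so `view('U1')` reinterprets the same buffer without copying. A small illustration:

```python
import numpy as np

codes = np.array([1, 2, 26], dtype=np.uint32) + 64  # code points 65, 66, 90
letters = codes.view('U1')                           # same bytes read as 1-char strings

print(letters)                                       # ['A' 'B' 'Z']
print(np.dtype('U1').itemsize, codes.itemsize)       # 4 4
print(np.shares_memory(codes, letters))              # True: no copy was made
```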

Another way would be to view the string data directly as uint32 ('u4') code points, convert each triplet of digits into one number, and then view the result as a string dtype. Again, this relies on the ASCII offset. The implementation would be:

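def convert_to_char_view_sumr(a):
    # view each '<U3' string as three uint32 code points, subtract '0' (48) to get digits,
    # then drop the hundreds digit (assumes every value is below 100)
    b = (a.view('u4')-48).reshape(-1,3)[:,1:]
    return (b[:,0]*10+b[:,1]+64).view('U1').reshape(len(a),-1)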
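To see what the second function does step by step, here is a worked sketch on two codes, '021' and '016'. It spells out the byte order as '<U3'/'<u4' so the string data and the integer view agree; like the function above, it drops the hundreds digit, so it only handles codes below 100:

```python
import numpy as np

a = np.array([['021', '016']], dtype='<U3')

raw = a.view('<u4')                 # one uint32 code point per character
print(raw)                          # [[48 50 49 48 49 54]]

digits = (raw - 48).reshape(-1, 3)  # '0' is code point 48; one row per code
b = digits[:, 1:]                   # drop the hundreds digit (always 0 here)
values = b[:, 0] * 10 + b[:, 1]
print(values)                       # [21 16]

letters = (values + 64).view('U1')
print(letters)                      # ['U' 'P']
```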

Sample run:

# Input array of dtype <U3
In [419]: a
Out[419]: 
array([['001', '001', '001', '021'],
       ['001', '017', '001', '022']], dtype='<U3')

In [420]: convert_to_char_typeconv(a)
Out[420]: 
array([['A', 'A', 'A', 'U'],
       ['A', 'Q', 'A', 'V']], dtype='<U1')

In [421]: convert_to_char_view_sumr(a)
Out[421]: 
array([['A', 'A', 'A', 'U'],
       ['A', 'Q', 'A', 'V']], dtype='<U1')

Benchmarking
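A minimal timing harness for comparing three of the approaches in this thread, assuming a synthetic (180000, 4) input like the one described in the question. The function names are hypothetical and each body restates code from the answers above; results depend on hardware and NumPy version, so it is meant to be run on your own data:

```python
import string
import timeit

import numpy as np

# Hypothetical names; each function restates one approach from this thread.
D = dict(enumerate(string.ascii_uppercase))
LOOKUP = np.array(list(string.ascii_uppercase))


def via_vectorize(a):
    # dict lookup applied element by element with np.vectorize
    return np.vectorize(D.get)(a.astype(int) - 1)


def via_indexing(a):
    # fancy indexing into a 26-element lookup array
    return LOOKUP[a.astype(int) - 1]


def via_view(a):
    # ASCII offset, then reinterpret the uint32 code points as 1-char strings
    return (a.astype(np.uint32) + 64).view('U1')


def make_input(n_rows, seed=0):
    # synthetic data: random codes '001'..'025' in an (n_rows, 4) 'U3' array
    ints = np.random.default_rng(seed).integers(1, 26, size=(n_rows, 4))
    flat = [f'{v:03d}' for v in ints.ravel()]
    return np.array(flat, dtype='U3').reshape(ints.shape)


def benchmark(a, number=3):
    funcs = [via_vectorize, via_indexing, via_view]
    expected = funcs[0](a)
    for f in funcs[1:]:
        # check that every approach agrees before timing it
        if not np.array_equal(f(a), expected):
            raise ValueError(f'{f.__name__} disagrees with {funcs[0].__name__}')
    # average seconds per call for each approach
    return {f.__name__: timeit.timeit(lambda: f(a), number=number) / number
            for f in funcs}


if __name__ == '__main__':
    data = make_input(180000)
    for name, seconds in benchmark(data).items():
        print(f'{name}: {seconds * 1e3:.1f} ms')
```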

3 Comments

Brenlla Over a year ago

convert_to_char_view_dot returns AA instead of A. Should it be view(U1) at the end? Also, could you replace tensordot with b@s?

Divakar Over a year ago

@Brenlla Well, it was fine at my end. So I am guessing the issue could have been a different input dtype, or just something else. I've edited with an explicit type conversion for the first method, removed the second, and used u4 for the third one. Can you check how these work at your end? Appreciate the feedback!

Brenlla Over a year ago

My a has shape (180000, 4) and dtype <U3 (or str96). When I ran the original convert_to_char_view_dot, I got output shape (180000, 2), with each row holding 2 elements: AA, AU. Now it works fine.
