2

I'm trying to convert numbers within an array to letters. So '001' would be changed to 'A', '002' to 'B', all the way to '025' to 'Y'.

So far I've tried using a dictionary to replace the values, but that doesn't seem to work. np.place doesn't work either, since it handles a single if/else condition and I have more cases than that.

import numpy as np

# Polymer_data is a DataFrame; each ID looks like '001-001-001-021'
Polymer_data = Polymer_data.sort_values(['ID'])
for i in Polymer_data.ID:
    first_arr = np.array(i.split('-'))
    print(first_arr)

A small sample of the data in the array:

['001' '001' '001' '021']
['001' '001' '001' '022']
['006' '009' '019' '016']
['006' '009' '019' '017']
['019' '025' '001' '025']
['019' '025' '002' '022']
['025' '013' '025' '025']
['025' '014' '017' '025']
['025' '014' '020' '025']
['025' '015' '022' '025']
['025' '015' '025' '025']
['025' '017' '017' '025']
['025' '017' '017' '025']

So the data above should be converted to:

['A' 'A' 'A' 'U']
['A' 'A' 'A' 'V']
['F' 'I' 'S' 'P']
['F' 'I' 'S' 'Q']
['S' 'Y' 'A' 'Y']
['S' 'Y' 'B' 'V']
['Y' 'M' 'Y' 'Y']
['Y' 'N' 'Q' 'Y']
['Y' 'N' 'T' 'Y']
['Y' 'O' 'V' 'Y']
['Y' 'O' 'Y' 'Y']
['Y' 'Q' 'Q' 'Y']
['Y' 'Q' 'Q' 'Y']

Edit: Formatting on the code

Also, in terms of the array structure, the values '001' to '025' are arranged in sequences of four, repeated until all permutations are accounted for, so the full array has over 180000 rows.

6
  • Please provide a clearer structure of your lists for me to update my answer and give you a complete answer. Commented Jul 16, 2019 at 13:52
  • Is your original array an array of strings containing three digit integers, or an array of integers? Commented Jul 16, 2019 at 13:55
  • Your for loop appears to be empty. Remember that statements in a for loop have to be indented in Python. Commented Jul 16, 2019 at 13:55
  • @marc I've updated the post with more info on the list structure. Commented Jul 16, 2019 at 14:18
  • @Linuxios it's just an array of integers from 1 to 25 Commented Jul 16, 2019 at 14:18

5 Answers

3

The way I would do this is by creating a dictionary mapping integers to letters and using it to map the values in the array with np.vectorize and dict.get:

import numpy as np
from string import ascii_uppercase

# a is the 2-D array of numeric strings from the question
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'...
np.vectorize(d.get)(a.astype(int)-1)

array([['A', 'A', 'A', 'U'],
       ['A', 'A', 'A', 'V'],
       ['F', 'I', 'S', 'P'],
       ['F', 'I', 'S', 'Q'],
       ['S', 'Y', 'A', 'Y'],
       ['S', 'Y', 'B', 'V'],
       ['Y', 'M', 'Y', 'Y'],
       ['Y', 'N', 'Q', 'Y'],
       ['Y', 'N', 'T', 'Y'],
       ['Y', 'O', 'V', 'Y'],
       ['Y', 'O', 'Y', 'Y'],
       ['Y', 'Q', 'Q', 'Y'],
       ['Y', 'Q', 'Q', 'Y']], dtype='<U1')
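
For anyone who wants to run this end to end, here is a minimal sketch. It assumes `a` is a 2-D NumPy array of three-digit strings; the three sample rows are copied from the question, and the names `a` and `letters` are only illustrative.

```python
import numpy as np
from string import ascii_uppercase

# Assumed input: a few rows copied from the question, stored as strings.
a = np.array([['001', '001', '001', '021'],
              ['006', '009', '019', '016'],
              ['025', '017', '017', '025']])

# Map 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z'.
d = dict(enumerate(ascii_uppercase))

# The strings must be cast to int first, because '001' is not a key of d.
letters = np.vectorize(d.get)(a.astype(int) - 1)
print(letters)
```

Note that np.vectorize is a convenience wrapper around a Python-level loop, so this is mainly about readability rather than speed.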

1

If speed is an issue, you can vectorize this operation by creating a mapping array and casting your array to real integers first.

import string
import numpy as np

# Named `mapping` rather than `map` to avoid shadowing the built-in
mapping = np.array(list(string.ascii_uppercase))
data = data.astype(int)

mapping[data - 1]
# array([['A', 'A', 'A', 'U'],
#        ['A', 'A', 'A', 'V'],
#        ['F', 'I', 'S', 'P'],
#        ['F', 'I', 'S', 'Q'],
#        ['S', 'Y', 'A', 'Y'],
#        ['S', 'Y', 'B', 'V'],
#        ['Y', 'M', 'Y', 'Y'],
#        ['Y', 'N', 'Q', 'Y'],
#        ['Y', 'N', 'T', 'Y'],
#        ['Y', 'O', 'V', 'Y'],
#        ['Y', 'O', 'Y', 'Y'],
#        ['Y', 'Q', 'Q', 'Y'],
#        ['Y', 'Q', 'Q', 'Y']], dtype='<U1')
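
One thing to watch with plain indexing is that a code of '000' becomes index -1, which NumPy silently wraps around to 'Z'. The sketch below adds a range check around the same lookup. The names `lookup` and `codes_to_letters` are hypothetical, and the check itself is an addition, not part of the approach above.

```python
import string
import numpy as np

# Lookup table: index 0 -> 'A', ..., index 25 -> 'Z'.
lookup = np.array(list(string.ascii_uppercase))

def codes_to_letters(codes):
    """Convert an array of numeric strings ('001'..'026') to letters."""
    idx = np.asarray(codes).astype(int) - 1
    # Reject codes outside 1..26; negative indices would otherwise wrap around.
    if idx.size and (idx.min() < 0 or idx.max() >= lookup.size):
        raise ValueError("codes must be between 1 and 26")
    return lookup[idx]

data = np.array([['001', '001', '001', '021'],
                 ['019', '025', '002', '022']])
print(codes_to_letters(data))
```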


1

I would suggest using "chr()" like so:

def numToChar(num):
    # 'A' has code point 65, so 1 maps to 'A', 2 to 'B', ...
    asciiInt = int(num) + 64
    character = chr(asciiInt)  # chr() already returns a str
    return character

a = '002'
print(numToChar(a)) # prints 'B'

EDIT :

Supposing your data looks like this:

arr = ['001', '001', '001', '021', '001', '001', '001', '022', '006', '009', '019', '016', '006', '009', '019', '017', '019', '025', '001', '025', '019', '025', '002', '022', '025', '013', '025', '025', '025', '014', '017', '025']


def numToChar(num):
    asciiInt = int(num) + 64
    character = str(chr(asciiInt))
    return character


for i in range(len(arr)):
    arr[i] = numToChar(arr[i])


print(arr)
# Would print ['A', 'A', 'A', 'U', 'A', 'A', 'A', 'V', 'F', 'I', 'S', 'P', 'F', 'I', 'S', 'Q', 'S', 'Y', 'A','Y', 'S', 'Y', 'B', 'V', 'Y', 'M', 'Y', 'Y', 'Y', 'N', 'Q', 'Y']
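
The same idea can be written without mutating the list in place. This is only a sketch of an alternative spelling: `num_to_char` is a hypothetical snake_case variant of the function above, and the short `arr` is a subset of the sample data.

```python
def num_to_char(num):
    """Convert a numeric string such as '002' to its letter ('B')."""
    # ord('A') is 65, so code 1 maps to 'A'.
    return chr(int(num) + ord('A') - 1)

arr = ['001', '021', '006', '009', '019', '016', '025']

# Build a new list instead of overwriting arr element by element.
letters = [num_to_char(code) for code in arr]
print(letters)
```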


0

You can use the string module to create a mapping between integers and ASCII characters:

import string

alphabet = string.ascii_uppercase

numbers = ["001", "022", "003", "005"]
letters = [alphabet[int(number)-1] for number in numbers]
print(letters)

This prints:

['A', 'V', 'C', 'E']


0

We could convert to an int/uint dtype, add 64 to get the ASCII code of each letter, and then simply view the result as a string dtype. The view is virtually free at runtime, so this should be pretty efficient:

import numpy as np

# a is the input array of numeric strings
def convert_to_char_typeconv(a):
    # uint32 has the same 4-byte width as one 'U1' character
    return (a.astype(np.uint32)+64).view('U1')
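
To see why the view works, here is a small sketch on a hand-made integer array (the names `codes`, `code_points` and `letters` are illustrative). NumPy stores each character of a 'U1' array as a 4-byte code point, the same width as a uint32, so the buffer can be reinterpreted without copying.

```python
import numpy as np

codes = np.array([1, 2, 25], dtype=np.uint32)

# Adding 64 gives the code points of 'A', 'B', 'Y' (65, 66, 89).
code_points = codes + 64

# Reinterpret the same 4-byte items as one-character strings; no data is copied.
letters = code_points.view('U1')
print(letters)

# Viewing back as uint32 recovers the code points.
print(letters.view(np.uint32))
```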

Another way would be to view the string array as uint32 ('u4') values, one per character, combine the digits of each triplet into a single number, and then view the result as a string dtype. Again, the ASCII-offset idea applies. The implementation would be:

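def convert_to_char_view_sumr(a):
    # One uint32 code point per character; subtracting 48 (ord('0')) gives digits.
    # The leading digit of each triplet is dropped, since it is always 0 here.
    b = (a.view('u4')-48).reshape(-1,3)[:,1:]
    return (b[:,0]*10+b[:,1]+64).view('U1').reshape(len(a),-1)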
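
The one-liner is dense, so here is the same computation unrolled step by step as a sketch. It assumes a C-contiguous array of exactly three-character codes whose first digit is 0; the intermediate names are illustrative.

```python
import numpy as np

a = np.array([['001', '021'],
              ['017', '025']], dtype='U3')

# Each character becomes one uint32 code point: '0' -> 48, '1' -> 49, ...
code_points = a.view('u4')                  # shape (2, 6): 3 characters per code

# Subtract ord('0') and regroup so each row holds the three digits of one code.
digits = (code_points - 48).reshape(-1, 3)  # shape (4, 3)

# Combine tens and units; the hundreds digit is assumed to be 0.
values = digits[:, 1] * 10 + digits[:, 2]

# Shift into the 'A'.. range and reinterpret as single characters.
letters = (values + 64).view('U1').reshape(len(a), -1)
print(letters)
```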

Sample run -

# Input array of dtype <U3
In [419]: a
Out[419]: 
array([['001', '001', '001', '021'],
       ['001', '017', '001', '022']], dtype='<U3')

In [420]: convert_to_char_typeconv(a)
Out[420]: 
array([['A', 'A', 'A', 'U'],
       ['A', 'Q', 'A', 'V']], dtype='<U1')

In [421]: convert_to_char_view_sumr(a)
Out[421]: 
array([['A', 'A', 'A', 'U'],
       ['A', 'Q', 'A', 'V']], dtype='<U1')

Benchmarking

Other posted approaches -

import string
from string import ascii_uppercase
import numpy as np

# @neko's soln
def neko(numbers):
    alphabet = string.ascii_uppercase
    letters = [alphabet[int(number)-1] for number in numbers]
    return letters

# @yatu's soln
def yatu(a):
    d = dict(enumerate(ascii_uppercase))
    return np.vectorize(d.get)(a.astype(int)-1)

# @Nils Werner's soln
def nils(data):
    map = np.array(list(string.ascii_uppercase))
    data = data.astype(int)
    return map[data - 1]

Timings on 180000 rows data -

To set up the input data, let's use the sample a with 2 rows and repeat it 90000 times along the rows to simulate OP's case of 180000 rows.

In [425]: a
Out[425]: 
array([['001', '001', '001', '021'],
       ['001', '017', '001', '022']], dtype='<U3')

In [426]: a = np.repeat(a,90000,axis=0)

In [427]: %timeit neko(a.ravel())
     ...: %timeit yatu(a)
     ...: %timeit nils(a)
254 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
285 ms ± 1.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
206 ms ± 882 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [428]: %timeit convert_to_char_typeconv(a)
     ...: %timeit convert_to_char_view_sumr(a)
206 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.83 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
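
Outside IPython, a comparison along these lines can be reproduced with the standard timeit module. The following is a rough sketch, not the exact setup used above: `lookup_table` and `view_digits` are hypothetical names for the indexing and view approaches, and the numbers it prints will depend on the machine and NumPy version.

```python
import string
import timeit

import numpy as np

LOOKUP = np.array(list(string.ascii_uppercase))

def lookup_table(a):
    # Indexing approach: cast to int, then index a table of letters.
    return LOOKUP[a.astype(int) - 1]

def view_digits(a):
    # View approach: read the last two digits of each code as uint32 code points.
    b = (a.view('u4') - 48).reshape(-1, 3)[:, 1:]
    return (b[:, 0] * 10 + b[:, 1] + 64).view('U1').reshape(len(a), -1)

sample = np.array([['001', '001', '001', '021'],
                   ['001', '017', '001', '022']], dtype='U3')
a = np.repeat(sample, 90000, axis=0)  # 180000 rows, as in the question

# Check that both approaches agree before timing them.
assert (lookup_table(a) == view_digits(a)).all()

for fn in (lookup_table, view_digits):
    best = min(timeit.repeat(lambda: fn(a), number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1e3:.2f} ms")
```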

3 Comments

convert_to_char_view_dot returns AA instead of A. Should it be view('U1') at the end? Also, could you replace tensordot with b@s?
@Brenlla Well, it was fine at my end, so I am guessing the issue could have been a different input dtype or something else. Edited with explicit type conversion for the first method, removed the second, and used u4 for the third one. Can you check how these work at your end? Appreciate the feedback!
My a has shape (180000,4) and dtype <U3 (or str96). When I ran the original convert_to_char_view_dot, I got output shape (180000,2), with each row holding 2 elements: AA, AU. Now it works fine.
