
Say we have a numpy.ndarray with numpy.str_ elements. For example, arr below is a numpy.ndarray with two numpy.str_ elements:

arr = ['12345"""ABCDEFG'  '1A2B3C"""']

I am trying to perform string slicing on each element of the array.

For example, how can we slice the first element '12345"""ABCDEFG' so that its last 10 characters are replaced with the string REPL, i.e.

arr = ['12345REPL'  '1A2B3C"""']

Also, is it possible to perform string substitutions, e.g. substitute all characters after a specific symbol?

  • This might be of some interest: "How can I slice each element of a numpy array of strings?" Commented Dec 5, 2016 at 15:46
  • Note that the syntax arr = ['12345"""ABCDEFG' '1A2B3C"""'] actually means arr = ['12345"""ABCDEFG1A2B3C"""'], because adjacent string literals are concatenated. Commented Dec 5, 2016 at 17:46

3 Answers

0

Strings are immutable, so you should either create slices and manually recombine or use regular expressions. For example, to replace the last 10 characters of the first element in your array, arr, you could do:

import numpy as np
import re

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
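# re.escape guards against regex metacharacters in the slice; $ anchors the match at the end
arr[0] = re.sub(re.escape(arr[0][-10:]) + '$', 'REPL', arr[0])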

print(arr)
#['12345REPL' '1A2B3C"""']
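
The same result can be reached with plain slicing and concatenation, without a regex. The following is a minimal sketch rather than a definitive recipe. It also illustrates one assumption worth checking in your own data: a NumPy string array has a fixed width, so a value longer than that width is silently truncated on assignment unless the array is widened first (the name wide is only for illustration).

```python
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])

# Keep everything except the last 10 characters, then append the replacement.
arr[0] = arr[0][:-10] + 'REPL'
print(arr)  # ['12345REPL' '1A2B3C"""']

# Fixed-width pitfall: arr has dtype <U15, so a longer string is cut off.
arr[1] = 'X' * 20
print(len(arr[1]))  # 15

# Widening the dtype before assigning avoids the truncation.
wide = arr.astype('<U30')
wide[1] = 'X' * 20
print(len(wide[1]))  # 20
```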

If you want to replace all characters after a specific character you could use a regular expression or find the index of that character in the string and use that as the slicing index.
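
The index-based approach could look roughly like this. It is a sketch: the helper name replace_from is made up for illustration, and it assumes the marker itself should be replaced along with everything after it.

```python
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])

def replace_from(s, marker, repl):
    """Replace marker and everything after it; return s unchanged if marker is absent."""
    idx = s.find(marker)          # -1 when the marker does not occur
    return s if idx == -1 else s[:idx] + repl

# Building a new array lets NumPy pick a dtype wide enough for the results.
result = np.array([replace_from(s, '"""', 'REPL') for s in arr])
print(result)  # ['12345REPL' '1A2B3CREPL']
```

To keep the marker and replace only what follows it, slice at idx + len(marker) instead of idx.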

EDIT: Your comment is more about regular expressions than simply Python slicing, but this is how you could replace everything after the triple quote:

re.sub('["]{3}(.+)', 'REPL', arr[0])

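This pattern matches the triple quote and everything after it, and re.sub replaces the whole match, quotes included, with REPL. The parentheses only capture the trailing text; they do not limit what gets replaced. Because .+ needs at least one character, a string that ends with the triple quote, such as arr[1], is left unchanged.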
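
To apply a substitution like this to every element rather than just arr[0], one option is a list comprehension over the array. The sketch below shows two variants: one that drops the triple quote, and one that keeps it through a backreference. It uses .* instead of .+ so that strings ending in the triple quote are handled too; the result names are illustrative.

```python
import re
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])

# Variant 1: replace the triple quote and everything after it.
drop_quotes = np.array([re.sub(r'"{3}.*', 'REPL', s) for s in arr])
print(drop_quotes)  # ['12345REPL' '1A2B3CREPL']

# Variant 2: keep the triple quote (group 1) and replace only what follows.
keep_quotes = np.array([re.sub(r'("{3}).*', r'\1REPL', s) for s in arr])
print(keep_quotes)  # ['12345"""REPL' '1A2B3C"""REPL']
```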


1 Comment

That is great. Could you also show what the substitution of all characters after the """ (including the """) would look like? I know the regex is [\'"]{3} but I get confused as to how to put it inside this sub. Or just show how to find the index, for example, of those 3 quote symbols as you suggested in your edit.
0

In Python, strings are immutable. NumPy array scalars are immutable too, so the string you get by indexing the array cannot be modified in place.

To slice, index the string the same way you would index a list.

Say we had a string where we wanted to slice at the 3rd letter, excluding the third letter:

my_str = 'purple'
sliced_str = my_str[:2]     # 'pu': stops before the third letter

Now that we have that part of the string, say we want to replace everything after the slice point. Since the original string cannot be changed, we build a new string from the piece we kept plus the replacement text:

# say I want to replace the end of 'my_str', from where we sliced, with a string named 's'
s = 'dandylion'
new_string = sliced_str + s     # returns 'pudandylion'

Because string types are immutable, you have to store elements you want to keep, then combine the stored elements with the elements you would like to add in a new variable.
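
Carried over to a NumPy array, the same keep-and-recombine idea might look like the sketch below. The array contents and the name new_arr are invented for illustration; the point is only that a new array is built instead of the elements being edited in place.

```python
import numpy as np

arr = np.array(['purple', 'orange'])
s = 'dandylion'

# For each element, keep the first two characters and append s.
# A new array is created, so its dtype is sized to fit the longer results.
new_arr = np.array([elem[:2] + s for elem in arr])
print(new_arr)        # ['pudandylion' 'ordandylion']
print(new_arr.dtype)  # <U11
```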

1 Comment

Thank you for your explanation. However, if you could give a small code example showing how my slice can happen specifically in numpy array elements, it would be great for helping me understand.
0

np.char has a replace function, which applies the corresponding string method to each element of the array:

In [598]: arr = np.array(['12345"""ABCDEFG',  '1A2B3C"""'])
In [599]: np.char.replace(arr,'"""ABCDEFG',"REPL")
Out[599]: 
array(['12345REPL', '1A2B3C"""'], 
      dtype='<U9')

In this particular example it can be made to work, but it isn't nearly as general purpose as re.sub. Also these char functions are only modestly faster than iterating on the array. There are some good examples of that in @Divakar's link.
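
For the "everything after a specific symbol" case, np.char.partition is another option from the same module. This is a sketch under the assumption that elements without the marker should be left untouched; the third array element and the names has_marker and replaced are added here only to show that case.

```python
import numpy as np

arr = np.array(['12345"""ABCDEFG', '1A2B3C"""', 'no marker'])

# partition splits each element into (before, separator, after); shape is (3, 3).
parts = np.char.partition(arr, '"""')

# The separator column is empty when the marker was not found.
has_marker = parts[:, 1] != ''

# Append REPL to the "before" part where the marker exists, else keep the original.
replaced = np.where(has_marker, np.char.add(parts[:, 0], 'REPL'), arr)
print(replaced)  # ['12345REPL' '1A2B3CREPL' 'no marker']
```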
