I am trying to remove all characters except the last 4 from every value in a NumPy array. I'd normally use [-4:], but if I use that on the array I only get the last 4 values in the array.

andatum = andatum[-4:]
print(andatum)

'15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999']

runfile('O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses/ReadFilesToRawData.py', wdir='O:/GIS/GEP/Risikomanagement/Flussvermessung/ALD/Analyses')
['15.11.1999' '15.11.1999' '15.11.1999' '15.11.1999']

What I am trying to do is obtain the same array, but with only the last 4 characters (the year) of each value. Any idea how I could do that?

Thank you,

Davide

I would like to remove all the characters except the last 4 (the year), but using [-4:] I get the last 4 entries of my NumPy array.

2 Answers

Looks like you have a 1d array of strings:

In [28]: arr = np.array(['15.11.1999']*6)    
In [29]: arr
Out[29]: 
array(['15.11.1999', '15.11.1999', '15.11.1999', '15.11.1999',
       '15.11.1999', '15.11.1999'], dtype='<U10')

numpy is better for numbers than strings. This array is little better than a list of strings. But for convenience, numpy has a set of functions that apply string methods to the elements of an array.

In [30]: np.char.split(arr, sep='.')
Out[30]: 
array([list(['15', '11', '1999']), list(['15', '11', '1999']),
       list(['15', '11', '1999']), list(['15', '11', '1999']),
       list(['15', '11', '1999']), list(['15', '11', '1999'])],
      dtype=object)

We can convert this to a 2d array of strings with stack (or vstack):

In [31]: np.stack(_)
Out[31]: 
array([['15', '11', '1999'],
       ['15', '11', '1999'],
       ['15', '11', '1999'],
       ['15', '11', '1999'],
       ['15', '11', '1999'],
       ['15', '11', '1999']], dtype='<U4')

And select a column:

In [32]: np.stack(_)[:,2]
Out[32]: array(['1999', '1999', '1999', '1999', '1999', '1999'], dtype='<U4')
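As a possible shortcut for the split-and-stack steps, here is a minimal sketch using np.char.rpartition, which splits each string at its last separator and returns a 2d array directly. It assumes every value contains at least one '.' and that the year is whatever follows the last one; the names parts and years are only illustrative.

```python
import numpy as np

arr = np.array(['15.11.1999'] * 6)

# rpartition splits each string at the LAST '.', giving an (n, 3) array
# with columns: text before the separator, the separator, text after it
parts = np.char.rpartition(arr, '.')

# the third column holds everything after the last '.', i.e. the year
years = parts[:, 2]
print(years)
```

This avoids the intermediate object array of lists, but it relies on the separator rather than on a fixed width of 4 characters.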

np.char does not have a function to index the strings. For that we have to stick with a list comprehension:

In [33]: [i[-4:] for i in arr]
Out[33]: ['1999', '1999', '1999', '1999', '1999', '1999']

This kind of element-by-element iteration is faster on a list than on an array.
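If an array (rather than a list) is wanted, or the years are needed as numbers instead of strings, a sketch along these lines should work. It assumes the last 4 characters of every value are digits; years_str and years_int are illustrative names.

```python
import numpy as np

arr = np.array(['15.11.1999'] * 6)

# slice each string, then rebuild a NumPy array from the resulting list
years_str = np.array([s[-4:] for s in arr])

# convert the year strings to integers
years_int = years_str.astype(int)
print(years_int)
```

The quotation marks disappear in the printed result because the elements are now integers, not strings.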

1 Comment

Thank you very much. I must say I am a bit confused about the data types of my array (and also about the difference between arrays and lists). It looks like a date and I'd like to extract the year. I couldn't transform it into a number so far. Even if I extract the last four digits, it still stays like '1999' (in quotation marks). How can I change that?

andatum[i] references an item in the array. To reference individual characters of an item, you need a second pair of brackets, like this: andatum[i][x]
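A small sketch of the difference between indexing the array and indexing a string inside it. The sample values are made up for illustration, so that the elements can be told apart.

```python
import numpy as np

# made-up sample values, not the asker's data
andatum = np.array(['15.11.1999', '16.11.2000', '17.11.2001'])

last_items = andatum[-2:]    # slices the ARRAY: its last two elements
one_item = andatum[0]        # a single element, which is a string
one_char = andatum[0][3]     # one character of that string
last_four = andatum[0][-4:]  # slices the STRING: its last four characters

print(last_items, one_item, one_char, last_four)
```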

To get an array of only the last 4 characters, you need to go over each item of the array like this:

for i in range(len(andatum)):
    andatum[i] = andatum[i][-4:]

Or, to keep things tidier and also faster, this one-liner should also do the job:

andatum = [x[-4:] for x in andatum]

1 Comment

A list comprehension isn't usually significantly faster than a regular loop. Also, the list comprehension changes andatum to a list (it used to be a np.array)
