
I have a NumPy array containing strings of varying lengths, for example:

title=['the first step in 2017', 'Here is my 2016 report', '2016 new considerations' ....] 

I want to extract the year from each element in the array. I wrote this piece of code:

list_yea = []
for i, tit in enumerate(title):
    if '20' in tit:
        print(year)  # ??? `year` is never defined; I could not find a good way to extract it
        list_yea.append(year)

I assume that all the years are within the range [2000-2020]. My problem is how to return only the year from each string.

I have also tried this code, but it gave me the wrong result:

years=[]
c=1 # to count the strings that do not contain a year
for i, tit in enumerate(title) :
    if '20' in tit or '199' in tit : # for both 199x and 20xx years
        spl=tit.split(' ')
        for j , check in enumerate(spl):
            if '20' in check:
                years.append(check)
    if '20' not in tit and '199' not in tit :
        c=c+1
        years.append(0)

len(years) ==> 16732, while my total dataset has 16914 samples. Thank you in advance for any help.

2 Answers


You can split each string into words and use try/except to check whether a word is an integer. Then keep the word only if it starts with 20 (for years from 2000 onward) and is exactly 4 characters long, which filters out any other numbers.

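list_yea = []
for tit in title:
    for j in tit.split():
        try:
            int(j)  # raises ValueError if the word is not a number
        except ValueError:
            continue
        # keep only 4-character numbers that start with 20
        if len(j) == 4 and j.startswith('20'):
            list_yea.append(j)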
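
If the result has to line up with the input (one entry per title, as the length mismatch in the question suggests), a variation of the loop above could record a placeholder when no year is found. This is a minimal sketch, assuming the first matching word is the one wanted and that 0 is an acceptable placeholder. The helper name first_year and the third sample title are made up for illustration:

```python
def first_year(text, default=0):
    """Return the first 4-digit word starting with '20' as an int, else default."""
    for word in text.split():
        # str.isdecimal() stands in for the try/except int() check
        if len(word) == 4 and word.isdecimal() and word.startswith('20'):
            return int(word)
    return default

title = ['the first step in 2017', 'Here is my 2016 report', 'no year here']
list_yea = [first_year(tit) for tit in title]
print(list_yea)  # [2017, 2016, 0]
```

Because every title contributes exactly one value, len(list_yea) always equals len(title).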

2 Comments

@baddy Can you share the output?
@baddy I forgot to split the string. Now this code returns the years present in each string of the array.

Simplest solution which meets the requirements:

import re

title=['the first step in 2017', 'Here is my 2016 report', '2016 new considerations']

for t in title:
    print(re.findall(r"[0-9]+", t)[0])

You could further specialize the regex if you wish.
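
As one possible specialization, the pattern could be narrowed to standalone four-digit numbers of the form 199x or 20xx, which are the ranges mentioned in the question. This is only a sketch under that assumption. YEAR_RE is a name made up for this example, and the third title is an invented test case:

```python
import re

# \b ... \b stops the pattern from matching inside longer digit runs such as 120165
YEAR_RE = re.compile(r"\b(?:199\d|20\d{2})\b")

title = ['the first step in 2017', 'Here is my 2016 report', 'order 120165 from 1998']

for t in title:
    print(YEAR_RE.findall(t))
```

Note that \b treats letters and underscores as word characters, so a year glued to a word (for example 'in2017') would not match with this pattern.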

2 Comments

It gives index out of range if the string does not contain the year
So, simply check length before accessing 0th element.
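
The length check suggested here might look like the following. The fallback value 0 for titles without a number mirrors the placeholder used in the question and is an assumption, as is the third sample title:

```python
import re

title = ['the first step in 2017', 'Here is my 2016 report', 'no number here']

years = []
for t in title:
    found = re.findall(r"[0-9]+", t)
    # findall returns an empty list when nothing matches, so test it before indexing
    years.append(found[0] if len(found) > 0 else 0)

print(years)  # ['2017', '2016', 0]
```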
