0

I'm writing code that pulls data from a website and prints all the text between certain tags. Each time the code pulls text from a tag, I append it to a list, so I end up with a list that looks something like this:

Warning
Not
News
Legends
Name1
Name2
Name3
Pickle
Stop
Hello

I want to search this list of strings for the keywords Legends and Pickle and print whatever strings lie between them.

As a follow-up, I may create a list of all possible legend names and then print whichever of those names appear in the list I generate. Any insight into either of these questions?

5 Answers

2

For the second part of your question, you could build a regex alternation of the expected names, then use a list comprehension to collect the matches:

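import re

tags = ['Warning', 'Not', 'News', 'Legends', 'Name1', 'Name2', 'Name3', 'Pickle', 'Stop', 'Hello']
names = ['Name1', 'Name2', 'Name3']
# Anchored alternation, e.g. ^(?:Name1|Name2|Name3)$; re.escape guards names containing regex metacharacters
regex = r'^(?:' + r'|'.join(re.escape(n) for n in names) + r')$'
matches = [x for x in tags if re.search(regex, x)]
print(matches)  # ['Name1', 'Name2', 'Name3']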
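When the expected names are plain strings rather than patterns, a set lookup gives the same result without a regex. This is only a minimal sketch: the helper name known_names_in is made up for illustration, and it assumes exact, case-sensitive matches.

```python
def known_names_in(tags, names):
    """Return the items of tags that also appear in names, in original order."""
    allowed = set(names)  # a set gives fast membership tests
    return [tag for tag in tags if tag in allowed]


tags = ['Warning', 'Not', 'News', 'Legends', 'Name1', 'Name2', 'Name3', 'Pickle', 'Stop', 'Hello']
names = ['Name1', 'Name2', 'Name3']
print(known_names_in(tags, names))  # ['Name1', 'Name2', 'Name3']
```

Repeated names in tags are kept as often as they occur, which matches the regex version.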

1

Try this:

words = [
    "Warning", "Not", "News", "Legends", "Name1",
    "Name2", "Name3", "Pickle", "Stop", "Hello"
]
words_in_between = words[words.index("Legends") + 1:words.index("Pickle")]
print(words_in_between)

output:

['Name1', 'Name2', 'Name3']

This assumes that both "Legends" and "Pickle" appear in the list exactly once, with "Legends" first; index() raises a ValueError if either is missing.
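If the markers might be missing, the slice can be wrapped in a small helper. The sketch below is one possible way to do that, not the only one: the function name between is hypothetical, and returning an empty list on a missing marker is an assumption about what you want.

```python
def between(words, start, end):
    """Return the items strictly between the first start and the next end.

    Returns an empty list if either marker is missing.
    """
    try:
        begin = words.index(start) + 1
        # Look for end only from begin onward, so an earlier end is ignored.
        stop = words.index(end, begin)
    except ValueError:
        return []
    return words[begin:stop]


words = [
    "Warning", "Not", "News", "Legends", "Name1",
    "Name2", "Name3", "Pickle", "Stop", "Hello"
]
print(between(words, "Legends", "Pickle"))  # ['Name1', 'Name2', 'Name3']
```

Passing begin as the second argument to index() also covers lists where "Pickle" happens to appear before "Legends".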

1

You can use the list.index() method to find the numerical index of an item within a list, and then use list slicing to return the items in your list between those two points:

your_list = ['Warning','Not','News','Legends','Name1','Name2','Name3','Pickle','Stop','Hello']
your_list[your_list.index('Legends')+1:your_list.index('Pickle')]

The caveat is that .index() returns only the index of the first occurrence of the given item, so if your list contains 'Legends' twice, only the first one is found and any later block is ignored.
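To get around that caveat without extra libraries, a single pass over the list can collect every block. This is a rough sketch under a few assumptions: the name all_between is invented here, a repeated start marker restarts the current block, and a block that is never closed is dropped.

```python
def all_between(items, start, end):
    """Return one list per start...end block, in order of appearance."""
    groups = []
    current = None  # None means we are outside a start...end block
    for item in items:
        if item == start:
            current = []  # open a new block (restarts if one is already open)
        elif item == end and current is not None:
            groups.append(current)  # close the block and keep it
            current = None
        elif current is not None:
            current.append(item)
    return groups


your_list = ['Warning', 'Legends', 'Name1', 'Name2', 'Pickle', 'Stop', 'Legends', 'Name3', 'Pickle']
print(all_between(your_list, 'Legends', 'Pickle'))  # [['Name1', 'Name2'], ['Name3']]
```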

1

You can use list.index() to get the index of the first occurrence of 'Legends' and 'Pickle'. Then you can use list slicing to get the elements in between:

l = ['Warning','Not','News','Legends','Name1','Name2','Name3','Pickle','Stop','Hello']
l[l.index('Legends')+1 : l.index('Pickle')]
['Name1', 'Name2', 'Name3']
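Note that index() compares strings case-sensitively, so searching for 'legends' (as the question spells it) would raise a ValueError on this list. If the scraped text may vary in case, one option is to compare case-folded copies. This is a sketch only; the helper name between_ignore_case is hypothetical, and it still raises ValueError when a marker is absent.

```python
def between_ignore_case(items, start, end):
    """Slice between start and end, ignoring upper/lower case in the markers."""
    lowered = [item.casefold() for item in items]  # case-folded copy used only for searching
    begin = lowered.index(start.casefold()) + 1
    stop = lowered.index(end.casefold(), begin)
    return items[begin:stop]  # slice the original list so the original casing is kept


l = ['Warning', 'Not', 'News', 'Legends', 'Name1', 'Name2', 'Name3', 'Pickle', 'Stop', 'Hello']
print(between_ignore_case(l, 'legends', 'pickle'))  # ['Name1', 'Name2', 'Name3']
```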

1

NumPy's where function gives you every occurrence of a given item. First, make the list a NumPy array:

import numpy as np

my_array = np.array(["Warning","Not","News","Legends","Name1","Name2","Name3","Pickle","Stop","Hello","Legends","Name1","Name2","Name3","Pickle"])

From here on you can use NumPy functions. np.where returns a tuple of index arrays, one per dimension:

legends = np.where(my_array == "Legends")
pickle = np.where(my_array == "Pickle")

Concatenate them for easier looping. Row 0 of stack then holds the "Legends" indices and row 1 the "Pickle" indices:

stack = np.concatenate([legends, pickle], axis=0)

Then collect the values between each "Legends" and its matching "Pickle" (this assumes both occur equally often):

np.concatenate([my_array[stack[0, i] + 1:stack[1, i]] for i in range(stack.shape[1])])

With the array above, the result is:

array(['Name1', 'Name2', 'Name3', 'Name1', 'Name2', 'Name3'], dtype='<U7')
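If you would rather keep each block separate instead of merging everything into one array, the index arrays can be paired directly. The following is a sketch, not a definitive implementation: it uses np.flatnonzero in place of np.where, and it assumes the markers alternate cleanly (every "Legends" is followed by its own "Pickle").

```python
import numpy as np

my_array = np.array(["Warning", "Not", "News", "Legends", "Name1", "Name2", "Name3",
                     "Pickle", "Stop", "Hello", "Legends", "Name1", "Name2", "Name3", "Pickle"])

# flatnonzero returns the positions where the condition is True as a 1-D array.
starts = np.flatnonzero(my_array == "Legends")
ends = np.flatnonzero(my_array == "Pickle")

# Pair the i-th "Legends" with the i-th "Pickle"; zip stops at the shorter array.
groups = [my_array[s + 1:e].tolist() for s, e in zip(starts, ends)]
print(groups)  # [['Name1', 'Name2', 'Name3'], ['Name1', 'Name2', 'Name3']]
```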
