Mock Search Algorithm
Here is my problem:
In a given document(string) such as:
"hello there my name is dominic and my name is very special"
and searchTerms(list) such as:
['my','dominic'] or ['dominic','my'] (shouldn't matter)
an algorithm would return the shortest excerpt of the document containing the terms:
>>> 'dominic and my'
because
>>> 'my name is dominic'
contains more words than the previous.
Here is my current idea:
Start the algorithm off by creating a list with a number of inner list elements equal to the number of search terms. Inside each number of list would be indices in which that search element appears in the document.
document = document.split();
So searchTerms = ['my', 'dominic'] would return
[[2,7], [5]]
because my appears at index 2 and 7 and dominic appears only at 5.
The algorithm would then take this list and generate a list of all posibilities:
[[2,5], [7,5]]
As you can see there are two substrings in the document string that contain both my and dominic. Then the algo could take the range of both inner lists my doing a max()-min() This would tell me the second exerpt of document is smaller than the first and could then return document[5:(7+1)] which would be the expected outcome.
For my idea this is what I have so far:
document = "hello there my name is dominic and my name is very special"
searchTerms = ['my', 'dominic']
def answer(document, searchTerms):
index = []
document = document.split()
for a in range(0, len(searchTerms)):
index.append([i for i, x in enumerate(document) if x == searchTerms[a]])
return index
As of now this returns [[2,7],[5]] however I am running into mainly one issue:
- Efficiency:
Is this solution efficient for a document string and search terms list that is extremely large? And if not what could be done to make it more efficient or is my original idea not good
I appreciate any insight into solving this issue, thank you.
{"word": [36, 42, 51], ...}). Then you just need to find the closest positions for two words..