Finding similar text in a string in Python

Question

I have a .txt file containing the following text:

Table of Contents

Preface 1

Chapter 1: Tokenizing Text and WordNet Basics 7

Tokenizing text into sentences 8

Tokenizing sentences into words 10

Tokenizing sentences using regular expressions 12

The string I have is:

input = "Tokenzing sentence using expressions"

I thought of using the beginning and ending words to extract the sentence, but there are a lot of repetitions.

So what's the best way to get this output:

Tokenizing sentences using regular expressions

Are you sure you want to match Tokenzing with Tokenizing, or is that just a typo?
– Gahan, commented May 28, 2017 at 13:47

BoarGules · Accepted Answer · 2017-06-11 22:24:02Z


If you are prepared to preprocess your chapter headings by removing page numbers and similar clutter, this:

import difflib

# Chapter headings with the page numbers already stripped.
contents = ["Tokenizing Text and WordNet Basics",
            "Tokenizing text into sentences",
            "Tokenizing sentences into words",
            "Tokenizing sentences using regular expressions"]
input = "Tokenzing sentence using expressions"
# n=1 limits the result to the single closest match (returned as a list).
print(difflib.get_close_matches(input, contents, n=1))

will give you this output:

['Tokenizing sentences using regular expressions']
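The preprocessing step mentioned above is not shown in the answer, so here is a minimal sketch of one way it might look. The helper names clean_heading and best_heading are hypothetical, and the regular expressions assume that headings end in a page number and that chapter titles start with "Chapter N:", as in the sample from the question. The cutoff of 0.6 is the default used by difflib.get_close_matches.

```python
import difflib
import re

# Sample lines as they appear in the question's text file.
raw_lines = [
    "Table of Contents",
    "Preface 1",
    "Chapter 1: Tokenizing Text and WordNet Basics 7",
    "Tokenizing text into sentences 8",
    "Tokenizing sentences into words 10",
    "Tokenizing sentences using regular expressions 12",
]


def clean_heading(line):
    """Strip a leading 'Chapter N:' label and a trailing page number (assumed layout)."""
    line = re.sub(r"^Chapter\s+\d+:\s*", "", line.strip())
    return re.sub(r"\s+\d+$", "", line)


def best_heading(query, lines, cutoff=0.6):
    """Return the cleaned heading closest to query, or None if nothing reaches cutoff."""
    headings = [clean_heading(line) for line in lines if line.strip()]
    matches = difflib.get_close_matches(query, headings, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(best_heading("Tokenzing sentence using expressions", raw_lines))
```

Returning None when nothing reaches the cutoff lets the caller tell "no reasonable match" apart from a weak one. Raising cutoff makes the matching stricter, and lowering it makes it more tolerant of typos.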

edited Jun 11, 2017 at 22:24

answered May 28, 2017 at 14:16
