
My problem: I am searching a PDF file with Python, line by line. Suppose a line contains:

"this this this %this"

So if we set x = "this this this %this", I want to count the occurrences of "this" and ignore whatever follows "%", since it is a comment. The code is:

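import re

x = "this this this %this"
counter = 0
if re.search("%", x):
    new_line = x.split()
    for g in new_line:
        if re.search("%", g):
            break  # everything from the first "%" onward is a comment
        elif g == "this":
            counter = counter + 1
    print(counter)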

But what if I have the following:

x = "this this this %this %this"

Here the second "%" ends the comment, so I want to skip the "this" between the two "%" signs and still count the last one.
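The rule described above, where each "%" switches comment mode on or off, can be expressed as a small character-level toggle. This is only a sketch, not code from the thread: the function name count_outside_comments is hypothetical, and it assumes "%" is never escaped and that an unmatched "%" comments out the rest of the line.

```python
def count_outside_comments(line, word="this"):
    """Count whitespace-separated occurrences of `word` outside %...% comments.

    Hypothetical helper: assumes every "%" toggles comment mode and that
    "%" is never escaped.
    """
    kept = []
    in_comment = False
    for ch in line:
        if ch == "%":
            in_comment = not in_comment  # each "%" opens or closes a comment
            kept.append(" ")  # keep words on either side of "%" separate
        elif not in_comment:
            kept.append(ch)
    return "".join(kept).split().count(word)


print(count_outside_comments("this this this %this"))        # 3
print(count_outside_comments("this this this %this %this"))  # 4
```

Appending a space at each "%" means that something like "this%x%this" still yields two separate words instead of "thisthis".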

Does anyone have an idea how to do this?

    If you are opening a PDF file as a text file and attempting to parse out the contents, be aware that PDF files often do not store their contents as sequential text strings in the order they appear in the output. Parsing raw PDF can be an essentially impossible task. Commented Apr 11, 2014 at 16:11

2 Answers


You could try

import re
x = re.sub("%[^%]*%?", "", x)  # remove each %...% comment, or a trailing unclosed %... comment

Demo: http://regex101.com/r/tE6rL7
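Once the comments are stripped, counting reduces to splitting on whitespace. A minimal sketch, assuming the PDF text is already available as a list of strings, one per line (the list and the name count_word are hypothetical). Unlike the answer, it replaces each comment with a space rather than an empty string, so words on both sides of a comment are not glued together.

```python
import re

# "%", then any run of non-"%" characters, then an optional closing "%".
COMMENT = re.compile(r"%[^%]*%?")


def count_word(lines, word="this"):
    """Sum occurrences of `word` across lines after removing % comments."""
    total = 0
    for line in lines:
        cleaned = COMMENT.sub(" ", line)  # space keeps neighbouring words apart
        total += cleaned.split().count(word)
    return total


lines = ["this this this %this", "this this this %this %this"]
print(count_word(lines))  # 7
```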


1 Comment

Thank you very much, it worked like magic. I'm working on a project and need further help, if you don't mind?
data = "this this this %this %this"

data = ' '.join(data.split('%')[::2])

print(data)  # => this this this  this
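Splitting on "%" yields segments that alternate between outside and inside a comment, starting outside, so [::2] keeps only the uncommented ones. Counting the target word then takes one more split. A short sketch (the function name count_uncommented is hypothetical):

```python
def count_uncommented(line, word="this"):
    # Even-indexed segments lie outside %...% comments; odd-indexed ones lie inside.
    outside = line.split("%")[::2]
    return " ".join(outside).split().count(word)


print(count_uncommented("this this this %this %this"))  # 4
```

An unclosed trailing comment is handled too: "this %this" splits into two segments, and only the first is kept.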

1 Comment

Thank you very much. Could I ask you more questions, if you don't mind?
