Python: REGULAR EXPRESSION on text

Question

Q4: Delete all the reference numbers in the text (including their brackets), for example [8]. Before deleting them, print the list of those reference numbers, then print the following: There are {length of list} references numbers to be deleted. My code is below:

import re
with open('macOS.txt', 'r') as f:
  content = f.read()
  
temp = re.sub('<[^>]*>', '', content)
print(f'There are {len(temp)} references numbers to be deleted.')
print(temp)

I am not sure whether this is the right answer. To delete [8], [9], etc. I used re.sub('<[^>]*>', '', content).

Q5: Using the new text from Q4, split the text to check how many sentences it contains. Be careful not to split on the period in something like the following:

by Apple Inc.

since 2001 OS X 10.1 etc.

Then print the following: There are {length of list} sentences in the text.

For Q5, I don't know how to use the new text from Q4. Can anyone please guide me on how to do this?

You cannot match [8] with '<[^>]*>'. You could use \[\d+] to remove the square brackets that contain 1 or more digits.
– The fourth bird, Commented May 3, 2021 at 10:11

The fourth bird · Accepted Answer · 2021-05-03 10:44:02Z

If you want to match 1 or more digits between square brackets, you can use \[\d+].

You can get the number of matches by running len on the result of re.findall, and use re.sub to replace the matches with a space.

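import re

# Match a reference number such as [8]: a literal "[", one or more digits, then "]".
pattern = r"\[\d+]"

with open('macOS.txt', 'r') as f:
    content = f.read()

# Count the matches before removing them.
print(f'There are {len(re.findall(pattern, content))} references numbers to be deleted.')
# Replace every reference number with a space.
result = re.sub(pattern, ' ', content)

# use result for further processing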
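
The question also asks to print the list of reference numbers before deleting them. A minimal sketch of that step follows. It assumes a short made-up string in place of the contents of macOS.txt, and the variable name references is only illustrative.

```python
import re

# Hypothetical sample text standing in for the contents of macOS.txt.
content = "macOS is a Unix operating system.[8] It succeeded Mac OS 9.[9]"

pattern = r"\[\d+]"

# re.findall returns every match as a list of strings.
references = re.findall(pattern, content)
print(references)
print(f'There are {len(references)} references numbers to be deleted.')

# Remove the reference numbers, replacing each with a space as in the answer above.
result = re.sub(pattern, ' ', content)
print(result)
```

Keeping the matches in a list means the same list serves for both printing and counting, so re.findall only runs once.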

edited May 3, 2021 at 10:44

answered May 3, 2021 at 10:18


8 Comments

ssssmaner Over a year ago

You are right!! The output is: There are 9 references numbers to be deleted. But do you know how to use the new text from this code? I have to split the text to check how many sentences are in it.

The fourth bird Over a year ago

@ssssmaner This line, re.sub(pattern, ' ', content), returns the text with the replacements. You could assign that to a variable: result = re.sub(pattern, ' ', content)

ssssmaner Over a year ago

result = re.sub(pattern, ' ', content)
with open('result', 'r') as f:
    content = f.read()
temp = re.match(r"(.*\.{1}\s{1})[A-Z].*", content)
print(f'There are {len(temp)} sentences in the text.')

ssssmaner Over a year ago

I tried to use the code above, but it fails with the error: No such file or directory: 'result'

The fourth bird Over a year ago

@ssssmaner You cannot do it like that. The content is what is read from the file, and you cannot use it in the pattern before it, as it does not exist yet.
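
To make the point in this exchange concrete: result is a string already held in memory, not a file name, so it can be passed straight to another re call without open(). Note also that re.match returns a single match object (or None), so calling len on it fails, whereas re.split returns a list. The sketch below is one possible heuristic for Q5, not a definitive sentence splitter. The sample text, the boundary pattern, and the choice to treat only "Inc." as an abbreviation are all assumptions made for illustration.

```python
import re

# Hypothetical sample text standing in for the contents of macOS.txt.
content = ("Apple Inc. CEO Steve Jobs introduced OS X 10.1 in 2001.[8] "
           "It replaced Mac OS X 10.0.[9] Many users upgraded.")

# Step from Q4: remove the reference numbers.
result = re.sub(r"\[\d+]", ' ', content)

# Assumed heuristic: a sentence ends at ., ! or ? followed by whitespace
# and a capital letter, unless the period belongs to "Inc.".
# The period in "10.1" is never followed by whitespace, so it is not a boundary.
boundary = r"(?<!Inc\.)(?<=[.!?])\s+(?=[A-Z])"

# result is used directly as a string; no file needs to be opened.
sentences = re.split(boundary, result.strip())

print(sentences)
print(f'There are {len(sentences)} sentences in the text.')
```

This heuristic misses a real sentence break that happens to follow "Inc.", and other abbreviations would each need their own lookbehind, so the count should be checked against the actual text.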
