
Q4: Delete all the reference numbers in the text (including their brackets), for example [8]. Before deleting them, print the list of those reference numbers, then print the following: There are {length of list} references numbers to be deleted. My code is below:

import re
with open('macOS.txt', 'r') as f:
  content = f.read()
  
temp = re.sub('<[^>]*>', '', content)
print(f'There are {len(temp)} references numbers to be deleted.')
print(temp)

I am not sure whether this is the right answer. To delete [8], [9] I used re.sub('<[^>]*>', '', content).

Q5: Using the new text from Q4, split the text to check how many sentences it contains. Be careful not to split on the period in something like the following:

by Apple Inc.

since 2001 OS X 10.1 etc.

Then print the following: There are {length of list} sentences in the text.

For Q5, I don't know how to use the new text from Q4. Can anyone please guide me on how to do this?

  • You cannot match [8] with '<[^>]*>'; you could use \[\d+] to remove the square brackets that contain 1 or more digits. Commented May 3, 2021 at 10:11
  • temp = re.sub('[\d+]', '', content) is this right? Commented May 3, 2021 at 10:14
  • Please review homework guidance. Commented May 3, 2021 at 10:48

1 Answer

If you want to match 1 or more digits between square brackets, you can use \[\d+].

You can get the number of matches by calling len on the result of re.findall, and use re.sub to replace the matches with a space.

import re

pattern = r"\[\d+]"

with open('macOS.txt', 'r') as f:
    content = f.read()

# count the matches before replacing them
print(f'There are {len(re.findall(pattern, content))} references numbers to be deleted.')
result = re.sub(pattern, ' ', content)

# use result for further processing
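Since the question also asks to print the list itself before deleting, a minimal sketch of that step could look like the following. The sample string is a hypothetical stand-in for the contents of macOS.txt, and this version deletes the matches outright instead of replacing them with a space.

```python
import re

# Hypothetical sample text standing in for the contents of macOS.txt.
content = "macOS is developed by Apple Inc.[8] It succeeded Mac OS 9.[9][10]"

pattern = r"\[\d+\]"

# Collect every bracketed reference number, e.g. "[8]".
references = re.findall(pattern, content)
print(references)
print(f"There are {len(references)} references numbers to be deleted.")

# Delete the references (brackets included) to get the new text.
result = re.sub(pattern, "", content)
print(result)
```

Keeping the findall result in a variable means the same list is used for both the printout and the count.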

8 Comments

You are right! The output is: There are 9 references numbers to be deleted. But do you know how to use the new text from this code? I have to split the text to check how many sentences are in it.
@ssssmaner This line re.sub(pattern, ' ', content) returns the text with the replacements. You could assign that to a variable result = re.sub(pattern, ' ', content)
result = re.sub(pattern, ' ', content) with open('result', 'r') as f: content = f.read() temp= re.match(r"(.*\.{1}\s{1})[A-Z].*",content) print(f'There are {len(temp)} sentences in the text.')
I tried to use the code above, but I get the error: No such file or directory: 'result'
@ssssmaner You cannot do it like that. content is what is read from the file, so you cannot pass it to re.sub before the file has been read, as it does not exist yet.
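To illustrate the point about result: it is already a string in memory, so it can be passed straight to re.split without opening any file. The sketch below is only one possible approach to Q5. The sample text is hypothetical, and the pattern assumes that "Inc." is the only abbreviation that needs protecting.

```python
import re

# Hypothetical stand-in for `result`, the text returned by re.sub in Q4.
result = (
    "macOS is marketed by Apple Inc. since 2001. "
    "The first update was OS X 10.1 in that year. "
    "It is widely used."
)

# Split on whitespace that follows a period and precedes a capital letter,
# unless the period belongs to "Inc.". The period in "10.1" is never a
# split point because no whitespace follows it.
sentence_break = r"(?<!Inc\.)(?<=\.)\s+(?=[A-Z])"

sentences = re.split(sentence_break, result.strip())
print(f"There are {len(sentences)} sentences in the text.")
```

A sentence that genuinely ends with "Inc." would not be split by this pattern, so the real text may need more lookbehinds or a different rule.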