I proofread a lot of plaintext files submitted by peers. As my eyes get tired, I sometimes overlook extra spaces or duplicate words. I currently use the following regex searches:

[/t]{2} -Finds duplicate spaces  
(?>(/p{P})\1+)(?<![^.]|^)\.{3}) -Finds duplicate punctuation except ellipses  
\b(\w+)\s+\1\b -Finds duplicate words  

I also have a few custom searches; for example, find "Mister" and replace it with "Mr."

Is there a simple way to execute these four types of replace functions in JavaScript?

  • possible duplicate of Javascript replacing string pattern using RegExp? Commented Feb 3, 2015 at 17:09
  • The first two patterns are wrong. Where are the replacement strings? Commented Feb 3, 2015 at 17:37
  • I started with a simple solution, using Workflow for iOS. There I am using the above examples and the replacement string is $1, or for the first example I just have it replace with a single space. Commented Feb 3, 2015 at 21:10
  • I don't understand what the problem is. JavaScript supports regular expressions, so learn how to use the API and use it. If you have a specific problem, ask a question. MDN has a good article, but there are many others. Commented Feb 3, 2015 at 22:58

1 Answer

Your first two regexes don't look right.
You could combine them all into a single regex, though.

Of the two types below:

Type 1 uses the POSIX punct character class.
Type 2 uses a branch reset construct and the Punct property construct.

Just do a global find/replace with whichever regex works on your platform.

Type 1 -

 # Find:  (?:([^\S\r\n])[^\S\r\n]+|\b(\w+)(?:\s+\2)+\b|(\.{3})\.*|([[:punct:]])\4+)
 # Replace:  $1$2$3$4

 (?:
      ( [^\S\r\n] )                 # (1)
      [^\S\r\n]+ 
   |  
      \b 
      ( \w+ )                       # (2)
      (?: \s+ \2 )+
      \b 
   |  
      ( \.{3} )                     # (3)
      \.*
   |  
      ( [[:punct:]] )               # (4)
      \4+ 
 )
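
For JavaScript specifically, Type 1 needs one change before it will compile: JavaScript regular expressions have no POSIX bracket classes, so `[[:punct:]]` has to be swapped for something else. The sketch below assumes the Unicode property escape `\p{P}` (which requires the `u` flag) is an acceptable stand-in. It is not identical to POSIX punct, since characters such as `$`, `+` and `=` are classified as symbols rather than punctuation in Unicode. The names `DUPLICATES` and `collapseDuplicates` are made up for illustration.

```javascript
// Type 1 adapted for JavaScript.
// Assumption: \p{P} (Unicode punctuation) stands in for [[:punct:]].
const DUPLICATES =
  /([^\S\r\n])[^\S\r\n]+|\b(\w+)(?:\s+\2)+\b|(\.{3})\.*|(\p{P})\4+/gu;

function collapseDuplicates(text) {
  // Only one of the four groups takes part in any given match.
  // In a JavaScript replacement string, groups that did not take part
  // expand to an empty string, so "$1$2$3$4" keeps just the captured part.
  return text.replace(DUPLICATES, "$1$2$3$4");
}

console.log(collapseDuplicates("This  is is a test!! Wait...."));
```

Because the first alternative excludes `\r` and `\n`, line breaks are left alone and only runs of spaces or tabs on the same line are collapsed.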

Type 2 -

 # Find:  (?|([^\S\r\n])[^\S\r\n]+|\b(\w+)(?:\s+\1)+\b|(\.{3})\.*|(\p{Punct})\1+)
 # Replace:  $1

 (?|
      ( [^\S\r\n] )                 # (1)
      [^\S\r\n]+ 
   |  
      \b 
      ( \w+ )                       # (1)
      (?: \s+ \1 )+
      \b 
   |  
      ( \.{3} )                     # (1)
      \.*
   |  
      ( \p{Punct} )                 # (1)
      \1+ 
 )
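
Neither pattern covers the custom searches from the question, such as replacing "Mister" with "Mr.". One way to handle those in JavaScript is a small table of pattern/replacement pairs applied in order. This is only a sketch: `CUSTOM_RULES` and `applyCustomRules` are hypothetical names, and matching on word boundaries is an assumption about what the asker wants.

```javascript
// Hypothetical rule table: each entry is [pattern, replacement].
// The g flag makes replace() act on every occurrence, not just the first.
const CUSTOM_RULES = [
  [/\bMister\b/g, "Mr."],
];

function applyCustomRules(text, rules = CUSTOM_RULES) {
  // Apply the rules in order; later rules see the output of earlier ones.
  return rules.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text
  );
}

console.log(applyCustomRules("Mister Smith met Mister Jones."));
```

Adding another custom search is then a matter of adding one more pair to the table, without touching the function.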