I am trying to add the correct whitespace to data I am receiving. Currently it shows like this:

NotStarted

ReadyforPPPDReview

This is the code I am using:

.replace(/([A-Z])/g, ' $1')

"NotStarted" correctly becomes "Not Started", but "ReadyforPPPDReview" becomes "Readyfor P P P D Review" when it should be "Ready for PPPD Review".

What is the best way to handle both of these cases with one regex or function?

  • BTW, these values come back depending on whether the item has been reviewed or not started, so they are dynamic. Commented Jun 17, 2020 at 22:11
  • How do you plan to let the regex engine know that "Readyfor" is two concatenated words? Commented Jun 17, 2020 at 22:32
  • Well, they really shouldn't be. It should display as "Ready for PPPD Review". The problem is that the regex I added splits on camel case, so it comes back as "Readyfor P P P D Review". I am not sure how to handle this as well as "Not Started". Commented Jun 17, 2020 at 22:43
  • Should the word "for" be camel-cased too in ReadyforPPPDReview, i.e. ReadyForPPPDReview? Commented Jun 18, 2020 at 3:23
  • No, unfortunately "for" is not camel-cased. That's one of the reasons this has been more difficult than it really should be, lol. Commented Jun 18, 2020 at 15:05

1 Answer

You would need an NLP engine to handle this properly. Here are two approaches using simple regexes; both have limitations:

1. Use a list of stop words

We blindly add spaces before and after the stop words:

var str = 'NotStarted, ReadyforPPPDReview';
var wordList = 'and, for, in, on, not, review, the'; // stop words

var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
var result1 = str
  .replace(wordListRe, ' $1 ') // add space before and after stop words
  .replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
  .replace(/  +/g, ' ') // remove excessive spaces
  .trim(); // remove spaces at start and end
console.log('str:     ' + str);
console.log('result1: ' + result1);

As you can imagine, the stop words approach has some severe limitations. For example, the words "formula input" would result in "for mula in put".
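To make that limitation concrete, here is a minimal sketch that wraps the same steps in a helper and runs it on an input where stop words occur inside longer words. The name `spaceStopWords` is hypothetical, introduced only for illustration.

```javascript
// Hypothetical helper wrapping the stop-word steps from the snippet above.
var stopWords = ['and', 'for', 'in', 'on', 'not', 'review', 'the'];
var stopWordsRe = new RegExp('(' + stopWords.join('|') + ')', 'gi');

function spaceStopWords(str) {
  return str
    .replace(stopWordsRe, ' $1 ')        // add space before and after stop words
    .replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower and upper case
    .replace(/  +/g, ' ')                // collapse runs of spaces
    .trim();                             // remove spaces at start and end
}

console.log(spaceStopWords('ReadyforPPPDReview')); // Ready for PPPD Review
console.log(spaceStopWords('FormulaInput'));       // For mula In put
```

The second call shows the false positive: "for" and "in" are matched inside "Formula" and "Input" because the regex has no notion of word boundaries in concatenated text.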

2. Use a mapping table

The mapping table lists words that need to be spaced out (no drugs involved), as in this code snippet:

var str = 'NotStarted, ReadyforPPPDReview';
var spaceWordMap = {
  NotStarted: 'Not Started',
  Readyfor:   'Ready for',
  PPPDReview: 'PPPD Review'
  // add more as needed
};

// No 'i' flag: a case-insensitive match such as 'notstarted' would not be a key in spaceWordMap
var spaceWordMapRe = new RegExp('(' + Object.keys(spaceWordMap).join('|') + ')', 'g');
var result2 = str
  .replace(spaceWordMapRe, function(m, p1) { // m: matched snippet, p1: first group
    return spaceWordMap[p1]; // replace key in spaceWordMap with its value
  })
  .replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
  .replace(/  +/g, ' ') // remove excessive spaces
  .trim(); // remove spaces at start and end
console.log('str:     ' + str);
console.log('result2: ' + result2);

This approach is suitable if you have a deterministic list of words as input.
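If the list of values is not fully known in advance, the two ideas can be combined: split on case boundaries with a regex, then consult a small exceptions table only for the pieces a regex cannot detect. The sketch below is one possible way to do that and is not part of the approaches above. `humanizeStatus` and `exceptions` are hypothetical names, and the acronym rule assumes an acronym is a run of capitals followed by a capitalized word.

```javascript
// Hypothetical exceptions table: only the pieces that case alone cannot split.
var exceptions = {
  Readyfor: 'Ready for'
};

function humanizeStatus(value, exceptions) {
  return value
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2') // acronym then word: "PPPDReview" -> "PPPD Review"
    .replace(/([a-z\d])([A-Z])/g, '$1 $2')     // lower case or digit then upper case
    .split(' ')
    .map(function(word) {
      // Replace a piece only when it is an exact key of the exceptions table.
      return Object.prototype.hasOwnProperty.call(exceptions, word)
        ? exceptions[word]
        : word;
    })
    .join(' ');
}

console.log(humanizeStatus('NotStarted', exceptions));         // Not Started
console.log(humanizeStatus('ReadyforPPPDReview', exceptions)); // Ready for PPPD Review
```

The exceptions table stays small because the regexes handle every boundary marked by a capital letter. Only joins with no case change, such as "Readyfor", need an entry.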

1 Comment

@Pdavis33: I am glad this answer is useful to you. Please consider an upvote as well.
