How to split a string on an array of separators into an array, retaining the separators, in JavaScript

Question

I have a string

sReasons =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

and I need to split the above string based on the separator array

const separator = ["O9", "EO", "HJ", "J8"];

Here the first 2 characters (O9) represent a code, the next 4 another code (C270), and the next 4 (0021) the length of the text that follows, which is "Not eligible for SDWC".

The separator codes are unique two-character uppercase codes and will not appear anywhere in the text message; they occur only as inEligType.

I need to create JSON in the following format:

[
    {inEligType: "O9", msgCode: "C270", msgLen: "0021", textMsg: "Not eligible for SDWC"},
    {inEligType: "EO", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "HJ", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "J8", msgCode: "C500", msgLen: "0016", textMsg: "Delivery Attempt"}
]

I'm basically failing at splitting the string itself based on the given array. I tried the following:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

function formatReasons(Reasons: string) {
var words: any[] = Reasons.split(this.spearator); 
for(let word in words)
    {
       console.log(word) ;
    }
}
var result = formatReasons(sHdnReasonsCreate);
console.log("Returned Result: "+result);

But it gives me this result:

["O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt"]length: 1__proto__: Array(0)

Returned Address is: undefined

And what will you do if one of the two-letter strings used as a separator happens to appear in the middle of the textMessage field? You'd be much better off splitting this according to the actual data format, by taking substrings of the appropriate length.
– jcalz, Commented Mar 26, 2021 at 13:43
They will not appear there, as these codes are unique and are formed so that they do not occur in textMessage or msgCode.
– user9414660, Commented Mar 26, 2021 at 13:57

mike.k · Accepted Answer · 2021-03-26 14:16:37Z


My Regex-based approach:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

// build the regex based on separators
let regexPattern = '^';
separator.forEach(text => {
    regexPattern += `${text}(.*)`;
});
regexPattern += '$';

// match the reasons
let r = new RegExp(regexPattern);
let matches = sReasons.match(r);

// prepare to match each message
let msgMatcher = new RegExp('^(?<msgCode>.{4})(?<msgLen>.{4})(?<textMsg>.*)$');
let output = [];

for (let i=1; i<matches.length; i++) {
    // match the message
    const msg = matches[i].match(msgMatcher);

    // store
    let item = msg.groups;
    item.inEligType = separator[i-1];
    output.push(item);
}

console.log(JSON.stringify(output, null, 2));

Produces

[
  {
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC",
    "inEligType": "O9"
  },
  {
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "EO"
  },
  {
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "HJ"
  },
  {
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt",
    "inEligType": "J8"
  }
]

answered Mar 26, 2021 at 14:16


4 Comments

georg Over a year ago

Of course, this only works for this particular example.

mike.k Over a year ago

@georg How so, and what abstract other examples do you propose applying it to? It can withstand changes in separator without any modifications of the logic, and the two string lengths hard-coded to 4 are defined that way, but with a direct way of changing them.

georg Over a year ago

You're assuming that each "separator" occurs exactly once and they always come in the given order. That might not be true in the general case.

mike.k Over a year ago

I see what you mean now. Yes there is a better way to generalize it, like the answer I see by @tarkh, I'd revise it to an approach similar to that one.
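The limitation discussed in these comments can be reproduced with a short sketch. It assumes the same pattern-building step as the answer above; the helper name buildOrderedPattern and the reordered input are hypothetical, made up for illustration.

```javascript
// Hypothetical helper wrapping the answer's pattern-building step:
// it produces ^SEP1(.*)SEP2(.*)...$ from the separator list.
function buildOrderedPattern(separators) {
  return new RegExp('^' + separators.map(s => `${s}(.*)`).join('') + '$');
}

const orderedPattern = buildOrderedPattern(["O9", "EO", "HJ", "J8"]);

// All four codes, in the listed order: the pattern matches.
const inOrder = "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

// Invented input with only two codes, in a different order: no match.
const reordered = "EOC3900015Service upgradeO9C2700021Not eligible for SDWC";

console.log(orderedPattern.test(inOrder));    // true
console.log(reordered.match(orderedPattern)); // null
```

Because match returns null for the reordered input, the loop over matches.length in the answer would throw a TypeError unless a null check is added first.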

jcalz · Accepted Answer · 2021-03-26 14:32:53Z

It may well be that neither the textMsg field nor any other field will ever contain the two-letter strings you are using for the inEligType field. But are you absolutely sure of that? The data format looks to me like it really wants someone to parse it by substrings of certain lengths; why even have a msgLen field if you could just split based on delimiters? What if the list of inEligType codes changes in the future?

For these reasons I strongly recommend that you parse by substring lengths and not by delimiter matching. Here's one possible way to do that:

function formatReasons(reasons: string) {
  const ret = []
  while (reasons) {
    const inEligType = reasons.substring(0, 2);
    reasons = reasons.substring(2);
    const msgCode = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const msgLen = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const textMsg = reasons.substring(0, +msgLen);
    reasons = reasons.substring(+msgLen);
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}

You can verify that it produces the expected output for your example sReasons string:

const formattedReasons = formatReasons(sReasons);
console.log(JSON.stringify(formattedReasons, undefined, 2));
/* [
  {
    "inEligType": "O9",
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC"
  },
  {
    "inEligType": "EO",
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "HJ",
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "J8",
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt"
  }
] */

Note that the implementation above does not check that the string is properly formatted; right now if you pass garbage in, you get garbage out. If you want more safety you could do runtime checks and throw errors if you, say, run off the end of the reasons string unexpectedly, or find a msgLen field that doesn't represent a number. And one could refactor so that there's no repetition of code like const s = reasons.substring(0, n); reasons = reasons.substring(n). But the basic algorithm is there.
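The runtime checks and the refactoring mentioned above might look roughly like the following. This is a minimal sketch, not a definitive implementation: the names parseReasons and take, the error messages, and the assumption that msgLen is always exactly four decimal digits are all introduced here for illustration.

```javascript
// Sketch only: parseReasons, take and the error texts are hypothetical.
function parseReasons(reasons) {
  const items = [];
  let pos = 0;

  // Read the next n characters, failing if the string ends early.
  const take = (n, field) => {
    if (pos + n > reasons.length) {
      throw new Error(`Unexpected end of input while reading ${field} at offset ${pos}`);
    }
    const chunk = reasons.substring(pos, pos + n);
    pos += n;
    return chunk;
  };

  while (pos < reasons.length) {
    const inEligType = take(2, "inEligType");
    const msgCode = take(4, "msgCode");
    const msgLen = take(4, "msgLen");
    // Assumption: msgLen is always exactly four decimal digits.
    if (!/^\d{4}$/.test(msgLen)) {
      throw new Error(`msgLen "${msgLen}" is not a four-digit number`);
    }
    const textMsg = take(Number(msgLen), "textMsg");
    items.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return items;
}

const parsed = parseReasons("O9C2700021Not eligible for SDWCEOC0900015Service upgrade");
console.log(parsed.length); // 2
```

Tracking an offset instead of repeatedly shortening the string also lets the error message report where in the input the problem was found.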


tarkh · Accepted Answer · 2021-03-26 15:20:20Z


Another option using RegExp, with less code:

// Your data
const data =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

// Set your data splitters from array
const spl = ["O9", "EO", "HJ", "J8"].join('|');

// Use regexp to parse data
const results = [];
data.replace(new RegExp(`(${spl})(\\w{4})(\\w{4})(.*?)(?=${spl}|$)`, 'g'), (m,a,b,c,d) => {
  // Form objects and push to res
  results.push({
    inEligType: a,
    msgCode: b,
    msgLen: c,
    textMsg: d
  });
});

// Result
console.log(results);
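The concern raised elsewhere in this thread, that a separator code might appear inside the message text, can be checked against this pattern with a small sketch. The input string below is invented for the purpose, and matchAll is swapped in for replace only to collect the matches; the regular expression itself is unchanged.

```javascript
// Hypothetical input: one record whose 14-character text contains "HJ".
const tricky = "O9C2700014Call HJ office";

const codes = ["O9", "EO", "HJ", "J8"].join('|');
const re = new RegExp(`(${codes})(\\w{4})(\\w{4})(.*?)(?=${codes}|$)`, 'g');

// Collect the captured fields of every match.
const found = [...tricky.matchAll(re)].map(
  ([, inEligType, msgCode, msgLen, textMsg]) => ({ inEligType, msgCode, msgLen, textMsg })
);

console.log(found.length);     // 1
console.log(found[0].textMsg); // "Call " (cut off at the embedded "HJ")

// Reading msgLen characters after the 10-character header keeps the full text.
const byLength = tricky.substring(10, 10 + Number(found[0].msgLen));
console.log(byLength);         // "Call HJ office"
```

As long as the codes are guaranteed never to occur in the text, as the asker states, the delimiter-based pattern is not affected by this.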

edited Mar 26, 2021 at 15:20

answered Mar 26, 2021 at 14:26


Peter Seliger · Accepted Answer · 2021-03-26 17:56:56Z

A first approach, based on a regex with capturing groups consumed by split, processed by a helper function and finally reduced to the expected result ...

function chunkRight(arr, chunkLength) {
  const list = []; 
  arr = [...arr];
  while (arr.length >= chunkLength) {
    list.unshift(
      arr.splice(-chunkLength)
    );
  }
  return list;
}

// see also ... [https://regex101.com/r/tatBAB/1]
// with e.g.
// (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})
// ... or ...
// (O9|EO|HJ|J8)(\w{4})(\d{4})
//
function extractStatusItems(str, separators) {
  const regXSplit = RegExp(`(${ separators.join('|') })(\\w{4})(\\d{4})`);

  const statusValues = String(str).split(regXSplit).slice(1);
  const groupedValues = chunkRight(statusValues, 4);

  return groupedValues.reduce((list, [inEligType, msgCode, msgLen, textMsg]) =>
    list.concat({ inEligType, msgCode, msgLen, textMsg }), []
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
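The first approach relies on the fact that String.prototype.split includes the text captured by any capturing groups in its result array. A minimal sketch of that behavior, using a hypothetical two-record input shortened from the example above:

```javascript
// Hypothetical two-record input, shortened from the example above.
const sample = 'O9C2700021Not eligible for SDWCEOC3900015Service upgrade';

// Each match contributes its three captures, followed by the text up to the next match.
const parts = sample.split(/(O9|EO)(\w{4})(\d{4})/);

console.log(parts);
// [ '', 'O9', 'C270', '0021', 'Not eligible for SDWC',
//       'EO', 'C390', '0015', 'Service upgrade' ]
```

The leading empty string is the text before the first match, which is why the function above calls slice(1) before grouping the remaining values into chunks of four.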


... followed by a second approach, based almost entirely on a regex which captures named groups, consumed by matchAll and finally mapped into the expected result ...

// see also ... [https://regex101.com/r/tatBAB/2]
// with e.g.
// (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})(.*?)(?<textMsg>.*?)(?=O9|EO|HJ|J8|$)
//
function extractStatusItems(str, separators) {
  separators = separators.join('|');

  const regXCaptureValues = RegExp(
    `(?<inEligType>${ separators })(?<msgCode>\\w{4})(?<msgLen>\\d{4})(.*?)(?<textMsg>.*?)(?=${ separators }|$)`, 
    'g'
  );
  return [
    ...String(str).matchAll(regXCaptureValues)
  ].map(
    ({ groups }) => ({ ...groups })
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
