I have a string

sReasons =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

and I need to split the above string based on the separator array

const separator = ["O9", "EO", "HJ", "J8"];

The first 2 characters (O9) represent a code, the next 4 another code (C270), and the next 4 (0021) give the length of the text that follows, which here is "Not eligible for SDWC".

The separator codes are unique two-character codes (capital letters and digits). They appear only as the inEligType and will not occur inside the text message.

I need to create JSON in the following format:

[
    {inEligType: "O9", msgCode: "C270", msgLen: "0021", textMsg: "Not eligible for SDWC"},
    {inEligType: "EO", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "HJ", msgCode: "C390", msgLen: "0015", textMsg: "Service upgrade"},
    {inEligType: "J8", msgCode: "C500", msgLen: "0016", textMsg: "Delivery Attempt"}
]

I'm basically failing at splitting the string itself based on the given array. I tried the following:

sReasons =  "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";    
const separator = ["O9", "EO", "HJ", "J8"];

function formatReasons(Reasons: string) {
var words: any[] = Reasons.split(this.spearator); 
for(let word in words)
    {
       console.log(word) ;
    }
}
var result = formatReasons(sHdnReasonsCreate);
console.log("Returned Result: "+result);

But it gives me this result:

["O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt"]length: 1__proto__: Array(0)

Returned Result: undefined
  • And what will you do if one of the two-letter strings used as a separator happens to appear in the middle of the textMessage field? You'd be much better off splitting this according to the actual data format, by taking substrings of the appropriate length. Commented Mar 26, 2021 at 13:43
  • They will not appear there; the codes are unique and are constructed so that they do not occur in textMessage or msgCode. Commented Mar 26, 2021 at 13:57

4 Answers

My Regex-based approach:

const sReasons = "O9C2700021Not eligible for SDWCEOC0900015Service upgradeHJC3900015Service upgradeJ8C5HJ0016Delivery Attempt";
const separator = ["O9", "EO", "HJ", "J8"];

// build the regex based on separators
let regexPattern = '^';
separator.forEach(text => {
    regexPattern += `${text}(.*)`;
});
regexPattern += '$';

// match the reasons
let r = new RegExp(regexPattern);
let matches = sReasons.match(r);

// prepare to match each message
let msgMatcher = new RegExp('^(?<msgCode>.{4})(?<msgLen>.{4})(?<textMsg>.*)$');
let output = [];

for (let i=1; i<matches.length; i++) {
    // match the message
    const msg = matches[i].match(msgMatcher);

    // store
    let item = msg.groups;
    item.inEligType = separator[i-1];
    output.push(item);
}

console.log(JSON.stringify(output, null, 2));

Produces

[
  {
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC",
    "inEligType": "O9"
  },
  {
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "EO"
  },
  {
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade",
    "inEligType": "HJ"
  },
  {
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt",
    "inEligType": "J8"
  }
]

4 Comments

Of course, this only works for this particular example.
@georg How so, and what other examples do you propose applying it to? It can withstand changes in separator without any modification of the logic. The two field lengths are hard-coded to 4 because the format defines them that way, and they are easy to change.
You're assuming that each "separator" occurs exactly once and they always come in the given order. That might not be true in the general case.
I see what you mean now. Yes there is a better way to generalize it, like the answer I see by @tarkh, I'd revise it to an approach similar to that one.

It may well be that neither the textMsg field nor any other field will ever contain the two-letter strings you are using for the inEligType field. But are you absolutely sure of that? The data format looks to me like it really wants someone to parse it by substrings of certain lengths; why even have a msgLen field if you could just split based on delimiters? What if the list of inEligType codes changes in the future?

For these reasons I strongly recommend that you parse by substring lengths and not by delimiter matching. Here's one possible way to do that:

function formatReasons(reasons: string) {
  const ret = []
  while (reasons) {
    const inEligType = reasons.substring(0, 2);
    reasons = reasons.substring(2);
    const msgCode = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const msgLen = reasons.substring(0, 4);
    reasons = reasons.substring(4);
    const textMsg = reasons.substring(0, +msgLen);
    reasons = reasons.substring(+msgLen);
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}

You can verify that it produces the expected output for your example sReasons string:

const formattedReasons = formatReasons(sReasons);
console.log(JSON.stringify(formattedReasons, undefined, 2));
/* [
  {
    "inEligType": "O9",
    "msgCode": "C270",
    "msgLen": "0021",
    "textMsg": "Not eligible for SDWC"
  },
  {
    "inEligType": "EO",
    "msgCode": "C090",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "HJ",
    "msgCode": "C390",
    "msgLen": "0015",
    "textMsg": "Service upgrade"
  },
  {
    "inEligType": "J8",
    "msgCode": "C5HJ",
    "msgLen": "0016",
    "textMsg": "Delivery Attempt"
  }
] */

Note that the implementation above does not check that the string is properly formatted; right now if you pass garbage in, you get garbage out. If you want more safety you could do runtime checks and throw errors if you, say, run off the end of the reasons string unexpectedly, or find a msgLen field that doesn't represent a number. And one could refactor so that there's no repetition of code like const s = reasons.substring(0, n); reasons = reasons.substring(n). But the basic algorithm is there.
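
As a rough illustration of the runtime checks and the refactoring mentioned above, here is a minimal sketch. The names `parseReasonsStrict`, `take`, and `Reason` are made up for this example, and the rule that msgLen must be exactly four digits is an assumption based on the sample data, not something the format is confirmed to guarantee.

```typescript
// Hypothetical record shape, mirroring the fields used in this answer.
interface Reason {
  inEligType: string;
  msgCode: string;
  msgLen: string;
  textMsg: string;
}

function parseReasonsStrict(reasons: string): Reason[] {
  const ret: Reason[] = [];
  let pos = 0;

  // Read the next n characters and advance, failing loudly on short input.
  const take = (n: number, field: string): string => {
    if (pos + n > reasons.length) {
      throw new Error(`Unexpected end of input while reading ${field} at offset ${pos}`);
    }
    const s = reasons.substring(pos, pos + n);
    pos += n;
    return s;
  };

  while (pos < reasons.length) {
    const inEligType = take(2, "inEligType");
    const msgCode = take(4, "msgCode");
    const msgLen = take(4, "msgLen");
    // Assumption: msgLen is always exactly four decimal digits.
    if (!/^\d{4}$/.test(msgLen)) {
      throw new Error(`msgLen "${msgLen}" is not a 4-digit number`);
    }
    const textMsg = take(Number(msgLen), "textMsg");
    ret.push({ inEligType, msgCode, msgLen, textMsg });
  }
  return ret;
}

const sample =
  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";
console.log(parseReasonsStrict(sample).length); // 4
```

Tracking an offset instead of repeatedly shortening the string also makes it easy to report where in the input a problem was found.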

Another option with RegExp with less code

// Your data
const data =  "O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt";

// Set your data splitters from array
const spl = ["O9", "EO", "HJ", "J8"].join('|');

// Use regexp to parse data
const results = [];
data.replace(new RegExp(`(${spl})(\\w{4})(\\w{4})(.*?)(?=${spl}|$)`, 'g'), (m,a,b,c,d) => {
  // Form objects and push to results
  results.push({
    inEligType: a,
    msgCode: b,
    msgLen: c,
    textMsg: d
  });
});

// Result
console.log(results);
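
One caveat, echoing the comment on the question: the lazy `(.*?)` stops at the first separator it sees, so a message that happens to contain one would be cut short. The sketch below shows that failure mode with the same pattern. The function name `parseByDelimiter` and the input string are made up for illustration and do not come from the question's data.

```typescript
interface Reason {
  inEligType: string;
  msgCode: string;
  msgLen: string;
  textMsg: string;
}

// Same pattern as above, wrapped in a hypothetical helper.
function parseByDelimiter(data: string, separators: string[]): Reason[] {
  const spl = separators.join('|');
  const re = new RegExp(`(${spl})(\\w{4})(\\w{4})(.*?)(?=${spl}|$)`, 'g');
  const results: Reason[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(data)) !== null) {
    results.push({ inEligType: m[1], msgCode: m[2], msgLen: m[3], textMsg: m[4] });
  }
  return results;
}

// Made-up input: the 11-character message "Ask HJ team" contains "HJ".
const tricky = "O9C2700011Ask HJ team";
const parsed = parseByDelimiter(tricky, ["O9", "EO", "HJ", "J8"]);

console.log(parsed.length);      // 1
console.log(parsed[0].textMsg);  // "Ask " (truncated at the embedded "HJ")
```

If the separators are truly guaranteed never to occur inside a message, this does not matter; otherwise comparing `textMsg.length` against `msgLen` is a cheap sanity check.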

A first approach, based on a groups capturing regex consumed by split, processed by a helper function and finally reduced to the expected result ...

function chunkRight(arr, chunkLength) {
  const list = []; 
  arr = [...arr];
  while (arr.length >= chunkLength) {
    list.unshift(
      arr.splice(-chunkLength)
    );
  }
  return list;
}

// see also ... [https://regex101.com/r/tatBAB/1]
// with e.g.
// (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})
// ... or ...
// (O9|EO|HJ|J8)(\w{4})(\d{4})
//
function extractStatusItems(str, separators) {
  const regXSplit = RegExp(`(${ separators.join('|') })(\\w{4})(\\d{4})`);

  const statusValues = String(str).split(regXSplit).slice(1);
  const groupedValues = chunkRight(statusValues, 4);

  return groupedValues.reduce((list, [inEligType, msgCode, msgLen, textMsg]) =>
    list.concat({ inEligType, msgCode, msgLen, textMsg }), []
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
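
For readers unfamiliar with how `split` treats capturing groups, here is a tiny standalone sketch with a made-up input and pattern. The captured values are interleaved with the text between matches, which is why the code above drops the leading empty string with `slice(1)` and then groups the remainder in fours.

```typescript
// Made-up example: two capture groups, so each match contributes two
// extra entries to the array returned by split().
const parts = "A1fooB2bar".split(/([AB])(\d)/);

console.log(parts);
// [ "", "A", "1", "foo", "B", "2", "bar" ]
```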

... followed by a second approach, based almost entirely on a regex which captures named groups, consumed by matchAll and finally mapped into the expected result ...

// see also ... [https://regex101.com/r/tatBAB/2]
// with e.g.
// (?<inEligType>O9|EO|HJ|J8)(?<msgCode>\w{4})(?<msgLen>\d{4})(?<textMsg>.*?)(?=O9|EO|HJ|J8|$)
//
function extractStatusItems(str, separators) {
  separators = separators.join('|');

  const regXCaptureValues = RegExp(
    `(?<inEligType>${ separators })(?<msgCode>\\w{4})(?<msgLen>\\d{4})(?<textMsg>.*?)(?=${ separators }|$)`,
    'g'
  );
  return [
    ...String(str).matchAll(regXCaptureValues)
  ].map(
    ({ groups }) => ({ ...groups })
  );
}

const statusCode = 'O9C2700021Not eligible for SDWCEOC3900015Service upgradeHJC3900015Service upgradeJ8C5000016Delivery Attempt';

console.log(
  `statusCode ... ${ statusCode } ...`,
  extractStatusItems(statusCode, ['O9', 'EO', 'HJ', 'J8'])
);
