I'm trying to find the most efficient way to extract key/value pairs from a large string.

EXT-X-DATERANGE:ID="PreRoll_Ident_Open",START-DATE="2016-12-14T120000.000z",DURATION=3,X-PlayHeadStart="0.000",X-AdID="AA-1QPN49M9H2112",X-TRANSACTION-VPRN-ID="1486060788",X-TrackingDefault="1",X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015&n=394953%3B394953&t=1485791181366184015&f=&r=394953&adid=15914070&reid=5469372&arid=0&auid=&cn=defaultImpression&et=i&_cc=15914070,5469372,,,1485791181,1&tpos=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="s=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="

I have the above as an example. The idea is to extract the all-caps string before the : as an object key, and everything between the quotes, up to the next comma, as its value, then iterate over the entire string until the object is built.

nonParsed.substring(nonParsed.lastIndexOf('="') + 2, nonParsed.lastIndexOf('",'));

I had this concept as a start, but some help iterating through this and making it more efficient would be appreciated.

The final output would be something like:

{
  'EXT-X-DATERANGE:ID': 'PreRoll_Ident_Open',
  'START-DATE': '2016-12-14T120000.000z',
  'DURATION': '3',
  ...
}
  • Maybe this helps: developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/… Commented Sep 11, 2017 at 20:49
  • It'll be a little tougher than usual since you seem to have a comma in your X-TrackingDefaultURI header value where the colon should be. This will make a naive split more difficult. Commented Sep 11, 2017 at 20:51
  • What would be the final output? Commented Sep 11, 2017 at 20:53
  • @revo I updated with sample object Commented Sep 11, 2017 at 21:02
  • The idea is to extract all caps string before : ... so why doesn't it apply to EXT-X-DATERANGE:ID which should extract EXT-X-DATERANGE part only! Commented Sep 11, 2017 at 21:06

3 Answers


It looks like the only property that breaks a predictable pattern is DURATION, whose value is an unquoted number. Otherwise, you can rely on a naive pattern of alternating =" and ",.

You could do something like

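function parseAttributes(str) {
    // Quote the bare DURATION value and drop the final closing quote,
    // so that every pair reads KEY="value and pairs are separated by ",
    str = str.replace(/DURATION=(\d+)/, 'DURATION="$1"').replace(/"$/, '');
    return str.split('",').reduce((acc, entry) => {
        // Cut at the first =" only: the URL value contains =" itself
        const cut = entry.indexOf('="');
        acc[entry.slice(0, cut)] = entry.slice(cut + 2);
        return acc;
    }, {});
}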

Then add a bit of logic at the end to sort out DURATION if you need to, for example to turn it back into a number.
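As a minimal sketch of that last step, assuming the parsed object stores DURATION as a string: restoreDuration is a hypothetical helper name, not part of the answer above, and it leaves every other key untouched.

```javascript
// Hypothetical helper: returns a copy of the parsed object with DURATION as a number.
function restoreDuration(parsed) {
    const result = { ...parsed };
    if (result.DURATION !== undefined) {
        const asNumber = Number(result.DURATION);
        // Keep the original string if it is not a valid number
        if (!Number.isNaN(asNumber)) {
            result.DURATION = asNumber;
        }
    }
    return result;
}

const example = restoreDuration({ 'START-DATE': '2016-12-14T120000.000z', DURATION: '3' });
console.log(typeof example.DURATION);
```

Returning a copy rather than mutating the input keeps the parsing step and the clean-up step independent of each other.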

1 Comment

This is great! I can sanitize before this filter, so DURATION can follow the same string rules as the others.

It looks like you have mixed case strings for the headers, not just uppercase. I would instead look for key-value pairs based on the = character. You can construct a regex and use the exec() method to then iterate and build your object.

var input = 'EXT-X-DATERANGE:ID="PreRoll_Ident_Open",START-DATE="2016-12-14T120000.000z",DURATION=3,X-PlayHeadStart="0.000",X-AdID="AA-1QPN49M9H2112",X-TRANSACTION-VPRN-ID="1486060788",X-TrackingDefault="1",X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015&n=394953%3B394953&t=1485791181366184015&f=&r=394953&adid=15914070&reid=5469372&arid=0&auid=&cn=defaultImpression&et=i&_cc=15914070,5469372,,,1485791181,1&tpos=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr="s=0&iw=&uxnw=394953&uxss=sg579054&uxct=4&metr=1031&init=1&vcid2=394953%3A466c5842-0cce-4a16-9f8b-a428e479b875&cr='

// Regex looks for a run of letters, colons, or hyphens before a =, then captures everything between the opening quote
// and the next double quote, plus an optional comma after. Unquoted values such as DURATION=3 are not matched.
var pattern = /([A-Za-z:-]+)="([^"]+)",?/g;

// Iterate the string using exec() and build the object along the way
var match;
var output = {};
while (match = pattern.exec(input)) {
    output[match[1]] = match[2];
}

console.dir(output);

2 Comments

May I suggest /(?:^|,)([A-Za-z\-]+?)(?::[A-Z\-]+)?=(".+?"|\d+?)(?=,|$)/gm as a more comprehensive regex? regex101.com/r/5XLR1O/1
It allows capture of that messy URL and the unquoted integer values.
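A variation on that idea, as a sketch under stated assumptions rather than a drop-in replacement: keys are assumed to contain only letters, colons, and hyphens, and a value is assumed to end only where the next ,KEY= begins or the input ends. The names pairPattern and parseAttributes are illustrative, and the sample is a shortened version of the string in the question.

```javascript
// A value is either "quoted" (and may itself contain quotes) or a bare token.
// The lookahead ends a value only where the next ,KEY= begins or the input ends.
const pairPattern = /(?:^|,)([A-Za-z:-]+)=(?:"(.*?)"|([^",]*))(?=,[A-Za-z:-]+=|$)/g;

function parseAttributes(input) {
    const output = {};
    for (const match of input.matchAll(pairPattern)) {
        // match[2] is set for quoted values, match[3] for bare ones such as DURATION=3
        output[match[1]] = match[2] !== undefined ? match[2] : match[3];
    }
    return output;
}

// Shortened version of the sample string from the question
const sample = 'EXT-X-DATERANGE:ID="PreRoll_Ident_Open",DURATION=3,' +
    'X-TrackingDefaultURI="http,//606ca.v.fwmrm.net/ad/l/1?s=g015&_cc=15914070,5469372,,,1485791181,1&cr="s=0&cr="';

const parsed = parseAttributes(sample);
console.log(parsed);
```

Unlike the regex in the comment, this keeps EXT-X-DATERANGE:ID as a single key, which matches the output shown in the question. It would still misparse a quoted value that happens to contain a quote followed by ,KEY=.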

Here is a possible solution. Split the string on the double quotes, then walk the resulting array in steps of two, using element i as the key and element i+1 as its value. This presumes that no value contains a double quote, escaped or otherwise, and that every value is quoted; in the sample, the unquoted DURATION=3 and the stray quote inside the URL value both break that alternation, so they need sanitizing first. Here is the code:

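const strings = nonParsed.split('"');
const myObj = {};
// The first key has no leading comma, so only its trailing "=" is dropped
myObj[strings[0].slice(0, -1)] = strings[1];
// Later keys look like ',KEY=': drop the leading comma and the trailing "="
for (let i = 2; i + 1 < strings.length; i += 2) {
    myObj[strings[i].slice(1, -1)] = strings[i + 1];
}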
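The same split-on-quotes idea can be sketched with a normalising pass in front, so that bare numeric values such as DURATION=3 do not upset the key/value alternation. This is an illustration only: parseByQuotes is a hypothetical name, keys are assumed to contain only letters, colons, and hyphens, and it still assumes no value contains a double quote (so the full sample URL would need cleaning first).

```javascript
function parseByQuotes(nonParsed) {
    // Wrap bare numeric values in quotes: ,DURATION=3 becomes ,DURATION="3"
    const normalized = nonParsed.replace(/(^|,)([A-Za-z:-]+)=(\d+)(?=,|$)/g, '$1$2="$3"');
    const strings = normalized.split('"');
    const myObj = {};
    for (let i = 0; i + 1 < strings.length; i += 2) {
        // Every key segment ends with "="; all but the first start with ","
        const key = strings[i].replace(/^,/, '').slice(0, -1);
        myObj[key] = strings[i + 1];
    }
    return myObj;
}

const result = parseByQuotes('EXT-X-DATERANGE:ID="PreRoll_Ident_Open",DURATION=3,X-PlayHeadStart="0.000"');
console.log(result);
```

The loop condition i + 1 < strings.length skips the empty string that split leaves after the final closing quote.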
