0

I want to split the following lines into an array in JavaScript:

Jun 02 16:45:04 [steveh]  [info] test1
Jun 02 16:45:12 [steveh]  [info] test2
Jun 02 16:45:12 [steveh]  [info] test3
test 3.1
test 3.2
Jun 02 16:45:16 [steveh]  [info] test4

I can do this with:

var arr = data.split(/\r?\n/);

Which gets me that:

[
    "Jun 02 16:45:04 [steveh]  [info] test1",
    "Jun 02 16:45:12 [steveh]  [info] test2",
    "Jun 02 16:45:12 [steveh]  [info] test3",
    "test 3.1",
    "test 3.2",
    "Jun 02 16:45:16 [steveh]  [info] test4"
]

So far so good, but the problem is that I don't want 6 items in that array. I want just 4, something like this:

[
    "Jun 02 16:45:04 [steveh]  [info] test1",
    "Jun 02 16:45:12 [steveh]  [info] test2",
    "Jun 02 16:45:12 [steveh]  [info] test3
    test 3.1
    test 3.2",
    "Jun 02 16:45:16 [steveh]  [info] test4"
]

I played around for some time with the JS .match() and .split() functions, but couldn't figure it out.

Here is a jsbin: http://jsbin.com/icufef/1/edit

3
  • Where are you getting this data from? Commented Jun 2, 2013 at 17:40
  • Actually from a web service that returns a server log. I want to filter that log in JavaScript, therefore I need an array. But for testing you can treat it as a multiline string. Commented Jun 2, 2013 at 17:42
  • If you know the exact format of what you want to get you may try a regex to match that string instead of splitting. Online demo Commented Jun 2, 2013 at 17:53
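The matching approach suggested in the last comment could look roughly like the sketch below. It assumes every entry starts with a header of the shape "Mon DD HH:MM:SS " as in the sample; the `header` and `entry` names are made up for illustration, and the sample is joined with "\n" as an assumption about the line endings.

```javascript
// Sample log from the question, joined with "\n" (assumed line ending).
var data = [
  "Jun 02 16:45:04 [steveh]  [info] test1",
  "Jun 02 16:45:12 [steveh]  [info] test2",
  "Jun 02 16:45:12 [steveh]  [info] test3",
  "test 3.1",
  "test 3.2",
  "Jun 02 16:45:16 [steveh]  [info] test4"
].join("\n");

// Assumed header shape: "Mon DD HH:MM:SS " at the start of a line.
var header = "[A-Z][a-z]{2} \\d{2} \\d{2}:\\d{2}:\\d{2} ";

// One entry = a header line plus any following lines that do not
// themselves start with a header.
var entry = new RegExp(
  "^" + header + ".*(?:\\r?\\n(?!" + header + ").*)*",
  "gm"
);

// With the g flag, match() returns every entry (or null if none match).
var arr = data.match(entry) || [];

console.log(arr.length); // 4
console.log(arr[2]);     // the test3 entry, including its two continuation lines
```

Any text before the first header line is not matched and is therefore dropped, which may or may not be what you want.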

3 Answers

1

Use the following regex in the split:

 /\r?\n(?=[^\n]*\[info\])/

This splits on newlines only if the following line contains [info].
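A minimal sketch of how this behaves on the sample from the question (the `data` string is that sample joined with "\n", which is an assumption about how the log arrives):

```javascript
// Sample log from the question, joined with "\n" (assumed line ending).
var data = [
  "Jun 02 16:45:04 [steveh]  [info] test1",
  "Jun 02 16:45:12 [steveh]  [info] test2",
  "Jun 02 16:45:12 [steveh]  [info] test3",
  "test 3.1",
  "test 3.2",
  "Jun 02 16:45:16 [steveh]  [info] test4"
].join("\n");

// The lookahead (?=...) checks the next line without consuming it,
// so nothing is removed from the entries themselves.
var arr = data.split(/\r?\n(?=[^\n]*\[info\])/);

console.log(arr.length); // 4
console.log(arr[2]);     // the test3 entry, including "test 3.1" and "test 3.2"
```

Note that this relies on every entry containing the literal [info]. An entry logged at a different level would be appended to the previous entry unless the lookahead is widened.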

0

You can't do this in general unless you know something about the formats of your dates... well, I guess if you skip the dates and base it off of the [steveh] [info] pairs, you might find a solution. But what of the test 3.1 etc? What possible data could go in there? Could there be text with brackets? Could there be dates? How far are you willing to go to make sure this is sanely parsed without knowing how this data is structured?

It's always possible to come up with a solution that parses it mostly right but misses a few scenarios.

Depending on the data, those scenarios might make it impossible to parse correctly. For example, if the logger logs a message that itself contains a line that looks like a new log line, say "foo\nJun 02 16:47:16 [steveh] [info] test4", and the date in that string happens to fall between the dates of the neighboring entries, it would be impossible to separate that line from the real log lines by looking only at the log data.
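To make the ambiguity concrete, here is a small sketch. The two entries, their timestamps, and "test5" are invented for illustration, and the split regex is just one plausible heuristic (split before lines containing [info]); any header-based heuristic would be fooled in the same way.

```javascript
// Two real entries. The first message itself contains a newline followed by
// text that looks exactly like a log header (hypothetical data).
var entries = [
  "Jun 02 16:45:12 [steveh]  [info] foo\nJun 02 16:47:16 [steveh]  [info] test4",
  "Jun 02 16:48:00 [steveh]  [info] test5"
];

// What a consumer of the raw log would actually receive.
var data = entries.join("\n");

// Heuristic splitter: split before any line containing "[info]".
var arr = data.split(/\r?\n(?=[^\n]*\[info\])/);

console.log(entries.length); // 2 entries were logged
console.log(arr.length);     // 3 entries are recovered
```

Nothing in `data` distinguishes the embedded line from a genuine entry, so no parser working from the text alone can recover the original two entries.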

3 Comments

I agree that it's not possible without some chance of error. One has to live with that risk. If the log format can be tuned (configured), one could try to find a good start marker in the log format.
Well, if the web service itself (that OP mentions) used some convention to delimit the log entries, and delimited them for you, then there'd be no need to discuss this, unless maybe the mistakes it was making were intolerable. Instead, anybody wanting to use this web service has to make their own crappy parser, and then, should the logger change its format, everybody using the service would have to fix their parsers. I can see valid reasons someone might want to expose a raw log file with a web service, but interoperability is not among them.
I just have to wonder whenever I see questions like this if, by helping, I'm just helping to patch a doomed ship that should have already sunk, and everybody would have been better off had the ship already sunk.
0

You have to search for a newline followed by a month short name, so it would be something like \r?\n(Jan|Feb|March|Apri...|Dec) as the split argument. You need to know how your data writes those month names, and you need to be sure that a continuation line never starts with something like "May be" in place of "test", because that would be mistaken for the month May.

EDIT: Oh, Xavier is right: instead of feeding it into split, you should mark those entries as real linebreaks:

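// Month list is elided ("Feb..."); fill in the remaining names. Regex literals must not be quoted.
data = data.replace(/\r?\n(Jan|Feb...) /g, 'BREAKME$1 ');
// replace() returns a new string, so the result has to be assigned before splitting on the marker.
var arr = data.split('BREAKME');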

4 Comments

But won't that delete the month names from the output?
You'll need to add a lookahead in that group :) After that, I think it works just like what the OP wanted.
Add a ?= like this: \r?\n(?=Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
I have to admit: Normally I just end up doing it in a loop: as long as no "real new line" comes, I add the line to the last var, and when a "real new line" comes, I push the last var into the array.
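The loop described in the last comment could be sketched as follows. It assumes a "real new line" is one that starts with a three-letter month abbreviation followed by a space (the same alternation as in the lookahead comment above); `startsEntry` is a name made up for this example.

```javascript
// Sample log from the question, joined with "\n" (assumed line ending).
var data = [
  "Jun 02 16:45:04 [steveh]  [info] test1",
  "Jun 02 16:45:12 [steveh]  [info] test2",
  "Jun 02 16:45:12 [steveh]  [info] test3",
  "test 3.1",
  "test 3.2",
  "Jun 02 16:45:16 [steveh]  [info] test4"
].join("\n");

// Assumption: an entry starts with a month abbreviation and a space.
var startsEntry = /^(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) /;

var arr = [];
data.split(/\r?\n/).forEach(function (line) {
  if (arr.length === 0 || startsEntry.test(line)) {
    // A "real new line": start a new entry.
    arr.push(line);
  } else {
    // A continuation line: append it to the previous entry.
    arr[arr.length - 1] += "\n" + line;
  }
});

console.log(arr.length); // 4
```

The caveat from the answer still applies: a continuation line that happens to begin with "May " would be treated as the start of a new entry. Tightening `startsEntry` to also require the day and time would reduce that risk.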
