I have the following text input.

Group 1,Good,LEADS,"Leads Description 1 
 Leads Description 2","Note 1
 Note 2",1,100,210,10,Amt,15% 
 Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1 
 Switching Note 2",4,130,210,15,Amt,15%
 Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%

The description and note can each sit on a single line or span multiple lines. The input above contains 3 records in total. When a description or note spans multiple lines, its text is wrapped in double quotes. For records without a multi-line description or note, a simple explode works, but it breaks when either field spans multiple lines. I am using the following statement to split the text into lines:

preg_split("/\n|\r\n?/", $text);

This statement splits on line breaks correctly. It only needs to handle one more condition: text between double quotes must be treated as part of a single line.

Edit: the text above is assigned to $text.

  • Does it always start with Group and a digit? Commented Sep 16, 2020 at 13:20
  • Where is $text coming from?, can you show some more context? Commented Sep 16, 2020 at 13:25
  • In these cases, matching is safer than splitting Commented Sep 16, 2020 at 13:26
  • If it always starts with Group and a digit: ^\h*\KGroup\h+\d+,.*(?:\R(?!\h*Group\h+\d+,).*)* regex101.com/r/r8Zhxg/1 Commented Sep 16, 2020 at 13:27
  • stackoverflow.com/q/27623994/2943403 Commented Sep 16, 2020 at 14:42

2 Answers

Instead of splitting, try matching each record with a regular expression:

<?php
$s = 'Group 1,Good,LEADS,"Leads Description 1 
 Leads Description 2","Note 1
 Note 2",1,100,210,10,Amt,15% 
 Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1 
 Switching Note 2",4,130,210,15,Amt,15%
 Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
';


if (preg_match_all('/([^\r\n"]+|"[^"]*")+/', $s, $pregres)) {
    print_r($pregres[0]);
}

output:

Array
(
    [0] => Group 1,Good,LEADS,"Leads Description 1 
 Leads Description 2","Note 1
 Note 2",1,100,210,10,Amt,15% 
    [1] =>  Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1 
 Switching Note 2",4,130,210,15,Amt,15%
    [2] =>  Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
)

Regex explanation

([^\r\n"]+|"[^"]*")+

Inside parentheses there are two options (separated by or |):

[^\r\n"]+ - matches a sequence of characters that are NOT a carriage return, line feed, or double quote. This consumes unquoted text until it hits a line break or a quote.

"[^"]*" - matches a sequence that starts and ends with a double quote and contains any characters except quotes in between. This consumes the whole quoted string, including any line breaks inside the quotes.

The two options are grouped in parentheses, and the whole group is allowed to repeat (the + after the parentheses). This consumes the whole record until there is a newline outside quotes.

Doubled quotes (e.g. "this is a ""quoted"" string") are also consumed, because they are matched as consecutive quoted strings.
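As a minimal sketch of the doubled-quote case, the same pattern can be run against a small sample. The $sample string below is hypothetical and not part of the original input; it contains doubled quotes and a line break inside one quoted field.

```php
<?php
// Hypothetical sample: the quoted field holds doubled quotes and a line break.
$sample = "a,\"say \"\"hi\"\"\nthere\",1\nb,plain,2";

// Same pattern as above: unquoted runs or whole quoted strings, repeated.
preg_match_all('/([^\r\n"]+|"[^"]*")+/', $sample, $matches);
$rows = $matches[0];

// The newline inside the quotes stays within the first record;
// only the newline outside the quotes ends it.
var_export($rows);
```

The doubled quotes are not treated specially here. The regex simply sees three quoted strings back to back, which is enough to keep the record in one piece.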


2 Comments

Can you please explain the regex so I can understand it better?
@HardCode I've expanded the answer

You could use (*SKIP)(*FAIL) to consume and ignore double quoted substrings, then only split on the newlines that are not consumed earlier. I'll chase the newline escape sequence (\R) with \s* to effectively left trim the lines.

Code:

$text = <<<TEXT
Group 1,Good,LEADS,"Leads Description 1 
 Leads Description 2","Note 1
 Note 2",1,100,210,10,Amt,15% 
 Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1 
 Switching Note 2",4,130,210,15,Amt,15%
 Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%
TEXT;

var_export(preg_split('~"[^"]*"(*SKIP)(*FAIL)|\R\s*~', $text));

Output:

array (
  0 => 'Group 1,Good,LEADS,"Leads Description 1 
 Leads Description 2","Note 1
 Note 2",1,100,210,10,Amt,15% ',
  1 => 'Group 2,Good, SWITCHING, Switching, Description 1, "Switching Note 1 
 Switching Note 2",4,130,210,15,Amt,15%',
  2 => 'Group 1,Service,LICENCE,Licence Description 1,Licence Note 1,2,200,400,5,Pct,15%',
)

Admittedly, this technique will not do well if your text has any escaped double quotes (for example, backslash-escaped ones), but then AterLux's answer will suffer in the same fashion.


Alternatively, if you don't want to rely on the double-quoted substrings AND your new rows always start with Group, then a space, then an integer, then a comma, you could use:

var_export(preg_split('~\R\h*(?=Group \d+,)~', $text, 0, PREG_SPLIT_NO_EMPTY));
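To illustrate, here is a small sketch of the lookahead split on a shortened sample. The $text value is hypothetical. The follow-up step with str_getcsv() is an assumption about what comes next (it presumes the fields follow CSV quoting rules) and is not something the question specifies.

```php
<?php
// Hypothetical two-record sample; the first record has a quoted multi-line field.
$text = "Group 1,Good,\"Desc 1\n Desc 2\",1\n Group 2,Good,Plain,4";

// Split only on line breaks that are followed by the start of a new record.
$rows = preg_split('~\R\h*(?=Group \d+,)~', $text, 0, PREG_SPLIT_NO_EMPTY);

// Assumption: fields follow CSV quoting rules, so str_getcsv() can parse each record.
$records = [];
foreach ($rows as $row) {
    $records[] = str_getcsv($row, ',', '"', '\\');
}

var_export($records);
```

Because the split is driven by the record prefix rather than by quotes, it keeps working even when quoting is inconsistent, but it would split in the wrong place if a description or note line itself began with text like "Group 3,".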
