
I'm trying to write a PHP regex that extracts multiple fields from one string. Here's an excerpt from the file (the real file contains hundreds of these groupings):

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }

As you can see, the excerpt contains two groupings with the same structure. I need to search through the whole file and extract the following:

  • string after the word "part" -- which would be "C28" or "C29"
  • string after the "type" property -- which would be "1AB010050093" or "1AB008140029"

So, essentially, I need to get every part reference and its associated type out of this file, and I'm not sure of the best way to go about it.

Please let me know if more info is needed. Thanks in advance!

  • Is there a reason you're not using a JSON parser for this data type? Commented Jun 22, 2013 at 2:45
  • @Denomales While it looks similar, the example is not JSON data, and would not work with PHP's json_decode. Commented Jun 22, 2013 at 2:48
  • Fair enough. I had to ask. Commented Jun 22, 2013 at 2:50

2 Answers


Description

This expression will:

  • capture the part name into a named group called ref
  • capture the values of the type and descr fields
  • put the type value into a named group called partnumber
  • match the fields in any order within the body, because each field is found by its own lookahead
  • treat the descr field as optional and capture it only when it exists: the (?:...)? wrapper around the descr lookahead makes that field optional

Note that this is a single expression split across several lines, so you'll need the x option to make the regex engine ignore the whitespace.

^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")


PHP Code Example:

Input Text

part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
part "C30"
{ type       : "1AB0081400 30",
  shapeid    : "2_1206 30",
  insclass   : "CP6A,CP6B 30",
  gentype    : "RECT_032_016_006 30",
  machine    : "SMT 30",
  %package   : "080450E 30 ",
  %_item_number: "3 30 ",
  %_Term_Seq : "30" }

Code

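<?php
// $sourcestring holds the whole file contents, for example the result of file_get_contents()
$sourcestring="your source string";
// i: case-insensitive, m: ^ matches at the start of every line,
// s: dot matches newlines (no dot is used here), x: ignore whitespace in the pattern
preg_match_all('/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>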

Matches

$matches Array (the numerically indexed copies of each group are omitted):
(
[ref] => Array
    (
        [0] => C28
        [1] => C29
        [2] => C30
    )

[descr] => Array
    (
        [0] => 4700.0000 pFarad 10.00 % 100.0 - VE5-VS3
        [1] => 150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR
        [2] => 
    )

[partnumber] => Array
    (
        [0] => 1AB010050093
        [1] => 1AB008140029
        [2] => 1AB0081400 30
    )

)
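To keep each ref paired with its partnumber (the per-part layout asked about in the comments), one option is PHP's PREG_SET_ORDER flag, which groups the captures by match instead of by group. This is a minimal sketch, assuming a trimmed copy of the question's sample as input; $parts is a hypothetical variable name:

```php
<?php
// Sketch: same pattern as above, but PREG_SET_ORDER returns one array per
// match, so each ref stays next to its partnumber.
// Input: a trimmed copy of the sample data from the question.
$sourcestring = 'part "C28"
{ type       : "1AB010050093",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  %_Term_Seq : "" }';

preg_match_all('/^part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")/imsx', $sourcestring, $matches, PREG_SET_ORDER);

// Keep only the two named fields the question asks for.
$parts = array();
foreach ($matches as $m) {
    $parts[] = array('ref' => $m['ref'], 'partnumber' => $m['partnumber']);
}
print_r($parts);
```

With the default PREG_PATTERN_ORDER, array_combine($matches['ref'], $matches['partnumber']) gives a similar pairing keyed by reference instead.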

Comments

@Denomales where do you get the regex visualization image from?
@tristanbailey, I'm using debuggex.com. Although it doesn't support lookbehinds, named capture groups, or atomic groups, it's still handy for understanding the expression flow. There is also regexper.com. It does a pretty good job too, but it doesn't update in real time as you type.
@Denomales, great solution. I know I didn't mention this in my OP, but how would I get the matched results from elements [1] and [2] together? For instance, one element of the final result should look like: [0] => Array( ['ref'] => C28, ['partnumber'] => 1AB010050093 ). Notice that I've kept the relation: C28 goes with 1AB010050093, and so on...
I updated the answer to show how to capture part number as a named capture, and how to capture additional fields in the same run. I hope this is what you're looking for.
It doesn't work because the sample code differs slightly from the sample you provided. Specifically, this regex expects the string part to appear at the start of the line, whereas in the sample link the string appears after some white space. To correct for this, insert \s* right after the ^ and it should work for you too. 3v4l.org/8cuFb
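A minimal sketch of that fix, run against a hypothetical indented copy of the sample data: the same pattern finds nothing with a bare ^, but matches both parts once \s* follows it. $indented, $body, $anchored and $tolerant are illustrative names only:

```php
<?php
// Sketch: the answer's pattern with and without "\s*" after "^",
// applied to a hypothetical indented copy of the sample data.
$indented = '  part "C28"
  { type       : "1AB010050093",
    %_Term_Seq : "" }
  part "C29"
  { type       : "1AB008140029",
    %_Term_Seq : "" }';

$body = 'part\s"(?P<ref>[^"]*)"[^{]*{
(?:(?=[^}]*\sdescr\s*:\s+"(?P<descr>[^"]*)"))?
(?=[^}]*\stype\s*:\s+"(?P<partnumber>[^"]*)")';

// "part" must be the first thing on the line
$anchored = preg_match_all('/^' . $body . '/imsx', $indented, $m1);
// leading whitespace before "part" is allowed
$tolerant = preg_match_all('/^\s*' . $body . '/imsx', $indented, $m2);

echo $anchored, ' ', $tolerant, "\n"; // prints "0 2"
print_r($m2['ref']);
```

Because of the m flag, ^\s* can also skip over blank lines before a part line, which is harmless here.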

Assuming each group has the same structure, you can use this pattern:

preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);
print_r($matches);
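A usage sketch, assuming $subject holds a trimmed copy of the question's sample: group 1 holds the part reference and group 2 the first quoted value after the {, so array_combine() can pair them. This relies on type being the first property in every block, and array_combine() keeps only the last type if a reference repeats. $types is a hypothetical name:

```php
<?php
// Sketch: run the pattern above on a trimmed copy of the sample and
// pair group 1 (part reference) with group 2 (first quoted value after "{").
$subject = 'part "C28"
{ type       : "1AB010050093",
  shapeid    : "2_1206",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  %_Term_Seq : "" }';

preg_match_all('~([^"]++)"[^{"]++[^"]++"([^"]++)~', $subject, $matches);

// $matches[1] holds the references, $matches[2] the type values.
$types = array_combine($matches[1], $matches[2]);
print_r($types);
```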

EDIT:

Note: if you need to extract more information, you can transform your data into JSON, for example:

$data = <<<LOD
part "C28"
{ type       : "1AB010050093",
  %cadtype   : "1AB010050094",
  shapeid    : "2_1206",
  descr      : "4700.0000 pFarad 10.00 % 100.0 - VE5-VS3",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "508",
  %_Term_Seq : "" }
part "C29"
{ type       : "1AB008140029",
  shapeid    : "2_1206",
  descr      : "150.0000 pFarad 5.00 % 100.0 Volt NP0 CERAMIC CAPACITOR",
  insclass   : "CP6A,CP6B",
  gentype    : "RECT_032_016_006",
  machine    : "SMT",
  %package   : "080450E",
  %_item_number: "3",
  %_Term_Seq : "" }
LOD;
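// Replacement table, applied in order by str_replace(): "}\n" separates blocks
// with a comma, "part" is dropped, the newline before "{" becomes a key/value
// colon, every ":" becomes '":' to close a key, and '",\n' closes a value and
// opens the next key. Assumes "\n" line endings and no ':' or 'part' inside values.
$trans = array( "}\n"   => '}, ' , 'part'  => ''    ,
                "\"\n{" => ':{"' , ':'     => '":'  ,
                "\",\n" => '","' );

$data = str_replace(array_keys($trans), $trans, $data);
// Remove the whitespace left around the quotes.
$data = preg_replace('~\s*+"\s*+~', '"', $data);
// Drop the leading quote and wrap everything in one JSON object.
$json_data = json_decode('{"'.substr($data,1).'}');

foreach ($json_data as $key=>$value) {
    echo '<br/><br/>part: ' . $key . '<br/>type: ' . $value->type;
}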
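The conversion above is fragile: it assumes "\n" line endings, no trailing newline after the last block, and no ':' or 'part' inside any value, and json_decode() silently returns null when the rebuilt string is not valid JSON. A hedged sketch that wraps the same steps in a hypothetical helper, parsePartsViaJson(), which normalises line endings, trims the input and reports decoding errors (the $raw input is made-up test data in the question's format):

```php
<?php
// Sketch: the str_replace()/preg_replace() conversion above wrapped in a
// hypothetical helper that guards against the common ways it breaks.
function parsePartsViaJson($data)
{
    // The replacement table only handles "\n" line endings.
    $data = str_replace("\r\n", "\n", $data);
    // A trailing "}\n" would otherwise become "}, " and leave a dangling comma.
    $data = trim($data);

    $trans = array( "}\n"   => '}, ' , 'part'  => ''    ,
                    "\"\n{" => ':{"' , ':'     => '":'  ,
                    "\",\n" => '","' );
    $data = str_replace(array_keys($trans), $trans, $data);
    $data = preg_replace('~\s*+"\s*+~', '"', $data);

    // true: decode to associative arrays so keys such as "%cadtype" are easy to read.
    $decoded = json_decode('{"' . substr($data, 1) . '}', true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Conversion to JSON failed: ' . json_last_error_msg());
    }
    return $decoded;
}

// Windows line endings and a trailing newline, which the original steps do not handle.
$raw = "part \"C28\"\r\n{ type       : \"1AB010050093\",\r\n  %cadtype   : \"1AB010050094\",\r\n  %_Term_Seq : \"\" }\r\npart \"C29\"\r\n{ type       : \"1AB008140029\",\r\n  %_Term_Seq : \"\" }\r\n";

$parts = parsePartsViaJson($raw);
foreach ($parts as $ref => $fields) {
    echo 'part: ', $ref, ', type: ', $fields['type'], "\n";
}
echo $parts['C28']['%cadtype'], "\n";
```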

