Extract part of string matching pattern

Question

I would like to scan a large piece of text using PHP, find all matches for a pattern, and also capture the 2 lines above and the 2 lines below each match.

My text looks like this, but with some extra unnecessary text above and below this sample:

1

Description text

123.456.12

10.00

10.00

3

Different Description text

234.567.89

10.00

30.00

#Some footer text that is not needed and will change for each text file#

15

More description text

564.238.02

4.00

60.00

15

More description text

564.238.02

4.00

60.00

#Some footer text that is not needed and will change for each text file#

15

More description text

564.238.02

4.00

60.00

15

More description text

564.238.02

4.00

60.00

Using PHP, I am looking to match each number in bold (e.g. 123.456.12; always the same format: 3 digits, dot, 3 digits, dot, 2 digits) and then also return the previous 2 lines and the next 2 lines, ideally as an array so that I can use:

$contents[$i]["qty"] = "1";
$contents[$i]["description"] = "Description text";
$contents[$i]["price"] = "10.00";
$contents[$i]["total"] = "10.00";

etc...

Is this possible and would I use regex? Any help or advice would be greatly appreciated!

Thanks

ANSWERED BY vzwick

This is my final code that I used:

$items_array = array();
$counter = 0;

if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $text_file, $matches)) {

    $items_string = $matches[0];
    foreach ($items_string as $value){

        $item = explode("\n\n", $value);

        $items_array[$counter]["qty"] = $item[0];
        $items_array[$counter]["description"] = $item[1];
        $items_array[$counter]["number"] = $item[2];
        $items_array[$counter]["price"] = $item[3];
        $items_array[$counter]["total"] = $item[4];

        $counter++;

    }

}
else
{
    die("No matching patterns found");
}

print_r($items_array);

There will be other text above and below the sample I posted, but within the loop of items, it will always be chunks of 5 lines.
– SammyBlackBaron, Commented Oct 19, 2011 at 10:11
Also, the bold number will always be in the same format - 3 numbers, dot, 3 numbers, dot, 2 numbers
– SammyBlackBaron, Commented Oct 19, 2011 at 10:11
I've also just realised that although within the loop of items it will always be chunks of 5 lines, the text file could span multiple pages and therefore have a footer that I would need to ignore. That was why I wondered if you can match the bold number and then collect that, the previous two lines and the next two lines as other text would then be ignored.
– SammyBlackBaron, Commented Oct 19, 2011 at 10:22

vzwick · Accepted Answer · 2011-10-19 10:36:05Z

2

$filename = "yourfile.txt";
$fp = @fopen($filename, "r");
if (!$fp) die('Could not open file ' . $filename);

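$i = 0; // index of the current record
$n = 0; // index of the current field within that record

$field_names = array('qty', 'description', 'some_number', 'price', 'total');
$result_arr = array();

// Assumes the file contains nothing but records of five fields, one field per line.
while (($line = fgets($fp)) !== false) {
    $result_arr[$i][$field_names[$n]] = trim($line);
    $n++;
    if ($n === count($field_names)) { // record complete, start the next one
        $i++;
        $n = 0;
    }
}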

fclose($fp);
print_r($result_arr);

Edit: Well, regex then.

$filename = "yourfile.txt";
$file_contents = @file_get_contents($filename);
if (!$file_contents) die("Could not open file " . $filename . " or empty file");
if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $file_contents, $matches)) {
    print_r($matches[0]);
    // do your matching to field names from here ..
}
else
{
    die("No matching patterns found");
}
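
One way to fill in the "do your matching to field names" step is to ask preg_match_all for one entry per record and then pair the captured groups with field names. The sketch below is only an illustration: it relies on the PREG_SET_ORDER flag and array_combine, the helper name parse_items and the inline sample text are made up, and the field names are borrowed from the first snippet.

```php
<?php
// Hypothetical helper: turns the raw text into one associative array per item.
function parse_items(string $text): array
{
    $field_names = array('qty', 'description', 'some_number', 'price', 'total');
    $pattern = '/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/';

    $items = array();
    // PREG_SET_ORDER groups the result by match instead of by capture group.
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[0] is the whole match; [1]..[5] are the five fields.
            $items[] = array_combine($field_names, array_slice($match, 1));
        }
    }
    return $items;
}

// Made-up sample in the same shape as the question's data (assumption).
$sample = "Header text\n\n1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n\n"
        . "#Some footer text#\n\n3\n\nDifferent Description text\n\n234.567.89\n\n10.00\n\n30.00\n";

$items = parse_items($sample);
print_r($items);
```

Because the pattern is anchored on the five-line shape around the number, header and footer text that does not fit that shape is skipped without any extra handling.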

edited Oct 19, 2011 at 10:36

answered Oct 19, 2011 at 10:13

6 Comments

SammyBlackBaron Over a year ago

Sorry, edited my post to say that there will be other text in the text file too, above and below the sample I posted. I don't need this and so would need to ignore it.

vzwick Over a year ago

But your data is all in one chunk, right? Is there any delimiter?

SammyBlackBaron Over a year ago

No, sorry I meant to post that originally. I'll update my sample text to show you what can happen...

SammyBlackBaron Over a year ago

Thanks, but just tried your code and it is returning "No matching patterns found" for the sample I posted.

vzwick Over a year ago

*sigh* Are your lines delimited by double newlines? If so, check the update.

Alexey · Answer · 2011-10-19 10:42:39Z

1

(.+)\n+(.+)\n+(\d{3}\.\d{3}\.\d{2})\n+(.+)\n+(.+)

It might be necessary to replace \n with \r\n if the file uses Windows line endings. Make sure the regex runs in a mode where "." does not match the newline character (in PHP, that means not using the s modifier).

To reference groups by name, use a named capturing group:

(?P<name>regex)

example of named capturing groups.
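
To make the named-group suggestion concrete, here is a minimal sketch for PHP's preg_match_all. It makes two assumptions that are not in the answer: \R (PCRE's "any line break" escape) replaces \n so that both \n and \r\n files work, and [^\r\n]+ replaces "." because with PCRE's default newline setting "." can still match a stray \r. The group names and the sample string are illustrative only.

```php
<?php
// \R matches \n, \r\n or \r; [^\r\n]+ matches the rest of one line.
$pattern = '/(?P<qty>\d+)\R+(?P<description>[^\r\n]+)\R+'
         . '(?P<number>\d{3}\.\d{3}\.\d{2})\R+(?P<price>[^\r\n]+)\R+(?P<total>[^\r\n]+)/';

// Made-up sample with Windows line endings (assumption).
$text = "15\r\n\r\nMore description text\r\n\r\n564.238.02\r\n\r\n4.00\r\n\r\n60.00\r\n";

$contents = array();
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Each match holds both numeric and named keys; keep only the named ones.
        $contents[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}
print_r($contents);
```

Each entry can then be read as $contents[$i]["qty"], $contents[$i]["description"] and so on, which is the shape the question asks for.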

answered Oct 19, 2011 at 10:42

0xd · Answer · 2011-10-19 10:27:27Z

0

You could load the file into an array and then use array_slice to cut it into blocks of 5 lines.

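<?php

// Read the file into an array of lines, dropping trailing newlines and empty lines.
$file = file("myfile", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$finalArray = array();

// Take the lines five at a time; each slice is one item.
for ($i = 0; $i < count($file); $i += 5)
{
    $finalArray[] = array_slice($file, $i, 5);
}

print_r($finalArray);
?>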
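
Building on the idea of loading the file into an array, the following sketch copes with extra header and footer text by locating only the lines that look like the item number and then slicing two lines on either side. It assumes that blank lines have already been dropped and that every number line really has two item lines before and after it. The helper name extract_items and the sample lines are hypothetical; preg_grep, array_slice and array_combine are standard PHP functions.

```php
<?php
// Hypothetical helper: $lines is a zero-indexed list of non-empty lines.
function extract_items(array $lines): array
{
    $field_names = array('qty', 'description', 'number', 'price', 'total');
    $items = array();

    // preg_grep keeps the original indexes, so each key is a line position.
    $hits = preg_grep('/^\d{3}\.\d{3}\.\d{2}$/', $lines);
    foreach (array_keys($hits) as $pos) {
        if ($pos < 2 || $pos + 2 >= count($lines)) {
            continue; // not enough lines around the match
        }
        // Two lines before, the number itself, two lines after.
        $items[] = array_combine($field_names, array_slice($lines, $pos - 2, 5));
    }
    return $items;
}

// With a real file (assumed name), the lines could be read with something like:
// $lines = file('myfile', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$lines = array(
    'Some header text', '1', 'Description text', '123.456.12', '10.00', '10.00',
    '#Some footer text#', '15', 'More description text', '564.238.02', '4.00', '60.00',
);
$result = extract_items($lines);
print_r($result);
```

Unlike fixed blocks of five, this does not depend on where the items start in the file, so footers between pages do not shift the grouping.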

edited Oct 19, 2011 at 10:27

answered Oct 19, 2011 at 10:13

1 Comment

SammyBlackBaron Over a year ago

Thanks, but see my updated sample and comments that there will be other text in the file that I don't need, hence the reason why I am only looking to match the bold number pattern and then get the previous 2 lines and the next 2 lines
