I would like to scan a large piece of text with PHP, find every match for a pattern, and also capture the 2 lines above and the 2 lines below each match.

My text looks like this, but with some extra unnecessary text above and below this sample:

1

Description text

123.456.12

10.00

10.00

3

Different Description text

234.567.89

10.00

30.00

#Some footer text that is not needed and will change for each text file#

15

More description text

564.238.02

4.00

60.00

15

More description text

564.238.02

4.00

60.00

#Some footer text that is not needed and will change for each text file#

15

More description text

564.238.02

4.00

60.00

15

More description text

564.238.02

4.00

60.00

Using PHP, I want to match each number in bold (the third value of each item, e.g. 123.456.12; always the same format: 3 digits, dot, 3 digits, dot, 2 digits) and also return the previous 2 lines and the next 2 lines, ideally as an array so that I can use:

$contents[$i]["qty"] = "1";
$contents[$i]["description"] = "Description text";
$contents[$i]["price"] = "10.00";
$contents[$i]["total"] = "10.00";

etc...

Is this possible, and would I use regex? Any help or advice would be greatly appreciated!

Thanks

ANSWERED BY vzwick

This is the final code I used:

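$items_array = array();
$counter = 0;

// $text_file holds the full contents of the text file as a string.
// Each item is five values separated by blank lines; the third is the item number.
if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $text_file, $matches)) {

    // $matches[0] contains the full text of every matched item
    $items_string = $matches[0];
    foreach ($items_string as $value) {

        // Split the matched block back into its five values
        $item = explode("\n\n", $value);

        $items_array[$counter]["qty"] = $item[0];
        $items_array[$counter]["description"] = $item[1];
        $items_array[$counter]["number"] = $item[2];
        $items_array[$counter]["price"] = $item[3];
        $items_array[$counter]["total"] = $item[4];

        $counter++;

    }

} else {
    die("No matching patterns found");
}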

print_r($items_array);
  • Is it always chunks of 5 lines? Commented Oct 19, 2011 at 10:05
  • There will be other text above and below the sample I posted, but within the loop of items, it will always be chunks of 5 lines. Commented Oct 19, 2011 at 10:11
  • Also, the bold number will always be in the same format - 3 numbers, dot, 3 numbers, dot, 2 numbers Commented Oct 19, 2011 at 10:11
  • I've also just realised that although items within the loop will always be chunks of 5 lines, the text file could span multiple pages and therefore contain a footer that I need to ignore. That is why I wondered whether you can match the bold number and then collect it together with the previous two lines and the next two lines, so that any other text is ignored. Commented Oct 19, 2011 at 10:22
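
The idea in the last comment (find the number, then take the two lines before it and the two after it) can be sketched without one large regex. This is only a minimal sketch under assumptions: `extract_items` is a hypothetical helper name, blank lines are treated as separators and dropped, and the sample string stands in for the real file.

```php
<?php
// Hypothetical helper: returns each item-number line together with the
// two non-empty lines before it and the two non-empty lines after it.
function extract_items(string $text): array
{
    // Split on any line-ending style, trim each line, and drop blank lines.
    $lines = preg_split('/\R/', $text);
    $lines = array_values(array_filter(array_map('trim', $lines), 'strlen'));

    $items = array();
    foreach ($lines as $i => $line) {
        // Only lines that consist of exactly the item-number format count.
        if (!preg_match('/^\d{3}\.\d{3}\.\d{2}$/', $line)) {
            continue;
        }
        // Skip matches too close to the start or end to have full context.
        if ($i < 2 || $i + 2 >= count($lines)) {
            continue;
        }
        $items[] = array(
            'qty'         => $lines[$i - 2],
            'description' => $lines[$i - 1],
            'number'      => $line,
            'price'       => $lines[$i + 1],
            'total'       => $lines[$i + 2],
        );
    }
    return $items;
}

// Assumed sample: header text, one item, a footer, and a second item.
$sample = "Header text\n\n1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n\n"
        . "#Footer#\n\n3\n\nDifferent Description text\n\n234.567.89\n\n10.00\n\n30.00\n";
print_r(extract_items($sample));
```

Because only the five lines around each number are collected, header and footer lines are never looked at, provided a footer does not fall inside an item.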

3 Answers

2
$filename = "yourfile.txt";
$fp = @fopen($filename, "r");
if (!$fp) die('Could not open file ' . $filename);

$i = 0; // element counter
$n = 0; // inner element counter

$field_names = array('qty', 'description', 'some_number', 'price', 'total');
$result_arr = array();

while (($line = fgets($fp)) !== false) {
    $result_arr[$i][$field_names[$n]] = trim($line);
    $n++;
    if ($n % count($field_names) == 0) {
        $i++;
        $n = 0;
    }
}

fclose($fp);
print_r($result_arr);

Edit: Well, regex then.

$filename = "yourfile.txt";
$file_contents = @file_get_contents($filename);
if (!$file_contents) die("Could not open file " . $filename . " or empty file");
if (preg_match_all('/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/', $file_contents, $matches)) {
    print_r($matches[0]);
    // do your matching to field names from here ..
}
else
{
    die("No matching patterns found");
}
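
One possible way to do the "matching to field names" step, as a sketch only: passing `PREG_SET_ORDER` to `preg_match_all` groups the captures per item, so they can be paired with the field names via `array_combine`. The inline sample string is an assumption standing in for the file contents.

```php
<?php
// Assumed sample input standing in for the file contents
// (values separated by blank lines, with a footer between two items).
$file_contents = "1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n\n"
               . "#Footer#\n\n"
               . "3\n\nDifferent Description text\n\n234.567.89\n\n10.00\n\n30.00\n";

$pattern = '/(\d+)\n\n(\w.*)\n\n(\d{3}\.\d{3}\.\d{2})\n\n(\d.*)\n\n(\d.*)/';
$field_names = array('qty', 'description', 'some_number', 'price', 'total');
$result_arr = array();

// PREG_SET_ORDER returns one array per match:
// $m[0] is the full match, $m[1]..$m[5] are the five captured values.
if (preg_match_all($pattern, $file_contents, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Drop the full match and pair the captures with the field names.
        $result_arr[] = array_combine($field_names, array_slice($m, 1));
    }
}
print_r($result_arr);
```

This avoids splitting the matched text a second time, since the capture groups already hold the individual values.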

6 Comments

Sorry, edited my post to say that there will be other text in the text file too, above and below the sample I posted. I don't need this and so would need to ignore it.
But your data is all in one chunk, right? Is there any delimiter?
No, sorry I meant to post that originally. I'll update my sample text to show you what can happen...
Thanks, but just tried your code and it is returning "No matching patterns found" for the sample I posted.
*sigh* Are your lines delimited by double newlines? If so, check the update.
1
(.+)\n+(.+)\n+(\d{3}\.\d{3}\.\d{2})\n+(.+)\n+(.+)

It might be necessary to replace \n with \r\n if the file uses Windows line endings. Make sure the regex runs in a mode where "." does not match the newline character (in PHP's PCRE, that means not using the s modifier).
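
As an alternative to editing the pattern for \r\n, the line endings can be normalised first. The sketch below assumes a sample string with Windows line endings and writes each group as `(.+)` so that the whole line is captured rather than a single character.

```php
<?php
// Assumed sample: one item written with Windows (\r\n) line endings.
$text = "1\r\n\r\nDescription text\r\n\r\n123.456.12\r\n\r\n10.00\r\n\r\n10.00\r\n";

// Convert \r\n and lone \r to \n so the pattern only has to handle \n.
$text = str_replace(array("\r\n", "\r"), "\n", $text);

// No "s" modifier is used, so "." stops at each line break.
$pattern = '/(.+)\n+(.+)\n+(\d{3}\.\d{3}\.\d{2})\n+(.+)\n+(.+)/';

// $fields holds the five captured values, or stays empty if nothing matched.
$fields = preg_match($pattern, $text, $m) ? array_slice($m, 1) : array();
print_r($fields);
```

Normalising first also keeps stray \r characters out of the captured values, which "." would otherwise match.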

To reference groups by name, use named capturing groups:

(?P<name>regex)

example of named capturing groups.
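
For illustration, here is a minimal sketch of named groups with `preg_match` in PHP. The group names (`qty`, `description`, `number`, `price`, `total`) and the sample string are assumptions chosen to match the fields in the question.

```php
<?php
// Assumed sample: a single item with values separated by blank lines.
$text = "1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n";

// Each (?P<name>...) group is returned under its name as well as its numeric index.
$pattern = '/(?P<qty>.+)\n+(?P<description>.+)\n+'
         . '(?P<number>\d{3}\.\d{3}\.\d{2})\n+(?P<price>.+)\n+(?P<total>.+)/';

$item = array();
if (preg_match($pattern, $text, $m)) {
    // Keep only the named entries, dropping the numeric duplicates.
    foreach ($m as $key => $value) {
        if (is_string($key)) {
            $item[$key] = $value;
        }
    }
}
print_r($item);
```

The result can then be used directly as `$item["qty"]`, `$item["description"]`, and so on.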


0

You could load the file into an array and then use array_slice to cut it into blocks of 5 lines.

<?php

$file = file("myfile");
$finalArray = array();

for($i = 0; $i < sizeof($file); $i = $i+5)
{
    $finalArray[] = array_slice($file, $i, 5); 
}

print_r($finalArray);
?>
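
A variation on the same idea, sketched with `array_chunk` instead of a manual loop. It assumes the file contains nothing but items (no header or footer text), skips blank lines while reading, and uses a temporary file and field names that are purely illustrative.

```php
<?php
// Write a small assumed sample to a temporary file so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'items');
file_put_contents(
    $path,
    "1\n\nDescription text\n\n123.456.12\n\n10.00\n\n10.00\n\n"
    . "3\n\nDifferent Description text\n\n234.567.89\n\n10.00\n\n30.00\n"
);

// Read the lines without trailing newlines and without blank lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

$field_names = array('qty', 'description', 'number', 'price', 'total');
$finalArray = array();
foreach (array_chunk($lines, count($field_names)) as $chunk) {
    // Ignore a trailing chunk that does not hold all five values.
    if (count($chunk) === count($field_names)) {
        $finalArray[] = array_combine($field_names, $chunk);
    }
}
print_r($finalArray);
```

Any extra line in the file shifts every following chunk, so this only fits input that has already been reduced to the item lines.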

1 Comment

Thanks, but see my updated sample and comments that there will be other text in the file that I don't need, hence the reason why I am only looking to match the bold number pattern and then get the previous 2 lines and the next 2 lines
