0

How would I iterate through around 1,500 text files in a folder, each containing

" Completion rate: 119 ( isComplete: 0 )\r\nFailure rate: 158 HC: 119-158-F "

and get "119" "0" "158" "119-158-F" out? What would be the easiest way to do this? preg_match?

2
  • Yes, preg_match is the way to go. Is there any other question? The linked manual page should get you there if you need an example. Commented Mar 2, 2012 at 19:48
  • How would I go through every file in a certain directory? Commented Mar 2, 2012 at 19:48

3 Answers

1

Use preg_match_all() to put each match into an array. Then you can print_r the array or implode it.

// Define regex
$regex = '/[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+-[0-9]+-[A-Z]+)/s'; 

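// Open the directory containing your 1500 files
$dir = '/path/folder';
if ($handle = opendir($dir)) {

    // Loop over each entry in the directory
    while (false !== ($entry = readdir($handle))) {

        // readdir() returns bare names (including "." and ".."),
        // so build the full path and skip anything that is not a file
        $path = $dir . '/' . $entry;
        if (!is_file($path)) {
            continue;
        }

        // Read the file
        $file = file_get_contents($path);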

        // Use preg_match_all to store the matches in an array
        if (preg_match_all($regex, $file, $matches, PREG_SET_ORDER)) {

            // $matches[0][0] is the full match; the four captured values follow it
            $numbers = array_slice($matches[0], 1);

            print_r($numbers); // Or implode instead
            echo '<br />';
        }

    }

    closedir($handle);
}
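
As a rough end-to-end sketch of the same idea, the snippet below collects one comma-separated line per file with implode. Everything outside the regex is an assumption made so the example runs on its own: the temporary folder, the file names a.txt and b.txt, and the made-up contents of the second file are purely illustrative, and scandir stands in for the opendir/readdir pair.

```php
<?php
// Build a throwaway folder with two sample files (hypothetical data).
$dir = sys_get_temp_dir() . '/reports_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", " Completion rate: 119 ( isComplete: 0 )\r\nFailure rate: 158 HC: 119-158-F ");
file_put_contents("$dir/b.txt", " Completion rate: 7 ( isComplete: 1 )\r\nFailure rate: 3 HC: 7-3-P ");

$regex = '/[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+-[0-9]+-[A-Z]+)/s';
$rows = [];

foreach (scandir($dir) as $entry) {
    $path = "$dir/$entry";
    if (!is_file($path)) {
        continue; // skips "." and ".."
    }
    if (preg_match($regex, file_get_contents($path), $m)) {
        // Drop $m[0] (the full match) and join the four captured values.
        $rows[$entry] = implode(',', array_slice($m, 1));
    }
}

// Remove the sample files and folder again.
array_map('unlink', glob("$dir/*.txt"));
rmdir($dir);

foreach ($rows as $name => $line) {
    echo "$name: $line\n";
}
```

Keying the result by file name keeps it easy to trace a value back to its file when there are around 1,500 of them.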

2 Comments

I fail to see how \d+ is going to match the required 119-158-F
Whoops, didn't see he wanted to extract that whole bit. Thought he just wanted each number. Best way is to use Kisaro's regex in my script.
0

I would use glob to iterate over the text files in the directory like below, though there are other options like opendir and readdir ...

$myDir = '/path/to/text/files';

foreach (glob("$myDir/*.txt") as $filename) {
  // glob() already returns each path prefixed with $myDir
  $str = file_get_contents($filename);
  $pattern = '/^\s*Completion rate: (\d+) \( isComplete: (\d) \)\s*Failure rate: (\d+) HC: ([A-Z0-9\-]+)\s*$/';
  if (preg_match($pattern, $str, $match)) {
    var_dump($match);
  }
}
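
To check the pattern without touching the file system, the matching step can be pulled into a small function and run against the sample line from the question. This is only a sketch: the function name parseReport and the array keys are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: returns the four fields, or null if the text does not match.
function parseReport(string $text): ?array
{
    $pattern = '/^\s*Completion rate: (\d+) \( isComplete: (\d) \)\s*Failure rate: (\d+) HC: ([A-Z0-9\-]+)\s*$/';
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    return [
        'completion' => $m[1],
        'isComplete' => $m[2],
        'failure'    => $m[3],
        'hc'         => $m[4],
    ];
}

// Sample text from the question, with a real CR LF between the two lines.
$sample = " Completion rate: 119 ( isComplete: 0 )\r\nFailure rate: 158 HC: 119-158-F ";
$fields = parseReport($sample);
var_dump($fields);
```

Because the pattern is anchored at both ends, a file whose layout differs returns null instead of a partial match, which makes odd files easy to spot.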

1 Comment

I've added a regex because of peer-pressure. However, there are a million different ways to write the regex depending on if the format of the data in your files ever changes ...
0

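This works fine for me in an online regex tester. Here $contents stands for the text of one file and $matches receives the result:

    preg_match_all('/[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+-[0-9]+-[A-Z]+)/s', $contents, $matches);
    print_r($matches);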

It'll get you this:

Array
(
[0] => Array
    (
        [0] => Completion rate: 129 ( isComplete: 0 )\r\nFailure rate: 158 HC: 119-158-F
    )

[1] => Array
    (
        [0] => 129
    )

[2] => Array
    (
        [0] => 0
    )

[3] => Array
    (
        [0] => 158
    )

[4] => Array
    (
        [0] => 119-158-F
    )

)
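
If only the four values are wanted, the nested result can be flattened. A minimal sketch, assuming one match per file; the variable names $text, $matches and $values are illustrative:

```php
<?php
$text  = "Completion rate: 129 ( isComplete: 0 )\r\nFailure rate: 158 HC: 119-158-F";
$regex = '/[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+)[^0-9]*([0-9]+-[0-9]+-[A-Z]+)/s';

preg_match_all($regex, $text, $matches);

// Skip $matches[0] (the full match) and take the first hit of each capture group.
$values = array_column(array_slice($matches, 1), 0);

echo implode(',', $values), PHP_EOL;
```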
