I am trying to parse a file with lines similar to:

       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29

I need to capture the name and the number in the first column. The end result would be

John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188

I've tried

\s*\b(.*)\b(\s*\.\s*.*)(\d+,\d+|\d+)\b
\s*\b(.*)\b(\.|.\s)+\b(\d+,\d+|\d+)\b

Any suggestions?

  • Is the data always column aligned? Commented Nov 9, 2018 at 21:26
  • @SalmanA yes. They use periods and spaces to separate the names from the numbers Commented Nov 9, 2018 at 21:27
  • Then use substr. Not regex. Commented Nov 9, 2018 at 21:28
  • @SalmanA the length of the name varies and the value could be 1 - 5 digits. Commented Nov 9, 2018 at 21:32

3 Answers

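This pattern captures the name lazily, stopping at the first stretch of dot filler that follows it (a dot, two spaces, and another dot).
It then skips ahead to the first run of digits and commas and captures that as the number.

I then loop over the matches to build the new array, removing the comma from each number.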

$str = '       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29';
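// Group 1: the name, matched lazily up to the first ".  ." filler.
// Group 2: the first run of digits and commas after the filler.
preg_match_all('/\s*(.*?)\s*\.  \..*?([\d,]+)/', $str, $matches);

$new = [];
foreach ($matches[1] as $key => $name) {
    // Strip the thousands separator before joining name and count.
    $new[] = $name . "," . str_replace(",", "", $matches[2][$key]);
}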


var_dump($new);

Output:

array(5) {
  [0]=>
  string(27) "John David James (DEM),7808"
  [1]=>
  string(26) "Marvin D. Scott (DEM),6548"
  [2]=>
  string(32) "Maria "Mary" Williams (DEM),4551"
  [3]=>
  string(22) "Dwayne R. Johnson,4322"
  [4]=>
  string(12) "WRITE-IN,188"
}

https://3v4l.org/SdqoZ
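
Because the name and the count land in separate capture groups, they can also be collected into structured rows instead of joined strings. A minimal sketch, assuming the goal is a JSON list of name/count pairs: the keys `name` and `votes` are hypothetical, and only two of the sample lines are used to keep it short.

```php
<?php
$str = '       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29';

// Same pattern as above; PREG_SET_ORDER groups the captures per matched line.
preg_match_all('/\s*(.*?)\s*\.  \..*?([\d,]+)/', $str, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        'name'  => $m[1],                             // hypothetical key name
        'votes' => (int) str_replace(',', '', $m[2]), // hypothetical key name
    ];
}

echo json_encode($rows), "\n";
// [{"name":"John David James (DEM)","votes":7808},{"name":"WRITE-IN","votes":188}]
```

Casting to int after removing the comma keeps the count numeric in the JSON output rather than a quoted string.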

1 Comment

Thanks @Andreas. This works great. This version actually simplifies my work even more since I can work with the name and count separately.

You can achieve it with an ungreedy (lazy) regexp.

To catch the name, we want "a sequence of any characters followed by a sequence of dots and spaces", which translates to the regexp (.+)[. ]* in its simplest form.

But the engine is greedy by default. What happens then? The first part, (.+), won't stop at the first dot or space it encounters. A greedy quantifier takes as much as it can and only gives characters back when the rest of the pattern would otherwise fail. Since the rest of the pattern can still match further along the line, the engine keeps the longer capture.

The same goes for the full regexp in the working code below: in greedy mode, the first capturing group would capture beyond the name field.

So we need to tell the engine to "eat" as little as possible.
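
The difference can be seen on a single line. This is a small sketch, not the full pattern used below: it uses a simplified pattern and the per-quantifier lazy form `+?` instead of the U flag, and the variable names `$greedy` and `$lazy` are only for illustration.

```php
<?php
// One sample line from the question.
$line = '       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22';

// Greedy: (.+) grabs as much as it can, so the capture runs past the name
// and into the numeric columns.
preg_match('/^\s*(.+)[. ]+\d/', $line, $greedy);

// Lazy: (.+?) stops as soon as the rest of the pattern can match,
// which is right after the name.
preg_match('/^\s*(.+?)[. ]+\d/', $line, $lazy);

echo $greedy[1], "\n"; // name, dot filler, and part of the numbers
echo $lazy[1], "\n";   // Dwayne R. Johnson
```

The U flag has the same effect as writing every quantifier in its lazy form.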

<?php
$lines = '
       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29
';
$lines = explode("\n", $lines);

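// Here, the U flag inverts greediness: every quantifier becomes lazy by default
$pattern = '/^\s*(\S.+\S)[. ]+([0-9]+)(?:,([0-9]+))?\s.*$/U';
echo "<pre>";
foreach ($lines as $line) {
    // Skip the blank lines produced by the leading and trailing newlines
    if (trim($line) === '') continue;
    // Here : - ${1} captures the name,
    //        - ${2} the digits before the thousands separator
    //        - ${3} the digits after it (empty when there is no comma)
    echo preg_replace($pattern, '${1},${2}${3}', $line) . "\n";
}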
echo "</pre>";
?>

Result:

John David James (DEM),7808
Marvin D. Scott (DEM),6548
Maria "Mary" Williams (DEM),4551
Dwayne R. Johnson,4322
WRITE-IN,188

5 Comments

Split()? From the manual: "This function was DEPRECATED in PHP 5.3.0, and REMOVED in PHP 7.0.0." Just to be clear, I did not downvote. I only wrote this to ask why you would use a deprecated function.
Yes, I saw your comment and I fixed my code. I was busy adding more explanations. Thanks.
Just another heads up, OP does not want the comma in the number.
Thanks for the extremely detailed description!
Thanks Amessihel. Your response was great, but I picked @Andreas's version since the code he provided gave me the name and count as variables that I could work with individually. I converted the names and numbers into a JSON array to use elsewhere.

If the data is column aligned (all columns have known, fixed width) then use string functions such as substr:

<?php
$lines = '
       John David James (DEM) .  .  .  .  .  .     7,808   10.51
       Marvin D. Scott (DEM)  .  .  .  .  .  .     6,548    9.55
       Maria "Mary" Williams (DEM)  .  .  .  .     4,551    8.58
       Dwayne R. Johnson.  .  .  .  .  .  .  .     4,322    8.22
       WRITE-IN.  .  .  .  .  .  .  .  .  .  .       188     .29
';

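foreach (preg_split('/(\\r|\\n)+/', $lines) as $line) {
    if ($line === '') continue;
    // The first 46 characters hold the name plus dot filler; the next 10 hold the right-aligned count
    $name = substr($line, 0, 46);
    $amount = substr($line, 46, 10);
    // Trim leading whitespace, then trailing spaces and filler dots
    $name = rtrim(ltrim($name), " .");
    // Remove the thousands separator and convert the count to an integer
    $amount = (int) str_replace(",", "", trim($amount));
    echo $name . "," . $amount . "\n";
}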
