I am reading data from a CSV file that contains quantities with their units attached. I need to separate each value from its units.

The values I read in could have units of either millivolts (mV) or volts (V). If the string in $splitter[0] is 1.987mV, I want to separate it into two values: 1.987 and mV.

$splitter[0] =~ /(.*)([mV])/;
print "$1 -- $2\n";

This outputs

1.987m -- V

If the units in $splitter[0] are just V, it seems to work.

Does anyone know why I'm not picking up the m?

2 Answers

You have no repetition after your character class, so you're asking for a single match of anything in that character class, i.e., an m or a V. There are many ways to skin this cat, though:

/^([\d.]+)(\D+)$/
/^([\d.]+)(\w+)$/
/^([^A-Za-z]+)(\w+)$/
/^([^A-Za-z]+)([A-Za-z]+)$/
/(.*)(mV|m)/
/(.*)(m?V)/

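Some solutions above are more "correct" than others. The four anchored patterns split both 1.987mV and 1.987V correctly. The last two still rely on a greedy (.*): /(.*)(mV|m)/ handles 1.987mV but does not match 1.987V at all, and /(.*)(m?V)/ still returns 1.987m and V for 1.987mV.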
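To check which of the patterns above behave as intended, the following sketch runs each one against a millivolt reading and a volt reading. The helper name split_with and the two sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# The six candidate patterns from the list above, in the same order.
my @patterns = (
    qr/^([\d.]+)(\D+)$/,
    qr/^([\d.]+)(\w+)$/,
    qr/^([^A-Za-z]+)(\w+)$/,
    qr/^([^A-Za-z]+)([A-Za-z]+)$/,
    qr/(.*)(mV|m)/,
    qr/(.*)(m?V)/,
);

# Hypothetical helper: returns ($value, $units), or an empty list on no match.
sub split_with {
    my ($re, $input) = @_;
    return $input =~ $re;
}

for my $i ( 0 .. $#patterns ) {
    for my $input (qw/ 1.987mV 1.987V /) {
        my @parts  = split_with( $patterns[$i], $input );
        my $result = @parts ? "$parts[0] -- $parts[1]" : 'no match';
        printf "pattern %d on %-8s %s\n", $i + 1, $input, $result;
    }
}
```

Patterns 1 to 4 report 1.987 -- mV and 1.987 -- V; pattern 5 reports no match for 1.987V, and pattern 6 reports 1.987m -- V for 1.987mV.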

Also, you generally want your regular expressions to be as restrictive as possible, so that they match exactly what you mean. Avoid . (which matches any character) where you can, and prefer more specific patterns instead.
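As an illustration of matching exactly what you mean, here is a sketch that accepts only an optional sign, digits with an optional fractional part, and then exactly mV or V. The sub name parse_voltage and the accepted number format are assumptions; adjust them to what your CSV actually contains.

```perl
use strict;
use warnings;

# Hypothetical helper: optional sign, digits, optional fractional part,
# then exactly one of the two units from the question (assumed formats).
# Returns ($value, $units), or an empty list if the field is rejected.
sub parse_voltage {
    my ($field) = @_;
    return $field =~ /^([-+]?\d+(?:\.\d+)?)(mV|V)$/;
}

for my $field (qw/ 1.987mV 12V 3.3kV 1.2.3V /) {
    if ( my ($val, $units) = parse_voltage($field) ) {
        print "$field -> $val, $units\n";
    }
    else {
        print "$field -> rejected\n";
    }
}
```

Because the pattern is anchored and lists the allowed units explicitly, unexpected input such as 3.3kV or 1.2.3V is rejected instead of being split somewhere odd.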

3 Comments

That last one: should it not be m?V?
D'oh! It's harder coming up with wrong solutions than I thought. ;-)
Thanks, I went with option #2 above and it worked perfectly.

As you have read, one reason your code finds only the V at the end of the string is that your character class matches only one character: [mV] matches either a single lowercase m or a single uppercase V. To match more than one character you need a quantifier, as in [mV]+, which matches one or more such characters, such as m, V, mV or mVm.

The other reason is that you have a greedy match before it. .* matches zero or more of any character, as many as it can, so even if you fixed the quantifier on the units and wrote /(.*)([mV]+)/ you would still get 1.987m and V, because the dot is quite happy to match the m, leaving [mV]+ to match just the V.
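A quick sketch contrasting the greedy pattern with a non-greedy variant. The non-greedy .*? is an alternative fix that this answer does not itself suggest; the anchored character classes used elsewhere on this page are still more restrictive.

```perl
use strict;
use warnings;

my $reading = '1.987mV';

# Greedy .* first swallows the whole string, then backs off only as far as
# needed for [mV]+ to match, which leaves it just the trailing V.
my ($g_val, $g_units) = $reading =~ /(.*)([mV]+)/;
print "greedy:     $g_val -- $g_units\n";    # greedy:     1.987m -- V

# Non-greedy .*? starts empty and grows one character at a time, so the
# first place [mV]+ can match (and still reach the end) is at the m.
my ($n_val, $n_units) = $reading =~ /^(.*?)([mV]+)$/;
print "non-greedy: $n_val -- $n_units\n";    # non-greedy: 1.987 -- mV
```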

Assuming the quantity is numeric, consisting of decimal digits and possibly a decimal point, and the units are always letters (perhaps including a Greek mu μ for micro), you can split the value like this:

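use utf8;                              # the source contains a literal μ
use strict;
use warnings 'all';
use v5.10;                             # enables say

use open qw/ :std :encoding(UTF-8) /;  # print μ correctly on STDOUT

my @splitter = qw/ 1.987mV 442.0μH /;

for ( @splitter ) {

    # Digits and dots for the value, then one or more Unicode letters for the units
    my ($val, $units) = / ([0-9.]+) (\p{Letter}+) /x;

    say "$val ~ $units";
}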

output

1.987 ~ mV
442.0 ~ μH
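Since the question reads these values from a CSV file, here is a minimal sketch that applies the same value/units split to each row, with ^ and $ anchors added so trailing junk is rejected. The sub name parse_row, the name,reading column layout and the sample rows are hypothetical; it assumes simple CSV without quoted fields or embedded commas, so for real-world files a dedicated parser such as the Text::CSV module is a safer way to get the fields.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "name,reading" CSV row into
# ($name, $value, $units), or return an empty list if it doesn't parse.
# Assumes a simple CSV with no quoted fields or embedded commas.
sub parse_row {
    my ($line) = @_;
    my ($name, $reading) = split /,/, $line, 2;
    return unless defined $reading;
    $reading =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    my ($val, $units) = $reading =~ /^([0-9.]+)(\p{Letter}+)$/
        or return;
    return ($name, $val, $units);
}

for my $line ( "ch1,1.987mV", "ch2, 3.3V", "ch3,n/a" ) {
    if ( my @fields = parse_row($line) ) {
        print join( ' ', @fields ), "\n";
    }
    else {
        warn "could not parse: $line\n";
    }
}
```

Rows that don't fit the expected shape produce a warning instead of undefined values, which makes bad lines in the input easy to spot.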
