1

I have got one array with milions of floating points number (ordered) and then another one smaller and i need to match the numbers within a certain tolerance (there is no overlap between the values in the big array and the values in the small array) from the small array in the big array. okay so no big deal, this is the perl function for returning a not perfect match within the tolerance, it's inside a for loop there I am looping through the small array values.

sub bin_search{
my ($arr, $v, $t ) = @_;
my ($min, $max) = (0, @$arr-1);
while ($min <= $max) {
  my $w   = $v + $t;
  my $k   = $v - $t;
  my $try = int( ( $min + $max ) / 2 );
  $min    = $try + 1, next if $arr -> [$try] < $k  ;
  $max    = $try - 1, next if $arr -> [$try] > $w ;
  return $arr -> [$try] ;
  }
 return 0;
} 

but then after checkin my data it seems I have got some values discarded because it is returning just the first match. I tried grep but it is too slow.

 my $min = $val - $t;
 my $max = $val + $t;
 my @arr2 = grep { ( $_ > $min ) && ($_ < $max ) }@big_arr1; 

so I wanted to modify a bit the binary search for returning a range from $min to $max, because I thought once there is one match is either at $min or $max, so something like

sub bin_search{
my ($arr, $v, $t ) = @_;
my ($min, $max) = (0, @$arr-1);
my $w   = $v + $t;
my $k   = $v - $t;
while ($min <= $max) {
  my $try = int( ( $min + $max ) / 2 );
  $min    = $try + 1, next if $arr -> [$try] < $k  ;
  $max    = $try - 1, next if $arr -> [$try] > $w ;
  last;
  }
 my @fin;
 if ( ($arr -> [$try] < $w) && ($arr -> [$try] > $k) ) {
    push @fin, $arr ->[$try]; $try++ }
 return \@fin;
} 

but I am missing some values, and I think that I am missing something, should I look just at one direction at the time? like left until we reach the lower limit then return to $try and do the same thing until higher limit?

7
  • So the numbers in the small array can match serveral numbers in the large array? And the numbers in the large array are sorted in increasing order, but not necessarily unique? Commented Nov 20, 2016 at 17:44
  • You call push @fin at most once. Commented Nov 20, 2016 at 18:21
  • @ikegami yeah it was a typo should be while Commented Nov 20, 2016 at 18:27
  • 1) Don't post code that you didn't actually run. 2) The problem is that you look for the end of the range, but you never look for its start. Commented Nov 20, 2016 at 18:29
  • @HåkonHægland the big array is sorted from low to high and they are unique values, while the small array a number can match up to I think 20 values circa within the tolerance (I ve got 6 decimal points and the tollerance has got 4 so I think 20 is a good approximation of the number of matches ). Commented Nov 20, 2016 at 18:29

1 Answer 1

1

Start by finding the index of a matching element using a binary search.

Once you have that, you need to find where the range starts. You can use a binary search for that as well, but a linear search is also acceptable if the number of matching elements is usually small.

Finally, you need to find the end of the range. You can use the same approach as you used for finding the start of the range.

The problem with your solution is that you didn't look for the start of the range.

The following is an untested implementation that uses the linear scan approach (like yours), so it assumes that there will be very few matching elements:

sub binsearch_numeric_range {
   my $min   = shift;
   my $max   = shift;
   my $array = shift;

   return () if !@$array;

   my $i = 0;
   my $j = $#$array;

   my $k;
   while (1) {
      $k = int(($i+$j)/2);

      if ($array->[$k] > $max) {
         $j = $k-1;
         return () if $i > $j;
      }
      elsif ($array->[$k] < $min) {
         $i = $k+1;
         return () if $i > $j;
      }
      else {
         last;
      }
   }

   my $min_k = $k;  --$min_k while $min_k > 0        && $array->[$min_k - 1] >= $min;
   my $max_k = $k;  ++$max_k while $max_k < $#$array && $array->[$max_k + 1] <= $max;

   return @$array[$min_k .. $max_k];
}

my @matches = binsearch_numeric_range($v-$t, $v+$t, $arr);

An implementation that doesn't require writing a whole new binsearch:

my $idx = binsearch { abs($a-$b) <= $t ? 0 : $a <=> $b } $v, @$arr;

my @range;
if ($idx >= 0) {
   my $min_idx = $idx;  --$min_idx while $min_idx > 0      && $arr->[$min_idx-1] >= ($v-$t);
   my $max_idx = $idx;  ++$max_idx while $max_idx < $#$arr && $arr->[$max_idx+1] <= ($v+$t);

   @range = @$array[$min_idx .. $max_idx];
}

The binsearch used is defined here.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.