I have two arrays:

@one = ("1|1|the|dog|ran", "1|2|a|x|b", "2|8|e|s|e");
@two = ("1|2|a|x|b", "1|1|down|the|street", "2|8|e|s|e");

I need to match them by the first two "|"-separated elements, so that a search for $one[0] would return $two[1].

There are millions of lines in each array, so I need the fastest way to do this.

EDIT: Sorry for the confusion. I want to treat the first two "|"-separated elements (e.g. 1|2, 2|1) as a key, loop through the first array, and use that key to look up the matching values in the second array. Does that help?

  • When you say "by the first to", did you actually mean "by the first TWO"? Because that'd make a lot more sense. Commented Nov 15, 2012 at 17:53
  • Start looking here in the Perl FAQ: How do I find the intersection of two arrays? Commented Nov 15, 2012 at 17:57
  • Given there are millions of elements in each array, are we concerned that you cannot hold an entire array in memory all at once? Are the arrays going to grow over time so that memory does become an issue? Commented Nov 15, 2012 at 17:58
  • If you have "millions" of lines in your arrays - which seems to be odd in the first place - perhaps you should consider moving your data to a database instead. Commented Nov 15, 2012 at 18:01
  • @DavidO Ya that might be a problem. When you talk about memory, is there a "perl memory amount" or are you talking about the computer's memory? Commented Nov 15, 2012 at 18:03

1 Answer

- For each record in the second array,
  - Parse the record
  - Add it to a hash keyed by the first two fields.

- For each record in the first array,
  - Parse the record
  - Look in the hash for a record with the appropriate key.
  - If there is one,
    - Do something with it.
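
The steps above could be sketched roughly as follows, using the sample data from the question. The helper name key_of and the names %two_by_key and @matches are made up for illustration, and the sketch assumes each key appears at most once in the second array (a later duplicate would overwrite an earlier one).

```perl
use strict;
use warnings;

# Sample data from the question.
my @one = ("1|1|the|dog|ran", "1|2|a|x|b", "2|8|e|s|e");
my @two = ("1|2|a|x|b", "1|1|down|the|street", "2|8|e|s|e");

# Hypothetical helper: return the first two "|"-separated fields as one key.
sub key_of {
    my ($record) = @_;
    # A limit of 3 stops split from breaking up the rest of the record.
    my ($f1, $f2) = split /\|/, $record, 3;
    return "$f1|$f2";
}

# Pass 1: index every record of the second array by its key.
# Assumption: keys are unique, so one record per key is enough.
my %two_by_key;
$two_by_key{ key_of($_) } = $_ for @two;

# Pass 2: look up each record of the first array in the hash.
my @matches;
for my $record (@one) {
    my $match = $two_by_key{ key_of($record) };
    next unless defined $match;            # no record with this key
    push @matches, [ $record, $match ];    # "do something with it"
    print "$record => $match\n";
}
```

With this data, $one[0] ("1|1|the|dog|ran") is paired with $two[1] ("1|1|down|the|street"). Each lookup is a single hash access, so the whole job is one pass over each array rather than a nested loop.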

7 Comments

If there are millions of lines in each array, it might not be viable to build a hash to compare.
Hash is the best option, assuming the memory can take it.
@TLP, I don't see why using an array + hash would use up more memory than array + array with the same data. In fact, you could make do with just a hash since the ones don't need to be kept in memory.
@ikegami I haven't worked with hashes all that much. How do I add the second array keyed by the first two fields?
The record is added, not the array. 1|2 would do fine for key, and would use less memory than a Hash of Hash
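
To make the last two comments concrete, here is a rough sketch of adding each record under a flat string key such as "1|2", and of reading the first data set one line at a time so it never has to be held in an array. The in-memory filehandle, the extra "9|9|no|match" line, and the names %by_key and @found are assumptions for illustration only; with real data the lines would presumably come from a file.

```perl
use strict;
use warnings;

my @two = ("1|2|a|x|b", "1|1|down|the|street", "2|8|e|s|e");

# One flat hash: the key is the text up to the second "|", e.g. "1|2".
my %by_key;
for my $record (@two) {
    my ($key) = $record =~ /^([^|]*\|[^|]*)/;
    next unless defined $key;    # skip records with fewer than two fields
    $by_key{$key} = $record;     # the record is stored, not the array
}

# Stand-in for a large input file (assumption): an in-memory filehandle.
my $first_data = "1|1|the|dog|ran\n1|2|a|x|b\n9|9|no|match\n";
open my $fh, '<', \$first_data or die "Cannot open in-memory file: $!";

my @found;
while (my $line = <$fh>) {
    chomp $line;
    my ($key) = $line =~ /^([^|]*\|[^|]*)/;
    next unless defined $key && exists $by_key{$key};
    push @found, $by_key{$key};
}
close $fh;

print "$_\n" for @found;
```

A hash of hashes ($h{1}{2}) would also work, but it creates an extra inner hash per distinct first field, which is why the flat key is suggested as the lighter option.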