2

I've written a Perl script that opens two files, each containing a list. I want to find the items in the first list that are not in the second list. The script uses two foreach loops. The outer loop goes through each line of the first list, extracting the necessary item information. The inner loop goes through the second list, extracting the item information and comparing it to the item from the first list.

So, the idea is that, for each item in the first list, the script will loop through all items in the second list, looking for matches. The trouble is that the inner foreach loop only runs once. I had this same problem in PHP when looping through MySQL tables in nested while loops. The solution there was to reset the index of the MySQL result using mysql_data_seek on each iteration of the outer loop. How can I do this in Perl with filehandles?

5 Comments

  • Can you post some code explaining what you are doing now? Commented Jun 21, 2010 at 16:44
  • If you show your code, people will be able to point out what's wrong. Without it, people can only speculate and you're just wasting their (and your) time. Commented Jun 21, 2010 at 16:44
  • Without code I can't answer, but chances are you are re-using something in the inner loop from the outer loop, like a file handle or a loop counter. Commented Jun 21, 2010 at 16:45
  • Compare both your loops to see if you made a minor mistake in the inner loop and not in the outer loop. Other than that you need to show the code otherwise we don't know what you want. Commented Jun 21, 2010 at 16:46
  • You don't need to write your own code to check for list subsets -- there are CPAN modules for that, not to mention the builtin ~~ operator. Commented Jun 21, 2010 at 16:47

2 Answers

8

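If your inner loop reads from a filehandle, you will need to reset it (say, by closing and reopening the file) every time the outer loop reaches it. Once a filehandle has been read to end-of-file, further reads return nothing, which is why the inner loop only runs on the first pass.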

foreach my $outer (@outer) {
   open INNER, '<', $inner_file or die "Cannot open $inner_file: $!";   # <--- need to add this
   while (my $inner = <INNER>) {
      ...
   }
   close INNER;                    # <--- optional with global scope filehandle
}

Alternatively, if you can spare the memory, you could read the file's lines into an array before the outer loop and then iterate over the array.

open INNER, '<', $inner_file or die "Cannot open $inner_file: $!";
my @INNER = <INNER>;
close INNER;

foreach my $outer (@outer) {
    foreach my $inner (@INNER) {
       ...
    }
}
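
Another option, not shown in the answer above, is to keep the file open and rewind it with Perl's built-in seek, which is probably the closest analogue to mysql_data_seek. The following is only a minimal sketch under stated assumptions: the sample data, the temporary file, and the names @outer, @missing, and $inner_fh are hypothetical, and each line is assumed to hold exactly one item.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the second list's file.
my ($tmp_fh, $inner_file) = tempfile(UNLINK => 1);
print {$tmp_fh} "apple\nbanana\n";
close $tmp_fh or die "Cannot close $inner_file: $!";

# Hypothetical items already extracted from the first list.
my @outer = qw(apple cherry banana);
my @missing;

open my $inner_fh, '<', $inner_file or die "Cannot open $inner_file: $!";
foreach my $outer (@outer) {
    # Rewind to the start of the file before every inner pass.
    seek $inner_fh, 0, 0 or die "Cannot seek in $inner_file: $!";

    my $found = 0;
    while (my $inner = <$inner_fh>) {
        chomp $inner;
        if ($inner eq $outer) {
            $found = 1;
            last;
        }
    }
    push @missing, $outer unless $found;
}
close $inner_fh;

# Items from the first list that never appeared in the second.
print "$_\n" for @missing;
```

This avoids reopening the file on every outer iteration, but it still rereads the second file once per item in the first list.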

4 Comments

@smfoote, don't just say "thanks", vote it up and check the check mark beside it. And in the future, post some code with your question.
I don't have enough reputation to vote it up, and I did check the check mark. I understand that having code is generally useful, but I think this problem didn't require code, and the evidence is that mobrule was able to easily answer the question without seeing code. Generally when I add my code, the answers and comments are distracted from what I'm actually trying to figure out, and people start lecturing me on the quality of my code. I'm not really a fan of that.
@smfoote, yeah, having people try to teach you something must be a real bummer.
Well, it wasn't that easy ;-) But I concur with everyone else that when even a little code is provided, it is easier to guess what the OP needs to know (and which is not always the same as what is being asked) and increases the chance that you will get a useful answer.
3

Note that the approach you describe is inefficient: it takes O(n·m) time, where n and m are the lengths of the two lists. You can get O(n+m) by putting the relevant contents of one file into a hash and then iterating over the other file only once.
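
A minimal sketch of the hash approach might look like this. It assumes one item per line and uses in-memory filehandles with made-up data in place of the real files; the names %in_second and @missing are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the two files, one item per line.
my $first_data  = "apple\ncherry\nbanana\ndate\n";
my $second_data = "banana\napple\n";

# Pass 1: load every item of the second list into a hash, keyed by item.
open my $second_fh, '<', \$second_data or die "Cannot open second list: $!";
my %in_second;
while (my $line = <$second_fh>) {
    chomp $line;
    $in_second{$line} = 1;
}
close $second_fh;

# Pass 2: read the first list once, testing each item with a hash lookup.
open my $first_fh, '<', \$first_data or die "Cannot open first list: $!";
my @missing;
while (my $line = <$first_fh>) {
    chomp $line;
    push @missing, $line unless exists $in_second{$line};
}
close $first_fh;

# Items in the first list that are not in the second.
print "$_\n" for @missing;
```

Each file is read exactly once, and the exists test replaces the whole inner loop.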

2 Comments

If I put the relevant contents of the first file in a hash, then iterate through the second file, I would still have to run through the hash with each iteration of the second file, I think. The code definitely could be more efficient, but neither of the files is very large, so the difference was no more than a few seconds. I should probably fix it anyway, so I don't get into bad habits.
@smfoote: if you can design the hash so that you can compare by key, then this comparison becomes O(1) with respect to the hash size instead of O(n). The "trick" of a hash is that you do not have to resort to even a binary search (O(log n)).
