
I've got a Perl program. Its output looks something like this:

http://www.site.com/file1.html
http://www.site.com/file2.html
http://www.site.com/file3.html
.
.
.
v

I've got an unfinished Python program. Here it is:

import subprocess

pipe = subprocess.Popen(["perl", "perl_program.pl"])

I run the Python program in my terminal like this:

whatever:~ whatever$ python python_program.py

I get the following:

http://www.site.com/file1.html
http://www.site.com/file2.html
http://www.site.com/file3.html
.
.
.
v

I want to pop these URLs into an array in my Python code and manipulate them within Python. How do I do that?

Here is the Perl program I am working with:

use LWP::Simple;
use HTML::TreeBuilder;
use Data::Dumper;

my $tree = url_to_tree( 'http://www.registrar.ucla.edu/schedule/schedulehome.aspx' );

my @selects  = $tree->look_down( _tag => 'select' );
my @quarters = map { $_->attr( 'value' ) } $selects[0]->look_down( _tag => 'option' );
my @courses  = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $selects[1]->look_down( _tag => 'option' );

my $n = 0;

my %hash;

for my $quarter ( @quarters )
{
    for my $course ( @courses )
    {
        my $tree_b = url_to_tree( "http://www.registrar.ucla.edu/schedule/crsredir.aspx?termsel=$quarter&subareasel=$course" );

        my @options = map { my $s = $_->attr( 'value' ); $s =~ s/&/%26/g; $s =~ s/ /+/g; $s } $tree_b->look_down( _tag => 'option' );

        for my $option ( @options )
        {

            print "trying: http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option\n";

            my $content = get( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );

            next if $content =~ m/No classes are scheduled for this subject area this quarter/;

            $hash{"$course-$option"} = 1;
            #my $tree_c = url_to_tree( "http://www.registrar.ucla.edu/schedule/detselect.aspx?termsel=$quarter&subareasel=$course&idxcrs=$option" );

            #my $table = ($tree_c->look_down( _tag => 'table' ))[2]->as_HTML;

            #print "$table\n\n\n\n\n\n\n\n\n\n";

            $n++;
        }
    }
}

my $hash_count = keys %hash;
print "$n, $hash_count\n";

sub url_to_tree
{
    my $url = shift;

    my $content = get( $url );

    my $tree = HTML::TreeBuilder->new_from_content( $content );

    return $tree;
}
  • How are the URLs being generated? Would it be possible to port that code from Perl to Python, or port the rest of the code from Python to Perl? Commented Mar 6, 2014 at 5:30
  • @hd1, yeah, I don't know, I don't even know what you're saying really, but I think if you explain it a bit more I might follow... Commented Mar 6, 2014 at 5:33
  • Post the Perl code you're using as part of your question. Commented Mar 6, 2014 at 5:40

1 Answer


Try this:

pipe = subprocess.Popen(["perl", "perl_program.pl"], stdout=subprocess.PIPE)
urls, stderr = pipe.communicate()
urls = urls.split("\n")

# urls is the array that you can now manipulate

Alternatively, with Python 2.7 or higher, you can use check_output:

urls = subprocess.check_output(["perl", "perl_program.pl"]).split("\n")

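If you use split with no arguments in place of split("\n"), as suggested by @J.F.Sebastian, the URL splitting is more robust: trailing newlines and other stray whitespace no longer leave empty strings in the resulting list. Call it on the raw output string, not on a list that has already been split:

urls = subprocess.check_output(["perl", "perl_program.pl"]).split()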
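For Python 3, where a child process's output arrives as bytes unless text mode is requested, the same idea might look like the sketch below. This is only an illustration: collect_urls is a hypothetical helper name, text=True assumes Python 3.7 or later, and the "http" filter is an assumption based on the Perl script printing a "trying:" prefix and a final count line alongside the URLs.

```python
import subprocess


def collect_urls(cmd):
    """Run cmd to completion and return the URL-like tokens it printed.

    collect_urls is a hypothetical helper, not part of the original answer.
    """
    # text=True decodes stdout to str so split() operates on text, not bytes.
    # check=True raises CalledProcessError if the child exits with an error.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    # split() with no argument breaks on any whitespace and never yields
    # empty strings. Assumption: every URL of interest starts with "http",
    # so other tokens (a "trying:" prefix, a summary line) are dropped.
    return [token for token in result.stdout.split() if token.startswith("http")]
```

With the script from the question this would be called as collect_urls(["perl", "perl_program.pl"]). Nothing is returned until the Perl program has finished, so a long-running scrape appears to hang in the meantime.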

6 Comments

You could use split() (no argument): it removes newlines (including \r\n, like .splitlines(keepends=False)) and it trims other whitespace. There can't be whitespace inside a URL, so on each item it behaves like .strip().
I'm going to try your approach icedtrees. BTW, what is it with people on stackoverflow having arctic themed names? Like chill something, ice something. Is ice in vogue or something?
Wait, my URL list is about ten thousand lines... It takes a while to process... How will I know that I've got them in Python? How can I show the processing and spit out the first element in urls to the console, so I know I've got what I want?
OK, I'll do that now. I'm running your code now without it and printing out the first element in urls to see if I've got it.
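On the question of seeing progress while a ten-thousand-line run is still going: one possible approach is to read the pipe line by line instead of waiting for communicate(). The sketch below is a rough illustration, not the answerer's code. stream_urls and report_every are made-up names, text=True assumes Python 3.7 or later, and the "http" filter is the same assumption about the script's output as before.

```python
import subprocess


def stream_urls(cmd, report_every=1000):
    """Collect URLs from cmd's stdout as it runs, printing progress.

    stream_urls and report_every are hypothetical names for this sketch.
    """
    urls = []
    # Popen returns immediately; iterating over proc.stdout yields each line
    # once the child has written and flushed it. Leaving the with-block
    # closes the pipe and waits for the child to exit.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            for token in line.split():
                # Assumption: only tokens starting with "http" are URLs.
                if not token.startswith("http"):
                    continue
                urls.append(token)
                if len(urls) == 1:
                    print("first URL:", token, flush=True)
                elif len(urls) % report_every == 0:
                    print(len(urls), "URLs so far", flush=True)
    return urls
```

Perl normally block-buffers its output when stdout is a pipe, so lines may arrive in bursts rather than one at a time. Enabling autoflush in the Perl script (for example with $| = 1; near the top) should make each line show up as soon as it is printed.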