
I am using BeautifulSoup to get data from a webpage. The page shows a date when I open it in Firefox. However, when I view the page source there is no date, just some JavaScript that generates it. I see there are some related questions on here, and I see references to Ajax and JSON, but I am just an amateur programmer and remain confused. Here is the part of the HTML containing the JavaScript that produces the date I need:

<div class="match-details">
  <p class="floatleft">
    BARCLAYS PREMIER LEAGUE 

    <span>
      <script type="text/javascript">
        (function(){
        var d = new Date(1345489200000);

        var year = d.getFullYear();
        var month = d.getMonth() + 1;
        var day = d.getDate();
        var minutes = d.getMinutes();
        var hours = d.getHours();                                        

        if (minutes < 10) { minutes = '0' + minutes; }
        var dmy = [day, month, year];
        var hm = [hours, minutes];
        if (SITE_EDITION == 'us/en') {
            var dmy = [month, day, year];    
        }
        var matches_local = dmy.join('/') + " " + hm.join(':'); 
        matches_local += "<span class='live-red'>*</span>";

        document.write(matches_local);
        })();                                                       
      </script>
    </span>

  </p>
</div>
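To make clear what the browser is doing, here is a rough Python port of that script. It is a sketch, assuming the number passed to new Date(...) is milliseconds since the Unix epoch (which is how the JavaScript Date constructor treats a single number). The helper name format_like_page is hypothetical, and its site_edition and tz arguments are assumptions standing in for the page's SITE_EDITION global and the browser's local time zone.

```python
from datetime import datetime, timedelta, timezone

# JavaScript Date values are milliseconds since 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_like_page(ms, site_edition="uk/en", tz=timezone.utc):
    """Rebuild the text the inline script writes, minus the trailing '*' span.

    ms: the number inside new Date(...)
    site_edition: stands in for the page's SITE_EDITION global (value assumed)
    tz: a browser uses its own local zone; here the caller chooses one
    """
    d = (EPOCH + timedelta(milliseconds=ms)).astimezone(tz)
    # Only the minutes are zero-padded, as in the original script.
    minutes = f"{d.minute:02d}"
    dmy = [d.day, d.month, d.year]
    if site_edition == "us/en":
        # The US edition puts the month before the day.
        dmy = [d.month, d.day, d.year]
    return "/".join(str(part) for part in dmy) + " " + f"{d.hour}:{minutes}"


print(format_like_page(1345489200000))                       # 20/8/2012 19:00
print(format_like_page(1345489200000, site_edition="us/en"))  # 8/20/2012 19:00
```

Because the script calls getHours() and friends, a real browser would print the time in its own zone, so a visitor in the UK in August would see 20:00 rather than 19:00.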
  • So what is your question? Commented Mar 1, 2013 at 20:25
  • Could you outdent the code a bit? There's no need for pushing it off the page... Commented Mar 1, 2013 at 20:28
  • @BurhanKhalid that code will output a date to the page when run in a browser. He wants to know how to get that date programmatically when screen-scraping with Python. Commented Mar 1, 2013 at 20:30
  • @TimPietzcker I edited it to fix indentation but we'd have to wait for people to review the edit and accept it Commented Mar 1, 2013 at 20:30
  • Are you just trying to find the string new Date(1345489200000); and turn that into a Python datetime object? Or are you trying to read the page rendered by this JavaScript and extract a date from the resulting HTML? Commented Mar 1, 2013 at 20:58
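For the first option in that last comment, no JavaScript engine is needed, because the timestamp already sits in the script text. A minimal standard-library sketch, assuming the page always contains the literal new Date(<digits>); match_datetime is a hypothetical helper name. The result is in UTC, whereas the browser would display local time.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Matches the literal "new Date(<digits>)" used by the inline script.
DATE_RE = re.compile(r"new Date\((\d+)\)")


def match_datetime(html):
    """Return the timestamp as an aware UTC datetime, or None if not found."""
    m = DATE_RE.search(html)
    if m is None:
        return None
    return EPOCH + timedelta(milliseconds=int(m.group(1)))


# With BeautifulSoup you could first narrow the search to the script tag, e.g.
#   soup.find("div", class_="match-details").find("script").string
# and pass that string in; a regex over the raw page source also works.
snippet = '<script type="text/javascript">(function(){ var d = new Date(1345489200000); })();</script>'
print(match_datetime(snippet))  # 2012-08-20 19:00:00+00:00
```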

1 Answer


BeautifulSoup is an HTML processing library. You need an HTML + JavaScript processing library.

Read up on this question: Programmatic Python Browser with JavaScript

As that Q&A states, you basically need either a real browser, driven via Selenium, or a Python browser that supports JavaScript, such as Spynner.
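With the browser route, the script runs and the span then holds text such as 20/8/2012 19:00*, in the browser's local time. A sketch of pulling the date out of that rendered text, assuming that format (day first, or month first on the US edition); parse_rendered is a hypothetical helper. The Selenium lines in the comment show one way to obtain the text, assuming Firefox and geckodriver are installed, and are not executed here.

```python
import re

# Rendered form written by the script: D/M/YYYY H:MM (M/D/YYYY on "us/en").
RENDERED_RE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4}) (\d{1,2}):(\d{2})")


def parse_rendered(text):
    """Return the five numbers in page order as ints, or None if no date."""
    m = RENDERED_RE.search(text)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


# Getting the rendered text with Selenium 4 might look like this (url is a placeholder):
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#   driver = webdriver.Firefox()
#   driver.get(url)
#   text = driver.find_element(By.CSS_SELECTOR, "div.match-details span").text
#   driver.quit()
print(parse_rendered("20/8/2012 19:00*"))  # (20, 8, 2012, 19, 0)
```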


2 Comments

Thanks for the response. I am looking at PyV8, but unfortunately I am having a hard time getting it set up on Ubuntu. The people maintaining the PyV8 site recommend using the prebuilt version, but there is no prebuilt version for Linux. I am going to open a new thread asking specifically how JavaScript parses the line above; I think that will be simpler.
Sorry, I was not clear and have edited my response. You need an HTML + JavaScript processing library. PyV8 will only let you run JavaScript; it won't parse the page and tell you which JavaScript to run. You need a JavaScript-capable HTML browser to trigger the correct events and allow the DOM to be manipulated.
