Following up on my previous question: how to fetch javascript contents in python

I tried to write another script that fetches data from the JavaScript embedded in a page, after getting the webpage contents of course.

But it's not showing the content I want. I want to find "content_id" in the JavaScript of the page. This is the page: http://www.hulu.com/watch/815743

Here's what I have right now.

import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput


Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('script',{'type':'text/javascript'})
pattern = re.compile(r'"content_id":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)

I get this error:

AttributeError: 'NoneType' object has no attribute 'text'

Any idea how to solve this issue?

  • Have you considered searching for "AttributeError: 'NoneType' object has no attribute"? There are quite a few similar questions out there already... Commented Sep 27, 2015 at 16:41
  • I tried the solutions ... didn't work. Commented Sep 27, 2015 at 16:42
  • I literally do not believe you - and even if you actually had tried all of the various suggestions, why don't you mention that in the question? Commented Sep 27, 2015 at 16:42

1 Answer

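soup.find() returns None because your pattern does not match the contents of any script tag, which is why accessing .text raises the AttributeError. There are two problems in your regular expression pattern:

  • the quotes are escaped with backslashes in the script contents, so the pattern has to match those backslashes too
  • there is whitespace after the colon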

Here is the fixed version:

pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)

Works for me, getting 60585710 as a result.
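
The difference between the two patterns can be checked offline. Below is a minimal sketch, assuming the script body looks roughly like the sample string (the sample is hypothetical, written only to mirror the two points above, and is not copied from the actual page):

```python
import re

# Hypothetical excerpt of an inline script: the JSON sits inside a
# JavaScript string, so every double quote is preceded by a backslash.
sample_script = r'var config = "{\"content_id\": \"60585710\", \"show\": \"x\"}";'

# Pattern from the question: expects bare quotes and no space after the colon.
original = re.compile(r'"content_id":"(.*?)"')

# Pattern from the answer: matches the literal backslashes and optional whitespace.
fixed = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')

original_match = original.search(sample_script)
fixed_match = fixed.search(sample_script)

print(original_match)        # None: the backslash breaks the match
print(fixed_match.group(1))  # 60585710
```

When a pattern matches nothing, soup.find("script", text=pattern) has no tag to return, so it gives back None.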

FYI, here is the complete code that I'm executing:

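import re

import requests
from bs4 import BeautifulSoup

Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(q.text, "html.parser")

# Quotes in the script are backslash-escaped, and whitespace may follow the colon
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
# Find the first <script> whose contents match the pattern
script = soup.find("script", text=pattern)
print(pattern.search(script.text).group(1))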
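
If the page layout changes, the same AttributeError will come back, so it may be worth guarding against a missing match. The sketch below is one way to do that. Note that it swaps BeautifulSoup for a plain regex search over the raw page text so that it runs without network access or third-party packages. The helper name extract_content_id and both sample strings are hypothetical.

```python
import re

# Same escaped-quote pattern as in the answer above.
CONTENT_ID = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')


def extract_content_id(page_text):
    """Return the first content_id found in page_text, or None if absent."""
    match = CONTENT_ID.search(page_text)
    return match.group(1) if match else None


# Hypothetical inputs standing in for q.text from the real request.
with_id = r'<script>var c = "{\"content_id\": \"60585710\"}";</script>'
without_id = '<script>var c = "{}";</script>'

print(extract_content_id(with_id))
print(extract_content_id(without_id))
```

With real data you would pass q.text to the helper and check the result for None before using it.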

2 Comments

Okay. I see the error... But, I'm still getting the same error. Did you change anything else in your script?
It seems that my Python installation had a problem. I tried my code on another machine and it worked. Thanks :)
