Search a string in javascript using python

Question

Following up on my previous question, "how to fetch javascript contents in python":

I wrote another script that, after fetching the page contents, tries to extract data from the JavaScript embedded in the page.

But it is not returning the content I want. I want to find "content_id" in the page's JavaScript. This is the page: http://www.hulu.com/watch/815743

Here's what I have right now.

import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput


Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('script',{'type':'text/javascript'})
pattern = re.compile(r'"content_id":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)
print pattern.search(script.text).group(1)

I get this error :

AttributeError: 'NoneType' object has no attribute 'text'

Any idea how to solve this issue?

Have you considered searching for "AttributeError: 'NoneType' object has no attribute"? There are quite a few similar questions out there already...
– jonrsharpe, Commented Sep 27, 2015 at 16:41
I literally do not believe you, and even if you actually had tried all of the various suggestions, why don't you mention that in the question?
– jonrsharpe, Commented Sep 27, 2015 at 16:42

alecxe · Accepted Answer · 2015-09-28 02:48:31Z

There are two problems in your regular expression pattern:

the quotes are escaped with backslashes in the script contents, so the pattern has to match those backslashes as well
there is whitespace after the colon, which the pattern has to allow for

Here is the fixed version:

pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)

Works for me, getting 60585710 as a result.
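To see the effect of both changes without requesting the page, here is a minimal sketch run against a made-up string. The `sample` value is an assumption about what the relevant part of the script roughly looks like (JSON embedded in a JavaScript string, so every quote carries a backslash); it is not copied from the actual page.

```python
import re

# Hypothetical fragment of the inline script: the quotes are
# backslash-escaped and a space follows the colon.
sample = r'var config = "{\"content_id\": \"60585710\"}";'

# Pattern from the question: expects bare quotes and no whitespace.
original = re.compile(r'"content_id":"(.*?)"')
# Fixed pattern: matches the literal backslashes and optional whitespace.
fixed = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')

original_match = original.search(sample)
fixed_match = fixed.search(sample)

print(original_match)        # None: the original pattern finds nothing
print(fixed_match.group(1))  # 60585710
```

Because the original pattern matches nothing, `soup.find("script", text=pattern)` has no script to return, which is how the `None` in the error message comes about.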

FYI, here is the complete code that I'm executing:

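import re

import requests
from bs4 import BeautifulSoup

Link = 'http://www.hulu.com/watch/815743'
q = requests.get(Link)
soup = BeautifulSoup(q.text)

# Quotes are backslash-escaped in the script, and whitespace follows the colon
pattern = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"', re.MULTILINE | re.DOTALL)
# Find the first <script> whose text matches the pattern
script = soup.find("script", text=pattern)
# Parenthesized print works on both Python 2 and Python 3
print(pattern.search(script.text).group(1))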
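If the pattern might not match (for instance, if the page layout changes), `soup.find` returns `None` and reading `.text` from it raises the `AttributeError` from the question. One way to guard against that, sketched here as an alternative rather than as part of the code above, is to search the raw page text directly and check the match before using it. The helper name `find_content_id` and both sample strings are hypothetical.

```python
import re

CONTENT_ID = re.compile(r'\\"content_id\\":\s*\\"(.*?)\\"')


def find_content_id(page_text):
    """Return the first content_id found in page_text, or None if absent."""
    match = CONTENT_ID.search(page_text)
    # Only call .group() when there actually is a match.
    return match.group(1) if match else None


# Hypothetical page fragments, one with the id and one without.
with_id = r'<script>var c = "{\"content_id\": \"60585710\"}";</script>'
without_id = '<script>var c = "{}";</script>'

found = find_content_id(with_id)
missing = find_content_id(without_id)
print(found)
print(missing)
```

With the real page, the argument would be `q.text` from the request above; a `None` result then signals that the pattern needs another look instead of crashing.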

edited Sep 28, 2015 at 2:48

answered Sep 27, 2015 at 20:04

alecxe


2 Comments

Xonshiz Over a year ago

Okay, I see the error... But I'm still getting the same error. Did you change anything else in your script?

Xonshiz Over a year ago

It seems that my Python installation had a problem. I tried my code on another machine and it worked. Thanks :)
