
This is similar to my last question.

The whole script looks like this: http://pastebin.com/1MyGGD9h

As you can see, the 'userId' elements are repeated. My Python script fetches the first one and ignores the second. How do I fetch both of them and use their values separately?

My idea was to fetch one element at a time and use each value separately, but I can't get it to work.
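One way to process one element at a time is re.finditer, which yields a separate match object for each occurrence. This is a minimal sketch, assuming the values are always the bare JavaScript literals true or false; script_text is a hypothetical excerpt, and the tightened pattern is a variation on the regexes used on this page, not the asker's code:

```python
import re

# Hypothetical excerpt of the script containing two userId entries.
script_text = '"userId":true,"solution":"flash","userId":false,"playlist":[]'

# Assumption: the value is always an unquoted true/false literal.
pattern = re.compile(r'"userId":\s*(true|false)')

# finditer() yields one Match object per occurrence, so each value
# can be handled separately inside the loop.
user_ids = []
for match in pattern.finditer(script_text):
    value = match.group(1) == "true"  # convert the JS literal to a Python bool
    user_ids.append(value)

print(user_ids)  # [True, False]
```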

Right now, the code I use to fetch the script looks like this:

import re

from bs4 import BeautifulSoup

page = """
<script type="text/javascript">
            var logged = true;
            var video_id = 59374;
            var item_type = 'official';

            var debug = false;
            var baseUrl = 'http://www.example.com';
            var base_url = 'http://www.example.com/';
            var assetsBaseUrl = 'http://www.example.com/assets';
            var apiBaseUrl = 'http://www.example.com/common';
            var playersData = [{"playerId":"showsPlayer","userId":true,"solution":"flash","playlist":[{"itemId":"5090","itemAK":"Movie"}]];
</script><script type="text/javascript" >
"""
soup = BeautifulSoup(page, "html.parser")

pattern = re.compile(r'"userId":"(.*?)"', re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)

print(pattern.search(script.text).group(1))

Right now it prints "true", but I want both values, i.e. true and false, from both elements. Any ideas?

1 Answer

There is only one userId in your question's example, but if there were two you should use findall; search stops at the first match:

a,b = pattern.findall(script.text)
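To see the difference, here is a small sketch on a hypothetical string with two userId entries, using the unquoted-value pattern r'"userId":(.*?),' rather than the question's quoted one:

```python
import re

# Hypothetical string with two userId entries.
text = '{"userId":true,"solution":"flash","userId":false,"playlist":[]}'
pattern = re.compile(r'"userId":(.*?),')

# search() stops at the first match and returns a single Match object.
first = pattern.search(text).group(1)

# findall() scans the whole string and returns every captured group in a list,
# which can be unpacked when the number of matches is known.
a, b = pattern.findall(text)

print(first)  # true
print(a, b)   # true false
```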

Your regex is also incorrect: there are no double quotes around the values (the script contains "userId":true, not "userId":"true"):

import re

html = """<script type="text/javascript">
        var logged = true;
        var video_id = 59374;
        var item_type = 'official';

        var debug = false;
        var baseUrl = 'http://www.example.com';
        var base_url = 'http://www.example.com/';
        var assetsBaseUrl = 'http://www.example.com/assets';
        var apiBaseUrl = 'http://www.example.com/common';
        var playersData = [{"playerId":"showsPlayer","userId":true,"solution":"flash","userId":false,"playlist":[{"itemId":"5090","itemAK":"Movie"}]];
</script><script type="text/javascript" >"""

# No quotes around the value: capture everything up to the next comma
pattern = re.compile(r'"userId":(.*?),')

print(pattern.findall(html))
# ['true', 'false']
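If the real playersData is a valid JSON array with one object per player (an assumption, since the full script is only on pastebin), parsing it with json.loads may be more robust than matching each value with a regex, and it turns true/false into Python booleans. As pasted, the sample above is missing the closing } of the outer object, and json.loads keeps only the last value when a key repeats inside one object, so this sketch uses a hypothetical two-player array:

```python
import json
import re

# Hypothetical script text: assumes one JSON object per player.
script_text = """
var logged = true;
var playersData = [{"playerId":"showsPlayer","userId":true,"solution":"flash"},
                   {"playerId":"otherPlayer","userId":false,"solution":"flash"}];
"""

# Grab the array literal between "playersData =" and the terminating "];".
# DOTALL lets ".*?" span the line break inside the array.
match = re.search(r"var playersData\s*=\s*(\[.*?\]);", script_text, re.DOTALL)
players = json.loads(match.group(1))

# json.loads converts JavaScript true/false into Python True/False.
user_ids = [player["userId"] for player in players]
print(user_ids)  # [True, False]
```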