I want to extract the text inside the HTML 'title' tag. From the 'meta' tag, I want the URL value held in its content attribute, and from that URL only the part just before the '?'.
<html lang="en" id="facebook" class="no_js">
<head>
<meta charset="utf-8" />
<script>
function envFlush(a) {function b(c){for(var d in a)c[d]=a[d];}if(window.requireLazy){window.requireLazy(['Env'],b);}else{window.Env=window.Env||{};b(window.Env);}}envFlush({"ajaxpipe_token":"AXjbmsNXDxPlvhrf","lhsh":"4AQFQfqrV","khsh":"0`sj`e`rm`s-0fdu^gshdoer-0gc^eurf-3gc^eurf;1;enbtldou;fduDmdldourCxO`ld-2YLMIuuqSdptdru;qsnunuxqd;rdoe"});
</script>
<script>CavalryLogger=false;</script>
<noscript>
<meta http-equiv="refresh" content="0; URL=/notes/kursus-belajar-bahasa-inggris/bahasa-inggris-siapa-takut-/685004288208871?_fb_noscript=1" />
</noscript>
<meta name="referrer" content="default" id="meta_referrer" />
<title id="pageTitle">
" CARA CEPAT BELAJAR BAHASA INGGRIS MUDAH DAN MENYENANGKAN "
</title>
<link rel="shortcut icon" href="https://fbstatic-a.akamaihd.net/rsrc.php/yl/r/H3nktOa7ZMg.ico" />
That is, I want CARA CEPAT BELAJAR BAHASA INGGRIS MUDAH DAN MENYENANGKAN and 685004288208871.
I tried the following code:
>>> soup.title.contents
The output is:
[u'" CARA CEPAT BELAJAR BAHASA INGGRIS MUDAH DAN MENYENANGKAN "']
I don't want the '[]' brackets, the 'u' prefix, or the single quotes in this output.
Also, when I run the following:
>>> soup.meta.contents
I get this output:
[]
What can I try next? I am new to BeautifulSoup.
soup.title.text is what you want. The u'...' is only there because the interactive shell calls repr on the return value.
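As for the meta tag: soup.meta returns only the first meta tag in the document (here the charset one), and a meta tag has no children, so .contents is always an empty list. The values live in attributes, so you need to find the right tag and read its content attribute. Here is a minimal sketch of both extractions. It assumes the bs4 package with the built-in "html.parser" parser and a trimmed copy of the markup from the question. The variable names (html, refresh, note_id, and so on) are just illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (scripts omitted).
html = """<html lang="en" id="facebook" class="no_js">
<head>
<meta charset="utf-8" />
<noscript>
<meta http-equiv="refresh" content="0; URL=/notes/kursus-belajar-bahasa-inggris/bahasa-inggris-siapa-takut-/685004288208871?_fb_noscript=1" />
</noscript>
<meta name="referrer" content="default" id="meta_referrer" />
<title id="pageTitle">
" CARA CEPAT BELAJAR BAHASA INGGRIS MUDAH DAN MENYENANGKAN "
</title>
</head>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Title: take the text, then drop surrounding whitespace and the literal
# double quotes that are part of the page's title text.
title = soup.title.get_text().strip().strip('"').strip()

# Meta: soup.meta would be the charset tag, so search by attribute instead.
refresh = soup.find("meta", attrs={"http-equiv": "refresh"})
content = refresh["content"]            # "0; URL=/notes/...?_fb_noscript=1"

url = content.split("URL=", 1)[1]       # everything after "URL="
path = url.split("?", 1)[0]             # drop the query string
note_id = path.rstrip("/").rsplit("/", 1)[-1]  # last path segment

print(title)
print(note_id)
```

Using print shows the plain string, without the u'...' wrapper that repr adds in the interactive shell. If the page may lack the refresh tag, check that refresh is not None before indexing it.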