
I'm trying to scrape this page (34th government):

https://knesset.gov.il/govt/eng/GovtByNumber_eng.asp

It has several tables, but when I request it with this code:

import requests
from bs4 import BeautifulSoup

govts_url = r'https://knesset.gov.il/govt/eng/GovtByNumber_eng.asp'
website_url = requests.get(govts_url).text
soup = BeautifulSoup(website_url, 'lxml')
print(f"HTML: \n {soup.prettify()}")

I get the following result:

 <html>
 <head>
  <meta charset="utf-8"/>
  <script>
   window.rbzid="Q5gSRBmIWVopQazRgPTWKOEV0wGh1o+KvPO3KMiDuHxM9vVecPeHn4ult+Ba/KU9zInGRSRXUggEmkFs+D5NKSC/WEkCn+B4PCw9CeWkT+Q=";
        u82222.O=function(x){return x;};u82222.E=function (){return typeof u82222.t.u==='function'?u82222.t.u.apply(u82222.t,arguments):u82222.t.u;};u82222.i=function(x,y){return x+y;};u82222.A=function (){return typeof u82222.u.u==='function'?u82222.u.u.apply(u82222.u,arguments):u82222.u.u;};u82222.Y=function (){return typeof u82222.u.u==='function'?u82222.u.u.apply(u82222.u,arguments):u82222.u.u;};u82222.n=function(x,y){return x+y;};u82222.f=function(x,y){return x+y;};u82222.u=function(){var M=function(K,N){var I=N&0xffff;var r=N-I;return(r*K|0)+(I*K|0)|0;},Y=function(x,d,Z){var n=0xcc9e2d51,b=0x1b873593;var E=Z;var O=d&~0x3;for(var w=0;w<O;w+=4){var e=x.charCodeAt(w)&0xff|(x.charCodeAt(w+1)&0xff)<<8|(x.charCodeAt(w+2)&0xff)<<16|(x.charCodeAt(w+3)&0xff)<<24;e=M(e,n);e=(e&0x1ffff)<<15|e>>>17;e=M(e,b);E^=e;E=(E&0x7ffff)<<13|E>>>19;E=E*5+0xe6546b64|0;}e=0;switch(d%4){case 3:e=(x.charCodeAt(O+2)&0xff)<<16;case 2:e|=(x.charCodeAt(O+1)&0xff)<<8;case 1:e|=x.charCodeAt(O)&0xff;e=M(e,n);e=(e&0x1ffff)<<15|e>>>17;e=M(e,b);E^=e;}E^=d;E^=E>>>16;E=M(E,0x85ebca6b);E^=E>>>13;E=M(E,0xc2b2ae35);E^=E>>>16;return E;};return{u:Y};}();u82222.d=function(x,y){return x+y;};u82222.K=function (){return typeof u82222.u.u==='function'?u82222.u.u.apply(u82222.u,arguments):u82222.u.u;};u82222.N=function (){return typeof u82222.t.u==='function'?u82222.t.u.apply(u82222.t,arguments):u82222.t.u;};u82222.Z=function(x,y){return x+y;};u82222.I=function (){return typeof u82222.u.u==='function'?u82222.u.u.apply(u82222.u,arguments):u82222.u.u;};u82222.e=function (){return typeof u82222.t.u==='function'?u82222.t.u.apply(u82222.t,arguments):u82222.t.u;};u82222.t=function(){return{u:function(K){var A='',I=decodeURI("1?'%1CYH.=uVWU~%254_hW,o,WKM%22(-W%5BW,o,LU%075?'%1CH%5D9.5LU%07#?'%1C@W,o6LU%07%22?'%1Ch%5C$.%25N%07D1?'%1C%5DW,o4LU%07%124=LU%07%3C.%25N%07F=?'%1COW,o7bAH%3E?'%1CL%5B.=uF@F%3E%02%25N%07v%0F13SG%5D?,:AWU~13LU%07%3E?'%1CvY8%205FFD.=uF%5BF%3C-%25N%07J1?'%1C%5CW,o7");for(var 
Y=0,M=0;Y<I.length;Y++,M++){if(M===K.length){M=0;}A+=String.fromCharCode(I.charCodeAt(Y)^K.charCodeAt(M));}A=A.split('~|.');return function(t){return A[t];};}('PA[2))')};}();u82222.o=function(x,y){return x+y;};u82222.r=function (){return typeof u82222.t.u==='function'?u82222.t.u.apply(u82222.t,arguments):u82222.t.u;};u82222.b=function(x,y){return x+y;};u82222.w=function(x){return x;};u82222.s=function(x,y){return x+y;};u82222.F=function(x,y){return x+y;};u82222.M=function (){return typeof u82222.u.u==='function'?u82222.u.u.apply(u82222.u,arguments):u82222.u.u;};u82222.T=function(x,y){return x>y;};function u82222(){}u82222.x=function (){return typeof u82222.t.u==='function'?u82222.t.u.apply(u82222.t,arguments):u82222.t.u;};(typeof window==="object"?window:global).u82222=u82222;_=window;if(u82222.w(u82222.O(_[u82222.r(24)+u82222.e(0)+u82222.E(25)+u82222.E(14)+u82222.e(18)])||_[u82222.N(26)]||_[u82222.d(u82222.F(u82222.n(u82222.e(28),u82222.r(30))+u82222.N(20),u82222.N(14)),u82222.r(18))]||_[u82222.x(23)])||_[u82222.b(u82222.x(16),u82222.x(19))+u82222.r(6)+u82222.x(11)]||_[u82222.Z(u82222.E(6)+u82222.x(10)+u82222.e(9),u82222.x(14))]||_[u82222.s(u82222.T(975.11,476.89)?u82222.N(8):(13,105.77),u82222.E(1))+u82222.E(5)+u82222.N(25)]||_[u82222.E(4)]||_[u82222.o(u82222.x(3)+u82222.N(29)+u82222.e(14),u82222.e(15))+u82222.N(10)+u82222.x(7)]||_[u82222.i(u82222.e(2)+u82222.N(18)+u82222.N(12)+u82222.e(13)+u82222.x(22)+u82222.E(15)+u82222.E(25),u82222.e(27))+u82222.E(21)]){}else{location[u82222.f(u82222.r(11)+u82222.x(6)+u82222.e(17)+u82222.N(0),u82222.e(2))]();}
  </script>
 </head>
 <body>
 </body>
</html>

This is, of course, not the content I want. I guess I'm missing some kind of "activation" step that makes the site serve the real content. How can I get it?

Thx!

1
  • Did you check whether the content you're after is dynamically generated? Commented Feb 18, 2020 at 22:20

2 Answers

1

I tried with Selenium (download the driver for your browser; in my case, ChromeDriver) and it works: you get the full HTML source of the page, and from there you can continue with the web scraping. I hope this helps :)

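from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

govts_url = r'https://knesset.gov.il/govt/eng/GovtByNumber_eng.asp'
# Path to the downloaded ChromeDriver executable; adjust for your machine.
exe_path = r'C:\Users\JRV\Desktop\WebCrawling/chromedriver.exe'

# Selenium 4 takes the driver path through a Service object.
browser = webdriver.Chrome(service=Service(exe_path))
browser.get(govts_url)
page = browser.page_source  # HTML as currently rendered by the browser
browser.quit()  # close the window and stop the driver process

soup = BeautifulSoup(page, 'html.parser')
print(f"HTML: \n {soup}")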
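If page_source is read before the challenge script has finished reloading the page, it may still contain the empty stub from the question. A minimal sketch of one way to guard against that is to poll page_source until some marker of the real content shows up. The function names, the "<table" marker, and the timeout values are assumptions for illustration (the question only says the page has several tables), and webdriver.Chrome() without arguments assumes Selenium can locate a Chrome driver on its own.

```python
import time


def wait_for_rendered_source(browser, marker="<table", timeout=15.0, poll=0.5):
    """Poll browser.page_source until `marker` appears or `timeout` seconds pass.

    `browser` is any object with a `page_source` attribute, such as a
    Selenium WebDriver. `marker` must be lowercase; it is an assumption
    about what the real page contains.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = browser.page_source
        # Compare case-insensitively so <TABLE> and <table> both match.
        if marker in source.lower():
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} s")
        time.sleep(poll)


def fetch_rendered_html(url, timeout=15.0):
    """Open `url` in Chrome and return the HTML once the marker shows up."""
    # Imported lazily so the polling helper works without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Chrome()  # assumes a Chrome driver can be found
    try:
        browser.get(url)
        return wait_for_rendered_source(browser, timeout=timeout)
    finally:
        browser.quit()  # always stop the browser, even on timeout
```

The returned string can then be passed to BeautifulSoup(page, 'html.parser') exactly as in the script above. Selenium also ships its own WebDriverWait for this purpose; the hand-rolled loop is used here only to keep the sketch easy to follow.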


0

I believe this could be one of those sites where JavaScript activates the page, in which case you would have to use something like Selenium. Check out this post.
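One rough way to confirm that suspicion before reaching for a browser is to check whether the HTML returned by requests has any visible text in its body. The sketch below uses only the standard library; the names looks_like_js_stub and _BodyTextCollector are made up for illustration, and the check is a heuristic, not a reliable detector of every JavaScript-driven page.

```python
from html.parser import HTMLParser


class _BodyTextCollector(HTMLParser):
    """Collect visible text inside <body>, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that a reader would actually see.
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_js_stub(html):
    """Return True if the document has no visible text in its <body>."""
    parser = _BodyTextCollector()
    parser.feed(html)
    parser.close()
    return not parser.chunks
```

For the response shown in the question (a script in the head and an empty body) this returns True, which suggests the content is produced by JavaScript and a tool such as Selenium is needed.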
