Here is the case, I need to save a web page's source code as html file. But if you look at the web page, there are lots of section, I don't need them, I only want to save the source code of the article itself.
code:
from urllib.request import urlopen
page = urlopen('http://www.abcde.com')
page_content = page.read()
with open('page_content.html', 'wb') as f:
f.write(page_content)
I can save the whole source code from my code, but how can I just save the only part I want?
Explain:
<div itemscope itemtype="http://schema.org/MedicalWebPage">
.
.
.
</div>
I need to save the source code with and inside this tag , not extract the sentences in the tags.
The result I want is to save like this:
<div itemscope itemtype="http://schema.org/MedicalWebPage">
<div class="col-md-12 col-xs-12" style="padding-left:10px;">
<h1 itemprop="name" class="page_article_title" title="Apple" id="mask">Apple</h1>
</div>
<!--Article Start-->
<section class="page_article_div" id="print">
<article itemprop="text" class="page_article_content">
<p>
<img alt="Apple" src="http://www.abcde.com/383741719.jpg" style="width: 300px; height: 200px;" /></p>
<p>
The apple tree (Malus pumila, commonly and erroneously called Malus domestica) is a deciduous tree in the rose family best known for its sweet, pomaceous fruit, the apple.</p>
<p>
It is cultivated worldwide as a fruit tree, and is the most widely grown species in the genus Malus.</p>
<p>
<strong><span style="color: #884499;">Appe is red</span></strong></p>
<ol>
<li>
Germanic paganism</li>
<li>
Greek mythology</li>
</ol>
<p style="text-align: right;">
【Jane】</p>
<p style="text-align: right;">
Credit : Wiki</p>
</article>
<div style="text-align:right;font-size:1.2em;"><a class="authorlink" href="http://www.abcde.com/web/online;url=http://61.66.117.1234/name=2017">2017</a></div>
<br />
<div style="text-align:right;font-size:1.2em;">【Thank you!】</div>
</section>
<!--Article End-->
</div>
BeautifulSoup.tag.prettify()to filepyquerywhich works exactly likejquery& easy for DOM query & manipulation