I am trying to extract attribute information (such as each div's `id`) from an HTML file using BeautifulSoup. Below are the sample HTML and the code I have tried.
```html
<div id="rp_NaNnetSales" class="finsummary_nlptext add2Margin account nlpmain" style="display: inline;">
<div class="add2Margin account nlpremark"><br><br>
<div>Segment revenue and results</div>
<div></div>
</div>
<div class="add2Margin account nlpremark">This is my my revenue </div>
<div class="add2Margin account nlpremark">As a result, the Group turned in a respectable revenue of S$3,484.6 million for the financial year ended 31 December 2018 (' FY 2018'). Although FY 2018 revenue was 13.0% lower year- on- year, Venture attained a compounded annual growth rate
of 8.4% over the period from FY 2013 to FY 2018. ---- P11
</div>
</div>
<div id="rp_grossProfit" class="add2Margin account rationmain"><span class="ratio_name "><b>Gross Profit</b> increased by 191.3% to SGD 2,625,295.0 mil in FY18 (FY17: SGD 901,244.0 mil)</span>
</div>
<div id="rp_NaNgrossProfit" class="finsummary_nlptext add2Margin account nlpmain" style="display: inline;"></div>
<div id="rp_grossProfitMarginPercentage" class="add2Margin account rationmain"><span class="ratio_name "><b>GP margin</b> was stable at 100.0% in FY18 (FY17: 100.0% )</span>
</div>
```
```python
from bs4 import BeautifulSoup

with open(html_file_location, 'r') as f:
    contents = f.read()
soup1 = BeautifulSoup(contents, features='lxml')
for child1 in soup1.recursiveChildGenerator():
    if child1.name == "div":
        for tag in child1.find_all("div"):
            print(f'{tag.name}: {tag.text}')
            # Intended: print the div's id attribute
            print(f'{tag.name}: {tag.id}')
```
`tag.id` does not give me the div's `id` attribute, and I am not sure how to correct it.
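A minimal sketch of one possible fix, based on how BeautifulSoup exposes attributes. In BeautifulSoup, `tag.id` is attribute-style navigation, shorthand for `tag.find("id")`: it searches for a child `<id>` element, finds none, and returns `None`. HTML attributes are instead read like dictionary keys (`tag["id"]`, `tag.get("id")`) or through the `tag.attrs` dict. The `contents` string below is a hypothetical trimmed copy of the sample rather than the real file, and it uses the built-in `html.parser` instead of `lxml` (an assumption; it should not change the result here).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the file contents (trimmed from the sample HTML).
contents = """
<div id="rp_NaNnetSales" class="finsummary_nlptext add2Margin account nlpmain">
  <div class="add2Margin account nlpremark">This is my my revenue </div>
</div>
<div id="rp_grossProfit" class="add2Margin account rationmain"></div>
"""

soup = BeautifulSoup(contents, "html.parser")

results = []
# find_all("div") already walks the whole tree in document order, so the outer
# recursiveChildGenerator() loop is unnecessary (with it, a nested div is
# printed once for every div that encloses it).
for tag in soup.find_all("div"):
    # .get() returns None when the attribute is missing; tag["id"] would raise KeyError.
    tag_id = tag.get("id")
    # "class" is multi-valued, so BeautifulSoup returns a list of class names.
    tag_class = tag.get("class")
    results.append((tag_id, tag_class))
    print(f"{tag.name}: id={tag_id}, attrs={tag.attrs}")

# id=True restricts the search to divs that actually have an id attribute.
ids = [tag["id"] for tag in soup.find_all("div", id=True)]
print(ids)
```

Using `find_all("div", id=True)` is a design choice for when only identified divs matter; `tag.get("id")` is the safer option when every div must be visited, because some divs in the sample have no `id`.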