I am using BeautifulSoup to parse the HTML below and want to write the extracted fields to Excel.

<div id="MainContent_BuildSheetUpdatePanel">
                <div id="MainContent_BuildSheetPanel">
                    <div class="row">
                        <div class="col-sm-4 mt-2">
                            <div class="card border-primary">
                                <div class="card-header">
                                    <h4 class="card-title text-center">SCHOOL:</h4>
                                </div>
                                <div class="card-body">
                                    <div class="form-group">
                                        <label>Class ID: </label>
                                        <input name="ctl00$MainContent$ClassIdTextBox" type="text" value="250" id="MainContent_IdTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                            <span id="MainContent_rfvClassIdTextBox" style="color:Red;display:none;">Required</span>
                                    </div>
                                    <div class="form-group">
                                        <label>Profile ID: </label>
                                        <input name="ctl00$MainContent$ProfileIdTextBox" type="text" value="NA" id="MainContent_ServiceIdTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvProfileIdTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>Serial Number: </label>
                                        <input name="ctl00$MainContent$NumberTextBox" type="text" value="763" id="MainContent_NumberTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvNumberTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>MC Number: </label>
                                        <input name="ctl00$MainContent$MCSerialNumberTextBox" type="text" value="290" id="MainContent_SerialNumberTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvMCSerialNumberTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>SK: </label>
                                        <input name="ctl00$MainContent$SkTextBox" type="text" value="384xm" id="MainContent_SkTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Profile: </label>
                                        <input name="ctl00$MainContent$ProfileTextBox" type="text" value="NA" id="MainContent_ProfileTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Address: </label>
                                        <input name="ctl00$MainContent$AddressTextBox" type="text" value="192.168.56.54" id="MainContent_AddressTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Dn: </label>
                                        <input name="ctl00$MainContent$DnTextBox" type="text" value="NA" id="MainContent_DnTextBox" disabled="disabled"  class="aspNetDisabled form-control">
                                        <span id="MainContent_rfvoDnTextBox" style="color:Red;display:none;">Required</span> 
                                    </div>
                                    <div class="form-group">
                                        <label>Hostname: </label>
                                        <input name="ctl00$MainContent$PrimaryHostNameTextBox" type="text" value="N/A" id="MainContent_HostNameTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Primary: </label>
                                        <input name="ctl00$MainContent$PrimarySidTextBox" type="text" value="N/A" id="MainContent_SidTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Server: </label>
                                        <input name="ctl00$MainContent$ServerTextBox" type="text" value="sv41" id="MainContent_ServerTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                    </div>
                                    <div class="form-group">
                                        <label>Server-Address: </label>
                                        <input name="ctl00$MainContent$AddressTextBox" type="text" value="10.56.1.41" id="MainContent_AddressTextBox" disabled="disabled" class="aspNetDisabled form-control">
                                         <span id="MainContent_ServerIpTxtRequiredFieldValidator" style="color:Red;display:none;">Required</span>                    
                                    </div>
                                    </div>
                                </div>
                            </div>
                        </div>

Expected output:

Class ID | Profile ID | Serial Number | MC Number | SK | Profile | Address | Dn | Hostname | Primary
250 | NA | 763 | 290 | 384xm | NA | 192.168.56.54 | NA | NA | NA

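from bs4 import BeautifulSoup
import requests

html = """Inputfile """

# parse the HTML string so it can be queried with CSS selectors
soup = BeautifulSoup(html, "html.parser")

for item in soup.select("div.form-group"):
    print(item.get_text())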
  • from bs4 import BeautifulSoup import requests html= """Inputfile """ for item in soup.select("div.form-group"): print(item.get_text()) Commented Oct 19, 2019 at 17:46

1 Answer

You want the 'value' attribute of each input, not the element text. Depending on your full HTML, you may be able to shorten the selectors.

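from bs4 import BeautifulSoup as bs
import csv

# your_html is the page source as a string
soup = bs(your_html, 'lxml')  # needs the lxml package; 'html.parser' also works

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    # header row: label texts without the trailing colon and whitespace
    w.writerow([i.text.strip().rstrip(':') for i in soup.select('.form-group label')])
    # data row: value attribute of each disabled input ('' if the attribute is absent)
    w.writerow([i.get('value', '') for i in soup.select('input.aspNetDisabled')])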
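
If some inputs can lack a value attribute, or a form-group can lack an input, selecting labels and inputs in two separate passes may leave the header row and the value row misaligned. A minimal sketch that instead walks each div.form-group and pairs its label with its own input. The sample_html string and the helper name extract_fields are made up for illustration, and 'html.parser' is used here only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample in the shape of the question's markup.
sample_html = """
<div class="form-group">
  <label>Class ID: </label>
  <input type="text" value="250" class="aspNetDisabled form-control">
</div>
<div class="form-group">
  <label>SK: </label>
  <input type="text" class="aspNetDisabled form-control">
</div>
"""


def extract_fields(html):
    """Return (label, value) pairs, one per form-group that has both tags."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for group in soup.select("div.form-group"):
        label = group.select_one("label")
        field = group.select_one("input")
        if label is None or field is None:
            continue  # skip groups that do not follow the label/input pattern
        name = label.get_text(strip=True).rstrip(":")
        # .get() returns the default instead of raising KeyError
        # when the attribute is absent
        pairs.append((name, field.get("value", "")))
    return pairs


pairs = extract_fields(sample_html)
print(pairs)
```

Because each label is read together with its own input, a missing value shows up as an empty cell in the right column rather than shifting the later values to the left.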

Specific items:

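soup = bs(your_html, 'lxml')

# labels to keep; the double quotes are part of the CSS selector argument
items = ['"Class ID:"', '"Serial Number:"', '"Hostname:"']
items = ','.join(items)
# :-soup-contains is the current name of the :contains pseudo-class (soupsieve 2.1+);
# on older versions use :contains instead
selector = f'label:-soup-contains({items})'
# the + combinator picks the input immediately following each matching label
nodes = [i.get('value', '') for i in soup.select(f'{selector} + .aspNetDisabled')]
headers = [i.text.strip().rstrip(':') for i in soup.select(selector)]

with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(headers)
    w.writerow(nodes)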
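
An alternative sketch for picking specific fields that does not rely on a text-matching pseudo-class: collect every label/value pair into a dict first, then look up only the wanted names. The wanted list, the sample markup and the in-memory io.StringIO target are assumptions for illustration; to write a real file, replace the buffer with an open(...) call like the one above.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample in the shape of the question's markup.
sample_html = """
<div class="form-group"><label>Class ID: </label><input value="250" class="aspNetDisabled"></div>
<div class="form-group"><label>Profile ID: </label><input value="NA" class="aspNetDisabled"></div>
<div class="form-group"><label>Serial Number: </label><input value="763" class="aspNetDisabled"></div>
<div class="form-group"><label>Hostname: </label><input class="aspNetDisabled"></div>
"""

wanted = ["Class ID", "Serial Number", "Hostname"]  # fields to keep, in output order

soup = BeautifulSoup(sample_html, "html.parser")

# Map each cleaned label text to the value of the input in the same form-group.
fields = {}
for group in soup.select("div.form-group"):
    label = group.select_one("label")
    field = group.select_one("input")
    if label is not None and field is not None:
        name = label.get_text(strip=True).rstrip(":")
        fields[name] = field.get("value", "")  # '' when the attribute is absent

# Write a header row and one data row, keeping only the wanted fields.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(wanted)
writer.writerow([fields.get(name, "") for name in wanted])
csv_text = buffer.getvalue()
print(csv_text)
```

Note that a dict keeps one entry per label text, so two groups with the same label would overwrite each other; a list of pairs would be needed in that case.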
3 Comments

This works when an input has a value, but sometimes the value is missing. Is it possible to get only the required fields and their values, e.g. Class ID, Serial Number and Hostname?
Do all of the fields you list have a value attribute?
Yes, they do, but I don't need all the fields. I only want a few, such as Class ID, Serial Number and Hostname, written to the CSV.
