Regular expression in scrapy to get parameter of onclick function

Question

Using Scrapy, I want to fetch only the parameter of the onclick function. I am using response.css() to extract the links.

When I apply a regular expression to get just the parameter, I get an error (AttributeError: 'list' object has no attribute 're').

    <table class="table table-striped table-bordered table-hover Tax" >
               <thead>
                  <tr>
                    <th>Sr No.</th>
                    <th>Name</th>
                    <th>Registration No</th>
                    <th>Address</th>
                    <th>Sectors</th>
                  </tr>
               </thead>
               <tbody>
<tr>

    <td>1</td><td> <a href="javascript:void(0)" onclick='show_info("173543");'> ABCD</a></td>
                    <td>Address</td>
                    <td>12345</td>
                    <td>Data Not Found</td>
                  </tr></tbody></table>

This is the Scrapy code I am using to scrape the onclick parameter:

link_first = response.css(".table.table-striped.table-bordered.table-hover.Tax>tbody>tr>td>a").xpath("./@onclick").extract().re("show_info\((.+?)\)", text)

Required output: 173543

Thanks @Andersson. Removing .extract() and the text argument gave me the expected result.
– helpdoc, Commented Aug 13, 2018 at 8:18

Madhan Varadhodiyil · Accepted Answer · 2018-08-13 08:26:30Z

2

extract() returns the matched data as a list of strings, and a Python list has no re() method, which is what raises the AttributeError. To apply a regular expression, call re() on the selector itself.

html = """<table class="table table-striped table-bordered table-hover Tax" >
            <thead>
                <tr>
                    <th>Sr No.</th>
                    <th>Name</th>
                    <th>Registration No</th>
                    <th>Address</th>
                    <th>Sectors</th>
                </tr>
            </thead>
            <tbody>
<tr>

    <td>1</td><td> <a href="javascript:void(0)" onclick='show_info("173543");'> ABCD</a></td>
                    <td>Address</td>
                    <td>12345</td>
                    <td>Data Not Found</td>
                </tr></tbody></table>"""

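from scrapy.selector import Selector

# Wrap the raw HTML so it can be queried like a Scrapy response.
response = Selector(text=html)
# Call .re() on the selector itself, not on the list returned by .extract().
links = response.css(".table.table-striped.table-bordered.table-hover.Tax>tbody>tr>td>a").xpath("./@onclick").re(r"show_info\((.+?)\)")

print(links)

This prints (the captured group still includes the double quotes):

['"173543"']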
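
The result above still carries the double quotes, while the question asks for the bare number. A minimal sketch of how the pattern could be tightened, using only the standard re module on the onclick string from the question. The second pattern is an assumption, not part of the original answer: it presumes the argument is always a double-quoted run of digits.

```python
import re

# The onclick attribute value from the sample HTML in the question.
onclick = 'show_info("173543");'

# Pattern from the answer: the group captures everything between the
# parentheses, so the double quotes are part of the match.
with_quotes = re.findall(r'show_info\((.+?)\)', onclick)

# Tighter pattern (assumption: the argument is always quoted digits).
digits_only = re.findall(r'show_info\("(\d+)"\)', onclick)

print(with_quotes)  # ['"173543"']
print(digits_only)  # ['173543']
```

The tighter pattern should be usable in the same place as the original one, as the argument to .re() on the selector.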

Hope this helps :)

answered Aug 13, 2018 at 8:26

Madhan Varadhodiyil


1 Comment

Granitosaurus Over a year ago

Alternatively, as gangabass pointed out, you can use re_first() as a shortcut to get only the first match.

gangabass · Accepted Answer · 2018-08-13 11:06:11Z

0

I use XPath contains() to select the anchor whose onclick attribute calls show_info, then parse the attribute value with re_first():

link_id = response.xpath('//td/a[contains(@onclick, "show_info")]/@onclick').re_first(r'"([^"]+)"')
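
To see what the pattern in this answer captures without running a spider, here is a rough standard-library approximation. The helper first_match is hypothetical and only mimics the idea of returning the first captured group or a default. It is not Scrapy's implementation of re_first().

```python
import re

def first_match(pattern, text, default=None):
    """Hypothetical helper: first capture group of the first match, else default."""
    match = re.search(pattern, text)
    return match.group(1) if match else default

# The onclick attribute value from the sample HTML in the question.
onclick = 'show_info("173543");'

# "([^"]+)" captures the characters between the first pair of double quotes.
link_id = first_match(r'"([^"]+)"', onclick)

# A string without any double quotes yields the default instead of an error.
missing = first_match(r'"([^"]+)"', 'javascript:void(0)')

print(link_id)  # 173543
print(missing)  # None
```

Because the group excludes the quotes themselves, this pattern yields the bare value the question asks for.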

edited Aug 13, 2018 at 11:06

answered Aug 13, 2018 at 8:50

gangabass


1 Comment

steliosbl Over a year ago

Please include some explanation with your answer.
