Using Scrapy, I want to fetch only the parameter of the onclick function. I am using response.css() to extract the links.

When I apply a regular expression to get only the parameter, I get an error (AttributeError: 'list' object has no attribute 're').

    <table class="table table-striped table-bordered table-hover Tax" >
               <thead>
                  <tr>
                    <th>Sr No.</th>
                    <th>Name</th>
                    <th>Registration No</th>
                    <th>Address</th>
                    <th>Sectors</th>
                  </tr>
               </thead>
               <tbody>
<tr>

    <td>1</td><td> <a href="javascript:void(0)" onclick='show_info("173543");'> ABCD</a></td>
                    <td>Address</td>
                    <td>12345</td>
                    <td>Data Not Found</td>
                  </tr></tbody></table>

This is the Scrapy code I am using to scrape the onclick parameter:

link_first = response.css(".table.table-striped.table-bordered.table-hover.Tax>tbody>tr>td>a").xpath("./@onclick").extract().re("show_info\((.+?)\)", text)

Required output: 173543

  • 1
    .extract()[0].re("show_info\((.+?)\)"? Commented Aug 13, 2018 at 8:13
  • Thanks @Andersson. Removing .extract() and the text argument gave me the expected result. Commented Aug 13, 2018 at 8:18

2 Answers

extract() returns the textual data as a plain list of strings, and a list has no re() method, which is why the AttributeError is raised. To match with a regular expression, call re() on the selector itself, before (or instead of) extract().

html = """<table class="table table-striped table-bordered table-hover Tax" >
            <thead>
                <tr>
                    <th>Sr No.</th>
                    <th>Name</th>
                    <th>Registration No</th>
                    <th>Address</th>
                    <th>Sectors</th>
                </tr>
            </thead>
            <tbody>
<tr>

    <td>1</td><td> <a href="javascript:void(0)" onclick='show_info("173543");'> ABCD</a></td>
                    <td>Address</td>
                    <td>12345</td>
                    <td>Data Not Found</td>
                </tr></tbody></table>"""

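from scrapy.selector import Selector

# Build a selector from the sample HTML so the example runs without a live response
response = Selector(text=html)
# Call re() on the selector list itself, not on the list returned by extract()
links = response.css(".table.table-striped.table-bordered.table-hover.Tax>tbody>tr>td>a").xpath("./@onclick").re(r"show_info\((.+?)\)")

print(links)

returns (note that the captured value still includes the double quotes):

['"173543"']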
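If extract() has already been called, the result is an ordinary list of strings, so Python's own re module can still be applied to each item. The sketch below is only an illustration: the list named extracted is a hand-written stand-in for what extract() would return for the sample row, and the pattern is adjusted to leave the double quotes outside the capture group so the result matches the required output.

```python
import re

# Hypothetical stand-in for the list of strings that .extract() returns.
extracted = ['show_info("173543");']

# A plain list has no .re() method; this reproduces the error from the question.
try:
    extracted.re(r"show_info\((.+?)\)")
except AttributeError as exc:
    error_message = str(exc)

# The quotes sit outside the capture group, so only the bare value is kept.
pattern = re.compile(r'show_info\("(.+?)"\)')

ids = []
for value in extracted:
    match = pattern.search(value)
    if match:
        ids.append(match.group(1))

print(error_message)
print(ids)
```

Calling re() directly on the selector, as in the answer above, stays the shorter route; this variant is mainly useful when the strings are already in hand.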

Hope this helps :)


1 Comment

Alternatively, as Gangabass pointed out, you can use re_first() as a shortcut to get only the first match.

I use the XPath contains() function to select only the onclick attributes that call show_info, then parse the value with re_first():

link_id = response.xpath('//td/a[contains(@onclick, "show_info")]/@onclick').re_first( r'"([^"]+)"')
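To see what the pattern in this answer does on its own, it can be tried with the standard re module. This is only a sketch: the onclick_value string is copied by hand from the sample HTML and stands in for the attribute value that the XPath expression selects.

```python
import re

# Attribute value as it appears in the sample HTML (hand-copied stand-in).
onclick_value = 'show_info("173543");'

# "([^"]+)" matches an opening quote, captures one or more non-quote
# characters, and stops at the closing quote, so the quotes are dropped.
match = re.search(r'"([^"]+)"', onclick_value)
link_id = match.group(1) if match else None

print(link_id)
```

The pattern does not mention show_info at all, so it relies on the XPath contains() filter to ensure that only the right attributes reach it.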

1 Comment

Please include some explanation with your answer.
