
I have a webpage with a structure like this:

<div class="l_post j_l_post l_post_bright "...>
    ...
    <div class="j_lzl_c_b_a core_reply_content">
       <li class="lzl_single_post j_lzl_s_p first_no_border" ...>
         <div class="lzl_cnt">
         content
         </div>
       </li>
       <li class="lzl_single_post j_lzl_s_p first_no_border" ...>
       ...
       </li>
    </div>

</div>
<div class="l_post j_l_post l_post_bright "...>
...(contain content, same as above)
</div>
...

Currently I can select all the content in one step like this:

from selenium.webdriver.common.by import By

for i in driver.find_elements(By.XPATH, '//*[@class="lzl_cnt"]'):
    print(i.text)

But as you can see, the webpage consists of repeated blocks (<div class="l_post j_l_post l_post_bright "...>...</div>), each containing the content I need. I therefore want to extract that content block by block, together with other information that differs from block to block. I also want the content of each <li class="lzl_single_post"...> kept as a separate item so that it is easier to process later. I tried this:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

items = []

# get each block
for sel in driver.find_elements(By.XPATH, '//div[@class="l_post j_l_post l_post_bright  "]'):
    name = sel.find_element(By.CSS_SELECTOR, '.d_name').text
    try:
        content = sel.find_element(By.CSS_SELECTOR, '.j_d_post_content').text
    except NoSuchElementException:
        content = ''
    # get each reply within this block
    reply = []
    for i in sel.find_elements(By.XPATH, '//*[@class="lzl_cnt"]'):
        reply.append(i.text)
    items.append({'name': name, 'content': content, 'reply': reply})

But the result shows that on every iteration of the outer for-loop I get all the replies on the page, instead of only the replies that belong to that block.

Any suggestions?

1 Answer

Just add . (the context node) at the start of the XPath:

sel.find_elements(By.XPATH, './/*[@class="lzl_cnt"]')

Note that //*[@class="lzl_cnt"] is an absolute path: it matches every node in the whole document whose class is "lzl_cnt", even when you call it on sel. In contrast, .//*[@class="lzl_cnt"] starts from sel and matches only the descendants of sel with that class.
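To see the difference without a browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree on a hypothetical, simplified, well-formed version of the markup above (the PAGE string and the helpers replies_per_block and all_replies are illustrative, not from the question). ElementTree supports only a subset of XPath and rejects absolute // paths when called on an element, so the "search everything" case is shown by searching from the document root instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on the question: two post blocks,
# each holding its own replies. It is kept well-formed XML so ElementTree
# can parse it; a real page would normally need an HTML parser.
PAGE = """<body>
  <div class="l_post">
    <div class="core_reply_content">
      <li class="lzl_single_post"><div class="lzl_cnt">reply A1</div></li>
      <li class="lzl_single_post"><div class="lzl_cnt">reply A2</div></li>
    </div>
  </div>
  <div class="l_post">
    <div class="core_reply_content">
      <li class="lzl_single_post"><div class="lzl_cnt">reply B1</div></li>
    </div>
  </div>
</body>"""


def replies_per_block(page: str) -> list[list[str]]:
    """Return one list of reply texts per post block."""
    root = ET.fromstring(page)
    result = []
    for block in root.findall(".//div[@class='l_post']"):
        # The leading "." anchors the search at `block`, so only this
        # block's descendants are returned.
        replies = [el.text for el in block.findall(".//*[@class='lzl_cnt']")]
        result.append(replies)
    return result


def all_replies(page: str) -> list[str]:
    """Return every reply text in the document, ignoring block boundaries."""
    # Searching from the root returns every match in the document, which is
    # what an absolute //*[@class="lzl_cnt"] does in Selenium on each block.
    root = ET.fromstring(page)
    return [el.text for el in root.findall(".//*[@class='lzl_cnt']")]


print(replies_per_block(PAGE))  # [['reply A1', 'reply A2'], ['reply B1']]
print(all_replies(PAGE))        # ['reply A1', 'reply A2', 'reply B1']
```

One caveat that applies to both ElementTree and Selenium: @class="..." compares the whole attribute string, so it stops matching if an element gains another class or the spacing differs. In full XPath 1.0, contains(concat(' ', normalize-space(@class), ' '), ' lzl_cnt ') is the usual workaround, although ElementTree's XPath subset does not support it.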
