
I can select single values using XPath in Python without any problem, but how do I combine several single XPath queries into one result?
Here is a sample fragment of the HTML source (r.content):

<div class="members">
    <h2>Members</h2>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
          <span class="profile-link">
            <a href="/account/view-profile/KonterBolet">
              <img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
          </span>
          <span class="memberType">Leader</span>
        </span>
    </div>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
          <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
            <a href="/account/view-profile/mardok">
              <img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
            <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
          </span>
          <span class="memberType">Officer</span>
        </span>
    </div>
</div>

I use Python requests to get the content and lxml to parse it:

import requests
from lxml import html
ses = requests.session()
r = ses.get(SITE_URL)
webContent = html.fromstring(r.content)

First XPath:
acc = webContent.xpath("//span/a[contains(@href,'account/view-profile')]/text()")
and the result:
['KonterA', 'mardok']

Second XPath:
twitch = webContent.xpath('//span/@data-twitch-user')
and the result:
['mardok_tv']

Third XPath:
lastOnline = webContent.xpath('//span/@data-time')
and the result:
['2017-02-20T22:37:42Z','2017-02-19T11:28:20Z']
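To see why these three result lists can't simply be combined by position, here is a minimal sketch that runs the three queries against a trimmed copy of the sample HTML (title and alt attributes dropped) and zips them. It assumes the attribute is spelled data-twitch-user, and it adds a [normalize-space()] filter because text() on these links also returns the whitespace-only text node before each <img>.

```python
from lxml import html

# Trimmed copy of the question's HTML (title/alt attributes dropped).
HTML = """
<div class="members">
  <h2>Members</h2>
  <div class="member">
    <span data-time="2017-02-20T22:37:42Z">
      <span class="profile-link">
        <a href="/account/view-profile/KonterBolet">
          <img class="achievement" src="36.png">KonterA</a>
      </span>
      <span class="memberType">Leader</span>
    </span>
  </div>
  <div class="member">
    <span data-time="2017-02-19T11:28:20Z">
      <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
        <a href="/account/view-profile/mardok">
          <img class="achievement" src="35.png">mardok</a>
        <a class="twitch" href="//www.twitch.tv/mardok_tv"></a>
      </span>
      <span class="memberType">Officer</span>
    </span>
  </div>
</div>
"""

root = html.fromstring(HTML)

# The three independent queries; text()[normalize-space()] skips the
# whitespace-only text node that sits before each <img>.
acc = root.xpath("//span/a[contains(@href,'account/view-profile')]/text()[normalize-space()]")
twitch = root.xpath("//span/@data-twitch-user")
last_online = root.xpath("//span/@data-time")

print(len(acc), len(twitch), len(last_online))  # 2 1 2

# zip() pairs items purely by position and stops at the shortest list,
# so the only Twitch handle lands in KonterA's row and mardok's row is lost.
naive = list(zip(acc, twitch, last_online))
print(naive)
```

Each query flattens the whole document, so the information about which member a value came from is gone before the lists are combined.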

How can I join these three together to get a result like this:
[['KonterA','','2017-02-20T22:37:42Z'],['mardok','mardok_tv','2017-02-19T11:28:20Z']]

2 Answers


Consider parsing all items together under the same parent by iterating over a top-level XPath and evaluating the other XPaths relative to each match. Use XPath's concat() so that an empty string '' is returned when an attribute or element value does not exist. The code below also uses XPath's normalize-space() to remove the line breaks and indentation around values.
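A small sketch of what concat() and normalize-space() change, run on a hypothetical two-<span> fragment modelled on the question: a bare attribute path returns a node-set (a list, empty when the attribute is missing), while wrapping it in concat() makes XPath return a string, so a missing value becomes ''.

```python
from lxml import html

# Hypothetical fragment: one profile <span> without the attribute, one with it.
frag = html.fromstring(
    '<div>'
    '<span class="profile-link"><a href="/account/view-profile/KonterBolet">\n'
    '   <img src="36.png">KonterA</a></span>'
    '<span class="profile-link" data-twitch-user="mardok_tv">'
    '<a href="/account/view-profile/mardok">mardok</a></span>'
    '</div>'
)
spans = frag.xpath("span")

# Plain attribute path: a node-set (list), empty when the attribute is missing.
node_sets = [s.xpath("@data-twitch-user") for s in spans]

# concat() forces a string result, so a missing attribute becomes ''.
strings = [s.xpath("concat(@data-twitch-user, '')") for s in spans]

# normalize-space() takes the string value of the first matching <a> and
# trims/collapses whitespace, hiding the newline before <img>.
names = [s.xpath("normalize-space(a)") for s in spans]

print(node_sets)  # [[], ['mardok_tv']]
print(strings)    # ['', 'mardok_tv']
print(names)      # ['KonterA', 'mardok']
```

Because every call returns exactly one string per <span>, the resulting lists always have the same length and can be zipped safely.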

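from lxml import html

# PARSING POSTED SNIPPET AS STRING (htmlstr holds the HTML fragment from the question)
webContent = html.fromstring(htmlstr)

# INITIALIZING LISTS
acc = []; twitch = []; lastOnline = []

# ITERATING THROUGH FIRST CHILD <SPAN> (span.profile-link) OF EACH OUTER <SPAN>
for i in webContent.xpath("//span/span[1]"):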
    acc.append(i.xpath("concat(normalize-space(a[contains(@href,'account/view-profile')]),'')"))
    twitch.append(i.xpath("concat(@data-twitch-user, '')"))
    lastOnline.append(i.xpath("concat(../@data-time, '')"))

# ZIP EQUAL LENGTH LISTS
xpath_list = list(zip(acc, twitch, lastOnline))

print(xpath_list)
# [('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
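If you want exactly the list-of-lists shape from the question, a variant of the same idea is to loop over each member <div> instead of the inner <span>. This is a sketch against a trimmed copy of the sample HTML; it assumes every member sits in a <div class="member"> as in the snippet, and it uses XPath's string(), which, like concat(..., ''), returns '' for an empty node-set.

```python
from lxml import html

# Trimmed copy of the question's HTML (title/alt attributes dropped).
HTML = """
<div class="members">
  <h2>Members</h2>
  <div class="member">
    <span data-time="2017-02-20T22:37:42Z">
      <span class="profile-link">
        <a href="/account/view-profile/KonterBolet">
          <img class="achievement" src="36.png">KonterA</a>
      </span>
      <span class="memberType">Leader</span>
    </span>
  </div>
  <div class="member">
    <span data-time="2017-02-19T11:28:20Z">
      <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
        <a href="/account/view-profile/mardok">
          <img class="achievement" src="35.png">mardok</a>
        <a class="twitch" href="//www.twitch.tv/mardok_tv"></a>
      </span>
      <span class="memberType">Officer</span>
    </span>
  </div>
</div>
"""

root = html.fromstring(HTML)

rows = []
# One pass per member <div>: every lookup below is relative to the current
# member, so a missing value cannot shift the other columns.
for member in root.xpath("//div[@class='member']"):
    rows.append([
        # normalize-space() trims the newline/indentation before <img>.
        member.xpath("normalize-space(.//a[contains(@href,'account/view-profile')])"),
        # string() of an empty node-set is '', so members without Twitch get ''.
        member.xpath("string(.//@data-twitch-user)"),
        member.xpath("string(.//@data-time)"),
    ])

print(rows)
```

Iterating on the member container rather than on span/span[1] is a design choice: it does not depend on the profile <span> being the first child of the outer <span>.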

2 Comments

Thank you so much, this is exactly what I need. I tried to get it in many ways, but I never considered iterfind. Thanks!
You actually don't need iterfind(), as xpath() will do. See the update! One of those "of course..." moments. iterfind() comes from the built-in ElementTree API but traverses elements much like lxml's xpath().
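A minimal sketch of the point in that comment, on a hypothetical two-member fragment: lxml elements support both iterfind(), which takes the limited ElementPath syntax shared with xml.etree.ElementTree and yields matches lazily, and xpath(), which takes full XPath 1.0 and returns a list. For a simple attribute predicate both select the same elements.

```python
from lxml import html

# Hypothetical fragment with the same member structure as the question.
root = html.fromstring(
    '<div class="members">'
    '<div class="member"><span data-time="2017-02-20T22:37:42Z">KonterA</span></div>'
    '<div class="member"><span data-time="2017-02-19T11:28:20Z">mardok</span></div>'
    '</div>'
)

# iterfind(): ElementPath syntax, returns an iterator over matching elements.
via_iterfind = [m.find("span").get("data-time")
                for m in root.iterfind(".//div[@class='member']")]

# xpath(): full XPath 1.0, returns a list; here it selects the same elements.
via_xpath = [m.find("span").get("data-time")
             for m in root.xpath(".//div[@class='member']")]

print(via_iterfind == via_xpath)  # True
```

The difference shows up with functions such as concat() or normalize-space(), which only xpath() understands.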

Let's call them first_list, second_list and third_list. Rebuild second_list so that it has one entry per name in first_list:

second_list = [name + "_tv" if name + "_tv" in second_list else "" for name in first_list]

After that, do:

list(zip(first_list, second_list, third_list))

This should give you a list of tuples in the same shape:

[('KonterA','','2017-02-20T22:37:42Z'),('mardok','mardok_tv','2017-02-19T11:28:20Z')]
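The same name matching can be sketched as a lookup keyed by account name, using the lists from the question. It still relies on this answer's assumption that a Twitch handle is the account name plus "_tv", which the asker says does not hold for their real data. The suffix is removed with endswith() and slicing, because str.strip("_tv") removes any of the characters _, t and v from both ends rather than the suffix; "trevor_tv" below is a hypothetical handle that shows the difference.

```python
# Hypothetical helper: remove a suffix only if it is present
# (same result as str.removesuffix, which needs Python 3.9+).
def strip_suffix(s, suffix="_tv"):
    return s[:-len(suffix)] if suffix and s.endswith(suffix) else s

first_list = ["KonterA", "mardok"]
second_list = ["mardok_tv"]
third_list = ["2017-02-20T22:37:42Z", "2017-02-19T11:28:20Z"]

# str.strip() removes characters, not a suffix:
print("trevor_tv".strip("_tv"))   # revor
print(strip_suffix("trevor_tv"))  # trevor

# Map account name -> Twitch handle, then build one entry per account name,
# so the aligned list has the same length as first_list.
handles = {strip_suffix(h): h for h in second_list}
aligned = [handles.get(name, "") for name in first_list]

result = list(zip(first_list, aligned, third_list))
print(result)
```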

3 Comments

I can't simply join the lists together, because the values in this list are totally different, not only in the '_tv' part.
@mastaBot, then how do you know where to put the word? For example, if there were "foo" in place of "mardok_tv", what should the output be?
I don't know, so I need to do it as in @Parfait's example.
