I'm trying to fetch links to guides for a given class and its style of play. In the screenshot, the div responsible for rendering them is highlighted in yellow. I need to use async because this class is used in a discord.py bot, and trying to use HTMLSession() resulted in an error saying I need to use AsyncHTMLSession. Website address: https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-necromancer%5D%26metas%3D%5Bdi-pvp%5D

But my code outputs this div as empty.

Code:

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

class Scrapper:
    def __init__(self):
        self.headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0'}

    async def return_page(self, url):
        # Fetch the page with the custom User-Agent, then let requests_html
        # render the JavaScript before reading the HTML.
        response = await asession.get(url, headers=self.headers)
        await response.html.arender(timeout=15, sleep=5)
        #article = response.html.find('#filter-results', first=True)
        print(response.html.html)
        return response

    async def return_build_articles(self, userClass, instance):
        url = f"https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-{userClass}%5D%26metas%3D%5Bdi-{instance}%5D"

        articles = await self.return_page(url)
        return articles

Selected part of the output:

</div>
</div>
</div>
</div>
</div>
</form> <hr class="global-separator ">
<div id="immortal-mobile-mid-banner" class="adsmanager-ad-container mobile-banner"></div>
<div id="filter-results" class="posts-list"><!-- here should be the results --></div>
<div class="page-navigation" role="navigation"></div>
</div>
</div>
</div>
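
A note on the URL used above: everything after the # is a URL fragment, and HTTP clients do not send the fragment to the server. The server therefore receives the same request whatever class or meta is selected, and the filtering has to happen in JavaScript in the browser. The short sketch below only splits the URL from the question to show which part is requested and which part stays on the client side; it does not by itself explain why arender() left the div empty.

```python
from urllib.parse import urlsplit, unquote

url = (
    "https://immortal.maxroll.gg/category/build-guides"
    "#classes%3D%5Bdi-necromancer%5D%26metas%3D%5Bdi-pvp%5D"
)

parts = urlsplit(url)

# The part an HTTP client actually requests (fragment removed).
requested = parts._replace(fragment="").geturl()

# The fragment, percent-decoded; only client-side code ever reads it.
filters = unquote(parts.fragment)

print(requested)
print(filters)
```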

1 Answer


The data you see on the page is loaded with JavaScript from an external URL. To get the data asynchronously you can use the aiohttp package. For example:

import json
import asyncio
import aiohttp


url = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"

data = {
    "filters": '(classes = "di/necromancer") AND (metas = "di/PvP") AND (category = "Build Guides")',
    "limit": 1000,
    "offset": 0,
    "q": "",
}

headers = {
    "X-Meili-API-Key": "3c58012ad106ee8ff2c6228fff2161280b1db8cda981635392afa3906729bade"
}


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as resp:
            json_data = await resp.json()

    # uncomment to print all data:
    # print(json.dumps(json_data, indent=4))

    for hit in json_data["hits"]:
        print(hit["post_title"])
        print(hit["permalink"])
        print("-" * 80)


asyncio.run(main())

Prints:

Bone Spikes Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-spikes-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
Bone Wall Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-wall-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
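
Since the question needs this for an arbitrary class and play style, the same request can be wrapped in a reusable coroutine. This is only a sketch: build_payload and fetch_build_articles are hypothetical names, and it assumes that every class and meta follows the "di/<name>" form seen in the necromancer/PvP example (note the index uses "di/PvP", not the "di-pvp" from the page URL), so check the values the site actually sends. The session argument is expected to be an aiohttp.ClientSession.

```python
SEARCH_URL = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"
HEADERS = {
    "X-Meili-API-Key": "3c58012ad106ee8ff2c6228fff2161280b1db8cda981635392afa3906729bade"
}


def build_payload(user_class, meta, limit=1000):
    """Build the search payload for one class/meta pair.

    Assumption: all classes and metas use the "di/<name>" form shown in
    the necromancer/PvP example above.
    """
    filters = (
        f'(classes = "di/{user_class}") AND '
        f'(metas = "di/{meta}") AND '
        '(category = "Build Guides")'
    )
    return {"filters": filters, "limit": limit, "offset": 0, "q": ""}


async def fetch_build_articles(session, user_class, meta):
    """Return a list of (title, permalink) pairs for the matching guides.

    `session` is an aiohttp.ClientSession created by the caller.
    """
    payload = build_payload(user_class, meta)
    async with session.post(SEARCH_URL, json=payload, headers=HEADERS) as resp:
        resp.raise_for_status()  # fail loudly on HTTP errors
        json_data = await resp.json()
    return [(hit["post_title"], hit["permalink"]) for hit in json_data["hits"]]
```

In a discord.py bot you would typically create one aiohttp.ClientSession, keep it for the bot's lifetime, and call await fetch_build_articles(session, "necromancer", "PvP") from a command handler.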

1 Comment

Thank you! I thought requests_html would do the trick, since it is advertised as being able to run JavaScript via the arender() method, but maybe I was doing something wrong.
