python async requests_html div not loaded with JS (?) data

Question

I'm trying to fetch links to guides for a given class and its style of play. In the screenshot, the div responsible for rendering them is highlighted in yellow. I need to use async because this class is used in a discord.py bot, and trying to use HTMLSession() resulted in an error saying I need to use AsyncHTMLSession. Website address: https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-necromancer%5D%26metas%3D%5Bdi-pvp%5D

However, my code outputs this div as empty.

Code:

from requests_html import AsyncHTMLSession

asession = AsyncHTMLSession()

class Scrapper:
    def __init__(self):
        self.headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0'}

    async def return_page(self, url):
        # Send the custom User-Agent with the request
        response = await asession.get(url, headers=self.headers)
        # Render the page's JavaScript in headless Chromium
        await response.html.arender(timeout=15, sleep=5)
        #article = response.html.find('#filter-results', first=True)
        print(response.html.html)
        return response

    async def return_build_articles(self, userClass, instance):
        url = f"https://immortal.maxroll.gg/category/build-guides#classes%3D%5Bdi-{userClass}%5D%26metas%3D%5Bdi-{instance}%5D"

        articles = await self.return_page(url)
        return articles

Selected part of the output:

</div>
</div>
</div>
</div>
</div>
</form> <hr class="global-separator ">
<div id="immortal-mobile-mid-banner" class="adsmanager-ad-container mobile-banner"></div>
<div id="filter-results" class="posts-list"><!-- here should be the results --></div>
<div class="page-navigation" role="navigation"></div>
</div>
</div>
</div>

Andrej Kesely · Accepted Answer · 2022-06-07 12:50:25Z


The data you see on the page is loaded with JavaScript from an external URL. To get the data asynchronously, you can use the aiohttp package. For example:

import json
import asyncio
import aiohttp


url = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"

data = {
    "filters": '(classes = "di/necromancer") AND (metas = "di/PvP") AND (category = "Build Guides")',
    "limit": 1000,
    "offset": 0,
    "q": "",
}

headers = {
    "X-Meili-API-Key": "3c58012ad106ee8ff2c6228fff2161280b1db8cda981635392afa3906729bade"
}


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data, headers=headers) as resp:
            json_data = await resp.json()

    # uncomment to print all data:
    # print(json.dumps(json_data, indent=4))

    for hit in json_data["hits"]:
        print(hit["post_title"])
        print(hit["permalink"])
        print("-" * 80)


asyncio.run(main())

Prints:

Bone Spikes Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-spikes-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
Bone Wall Necromancer PvP Guide
https://immortal.maxroll.gg/build-guides/bone-wall-necromancer-pvp-guide-battlegrounds-rite-of-exile
--------------------------------------------------------------------------------
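
To plug this into the `return_build_articles(userClass, instance)` method from the question, the request can be wrapped in a coroutine that takes the class and meta as arguments. The following is only a sketch: `build_search_payload`, `fetch_build_guides`, and `main` are hypothetical names, and it assumes the caller passes values in the form the search index expects (`"di/necromancer"`, `"di/PvP"`, as in the filter above) rather than the `di-necromancer` / `di-pvp` slugs from the page URL. How the slugs map to index values is not covered here.

```python
SEARCH_URL = "https://site-search-origin.maxroll.gg/indexes/wp_posts_immortal/search"


def build_search_payload(user_class, meta, limit=1000):
    """Build the JSON body for the search request.

    user_class and meta are assumed to already be in the index's own
    format, e.g. "di/necromancer" and "di/PvP".
    """
    filters = (
        f'(classes = "{user_class}") AND (metas = "{meta}") '
        'AND (category = "Build Guides")'
    )
    return {"filters": filters, "limit": limit, "offset": 0, "q": ""}


async def fetch_build_guides(session, api_key, user_class, meta):
    """Return a list of (title, link) tuples for the matching guides.

    session is expected to be an aiohttp.ClientSession created by the caller.
    """
    payload = build_search_payload(user_class, meta)
    headers = {"X-Meili-API-Key": api_key}
    async with session.post(SEARCH_URL, json=payload, headers=headers) as resp:
        resp.raise_for_status()  # fail loudly on HTTP errors
        data = await resp.json()
    return [(hit["post_title"], hit["permalink"]) for hit in data["hits"]]


async def main(api_key):
    # Imported here so the helpers above can be imported without aiohttp.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        guides = await fetch_build_guides(
            session, api_key, "di/necromancer", "di/PvP"
        )
    for title, link in guides:
        print(title, link)
```

Standalone, this would be started with `asyncio.run(main(key))`, where `key` is the `X-Meili-API-Key` value from the headers above. Inside a discord.py bot an event loop is already running, so `fetch_build_guides` would be awaited directly from a command handler, ideally reusing one session rather than opening a new one per call.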


1 Comment

Swagnar Over a year ago

Thank you! I thought using requests_html would do the trick, since they advertise that this package can load JavaScript using the arender() method, but I think maybe I was doing something wrong.
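
A side note that may help when debugging pages like this one: in the question's URL, the class and meta filters sit after the `#`, in the URL fragment. Fragments are handled on the client side and are not sent to the server, so the HTML returned for that address is the same unfiltered shell whatever the fragment says; the filtering happens later, in the page's script. This does not by itself explain why `arender()` left the div empty. It only shows where the filter values live. A small sketch using only the standard library:

```python
from urllib.parse import urlsplit, unquote

page_url = (
    "https://immortal.maxroll.gg/category/build-guides"
    "#classes%3D%5Bdi-necromancer%5D%26metas%3D%5Bdi-pvp%5D"
)

parts = urlsplit(page_url)

# The fragment is percent-encoded; decode it to see the filters the page uses.
decoded_fragment = unquote(parts.fragment)

# The path is what the server is actually asked for; the fragment is not included.
request_path = parts.path

print(decoded_fragment)  # classes=[di-necromancer]&metas=[di-pvp]
print(request_path)      # /category/build-guides
```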
