
I'm trying to scrape pages from sites that require authentication. I was able to take the JSESSIONID cookie from an authenticated browser session and download the correct page with a urllib2 opener, as below.

import cookielib, urllib2

# SESSIONID and DOMAIN are the session value and site domain copied from the browser
cj = cookielib.CookieJar()
c1 = cookielib.Cookie(None, "JSESSIONID", SESSIONID, None, None, DOMAIN,
        True, False, "/store", True, False, None, False, None, None, None)
cj.set_cookie(c1)

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fh = opener.open(url)

But when I use this code to create Scrapy requests (I tried both dict cookies and a cookiejar), the downloaded page is the non-authenticated version. Does anyone know what the problem is?

from scrapy import Request

cookies = [{
    'name': 'JSESSIONID',
    'value': SESSIONID,
    'path': '/store',
    'domain': DOMAIN,
    'secure': False,
}]

# attempt 1: dict-style cookies
request1 = Request(url, cookies=cookies, meta={'dont_merge_cookies': False})
# attempt 2: the cookiejar built above, passed through meta
request2 = Request(url, meta={'dont_merge_cookies': True, 'cookiejar': cj})
  • Did you try just cookies={'JSESSIONID': SESSIONID}? Commented Nov 30, 2013 at 5:49

1 Answer


You were able to get the JSESSIONID from your browser.

Why not let Scrapy simulate a user login for you?
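For instance, a minimal sketch of what that could look like; the login URL, form field names, failure marker and follow-up page below are assumptions for illustration, not taken from your site:

import scrapy


class StoreSpider(scrapy.Spider):
    name = "store"
    # hypothetical login page -- replace with the real URL and credentials
    start_urls = ["https://example.com/store/login"]

    def parse(self, response):
        # fill in and submit the login form found on the page
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": "john", "password": "secret"},  # assumed field names
            callback=self.after_login,
        )

    def after_login(self, response):
        if b"authentication failed" in response.body.lower():  # assumed failure marker
            self.logger.error("Login failed")
            return
        # the session cookie set by the login response is now in Scrapy's
        # default cookie jar and is sent automatically with later requests
        yield scrapy.Request(
            "https://example.com/store/account",  # assumed authenticated page
            callback=self.parse_account,
        )

    def parse_account(self, response):
        # scrape the authenticated page here
        pass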

Then, I think your JSESSIONID cookie will stick to subsequent requests, given that:

  • Scrapy uses a single cookie jar (as opposed to Multiple cookie sessions per spider) for the entire spider lifetime, covering all your scraping steps,
  • the COOKIES_ENABLED setting for the cookie middleware defaults to True,
  • dont_merge_cookies defaults to False:

    When some site returns cookies (in a response) those are stored in the cookies for that domain and will be sent again in future requests. That’s the typical behaviour of any regular web browser. However, if, for some reason, you want to avoid merging with existing cookies you can instruct Scrapy to do so by setting the dont_merge_cookies key to True in the Request.meta.

    Example of request without merging cookies:

    request_with_cookies = Request(url="http://www.example.com",
                                   cookies={'currency': 'USD', 'country': 'UY'},
                                   meta={'dont_merge_cookies': True})
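
If, on the other hand, you want to keep reusing the JSESSIONID you already took from the browser (as the comment on the question suggests), a minimal sketch could be to pass it as a plain name/value dict and leave dont_merge_cookies at its default, so the middleware keeps it in the spider's single jar; the URLs and SESSIONID below are placeholders:

import scrapy

SESSIONID = "paste-the-value-from-your-browser-here"  # placeholder


class SeededSessionSpider(scrapy.Spider):
    name = "seeded_session"

    def start_requests(self):
        # seed the cookies middleware with the existing session cookie;
        # it is merged into the spider's single default cookie jar
        yield scrapy.Request(
            "https://example.com/store/account",  # assumed authenticated page
            cookies={"JSESSIONID": SESSIONID},
            callback=self.parse,
        )

    def parse(self, response):
        # later requests reuse the same jar, so the session cookie
        # (and anything the server sets afterwards) goes along automatically
        yield scrapy.Request(
            response.urljoin("/store/orders"),  # assumed follow-up page
            callback=self.parse_orders,
        )

    def parse_orders(self, response):
        pass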
    

1 Comment

I always see every login example end here: `# continue scraping with authenticated session...`, yet that's exactly the step most people have trouble with. I'm trying to use Scrapy, and the login is successful, yet my next request is still unauthenticated and fails with a 403 error.
