I'm trying to scrape WebSocket data (frames) in Python from a website that uses SockJS, but I don't really know how to do that.

URL: ws://example.io/sockjs/wkzeza/websocket

In the web debugger I can see these response headers:

Date: Sun, 27 Aug 2017 09:42:15 GMT
Connection: upgrade
Set-Cookie: oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/
Upgrade: websocket
Sec-WebSocket-Accept: HA0gkvrFCF7qjVYIDvSBa5sJKkg=
Sec-WebSocket-Extensions: permessage-deflate
Server: nginx
CF-RAY: 394e146d34a12f65-MAD

Normally, with only the response headers, I should be able to retrieve the data from the frames, right?

I've tried this code, but I can't read the content:

from websocket import create_connection
import json

headers = json.dumps({'Date': 'Sun, 27 Aug 2017 09:42:15 GMT',
'Connection': 'upgrade',
'Set-Cookie': 'oWG+Kel2MBo0v9FQK81NvuvBZcUwChaMvG2bsv1Ofs9Q8hHN+PlTn6PolO/8MgFXh2ygqC7A8WsJ7cgZwvpwvsbSp0VCpRHqmYMhGGxr; Expires=Sun, 03 Sep 2017 09:42:15 GMT; Path=/',
'Upgrade': 'websocket',
'Sec-WebSocket-Accept': 'HA0gkvrFCF7qjVYIDvSBa5sJKkg=',
'Sec-WebSocket-Extensions': 'permessage-deflate',
'Server': 'nginx',
'CF-RAY': '394e146d34a12f65-MAD'})

ws = create_connection('ws://example.io/sockjs/wkzeza/websocket', header=headers)
response = ws.recv_data_frame()
print(response)

>> [1, <websocket._abnf.ABNF at 0x7efe29aa0da0>]

Thanks for your help.

  • Well, I finally found it. The reason was... my lack of knowledge about WebSockets. I also found a really good repo with an example and an explanation (github.com/oliver006/sockpuppet). With a few changes it works perfectly for my case. Commented Aug 28, 2017 at 10:47
  • It also solved my problem. Thanks. Commented Oct 19, 2017 at 3:09

1 Answer

Check the traffic in Chrome (or another browser) to see how you need to negotiate with the server to begin the flow of data. Once the negotiation succeeds, you can do something like:

while True:
    message = ws.recv()  # blocks until the next message arrives
    print(message)
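
Because the URL in the question is a SockJS endpoint, each text message returned by ws.recv() should follow SockJS framing: `o` for open, `h` for heartbeat, `a[...]` for a JSON array of messages, and `c[code, reason]` for close. The sketch below decodes that framing. `parse_sockjs_frame` is a hypothetical helper name, not part of websocket-client, and it assumes the server sends only those four frame types.

```python
import json


def parse_sockjs_frame(raw):
    """Decode one SockJS frame into a (kind, payload) tuple.

    Hypothetical helper for illustration; assumes standard SockJS prefixes.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    if raw == "o":
        return ("open", None)
    if raw == "h":
        return ("heartbeat", None)
    if raw.startswith("a"):
        # 'a' is followed by a JSON array; each element is one message string.
        return ("messages", json.loads(raw[1:]))
    if raw.startswith("c"):
        # 'c' is followed by a JSON array of [code, reason].
        code, reason = json.loads(raw[1:])
        return ("close", (code, reason))
    return ("unknown", raw)
```

As for the output in the question: recv_data_frame() returns an (opcode, frame) tuple, so the payload is in frame.data, whereas ws.recv() returns the payload directly.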

Here is an example of upstream and downstream WebSocket traffic in Chrome:

[Image: screenshot of WebSocket traffic in Chrome]

Just copy the upstream (sent) message and pass it to ws.send(). Example:

ws.send('''{"H":"publicmaphub","M":"getData","A":[],"I":1}''')

The example is from this live map of buses in Stavanger, Norway: https://www.kolumbus.no/ruter/kart/sanntidskart-internt/?c=58.974238,5.691347,14&lf=all&vt=bus,ferry
(On that page you first need to get a token over HTTPS, then connect with WebSocket, and then make another HTTPS request to start the traffic. After that you can use ws.send() and ws.recv() to start getting data.)
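
The send-then-receive pattern could be sketched roughly as follows. `stream_frames` is a hypothetical helper, and `ws` is assumed to be any object with send() and recv(), such as the connection returned by create_connection(). The site-specific HTTPS token steps are left out because they differ from site to site.

```python
def stream_frames(ws, subscribe_message, max_frames=10):
    """Send one subscribe message, then yield up to max_frames received frames.

    Illustrative sketch: `ws` only needs send() and recv() methods.
    """
    ws.send(subscribe_message)
    for _ in range(max_frames):
        frame = ws.recv()
        if not frame:
            # Assumption for this sketch: an empty frame means the stream ended.
            break
        yield frame


# Possible usage with the websocket-client package (not run here):
# from websocket import create_connection
# ws = create_connection('ws://example.io/sockjs/wkzeza/websocket')
# for frame in stream_frames(ws, message_copied_from_browser):
#     print(frame)
# ws.close()
```

The response headers listed in the question (for example Sec-WebSocket-Accept) are produced by the server during the handshake, so they should not need to be passed to create_connection().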
