
I want to log in to Reddit.com, navigate to a particular page, and submit a comment. I don't see what's wrong with this code, but it doesn't work: no change is reflected on the Reddit site.

import mechanize
import cookielib


def main():

    # Browser
    br = mechanize.Browser()

    # Cookie jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follow refresh 0 but don't hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Open the site to be navigated
    r = br.open('http://www.reddit.com')
    html = r.read()

    # Select the second (index one) form
    br.select_form(nr=1)

    # User credentials
    br.form['user'] = 'DUMMYUSERNAME'
    br.form['passwd'] = 'DUMMYPASSWORD'

    # Log in
    br.submit()

    # Open the comment page
    r = br.open('http://www.reddit.com/r/PoopSandwiches/comments/f47f8/testing/')
    html = r.read()

    # The comment text box is, I believe, in the 8th form on the page (index 7)
    br.select_form(nr=7)

    # Set the 'text' field to a test string
    br.form['text'] = "this is an automated test"

    # Submit the form
    br.submit()

What's wrong with this?

  • Try adding a sleep of at least 10 seconds. You should also inspect the form in your browser (not 'View Source', but 'Inspect Element' in Chrome or the equivalent in Firefox) and compare it to the downloaded HTML. It might have fields that are filled in dynamically by JavaScript. Commented Jan 18, 2011 at 6:34
  • By the way, Reddit has an API; wouldn't that work better? Commented Jan 18, 2011 at 6:35
  • Hmm, let me try adding a sleep. I'm not sure how to use the API, as there is no documentation for submitting comments. Commented Jan 18, 2011 at 7:25
  • EDIT: Tried the sleep. It didn't work. Commented Jan 18, 2011 at 7:51
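Following the suggestion to compare the inspected form with the downloaded HTML, here is a minimal sketch (Python 3, standard library only, unlike the Python 2 mechanize code in this thread) that lists every form on a page by index, together with its id attribute and field names. The sample HTML and its form ids are hypothetical. For well-formed markup the indices should line up with mechanize's select_form(nr=...) numbering, but mechanize uses its own parser and may disagree on broken HTML. The script only sees the static HTML, so any field added by JavaScript will be missing, which is exactly the difference worth looking for.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form>'s id attribute and the names of its fields, in page order."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form>, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"id": attrs.get("id"), "fields": []}
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html):
    """Return [{'id': ..., 'fields': [...]}, ...] for every form in `html`."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical page fragment; a real page would come from r.read().
SAMPLE = """
<form id="search"><input name="q"></form>
<form id="form-abc">
  <input type="hidden" name="thing_id" value="t3_abc" />
  <textarea name="text"></textarea>
</form>
"""

if __name__ == "__main__":
    for index, form in enumerate(list_forms(SAMPLE)):
        print(index, form["id"], form["fields"])
```

Running it on a saved copy of the page shows at a glance which index holds thing_id and text.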

1 Answer


I would definitely suggest trying to use the API if possible, but this works for me (not for your example post, which has been deleted, but for any active one):

#!/usr/bin/env python

import mechanize
import cookielib
import urllib

def main():

    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    r = br.open('http://www.reddit.com')

    # Select the second (index one) form
    br.select_form(nr=1)

    # User credentials
    br.form['user'] = 'user'
    br.form['passwd'] = 'passwd'

    # Log in
    br.submit()

    # Open the comment page
    posting = 'http://www.reddit.com/r/PoopSandwiches/comments/f47f8/testing/'
    rval = 'PoopSandwiches'
    # You can get the rval in other ways, but this works for testing.

    r = br.open(posting)

    # You need the 'uh' value from the first form
    br.select_form(nr=0)
    uh = br.form['uh']

    # The index of the comment form can differ between pages and subreddits
    br.select_form(nr=7)
    thing_id = br.form['thing_id']
    # The id that gets posted is the form's id attribute with '#' prepended.
    form_id = '#' + br.form.attrs['id']

    data = {'uh': uh, 'thing_id': thing_id, 'id': form_id, 'renderstyle': 'html',
            'r': rval, 'text': "Your text here!"}
    # urlencode percent-encodes every value and turns spaces into '+'
    new_data = urllib.urlencode(data)

    # Not sure which of these headers are really needed, but it works with all
    # of them, so why not just include them.
    req = mechanize.Request('http://www.reddit.com/api/comment', new_data)
    req.add_header('Referer', posting)
    req.add_header('Accept', 'application/json, text/javascript, */*')
    req.add_header('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
    req.add_header('X-Requested-With', 'XMLHttpRequest')
    cj.add_cookie_header(req)
    res = mechanize.urlopen(req)

if __name__ == '__main__':
    main()

It would be interesting to turn JavaScript off and see how Reddit handles comments then. Right now, an onsubmit handler runs when you post a comment, and that is where the uh and id values get added. mechanize does not execute JavaScript, so your script has to add those values itself.
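In other words, the workaround reproduces by hand what the onsubmit handler adds before the POST. As a sketch (Python 3 standard library; the answer itself uses Python 2's urllib), the body can be built with urlencode, which percent-encodes each value and turns spaces into '+', so no manual replace is needed. The field names come from the answer's code; the modhash, thing_id and form id values are made-up placeholders, and the function only builds the body, it does not send anything.

```python
from urllib.parse import parse_qs, urlencode


def build_comment_body(uh, thing_id, form_id, subreddit, text):
    """Build the form-encoded body for the comment POST, using the field
    names from the answer's code."""
    data = {
        "uh": uh,              # value taken from the first form on the page
        "thing_id": thing_id,  # what the comment is attached to
        "id": "#" + form_id,   # the form's id attribute with '#' prepended
        "renderstyle": "html",
        "r": subreddit,
        "text": text,
    }
    # urlencode uses quote_plus: spaces become '+', '#' becomes '%23'.
    return urlencode(data)


if __name__ == "__main__":
    # Placeholder values, not real Reddit data.
    body = build_comment_body("abc123", "t3_f47f8", "form-t3_f47f8",
                              "PoopSandwiches", "Your text here!")
    print(body)
    print(parse_qs(body)["id"])  # decoding recovers the '#'-prefixed id
```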


Comments

Wow. Thank you so much. I would never have figured that out.
Hmm... I'm getting this error on all active threads: ControlNotFoundError: no control matching name 'thing_id'. Any ideas?
Haha, no. You misinterpreted that sentence: no matter which active thread I use this program on, it still triggers the error. The program I'm trying to make is for my own purposes. It posts relevant book chapters to a private subreddit I moderate.
Problem solved: it was form nr=8 that contained thing_id. Thank you very much.
Hmmm... looks like thing_id is in different forms for different subreddits (an interesting problem!). Additionally, selecting a form with the wrong thing_id will post a reply to somebody's comment rather than a new top-level comment.
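One way to cope with both observations (the comment form's index varies, and the wrong form posts a reply) is to pick the form by its thing_id instead of by a fixed index. Assuming the thing_id values are Reddit "fullnames", where the t3_ prefix marks a link (the post itself) and t1_ marks a comment, the top-level comment box is the form whose thing_id starts with t3_. The Python 3 sketch below expects the page's forms already extracted as a list of field-name-to-value dicts, one per form in page order; that input format and the sample ids are hypothetical. The returned index is what you would pass to br.select_form(nr=...).

```python
from typing import Dict, List, Optional

# Assumption: thing_id holds a Reddit fullname. "t3_" = link (the post),
# whereas "t1_" = comment (a form with t1_ would post a reply instead).
LINK_PREFIX = "t3_"


def pick_top_level_form(forms: List[Dict[str, str]]) -> Optional[int]:
    """Return the index of the first form whose thing_id refers to the post
    itself, or None if no such form exists."""
    for index, fields in enumerate(forms):
        if fields.get("thing_id", "").startswith(LINK_PREFIX):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical forms, in page order.
    page_forms = [
        {"q": ""},                              # search box
        {"uh": "abc123"},                       # form carrying the uh value
        {"thing_id": "t1_c1abcd", "text": ""},  # reply box under a comment
        {"thing_id": "t3_f47f8", "text": ""},   # top-level comment box
    ]
    print(pick_top_level_form(page_forms))  # → 3
```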