I am trying to upload a file to a site and extract the response the site returns for that file. The site has the following form.

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
  </head>
  <body>
     <form method="POST" action="http://somewebsite.com/imgdigest" enctype="multipart/form-data">
        quality:<input type="text" name="quality" value="2"><br>
        category:<input type="text" name="category" value="1"><br>
        debug:<input type="text" name="debug" value="1"><br>
        image:<input type="file" name="image"><br>
        <input type="submit" value="Submit">
     </form>
  </body>
</html>

What I want to do is upload a file, submit the form and extract the response.

I started by looking at an example, and I think I managed to get the upload working, because I didn't get any errors when I ran this:

import urllib2_file
import urllib2
import request
import lxml.html as lh

data = {'name': 'image',
        'file':  open('/user/mydir/21T03NAPE7L._AA75_.jpg')
       }
urllib2.urlopen('http://localhost/imgdigestertest.html', data)

Unfortunately, I am not capturing the response here, and I am not sure how I should do that. Once I have the response, I should be able to extract the data with some pattern matching, which I am comfortable with.

Based on the answer provided, I tried the following code:

import requests

url = 'http://somesite.com:61235/imgdigest'
files = {'file': ('21e1LOPiuyL._SL160_AA115_.jpg', 
                  open('/usr/local/21e1LOPiuyL._SL160_AA115_.jpg', 'rb'))}
other_fields = {"quality": "2",
                "category": "1",
                "debug": "0"
               }
headers={'content-type': 'text/html; charset=ISO-8859-1'}

response = requests.post(url, data=other_fields, files=files, headers=headers)

print response.text

Now I get the following error, which tells me that somehow the image file doesn't get attached correctly. Do we have to specify the file type?

Image::Image(...): bufSize = 0.  Can not load image data. Image size = 0.   DigestServiceProvider.hpp::Handle(...) | 
  • What does urllib2.urlopen('localhost/imgdigestertest.html', data).read() return? Commented Jul 13, 2012 at 21:24
  • I get the same HTML I posted as a result. I also tried the answer below, and the results are the same. Commented Jul 13, 2012 at 22:43
  • What does data2 = urllib.urlencode(data); req = urllib2.Request(url, data2) do? Commented Jul 13, 2012 at 22:48
  • So I need urllib before running this code? Commented Jul 13, 2012 at 23:08
  • Ah, so you want to pass the image on to another site. What should that site respond with? Commented Jul 13, 2012 at 23:23

1 Answer

Use the requests library (pip install requests, if you use pip).

For their example, see here: http://docs.python-requests.org/en/latest/user/quickstart/#post-a-multipart-encoded-file

To customize that to look like yours:

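import requests

url = 'http://localhost:8080/test_meth'
# The dict key ('file') is the name of the form field the server expects.
# The tuple is (file name reported to the server, open file object).
files = {'file': ('21T03NAPE7L._AA75_.jpg',
                  open('./text.data', 'rb'))}
# The remaining text inputs of the form are sent as ordinary form data.
other_fields = {"quality": "2",
                "category": "1",
                "debug": "1"
               }
response = requests.post(url, data=other_fields, files=files)
print response.text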

On my local system, text.data contains this:

Data in a test file.

I wrote a server.py with cherrypy (pip install cherrypy) to test the client I gave above. Here is the source for the server.py:

import cherrypy
class Hello(object):
    def test_meth(self, category, debug, quality, file):
        print "Form values:", category, debug, quality
        print "File name:", file.filename
        print "File data:", file.file.read()
        return "More stuff."
    test_meth.exposed = True
cherrypy.quickstart(Hello())

When I run the above client.py, it prints:

More stuff.

which, as you can see in the server.py example, is what test_meth returns.

Meanwhile, the server says:

Form values: 1 1 2
File name: 21T03NAPE7L._AA75_.jpg
File data: Data in a test file.

127.0.0.1 - - [14/Jul/2012:00:00:35] "POST /test_meth HTTP/1.1" 200 11 "" "python-requests/0.13.3 CPython/2.7.3 Linux/3.2.0-26-generic"

Thus, you can see that the client is posting the filename as described in the code and the file contents of the specified local file.
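If you don't have a test server handy, a rough way to check the same thing is to build the request without sending it and look at the multipart body. This is only a sketch: it uses an in-memory io.BytesIO object as a stand-in for open('./text.data', 'rb') (my assumption, so the snippet runs without the file on disk), and it relies on requests.Request(...).prepare(), which builds the request but does not send it.

```python
import io
import requests

# Stand-in for open('./text.data', 'rb'); the bytes match the sample file above.
fake_file = io.BytesIO(b"Data in a test file.")

files = {'file': ('21T03NAPE7L._AA75_.jpg', fake_file)}
other_fields = {"quality": "2", "category": "1", "debug": "1"}

# Build the request without sending it, so no server is needed.
prepared = requests.Request(
    'POST', 'http://localhost:8080/test_meth',
    data=other_fields, files=files,
).prepare()

content_type = prepared.headers['Content-Type']
body = prepared.body  # bytes for a multipart request

print(content_type)
print(body.decode('latin-1'))
```

The printed body should contain one part per form field, and the file part should carry both the field name (file) and the file name from the tuple, followed by the file contents.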

One thing to point out: at the beginning of this post I said to use the requests library. This is not to be confused with the urllib request that you are importing in your original question.
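Regarding the follow-up code in the question, two things look suspicious to me, though I can't confirm either without access to the real server. First, the form names its file input image, so the key in the files dictionary should probably be 'image' rather than 'file'. Second, passing your own content-type header makes requests keep that header instead of generating the multipart one with its boundary, which could leave the server unable to find the image data. The sketch below only prepares the requests and compares the headers; it sends nothing. The prepare_upload helper, the fake image bytes, and the 'image/jpeg' part type are my assumptions for illustration.

```python
import io
import requests

url = 'http://somesite.com:61235/imgdigest'  # placeholder URL from the question
# Stand-in bytes instead of a real JPEG on disk (assumption for this demo).
image = io.BytesIO(b'\xff\xd8\xff\xe0 fake jpeg bytes')


def prepare_upload(headers=None):
    """Prepare, but do not send, the multipart POST for the question's form."""
    image.seek(0)
    # Key 'image' matches <input type="file" name="image"> in the form.
    # The optional third tuple item sets the content type of the file part.
    files = {'image': ('21e1LOPiuyL._SL160_AA115_.jpg', image, 'image/jpeg')}
    fields = {"quality": "2", "category": "1", "debug": "0"}
    return requests.Request('POST', url, data=fields, files=files,
                            headers=headers).prepare()


# Let requests choose the Content-Type header.
good = prepare_upload()
# Override it by hand, as in the question's follow-up code.
bad = prepare_upload(headers={'content-type': 'text/html; charset=ISO-8859-1'})

print(good.headers['Content-Type'])  # multipart/form-data with a boundary
print(bad.headers['Content-Type'])   # the hand-written value, no boundary
```

If this matches what is going wrong, dropping the headers argument and calling requests.post(url, data=fields, files=files) with the 'image' key would be the first thing I'd try.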

8 Comments

Inside the files dictionary you have written the file name string. Should that be the name of the field instead of '21T03NAPE7L._AA75_.jpg'?
What would be the reason that I get the same HTML I posted as the result?
I'm not sure what you mean by getting the same HTML that you posted. If response.text is the same as the original HTML form that you got, then that means the server is returning the same form as a response to a POST.
Yes, I am getting the same form data when I do print response.text
I am not uploading 21T03NAPE7L._AA75_.jpg and a text file; I am uploading a text file and saying that the file name of what I'm uploading is 21T03NAPE7L._AA75_.jpg, which is perfectly valid. The fact that you are using a locally cached version of the form is fairly likely to cause trouble. If I were doing this, I would use Python interactively and point the requests library straight at the real web site. If it isn't a secret, you could share what the real site is.