3

I am currently trying to read a txt file from a website.

My script so far is:

webFile = urllib.urlopen(currURL)

This way, I can work with the file. However, when I try to store the file (in webFile), I only get a reference to the socket object, not the contents. Another approach I tried was to call read():

webFile = urllib.urlopen(currURL).read()

However, this seems to remove the formatting (\n, \t, etc.).

If I open the file like this:

 webFile = urllib.urlopen(currURL)

I can read it line by line:

for line in webFile:
    print line

This should result in:

"this" 
"is" 
"a"
"textfile"

But I get:

't'
'h'
'i'
...

I want to save the file on my computer while preserving its formatting.

2
  • 1
    stackoverflow.com/questions/22676/…. Just take webFile and write it to a file. Commented Oct 6, 2015 at 13:56
  • Is there no way of doing it without having to first write it to a local file? Commented Oct 6, 2015 at 13:59

4 Answers 4

8

You should use readlines() to read the response as a list of lines:

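import urllib

response = urllib.urlopen(currURL)  # Python 2 API
lines = response.readlines()  # list of lines, each still ending in its newline
for line in lines:
    print line,  # trailing comma: the line already ends with '\n'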

That said, I strongly recommend using the requests library: http://docs.python-requests.org/en/latest/
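As a rough illustration of the requests route, here is a minimal sketch written for Python 3. The helper name `fetch_lines` and the 10-second timeout are assumptions made for this example, not something from the question; `requests.get`, `raise_for_status()` and `.text` are part of the requests API.

```python
def fetch_lines(url, timeout=10):
    """Download `url` with requests and return its text as a list of lines.

    `fetch_lines` is a hypothetical helper name used only for this sketch.
    """
    # Third-party package (pip install requests); imported inside the
    # function so this module still loads where requests is not installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    # .text is the decoded body; newlines and tabs are preserved in it
    return response.text.splitlines()
```

Calling `fetch_lines(currURL)` would then give one list entry per line of the remote file, which can be looped over or written to disk.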


2

This is because read() returns a single string, and iterating over a string yields one character at a time, so the loop prints character by character.
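A small sketch of that behaviour, written for Python 3 and needing no network access. The `text` value is a made-up stand-in for whatever read() returns; note that the newline characters are still in the string, so no formatting has been lost.

```python
# Hypothetical stand-in for the string returned by urlopen(...).read()
text = "this\nis\na\ntextfile\n"

# Iterating over a string yields one character at a time
chars = [c for c in text]

# splitlines() yields one entry per line, without the trailing newline
lines = text.splitlines()

print(chars[:3])
print(lines)
```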

Why not save the whole file at once?

import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()

f = open('destination.txt', 'w+')
f.write(txt)
f.close()

If you really want to loop over the file line by line, use txt = webf.readlines() and iterate over that.
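The snippet above uses the Python 2 API. In Python 3, urlopen lives in urllib.request, and the same idea might look like the sketch below. The helper names `save_url` and `read_lines` and the UTF-8 default are assumptions for illustration. The demonstration uses a temporary local file served through a file:// URL only so that it runs offline; an http(s) URL would be passed the same way.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def save_url(url, destination):
    """Copy the bytes served at `url` into `destination` unchanged."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


def read_lines(url, encoding="utf-8"):
    """Return the decoded lines of the resource at `url` (encoding is assumed)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding).splitlines()


# Offline demonstration: a temporary file addressed by a file:// URL
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.txt"
source.write_bytes(b"this\nis\na\n\ttextfile\n")

save_url(source.as_uri(), workdir / "destination.txt")
lines = read_lines(source.as_uri())
```

Writing in binary mode keeps the saved copy byte-for-byte identical to the source, so newlines and tabs survive as they are.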

2 Comments

module 'urllib' has no attribute 'urlopen'
I think I wrote this in Python version 2. See here : stackoverflow.com/questions/25863101/…
0

If you're just trying to save a remote file to your local server as part of a Python script, you could use the PycURL library to download and save it without parsing it. More info here: http://pycurl.sourceforge.net


Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:

import urllib

# Open the URL and assign the response object to a variable (Python 2 API)
webFile = urllib.urlopen(currURL)

# Read the file contents into a variable
file_contents = webFile.read()
print(file_contents)  # prints the file contents, newlines and tabs included

# Then write them to a new local file
f = open('local file.txt', 'w')
f.write(file_contents)
f.close()

If neither applies, please update the question to clarify.


0

You can directly download the file and save it using a name that you prefer. After that, you can read the file and later you can delete it if you don't need the file anymore.

!pip install wget

import wget 
url = "https://raw.githubusercontent.com/apache/commons-validator/master/src/example/org/apache/commons/validator/example/ValidateExample.java" 
wget.download(url, 'myFile.java')
