I wrote this code to replace URLs with their page titles. It does replace each URL with its title as required, but it puts the title on the next line.

twfile.txt contains these lines:

link1 http://t.co/HvKkwR1c
no link line

Output in tw2file.txt:

link1
Instagram
no link line

But I want the output in this form:

link1 Instagram
no link line

What should I do?

My code:

from bs4 import BeautifulSoup
import urllib

output = open('tw2file.txt','w')

with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):

                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]


                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass


inputf.close()
output.close()

2 Answers

Try this code (the three changed lines are marked with a "# here" comment):

from bs4 import BeautifulSoup
import urllib

with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line))  # here
        except:
            pass

By the way, since you are using Python 2.7 or later, both open calls can go in the same with statement. The explicit close calls are then unnecessary, because with closes both files when the block exits.
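The stray line break comes from two places: line.split(' ') leaves the trailing '\n' attached to the last token, and a page title can itself be padded with newlines. Here is a minimal sketch of the token cleanup on its own, written so it runs without network access. The names replace_urls, fetch_title and fake_title are hypothetical and not part of the original script; fake_title stands in for the urlopen/BeautifulSoup lookup and assumes the title comes back padded with whitespace.

```python
import sys


def replace_urls(line, fetch_title):
    """Return `line` with every URL token replaced by its page title.

    `fetch_title` is a hypothetical callable (URL -> title string); in the
    real script it would wrap the urlopen/BeautifulSoup lookup.
    """
    # split() with no argument splits on any whitespace, so the trailing
    # '\n' never ends up attached to the last token.
    tokens = line.split()
    cleaned = []
    for token in tokens:
        if "http" in token:
            # Collapse newlines and padding inside or around the title.
            token = " ".join(fetch_title(token).split())
        cleaned.append(token)
    return " ".join(cleaned)


def fake_title(url):
    # Stand-in for the network lookup (assumption: title padded with newlines).
    return "\n Instagram \n"


lines = ["link1 http://t.co/HvKkwR1c\n", "no link line\n"]
# Add exactly one newline per output line, as the '{}\n'.format(line) call does.
result = "".join(replace_urls(l, fake_title) + "\n" for l in lines)
sys.stdout.write(result)
```

Keeping the lookup behind a function argument also makes the whitespace handling easy to check separately from the HTTP request.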

Regarding the content written to a file

fileobject = open("bar", 'w' )
fileobject.write("Hello, World\n") # newline is inserted by '\n'
fileobject.close()

Regarding console output

Change "print line" to "print line," (note the trailing comma).

Python writes the '\n' character at the end, unless the print statement ends with a comma.
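The trailing-comma form only exists in Python 2, where print is a statement. In Python 3, print is a function and the equivalent is print(line, end=''). As a rough sketch that avoids the difference altogether, a line can be normalized to exactly one trailing newline before it is written to any stream. The helper name emit is hypothetical, and the example targets Python 3 (it uses io.StringIO with ordinary strings).

```python
import io
import sys


def emit(line, stream=sys.stdout):
    """Write `line` to `stream` with exactly one trailing newline."""
    stream.write(line.rstrip("\n") + "\n")


# Use an in-memory stream so the result can be inspected.
buffer = io.StringIO()
emit("link1 Instagram\n", buffer)  # already ends with a newline
emit("no link line", buffer)       # has no newline yet
sys.stdout.write(buffer.getvalue())
```

The same helper works for the console and for a file object such as the output file, so both destinations get identical line endings.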

3 Comments

It's not affecting the output.
Why are you printing twice, with both print line and output.write(line)?
print is for the console; output.write is for the file.
