I have written this code to replace URLs with their page titles. It does replace the URLs with their titles as required, but each title ends up on the next line instead of on the same line as the rest of the text.

twfile.txt contains these lines:

```
link1 http://t.co/HvKkwR1c
no link line
```

The output in tw2file.txt is:

```
link1
Instagram
no link line
```

But I want the output in this form:

```
link1 Instagram
no link line
```

What should I do?
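One plausible explanation can be reproduced without any network access. Splitting a line on spaces leaves the line's trailing `"\n"` attached to the URL token, so replacing that token also throws away the line ending, and the page's `<title>` text may itself be wrapped in newlines. The sketch below assumes the title text is `"\nInstagram\n"` (an assumption, not something verified against the real page) and shows that this yields exactly the file content above.

```python
# Hypothetical reproduction: assume the <title> text of the linked page is
# "\nInstagram\n", i.e. the title is surrounded by newlines.
line = "link1 http://t.co/HvKkwR1c\n"  # one line as read from twfile.txt
tokens = line.split(' ')
print(tokens)  # the "\n" stays attached to the URL token

# Replace the URL token (and with it the line's newline) by the raw title text.
replaced = ["\nInstagram\n" if "http" in t else t for t in tokens]
result = ' '.join(replaced)

# Writing this line followed by the second input line gives the observed file.
file_text = result + "no link line\n"
print(repr(file_text))
```

If the assumption holds, the fix is to strip whitespace from the title and to add the line ending back explicitly when writing.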
My code (Python 2):

```python
from bs4 import BeautifulSoup
import urllib

output = open('tw2file.txt','w')
with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    ##print list1[i]
                    # Download the page and take its <title> tag
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]
                    # Join the tag's contents back into a plain string
                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            # Rebuild the line and write it out
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass
inputf.close()
output.close()
```
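A minimal Python 3 sketch of a fix, assuming the cause is the whitespace around the title text plus the line ending lost during `split(' ')`. It strips the newline before splitting, collapses whitespace in the title, and appends `"\n"` when writing. To stay runnable without third-party packages it swaps BeautifulSoup for the standard-library `html.parser`; with BeautifulSoup, `soup.title.get_text(strip=True)` plays the same role as `title_from_html` here. `TitleParser`, `title_from_html`, `fetch_title`, `replace_urls` and `convert` are hypothetical helper names, and decoding pages as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def title_from_html(html):
    """Return the title text with surrounding whitespace and line breaks removed."""
    parser = TitleParser()
    parser.feed(html)
    # split() with no argument drops leading/trailing whitespace and newlines.
    return " ".join("".join(parser.parts).split())


def fetch_title(url):
    """Download url and return its title (assumes the page is UTF-8)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return title_from_html(html)


def replace_urls(line, get_title=fetch_title):
    """Replace each token containing 'http' with its page title, keeping one output line per input line."""
    words = line.rstrip("\n").split(" ")  # drop the newline before splitting
    for i, word in enumerate(words):
        if "http" in word:
            try:
                words[i] = get_title(word)
            except (OSError, ValueError):
                pass  # keep the URL itself if the page cannot be fetched
    return " ".join(words) + "\n"  # add the line ending back explicitly


def convert(in_path="twfile.txt", out_path="tw2file.txt"):
    """Rewrite in_path into out_path, replacing URLs with titles."""
    with open(in_path) as inputf, open(out_path, "w") as output:
        for line in inputf:
            output.write(replace_urls(line))
```

Calling `convert()` reads twfile.txt and writes tw2file.txt. Catching only fetch errors inside the loop, rather than a bare `except: pass` around the whole line, means a failed download keeps the URL instead of silently dropping the entire line from the output.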