Hopefully this isn't a duplicate that I've overlooked; I've been scouring this forum for a question similar to the one below...

Basically, I've created a Python script that scrapes the callsign of each ship from the URL shown below and appends them to a list. In short, it works; however, whenever I iterate through the list and display each element, there is a '[' and ']' around each callsign. I've shown the output of my script below:

Output

***********************     Contents of 'listOfCallSigns' List     ***********************

0 ['311062900']
1 ['235056239']
2 ['305500000']
3 ['311063300']
4 ['236111791']
5 ['245639000']
6 ['235077805']
7 ['235011590']

As you can see, it shows the square brackets for each callsign. I have a feeling that this might be down to an encoding problem within the BeautifulSoup library.

Ideally, I want the output to be without any of the square brackets and just the callsign as a string.

***********************     Contents of 'listOfCallSigns' List     ***********************

0 311062900
1 235056239
2 305500000
3 311063300
4 236111791
5 245639000
6 235077805
7 235011590

The script I'm currently using is shown below:

My script

# Importing the modules needed to run the script 
from bs4 import BeautifulSoup
import urllib2
import re
import requests
import pprint


# Declaring the url for the port of hull
url = "http://www.fleetmon.com/en/ports/Port_of_Hull_5898"


# Opening and reading the contents of the URL using the module 'urllib2'
# Scanning the entire webpage, finding a <table> tag with the id 'vessels_in_port_table' and finding all <tr> tags
portOfHull = urllib2.urlopen(url).read()
soup = BeautifulSoup(portOfHull)
table = soup.find("table", {'id': 'vessels_in_port_table'}).find_all("tr")


# Declaring a list to hold the call signs of each ship in the table
listOfCallSigns = []


# For each row in the table, using a regular expression to extract the first 9 numbers from each ship call-sign
# Adding each extracted call-sign to the 'listOfCallSigns' list
for i, row in enumerate(table):
    if i:
        listOfCallSigns.append(re.findall(r"\d{9}", str(row.find_all('td')[4])))


print "\n\n***********************     Contents of 'listOfCallSigns' List     ***********************\n"

# Printing each element of the 'listOfCallSigns' list
for i, row in enumerate(listOfCallSigns):
    print i, row  

Does anyone know how to remove the square brackets surrounding each callsign and just display the string?

Thanks in advance! :)

2 Answers

Change the last lines to:

# Printing each element of the 'listOfCallSigns' list
for i, row in enumerate(listOfCallSigns):
    print i, row[0]  # <-- added a [0] here

Alternatively, you can also add the [0] here:

for i, row in enumerate(table):
    if i:
        listOfCallSigns.append(re.findall(r"\d{9}", str(row.find_all('td')[4]))[0])  # <-- added a [0] here

The explanation here is that re.findall(...) returns a list (in your case, with a single element in it). So, listOfCallSigns ends up being a "list of sublists each containing a single string":

>>> listOfCallSigns
[['311062900'], ['235056239'], ['305500000'], ['311063300'],
 ['236111791'], ['245639000'], ['235077805'], ['235011590']]
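
The return type is easy to confirm in isolation. A small sketch, using made-up cell strings rather than the live page (the variable names cell, matches and no_matches are only for illustration):

```python
import re

# Made-up stand-in for str(row.find_all('td')[4])
cell = "<td>311062900</td>"

# findall returns a list, even when there is only a single match
matches = re.findall(r"\d{9}", cell)
print(type(matches).__name__, matches)

# With no match, findall returns an empty list rather than None
no_matches = re.findall(r"\d{9}", "<td>n/a</td>")
print(no_matches)
```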

When you enumerate your listOfCallSigns, the row variable is basically the result of the re.findall(...) call that you appended earlier in the code (that's why you can add the [0] after either of them).

So row and re.findall(...) are both of type "list of string(s)" and look like this:

>>> row
['311062900']

And to get the string inside the list, you need to access its first element, i.e.:

>>> row[0]
'311062900'
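
If the list has already been built with the sublists in it, it can also be flattened afterwards. A hedged sketch, assuming each sublist holds at most one match; the sample data is taken from the question's output, plus one hypothetical empty sublist to show why a guard may be worthwhile (indexing an empty result with [0] would raise IndexError):

```python
# Sample data: three sublists from the question, plus a hypothetical empty one
nested = [['311062900'], ['235056239'], [], ['305500000']]

# Keep the first match of every non-empty sublist
flat = [matches[0] for matches in nested if matches]

for i, callsign in enumerate(flat):
    print("%d %s" % (i, callsign))
```

Whether to skip rows without a match or to record a placeholder for them is a design choice that depends on what the list is used for later.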

Hope this helps!

2 Comments

Brilliant! This worked! I'm very intrigued, could you explain why inserting row[0] removes the brackets? Does it now return each of the list elements as strings?
I've edited my answer to add some details about how to get the string from inside the list.

This can also be done by stripping the unwanted characters from the string like so:

a = "string with bad characters []'] in here" 
a = a.translate(None, "[]'")
print a 
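
Note that the two-argument form a.translate(None, ...) only works on Python 2 strings. A sketch of a Python 3 equivalent, assuming the sublist is first converted to text with str(); str.maketrans with two empty strings and a third argument builds a table that deletes the listed characters (the names row, table and cleaned are only for illustration):

```python
row = ['311062900']  # one sublist, as produced by re.findall

# Deletion table: remove '[', ']' and the quote character
table = str.maketrans("", "", "[]'")

cleaned = str(row).translate(table)
print(cleaned)
```

This cleans up the printed representation rather than the data structure itself, so indexing with [0] is probably the more direct fix when the sublists always contain one match.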
