When I try to use the sorted() function in Python, it only sorts the letters within each name alphabetically. The first 3 outputs are:

[u'A', u'a', u'a', u'f', u'g', u'h', u'i', u'n', u'n', u's', u't']
[u'N', u'a', u'e', u'g', u'i', u'i', u'r']
[u'C', u'a', u'e', u'm', u'n', u'o', u'o', u'r']

These should be Afghanistan, Nigeria and Cameroon respectively, but instead the letters are only sorted within each name.

Where have I gone wrong in my code?

import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://en.wikipedia.org/wiki/List_of_ongoing_armed_conflicts"
soup = BeautifulSoup(urllib2.urlopen(url))

#f= csv.writer(open("test.csv","w"))
#f.writerow(["location"])
def unique(countries):
    seen = set()
    for country in countries:
        l = country.lower()
        if l in seen:
            continue
        seen.add(l)
        yield country



for row in soup.select('table.wikitable tr'):
    cells = row.find_all('td')
    if cells:
        for location in cells[3].find_all(text=True):
            location = location.split()

            for locations in unique(location):
                print sorted(locations)

#f.writerow([location])

2 Answers

Calling sorted() on a single string sorts its characters, which is why you get lists of letters. With each iteration of the loop, you get one or more locations (as a list). All of them need to be added to a single list before you can sort them.

We use the extend method to do that.

locs = []  # contains all locations
for row in soup.select('table.wikitable tr'):
    cells = row.find_all('td')
    if cells:
        # location is one text node; split() turns it into a list of words
        for location in cells[3].find_all(text=True):
            locs.extend(location.split())

print sorted(locs)
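
To see the effect without any network access, here is a minimal sketch. The `cell_texts` list is made-up sample data standing in for the text nodes of the scraped table cells; it is not the actual Wikipedia content.

```python
# Hypothetical sample data standing in for the text found in the table cells.
cell_texts = ["Nigeria Cameroon", "Afghanistan", "Somalia Kenya"]

# sorted() on a single string treats it as a sequence of characters,
# which is what produced the letter-by-letter output in the question.
letters = sorted("Nigeria")

# Collect every word into one list first, then sort that list once.
locs = []
for text in cell_texts:
    locs.extend(text.split())  # extend adds each word, not the list itself

sorted_locs = sorted(locs)
print(letters)
print(sorted_locs)
```

Using append instead of extend here would produce a list of lists, which sorts by whole sub-list rather than by name.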

sorted(locs) is also a list. To print a specific element, you can do:

specific_element = sorted(locs)[index]
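
Indexing into the sorted result can be sketched as follows. The `locs` values are made-up sample data rather than the scraped results, and the emptiness check is an added precaution, since indexing an empty list raises IndexError.

```python
# Hypothetical sample data; the real list would come from the scraper.
locs = ["Nigeria", "Cameroon", "Afghanistan"]

ordered = sorted(locs)  # sort once and keep the result

if ordered:  # guard against an empty list before indexing
    first = ordered[0]   # first name alphabetically
    last = ordered[-1]   # negative index counts from the end
    print(first)
    print(last)
```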

4 Comments

Thanks, this is exactly what I needed! How would I then select the fully iterated string? I've tried to print only specific elements, but to no avail, since print sorted(locs[any number]) says it's out of range.
I have updated my answer. sorted takes a list, but locs[index] will give you one element. To get an element out of the sorted list, you must do sorted(locs)[index].
When I do this, I get a lot of [ ] values, then after a while the first name comes up, and eventually all the names. How would I select the very last element?
To get the last sorted location? sorted(locs)[-1]. Negative indexing selects elements starting from the end of the list.

Your variable names are bad, and are confusing you. location is a list of locations, and locations is a single location!

You want:

for locations in cells[3].find_all(text=True):
    locations = locations.split()

    for location in sorted(unique(locations)):
        print location 
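
Combined with the `unique` generator from the question, the renamed loop can be tried on made-up data. The `cell_text` string is a hypothetical stand-in for one text node from the table, so treat this as a sketch rather than the full scraper.

```python
def unique(countries):
    """Yield each name once, ignoring case, keeping the first spelling seen."""
    seen = set()
    for country in countries:
        key = country.lower()
        if key in seen:
            continue
        seen.add(key)
        yield country

# Hypothetical stand-in for the text of one table cell.
cell_text = "Nigeria Cameroon nigeria Chad"

locations = cell_text.split()        # a list of names
result = sorted(unique(locations))   # sorted() accepts any iterable

for location in result:              # a single name
    print(location)
```

Because sorted() consumes the generator and returns a new list, the names come out in order within this one cell; sorting across all cells still requires collecting them into a single list first.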
