2

I'm having some trouble sorting data from a text file by a certain field, and possibly by multiple fields later. The .txt file is several thousand lines long. I'm brand new to Python, so my code is probably a bit messy. For example, this is the text file I would read from:

stuff
123 1200 id-aaaa [email protected]
322 1812 id-wwww [email protected]
839 1750 id-wwww [email protected]
500 0545 id-aaaa [email protected]
525 1322 id-bbbb [email protected]

My code so far is as follows:

filelist = open("info.txt").readlines()
splitlist = list()

class data:
    def __init__(self, eventName, time, identity, domain):
        self.evenName = eventName
        self.time = time
        self.identity = identity
        self.domain = domain

for line in filelist:
    filelist = list.split(', ')
    splitlist.append(filelist)

for column in splitlist:
    if (len(column) > 1): #to skip the first line
        eventName = column[0].strip()
        time = column[1].strip()
        identity = column[2].strip()
        domain = column[3].strip()

I want to sort the .txt file line by line by the identity, then maybe by time. I saw in the Python tutorial that this could be done with classes, so I'm trying to go that route. Please advise. Thank you!

3
  • Is stuff in the text file, or the name of the text file? Commented Jun 10, 2012 at 22:50
  • Is there only one such non-data line (ie a header), or might they occur anywhere? Commented Jun 10, 2012 at 23:04
  • There are two, to be exact: the first and the last line. Commented Jun 10, 2012 at 23:10

4 Answers

8
with open("info.txt") as inf:
    data = []
    for line in inf:
        line = line.split()
        if len(line)==4:
            data.append(line)

data.sort(key=lambda s:(s[2],s[1]))
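
As a rough sketch of the same idea wrapped in a function, the following runs on in-memory lines instead of info.txt so it can be tried directly. The name sort_records and the e-mail addresses in the sample are placeholders, not part of the original data.

```python
def sort_records(lines):
    """Return the four-field lines sorted by identity, then by time."""
    rows = [line.split() for line in lines]
    # Lines without exactly four fields (the header and footer) are dropped.
    rows = [row for row in rows if len(row) == 4]
    rows.sort(key=lambda row: (row[2], row[1]))
    return [" ".join(row) for row in rows]


# Placeholder sample data; the e-mail addresses are made up.
sample = [
    "stuff",
    "123 1200 id-aaaa a@example.com",
    "322 1812 id-wwww b@example.com",
    "839 1750 id-wwww c@example.com",
    "500 0545 id-aaaa d@example.com",
    "525 1322 id-bbbb e@example.com",
    "stuff",
]

for line in sort_records(sample):
    print(line)
```

Because the times are zero-padded strings of equal length, comparing them as text gives the same order as comparing them as numbers. With a real file, an open file object could be passed in place of sample.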

If you want to get a bit fancier,

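from collections import namedtuple
Input = namedtuple('Input', ('name', 'time', 'identity', 'domain'))

with open("info.txt") as inf:
    # keep only four-field rows; this skips the header and footer lines
    rows = (line.split() for line in inf)
    data = [Input(*row) for row in rows if len(row) == 4]

# namedtuple fields are read as attributes, not by string key
data.sort(key=lambda s: (s.identity, s.time))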

If you really, really want to use a class, try:

import time

class Data(object):
    def __init__(self, event, time_, identity, domain):
        self.event = event
        self.time = time.strptime(time_, "%H%M")
        self.identity = identity
        self.domain = domain

with open("info.txt") as inf:
    data = []
    for line in inf:
        try:
            data.append(Data(*line.split()))
        except (TypeError, ValueError):
            # TypeError: wrong number of arguments (ie header or footer)
            # ValueError: time field does not match "%H%M"
            pass

data.sort(key=lambda s:(s.identity,s.time))
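
For what it's worth, operator.attrgetter from the standard library can build the same (identity, time) key without a lambda. The sketch below is self-contained and uses a hypothetical Record class and made-up e-mail addresses rather than reading info.txt.

```python
import time
from operator import attrgetter


class Record(object):
    """Hypothetical stand-in for the Data class above."""

    def __init__(self, event, time_, identity, domain):
        self.event = event
        self.time = time.strptime(time_, "%H%M")  # parsed, so it compares as a time
        self.identity = identity
        self.domain = domain


# Placeholder sample rows; the e-mail addresses are made up.
lines = [
    "123 1200 id-aaaa a@example.com",
    "322 1812 id-wwww b@example.com",
    "839 1750 id-wwww c@example.com",
    "500 0545 id-aaaa d@example.com",
    "525 1322 id-bbbb e@example.com",
]

records = [Record(*line.split()) for line in lines]

# attrgetter with two names returns a (identity, time) tuple for each record.
records.sort(key=attrgetter("identity", "time"))

for r in records:
    print(r.event, time.strftime("%H%M", r.time), r.identity, r.domain)
```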

5 Comments

Hi Hugh. I seem to be getting a "*** non-keyword arg after keyword arg" error on line data.sort(...
Sorry - missed a closing-quote on 'time'.
It was actually on the first block of code you have written down. Is there a certain library that I need to include to use .sort?
Nope, sort is built in. I think that either I needed brackets on the s[2],s[1] or it was choking on the footer line.
Thanks for all the help, Hugh! I referred to this a few times and it helped me understand Python syntax and better coding techniques.
0

This is a common mistake: you have opened the file without actually reading it with the proper syntax. Here is what I think it should be:

filelist = open("info.txt", "r")
print filelist
filelist.read() # reads the entire file
splitlist = list()

class data:
    def __init__(self, eventName, time, identity, domain):
        self.evenName = eventName
        self.time = time
        self.identity = identity
        self.domain = domain

for line in filelist:
    filelist = list.split(', ')
    splitlist.append(filelist)

for column in splitlist:
    if (len(column) > 1): #to skip the first line
        eventName = column[0].strip()
        time = column[1].strip()
        identity = column[2].strip()
        domain = column[3].strip()

Hope that works! Source: http://docs.python.org/tutorial/inputoutput.html

1 Comment

filelist.read() brings the entire file in as a single string of data... not what is wanted.
0

To sort by id, then time:

text = ["123 1200 id-aaaa [email protected]",
        "322 1812 id-wwww [email protected]",
        "839 1750 id-wwww [email protected]",
        "500 0545 id-aaaa [email protected]",
        "525 1322 id-bbbb [email protected]"]
text = [i.split() for i in text]
text.sort(key=lambda line: (line[2],line[1]))
text = [' '.join(i) for i in text]
print(text)
#Output:
['500 0545 id-aaaa [email protected]', 
'123 1200 id-aaaa [email protected]', 
'525 1322 id-bbbb [email protected]', 
'839 1750 id-wwww [email protected]', 
'322 1812 id-wwww [email protected]']

2 Comments

If you're going to do text = sorted(text), it would use half as much memory to just text.sort().
@Hugh Bothwell - thank you, I have amended, but you were way faster on the trigger!
0

The following Python code should put together the information you want, which is then sorted.

rows = []
for line in open("info.txt"):
    line = line.split()
    if len(line) != 4:
        continue

    eventName, time, identity, domain = line

    # Add them in the order you want to sort by
    rows.append((identity, time, eventName, domain)) 

rows.sort()
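
If the fields ever need different sort directions, one option is to rely on the fact that Python's sort is stable and sort in two passes, secondary key first. The sketch below keeps the original field order and uses operator.itemgetter; the sample e-mail addresses are placeholders, and the choice of "time descending within each identity" is only an illustration.

```python
from operator import itemgetter

# Placeholder sample rows; the e-mail addresses are made up.
lines = [
    "123 1200 id-aaaa a@example.com",
    "322 1812 id-wwww b@example.com",
    "839 1750 id-wwww c@example.com",
    "500 0545 id-aaaa d@example.com",
    "525 1322 id-bbbb e@example.com",
]

rows = [line.split() for line in lines]

# Pass 1: secondary key (time, index 1), latest first.
rows.sort(key=itemgetter(1), reverse=True)
# Pass 2: primary key (identity, index 2). The sort is stable, so rows with
# the same identity keep the time order established in pass 1.
rows.sort(key=itemgetter(2))

for row in rows:
    print(" ".join(row))
```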
