3

I have a list of strings, each of which is an email formatted in almost exactly the same way. There is a lot of information in each email, but the most important info is the name of a facility, and an incident date.

I'd like to be able to take that list of emails, and create a new list where the emails are grouped together based on the "location_substring" and then sorted again for the "incident_date_substring" so that all of the emails from one location will be grouped together in the list in chronological order.

The facility substring can be found usually in the subject line of each email. The incident date can be found in a line in the email that starts with: "Date of Incident:".

Any ideas as to how I'd go about doing this?

9
  • To be honest, I'm new and have very little idea how to do this. What I think I should do is define a function that extracts the facility name from each string. Then create another function that extracts the date. Then use the sorted() method with those as keys, or something...But I really don't know! Commented Dec 8, 2012 at 17:34
  • There is something called the decorate-sort-undecorate idiom that is commonly used in python. See here. Commented Dec 8, 2012 at 17:35
  • Do you know what a regex is? (not meant offensive, if you don't know what it is, google a good tut.) 'Cause it's the solution to your problem. First search with a regex for the location and build a dictionary with the locations as keys, and lists of emails as values. Commented Dec 8, 2012 at 17:37
  • I do know what regex is. I actually have a dictionary, with the names of the facilities as values, and the facility IDs as keys that I extracted from a text file I put together. The purpose of that is for a later step in the program. Sounds like you're suggesting I create a new one though, which I didn't think about. Commented Dec 8, 2012 at 17:43
  • @mrpryd: if you post an example email I will see if I can brew some basic code for you. Commented Dec 8, 2012 at 18:03

3 Answers 3

5

Write a function that returns the two pieces of information you care about from each email:

def email_sort_key(email):
    """Find two pieces of info in the email, and return them as a tuple."""
    # ...search, search...
    return "location", "incident_date"

Then, use that function as the key for sorting:

emails.sort(key=email_sort_key)

The sort key function is applied to all the values, and the values are re-ordered based on the values returned from the key function. In this case, the key function returns a tuple. Tuples are ordered lexicographically: find the first unequal element, then the tuples compare as the unequal elements compare.

Sign up to request clarification or add additional context in comments.

3 Comments

Ok, this is what I was discussing earlier in the comments. The question is, what does that function look like? Does it require regex?
@mrpryd is this question about sorting the list, or about extracting the relevant information robustly from each email?
Both, kinda :P THe ultimate goal is to sort the list. But to do so, you have to extract the info from each element of the list beforehand. At this point, I'm convinced that Regex is the way...based on the fake email example I posted above. Then this answer would work
0

Your solution might look something like this:

def getLocation (mail): pass
    #magic happens here

def getDate (mail): pass
    #here be dragons

emails = [...] #original list

#Group mails by location
d = {}
for mail in emails:
    loc = getLocation (mail)
    if loc not in d: d [loc] = []
    d [loc].append (mail)

#Sort mails inside each group by date
for k, v in d.items ():
    d [k] = sorted (v, key = getDate)

Comments

0

This is something you could do:

from collections import defaultdict
from datetime import datetime
import re

mails = ['list', 'of', 'emails']

mails2 = defaultdict(list)

for mail in mails:
    loc = re.search(r'Subject:.*?for\s(.+?)\n', mail).group(1)
    mails2[loc].append(mail)

for m in mails2.values():
    m.sort(key=lambda x:datetime.strptime(re.search(r'Date of Incident:\s(.+?)\n',
                                                    x).group(1), '%m/%d/%Y'))

Please note that this has absolutely no error handling for cases where the regexes don't match.

1 Comment

I ended up not having to do all of this, but the Regex was CLEARLY the way to go on extracting the names and the dates. I will probably go ahead and use some of the other answers posted just cause it's simpler -- but your regex pattern allowed me to extract the info easily. Can you explain more about how you put the pattern together? I was able to code the date of incident by using re.compile(r'Date of Incident:.*?(\d\d.\d\d.\d\d\d\d)'), which is similar to what you did. But I still don't quite understand each piece. Wish regex was simpler! Gonna have to retag this question...

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.