Pythonic way to optimize the logic to filter/extract data from a list

Question

I have a list like below:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

From this list I want three different lists as a result, and I want to build them in a single iteration over the list:

a list of all UIDs, i.e. [3234, 3235, 3236, 3237, 3241, ...]
a list of seen UIDs, i.e. [3234, 3235, ...] <-- UIDs of items that have the \Seen flag
a list of deleted UIDs, i.e. [3236, 3253] <-- UIDs of items that have the \Deleted flag

Did you make that list? If so, you got it all wrong from the start.
– Robert William Hanks, Commented Oct 8, 2010 at 10:09
@Robert, I think the list is returned by some IMAP server, which includes flags like Read/Unread/Deleted...
– shahjapan, Commented Oct 8, 2010 at 10:14

David Webb · Accepted Answer · 2010-10-08 09:22:32Z

3

The best thing to do would be to turn your data into a dict mapping UID to FLAGS, then searching it will be easy. So the data will look something like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'}

You can do this using a Regular Expression to match each entry in the list. If we get the regexp to return two groups in the match we can easily build the dict.
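To make the two-group idea concrete, here is a minimal sketch on a single entry. The `sample` string is just one item copied from the question, and the names `pair` and `values` are only illustrative:

```python
import re

# Group 1 captures the UID, group 2 the raw text between the inner parentheses.
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")

sample = '3 (UID 3236 FLAGS (\\Deleted))'
pair = pattern.match(sample).groups()
print(pair)  # ('3236', '\\Deleted')

# dict() accepts any iterable of 2-tuples, so each pair becomes one key/value entry.
values = dict([pair])
```

Because every match yields exactly one (UID, flags) pair, feeding all of the pairs to dict() gives the mapping directly.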

So we end up with something like this:

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',  '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',  '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

import re
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")
values = dict(pattern.match(item).groups() for item in items)

We can then easily query the items in values to get what you want:

# Python 3 syntax: print() is a function and dicts have items() instead of iteritems()
print("All UIDs:", list(values))
print("Seen UIDs:", [uid for uid, flags in values.items() if r"\Seen" in flags])
print("Deleted UIDs:", [uid for uid, flags in values.items() if r"\Deleted" in flags])

edited Oct 8, 2010 at 9:22

answered Oct 8, 2010 at 9:08

David Webb


4 Comments

Noufal Ibrahim Over a year ago

Aren't you iterating over the list of items multiple times to get Seen and Deleted in your solution here?

David Webb Over a year ago

@Noufal Ibrahim - Yes. I'm assuming the list isn't horribly long so I'm valuing readability over performance.

Noufal Ibrahim Over a year ago

I totally agree with your approach. The questioner asked for a single iteration. That's why I brought it up.

David Webb Over a year ago

@Noufal Ibrahim - Good point! I hadn't read the question properly.

Fbo · Accepted Answer · 2010-10-08 09:44:19Z

import re

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

# Raw string so that \d, \s and \( reach the regex engine unchanged.
r = re.compile(r'\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid = m.group('uid')
        flags = m.group('data')
        uid_list.append(uid)
        # Test membership directly; rfind(...) > 0 would miss a match at index 0.
        if 'Seen' in flags:
            seen_uid_list.append(uid)
        if 'Deleted' in flags:
            deleted_uid_list.append(uid)

print(uid_list)
print(seen_uid_list)
print(deleted_uid_list)

Noufal Ibrahim · Accepted Answer · 2010-10-08 09:10:22Z

1

I'm not sure about list comprehensions since those usually map one list to another (using either filtering or mapping). I've not seen them being used to split lists. However, you could do this with a combination of a genexp and a loop in a single iteration. I've blown this up a little so that it's clear.

import re
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

t = [..] #your list

items = (grepper.search(m).groupdict() for m in t)

all = []
seen = []
deleted = []
for i in items:
  # i is the groupdict, so test the captured flags text rather than the dict keys
  if "Seen" in i["flags"]:
    seen.append(i["uid"])
  if "Deleted" in i["flags"]:
    deleted.append(i["uid"])
  all.append(i["uid"])

You should have your 3 lists now.

answered Oct 8, 2010 at 9:10

Noufal Ibrahim


3 Comments

salezica Over a year ago

You are iterating over the list twice :(

Kirk Strauser Over a year ago

Technically, grepper.search and then for i in items.

Noufal Ibrahim Over a year ago

The grepper.search is a generator expression and it doesn't iterate over t in advance. Of course, if you're referring to scanning over the element to match the regular expression, it is an iteration.
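The laziness point can be checked with a small sketch. The `traced_search` wrapper and the `calls` list are hypothetical helpers added only to record when each search actually runs, and `t` here is a two-item stand-in for the real list:

```python
import re

grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

calls = []  # records each string at the moment it is actually searched

def traced_search(s):
    calls.append(s)
    return grepper.search(s).groupdict()

t = ['1 (UID 3234 FLAGS (\\Seen))', '2 (UID 3236 FLAGS (\\Deleted))']

items = (traced_search(m) for m in t)  # building the genexp searches nothing
calls_before_loop = len(calls)

uids = []
for i in items:                        # each search happens here, one per step
    uids.append(i["uid"])
calls_after_loop = len(calls)

print(calls_before_loop, calls_after_loop)  # 0 2
```

So t is walked once, by the for loop pulling items through the generator; the regex scan of each string is a separate matter.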

shahjapan · Accepted Answer · 2010-10-08 09:31:21Z

1

all, deleted, seen = [list(filter(None, a)) for a in
    zip(*map(lambda a: (a[2], r'\Deleted' in a[-1] and a[2], r'\Seen' in a[-1] and a[2]), map(lambda a: a.split(' '), items)))]

Whether this is faster with re or without it is something you need to check with timeit!
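One way such a comparison might be set up is sketched below. The functions `with_re` and `with_split`, the repeated three-item sample, and the repeat count are all assumptions made for illustration, and no timings are claimed here since they depend on the machine and the data:

```python
import re
import timeit

# Hypothetical sample: the first three entries of the question's list, repeated.
items = ['1 (UID 3234 FLAGS (seen \\Seen))',
         '2 (UID 3235 FLAGS (\\Seen))',
         '3 (UID 3236 FLAGS (\\Deleted))'] * 1000

pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \((.*)\)\)")

def with_re(items):
    all_uids, seen, deleted = [], [], []
    for item in items:
        uid, flags = pattern.match(item).groups()
        all_uids.append(uid)
        if r'\Seen' in flags:
            seen.append(uid)
        if r'\Deleted' in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

def with_split(items):
    all_uids, seen, deleted = [], [], []
    for item in items:
        parts = item.split(' ', 4)  # parts[2] is the UID, parts[4] the flags text
        uid, flags = parts[2], parts[4]
        all_uids.append(uid)
        if r'\Seen' in flags:
            seen.append(uid)
        if r'\Deleted' in flags:
            deleted.append(uid)
    return all_uids, seen, deleted

# Time 20 full passes of each variant over the same input.
t_re = timeit.timeit(lambda: with_re(items), number=20)
t_split = timeit.timeit(lambda: with_split(items), number=20)
print("re:    %.4f s" % t_re)
print("split: %.4f s" % t_split)
```

Both functions do one pass and return the same three lists, so any difference in the printed times comes from the parsing step alone.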

answered Oct 8, 2010 at 9:31

shahjapan


1 Comment

Noufal Ibrahim Over a year ago

Oh boy. I'm not sure I'd want to see that in production code. :)

ChessMaster · Accepted Answer · 2010-10-08 10:40:13Z

1

This one works for your data sample....

uids, seen, deleted = [], [], []
for item in myList:
    uids.append(int(item[7:12]))
    if 'Se' in item[20:]:  seen.append(uids[-1])
    elif 'De' in item[20:]: deleted.append(uids[-1])

edited Oct 8, 2010 at 10:40

answered Oct 8, 2010 at 10:31

ChessMaster


ghostdog74 · Accepted Answer · 2010-10-08 09:32:27Z

0

all=[]
seen=[]
deleted=[]
for item in alist:
    s=item.split(" ",4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])

answered Oct 8, 2010 at 9:32

ghostdog74


salezica · Accepted Answer · 2010-10-08 09:38:50Z

The only way I can think of to do it in one iteration, generating the three lists you ask for, is by iterating manually. I can't come up with any Python magic for it.

You can easily improve this if you know specifics about the format and how it's generated. I don't know why +FLAGS and -FLAGS appear in some items, for example, and didn't know when to expect parentheses, so I had to use find(). Also, I could've just split() the string in two, but then again, I don't know what the flag format means,...

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        for word in spl[4:]:
            # raw strings keep the backslash literal instead of relying on an unrecognised escape
            if word.find(r"\Seen") != -1:
                lseen.append(uid)

            elif word.find(r"\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted
