Unique string before writing to file - python

Question

Why doesn't this work? I want to deduplicate the results I get from the REST API before I write them to the file:

MISP_HOST="https://192.168.1.8"
API_KEY="asdfasdfas"
EXPORT_DATA="attributes/text/download/md5"
OUTPUT_FILE="md5-"+today

def main():
    URL="%s/%s" % (MISP_HOST, EXPORT_DATA)
    request = urllib2.Request(URL)
    f = open(OUTPUT_FILE,'w') 
    request.add_header('Authorization', API_KEY)
    data = urllib2.urlopen(request).read()
    set(data)
    print type(data)
    f.write(data)
    f.close()

It runs with no errors, but the data is definitely not unique. I'm trying not to do this in bash. Could you also explain why it doesn't work? Many thanks!

What do you mean by "unique the results"? Do you want each word in the results to appear 1 time? Is the result plain text?
– tdelaney, Commented May 29, 2016 at 20:17
Do data = set(data) in order to actually keep the set that is created. Note though that data is just a string, so set(data) will not do what you expect: it gives you a set of the individual characters. You should parse the data first.
– poke, Commented May 29, 2016 at 20:18
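
To make the two points in the comment above concrete, here is a minimal sketch. The sample string is made up and only stands in for the downloaded text; the real response format is not shown in the question. The snippet uses only built-ins and should behave the same on Python 2 and 3.

```python
# Made-up sample standing in for the downloaded text (not real API output).
data = "abc\nabc\ndef\n"

# set() returns a new object; calling it without assigning the result
# throws that object away and leaves `data` untouched.
set(data)
unchanged = data  # still the original string, duplicates included

# A string is a sequence of characters, so set() collects unique characters.
chars = set(data)

# Splitting into lines first gives a set of unique lines instead.
lines = set(data.splitlines())

print(sorted(chars))
print(sorted(lines))
```

Here `chars` holds single characters (including the newline), while `lines` holds the two distinct lines, which is probably closer to what the question is after.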

tdelaney · Accepted Answer · 2016-05-29 22:23:31Z

2

If your result is plain text, you can use a regular expression to find all of the words in the text and then build a set from that list. This example also lowercases the words so that the set is case-insensitive, and it writes each word on its own line.

import re
import urllib2

MISP_HOST="https://192.168.1.8"
API_KEY="asdfasdfas"
EXPORT_DATA="attributes/text/download/md5"
# `today` is not defined in this snippet; it is assumed to be a date string set elsewhere
OUTPUT_FILE="md5-"+today

def main():
    URL="%s/%s" % (MISP_HOST, EXPORT_DATA)
    request = urllib2.Request(URL)
    f = open(OUTPUT_FILE,'w')
    request.add_header('Authorization', API_KEY)
    data = urllib2.urlopen(request).read()
    # find every word, lowercase it, and let the set drop the duplicates
    unique = set(word.lower() for word in re.findall(r'\w+', data))
    # that could be expanded to
    # wordlist = re.findall(r'\w+', data)
    # unique = set(word.lower() for word in wordlist)
    print type(unique)
    # a set is unordered, so the words are written in arbitrary order
    f.write('\n'.join(unique))
    f.close()
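
If the export happens to contain one value per line (an assumption; the question does not show the response format), a line-based variant avoids the regular expression and keeps the first-seen order. This is only a sketch: `unique_lines` and the sample text are hypothetical names and data, not part of the original code.

```python
def unique_lines(text):
    """Return the distinct non-empty lines of text, in first-seen order."""
    seen = set()
    result = []
    for line in text.splitlines():
        value = line.strip().lower()  # normalize case and surrounding whitespace
        if value and value not in seen:
            seen.add(value)
            result.append(value)
    return result

# Made-up sample input with duplicates that differ only by case.
sample = "AAA111\nbbb222\naaa111\n\nbbb222\n"
deduped = unique_lines(sample)
output = "\n".join(deduped) + "\n"  # what would be written to the file
```

The set is used only for membership checks, while the list preserves order, so the output file stays stable between runs.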

edited May 29, 2016 at 22:23

answered May 29, 2016 at 20:25

tdelaney


2 Comments

Dpitt1968 Over a year ago

That works, but I still don't understand why I can't convert the string to a list and then to a set.

tdelaney Over a year ago

You didn't convert the string to a list in your example. When converting the string to a list, you have to ask what format the string is in and what format you want the list to be. I guessed that data was plain text and you wanted to extract the words. That's what re.findall did: it created a list of all of the words in the string (that's what the \w+ expression does; it matches each run of word characters, meaning letters, digits, and underscores). I made it a bit more complicated by also writing a generator expression that converts the items of this list to lowercase before creating the set.
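
The string-to-list-to-set steps described above can be seen in isolation with a small sketch. The sample text is made up for illustration; `re.findall` and `str.lower` are the same calls used in the answer.

```python
import re

# Made-up plain text with repeated words in mixed case.
text = "Apple banana apple\nBANANA cherry_1"

# Step 1: string -> list. \w+ matches each run of letters, digits, and underscores.
wordlist = re.findall(r"\w+", text)

# Step 2: list -> set, lowercasing each word so duplicates that differ
# only by case collapse into one entry.
unique = set(word.lower() for word in wordlist)

print(wordlist)
print(sorted(unique))
```

The list still contains five entries with duplicates; the set ends up with three.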
