Why doesn't this work? I want to deduplicate the results I get from the REST API before I write them to the file:

MISP_HOST="https://192.168.1.8"
API_KEY="asdfasdfas"
EXPORT_DATA="attributes/text/download/md5"
OUTPUT_FILE="md5-"+today

def main():
    URL="%s/%s" % (MISP_HOST, EXPORT_DATA)
    request = urllib2.Request(URL)
    f = open(OUTPUT_FILE,'w') 
    request.add_header('Authorization', API_KEY)
    data = urllib2.urlopen(request).read()
    set(data)
    print type(data)
    f.write(data)
    f.close()

It runs with no errors, but the data is definitely not unique. I'm trying not to do this in bash. Could you also explain why it doesn't work? Many thanks!

  • What do you mean by "unique the results"? Do you want each word in the results to appear 1 time? Is the result plain text? Commented May 29, 2016 at 20:17
  • Do data = set(data) in order to actually keep the set that is created. Note though that data is just a string, so set(data) will not do what you expect. You should parse the data first. Commented May 29, 2016 at 20:18
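
Both points in the comment above can be checked in a few lines. This is a minimal sketch, assuming the response body is a newline-separated list of values; the sample string is made up and merely stands in for the real API output.

```python
# Made-up sample standing in for the response body (its real format is an assumption).
data = "aaa\nbbb\naaa\n"

# set() returns a new object and leaves its argument untouched,
# so calling it without assigning the result changes nothing.
set(data)
assert data == "aaa\nbbb\naaa\n"

# A string is iterated character by character, so a set built directly
# from it holds single characters, not lines or words.
chars = set(data)

# Splitting the string into lines first gives the unique entries.
unique_lines = set(data.splitlines())
```

Here chars contains only the individual characters of the text, while unique_lines contains each distinct line once.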

1 Answer

If your result is plain text, you can use a regular expression to find all of the words in the text and then build a set from them. This example also lowercases the words so that the set is case-insensitive, and writes each word on its own line.

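import datetime
import re
import urllib2  # Python 2 module; Python 3 uses urllib.request instead

MISP_HOST = "https://192.168.1.8"
API_KEY = "asdfasdfas"
EXPORT_DATA = "attributes/text/download/md5"
# 'today' was not defined in the original post; an ISO date string is assumed here
today = datetime.date.today().isoformat()
OUTPUT_FILE = "md5-" + today

def main():
    URL = "%s/%s" % (MISP_HOST, EXPORT_DATA)
    request = urllib2.Request(URL)
    request.add_header('Authorization', API_KEY)
    data = urllib2.urlopen(request).read()
    unique = set(word.lower() for word in re.findall(r'\w+', data))
    # that could be expanded to
    # wordlist = re.findall(r'\w+', data)
    # unique = set(word.lower() for word in wordlist)
    print type(unique)
    # a set is unordered, so the words are written in arbitrary order
    with open(OUTPUT_FILE, 'w') as f:
        f.write('\n'.join(unique))

if __name__ == '__main__':
    main()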

2 Comments

That works, but I still don't understand why I can't convert the string to a list and then to a set.
You didn't convert the string to a list in your example. When converting a string to a list, you have to ask what format the string is in and what format you want the list to be in. I guessed that data was plain text and that you wanted to extract the words. That's what re.findall did: it created a list of all of the words in the string (that's what the \w+ expression does; it matches each run of characters that make up a word). I made it a bit more complicated by also writing a generator expression that converts the items of this list to lower case before creating the set.
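
To make the "what format do you want the list in" question concrete, here is a small sketch of three ways to turn a string into a list before building a set. The sample text is hypothetical and is not real output from the API.

```python
import re

# Hypothetical sample text; the real response format is an assumption.
data = "D41D8CD9 d41d8cd9\nfoo bar\nfoo bar\n"

as_chars = list(data)                 # one item per character
as_lines = data.splitlines()          # one item per line
as_words = re.findall(r'\w+', data)   # one item per run of word characters

# The set only removes duplicates among the items it is given.
unique_lines = set(as_lines)
unique_words = set(word.lower() for word in as_words)

# Sets are unordered; sorting gives a stable order for writing to a file.
output = '\n'.join(sorted(unique_words))
```

Which split is right depends on the data: for one value per line, splitlines() is likely enough, while re.findall helps when values are separated by mixed whitespace or punctuation.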
