1

I'm creating this search like query in python that gathers ids and pars them with names like a whois database sort of.

so lets say this is my data in a .txt file

["81448068", "jimmy"]
["69711823", "bob"]
["92739493", "kyle, jimmy"]
["96399981", "kyle"]
["40112089", "john, peter"]
["79784393", "matthew, chad"]
["749968","bob, jimmy"]

The Id's are on the left column and on the right it's the names associated with it. How would I use some type of while or for loop to conduct a search of that data till all names are matched fully? and this method should work if any data is added, I'll show you what I have currently, but on a larger txt file with more names it would not work

def getNames(name):
    x = []
    f = open("names.txt","r") # that data up there represents names.txt
    for line in f.readlines():
        rid,names = json.loads(line.strip())
        if name in names:
           x += names.split(", ")
    return list(set(x))

I want to cycle those names back into get names to get more names that are associated with the names that are returned in x, but I would have to manualy add another for alot is there some type of while that would loop this till all matches associated with the names are found? So doing getNames("jimmy") would return,

["bob","jimmy","kyle"] 

Then it would check all names associated with bob, all names associated with those till there are no more associated, and it would do that for jimmy and Kyle as well.

So basically, It would search the initial name, in this case lets use jimmy. It returns the names associated with "jimmy" then it would search each individual name associated with that so lets say it returns, ["bob","jimmy","kyle"] It would then take bob, search all the names with that name in the names then search all of those names and search all of those names until it conducted all the names then it would do jimmy do the same and then go to Kyle and do the same. I could do this with a for but, I would need to add multiple fors depending on how extensive I want the search to be, how would I make it a while so it conducts the data till all matches are found.

10
  • What's the problem with making this function recursive? Commented Oct 8, 2013 at 22:15
  • What should your function take as input and what should it give as output ? please clarify further your question. Commented Oct 8, 2013 at 22:43
  • 1
    @ychaouche I edited, hopefully the bottom paragraph explains what you were asking, if you have a further question ask hopefully I can answer it. Commented Oct 8, 2013 at 22:49
  • Ah ok that's a nice problem to solve. Let me try. Commented Oct 8, 2013 at 22:51
  • 1
    Nitpick: for line in f.readlines() => for line in f; also close f at the end or use with. Commented Oct 9, 2013 at 0:41

5 Answers 5

2

This was pretty interesting. My basic strategy was to fix up the associations during the processing of the file, rather than do searching after the fact. I haven't tested it, but it should work, typos notwithstanding.

EDIT: now I've tested it.

file names.txt:

["81448068", "jimmy"]
["69711823", "bob"]
["92739493", "kyle, jimmy"]
["96399981", "kyle"]
["40112089", "john, kyle"]
["79784393", "matthew, chad"]
["749968"  , "bob, jimmy"]

processing file:

import json

def process_new_file(filename, associations=None):
    if associations == None:
        associations = {}
    with open(filename, 'r') as f:
        lines = f.readlines()
    for line in lines:
        rid, onames = json.loads(line.strip())
        extras = set([x.strip() for x in onames.split(',')])
        names = set()
        while len(extras) > 0:
            names.update(extras)
            extras = set()
            for name in names:
                if not name in associations:
                    associations[name] = set()
                extras.update(associations[name] - names)
        for name in names:
            associations[name].update(names)
    return associations

if __name__ == '__main__':
    associations = process_new_file('names.txt')
    tmpl = '{:20}: {}'
    for person in associations:
        print tmpl.format(person, ', '.join(associations[person]))

Output:

jimmy               : jimmy, bob, john, kyle
chad                : matthew, chad
kyle                : jimmy, bob, john, kyle
matthew             : matthew, chad
bob                 : jimmy, bob, john, kyle
john                : jimmy, bob, john, kyle
Sign up to request clarification or add additional context in comments.

5 Comments

That would not put the friends of the friends of jimmy, nor the friends of the friends of their friends etc.
@ychaouche: you were correct! I fixed it now to build up the complete chain. There is scope for optimization (pruning the search tree in the while loop), but this should still be fast enough.
I don't see how your script will put "Mary" as a key in associations if the data fed to it is for ex. "09122", ["John","Mary"]. It would have "John" as a key but not Mary so getName("Mary") would return nothing.
in addition, it doesn't remove duplicates. There must be an error in the set usage.
All names become keys in associations. Also, duplicates are culled in sets.
1

As far as parsing the file, you are doing a good job.

As for the searching, you should probably consider to change how you approach the whole topic. Python or any language for that matter is good for implementing business logic however in most cases it is not very good in searching big data-stores. So if you will have 1000 names Python will do a decent job however once you will start getting into 1,000,000s of names, the script will crawl to a halt.

In order to do efficient searches on big data-stores, that is what databases were designed for. So the sensible approach is to dump all the data from the txt files first into the db and then query it as you like. That will allow you to have a most robust solution and will allow you to expand it in the future with ease. If you are currently using txt files, you probably do not need a full-blown db so sqlite should suffice just fine. Sqlite in case you don't know is a full SQL database in a single file. Nothing to install, nothing to configure, just one file on a filesystem and that is it. Python has an excellent wrapper with good documentation for sqlite so if you do decide to take this route, it should not be very difficult. The documentation can be found here. Another alternative to using sqlite wrapper directly is to use ORM (object relational wrapper) such as SQLAlchemy however in case you are not familiar with databases and since your use of the db would be so simple I would recommend to simply use the sqlite wrapper directly.

Comments

1

This is a situation where eval could be more appropriate. eval() basically says take a string of text and treat it as if it's python code. So since your input file is in the format of a python list, it evaluates to a list. (Note that you can iterate through lines of a file as I have below).

def generateAssociations():
    associations = dict()

    f = open("names.txt","r")
    for line in f:
        line = eval(line)
        id = int(line[0])
        names = line[1].split(',')
        newNames = [name.strip() for name in names]

        for name in newNames:
            if name not in associations:
                associations[name] = lineNames
            else:
                associations[name] = associations[name] + newNames

    for key, value in associations.items():
        associations[key] = set(value)

    pprint.pprint(associations)

When run on your example data, it gives this as output:

{'bob': {'jimmy', 'bob'},
'chad': {'matthew', 'chad'},
'jimmy': {'kyle', 'jimmy', 'bob'},
'john': {'john', 'peter'},
'kyle': {'jimmy', 'kyle'},
'matthew': {'matthew', 'chad'},
'peter': {'john', 'peter'}}

This is it for first level friends, but should be pretty easy to extend further.

2 Comments

I don't think eval has a benefit over the json.loads approach above...? The problem is more in the workflow than getting the names loaded...
@beroe Good point, I might have gotten caught up in that because I haven't worked with json.loads before and didn't find his intention easy to interpret. I'll expand with more logic.
1

Ok here's how I'd do : (code is also available as a gist here : https://gist.github.com/ychaouche/6894532)

data = [
    ["81448068", "jimmy"],
    ["69711823", "bob"],
    ["92739493", "kyle, jimmy"],
    ["96399981", "kyle"],
    ["40112089", "john, kyle"],
    ["79784393", "matthew, chad"],
    ["749968"  , "bob, jimmy"],
]

class MetaPerson(type):
    _names_cache_ = {}
    def __call__(cls,name):
        return MetaPerson._names_cache_.setdefault(name,type.__call__(cls,name))

class Person(object):
    __metaclass__ = MetaPerson

    def __init__(self,name):
        self.name = name
        self.friends = []

    def __repr__(self):
        return "<Person '%s'>" % self.name

    def add_friend(self,person):
        if not person in self.friends : 
            self.friends.append(person)

        if self not in person.friends:
            person.add_friend(self)

    def get_network(self,depth_limit=-1):
        return [friend for friend in self.get_network_recursive(0,depth_limit,[self])]

    def get_network_recursive(self,actual_depth,depth_limit,network):
        if depth_limit != -1 and actual_depth > depth_limit:
            return

        if all((friend in network for friend in self.friends)):
            return

        for friend in self.friends :
            if not friend in network :
                yield friend
                network.append(friend)
                for friendsfriend in friend.get_network_recursive(actual_depth+1,depth_limit,network):
                    yield friendsfriend

class Population:
    def __init__(self):
        self.members = []

    def find(self,name):
        for member in self.members :
            if member.name == name :
                return member

    def add_member(self,person):
        if not person in self.members : 
            self.members.append(person)

    def __repr__(self):
        return repr(self.members)

def parse(data):
    population = Population()
    for tup in data :
        names   = tup[1].replace(" ","").split(",")
        # will return an existing person if name is already taken
        person1 = Person(names[0])
        # add_member won't add if already present        
        population.add_member(person1)
        if len(names) == 2 :
            # will return an existing person if name is already taken
            person2 = Person(names[1])
            # add_member won't add if already present
            population.add_member(person2)
            person2.add_friend(person1)
    return population

def main():
    population = parse(data)
    print "population",population
    kyle = population.find("kyle")
    print "Kyle's network : ",kyle.get_network(1)

def test():    
    antoine = Person("Antoine")
    johny   = Person("Johny")
    patrick = Person("Patrick")
    lisa    = Person("Lisa")

    johny.add_friend(antoine)
    antoine.add_friend(patrick)
    patrick.add_friend(lisa)
    johny.add_friend(lisa)
    print johny.get_network(1)


if __name__ == "__main__" :
    main()

here's the output

chaouche@karabeela ~/CODE/TEST/PYTHON $ python findnames.py
population [<Person 'jimmy'>, <Person 'bob'>, <Person 'kyle'>, <Person 'john'>, <Person 'matthew'>, <Person 'chad'>]
Kyle's network :  [<Person 'jimmy'>, <Person 'bob'>, <Person 'john'>]
chaouche@karabeela ~/CODE/TEST/PYTHON $

If I change the depth to 0 (meaning immediate friends only), the output changes to

chaouche@karabeela ~/CODE/TEST/PYTHON $ python findnames.py
population [<Person 'jimmy'>, <Person 'bob'>, <Person 'kyle'>, <Person 'john'>, <Person 'matthew'>, <Person 'chad'>]
Kyle's network :  [<Person 'jimmy'>, <Person 'john'>]
chaouche@karabeela ~/CODE/TEST/PYTHON $

Comments

0

You can create an endless loop by typing:

 loop = "loop"
 while loop == "loop":
     ##your code here...

The program will loop anything below the while function forever.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.