4

For example, I have a model like this:

class Doggy(models.Model):
    name = models.CharField(u'Name', max_length = 40)
    color = models.CharField(u'Color', max_length = 20)

How can I select doggies with the same color? Or with the same name :)

UPD. Of course, I don't know the name or the color in advance. I want to, kind of, group by their values.

UPD2. I'm trying to do something like that, but using Django:

SELECT * 
FROM table 
WHERE tablefield IN ( 
 SELECT tablefield
 FROM table 
 GROUP BY tablefield  
 HAVING (COUNT(tablefield ) > 1) 
) 
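To make the intent of the query above concrete, here is a minimal sketch that runs it with Python's built-in sqlite3 module. The `doggy` table, its columns, and the sample rows are assumptions made up for illustration; they are not the real schema.

```python
import sqlite3


def rows_with_duplicate_color(conn):
    """Return rows whose color is shared by at least one other row."""
    query = """
        SELECT id, name, color
        FROM doggy
        WHERE color IN (
            SELECT color
            FROM doggy
            GROUP BY color
            HAVING COUNT(color) > 1
        )
        ORDER BY id
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doggy (id INTEGER PRIMARY KEY, name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO doggy (name, color) VALUES (?, ?)",
    [("Rex", "brown"), ("Fido", "black"), ("Spot", "brown"), ("Rex", "white")],
)

duplicates = rows_with_duplicate_color(conn)
```

Only the two brown doggies come back; the black and white ones have unique colors and are filtered out by the HAVING clause.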

UPD3. I'd like to do it via Django ORM, without having to iterate over the objects. I just want to get rows with duplicate values for one particular field.

5 Answers

8

I'm late to the party, but here you go:

from django.db.models import Count

Doggy.objects.values('color', 'name').annotate(Count('pk'))

This will give you results that have a count of how many of each Doggy you have grouped by color and name.


1 Comment

:) Yeah, that's how it should be done, though there was no annotate back in 2011 ;)
3

You can use itertools.groupby() for this:

import operator
import itertools
from django.db import models

def group_model_by_attr(model_class, attr_name):
    assert issubclass(model_class, models.Model), \
        "%s is not a Django model." % (model_class,)
    assert attr_name in [field.name for field in model_class._meta.fields], \
        "The %s field doesn't exist on model %s" % (attr_name, model_class)

    all_instances = model_class.objects.all().order_by(attr_name)
    keyfunc = operator.attrgetter(attr_name)    
    return [{k: list(g)} for k, g in itertools.groupby(all_instances, keyfunc)]

grouped_by_color = group_model_by_attr(Doggy, 'color')
grouped_by_name = group_model_by_attr(Doggy, 'name')

grouped_by_color (for example) will be a list of dicts like [{'purple': [doggy1, doggy2]}, {'pink': [doggy3]}] where doggy1, doggy2, etc. are Doggy instances.
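The order_by(attr_name) call in the function above matters because itertools.groupby() only merges adjacent items with equal keys. A small sketch of that behaviour, using a hypothetical `Dog` namedtuple in place of real model instances:

```python
import itertools
import operator
from collections import namedtuple

# Hypothetical stand-in for Doggy instances; no database involved.
Dog = namedtuple("Dog", ["name", "color"])
dogs = [Dog("Rex", "brown"), Dog("Fido", "black"), Dog("Spot", "brown")]
keyfunc = operator.attrgetter("color")

# Without sorting, "brown" shows up as two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(dogs, keyfunc)]

# Sorting by the same key first yields one group per color.
sorted_groups = {
    key: [dog.name for dog in group]
    for key, group in itertools.groupby(sorted(dogs, key=keyfunc), keyfunc)
}
```

Here sorted() plays the role that order_by() plays for the queryset.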

UPDATE:

From your update it looks like you just want a list of ids for each event type. I tested this with 250k records in PostgreSQL on my Ubuntu laptop w/ a Core 2 Duo & 3 GB of RAM, and it took .35 seconds to generate the dict (the itertools.groupby version took .72 seconds, btw). You mention that you have 900k records, so this should be fast enough. If it's not, it should be easy to cache/update as the records change.

from collections import defaultdict

doggies = Doggy.objects.values_list('color', 'id').order_by('color').iterator()
grouped_doggies_by_color = defaultdict(list)
for color, id in doggies:
    grouped_doggies_by_color[color].append(id)
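Since the question only asks for rows with duplicated values, the dict built above can be narrowed to the groups holding more than one id. A sketch of that extra step, with a plain list of (color, id) tuples standing in for the values_list() result; the helper name `duplicate_groups` and the sample data are made up for illustration:

```python
from collections import defaultdict


def duplicate_groups(pairs):
    """Group ids by value and keep only values that occur more than once."""
    grouped = defaultdict(list)
    for value, pk in pairs:
        grouped[value].append(pk)
    return {value: pks for value, pks in grouped.items() if len(pks) > 1}


# Stand-in for Doggy.objects.values_list('color', 'id').order_by('color')
sample_pairs = [("black", 2), ("brown", 1), ("brown", 3), ("white", 4)]
duplicate_colors = duplicate_groups(sample_pairs)
```

This still iterates over every row in Python, so it does not replace a query that does the filtering in the database.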

5 Comments

Thanks for trying to help. Actually, I'd like to do that using the ORM. Of course, I can iterate over all of the objects, but that's not that great if you have more than 900k of them...
No problem. I definitely would have mentioned the data size in your question. I've updated my answer with what I think will work for you.
Thank you for your reply, I've upvoted it :) But I will leave the question open - I really want to know if the mentioned query could be made via ORM $)
The second is basically done through the ORM, with a bit of massaging the data. FYI: I just tried w/ 750k records and groupby took 48 seconds, and the values_list took 22 seconds.
Yeah, I understand that, thank you, but I'm still interested in constructing a query like the SQL query above.
2

If you're looking for Doggys of a certain colour, you'd do something like:

Doggy.objects.filter(color='blue')

If you want to find Doggys based on the colour of the current Doggy:

def GetSimilarColoredDoggys(self):
    return Doggy.objects.filter(color=self.color)

The same would go for names:

def GetDoggysWithSameName(self):
    return Doggy.objects.filter(name=self.name)

4 Comments

Omg, was I really so unclear in my question? Sorry, I will update it to show that the color/name is not known.
Btw, your naming doesn't follow the Python naming convention. Seems you like C :)
@DataGreed: Your comment to Mez is true that using CamelCase for function names instead of lowercase with underscores isn't the preferred way. But then again, spaces between the '=' sign in keyword arguments—as you've used in your question—aren't PEP8 compliant either. python.org/dev/peps/pep-0008
Fair enough, but I was the one who was asking for advice. Remember that this is a knowledge base, and if a novice accepted this advice, it could lead him down the dark path of code formatting havoc. BTW, using camelCase with a lowercase first letter would not be so confusing. But, again, that's not the point.
-2

I would change your data model so that the color and name are a one-to-many relationship with Doggy as follows:

class Doggy(models.Model):
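    # on_delete is required on Django 2.0+; CASCADE is assumed here.
    name = models.ForeignKey('DoggyName', on_delete=models.CASCADE)
    color = models.ForeignKey('DoggyColor', on_delete=models.CASCADE)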

class DoggyName(models.Model):
    name = models.CharField(max_length=40, unique=True)

class DoggyColor(models.Model):
    color = models.CharField(max_length=20, unique=True)

Now DoggyName and DoggyColor do not contain duplicate names or colors, and you can use them to select dogs with the same name or color.

6 Comments

Omg, I was not asking that. This model was a dummy one, provided as an example. The real issue is finding duplicate messages.
@DataGreed: Why isn't changing your data model a valid option?
Because I'm not interested in advice about DB architecture this time. If you want to know the real picture: I've got something like a forum, where I'd like to find the duplicate messages and create a report about them for the moderator. So, the question itself is about making a particular type of query using the Django ORM (if it is possible, of course - I've tried a lot of ways without using extra() and didn't get it working).
@DataGreed: Your example Doggy model with name and color as CharFields will result in redundant data and a data model that is not in second normal form. Violating 2NF results in wasted storage space and reduced query performance. If your "real issue is finding duplicate messages", then you should ask your real question instead of downvoting people who answer the question you actually asked.
I've downvoted your answer because it didn't contain an answer to the actual question. You could also answer something like "Just don't do it, find yourself another hobby", but I wasn't asking for advice on finding a hobby other than writing Django apps. I was asking an exact question about making an exact query using the ORM.
-3

Okay, apparently, there's no way to do such a thing with the ORM only.

If you have to do it, you have to use .extra() to execute the needed SQL statement (if you are using an SQL database, of course).

1 Comment

I would use .raw() rather than .extra() -- it's simpler, and you can use any SQL you want and get back Django model objects. docs.djangoproject.com/en/dev/topics/db/sql/…
