48

I want to know the difference between creating classes with and without a HashSet initialized in the constructor.

Using the Code First approach (EF 4.3), one can create models like this:

public class Blog
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string BloggerName { get; set; }
    public virtual ICollection<Post> Posts { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public DateTime DateCreated { get; set; }
    public string Content { get; set; }
    public int BlogId { get; set; }
    public ICollection<Comment> Comments { get; set; }
}

Or one can create models like this:

public class Customer
{
    public Customer()
    {
        BrokerageAccounts = new HashSet<BrokerageAccount>();
    }
    public int Id { get; set; }
    public string FirstName { get; set; }
    public ICollection<BrokerageAccount> BrokerageAccounts { get; set; }
}

public class BrokerageAccount
{

    public int Id { get; set; }
    public string AccountNumber { get; set; }
    public int CustomerId { get; set; }

}

What is the HashSet doing here?

Should I use a HashSet in the first two models as well?

Is there any article that shows how HashSet is applied?

3 Answers

28

The HashSet does not define the type of collection that will be generated when you actually fetch data. This will always be of type ICollection as declared.

The HashSet created in the constructor helps you avoid a NullReferenceException when no records are fetched, or none exist, on the many side of the relationship. It is in no way required.

For example, based on your question, when you try to use a relationship like...

var myCollection = Blog.Posts();

If no Posts exist, myCollection will be null. That is fine until you chain calls and do something like

var myCollectionCount = Blog.Posts.Count();

which will error with a NullReferenceException.

Whereas

var myCollection = Customer.BrokerageAccounts();
var myCollectionCount = Customer.BrokerageAccounts.Count();

will result in an empty ICollection and a zero count. No exceptions :-)
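The difference can be reproduced without a database. The following is a minimal sketch using plain objects created with `new`, so it only shows the C# behavior of an uninitialized versus an initialized collection property; entities loaded through an EF context (especially with lazy-loading proxies) may behave differently. The model classes are trimmed copies of those in the question, and `NullCollectionDemo` is a hypothetical name used only for illustration. Note that properties are read without parentheses (`blog.Posts`, not `blog.Posts()`).

```csharp
using System;
using System.Collections.Generic;

public class Post { public int Id { get; set; } }

public class BrokerageAccount { public int Id { get; set; } }

// Collection property left uninitialized, as in the Blog model.
public class Blog
{
    public int Id { get; set; }
    public virtual ICollection<Post> Posts { get; set; }
}

// Collection property initialized in the constructor, as in the Customer model.
public class Customer
{
    public Customer()
    {
        BrokerageAccounts = new HashSet<BrokerageAccount>();
    }

    public int Id { get; set; }
    public ICollection<BrokerageAccount> BrokerageAccounts { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class NullCollectionDemo
{
    // Returns true if reading Posts.Count on a freshly created Blog throws.
    public static bool BlogCountThrows()
    {
        var blog = new Blog();
        try
        {
            var count = blog.Posts.Count; // Posts is null here
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    // Returns the number of accounts on a freshly created Customer.
    public static int CustomerAccountCount()
    {
        var customer = new Customer();
        return customer.BrokerageAccounts.Count; // empty set, no exception
    }

    public static void Main()
    {
        Console.WriteLine(BlogCountThrows());      // True
        Console.WriteLine(CustomerAccountCount()); // 0
    }
}
```

Any collection that implements `ICollection<T>` would avoid the exception in the same way; a `List<T>` in the constructor gives the same zero count.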

5 Comments

Is the () on properties valid (Blog.Posts())? Shouldn't it just be Blog.Posts to access the field?
This seems to be wrong. The debugger shows me exactly the type I use in my constructor, even for data fetched from the database. This is also reflected in different behaviors when accessing the collection (ex. through DataBinding on those collections).
@linac It's not the HashSet that defines the return type, but the definition of the ICollection<T> property. The HashSet is used just to initialize the ICollection property. If you don't initialize the property in the constructor, the debugger will still show the ICollection type as defined. Nothing to do with the HashSet!!
You'll have to mark your property as virtual, for EF to override the collection type. Otherwise it has no other option than to keep the available list.
I'm curious whether there would be a small performance hit from adding the HashSet in the constructor. Does that mean that for every Blog that Entity Framework creates, it first has to make an empty HashSet, which is never used because Entity Framework then overrides the ICollection<BrokerageAccount> BrokerageAccounts getters and setters? ... I can see that having the HashSet might be useful for unit tests.
27

Generally speaking, it is best to use the collection that best expresses your intentions. If you do not specifically intend to use the HashSet's unique characteristics, I would not use it.

It is unordered and does not support lookups by index. Furthermore, it is not as well suited for sequential reads as other collections, and the fact that it allows you to add the same item multiple times without creating duplicates is only useful if you have a reason to use it for that. If that is not your intention, it can hide misbehaving code and make problems difficult to isolate.

The HashSet is mostly useful in situations where insertion and removal times are very important, such as when processing data. It is also extremely useful for comparing sets of data (again when processing) using operations like intersect, except, and union. In any other situation, the cons generally outweigh the pros.

Consider that when working with blog posts, inserts and removes are quite rare compared to reads, and you generally want to read the data in a specific order, anyway. That is more or less the exact opposite of what the HashSet is good at. It is highly doubtful that you would ever intend to add the same post twice, for any reason, and I see no reason why you would use set-based operations on posts in a class like that.
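The characteristics described above (silently ignored duplicates, no index access, and set operations) can be seen in a short sketch. It uses only `System.Collections.Generic`; the class name `HashSetDemo` and the sample strings are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class HashSetDemo
{
    // Adding an existing item returns false and leaves the set unchanged.
    public static bool AddDuplicate(HashSet<string> set, string item)
    {
        return set.Add(item);
    }

    // Items present in both sets; the inputs are not modified.
    public static HashSet<string> Common(HashSet<string> a, HashSet<string> b)
    {
        var result = new HashSet<string>(a);
        result.IntersectWith(b); // IntersectWith modifies 'result' in place
        return result;
    }

    public static void Main()
    {
        var tags = new HashSet<string> { "ef", "linq" };
        Console.WriteLine(AddDuplicate(tags, "ef")); // False
        Console.WriteLine(tags.Count);               // 2

        // A List<T> accepts the same duplicate without complaint.
        var list = new List<string> { "ef", "linq" };
        list.Add("ef");
        Console.WriteLine(list.Count);               // 3

        var other = new HashSet<string> { "linq", "sql" };
        Console.WriteLine(Common(tags, other).SetEquals(new[] { "linq" })); // True

        var all = new HashSet<string>(tags);
        all.UnionWith(other);                        // ef, linq, sql
        Console.WriteLine(all.Count);                // 3

        var onlyInTags = new HashSet<string>(tags);
        onlyInTags.ExceptWith(other);                // ef
        Console.WriteLine(onlyInTags.Count);         // 1

        // There is no indexer: tags[0] does not compile.
    }
}
```

The ignored duplicate in the first call is the behavior that can hide a bug: the caller only learns about it by checking the return value of `Add`.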

22

I'm fairly new to Entity Framework but this is my understanding. The collection types can be any type that implements ICollection<T>. In my opinion a HashSet is usually the semantically correct collection type. Most collections should only have one instance of a member (no duplicates) and HashSet best expresses this. I have been writing my classes as shown below and this has worked well so far. Note that the collection is typed as ISet<T> and the setter is private.

public class Customer
{
    public Customer()
    {
        BrokerageAccounts = new HashSet<BrokerageAccount>();
    }
    public int Id { get; set; }
    public string FirstName { get; set; }
    public ISet<BrokerageAccount> BrokerageAccounts { get; private set; }
}
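One caveat about "no duplicates" is worth sketching. By default a `HashSet<T>` compares class instances by reference, so it only rejects the same object added twice; two separate objects with the same Id are both kept unless the entity overrides `Equals` and `GetHashCode` or the set is given a custom comparer. The example below is a plain-object sketch with no EF context involved, and `SetSemanticsDemo` and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class BrokerageAccount
{
    public int Id { get; set; }
    public string AccountNumber { get; set; }
}

public class Customer
{
    public Customer()
    {
        BrokerageAccounts = new HashSet<BrokerageAccount>();
    }

    public int Id { get; set; }
    public string FirstName { get; set; }
    public ISet<BrokerageAccount> BrokerageAccounts { get; private set; }
}

// Hypothetical helper class, for illustration only.
public static class SetSemanticsDemo
{
    public static int CountAfterAdds()
    {
        var customer = new Customer();
        var account = new BrokerageAccount { Id = 1, AccountNumber = "A-1" };

        customer.BrokerageAccounts.Add(account);
        customer.BrokerageAccounts.Add(account); // same instance: ignored

        // Different instance with identical values: default equality is by
        // reference, so this one is added as a second element.
        customer.BrokerageAccounts.Add(
            new BrokerageAccount { Id = 1, AccountNumber = "A-1" });

        return customer.BrokerageAccounts.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterAdds()); // 2
    }
}
```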

2 Comments

I completely agree. In most of the cases the HashSet is the most natural fit.
HashSet still seems right in EF6.x. Natively, EF will use HashSets in this exact manner during database-first creation of types too.
