Thursday, 27 June 2013

Learning the Hard Way: NHibernate Collections

Here's on that's bitten out team recently: the issue of managing collections of child entities on an parent entity. A number of our records were going missing from the database. How could this be? We don't really delete anything, we just 'soft delete' - setting a flag to mark something as deleted. Take the following entities:
public class Employee
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
    public virtual bool Deleted { get; set; }

    public static Employee Create(string name)
    {
        return new Employee
                   {
                       Id = Guid.NewGuid(),
                       Name = name,
                       Deleted = false
                   };
    }
}

public class Team 
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList TeamEmployees { get; set; }
    public virtual bool Deleted { get; set; }

    public static Team Create(string name)
    {
        return new Team
                    {
                        Id = Guid.NewGuid(),
                        Name = name,
                        TeamEmployees = new List(),
                        Deleted = false
                    };
    }

    public virtual void UpdateEmployees(IList employees)
    {
        foreach(var teamEmployee in TeamEmployees.Where(x => !employees.Contains(x.Employee)).Reverse())
        {
            TeamEmployees.Remove(teamEmployee);
        }

        foreach(var employee in employees.Where(x => !TeamEmployees.Select(y => y.Employee).Contains(x)))
        {
            TeamEmployees.Add(TeamEmployee.Create(employee, this));
        }
    }
}

public class TeamEmployee
{
    public virtual Guid Id { get; set; }
    public virtual Employee Employee { get; set; }
    public virtual Team Team { get; set; }
    public virtual bool Deleted { get; set; }

    public static TeamEmployee Create(Employee employee, Team team)
    {
        return new TeamEmployee
                    {
                        Id = Guid.NewGuid(),
                        Employee = employee,
                        Team = team,
                        Deleted = false
                    };
    }
}
The problem here is when you load a team, update the employees and save it - you can be deleting records without realising it. The mapping on Team for TeamEmployees was set to 'all-delete-orphans', so when the association between a Team and an Employee was removed, any record it ever existed was also lost. Even if the cascade had just been 'all', the foreign key to Team would have been nullified and the history would have been lost.

There are a few ways to limit these problems, such as revoking delete access for the database login, and setting all cascades to 'save-update', but it also pays to be cleverer about how collections are handled.

Instead of removing the TeamEmployee record, it is flagged as deleted:
public class Team 
{
    public virtual Guid Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList TeamEmployees { get; set; }
    public virtual bool Deleted { get; set; }

    public static Team Create(string name)
    {
        return new Team
                    {
                        Id = Guid.NewGuid(),
                        Name = name,
                        TeamEmployees = new List(),
                        Deleted = false
                    };
    }

    public virtual void UpdateEmployees(IList employees)
    {
        foreach(var teamEmployee in TeamEmployees.Where(x => !employees.Contains(x.Employee)).Reverse())
        {
            TeamEmployees.Remove(teamEmployee);
        }

        foreach(var employee in employees.Where(x => !TeamEmployees.Select(y => y.Employee).Contains(x)))
        {
            TeamEmployees.Add(TeamEmployee.Create(employee, this));
        }
    }
}
And the Mapping file has a where clause added to it, so that it only loads the undleted records:
<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="Collections.Domain"
                   assembly="Collections.Domain">
  <class name="Team" table="`Team`">
    <id name="Id" column="Id" type="Guid">
      <generator class="assigned"/>
    </id>
    <property name="Name" column="`Name`" />
 <property name="Deleted" />
 <bag name="TeamEmployees" cascade="save-update" where="Deleted = 0" >
      <key column="TeamId"/>
   <one-to-many class="TeamEmployees" />
 </bag>
  </class>
</hibernate-mapping>
Care must be taken when accessing the collection in the same session after deleting a record, as it will be present but marked as deleted. A linq clause '.Where(x => !x.Deleted) should be used.

Another problem would be if the Team and Employee relationship was mapped as many-to-many. There may be a way to soft delete the relationships, but I am not currently aware of it.

Really, this whole scenario is another argument in favour of breaking down all many-to-many relationships with an extra entity. There are others, such as having somewhere to store information about the relationship. Many times I have found it necessary to break down a many-to-many, but never to go the other way. Therefore I am favouring breaking down these relationships as a default.

Fortunately due to our rightfully paranoid auditing and event logging, all customer records were retrived and the data was returned to its expected state.

Friday, 14 June 2013

Overcoming the n+1 issue with one-to-one mapping in NHibernate.

This recently caught me out. In out business domain, we have Task entities. Each task can have a Task that preceeded it, and a Task that follows it. This is modelled like this:
 
namespace OneToOneIssue.Domain
{
    public class Task
    {
        public virtual int Id { get; set; }
        public virtual string Description { get; set; }
        public virtual Task FollowingTask { get; set; }
        public virtual Task PrecedingTask { get; set; }
    }
}
And the database table looks like this:
CREATE TABLE [dbo].[Task](
    [Id] bigint NOT NULL,
    [Description] nvarchar(100) NULL,
    [FollowingTaskId] int NULL
    CONSTRAINT [PK_Task] PRIMARY KEY CLUSTERED 
    (
        [Id] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
and the mapping like this:
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="OneToOneIssue.Domain"
                   assembly="OneToOneIssue.Domain">
  <class name="Task" table="`Task`">
    <id name="Id" column="Id" type="int">
      <generator class="assigned"/>
    </id>
    <property name="Description" column="`Description`" />
    <many-to-one name="FollowingTask" class="Task" column="FollowingTaskId"  />
    <one-to-one name="PrecedingTask" class="Task" property-ref="FollowingTask" lazy="proxy" />
  </class>
</hibernate-mapping>
This all worked fine, but created a serious performace issue when loading a collection of tasks. Take this set of data for example:
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (1, 'Task 1', 2)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (2, 'Task 2', 3)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (3, 'Task 3', NULL)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (4, 'Task 4', 5)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (5, 'Task 5', 6)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (6, 'Task 6', NULL)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (7, 'Task 7', 8)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (8, 'Task 8', 9)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (9, 'Task 9', NULL)
Running a query to get records 2, 5 and 8 (The ones with preceding and following tasks) like this:
using (var session = sessionFactory.OpenSession())
{
    var tasks = session
        .CreateCriteria<Task>()
        .Add(Restrictions.In("Id", new[] { 2, 5, 8 }))
        .List<Task>();
}
In this instance, NHibernate would create a query to get the list of tasks, but then perform a query for each task to retrieve its preceding task. You do not need to access the preceding task to make this happen, and setting lazy="false" on the preceding task does not resolve it.

This is a known bug in NHibernate and has been discussed in StackOverflow posts such as this.

One option would be for each record in the Task table to have a FollowingTaskId and a PrecedingTaskId, but this would increase the chance of data inconsistencies and would be mean major, high risk updates to our current data set.

A less invasive way would be to pretend that the preceding task is part of a collection that only ever has one record. A Chinese Collection if you like (it enforces a 'single child policy'). From the point of view of the rest of the code, nothing has changed, and the database can remain the same.

So the entity now looks like this:
namespace OneToOneIssue.Domain
{
    public class Task
    {
        public virtual int Id { get; set; }
        public virtual string Description { get; set; }
        public virtual Task FollowingTask { get; set; }
        private IList _precedingTasks;

        public virtual Task PrecedingTask
        {
            get 
            { 
                return _precedingTasks.FirstOrDefault(); 
            }
            set
            {
                _precedingTasks.Clear();
                _precedingTasks.Add(value);
            }
        }
    }
}
And the mapping changes to this:
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="OneToOneIssue.Domain"
                   assembly="OneToOneIssue.Domain">
  <class name="Task" table="`Task`">
    <id name="Id" column="Id" type="int">
      <generator class="assigned"/>
    </id>
    <property name="Description" column="`Description`" />
    <many-to-one name="FollowingTask" class="Task" column="FollowingTaskId"  />
    <bag name="_precedingTasks" access="field">
      <key column="FollowingTaskId"/>
      <one-to-many class="Task" />
    </bag>
  </class>
</hibernate-mapping>
Now the collection can be queried as before and preceding tasks will not be lazy-loaded untill the code accesses them. Of course if you need the preceding tasks in the collection, you will need to eager load them.

Preceding tasks can be set and saved in the usual way.

Cheers to Radim Köhler for suggesting this solution here.