Friday, 14 June 2013

Overcoming the n+1 issue with one-to-one mapping in NHibernate.

This recently caught me out. In out business domain, we have Task entities. Each task can have a Task that preceeded it, and a Task that follows it. This is modelled like this:
 
namespace OneToOneIssue.Domain
{
    public class Task
    {
        public virtual int Id { get; set; }
        public virtual string Description { get; set; }
        public virtual Task FollowingTask { get; set; }
        public virtual Task PrecedingTask { get; set; }
    }
}
And the database table looks like this:
CREATE TABLE [dbo].[Task](
    [Id] bigint NOT NULL,
    [Description] nvarchar(100) NULL,
    [FollowingTaskId] int NULL
    CONSTRAINT [PK_Task] PRIMARY KEY CLUSTERED 
    (
        [Id] ASC
    )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
and the mapping like this:
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="OneToOneIssue.Domain"
                   assembly="OneToOneIssue.Domain">
  <class name="Task" table="`Task`">
    <id name="Id" column="Id" type="int">
      <generator class="assigned"/>
    </id>
    <property name="Description" column="`Description`" />
    <many-to-one name="FollowingTask" class="Task" column="FollowingTaskId"  />
    <one-to-one name="PrecedingTask" class="Task" property-ref="FollowingTask" lazy="proxy" />
  </class>
</hibernate-mapping>
This all worked fine, but created a serious performace issue when loading a collection of tasks. Take this set of data for example:
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (1, 'Task 1', 2)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (2, 'Task 2', 3)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (3, 'Task 3', NULL)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (4, 'Task 4', 5)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (5, 'Task 5', 6)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (6, 'Task 6', NULL)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (7, 'Task 7', 8)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (8, 'Task 8', 9)
INSERT INTO [Task] ([Id], [Description], [FollowingTaskId]) VALUES (9, 'Task 9', NULL)
Running a query to get records 2, 5 and 8 (The ones with preceding and following tasks) like this:
using (var session = sessionFactory.OpenSession())
{
    var tasks = session
        .CreateCriteria<Task>()
        .Add(Restrictions.In("Id", new[] { 2, 5, 8 }))
        .List<Task>();
}
In this instance, NHibernate would create a query to get the list of tasks, but then perform a query for each task to retrieve its preceding task. You do not need to access the preceding task to make this happen, and setting lazy="false" on the preceding task does not resolve it.

This is a known bug in NHibernate and has been discussed in StackOverflow posts such as this.

One option would be for each record in the Task table to have a FollowingTaskId and a PrecedingTaskId, but this would increase the chance of data inconsistencies and would be mean major, high risk updates to our current data set.

A less invasive way would be to pretend that the preceding task is part of a collection that only ever has one record. A Chinese Collection if you like (it enforces a 'single child policy'). From the point of view of the rest of the code, nothing has changed, and the database can remain the same.

So the entity now looks like this:
namespace OneToOneIssue.Domain
{
    public class Task
    {
        public virtual int Id { get; set; }
        public virtual string Description { get; set; }
        public virtual Task FollowingTask { get; set; }
        private IList _precedingTasks;

        public virtual Task PrecedingTask
        {
            get 
            { 
                return _precedingTasks.FirstOrDefault(); 
            }
            set
            {
                _precedingTasks.Clear();
                _precedingTasks.Add(value);
            }
        }
    }
}
And the mapping changes to this:
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   namespace="OneToOneIssue.Domain"
                   assembly="OneToOneIssue.Domain">
  <class name="Task" table="`Task`">
    <id name="Id" column="Id" type="int">
      <generator class="assigned"/>
    </id>
    <property name="Description" column="`Description`" />
    <many-to-one name="FollowingTask" class="Task" column="FollowingTaskId"  />
    <bag name="_precedingTasks" access="field">
      <key column="FollowingTaskId"/>
      <one-to-many class="Task" />
    </bag>
  </class>
</hibernate-mapping>
Now the collection can be queried as before and preceding tasks will not be lazy-loaded untill the code accesses them. Of course if you need the preceding tasks in the collection, you will need to eager load them.

Preceding tasks can be set and saved in the usual way.

Cheers to Radim Köhler for suggesting this solution here.

No comments:

Post a Comment