BulkInsert in Entity Framework

BulkInsert extension method in Entity Framework

Install Z.EntityFramework.Extensions:

Now, I’ll show you how to use the Z.EntityFramework.Extensions package to perform bulk insert, update, and delete operations through Entity Framework. First, open the NuGet Package Manager window and search for the Z.EntityFramework.Extensions package. Select “Z.EntityFramework.Extensions”, choose your project, pick the latest version, and finally click the “Install” button.

After clicking the Install button, it will take some time and install the Z.EntityFramework.Extensions DLL into your project.
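Alternatively, the same package can be installed with a single command in the Package Manager Console:

```powershell
Install-Package Z.EntityFramework.Extensions
```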

Note: This Z.EntityFramework.Extensions extends our DbContext object with high-performance bulk operations such as BulkSaveChanges, BulkInsert, BulkUpdate, BulkDelete, BulkMerge, etc. It supports SQL Server, MySQL, Oracle, PostgreSQL, SQLite, and more!
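As a quick sketch of the pattern (the method names come from the library; EF_Demo_DBEntities and students stand in for your own context class and entity list):

```csharp
using (var context = new EF_Demo_DBEntities())
{
    // Each call executes as a single bulk operation instead of row-by-row SQL
    context.BulkInsert(students);
    context.BulkUpdate(students);
    context.BulkDelete(students);

    // BulkMerge performs an "upsert": inserts new rows and updates existing ones
    context.BulkMerge(students);

    // BulkSaveChanges is a drop-in replacement for SaveChanges that batches
    // all pending inserts, updates, and deletes
    context.BulkSaveChanges();
}
```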

Advantages of using Entity Framework Extensions:
  1. Easy to use.
  2. Flexible.
  3. Improves performance.
  4. Improves application responsiveness.
  5. Reduces database load by reducing the number of database round-trips.
BulkInsert extension method:

Z.EntityFramework.Extensions provides two methods, BulkInsert and BulkInsertAsync, which allow us to insert a large number of entities into the database at once. The syntax for using the bulk insert extension methods is as follows:

context.BulkInsert(listStudents);
await context.BulkInsertAsync(listStudents, cancellationToken);
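A minimal async sketch (the method and parameter names here are illustrative; BulkInsertAsync returns a Task, so it should be awaited):

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static async Task InsertStudentsAsync(IList<Student> listStudents, CancellationToken cancellationToken)
{
    using (var context = new EF_Demo_DBEntities())
    {
        // Awaiting the Task keeps the calling thread free while the bulk insert runs
        await context.BulkInsertAsync(listStudents, cancellationToken);
    }
}
```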

Example of using the BulkInsert extension method

In the following example, we use the BulkInsert extension method to insert a list of students (i.e., newStudents) into the database. Note that we don’t need to call the SaveChanges method when performing a bulk insert operation. In this example, the context class performs the INSERT operation in a single round trip.

using System;
using System.Collections.Generic;
using System.Linq;

namespace DBFirstApproach
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("BulkInsert Method Started");
            IList<Student> newStudents = new List<Student>() {
                new Student() { FirstName = "John", LastName = "Taylor", StandardId = 1 },
                new Student() { FirstName = "Sara", LastName = "Taylor", StandardId = 1 },
                new Student() { FirstName = "Pam", LastName = "Taylor", StandardId = 1 },
            };
            using (var context = new EF_Demo_DBEntities())
            {
                context.Database.Log = Console.Write;
                //Easy to use
                context.BulkInsert(newStudents);
            }
            Console.WriteLine("BulkInsert Method Completed");
            GetStudents("Taylor");
            Console.Read();
        }

        public static void GetStudents(string lastName)
        {
            using (var context = new EF_Demo_DBEntities())
            {
                // Note: the Equals(string, StringComparison) overload cannot be translated
                // by LINQ to Entities and throws at runtime; a plain equality comparison is
                // translated to SQL, where the default SQL Server collation is
                // case-insensitive anyway.
                var studentsList = context.Students.Where(std => std.LastName == lastName);
                foreach (var std in studentsList)
                {
                    Console.WriteLine($"FirstName : {std.FirstName}, LastName : {std.LastName}, StandardId : {std.StandardId}");
                }
            }
        }
    }
}

When you run the above code, you will get the following output. If you look closely at the SQL, you will see that it uses a SQL MERGE statement to perform the bulk operation. Because it reduces the number of round trips to the database server, it greatly improves application performance.

BulkInsert Method Started
Opened connection at 12-12-2022 19:45:45 + 05:30
-- Executing Command:

/* SELECT server information */
SELECT @@VERSION

/* SELECT table information */
SELECT DestinationTable.Name AS DestinationName ,
        (SELECT 1
          WHERE EXISTS ( SELECT 1
                             FROM sys.triggers AS X
                             WHERE X.parent_id = A.object_id
                                    AND X.is_disabled = 0
                                    AND OBJECTPROPERTY(X.object_id,
                                                       'ExecIsInsertTrigger') = 1 )
        ) AS HasInsertTrigger,
        (SELECT 1
          WHERE EXISTS ( SELECT 1
                             FROM sys.triggers AS X
                             WHERE X.parent_id = A.object_id
                                    AND X.is_disabled = 0
                                    AND OBJECTPROPERTY(X.object_id,
                                                       'ExecIsUpdateTrigger') = 1 )
        ) AS HasUpdateTrigger,
        ( SELECT 1
          WHERE EXISTS ( SELECT 1
                             FROM sys.triggers AS X
                             WHERE X.parent_id = A.object_id
                                    AND X.is_disabled = 0
                                    AND OBJECTPROPERTY(X.object_id,
                                                       'ExecIsDeleteTrigger') = 1 )
        ) AS HasDeleteTrigger
FROM (SELECT @Table_0 AS Name) AS DestinationTable
        LEFT JOIN sys.synonyms AS B ON B.object_id = OBJECT_ID(DestinationTable.Name)
                                                                                AND COALESCE(PARSENAME(base_object_name,4), @@SERVERNAME) = @@SERVERNAME
                                                                                AND COALESCE(PARSENAME(base_object_name,3), DB_NAME(DB_ID())) = DB_NAME(DB_ID())
        INNER JOIN sys.tables AS A ON A.object_id = OBJECT_ID(DestinationTable.Name)
                                        OR A.object_id = OBJECT_ID(B.base_object_name)
ORDER BY DestinationName

/* SELECT column information */
SELECT DestinationTable.Name AS DestinationName ,
        C.name AS ColumnName ,
        C.column_id AS ColumnOrder,
        C.precision AS Precision,
        C.scale AS Scale,
        C.max_length AS MaxLength ,
        C.collation_name AS Collation ,
        C.Is_Identity AS IsIdentity,
        ( CASE WHEN EXISTS ( SELECT 1
                             FROM sys.index_columns AS X
                             WHERE X.index_id = B.index_id
                                    AND X.object_id = B.object_id
                                    AND X.column_id = C.column_id ) THEN 1
               ELSE 0
          END ) AS IsPrimaryKey ,
        C.system_type_id AS System_Type_Id ,
        D.Name AS TypeName,
        (CASE WHEN E.base_object_name IS NOT NULL THEN 1 ELSE 0 END) AS IsSynonym,
        D.is_user_defined,
        F.name,
                CASE WHEN C.default_object_id = 0 THEN 'ZZZ_NO_DEFAULT' ELSE ISNULL(OBJECT_DEFINITION(C.default_object_id), 'ZZZ_ERROR_DEFAULT_ZZZ') END AS DefaultValueSql,
        C.is_nullable
FROM (SELECT @Table_0 AS Name) AS DestinationTable
        LEFT JOIN sys.synonyms AS E ON E.object_id = OBJECT_ID(DestinationTable.Name)
                                                                                AND COALESCE(PARSENAME(base_object_name,4), @@SERVERNAME) = @@SERVERNAME
                                                                                AND COALESCE(PARSENAME(base_object_name,3), DB_NAME(DB_ID())) = DB_NAME(DB_ID())
        INNER JOIN sys.tables AS A ON A.object_id = OBJECT_ID(DestinationTable.Name)
                                        OR A.object_id = OBJECT_ID(E.base_object_name)
        LEFT JOIN sys.indexes AS B ON B.object_id = A.object_id
                                        AND B.is_primary_key = 1
        INNER JOIN sys.columns AS C ON C.object_id = A.object_id
        INNER JOIN sys.types AS D ON D.system_type_id = C.system_type_id
                                     AND D.user_type_id = C.user_type_id
        INNER JOIN sys.schemas AS F ON D.schema_id = F.schema_id
ORDER BY DestinationName,
        ColumnOrder

-- @Table_0: [dbo].[Student] (Type = String, Size = 15)
-- CommandTimeout:30
-- Executing at 12-12-2022 19:45:45
-- Completed at 12-12-2022 19:45:45
-- Result: SqlDataReader


-- Executing Command:
MERGE INTO [dbo].[Student] AS DestinationTable
USING
(
SELECT TOP 100 PERCENT * FROM (SELECT @0_0 AS [StudentId], @0_1 AS [FirstName], @0_2 AS [LastName], @0_3 AS [StandardId], @0_4 AS ZZZ_Index
UNION ALL SELECT @1_0 AS [StudentId], @1_1 AS [FirstName], @1_2 AS [LastName], @1_3 AS [StandardId], @1_4 AS ZZZ_Index
UNION ALL SELECT @2_0 AS [StudentId], @2_1 AS [FirstName], @2_2 AS [LastName], @2_3 AS [StandardId], @2_4 AS ZZZ_Index) AS StagingTable ORDER BY ZZZ_Index
) AS StagingTable
ON 1 = 2
WHEN NOT MATCHED THEN
    INSERT ( [FirstName], [LastName], [StandardId] )
    VALUES ( [FirstName], [LastName], [StandardId] )
OUTPUT
    $action,
    StagingTable.ZZZ_Index,
    INSERTED.[StudentId] AS [StudentId_zzzzinserted]

;
-- @0_0: 0 (Type = Int32, Size = 4)
-- @0_1: John (Type = AnsiString, Size = 100)
-- @0_2: Taylor (Type = AnsiString, Size = 100)
-- @0_3: 1 (Type = Int32, Size = 4)
-- @0_4: 0 (Type = Int32, Size = 0)
-- @1_0: 0 (Type = Int32, Size = 4)
-- @1_1: Sara (Type = AnsiString, Size = 100)
-- @1_2: Taylor (Type = AnsiString, Size = 100)
-- @1_3: 1 (Type = Int32, Size = 4)
-- @1_4: 1 (Type = Int32, Size = 0)
-- @2_0: 0 (Type = Int32, Size = 4)
-- @2_1: Pam (Type = AnsiString, Size = 100)
-- @2_2: Taylor (Type = AnsiString, Size = 100)
-- @2_3: 1 (Type = Int32, Size = 4)
-- @2_4: 2 (Type = Int32, Size = 0)
-- CommandTimeout:120
-- Executing at 12-12-2022 19:45:45
-- Completed at 12-12-2022 19:45:46
-- Result: 3 rows


Closed connection at 12-12-2022 19:45:46 + 05:30
BulkInsert Method Completed
FirstName: John, LastName: Taylor, StandardId: 1
FirstName: Sara, LastName: Taylor, StandardId: 1
FirstName: Pam, LastName: Taylor, StandardId: 1
Performance comparison of SaveChanges and BulkInsert in Entity Framework:

Let’s look at a performance comparison between the Entity Framework SaveChanges method and the BulkInsert extension method through an example. The Entity Framework SaveChanges method generates and executes a separate SQL statement for each entity, i.e., multiple database round-trips, while the BulkInsert extension method performs the same task using a single round trip.

We will insert 1000 students using two approaches, the AddRange method followed by SaveChanges and the BulkInsert extension method, and measure the time each takes to complete the task. In the example below, the FirstTimeExecution method is excluded from the measurements: the first execution is always slower because of JIT compilation and Entity Framework model initialization.

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace DBFirstApproach
{
    class Program
    {
        static void Main(string[] args)
        {
            // Warm-up run: excluded from performance measurements, since the
            // first execution is always slower (JIT, EF model initialization)
            FirstTimeExecution();

            //Generate 1000 Students
            List<Student> studentList = GenerateStudents(1000);
            Stopwatch SaveChangesStopwatch = new Stopwatch();
            Stopwatch BulkInsertStopwatch = new Stopwatch();

            using (EF_Demo_DBEntities context1 = new EF_Demo_DBEntities())
            {
                // Add the Student Collection using the AddRange Method
                context1.Students.AddRange(studentList);
                SaveChangesStopwatch.Start();
                context1.SaveChanges();
                SaveChangesStopwatch.Stop();
                Console.WriteLine($"SaveChanges, Entities : {studentList.Count}, Time Taken : {SaveChangesStopwatch.ElapsedMilliseconds} MS");
            }

            using (EF_Demo_DBEntities context2 = new EF_Demo_DBEntities())
            {
                // BulkInsert
                BulkInsertStopwatch.Start();
                context2.BulkInsert(studentList, options => options.AutoMapOutputDirection = false); // performance can be improved with options
                BulkInsertStopwatch.Stop();
                Console.WriteLine($"BulkInsert, Entities : {studentList.Count}, Time Taken : {BulkInsertStopwatch.ElapsedMilliseconds} MS");
            }

            Console.Read();
        }

        public static void FirstTimeExecution()
        {
            List<Student> studentsList = GenerateStudents(20);

            // SaveChanges
            using (var context = new EF_Demo_DBEntities())
            {
                context.Students.AddRange(studentsList);
                //Call the SaveChanges Method to INSERT the data into the database
                context.SaveChanges();
                // Delete the Newly Inserted Data
                context.BulkDelete(studentsList);
            }

            // BulkInsert
            using (var context = new EF_Demo_DBEntities())
            {
                context.BulkInsert(studentsList, options => options.AutoMapOutputDirection = false);
                // Delete the Newly Inserted Data
                context.BulkDelete(studentsList);
            }
        }

        public static List<Student> GenerateStudents(int count)
        {
            var listOfStudents = new List<Student>();

            for (int i = 0; i < count; i++)
            {
                listOfStudents.Add(new Student() { FirstName = "FirstName_" + i, LastName = "LastName_" + i, StandardId = 1 });
            }

            return listOfStudents;
        }
    }
}

Now, run the above code and you will get the following output. As you can see, the SaveChanges method takes 490 milliseconds to insert 1000 entities into the database, while the BulkInsert extension method takes only 20 milliseconds to insert the same 1000 entities. You can see how costly the SaveChanges approach becomes when performance matters.

Note: Many factors affect these timings, such as indexes, column types, latency, constraints, etc.

Why is the BulkInsert extension method faster than SaveChanges?

Inserting 1000 entities for an initial load or a file import is a typical scenario. The SaveChanges method makes handling this situation very slow, because of the number of database round-trips required: SaveChanges performs one round trip for each entity being inserted. So if 10,000 entities need to be inserted, 10,000 database round-trips will be performed.

BulkInsert, by contrast, requires as few database round-trips as possible. Under SQL Server, for example, it uses SqlBulkCopy behind the scenes, which is the fastest available way to insert 10,000 entities.
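For reference, this is a rough sketch of what a single-round-trip insert looks like at the ADO.NET level, using SqlBulkCopy directly (the table name and the shape of the DataTable are assumptions based on the Student entity used above):

```csharp
using System.Data;
using System.Data.SqlClient;

public static void BulkCopyStudents(string connectionString, DataTable studentRows)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "[dbo].[Student]";
            // One streaming operation instead of one INSERT statement per row
            bulkCopy.WriteToServer(studentRows);
        }
    }
}
```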