.NET sub-database and table high performance: waterfall paging

Framework introduction

According to the usual practice, we will first introduce the protagonist of this issue: ShardingCore, a high-performance, lightweight solution for reading and writing separation of sub-tables and sub-databases under ef-core, with zero dependencies, zero learning costs, and zero business code hacking

The only fully-automated table-splitting and multi-field table-splitting framework under dotnet has high performance, zero dependencies, zero learning costs, zero business code intrusion, and supports read-write separation and dynamic table-sharding and database-splitting. The same route can be completely automated Defined Nova component framework.

Project Address

github: https://github.com/dotnetcore/sharding-core

gitee: https://gitee.com/dotnetchina/sharding-core

Background

Under the large amount of data, there is a large amount of data, and the front-end needs to display it in the form of a waterfall. The simplest thing we can do is to take the articles published by users as an example. Suppose we have a large number of articles. Posts are required to be displayed to users in reverse order of post release time, so on the mobile phone, we usually display them in the form of pull-down to refresh and pull-up to load, so we generally have the following centralized writing method.

General pagination

select count(*) from article
select * from article order by publish_time desc limit 0,20

This operation is generally our conventional paging operation. First perform the total and then obtain the paging. The advantage of this method is that it supports pagination of any rule. The disadvantage is that it needs to be queried twice, one count and one limit. Of course, the amount of data in the later stage is too large. The first count is required, but there is also a problem that if the amount of data keeps changing, there will be some data from the previous page in the next page, because the data is constantly being added, and your page has not kept up with the release speed. This will be sent.

waterfall pagination

In addition to the above conventional paging operations, we can also perform specific paging methods for specific order paging to achieve high performance, because based on the major premise we are a waterfall flow under a large number, our article assumes that the snowflake id is used as the primary key, then Our pagination can be written like this

select * from article where id<last_id order by publish_time desc limit 0,20

First of all, let’s analyze it. This statement is realized by using the order of the inserted data distribution and the sorting you need to query, and because the id will not be repeated and the order and time of the snowflake id are consistent and in the same direction. This method can be used for sorting. The limit does not need to skip any number each time, and can directly obtain the required number. It only needs to pass the id of the last query result. This method makes up for the above-mentioned conventional pagination. problem, and has very high performance, but the disadvantages are also obvious, does not support page jumping, does not support arbitrary sorting, so this method is currently very suitable for waterfall sorting of front-end apps.

Realization under Fragmentation

First of all, this function needs to be realized under sharding. We need id to support sharding, and publish_time is divided into tables according to time. Both are indispensable.

principle

Assuming that the article table article we use publish_time as the fragmentation field, assuming that the table is divided by day, then we will have the following table

article_20220101, article_20220102, article_20220103, article_20220104, article_20220105, article_20220106…

Snowflake id auxiliary sharding

Because Snowflake id can reverse analyze the time, so we have =, >=, > for Snowflake id, The operation of <=, <, contains can be used for auxiliary sharding to reduce the scope of sharding. Suppose our snowflake id is parsed out to be 2021-01-05 11:11:11, then the < less than operation for this Snowflake id can be equivalent to x< 2021-01-05 11:11:11, if I ask you which tables we need to query, it is obvious [article_20220101, article_20220102, article_20220103, article_20220104, article_20220105], we all need Inquire.

union all sharding mode

If you use the fragmentation mode of union all, then usually all the tables of 20220101-20220105 will be union all and then mechanically filtered, then the advantages can be imagined: simple, the number of connections consumes only 1, many sql statements are supported, and the disadvantages It is also obvious that optimization is a big problem in the later stage, and there is a problem with the use of cross-library

select * from
(select * from article_20220101 union all select * from article_20220102 union all select * from article_20220103....) t
 where id<last_id order by publish_time desc limit 0,20

Stream fragmentation, sequential query

If you are aggregated in streaming sharding mode, usually we will query all the tables of 20220101-20220105 in parallel and separately, and then sort the result set of each query and obtain them after sorting the priority queue. Advantages: the statement is simple and easy to optimize , controllable performance, support sub-library, disadvantages: complex implementation, high consumption of connections

select * from article_20220101 where id<last_id order by publish_time desc limit 0,20
select * from article_20220102 where id<last_id order by publish_time desc limit 0,20
select * from article_20220103 where id<last_id order by publish_time desc limit 0,20
?…

Optimization under streaming sharding

Currently ShardingCore uses streaming aggregation + union all, if and only when the user manually calls UseUnionAllMerge, the shard SQL will be converted into union all aggregation.

This is how the pagination ShardingCore for the above waterfall stream works

Determine the order of the fragment table, that is, because the fragment field is publish_time, and because the sort field is publish_time, the fragment table is actually in order, that is, [article_20220105 , article_20220104, article_20220103, article_20220102, article_20220101], because we are opening n concurrent threads, this sorting may not make sense, but if we only open and set a single connection concurrency, the program will now pass id screen the table, and then obtain it from large to small until the skip + take is satisfied, that is, 0 + 20 = 20 pieces of data, then directly discard the remaining query and return the result, then this query is basically the same as the single table query , because basically a maximum of two tables can basically meet the requirements (the specific scenario is not necessarily)



Explanation: Assuming that last_id de-analyzes the result is 2022-01-04 05:05:05, then article_20220105 can be basically ruled out, and if the number of concurrent connections is 1, then directly Query article_20220104, if not satisfied, continue to query article_20220103 until the query result is 20. If the number of concurrent connections is 2, then query [article_20220104, article_20220103] If not satisfied, continue with the following two tables until the result is 20 pieces of data, so we can clearly understand its working principle and optimize it


Description


Through the above optimization, the high performance O(1) of streaming aggregation query under sequential query can be guaranteed


Through the above optimization, the client fragmentation can be guaranteed to have the control of the minimum number of connections


Setting a reasonable primary key can effectively solve our performance optimization under big data sharding


Practice
ShardingCore is currently continuously optimizing sharding queries and as much as possible without business code intrusion to achieve high-performance sharding query aggregation.
Next, I will show you the only fully automatic routing, multi-field fragmentation, code-free intrusion, high-performance sequential query framework fragmentation function in the traditional database field under dotnet , If you have used it I believe you will fall in love with him.
The first step to install dependencies
# ShardingCore core framework version 6.4.2.4 +
PM> Install-Package ShardingCore
# The database driver here chooses the community driver of mysql, the latest version of efcore6 is enough
PM> Install-Package Pomelo.EntityFrameworkCore.MySql
The second step is to add objects and context
Many friends have asked me if I must use fluentapi to use ShardingCore, it’s just a personal preference, and here I use dbset + attribute to achieve
//article table
[Table(nameof(Article))]
public class Article
{
    [MaxLength(128)]
    [Key]
    public string Id { get; set; }
    [MaxLength(128)]
    [Required]
    public string Title { get; set; }
    [MaxLength(256)]
    [Required]
    public string Content { get; set; }
    
    public DateTime PublishTime { get; set; }
}
context
public class MyDbContext:AbstractShardingDbContext, IShardingTableDbContext
{
    public MyDbContext(DbContextOptions<MyDbContext> options) : base(options)
    {
Add methods that will cause the model of efcore to be loaded in advance, such as Database.xxxx
    }

    public IRouteTail RouteTail { get; set; }
    
    public DbSet<Article> Articles { get; set; }
}
The third step is to add article routing
public class ArticleRoute:AbstractSimpleShardingDayKeyDateTimeVirtualTableRoute<Article>
{
    public override void Configure(EntityMetadataTableBuilder<Article> builder)
    {
        builder.ShardingProperty(o => o.PublishTime);
    }

    public override bool AutoCreateTableByTime()
    {
        return true;
    }

    public override DateTime GetBeginTime()
    {
        return new DateTime(2022, 3, 1);
    }
}
So far basically Article has supported the score table by day
The fourth step is to add query configuration to let the framework know that we are sequentially splitting tables and defining the order of splitting tables
public class TailDayReverseComparer : IComparer<string>
{
    public int Compare(string? x, string? y)
    {
        //The program uses positive order by default, that is, it is sorted in positive order by time. We need to use reverse order, so we can directly call the original comparator and multiply by negative one.
        return Comparer<string>.Default.Compare(x, y) * -1;
    }
}
//The review condition that the current query satisfies must be the query of a single shard object, which can be joined to a common non-shard table
public class ArticleEntityQueryConfiguration: IEntityQueryConfiguration<Article>
{
    public void Configure(EntityQueryBuilder<Article> builder)
    {
        / / Set the default framework for the sort order of Article, here is the reverse order
        builder.ShardingTailComparer(new TailDayReverseComparer());
        The following settings have the same effect as the above, so that the framework really uses reverse order for Article suffix sorting
        //builder.ShardingTailComparer(Comparer<string>.Default, false);
        
        // Briefly explain the meaning of the following configuration
        //Which attribute of the first parameter table name Article is sorted in order is the same as Tail sorted by day. PublishTime is used here
        //The second parameter indicates whether the attribute PublishTime asc is consistent with the ShardingTailComparer configured above, and true means it is consistent. Obviously, this is the opposite because the tail sorting is reversed by default.
        //The third parameter indicates whether it is an Article attribute, and it is also possible to set the same name here, because considering the select of the anonymous object
        builder.AddOrder(o => o.PublishTime, false,SeqOrderMatchEnum.Owner|SeqOrderMatchEnum.Named);
        //The id used here for demonstration is a simple time format, so it is the same as the time configuration
        builder.AddOrder(o => o.Id, false,SeqOrderMatchEnum.Owner|SeqOrderMatchEnum.Named);
        //Set here what kind of sorting method to use if the query does not have the order with the above configuration by default
        //The first parameter indicates whether it is the same as the configuration of ShardingTailComparer. The current configuration is in reverse order, that is, the query starts from the latest time. If it is false, the query starts from the earliest time
        //The fuse is configured later, that is, the fuse condition is reviewed, such as FirstOrDefault, which can be blown only if one is satisfied
        builder.AddDefaultSequenceQueryTrip(true, CircuitBreakerMethodNameEnum.Enumerator, CircuitBreakerMethodNameEnum.FirstOrDefault);

        //The configuration here is the limit on the number of connections opened by default when using the sequential query configuration. The startup can set a default number of threads that is the current cpu at the beginning. This is optimized to only need one thread. Of course, if Cross-table, then it is serial execution
        builder.AddConnectionsLimit(1, LimitMethodNameEnum.Enumerator, LimitMethodNameEnum.FirstOrDefault);
    }
}
Step 5 Add configuration to route
public class ArticleRoute:AbstractSimpleShardingDayKeyDateTimeVirtualTableRoute<Article>
{
    //Omit.....
    public override IEntityQueryConfiguration<Article> CreateEntityQueryConfiguration()
    {
        return new ArticleEntityQueryConfiguration();
    }
}
Step 6 startup configuration
var builder = WebApplication. CreateBuilder(args);

// Add services to the container.
ILoggerFactory efLogger = LoggerFactory. Create(builder =>
{
    builder.AddFilter((category, level) => category == DbLoggerCategory.Database.Command.Name & amp; & amp; level == LogLevel.Information).AddConsole();
});
builder.Services.AddControllers();
builder.Services.AddShardingDbContext<MyDbContext>()
    .AddEntityConfig(o =>
    {
        o.CreateShardingTableOnStart = true;
        o.EnsureCreatedWithOutShardingTable = true;
        o.AddShardingTableRoute<ArticleRoute>();
    })
    .AddConfig(o =>
    {
        o.ConfigId = "c1";
        o.UseShardingQuery((conStr, b) =>
        {
            b.UseMySql(conStr, new MySqlServerVersion(new Version())).UseLoggerFactory(efLogger);
        });
        o.UseShardingTransaction((conn, b) =>
        {
            b.UseMySql(conn, new MySqlServerVersion(new Version())).UseLoggerFactory(efLogger);
        });
        o.AddDefaultDataSource("ds0", "server=127.0.0.1;port=3306;database=ShardingWaterfallDB;userid=root;password=root;");
        o.ReplaceTableEnsureManager(sp => new MySqlTableEnsureManager<MyDbContext>());
    }).EnsureConfig();

var app = builder. Build();

app.Services.GetRequiredService<IShardingBootstrapper>().Start();
using (var scope = app. Services. CreateScope())
{
    var myDbContext = scope. ServiceProvider. GetRequiredService<MyDbContext>();
    if (!myDbContext. Articles. Any())
    {
        List<Article> articles = new List<Article>();
        var beginTime = new DateTime(2022, 3, 1, 1, 1,1);
        for (int i = 0; i < 70; i ++ )
        {
            var article = new Article();
            article.Id = beginTime.ToString("yyyyMMddHHmmss");
            article.Title = "Title" + i;
            article.Content = "Content" + i;
            article.PublishTime = beginTime;
            articles. Add(article);
            beginTime = beginTime.AddHours(2).AddMinutes(3).AddSeconds(4);
        }
        myDbContext. AddRange(articles);
        myDbContext. SaveChanges();
    }
}
app. MapControllers();

app.Run();
The seventh step is to write the query expression
public async Task<IActionResult> Waterfall([FromQuery] string lastId,[FromQuery]int take)
{
    Console.WriteLine($"-----------start query, lastId:[{lastId}],take:[{take}]-----------");
    var list = await _myDbContext.Articles.WhereIf(o => String.Compare(o.Id, lastId) < 0,!string.IsNullOrWhiteSpace(lastId)).Take(take)..OrderByDescending(o => o.PublishTime) ToListAsync();
    return Ok(list);
}
Run the program 
Because the 07 table does not exist, this query will query the 07 and 06 tables, and then we will perform the next pagination and pass in the last id
 
Because there is no rule for fragment routing of Article.Id, there is no way to filter the id, then we configure the fragmentation rules of Id
First write code for ArticleRoute
public class ArticleRoute:AbstractSimpleShardingDayKeyDateTimeVirtualTableRoute<Article>
{
    public override void Configure(EntityMetadataTableBuilder<Article> builder)
    {
        builder.ShardingProperty(o => o.PublishTime);
        builder.ShardingExtraProperty(o => o.Id);
    }

    public override bool AutoCreateTableByTime()
    {
        return true;
    }

    public override DateTime GetBeginTime()
    {
        return new DateTime(2022, 3, 1);
    }

    public override IEntityQueryConfiguration<Article> CreateEntityQueryConfiguration()
    {
        return new ArticleEntityQueryConfiguration();
    }

    public override Expression<Func<string, bool>> GetExtraRouteFilter(object shardingKey, ShardingOperatorEnum shardingOperator, string shardingPropertyName)
    {
        switch (shardingPropertyName)
        {
            case nameof(Article.Id): return GetArticleIdRouteFilter(shardingKey, shardingOperator);
        }

      return base.GetExtraRouteFilter(shardingKey, shardingOperator, shardingPropertyName);
    }
    /// <summary>
    /// The route of the article id
    /// </summary>
    /// <param name="shardingKey"></param>
    /// <param name="shardingOperator"></param>
    /// <returns></returns>
    private Expression<Func<string, bool>> GetArticleIdRouteFilter(object shardingKey,
        ShardingOperatorEnum shardingOperator)
    {
        //Convert the sub-table field into an order number
        var id = shardingKey?.ToString()  string.Empty;
        //Determine whether the order number is in the format we meet
        if (!CheckArticleId(id, out var orderTime))
        {
            //If the format is different, return false directly, then this query will not go through any routing because it is linked by and, which can effectively prevent malicious attacks
            return tail => false;
        }

        //Tail at the current time
        var currentTail = TimeFormatToTail(orderTime);
        //Because the table is divided by month, get the time of the next month to judge whether the id is created at the critical point
        //var nextMonthFirstDay = ShardingCoreHelper.GetNextMonthFirstDay(DateTime.Now);//This is wrong
        var nextMonthFirstDay = ShardingCoreHelper. GetNextMonthFirstDay(orderTime);
        if (orderTime. AddSeconds(10) > nextMonthFirstDay)
        {
            var nextTail = TimeFormatToTail(nextMonthFirstDay);
            return DoArticleIdFilter(shardingOperator, orderTime, currentTail, nextTail);
        }
        //Because the table is divided by month, get the time at the beginning of this month to judge whether the id is created at the critical point
        //if (orderTime.AddSeconds(-10) < ShardingCoreHelper.GetCurrentMonthFirstDay(DateTime.Now))//This is wrong
        if (orderTime.AddSeconds(-10) < ShardingCoreHelper.GetCurrentMonthFirstDay(orderTime))
        {
            //Last month tail
            var previewTail = TimeFormatToTail(orderTime. AddSeconds(-10));

            return DoArticleIdFilter(shardingOperator, orderTime, previewTail, currentTail);
        }

        return DoArticleIdFilter(shardingOperator, orderTime, currentTail, currentTail);

    }

    private Expression<Func<string, bool>> DoArticleIdFilter(ShardingOperatorEnum shardingOperator, DateTime shardingKey, string minTail, string maxTail)
    {
        switch (shardingOperator)
        {
            case ShardingOperatorEnum. GreaterThan:
            case ShardingOperatorEnum. GreaterThanOrEqual:
                {
                    return tail => String.Compare(tail, minTail, StringComparison.Ordinal) >= 0;
                }

            case ShardingOperatorEnum. LessThan:
                {
                    var currentMonth = ShardingCoreHelper. GetCurrentMonthFirstDay(shardingKey);
                    //At critical value o=>o.time < [2021-01-01 00:00:00] tail 20210101 should not be returned
                    if (currentMonth == shardingKey)
                        return tail => String.Compare(tail, maxTail, StringComparison.Ordinal) < 0;
                    return tail => String.Compare(tail, maxTail, StringComparison.Ordinal) <= 0;
                }
            case ShardingOperatorEnum.LessThanOrEqual:
                return tail => String.Compare(tail, maxTail, StringComparison.Ordinal) <= 0;
            case ShardingOperatorEnum.Equal:
                {
                    var isSame = minTail == maxTail;
                    if (isSame)
                    {
                        return tail => tail == minTail;
                    }
                    else
                    {
                        return tail => tail == minTail || tail == maxTail;
                    }
                }
            default:
                {
                    return tail => true;
                }
        }
    }

    private bool CheckArticleId(string orderNo, out DateTime orderTime)
    {
        //yyyyMMddHHmmss
        if (orderNo. Length == 14)
        {
            if (DateTime.TryParseExact(orderNo, "yyyyMMddHHmmss", CultureInfo.InvariantCulture,
                    DateTimeStyles. None, out var parseDateTime))
            {
                orderTime = parseDateTime;
                return true;
            }
        }

        orderTime = DateTime. MinValue;
        return false;
    }
}
Complete routing: Multi-field fragmentation for Id and support greater than less than sorting
The above is the optimization of multi-field sharding. For more details, you can click here for the blog. You have to look at the sub-table and sub-database solution under .Net-multi-field sharding
Then we continue to query to see the results
 
The same goes for the third page
 

DEMO:https://github.com/xuejmnet/ShardingWaterfallApp

Summary
Although the current framework is a very young framework, I believe that my performance optimization in the field of sharding should not find the second one under all the existing frameworks of .net, and the whole framework also supports union all aggregation, which can It satisfies the query of special statements listed in group + first, and has high performance. It is not only a fully automatic sharding but also a high-performance framework with many features and performance. The goal is to squeeze out the last performance of client sharding. .
at last
As a dotnet programmer, I believe that our previous shard selection scheme did not have a good shard selection except for mycat and shardingsphere-proxy, but I believe that through Analysis of the principle of ShardingCore, you can not only understand the knowledge points of sharding under big data, but also participate in it or implement one by yourself. I believe that only when you understand the principle of sharding, dotnet will have more For good talents and future, we need not only elegant packaging, but also an understanding of the principles.
I believe that the ecology of dotnet will gradually improve in the future with this almost perfect grammar

Reposted from: Xue Jiaming
Link: cnblogs.com/xuejiaming/p/15966501.html

-

Technical group: Add Xiaobian WeChat and comment into the group
Editor WeChat: mm1552923
Public number: dotNet Programming Daquan