677.Database EvaluationCloudwave 4.0 Standalone VS Starrocks 3.0 Standalone

1. Test purpose

  • In this article, Seven Mirror will conduct a performance comparison test on two MPP databases based on the Star Schema Benchmark standard test set to evaluate their performance in different scenarios.
  • Seven Mirrors will use some common performance indicators, such as response time, maximum CPU usage, etc., to compare the pros and cons of different databases.
  • The test goal of Seven Mirrors is to provide readers with a reference to help them choose a database that suits their needs.

2. Test environment

  • In this article, Seven Mirrors will use the following databases for testing:
    • Cloudwave 4.0: A domestic cloud-native data warehouse with top-level performance, currently known as the first domestic snowflake-like storage-computing separation architecture database (reformed from the bottom layer of traditional distributed databases accumulated over the past 10 years, supporting complete storage-computing separation, 2021 published in September).
    • Starrocks 3.0: A new generation of extremely fast full-scenario MPP (Massively Parallel Processing) database.
  • The test environment of Qijing is as follows:
    • Hardware configuration: CPU 64 cores, memory 256GB, 1TB capacity Alibaba Cloud ESSD pl1 cloud disk
    • Operating system: Linux CentOS 7.6
    • Test tools: test scripts, analysis scripts, sql scripts (see attachment)
    • Test data: The data generated using the SSB benchmark test set, the total amount is about 30GB, 100GB

3. Test plan

  • Seven Mirrors will execute SSB 13 standard test SQLs, execute them on different databases respectively, and compare their response time and CPU resource consumption.
  • Seven Mirrors will be tested according to the following steps:
    • Step 1: Prepare data. Seven Mirrors will use the SSB tool to generate data and import it into different databases.
    • Step 2: Design test scripts. Seven Mirrors will design corresponding test scripts according to the characteristics of different databases.
    • Step 3: Execute the test. Seven Mirrors will execute beta scripts and perform tests on different databases.
    • Step 4: Analyze the results. Seven Mirrors will collect and organize the results, and compare the performance indicators of different databases.

4. Test results

  • In this section, Seven Mirror will present and analyze the test results of Seven Mirror. Seven Mirrors will use tables and charts to show the performance indicators of different databases in different scenarios.
  • Table 1: Query performance test results
database dataset Response time (ms) CPU maximum usage
Cloudwave 4.0 ssb30 779 1453%/ 6400%
Starrocks 3.0 ssb30 1182 3291%/6400%
Cloudwave 4.0 ssb100 1767 4143%/6400%
Starrocks 3.0 ssb100 3054 4462%/6400%
  • Figure 1: Query performance test results

  • Analysis: The overall performance of the Cloudwave 4.0 stand-alone version is nearly 0.5 times better than that of the Starrocks 3.0 stand-alone version under the 30g Star Schema Benchmark standard test set
  • Figure 2: Query performance test results

  • Analysis: The overall performance of the Cloudwave 4.0 stand-alone version is nearly 0.7 times better than that of the Starrocks 3.0 stand-alone version under the 100g Star Schema Benchmark standard test set

5. Summary and suggestions

  • In this article, Seven Mirrors conducted a performance comparison test on Cloudwave and Starrocks databases, and got some valuable conclusions.
  • Overall, Cloudwave outperforms Starrocks on the SSB standard test set
  • According to the test results of Qijing, Qijing gives the following suggestions:
    • For scenarios that pursue extreme performance, especially olap multidimensional analysis scenarios (Since Star Schema Benchmark is a star schema test set widely used in academia and industry, this test set can easily compare the basic performance indicators of various OLAP products ), it is recommended to use the Cloudwave database, because the performance of the Cloudwave database in this area is really excellent;
    • For regular scenarios, it is recommended to use the Starrocks database, because the Starrocks database officially claims to be extremely fast in all scenarios, and there are also use cases for different scenarios;
  • Of course, the test of Seven Mirrors is not perfect, and there are still some limitations and deficiencies. For example:
    • Seven Mirrors only used one data set and one tool for testing, which may not cover all possible situations.
    • Seven Mirrors only considered some common performance indicators, and did not involve other aspects, such as security, usability, scalability, etc.
    • Seven Mirrors did not take into account issues such as compatibility and migration costs between different databases.
  • Therefore, when actually selecting a database, you need to make comprehensive consideration and evaluation according to your specific needs and scenarios.

6. Additional

  1. Cloudwave test script
#!/bin/bash
# Program:
# test ssb
# History:
# 2023/03/17 [email protected] version:0.0.1

rm -rf ./n*txt
for ((i=1; i<20; i ++ ))
do

    cat sql_ssb.sql |./cplus.sh > n${i}.txt

done

  1. Starrocks test script
#!/bin/bash
# Program:
# test ssb
# History:
# 2023/03/17 [email protected] version:0.0.1

rm -rf ./n*txt
for ((i=1; i<20; i ++ ))
do

    cat sql_ssb.sql | mysql -uroot -P 9030 -h 127.0.0.1 -v -vv -vvv >n${i}.txt

done

  1. analysis script
#!/bin/bash
#Program:
# analysis cloudwave/starrocks logs of base compute
#History:
#2023/02/20 [email protected] version:0.0.1

path=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:~/bin
export path



suff="(s)#####"

if [ -z "${1}" ]
then

        echo "Please input database'name"
        exit -1

the fi

if [ -z "$2" ]
then

        echo "Please input times of scanner"
        exit -f
the fi

if [ -n "${3}" ]
then
        suff=${3}
the fi

for current in ${2}
do
        result_time=""

        if [ "${1}" == "starrocks" ]
        then
            for time in $( cat ${current} | grep sec | awk -F '(' '{print $2}' | awk -F ' ' '{print $1}' )
            do
                result_time="${result_time}${time}${suff}"
            done
        elif [ "${1}" == "cloudwave" ]
        then
            for time in $( cat ${current} | grep Elapsed | awk '{print $2}'| sed 's/:/*60 + /g'| sed 's/ + 00\*60 //g ; s/ + 0\*60//g ; s/^0\*60 + //g' )
            do
                result_time="${result_time}${time}${suff}"
            done
        the fi

        echo ${result_time%${suff}*}

done


exit 0
  1. sql_ssb.sql
use ssb100;
select sum(lo_revenue) as revenue from lineorder, dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
select sum(lo_revenue) as revenue from lineorder, dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35;
select sum(lo_revenue) as revenue from lineorder, dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERIC A' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#22 28' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand;
select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE 'group by d_year, p_brand order by d_year, p_brand;
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'AS IA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES\ ' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city= 'UNITED KI5 ') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city= 'UNITED KI5 ') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc;
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_part key and c_region = 'AMERICA 'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category;
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA 'and s_nation = 'UNITED STATES' and (d_year = 1997 or d_year = 1998) and p_category = 'MFGR#14' group by d_year, s_city, p_brand order by d_year, s_city, p_brand;

Seven Mirror will also bring Cloudwave 4.0 cluster version VS Starrocks 3.0 cluster version on the 1T SSB data set evaluation.

Please add a picture description

Additional: If you want to experience Cloudwave 4.0 (or encounter installation and usage problems), you can reply on the official account: [Hanyun Database, 4.0] or [cloudwave4.0];

If you want to experience Starrocks3.0 (or encounter problems with installation and use), you can reply on the official account: [starrocks3.0]