Removing abnormal data from Arduino measurements – Wright and Grubbs criteria

Contents

  • Wright criterion
    • Introduction
  • Grubbs criterion
    • Introduction
  • Arduino code implementation
  • References

Wright Criterion

Introduction

Wright's criterion (the 3σ rule) is a method for identifying outliers in normally distributed data. It works as follows.
Suppose that, in a series of equal-precision measurements, the $i$-th measured value is $x_i$ and the mean of all values is $\bar{x}$. If the residual $v_i = x_i - \bar{x}$ with the largest absolute value satisfies

$$|v_i|_{max} > 3\sigma$$

the error is a gross error, and the corresponding measurement $x_i$ is an abnormal value and should be discarded.

where the standard deviation $\sigma$ is estimated by Bessel's formula:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} v_i^2}$$

Wright's criterion is easy to use and is best suited to a large number of measurements $n$ ($n > 10$). With $n \le 10$ it can never reject a value, because no residual can exceed $(n-1)\sigma/\sqrt{n}$, which is less than $3\sigma$ for $n \le 10$.
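As an illustration, one pass of the 3σ test might be sketched in standard C++ as follows. The names `besselSigma` and `wrightOutliers` are hypothetical helpers, not part of the article's routine, and `std::vector` is assumed for brevity on a desktop compiler; on a memory-constrained board fixed arrays would be the more likely choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of the values; returns 0 for an empty input.
double meanOf(const std::vector<double>& x) {
    if (x.empty()) return 0.0;
    double sum = 0.0;
    for (double xi : x) sum += xi;
    return sum / x.size();
}

// Sample standard deviation by Bessel's formula (divisor n - 1).
// Returns 0 when fewer than two values are given.
double besselSigma(const std::vector<double>& x) {
    const std::size_t n = x.size();
    if (n < 2) return 0.0;
    const double mean = meanOf(x);
    double ss = 0.0;  // sum of squared residuals
    for (double xi : x) ss += (xi - mean) * (xi - mean);
    return std::sqrt(ss / (n - 1));
}

// One pass of the Wright (3-sigma) test:
// returns the indices i whose residual satisfies |v_i| > 3 * sigma.
std::vector<std::size_t> wrightOutliers(const std::vector<double>& x) {
    std::vector<std::size_t> flagged;
    const double sigma = besselSigma(x);
    if (sigma == 0.0) return flagged;  // all values equal, or too few values
    const double mean = meanOf(x);
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::fabs(x[i] - mean) > 3.0 * sigma) flagged.push_back(i);
    }
    return flagged;
}
```

With eleven readings of 10.0 and one of 20.0, the sketch flags only the last index; with nine readings of 10.0 and one of 20.0 it flags nothing, which matches the $n > 10$ requirement.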

Grubbs Criterion

Introduction

The Grubbs criterion is a method for identifying outliers in normal or near-normal samples when the population standard deviation is unknown.
If the residual of a measurement satisfies

$$|v_i| = |x_i - \bar{x}| > T_0(n,\alpha)\,\sigma$$

the value is judged to contain a gross error and should be eliminated. Here $\sigma$ is the standard deviation estimated by Bessel's formula.

The critical value $T_0(n,\alpha)$ depends on both the number of repeated measurements $n$ and the confidence probability $\alpha$, which makes the Grubbs criterion a better judgment criterion than a fixed threshold.

The value of $T_0(n,\alpha)$ is obtained by looking it up in a table.

The Grubbs criterion is theoretically more rigorous and has a clear probabilistic meaning, so it can be used in situations with strict requirements, for sample sizes of $20 < n < 100$.

Appendix: table of $T_0(n,\alpha)$ (the 95% and 99% values are reproduced in the g95 and g99 arrays of the code below).
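The table lookup could be sketched in standard C++ along these lines. The values are copied from the `g95` and `g99` arrays of this article's Arduino routine, whose layout corresponds to n = 3 to 25 followed by n = 30, 35, 40, 50 and 100; the function name `grubbsT0` and the `confidence99` flag are hypothetical. Sample sizes that fall between tabulated rows use the next lower row, mirroring the routine, and returning 0 for n < 3 is an assumption made here because the table has no such row.

```cpp
// Critical values T0(n, alpha): indices 0..22 hold n = 3..25,
// indices 23..27 hold n = 30, 35, 40, 50, 100.
static const double kG95[28] = {
    1.15, 1.46, 1.67, 1.82, 1.94, 2.03, 2.11, 2.18, 2.23, 2.29,
    2.33, 2.37, 2.41, 2.44, 2.47, 2.50, 2.53, 2.56, 2.58, 2.60,
    2.62, 2.64, 2.66, 2.74, 2.81, 2.87, 2.96, 3.17};
static const double kG99[28] = {
    1.16, 1.49, 1.75, 1.94, 2.10, 2.22, 2.32, 2.41, 2.48, 2.55,
    2.61, 2.66, 2.71, 2.75, 2.79, 2.82, 2.85, 2.88, 2.91, 2.94,
    2.96, 2.99, 3.01, 3.10, 3.18, 3.24, 3.34, 3.58};

// Returns T0(n, alpha) for the 99% table when confidence99 is true,
// otherwise for the 95% table. Returns 0 when n < 3 (no table entry).
double grubbsT0(int n, bool confidence99) {
    const double* t = confidence99 ? kG99 : kG95;
    if (n < 3) return 0.0;
    if (n >= 100) return t[27];
    if (n >= 50) return t[26];
    if (n >= 40) return t[25];
    if (n >= 35) return t[24];
    if (n >= 30) return t[23];
    if (n >= 25) return t[22];
    return t[n - 3];  // n = 3..24 map directly to indices 0..21
}
```

A measurement would then be rejected when its absolute residual exceeds `grubbsT0(n, confidence99)` multiplied by the Bessel standard deviation.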

Arduino code implementation

//Error data elimination routine; returns the average of the valid data
//data: on input, the raw measurements; on return, its first (datanum - badnum) elements are the valid data
//baddata: output only; receives the removed values (must have room for datanum elements)
//datanum: input; the number of raw measurements
//badnum: output only; the number of removed values
//rule: criterion selection. 3 is the Wright criterion, 4 is Grubbs 95%, 5 is Grubbs 99%; a value below 3 is used directly as a custom coefficient; a value above 5 falls back to the Wright criterion
//Requires sqrt() and fabs() from <math.h>, which Arduino.h already includes
double Detection(double data[], double baddata[], int datanum, int &badnum, int rule)
{
    double data_b[datanum]; // Temporary storage for retained data (variable-length array, a GCC extension)
    double v[datanum]; // Residuals
    // Grubbs critical values for n = 3..25, then n = 30, 35, 40, 50, 100
    double g95[] = {1.15, 1.46, 1.67, 1.82, 1.94, 2.03, 2.11, 2.18, 2.23, 2.29, 2.33, 2.37, 2.41, 2.44, 2.47, 2.50, 2.53, 2.56, 2.58, 2.60, 2.62, 2.64, 2.66, 2.74, 2.81, 2.87, 2.96, 3.17}; // Grubbs 95%
    double g99[] = {1.16, 1.49, 1.75, 1.94, 2.10, 2.22, 2.32, 2.41, 2.48, 2.55, 2.61, 2.66, 2.71, 2.75, 2.79, 2.82, 2.85, 2.88, 2.91, 2.94, 2.96, 2.99, 3.01, 3.10, 3.18, 3.24, 3.34, 3.58}; // Grubbs 99%
    double bsl; // Standard deviation from Bessel's formula
    double maxdev; // Rejection threshold: criterion coefficient times standard deviation
    double sum; // Accumulator
    double average; // Average
    int badindex; // Number of values removed in the current pass
    int validNum = 0; // Number of valid values
    int proindex = 0; // Number of passes
    double lg = 3; // Coefficient of the Wright or Grubbs criterion
    int i;
    if (rule <= 3) // Rule of 3 or less: use the Wright coefficient 3 or the custom value directly
        lg = rule;
    else if (rule > 5) // Rule above 5: force the Wright criterion
        lg = 3;
    badnum = 0; // Initialize the number of bad values
    // Loop until five or fewer valid values remain or a pass removes nothing
    while (1)
    {
        if (datanum < 3) // Too few values to test; also keeps the Grubbs table index in range
            break;
        // Select the Grubbs coefficient for the current sample size
        if (rule == 4) // Grubbs 95%
        {
            if (datanum >= 100) // At least 100 values
                lg = g95[27];
            else if (datanum >= 50)
                lg = g95[26];
            else if (datanum >= 40)
                lg = g95[25];
            else if (datanum >= 35)
                lg = g95[24];
            else if (datanum >= 30)
                lg = g95[23];
            else if (datanum >= 25) // 25 to 29 values
                lg = g95[22];
            else // 3 to 24 values
                lg = g95[datanum - 3];
        }
        else if (rule == 5) // Grubbs 99%
        {
            if (datanum >= 100) // At least 100 values
                lg = g99[27];
            else if (datanum >= 50)
                lg = g99[26];
            else if (datanum >= 40)
                lg = g99[25];
            else if (datanum >= 35)
                lg = g99[24];
            else if (datanum >= 30)
                lg = g99[23];
            else if (datanum >= 25) // 25 to 29 values
                lg = g99[22];
            else // 3 to 24 values
                lg = g99[datanum - 3];
        }
        proindex++; // Update the pass counter

        sum = 0;
        for (i = 0; i < datanum; i++)
            sum += data[i];
        average = sum / datanum; // Calculate the average

        sum = 0;
        for (i = 0; i < datanum; i++)
        {
            v[i] = data[i] - average; // Calculate the residual
            sum += v[i] * v[i]; // Accumulate the sum of squared residuals
        }

        bsl = sqrt(sum / (datanum - 1)); // Standard deviation by Bessel's formula
        maxdev = lg * bsl; // Rejection threshold

        // Eliminate bad values, that is, data containing gross errors
        validNum = 0;
        badindex = 0;
        for (i = 0; i < datanum; i++)
        {
            if (fabs(v[i]) >= maxdev && maxdev != 0) // |v_i| reaches the threshold
            {
                baddata[badnum++] = data[i]; // Store x_i in the bad data array
                badindex++;
            }
            else
                data_b[validNum++] = data[i]; // Otherwise keep it temporarily in data_b
        }
        for (i = 0; i < validNum; i++) // Copy the retained values back into data
            data[i] = data_b[i];
        datanum = validNum; // The retained count becomes the new data count
        // Stop when a pass removes nothing or when five or fewer values remain
        if (badindex == 0 || datanum <= 5)
            break;
    }
    // Recompute the average over the retained data, so that it excludes values removed in the last pass
    sum = 0;
    for (i = 0; i < datanum; i++)
        sum += data[i];
    average = (datanum > 0) ? sum / datanum : 0;
    return average; // Average of the valid data
}
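
For checking the elimination logic off the board, the same loop might be sketched with `std::vector` on a desktop compiler. The function name `rejectOutliers` and its parameter `k` (the coefficient that multiplies σ, for example 3 for the Wright criterion) are hypothetical. Unlike the Arduino routine, this sketch keeps `k` fixed between passes instead of looking up a new Grubbs value as the sample shrinks.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Repeatedly removes every value whose absolute residual reaches k * sigma.
// Stops when a pass removes nothing or when five or fewer values remain.
// Removed values are appended to `bad`; the mean of the survivors is returned.
double rejectOutliers(std::vector<double>& data, std::vector<double>& bad, double k) {
    while (data.size() >= 2) {
        const std::size_t n = data.size();
        double mean = 0.0;
        for (double x : data) mean += x;
        mean /= n;

        double ss = 0.0;  // sum of squared residuals
        for (double x : data) ss += (x - mean) * (x - mean);
        const double limit = k * std::sqrt(ss / (n - 1));  // Bessel's formula

        std::vector<double> kept;
        for (double x : data) {
            if (limit != 0.0 && std::fabs(x - mean) >= limit) bad.push_back(x);
            else kept.push_back(x);
        }
        const bool removedAny = kept.size() != n;
        data.swap(kept);
        if (!removedAny || data.size() <= 5) break;
    }
    if (data.empty()) return 0.0;
    double sum = 0.0;
    for (double x : data) sum += x;
    return sum / data.size();
}
```

Calling it with eleven readings of 10.0 plus one of 20.0 and `k = 3` moves the 20.0 reading into `bad` and returns 10.0.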

References

[1] Arduino uses ultrasonic ranging module HC-SR04 to obtain accurate measurement values – elimination of error data. https://blog.csdn.net/m0_61543203/article/details/127185686
[2] Processing of Arduino measurement error data – Wright and Grubbs criteria to eliminate abnormal data. https://blog.csdn.net/m0_61543203/article/details/126780804
[3] Statistics: What is Wright's criterion? https://zhidao.baidu.com/question/144962833.html
[4] Abnormal data and deviation data processing principles. https://zhuanlan.zhihu.com/p/93855259