C# – Equality and Identity – Seems like a slightly flawed design

Concept: Equality and Identity

Before we begin, we need to clarify two concepts:

Equality: Also known as value equality, it indicates that two items are equal in value under a certain comparison rule. Equality only considers whether the values are equal. For example, if the values of two integer variables a and b are both 1, they are equal even though they are two variables.

Identity: Two items are in fact the same item. For example, suppose you take two photos of your cat, one in the bedroom and one in the living room. Although the cat may be in different poses and different places in the two photos, it is the same cat, which means the two are identical.

How equality is actually determined depends on the requirements at hand, so we generally have considerable latitude in defining it. However, an equality determination should follow these principles (the = sign denotes an equality determination):

1. Reflexivity: self = self

2. Symmetry: A=B and B=A return the same value

3. Transitivity: If A=B, B=C, then A=C

4. Consistency: If A remains unchanged and B remains unchanged, then A=B remains unchanged
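To make the principles concrete, here is a minimal sketch of a helper that spot-checks them for three sample values. The `EqualityContract.Check` name is hypothetical (not a .NET API), and a check like this can only test the values it is given; it cannot prove that a type's Equals obeys the rules in general.

```csharp
using System;

// Three strings with the same characters.
string x = "cat";
string y = new string(new[] { 'c', 'a', 't' });
string z = string.Concat("ca", "t");

Console.WriteLine(EqualityContract.Check(x, y, z)); // True

// Hypothetical helper: spot-checks the four principles on sample values only.
static class EqualityContract
{
    public static bool Check<T>(T a, T b, T c) where T : notnull
    {
        // Reflexivity: every value equals itself.
        bool reflexive = a.Equals(a) && b.Equals(b) && c.Equals(c);

        // Symmetry: swapping the operands never changes the answer.
        bool symmetric = a.Equals(b) == b.Equals(a)
                      && b.Equals(c) == c.Equals(b)
                      && a.Equals(c) == c.Equals(a);

        // Transitivity: if a = b and b = c, then a = c must hold.
        bool transitive = !(a.Equals(b) && b.Equals(c)) || a.Equals(c);

        // Consistency: nothing has changed, so asking again gives the same answer.
        bool consistent = a.Equals(b) == a.Equals(b);

        return reflexive && symmetric && transitive && consistent;
    }
}
```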

Identity determination, by contrast, only asks whether two items are the same item. In most programming languages this means checking whether two references point to the same object, which leaves us little room for maneuver (and that is reasonable).

Note also that two objects that are equal are not necessarily identical, but two references that are identical (that is, refer to the same object) must also be equal.

Equality and identity in C#

Ideally, we would need only one way to determine equality and one way to determine identity; this would both reduce the workload of class designers and reduce coding errors. Interestingly, however, C# provides a variety of commonly used ways to make these comparisons:

  • == and != operators
  • Equals method of object class
  • Equals static method of object class
  • ReferenceEquals static method of object class
  • Equals method of the IEquatable<> generic interface
  • GetHashCode method of object class
  • is operator

In C#, whether a comparison checks equality or identity is often a design choice: it depends on how the class itself is designed. For example, some language enthusiasts tend to assume that the == operator compares identity while the Equals method compares equality. However, since C# allows operator overloading and method overriding, a class designer can overload == to compare equality (C#'s string type, for example, overloads == to compare contents, which is why you can use == to check whether two strings are equal), or override Equals to compare identity (promise me, don't do this).
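The string case is easy to observe. The sketch below builds two strings with the same characters as separate objects (`new string(char[])` allocates a new string), then compares them with the overloaded `==`, with `Equals`, with `==` after casting both sides to `object`, and with `ReferenceEquals`.

```csharp
using System;

string s1 = "Cat";
// new string(char[]) allocates a separate object with the same characters.
string s2 = new string(new[] { 'C', 'a', 't' });

Console.WriteLine(s1 == s2);                       // True: string's == compares contents
Console.WriteLine(s1.Equals(s2));                  // True: equality
Console.WriteLine((object)s1 == (object)s2);       // False: object's == compares references
Console.WriteLine(object.ReferenceEquals(s1, s2)); // False: two different objects
```

So the same two values are equal but not identical, and which one == reports depends entirely on the static type of the operands.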

C# provides a variety of commonly used comparison and determination methods, which provides developers with considerable freedom, but freedom also means that if some common specifications are not followed, the design of the class will become chaotic. This article will introduce the comparison methods listed above one by one and provide some personal suggestions for use.

Starting with examples, how to determine equality and identity

Before we begin, let us first classify the judgment methods mentioned above. Here they are divided into 4 categories:

  • Equality comparison: the Equals method of object, the Equals static method of object, and the Equals method of the IEquatable<> generic interface
  • Identity comparison: the ReferenceEquals static method of object
  • Equality or identity comparison: the == operator and the != operator
  • Special comparison: the is operator and the GetHashCode method of object

4.1 Equality comparison

4.1.1 Equals method of object

(1) Basic information

The Equals method is defined in the object class, and its method is declared as follows:

public virtual bool Equals(Object? obj);

As its name suggests, the Equals method is meant to compare equality: it accepts a parameter of type Object and returns whether the two objects are equal. However, although Equals is conceptually an equality comparison, its default implementation compares identity. That is, by default it only checks whether the two references point to the same object, as shown below:

class Cat
{
    public string? CatID { get; set; }
}

Cat cat1 = new Cat();
Cat cat2 = new Cat();

Console.WriteLine(cat1.Equals(cat2)); // The output is False

Therefore, if you want to implement equality comparison correctly, you should override the Equals method.

(2) Basic use

Now suppose we want two Cat objects to be equal as long as their CatIDs are equal. The default Equals obviously cannot do this. Fortunately, Equals is declared virtual, which means derived classes can override it. And since object is the base class of all types, every custom type can override this method, as shown below:

class Cat
{
    public string? CatID { get; set; }

    public override bool Equals(object? obj)
    {
        return this.CatID == ((Cat)obj).CatID;
    }
}

Cat cat1 = new Cat();
Cat cat2 = new Cat();

Console.WriteLine(cat1.Equals(cat2)); // The output is True (both CatIDs are null, and null == null)

Of course, the above implementation lacks robustness. For example, what if a null parameter is passed in? Or the parameters cannot be converted to Cat type? Obviously the above implementation will throw an exception at this time. However, from a practical point of view, there is almost no reason for an equality method to throw an exception – either the two objects are equal or not, and throwing an exception has almost no meaning to the program flow. Therefore, the implementation of the Equals method may be more complicated than you think, but it is not too complicated:

  1. If the parameter obj is null, return false directly.
  2. If the parameter obj has the same identity as the caller, return true directly.
  3. If the type of parameter obj is inconsistent with the target type, return false directly.
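  4. Perform the remaining equality comparisons required by the business logic. If the base class itself overrides Equals with value semantics, call base.Equals as part of this step; do not call it when the base class is object, because object.Equals compares identity.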

Based on the above process, a better rewriting method should be as follows:

class Cat
{
    public string? CatID { get; set; }

    public override bool Equals(object? obj)
    {
        // If the parameter obj is null, return false directly
        if (obj == null)
        {
            return false;
        }
        // If the parameter obj has the same identity as the caller, return true directly.
        if (ReferenceEquals(this, obj))
        {
            return true;
        }
        // If the type of parameter obj is inconsistent with the target type, return false directly.
        if (this.GetType() != obj.GetType())
        {
            return false;
        }
        // As long as the CatIDs of the two objects are equal, they are considered to be equal.
        return this.CatID == ((Cat)obj).CatID;
    }
}

Although this implementation looks more complicated than the original, in fact the first three steps have nothing to do with the type itself, so they can be used universally. In addition, you may notice that ReferenceEquals is used in the above example for identity determination, which will be mentioned later.
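Step 4 matters when the base class already defines value equality. The sketch below assumes a hypothetical `Pet` base class that compares a `Name` and a `Cat` that adds its own `CatID`; `Cat.Equals` first delegates to `Pet.Equals`, which already performs steps 1 to 3 and the `Name` comparison.

```csharp
using System;

var a = new Cat { Name = "Tom", CatID = "1" };
var b = new Cat { Name = "Tom", CatID = "1" };
var c = new Cat { Name = "Kitty", CatID = "1" };
Console.WriteLine(a.Equals(b)); // True: same Name and same CatID
Console.WriteLine(a.Equals(c)); // False: Pet.Equals rejects the different Name

// Hypothetical base class with its own value equality.
class Pet
{
    public string? Name { get; set; }

    public override bool Equals(object? obj)
    {
        if (obj is null) return false;                // step 1
        if (ReferenceEquals(this, obj)) return true;  // step 2
        if (GetType() != obj.GetType()) return false; // step 3
        return Name == ((Pet)obj).Name;
    }

    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

class Cat : Pet
{
    public string? CatID { get; set; }

    public override bool Equals(object? obj)
    {
        // Steps 1 to 3 and the Name comparison are handled by Pet.Equals.
        if (!base.Equals(obj)) return false;
        // Pet.Equals already verified the runtime type, so the cast is safe.
        return CatID == ((Cat)obj!).CatID;
    }

    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), CatID);
}
```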

(3) Other questions

Note that the ValueType class overrides the Equals method: it decides whether two instances are equal by comparing the values of all their fields. In other words, for any type derived from ValueType, Equals performs an equality determination by default. For example, a struct type:

struct Point
{
    public float X;
    public float Y;
}

Point p1 = new Point();
Point p2 = new Point();
p1.Equals(p2); // True, p1 and p2 are equal

However, this does not mean that a struct never needs to override Equals. Value types are often used where performance matters, and the default ValueType implementation has to work for every possible struct, so it may fall back on reflection to read the fields, which is slow. It is therefore still worthwhile to override Equals manually, so that a specific struct compares its own fields directly and avoids unnecessary reflection.
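A minimal sketch of such an override for the Point struct above, comparing the two fields directly instead of relying on the general-purpose ValueType.Equals. Note that Equals(object) still boxes its argument when a struct is passed in; avoiding that cost as well is what IEquatable<> is for.

```csharp
using System;

Point p1 = new Point { X = 1, Y = 2 };
Point p2 = new Point { X = 1, Y = 2 };
Console.WriteLine(p1.Equals(p2)); // True

struct Point
{
    public float X;
    public float Y;

    public override bool Equals(object? obj)
    {
        // The type pattern fails for null and for any argument that is not a Point.
        if (obj is not Point other) return false;
        // Compare the fields directly; no reflection involved.
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode() => HashCode.Combine(X, Y);
}
```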

(4) Defects

In fact, “CLR via C#” points out that if Object.Equals had the following default implementation:

public virtual bool Equals(object? obj)
{
    if (obj == null) return false;
    if (ReferenceEquals(this, obj)) return true;
    if (this.GetType() != obj.GetType()) return false;
    return true;
}

Then overriding Equals in derived classes would be much more convenient. For example, almost every Equals override could be written as follows:

public override bool Equals(object? obj)
{
    if (base.Equals(obj))
    {
        // Perform the business-specific equality comparison here and return its result
    }
    return false;
}

From this perspective, the current default implementation of Equals is indeed flawed.
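Since object.Equals itself cannot be changed, one way to approximate this suggestion within your own class hierarchy is a small base class. The `EquatableBase` name below is hypothetical and not part of .NET; it simply packages the three generic steps so that derived classes only write the business comparison.

```csharp
using System;

var a = new Cat { CatID = "1" };
var b = new Cat { CatID = "1" };
Console.WriteLine(a.Equals(b));    // True
Console.WriteLine(a.Equals(null)); // False

// Hypothetical base class using the default Equals suggested in "CLR via C#".
abstract class EquatableBase
{
    public override bool Equals(object? obj)
    {
        if (obj == null) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (GetType() != obj.GetType()) return false;
        return true;
    }

    // Derived classes must supply a hash code consistent with their Equals.
    public abstract override int GetHashCode();
}

class Cat : EquatableBase
{
    public string? CatID { get; set; }

    public override bool Equals(object? obj)
    {
        // base.Equals has already handled the null, identity and type checks.
        if (base.Equals(obj))
        {
            return CatID == ((Cat)obj!).CatID;
        }
        return false;
    }

    public override int GetHashCode() => CatID?.GetHashCode() ?? 0;
}
```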

4.1.2 Equals static method of object

(1) Basic information

In the object base class, in addition to the Equals method for instances, there is also a static version of the Equals method, whose method is declared as follows:

public static bool Equals(object? objA, object? objB);

This method avoids the exception that would occur if the object on which Equals is called were null. Its final result still depends on the implementation of the instance version of Equals.

(2) Basic use

Inside an instance Equals method, the caller can never be null: calling a method through a null reference throws a NullReferenceException before the method body runs, so the instance Equals does not need to handle that case. Outside Equals, however, we sometimes do need to consider a null caller. A common approach is to check it for null before making the call:

if (a != null && a.Equals(b))
{
    // do something
}

However, the static Equals method removes the need for the explicit null check and simplifies the code:

if (Equals(a, b))
{
    // do something
}

The static Equals method has fewer use cases; it is mainly used to simplify code when the caller might be null. Note also that if both arguments passed to the static Equals method are null, it returns true.
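The behavior described above fits in a few lines. The sketch below is an illustrative re-implementation, named `StaticEquals` (a hypothetical name) so that it does not clash with object.Equals: it returns true when both references are the same (including both null), false when exactly one is null, and otherwise the result of the instance Equals.

```csharp
using System;

Console.WriteLine(StaticEquals(null, null));  // True
Console.WriteLine(StaticEquals("cat", null)); // False
Console.WriteLine(StaticEquals(null, "cat")); // False, and no NullReferenceException
Console.WriteLine(StaticEquals("cat", new string(new[] { 'c', 'a', 't' }))); // True

// Illustrative re-implementation of the logic of object.Equals(object?, object?).
static bool StaticEquals(object? objA, object? objB)
{
    // Same reference, or both null.
    if (objA == objB) return true;
    // Exactly one of them is null.
    if (objA == null || objB == null) return false;
    // Neither is null: defer to the (possibly overridden) instance Equals.
    return objA.Equals(objB);
}
```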

4.1.3 Equals method of the IEquatable<> generic interface

(1) Basic information

The IEquatable<> generic interface indicates that the implementing type supports a type-specific equality comparison. The interface is very simple: it declares a single Equals method whose parameter has the interface's generic parameter type. It is defined as follows:

public interface IEquatable<T>
{
    bool Equals(T? other);
}

Compared with the Equals method of object, this interface states more explicitly that the implementing type supports equality comparison through its Equals method. In addition, whereas object's Equals takes a parameter of type object, the parameter of IEquatable<>'s Equals has the specific type, which avoids type casts (and, for value types, boxing) and results in better performance.

(2) Basic use

The Equals method of the IEquatable<> interface should behave similarly to the Equals method of object, but now there is no need to consider type-related issues, so it can be written as follows. Similarly, here is the Cat class used in Equals of object as an example:

class Cat : IEquatable<Cat>
{
    public string? CatID { get; set; }

    public bool Equals(Cat? other)
    {
        // If the parameter other is null, return false directly.
        if (other == null)
        {
            return false;
        }
        // If the parameter other is the same as the caller, return true directly.
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        // As long as the CatIDs of the two objects are equal, they are considered to be equal.
        return this.CatID == other.CatID;
    }
}
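One practical consequence is that generic collections pick up this method automatically: List<T>.Contains and similar APIs go through EqualityComparer<T>.Default, which uses IEquatable<T>.Equals when the type implements it. The sketch below repeats the Cat class above unchanged; because it does not override object's Equals, the two comparison paths disagree, which is exactly the situation the following suggestion avoids.

```csharp
using System;
using System.Collections.Generic;

var cats = new List<Cat> { new Cat { CatID = "1" } };
var probe = new Cat { CatID = "1" };

Console.WriteLine(cats.Contains(probe));            // True: uses IEquatable<Cat>.Equals
Console.WriteLine(((object)cats[0]).Equals(probe)); // False: object.Equals is not overridden

class Cat : IEquatable<Cat>
{
    public string? CatID { get; set; }

    public bool Equals(Cat? other)
    {
        if (other == null) return false;
        if (ReferenceEquals(this, other)) return true;
        return CatID == other.CatID;
    }
}
```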

(3) Suggestions

Overriding the Equals method of object and implementing the IEquatable<> interface should go together. This is not difficult: once one is implemented, the other can delegate to it with a simple call, so the type behaves consistently whichever method a caller uses. A possible example is as follows:

class Cat : IEquatable<Cat>
{
    public string? CatID { get; set; }

    public bool Equals(Cat? other)
    {
        if (other == null)
        {
            return false;
        }
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        return this.CatID == other.CatID;
    }

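    public override bool Equals(object? obj)
    {
        return Equals(obj as Cat);
    }

    // Keep the hash code consistent with Equals (see the GetHashCode section)
    public override int GetHashCode()
    {
        return CatID?.GetHashCode() ?? 0;
    }
}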

4.2 Identity comparison

4.2.1 ReferenceEquals static method of object

(1) Basic information

Although the default implementation of the Equals method performs identity comparisons, since the Equals method can be overridden and semantically should be used for equality comparisons, the Equals method should not be relied upon to perform identity comparisons (the same goes for the == operator symbol). To perform reliable identity comparison, you should use other methods. Fortunately, there is only one common way to perform identity comparison in C#, which is the ReferenceEquals static method (although in fact, its implementation relies on the == operator), The method prototype is as follows:

public static bool ReferenceEquals(object? objA, object? objB);

Returns true if objA and objB refer to the same object.

(2) Basic use

This method is very simple to use. You only need to pass in the two parameters that need to be determined for identity. The example is as follows:

object a = new object();
object b = new object();
Console.WriteLine(ReferenceEquals(a, b)); // False

a = b; // Now let a and b point to the same object
Console.WriteLine(ReferenceEquals(a, b)); // True
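One pitfall worth knowing: because both parameters are of type object, passing a value type boxes it, and each argument gets its own box. ReferenceEquals therefore returns false for value types, even when the same variable is passed twice.

```csharp
using System;

int n = 42;
// Each argument is boxed separately, producing two distinct objects.
Console.WriteLine(object.ReferenceEquals(n, n)); // False

object boxed = n;
// Passing the same box twice compares one object with itself.
Console.WriteLine(object.ReferenceEquals(boxed, boxed)); // True
```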

(3) Principle

In fact, the implementation of the ReferenceEquals method is very simple, and its implementation is similar to the following:

public static bool ReferenceEquals(object? objA, object? objB)
{
    return objA == objB;
}

This method simply returns the result of using the == operator on the parameters. The reason why it is effective is that the two parameter types of this method are both object, and the default implementation of the == operator by object is to perform identity comparison. Based on this principle, identity determination can also be made as follows:

if ((object)a == (object)b)
{
    //do something
}

Of course, this is not recommended, because the semantics of using ReferenceEquals are obviously clearer.

4.3 Equality or identity comparison

4.3.1 == operator

(1) Basic information

The == operator is one of the most commonly used binary operators. However, unlike the Equals method and the ReferenceEquals static method, whose semantics are clear, the == operator does not by itself tell you whether it performs an equality comparison or an identity comparison. In practice we mostly use it for equality comparison, for example:

Console.WriteLine(1 == 1);         // True
Console.WriteLine(2 == 3);         // False
Console.WriteLine("Cat" == "Cat"); // True

In fact, for numeric primitive types such as int and double, the == operator performs an equality determination; for class reference types, it performs an identity determination by default; for struct value types, it depends on the definition (in practice it can only be an equality determination; the only question is how equality is compared).

Not only that, because C# allows operator overloading, the actual behavior of the == operator can be modified. For example, the following definition modifies the behavior of the == operator when used for Cat class comparisons, allowing it to perform equality comparisons (compare CatID value) instead of the default identity comparison:

public static bool operator ==(Cat left, Cat right)
{
    return left.CatID == right.CatID;
}

Based on the above reasons, relying on == to determine equality or identity is not completely reliable. But it is controllable, that is, as long as the definition can be determined, the result of the == operator is predictable. The == operator can make the program more readable, and it is worthwhile to use it in a standardized way.

(2) Basic use

As mentioned before, the actual behavior of the == operator depends on the kind of type and on operator overloading. Specifically:

  • For numeric primitive types such as int and double: equality determination
  • For the built-in string type: equality determination (string is a reference type, but it overloads == to compare contents)
  • For the built-in object type: identity determination
  • For custom classes: identity determination, unless == is overloaded
  • For custom structs: depends on the definition (== is unavailable unless it is overloaded)
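One caveat for floating-point types: under IEEE 754 rules, NaN is not equal to anything, including another NaN, so == on double is not reflexive for NaN. Double.Equals, by contrast, treats two NaN values as equal, so it does satisfy the reflexivity principle listed at the beginning.

```csharp
using System;

double a = double.NaN;
double b = double.NaN;
Console.WriteLine(a == b);      // False: under IEEE 754, NaN equals nothing, not even NaN
Console.WriteLine(a.Equals(b)); // True: Double.Equals treats two NaN values as equal
```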

Since the definitions of built-in types cannot be modified, the behavior of == on them can be considered reliable and stable, so it is not discussed further here. The following explains the == operator for custom classes and custom struct types.

1. In a custom class

For custom classes, the == operator performs identity judgment by default, which is as follows:

class Cat
{
    public string? CatID { get; set; }
}

Cat cat1 = new Cat();
Cat cat2 = new Cat();
Console.WriteLine(cat1 == cat2); // False, because == compares identity by default

cat1 = cat2; // Now cat1 and cat2 point to the same object
Console.WriteLine(cat1 == cat2); // True

As long as there is no overloaded == operator in this type definition, the comparison results of this type using == will have the above behavior. But sometimes we may hope that the == operator can provide equality judgment, and then we can modify the comparison behavior by overloading the operator. For example, if we hope that two Cat objects will have equality as long as their CatIDs are the same, then you can:

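class Cat
{
    public string? CatID { get; set; }

    // Operator overloads must be public static
    public static bool operator ==(Cat? left, Cat? right)
    {
        if (ReferenceEquals(left, right)) return true; // same object, or both null
        if (left is null || right is null) return false;
        return left.CatID == right.CatID;
    }

    // C# requires != to be overloaded together with ==
    public static bool operator !=(Cat? left, Cat? right)
    {
        return !(left == right);
    }
}

Cat cat1 = new Cat();
Cat cat2 = new Cat();
Console.WriteLine(cat1 == cat2); // True, the result of == only depends on the CatID values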

2. In custom struct

If the == operator is not overloaded manually, a struct cannot use it at all: the compiler reports that operator == cannot be applied to operands of that type (error CS0019). For example, the following code does not compile:

struct Cat
{
    public string? CatID { get; set; }
}

Cat cat1 = new Cat();
Cat cat2 = new Cat();
bool same = cat1 == cat2; // Error CS0019: == is not defined for Cat

Therefore, if you want the Cat type to be able to use the == operator for comparison operations, please overload the == operator:

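struct Cat
{
    public string? CatID { get; set; }

    public static bool operator ==(Cat left, Cat right)
    {
        return left.CatID == right.CatID;
    }

    // C# requires != to be overloaded together with ==
    public static bool operator !=(Cat left, Cat right)
    {
        return !(left == right);
    }
}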

(3) Suggestions

My personal suggestion: unless there is a sufficiently convincing reason (the string type is one example), equality checks on class types should use the Equals method (including the Equals method of IEquatable<>). The == operator of class types should not be overloaded; == should keep its default behavior, which is identity determination.

For value types, the == operator should be overloaded and the IEquatable<> interface implemented, to provide better support for equality determination. (Overriding object's Equals method is also recommended, but avoid calling object's Equals on value types yourself as much as possible, because passing a struct as object incurs boxing costs. The main purpose of overriding it is to avoid the reflection that may occur in the Equals implementation inherited from the base class ValueType.)
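Putting these suggestions together, a value type might look like the following sketch, reusing the Point struct from the ValueType discussion. IEquatable<Point>.Equals does the real work; ==, != and Equals(object) all delegate to it, and GetHashCode uses the same fields.

```csharp
using System;

var p1 = new Point { X = 1, Y = 2 };
var p2 = new Point { X = 1, Y = 2 };
Console.WriteLine(p1 == p2);      // True
Console.WriteLine(p1 != p2);      // False
Console.WriteLine(p1.Equals(p2)); // True, resolves to Equals(Point): no boxing

struct Point : IEquatable<Point>
{
    public float X;
    public float Y;

    // Strongly typed equality: no boxing, no reflection.
    public bool Equals(Point other) => X.Equals(other.X) && Y.Equals(other.Y);

    // Consistent with Equals(Point) for callers that only see object.
    public override bool Equals(object? obj) => obj is Point other && Equals(other);

    // Uses the same fields as Equals, so equal points hash alike.
    public override int GetHashCode() => HashCode.Combine(X, Y);

    public static bool operator ==(Point left, Point right) => left.Equals(right);
    public static bool operator !=(Point left, Point right) => !left.Equals(right);
}
```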

4.3.2 != operator

(1) Basic information

!= is the inverse of the == operator, so the == section above applies and is not repeated here. The only point to add is that == and != must be overloaded in pairs: if one of them is overloaded, the other must be overloaded as well, or the compiler reports an error. Fortunately, once == is overloaded, != is easy to overload, as follows:

class Cat
{
    public string? CatID { get; set; }

    public static bool operator ==(Cat left, Cat right)
    {
        return left.CatID == right.CatID;
    }

    public static bool operator !=(Cat left, Cat right)
    {
        return !(left == right);
    }
}

4.4 Special comparison

4.4.1 is operator – null test

(1) Basic information

Originally, the is operator was used for type testing, that is, to check whether an object is of the target type or of a type derived from it, as follows:

class A {}
class B : A {}

object a = new A();
object b = new B();

Console.WriteLine(a is A); // True
Console.WriteLine(b is A); // True

Now, however, the is operator can also be used for null testing, as follows:

if (a is null) { ... } // Similar to a == null

You may be curious why you don’t just use == to check for null, like the following:

if (a == null) { ... }

The reason is that without knowing the type definition, you cannot be sure what the above code does with null, because the == operator can be overloaded. For example, consider the following code:

class Cat
{
    public static bool operator ==(Cat? left, Cat? right)
    {
        return false;
    }

    public static bool operator !=(Cat? left, Cat? right)
    {
        return true;
    }
}

Cat? cat = null;
if (cat == null)
{
    //do something
}

After a little thought, you will realize that because == is overloaded, cat == null in the code above is always false, even though cat is null. If == is replaced by the is operator (cat is null), this problem does not occur.
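The following sketch makes this concrete with the same deliberately broken == (plus the != that C# requires alongside it). The Equals and GetHashCode overrides exist only to silence the compiler warnings that appear when == is overloaded without them.

```csharp
using System;

Cat? cat = null;
Console.WriteLine(cat == null);     // False: the overloaded == always returns false
Console.WriteLine(cat is null);     // True: is null never calls the overloaded operator
Console.WriteLine(cat is not null); // False

class Cat
{
    // Deliberately broken operators, only to show why `is null` is safer.
    public static bool operator ==(Cat? left, Cat? right) => false;
    public static bool operator !=(Cat? left, Cat? right) => true;

    // Present only to avoid warnings CS0660/CS0661; not used above.
    public override bool Equals(object? obj) => ReferenceEquals(this, obj);
    public override int GetHashCode() => base.GetHashCode();
}
```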

(2) Principle

a is null is essentially syntactic sugar: it never calls a user-defined == operator, and for a reference type its behavior is equivalent to:

(object)a == null

Besides using is for null checks, you can also use is not for non-null checks:

if (a is not null) { } // Equivalent to (object)a != null

4.4.2 GetHashCode – Unequal comparison

(1) Basic information

GetHashCode is a virtual method defined in the object class. Its method is declared as follows:

public virtual int GetHashCode();

The actual function of this method is to obtain the hash value of the object.

(2) Basic use

Although the GetHashCode method is used to obtain the hash value of an object rather than to judge equality or identity, please consider the requirements for general hash values:

  • If two objects are equal, their hash values should be the same
  • Conversely, two objects with the same hash value are not necessarily equal.

From the above, if the hash values of two objects are different, we can at least conclude that they are not equal. So in some situations, checking whether the hash values differ is a quick way to rule out equality, for example:

if (a.GetHashCode() != b.GetHashCode())
{
    // a and b are not equal
}

Of course, the reliability of this approach depends on the hash function being implemented consistently with Equals, so it is only recommended when you know the type's implementation and the shortcut is actually needed.
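For the shortcut above to be sound, GetHashCode must use the same data as Equals. A minimal sketch for the Cat class, deriving the hash from CatID only; HashSet<T> relies on the same contract, locating items by hash code first and only then calling Equals.

```csharp
using System;
using System.Collections.Generic;

var a = new Cat { CatID = "1" };
var b = new Cat { CatID = "1" };
Console.WriteLine(a.Equals(b));                        // True
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: equal objects, equal hashes

var set = new HashSet<Cat> { a, b };
Console.WriteLine(set.Count); // 1: b is recognized as a duplicate of a

class Cat : IEquatable<Cat>
{
    public string? CatID { get; set; }

    public bool Equals(Cat? other) => other is not null && CatID == other.CatID;

    public override bool Equals(object? obj) => Equals(obj as Cat);

    // Hash only what Equals compares, so equal cats always hash alike.
    public override int GetHashCode() => CatID?.GetHashCode() ?? 0;
}
```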

5. Summary

Since C# provides a variety of comparison judgment methods, it requires a certain amount of effort to correctly implement reliable comparison judgment. Here are some summary suggestions simply combined with coding standards and practices.

1. For equality comparison, use the Equals method (or its static version when the caller may be null)

a.Equals(b);
Equals(a, b);

2. To perform identity comparison, use the ReferenceEquals static method

ReferenceEquals(a, b);

3. To perform a null test, use the is operator

bool isNull = a is null;        // Equivalent to (object)a == null
bool isNotNull = a is not null; // Equivalent to (object)a != null

4. If you can determine the behavior of the == and != operators, you can use them to enhance readability

bool b1 = 1 == 1;
bool b2 = "Cat" == "Cat";

5. If you override the Equals method of object, you should also override the GetHashCode method.

class Cat
{
    public override bool Equals(object? obj) { ... }
    public override int GetHashCode() { ... }
}

6. If the == operator is overloaded, you must also overload the != operator (the compiler requires it), and you should override the Equals and GetHashCode methods

class Cat
{
    public static bool operator ==(Cat left, Cat right) { ... }
    public static bool operator !=(Cat left, Cat right) { ... }
    public override bool Equals(object? obj) { ... }
    public override int GetHashCode() { ... }
}

7. If the type can be compared for equality, override the Equals method and implement the IEquatable<> interface.

class Cat : IEquatable<Cat>
{
    public bool Equals(Cat? other) { ... }
    public override bool Equals(object? obj) { ... }
    public override int GetHashCode() { ... }
}

8. Do not overload the == and != operators on class types, and let them maintain the default behavior for identity judgment.

9. For struct types, make sure to overload the == and != operators and implement the IEquatable<> interface. In other words, a struct should fully implement equality comparison

struct Cat : IEquatable<Cat>
{
    public static bool operator ==(Cat left, Cat right) { ... }
    public static bool operator !=(Cat left, Cat right) { ... }
    public bool Equals(Cat other) { ... }
    public override bool Equals(object? obj) { ... }
    public override int GetHashCode() { ... }
}

(If you think your struct type does not need to be compared for equality, please consider whether you really need to use the struct type)