Quick calculation of cosine similarity of all sample pairs in PyTorch

PyTorch provides the cosine_similarity function to calculate the cosine similarity between pairs of vectors. However, it has no built-in way to calculate the cosine similarity between every pair of vectors in a batch. We’ll explore a very simple and effective way to do this in PyTorch.

Let’s first look at the mathematical formula for calculating cosine similarity:

$$\text{cosine similarity} = S_C(A, B) := \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
Now let’s systematically introduce how to efficiently calculate the cosine similarity between multiple pairs of vectors in PyTorch (multiple vectors stacked into matrix form, which comes up often in contrastive learning).

Introduction

From Wikipedia:

In data analysis, cosine similarity is a measure of the similarity between two nonzero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided by the product of their lengths. It follows that cosine similarity does not depend on the magnitudes of the vectors, but only on their angle. Cosine similarity always belongs to the interval [−1, 1].
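
To make the definition concrete, here is a minimal sketch (the vector values are arbitrary) that computes the formula directly and checks it against PyTorch’s built-in:

import torch
import torch.nn.functional as F

A = torch.tensor([1.0, 2.0, 3.0])
B = torch.tensor([4.0, 5.0, 6.0])

# Direct translation of the formula: dot product over the product of norms.
manual = (A * B).sum() / (A.norm() * B.norm())

# PyTorch's built-in, computed along the only dimension (dim=0).
builtin = F.cosine_similarity(A, B, dim=0)

print(manual.item(), builtin.item())  # both ~0.9746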

PyTorch API for cosine similarity

torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8) -> Tensor

This computes the pairwise cosine similarity between x1 and x2 along the specified dimension. That is, if x1 and x2 both have shape (10, 4, 5), and we wish to calculate cosine similarity along the last dimension, the resulting shape is (10, 4).
For example:

import torch
import torch.nn.functional as F

x, y = torch.randn(10, 4, 5), torch.randn(10, 4, 5)
print(F.cosine_similarity(x, y, dim=2).shape)

# torch.Size([10, 4])

This is because when we feed cosine_similarity a 3-D tensor and ask it to compute cosine similarity along the third dimension (dimension index = 2), it reduces that dimension to a single value.
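
As a quick check (a minimal sketch; it reuses x and y from above and ignores the eps clamping, which only matters for near-zero vectors), the built-in matches the formula applied along dim=2:

manual = (x * y).sum(dim=2) / (x.norm(dim=2) * y.norm(dim=2))
print(torch.allclose(F.cosine_similarity(x, y, dim=2), manual, atol=1e-6))
# True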

Compute cosine similarity of all pairs in PyTorch

Search results online show that it has been surprisingly difficult for people to find a concise and efficient way to perform the all-pairs cosine similarity operation. In fact, the problem is considered involved enough that the torchmetrics page has a whole metric dedicated to the topic.
Fortunately, there is a working solution to this problem (mentioned in this PyTorch GitHub issue). Let’s take a look at what it looks like first!
cosine_similarity(x[None,:,:], x[:,None,:], dim=-1)
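
Before dissecting it, a quick sanity check (a minimal sketch; the shapes are arbitrary) confirms that the one-liner agrees with a naive double loop over all pairs:

import torch
import torch.nn.functional as F

x = torch.randn(5, 8)  # 5 sample vectors of dimension 8

# The one-liner from above.
all_pairs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)

# Naive reference: one built-in call per pair.
reference = torch.empty(5, 5)
for i in range(5):
    for j in range(5):
        reference[i, j] = F.cosine_similarity(x[i], x[j], dim=0)

print(torch.allclose(all_pairs, reference, atol=1e-6))  # True
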
There’s a lot going on in that one line, and it might not be obvious how it really works, so the rest of this article dissects the various sub-parts of this approach by building the solution from scratch. In the following subsections we will learn about the following techniques as they apply to concisely calculating all-pairs cosine similarity:

  1. Indexing tensors with “None”
  2. Expanding a tensor along a single dimension using tensor.expand()
  3. Implicitly expanding tensors along a single dimension using PyTorch’s broadcasting semantics

Index tensors with “None”

The first thing we need to understand is: what happens when you use None to index a PyTorch tensor?

Similar to NumPy, indexing a dimension with None inserts a new dimension of size 1 at that position (it “unsqueezes” the tensor). For example, x[:, None] inserts a new dimension at dim=1, which is equivalent to x.unsqueeze(dim=1).

x = torch.randn(3)
# Indexing with None does the same thing as unsqueezing the tensor
# at that dimension. After this indexing operation, the tensors
# x_row_dup and x_col_dup will have 1 additional dimension at
# dimensions 0 and 1 respectively.
x_row_dup, x_col_dup = x[None,:], x[:,None]
print(x, x.shape)
print(x_row_dup, x_row_dup.shape)
print(x_col_dup, x_col_dup.shape)

# tensor([-1.2756,  1.1559, -0.0660]) torch.Size([3])
# tensor([[-1.2756,  1.1559, -0.0660]]) torch.Size([1, 3])
# tensor([[-1.2756],
#         [ 1.1559],
#         [-0.0660]]) torch.Size([3, 1])

Use .expand(…) to expand tensors

PyTorch’s .expand(…) API repeats a tensor’s values along the specified dimensions. Note that only dimensions of size 1 can be expanded. Let’s look at the example below.

x_row_dup, x_col_dup = x_row_dup.expand(3, 3), x_col_dup.expand(3, 3)
print("x stretched across rows")
print(" - - - - - - - - - - - - -")
print(x_row_dup, x_row_dup.shape)
print("")
print("x stretched across columns")
print(" - - - - - - - - - - - - - - ")
print(x_col_dup, x_col_dup.shape)

# Output:
x stretched across rows
- - - - - - - - - - - - -
tensor([[-1.2756, 1.1559, -0.0660],
        [-1.2756, 1.1559, -0.0660],
        [-1.2756, 1.1559, -0.0660]]) torch.Size([3, 3])

x stretched across columns
- - - - - - - - - - - - -
tensor([[-1.2756, -1.2756, -1.2756],
        [1.1559, 1.1559, 1.1559],
        [-0.0660, -0.0660, -0.0660]]) torch.Size([3, 3])

Suppose our input tensor has 3 elements, namely (A, B, C). To compute the cosine similarity of all pairs, we first expand this tensor along 3 rows and 3 columns. The specific process is as follows:

Unsqueeze: In the first step, our input tensor (A, B, C) has a size of 3. We first unsqueeze it along dimensions 0 and 1 by indexing with None, as just introduced, so that it looks like this:
(A, B, C) unsqueezed along dimension 0 will look like ((A, B, C)) and have shape (1, 3).
(A, B, C) unsqueezed along dimension 1 will look like ((A), (B), (C)) and have shape (3, 1).

Expand: We then expand these tensors along their singleton dimension (the dimension of size 1) so that both tensors become square.
((A, B, C)) expands to shape (3, 3) as follows:

((A, B, C),
 (A, B, C),
 (A, B, C))

((A), (B), (C)) expands into shape (3, 3) as follows:

((A, A, A),
 (B, B, B),
 (C, C, C))
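
To see the same pattern with concrete values, here is a tiny sketch (a toy tensor standing in for (A, B, C)):

x = torch.tensor([1, 2, 3])  # think A=1, B=2, C=3
print(x[None, :].expand(3, 3))
# tensor([[1, 2, 3],
#         [1, 2, 3],
#         [1, 2, 3]])
print(x[:, None].expand(3, 3))
# tensor([[1, 1, 1],
#         [2, 2, 2],
#         [3, 3, 3]])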

If the above is still unclear, it is recommended to read up on NumPy’s broadcasting; the article “numpy broadcast mechanism” (see the references) explains it in great detail.

If we now perform pairwise cosine similarity on these two expanded tensors (the PyTorch API can already do this), we get the full all-pairs cosine similarity matrix.

That’s it! Below is an example showing the tensors we have used so far.

# Add a dummy dimension at the end so that we can perform cosine
# similarity on that last dimension.
x_row_dup = x_row_dup.reshape(3, 3, 1)
x_col_dup = x_col_dup.reshape(3, 3, 1)
x_cosine_similarity = F.cosine_similarity(x_row_dup, x_col_dup, dim=-1)
print(x_cosine_similarity)

# tensor([[ 1., -1.,  1.],
#         [-1.,  1., -1.],
#         [ 1., -1.,  1.]])

But wait! Why are all the values 1 or -1?! This is because each “vector” here is a single element, so the angle between any two of them is either 0 degrees or 180 degrees, depending on whether they point in the same or opposite directions. Here is the quick derivation:
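
For single-element vectors, the formula from the beginning of this article (with $n = 1$) collapses to

$$S_C(a, b) = \frac{a \, b}{|a| \, |b|} = \operatorname{sign}(a) \operatorname{sign}(b) = \pm 1,$$

so every entry is 1 when the two elements share a sign and -1 when they do not.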

Let’s try the same thing with a 2D matrix instead of a 1D vector.

x = torch.randn(3, 2)
x_row_dup, x_col_dup = x[None,:,:], x[:,None,:]
x_row_dup, x_col_dup = x_row_dup.expand(3, 3, 2), x_col_dup.expand(3, 3, 2)
x_cosine_similarity = F.cosine_similarity(x_row_dup, x_col_dup, dim=-1)
print(x_cosine_similarity)

# tensor([[1.0000, 0.9512, 0.9826],
#         [0.9512, 1.0000, 0.9920],
#         [0.9826, 0.9920, 1.0000]])

Now this is much easier to interpret!

The final piece: broadcasting

Although we used .expand(…) above, it is worth mentioning that this is completely unnecessary, since most operations defined in PyTorch support a concept called broadcasting. From the documentation: if a PyTorch operation supports broadcasting, its tensor arguments can be automatically expanded to equal sizes (without copying the data).

So when given two tensors of shape (1, 3) and (3, 1) as input, the cosine similarity operation will broadcast them to (3, 3), implicitly performing the same tensor-expansion step we carried out explicitly above.

Let’s run the code below:

x_cosine_similarity = F.cosine_similarity(x[None,:,:], x[:,None,:], dim=-1)
# This should print the same matrix as above.
print(x_cosine_similarity)

# tensor([[1.0000, 0.9512, 0.9826],
#         [0.9512, 1.0000, 0.9920],
#         [0.9826, 0.9920, 1.0000]])

This is actually the same as what we got above.
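
As an aside (this is not from the original article, but a widely used equivalent, common in contrastive-learning code): if you L2-normalize the rows first, all-pairs cosine similarity reduces to a single matrix multiplication:

x_norm = F.normalize(x, p=2, dim=-1)        # scale each row to unit length
x_cosine_similarity_mm = x_norm @ x_norm.T  # dot products of unit vectors
print(torch.allclose(x_cosine_similarity, x_cosine_similarity_mm, atol=1e-6))
# True

For large batches this form can also be friendlier on memory, since the broadcast version effectively materializes an (N, N, D) intermediate tensor.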

Notes on efficiency

In contrast to many of the solutions mentioned in online discussions, this solution does not use any explicit for loop. Every time you write an explicit for loop, you run into the following problems:

  1. Significant computation happens on the CPU, which may cause GPU starvation
  2. A for loop forgoes opportunities for parallel execution on the GPU, which hurts your overall GPU utilization and therefore the time it takes to run your calculations (see the rough timing sketch below)
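
As a rough illustration (a minimal sketch; the sizes are arbitrary and the absolute timings depend entirely on your hardware):

import time
import torch
import torch.nn.functional as F

x = torch.randn(500, 64)

# Loop version: one cosine_similarity call per row.
start = time.perf_counter()
loop_result = torch.stack(
    [F.cosine_similarity(xi.unsqueeze(0), x, dim=-1) for xi in x]
)
loop_time = time.perf_counter() - start

# Broadcast version: a single call over the whole batch.
start = time.perf_counter()
broadcast_result = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)
broadcast_time = time.perf_counter() - start

print(torch.allclose(loop_result, broadcast_result, atol=1e-6))  # True
print(f"loop: {loop_time:.4f}s, broadcast: {broadcast_time:.4f}s")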

This article is mainly translated from All Pairs Cosine Similarity in PyTorch. If you are interested, you can read the original article; it is absolutely wonderful!

References:
1. All Pairs Cosine Similarity in PyTorch, https://medium.com/@dhruvbird/all-pairs-cosine-similarity-in-pytorch-867e722c8572
2. numpy broadcast mechanism, https://blog.csdn.net/qq_51352578/article/details/125074264