Matrix Calculus

Trace

In this section, we collect several results related to derivatives of traces of functions of a matrix \(X\).

Formula
\[\frac{\partial \Trace(A X B)}{\partial X} = A^T B^T\]

Here is a size argument that helps in remembering the result. The trace is defined only for square matrices. Assume that \(X\) is of size \(m \times n\) and let \(A\) be of size \(p \times m\). Since \(A X B\) must be square, \(B\) must be of size \(n \times p\), so the number of rows of \(A\) matches the number of columns of \(B\). Hence \(BA\) and \(A^T B^T\) are both valid products, with sizes \(n \times m\) and \(m \times n\) respectively. The derivative \(\frac{\partial \Trace(A X B)}{\partial X}\) must have the same size as \(X\), namely \(m \times n\), which forces the choice \(A^T B^T\) rather than \(BA\).
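
The size argument only rules out candidates of the wrong shape, so a numerical check is a useful complement. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_gradient`, the sizes, and the random seed are illustrative choices, not part of the original text. It compares a central-difference estimate of each partial derivative with \(A^T B^T\).

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dX[i, j]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps                      # perturb a single entry of X
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n, p = 3, 4, 2                         # illustrative sizes (assumed)
A = rng.standard_normal((p, m))           # p x m
X = rng.standard_normal((m, n))           # m x n
B = rng.standard_normal((n, p))           # n x p, so that A X B is p x p

numeric = numerical_gradient(lambda Z: np.trace(A @ Z @ B), X)
analytic = A.T @ B.T                      # claimed derivative, m x n like X

max_error = np.max(np.abs(numeric - analytic))
print(analytic.shape == X.shape, max_error)
```

Because \(\Trace(AXB)\) is linear in \(X\), the central difference is exact up to rounding, so `max_error` should be tiny.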

Formula
\[\frac{\partial \Trace(A X^T B)}{\partial X} = B A\]

Following the size argument above, \(BA\) has the same size as \(X\).
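
The same result can be reached by an entrywise computation. As a sketch, assume \(X\) is of size \(m \times n\), \(A\) is of size \(p \times n\), and \(B\) is of size \(m \times p\) (sizes chosen here so that \(A X^T B\) is \(p \times p\)). Writing out the trace and differentiating with respect to a single entry \(X_{kj}\) gives

\[\Trace(A X^T B) = \sum_{i=1}^{p} \sum_{j=1}^{n} \sum_{k=1}^{m} A_{ij} X_{kj} B_{ki}, \qquad \frac{\partial \Trace(A X^T B)}{\partial X_{kj}} = \sum_{i=1}^{p} B_{ki} A_{ij} = (BA)_{kj}.\]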

Some of the implications are listed here.

Formula
\[\frac{\partial \Trace(A X)}{\partial X} = A^T\]
\[\frac{\partial \Trace(X B)}{\partial X} = B^T\]
\[\frac{\partial \Trace(A X^T)}{\partial X} = A\]
\[\frac{\partial \Trace(X^T B)}{\partial X} = B\]

Note that in each case the derivative has the same size as \(X\), which serves as a quick dimensional check on the result.
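
Each of these special cases follows from one of the two general formulas by replacing one factor with an identity matrix of compatible size:

\[\begin{split}\begin{aligned} \frac{\partial \Trace(A X I)}{\partial X} &= A^T I^T = A^T, & \frac{\partial \Trace(I X B)}{\partial X} &= I^T B^T = B^T,\\ \frac{\partial \Trace(A X^T I)}{\partial X} &= I A = A, & \frac{\partial \Trace(I X^T B)}{\partial X} &= B I = B. \end{aligned}\end{split}\]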

Formula
\[\begin{split}\begin{aligned} \frac{\partial \Trace(AXBXC)}{\partial X} &= A^T (BXC)^T + (AXB)^T C^T\\ &= A^T C^T X^T B^T + B^T X^T A^T C^T \end{aligned}\end{split}\]
Proof

We differentiate separately for each appearance of \(X\), treating the rest of the terms as constant, and then add up the derivatives.

If \(A\) is of size \(p \times m\) and \(X\) is of size \(m \times n\), then \(B\) must be of size \(n \times m\) and \(C\) must be of size \(n \times p\) to make \(AXBXC\) a square matrix of size \(p \times p\).
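
The product rule used in this proof can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `numerical_gradient`, the sizes, and the seed are illustrative assumptions. Note that for the product \(AXBXC\) to be defined and square, the code takes \(C\) of size \(n \times p\).

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dX[i, j]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
m, n, p = 3, 4, 2                         # illustrative sizes (assumed)
A = rng.standard_normal((p, m))           # p x m
X = rng.standard_normal((m, n))           # m x n
B = rng.standard_normal((n, m))           # n x m
C = rng.standard_normal((n, p))           # n x p, so that A X B X C is p x p

numeric = numerical_gradient(lambda Z: np.trace(A @ Z @ B @ Z @ C), X)

# One term per appearance of X, with the other appearance held constant.
first_term = A.T @ (B @ X @ C).T          # from Trace(A X (B X C))
second_term = (A @ X @ B).T @ C.T         # from Trace((A X B) X C)
analytic = first_term + second_term

max_error = np.max(np.abs(numeric - analytic))
print(max_error)
```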

Frobenius Norm

The squared Frobenius norm is easily expressed as a trace:

\[\| X \|_F^2 = \Trace (X X^T).\]
Formula
\[\frac{\partial \| X \|_F^2}{\partial X} = 2X.\]
Proof
\[\| X \|_F^2 = \Trace(X X^T) = \Trace(I X X^T) = \Trace (X X^T I).\]

We differentiate separately for each appearance of \(X\), treating the rest of the terms as constant, and then add up the derivatives.
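
Spelled out, the first appearance of \(X\) is handled by the rule for \(\Trace(A X B)\) with \(A = I\) and \(B = X^T\), and the second by the rule for \(\Trace(A X^T B)\) with \(A = X\) and \(B = I\):

\[\frac{\partial \| X \|_F^2}{\partial X} = I^T (X^T)^T + I X = X + X = 2X.\]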

Formula
\[\frac{\partial \| X - A \|_F^2}{\partial X} = 2(X-A).\]
Proof
\[\| X - A \|_F^2 = \Trace ((X-A)(X-A)^T) = \Trace(XX^T - AX^T - XA^T + AA^T).\]

We get the result by differentiating each term in the sum separately.
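
Term by term, using the trace formulas from the previous section and the fact that \(\Trace(AA^T)\) does not depend on \(X\):

\[\frac{\partial \Trace(XX^T)}{\partial X} = 2X, \quad \frac{\partial \Trace(A X^T)}{\partial X} = A, \quad \frac{\partial \Trace(X A^T)}{\partial X} = (A^T)^T = A, \quad \frac{\partial \Trace(AA^T)}{\partial X} = 0,\]

so the sum is \(2X - A - A + 0 = 2(X - A)\).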

Formula
\[\frac{\partial \| A X \|_F^2}{\partial X} = 2(A^T A) X.\]
Proof

We expand the Frobenius norm terms:

\[\| A X \|_F^2 = \Trace ((A X) (A X)^T) = \Trace (A X X^T A^T).\]

Differentiating for each appearance of \(X\) separately, we will get

\[A^T (X^T A^T)^T + A^T (A X) = 2 (A^T A) X\]

Dimensionally, if \(X\) is of size \(m \times n\), then \(A\) is of size \(p \times m\) and \(A^T A\) is of size \(m \times m\). Thus \(X\) and \(A^T A X\) have the same size, as expected.
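
As a sanity check on this formula, the sketch below compares \(2(A^T A)X\) with a finite-difference estimate. It assumes NumPy; the helper `numerical_gradient`, the function name `frob_sq`, the sizes, and the seed are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dX[i, j]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
m, n, p = 3, 4, 5                         # illustrative sizes (assumed)
A = rng.standard_normal((p, m))           # p x m
X = rng.standard_normal((m, n))           # m x n

def frob_sq(Z):
    """Squared Frobenius norm of A Z."""
    return np.linalg.norm(A @ Z, "fro") ** 2

numeric = numerical_gradient(frob_sq, X)
analytic = 2 * (A.T @ A) @ X              # m x n, same size as X

max_error = np.max(np.abs(numeric - analytic))
print(max_error)
```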

Formula
\[\frac{\partial \| X B \|_F^2}{\partial X} = 2 X B B^T.\]
Proof

We expand the Frobenius norm terms:

\[\| X B \|_F^2 = \Trace ((X B) (X B)^T) = \Trace (X B B^T X^T).\]

Differentiating for each appearance of \(X\) separately, we will get

\[(B B^T X^T)^T + (X B B^T) = 2 X B B^T.\]

Dimensionally, if \(X\) is of size \(m \times n\), then \(B\) is of size \(n \times p\) and \(BB^T\) is of size \(n \times n\), so post-multiplying \(X\) by \(BB^T\) does not change its size.
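
As a cross-check, the same result can be obtained from the formula for \(\| A X \|_F^2\) by transposition. The Frobenius norm is unchanged by transposing, and the derivative with respect to \(X\) is the transpose of the derivative with respect to \(X^T\). Applying the earlier formula with \(B^T\) in the role of \(A\) and \(X^T\) in the role of \(X\):

\[\| X B \|_F^2 = \| B^T X^T \|_F^2, \qquad \frac{\partial \| B^T X^T \|_F^2}{\partial X^T} = 2 (B B^T) X^T, \qquad \frac{\partial \| X B \|_F^2}{\partial X} = \left( 2 B B^T X^T \right)^T = 2 X B B^T.\]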

Formula
\[\frac{\partial \| A X - B \|_F^2}{\partial X} = 2(A^T A) X - 2 A^T B = 2 A^T (AX -B).\]
Proof

We expand the Frobenius norm terms:

\[\| A X - B\|_F^2 = \Trace ((A X - B) (A X - B)^T) = \Trace (A X X^T A^T - B X^T A^T - A X B^T + B B^T ).\]

Differentiating each term separately, we get:

\[2(A^T A) X - A^T B - A^T B = 2A^T(AX - B).\]
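
This gradient is the one that appears in linear least squares problems. The sketch below, assuming NumPy, checks the formula against finite differences and then confirms that the gradient vanishes at a least squares solution returned by `np.linalg.lstsq`. The function names `objective`, `gradient`, and `numerical_gradient`, the sizes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n = 6, 3, 2                         # illustrative sizes (assumed)
A = rng.standard_normal((p, m))           # p x m
B = rng.standard_normal((p, n))           # p x n, same size as A X
X = rng.standard_normal((m, n))           # m x n

def objective(Z):
    """Squared Frobenius norm of the residual A Z - B."""
    return np.sum((A @ Z - B) ** 2)

def gradient(Z):
    """Analytic derivative 2 A^T (A Z - B)."""
    return 2 * A.T @ (A @ Z - B)

def numerical_gradient(f, Z, eps=1e-6):
    """Central-difference estimate of the matrix of partials df/dZ[i, j]."""
    G = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        E = np.zeros_like(Z)
        E[idx] = eps
        G[idx] = (f(Z + E) - f(Z - E)) / (2 * eps)
    return G

fd_error = np.max(np.abs(numerical_gradient(objective, X) - gradient(X)))

# At a minimizer of the objective the gradient should be (numerically) zero.
X_star, *_ = np.linalg.lstsq(A, B, rcond=None)
grad_norm_at_solution = np.linalg.norm(gradient(X_star))
print(fd_error, grad_norm_at_solution)
```
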
Formula
\[\frac{\partial \| X B - C \|_F^2}{\partial X} = 2X (B B^T) - 2 C B^T = 2 (X B - C) B^T.\]
Proof

We expand the Frobenius norm terms:

\[\| X B - C\|_F^2 = \Trace ((X B - C) (X B - C)^T) = \Trace(X B B^T X^T - C B^T X^T - X B C^T + CC^T).\]

Differentiating each term separately, we get:

\[2 X (BB^T) - C B^T - (B C^T)^T = 2( X B - C) B^T.\]
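
A cheaper alternative to checking every entry is a directional derivative test: for a random direction \(H\), the slope of the function along \(H\) should equal the sum of the entrywise products of the gradient and \(H\). The following is a minimal sketch, assuming NumPy; the names `f`, `G`, `H`, the sizes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, p = 3, 4, 2                         # illustrative sizes (assumed)
X = rng.standard_normal((m, n))           # m x n
B = rng.standard_normal((n, p))           # n x p
C = rng.standard_normal((m, p))           # m x p, same size as X B

def f(Z):
    """Squared Frobenius norm of Z B - C."""
    return np.sum((Z @ B - C) ** 2)

G = 2 * (X @ B - C) @ B.T                 # claimed derivative, m x n

H = rng.standard_normal((m, n))           # random direction
t = 1e-6
directional_fd = (f(X + t * H) - f(X - t * H)) / (2 * t)
directional_exact = np.sum(G * H)         # inner product Trace(G^T H)

gap = abs(directional_fd - directional_exact)
print(gap)
```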