What is my misconception about these two Matrix Calculus formulas for the differential?

If [imath]X[/imath] is a matrix of variables, [imath]g(X)[/imath] is a scalar-valued function of [imath]X[/imath], and [imath]<\cdot,\ \cdot>_F[/imath] is the Frobenius Inner Product, then [imath]dg\ =\ <\nabla g,\ dX>_F[/imath]. Some examples I've seen derived are [imath]d(||X||_F)\ =\ <\frac{X}{||X||_F},\ dX>_F[/imath] and [imath]d(\vec v^T X \vec w)\ =\ <\vec v \vec w^T,\ dX>_F[/imath].
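Both examples are easy to sanity-check numerically. A minimal NumPy sketch (the random test data and the helper frob are my own illustrative choices), comparing each finite difference against its claimed gradient formula:

[code]
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))           # a non-square matrix of variables
dX = 1e-6 * rng.standard_normal((3, 2))   # a small perturbation playing the role of dX
v = rng.standard_normal(3)
w = rng.standard_normal(2)

def frob(A, B):
    # Frobenius inner product <A, B>_F = sum_ij A_ij B_ij
    return np.sum(A * B)

# d(||X||_F) ~ <X/||X||_F, dX>_F, to first order in dX
print(np.linalg.norm(X + dX) - np.linalg.norm(X))   # finite difference
print(frob(X / np.linalg.norm(X), dX))              # gradient formula

# d(v^T X w) ~ <v w^T, dX>_F, exact here since v^T X w is linear in X
print(v @ (X + dX) @ w - v @ X @ w)                 # finite difference
print(frob(np.outer(v, w), dX))                     # gradient formula
[/code]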

In general (and thus if [imath]g(X)[/imath] is still scalar-valued in particular), if [imath]J(g)[/imath] is the Jacobian Matrix for [imath]g(X)[/imath], then [imath]dg\ =\ J(g)dX[/imath].

However, the Frobenius Inner Product returns a scalar, so the RHS of the first equation, and thus its LHS, must be a scalar. Meanwhile, for any [imath]X[/imath] which is a "non-trivial matrix" ([imath]2[/imath] by [imath]2[/imath] or larger), each of [imath]J(g)[/imath] and [imath]dX[/imath] should also be a non-trivial matrix, and since the matrix multiplication of two non-trivial matrices is never a scalar, the RHS of the second equation, and thus its LHS, must not be a scalar. Therefore [imath]dg[/imath] is both a scalar and not a scalar.

I have not referenced the fact that [imath]\nabla g = (J(g))^T[/imath]. This is also true of course, but I can't even get the shapes to make sense. The problem seems to be that the Frobenius Inner Product returns a very different shape than matrix multiplication. What is my misconception here?
 
and since the matrix multiplication of two non-trivial matrices is never a scalar
Where does matrix multiplication come from? You call [imath]X[/imath] a matrix, but only use it as a vector in your formulae.
 
I am definitely thinking of [imath]X[/imath] as [imath]2[/imath] by [imath]2[/imath] or larger and not necessarily square. I did forget to specify that by [imath]\nabla[/imath], I mean the more general version of the Gradient used in Matrix Calculus which returns a matrix if appropriate, so that may be the issue.
 
You can think of [imath]X[/imath] as higher order tensors for all I care. But I still don't see where in your post you actually used matrix properties of [imath]X[/imath], as opposed to "plain" vector properties?
 
If you frame my post as an (obviously hopeless) attempt to show a contradiction between the two formulas, then I have used matrix properties of [imath]X[/imath] where I say "since the matrix multiplication of two non-trivial matrices is never a scalar, the RHS of the second equation, and thus its LHS, must not be a scalar." For example, because [imath]X[/imath] is a non-trivial matrix, it can be inferred that [imath]dX[/imath] is also a non-trivial matrix, and (unlike for vectors) there is nothing of any shape that could multiply [imath]dX[/imath] on the left which would output a scalar.
 
If [imath]X[/imath] is a matrix of variables, [imath]g(X)[/imath] is a scalar-valued function of [imath]X[/imath], ...

e.g. [imath] g=\det [/imath]

... and [imath]<\cdot,\ \cdot>_F[/imath] is the Frobenius Inner Product, ...

which means [imath] \bigl\langle X,Y \bigr\rangle_F=\sum_{ij}X_{ij}Y_{ij} [/imath]

... then [imath]dg\ =\ <\nabla g,\ dX>_F[/imath].

What do you mean by [imath] dX [/imath]? If [imath] g\, : \,\mathbb{R}^{n^2}\to \mathbb{R} [/imath] then [imath] dg\, : \,\mathbb{R}^{n^2}\to \mathbb{R} [/imath]. What is the difference between [imath] dg [/imath] and [imath] \nabla g [/imath]? How can the RHS depend on [imath] X [/imath] whereas the LHS does not?
 
I have used matrix properties of [imath]X[/imath] where I say "since the matrix multiplication of two non-trivial matrices is never a scalar"
Yes, you said that, but all your formulae treat [imath]X[/imath] as a "plain" vector.
 
If the purported problem is that my use of [imath]X[/imath] is inconsistent with [imath]X[/imath] being a matrix, then I think this is just false. The Frobenius Inner Product takes matrices as input, the Frobenius Norm takes matrices as input, I clarified that I'm using the Matrix Calculus extension of [imath]\nabla[/imath], [imath]\vec v^T X \vec w[/imath] is a bilinear form, etc. If you think I have still made a mistake, feel free to pinpoint it.

If the purported problem is that my use of [imath]X[/imath] does not require [imath]X[/imath] to be a matrix, then I think this would be irrelevant even if true. The formulas are supposed to hold for all shapes of [imath]X[/imath] (at least up to matrices, if not higher order tensors). Therefore, they should hold for whatever particular shape I think of [imath]X[/imath] as having.
 
e.g. [imath] g=\det [/imath]
Indeed
which means [imath] \bigl\langle X,Y \bigr\rangle_F=\sum_{ij}X_{ij}Y_{ij} [/imath]
Indeed
What do you mean by [imath] dX [/imath]? If [imath] g\, : \,\mathbb{R}^{n^2}\to \mathbb{R} [/imath] then [imath] dg\, : \,\mathbb{R}^{n^2}\to \mathbb{R} [/imath]. What is the difference between [imath] dg [/imath] and [imath] \nabla g [/imath]? How can the RHS depend on [imath] X [/imath] whereas the LHS does not?
By [imath]dX[/imath], I mean the differential of the matrix [imath]X[/imath]. Similarly, [imath]dg[/imath] is the differential of [imath]g(X)[/imath], whereas [imath]\nabla g[/imath] is the Gradient of [imath]g(X)[/imath]. Both the RHS and LHS of both equations depend on [imath]X[/imath]; I just left the dependence of [imath]g[/imath] on [imath]X[/imath] implicit in the formulas since I had specified it earlier.
 
Still, what is [imath] dX? [/imath] Is it the matrix with [imath] x_{ij}[/imath] at position [imath] (i,j) [/imath] and zero elsewhere, i.e. the Jacobi matrix of its coordinate functions? What are the variables?

[imath] g [/imath] is a function from a Euclidean space into the field of (I assume) real numbers, which saves us from dealing with conjugates. This means the differential [imath] dg [/imath] of [imath] g [/imath] is the same kind of object, a function from the Euclidean space to the real numbers, given by the Jacobi matrix, here of size [imath] 1\times n^2. [/imath] I assume we can write it as [imath] dg=\bigl\langle \nabla g,X \bigr\rangle_F [/imath] since it is all about arrangement and nothing seriously happens.

Maybe I didn't get your question, and an example would be helpful. Say we use [imath] n=2, [/imath] the variables [imath] x,y,u,v [/imath] to avoid confusion with the indices of [imath] X, [/imath] and the determinant
[math] g(X)=\det(X)=\det\left(\begin{pmatrix}X_{11}&X_{12}\\X_{21}&X_{22}\end{pmatrix}\right) .[/math]Then
[math] g(x,y,u,v)=g\begin{pmatrix}x&y\\u&v\end{pmatrix}= xv-yu [/math]and
[math] J_X(g)=\nabla_X(g)=\left(\dfrac{\partial g}{\partial x}(X)\, , \,\dfrac{\partial g}{\partial y}(X) \, , \, \dfrac{\partial g}{\partial u}(X) \, , \, \dfrac{\partial g}{\partial v}(X)\right) =\left(X_{22}\, , \,-X_{21}\, , \,-X_{12}\, , \,X_{11}\right).[/math]
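That row can be confirmed with finite differences; a quick NumPy sketch (the random point [imath] P [/imath] and the step size are arbitrary choices of mine):

[code]
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((2, 2))   # the point of evaluation
eps = 1e-6

# finite-difference partials of det at P, one coordinate at a time
grad = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        E = np.zeros((2, 2))
        E[i, j] = eps
        grad[i, j] = (np.linalg.det(P + E) - np.linalg.det(P)) / eps

print(grad.reshape(-1))                                  # numeric partials in order x, y, u, v
print(np.array([P[1, 1], -P[1, 0], -P[0, 1], P[0, 0]]))  # closed form (P22, -P21, -P12, P11)
[/code]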
Do we agree so far?
 
As I understand, [imath]dX[/imath] is a matrix shaped the same as [imath]X[/imath] with elements each approaching [imath]0[/imath], possibly independently (e.g., a computer approximation might assign the elements I.I.D. randomly). We can indeed assume that [imath]g[/imath] is a function from matrices (with real elements) to real numbers, but the equation should be [imath]dg = \bigl\langle \nabla g,\ dX \bigr\rangle_F[/imath] rather than [imath]dg = \bigl\langle \nabla g,\ X \bigr\rangle_F[/imath].

The example seems close to right so far; [imath]X = \begin{pmatrix}x & y \\ u & v\end{pmatrix}[/imath] and [imath]g(X) = \det(X) = xv - yu[/imath], but I think it should be that the Jacobian and the (generalized) Gradient are each other's transposes. In particular I have that [imath]\nabla_X (g) = \det(X)(X^{-1})^T[/imath], so I guess [imath]J_X (g)[/imath] would be [imath]\det(X)X^{-1}[/imath].
 
As I understand, [imath]dX[/imath] is a matrix shaped the same as [imath]X[/imath] with elements each approaching [imath]0[/imath], possibly independently (i.e., a computer approximation might assign the elements I.I.D. randomly).

The shape is not the question. The variables are! If you expect [imath] dX [/imath] not to be straight away zero, then its matrix entries have to be functions with variables along which we can differentiate them. One possibility is to consider each matrix entry to be a coordinate function:
[math] X_{11}=x\, , \,X_{12}=y\, , \,X_{21}=u\, , \,X_{22}=v. [/math]Another possibility would be to consider the domain from which the matrices are taken as a manifold and consider paths on this manifold. In this case, we have
[math] X_{11}=x(t)\, , \,X_{12}=y(t)\, , \,X_{21}=u(t)\, , \,X_{22}=v(t) [/math]where [imath] t\mapsto X=X(t) [/imath] is the parameterization of such a path and the parameter [imath] t [/imath] the variable along which we differentiate. In this case
[math] dX=\begin{pmatrix}x'(t)&y'(t)\\u'(t)&v'(t)\end{pmatrix}dt. [/math]
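To illustrate the path picture in code, a small NumPy sketch (the particular path [imath] X(t) [/imath] is a made-up example) comparing the numeric derivative of [imath] \det(X(t)) [/imath] with the chain rule [imath] \bigl\langle \nabla\det, X'(t) \bigr\rangle_F [/imath]:

[code]
import numpy as np

# a path t -> X(t) and its entrywise derivative X'(t)
X  = lambda t: np.array([[np.cos(t), t], [t**2, np.exp(t)]])
dX = lambda t: np.array([[-np.sin(t), 1.0], [2*t, np.exp(t)]])

t, h = 0.7, 1e-6
P = X(t)
grad = np.array([[P[1, 1], -P[1, 0]],
                 [-P[0, 1], P[0, 0]]])                   # gradient of det at P
print((np.linalg.det(X(t + h)) - np.linalg.det(P)) / h)  # numeric d(det(X(t)))/dt
print(np.sum(grad * dX(t)))                              # chain rule via Frobenius product
[/code]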
However, I suspect that we have the first case here: four variables [imath] x,y,u,v, [/imath] the coordinates of the matrix.

We can indeed assume that [imath]g[/imath] is a function from matrices (with real elements) to real numbers, but the equation should be [imath]dg = \bigl\langle \nabla g,\ dX \bigr\rangle_F[/imath] rather than [imath]dg = \bigl\langle \nabla g,\ X \bigr\rangle_F[/imath]
I do not understand this formula without knowing what [imath] g [/imath] and [imath] dX [/imath] are.
The example seems close to right so far; [imath]X = \begin{pmatrix}x & y \\ u & v\end{pmatrix}[/imath] and [imath]g(X) = \det(X) = xv - yu[/imath], but I think it should be that the Jacobian and the (generalized) Gradient are each other's transposes. In particular I have that [imath]\nabla_X (g) = \det(X)(X^{-1})^T[/imath], so I guess [imath]J_X (g)[/imath] would be [imath]\det(X)X^{-1}[/imath].

You have to distinguish between the directions along which we differentiate and the location at which the derivative is evaluated! The gradient and so the Jacobian is evaluated at a certain position. Do not use the same letters for variables and location, this is confusing.

Maybe I should have been clearer and should have written

[math] J_P(g)=\nabla_P(g)=\left(\dfrac{\partial g}{\partial x}(P)\, , \,\dfrac{\partial g}{\partial y}(P) \, , \, \dfrac{\partial g}{\partial u}(P) \, , \, \dfrac{\partial g}{\partial v}(P)\right) =\left(P_{22}\, , \,-P_{21}\, , \,-P_{12}\, , \,P_{11}\right).[/math]where [imath] P [/imath] is the location where we evaluate the derivative, a point [imath] P. [/imath]

This means for my example that
[math]\begin{array}{lll} d_P\det=\bigl\langle \nabla_P(\det),dX \bigr\rangle_F&= \bigl\langle \nabla_P(\det),dX \bigr\rangle=\left(P_{22}\, , \,-P_{21}\, , \,-P_{12}\, , \,P_{11}\right)\cdot \begin{pmatrix}dx\\dy\\du\\dv\end{pmatrix}=P_{22}dx-P_{21}dy-P_{12}du+P_{11}dv \end{array} .[/math]Of course, we can read this as a function of location [imath] P\longmapsto d_P\det. [/imath] And then we get to the point where confusion with derivatives often arises. The coordinates [imath] P_{ij} [/imath] of the location become variables again, and people write them as such, neglecting the changed meaning. Therefore the formula becomes
[math] X=\begin{pmatrix}x&y\\u&v\end{pmatrix} \longmapsto d_X(\det)=v\,dx-u\,dy-y\,du+x\,dv.[/math]
The case [imath] g=\| \, . \, \|_F [/imath] with [imath] \| \, X \, \|_F=\sqrt{x^2+y^2+u^2+v^2} [/imath] works the same way. Let's see.
[math] \nabla_P(\| \, . \, \|_F)=\left(\left.\dfrac{\partial }{\partial x}\right|_P\| \, . \, \|_F \, , \,\left.\dfrac{\partial }{\partial y}\right|_P\| \, . \, \|_F \, , \,\left.\dfrac{\partial }{\partial u}\right|_P\| \, . \, \|_F \, , \,\left.\dfrac{\partial }{\partial v}\right|_P\| \, . \, \|_F \right)=\dfrac{1}{\| \, P \, \|_F}\left(P_{11}\, , \,P_{12}\, , \,P_{21}\, , \,P_{22}\right) [/math]and
[math] d_P\left(\| \, . \, \|_F\right) =\bigl\langle \nabla_P\left(\| \, . \, \|_F\right),dX \bigr\rangle_F=\dfrac{1}{\| \, P \, \|_F}\left(P_{11}\, , \,P_{12}\, , \,P_{21}\, , \,P_{22}\right)\begin{pmatrix}dx\\dy\\du\\dv\end{pmatrix}=\dfrac{P_{11}\,dx+P_{12}\,dy+P_{21}\,du+P_{22}\,dv}{\| \, P \, \|_F}.[/math]
Here the point of evaluation [imath] P [/imath] remains in the result, and rearranging the gradient into matrix shape recovers the formula [imath] d(\|X\|_F)= \bigl\langle X/\|X\|_F,\ dX \bigr\rangle_F [/imath] from post #1.
 
I think I have proceeded a bit differently (without vectorizing as you appear to), but I did get the same answer for the Frobenius Inner Product. Zooming out a bit, I've made some progress on an answer to the original question. My two formulas for [imath]dg[/imath] seem to differ only by composition with a trace. In other words, [imath]dg\ =\ <\nabla g,\ dX>_F[/imath] and [imath]dg\ =\ trace(J(g)dX)[/imath] agree! The only mystery now is that I have not seen [imath]dg\ =\ trace(J(g)dX)[/imath] explicated as a correct formula, whereas I have seen [imath]dg\ =\ J(g)dX[/imath].
 
Where do you see a trace? The Frobenius product considers all matrix entries. Traces are obtained if we differentiate the determinant and evaluate it at the identity matrix, see my formula in post #12. If
[math] P=I=\begin{pmatrix}1&0\\0&1\end{pmatrix} [/math]then
[math] d_I(\det)=1\cdot dx-0\cdot dy-0\cdot du +1\cdot dv=dx+dv=\operatorname{trace}(dX). [/math]
This is why we get the tangent space of [imath] \operatorname{SL}(2)=\left\{X\in \mathbb{M}(2,\mathbb{R})\,|\,\det(X)=1\right\} [/imath] as
[math] \mathfrak{sl}(2)=\left\{dX\in \mathbb{M}(2,\mathbb{R})\,|\, d_I(\det(X))=d(1)=0=\operatorname{trace}(dX)\right\}, [/math]the vector space of [imath] 2\times 2 [/imath] matrices with vanishing trace.
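Numerically, with a small perturbation of the identity (a NumPy sketch; the test data is my own):

[code]
import numpy as np

rng = np.random.default_rng(5)
dX = 1e-6 * rng.standard_normal((2, 2))      # a small perturbation of the identity

print(np.linalg.det(np.eye(2) + dX) - 1.0)   # d(det) at I, as a finite difference
print(np.trace(dX))                          # trace(dX), agrees to first order
[/code]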
 
I don't think the trace is tied to the determinant in this specific example. It seems to appear generally (at least for all square [imath]X[/imath]), and can be seen with very little reference to the underlying calculus problem. Say we have some [imath]X[/imath] and [imath]g(X)[/imath] as before. Then if [imath]X[/imath] is square, the Jacobian and Gradient are each also square, all three of the same size (the Gradient is the Matrix Calculus version of the Gradient as defined here, which is the transpose of the Jacobian). Call the Jacobian [imath]J_{X}(g) = \begin{pmatrix}J_{11} & J_{12} \\ J_{21} & J_{22}\end{pmatrix}[/imath] and the differential [imath]dX = \begin{pmatrix}dX_{11} & dX_{12} \\ dX_{21} & dX_{22}\end{pmatrix}[/imath].

The first formula is [imath]dg = \bigl\langle \nabla_X (g),\ dX \bigr\rangle_F[/imath]. Transposing the Jacobian and then taking the Frobenius Inner Product yields [imath]dg = J_{11}\ dX_{11} + J_{12}\ dX_{21} + J_{21}\ dX_{12} + J_{22}\ dX_{22}[/imath].

The second formula is [imath]dg = J_X(g)\ dX[/imath]. Doing the matrix multiplication yields [imath]\begin{pmatrix}J_{11}\ dX_{11} + J_{12}\ dX_{21} & J_{11}\ dX_{12} + J_{12}\ dX_{22} \\ J_{21}\ dX_{11} + J_{22}\ dX_{21} & J_{21}\ dX_{12} + J_{22}\ dX_{22}\end{pmatrix}[/imath]. The first result can be seen in the trace of this second result.
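This identity [imath]\bigl\langle \nabla_X (g),\ dX \bigr\rangle_F = trace(J_X(g)\ dX)[/imath] is easy to check numerically, e.g. for [imath]g = \det[/imath] with the gradient [imath]\det(X)(X^{-1})^T[/imath] mentioned earlier in the thread (a NumPy sketch; the test matrices are arbitrary):

[code]
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 2))
dX = 1e-6 * rng.standard_normal((2, 2))

grad = np.linalg.det(X) * np.linalg.inv(X).T     # gradient of det: det(X) (X^{-1})^T
J = grad.T                                       # Jacobian = transpose of the gradient

print(np.sum(grad * dX))                         # <grad g, dX>_F
print(np.trace(J @ dX))                          # trace(J(g) dX), the same number
print(np.linalg.det(X + dX) - np.linalg.det(X))  # both match dg to first order
[/code]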
 
The link helps a lot to clarify the language that is used in your source. I wouldn't write [imath] df =f(x+dx)-f(x)[/imath] since it is a bit sloppy, but ok.

Now that I can look up what you actually mean, can we restart the discussion? What is your question? To answer the only sentence with a question mark in your post #1 ...
The problem seems to be that the Frobenius Inner Product returns a very different shape than matrix multiplication. What is my misconception here?
... I did the following calculation to verify the formula [imath] \bigl\langle A,B \bigr\rangle_F=\operatorname{tr}(A^TB). [/imath]

[math]\begin{array}{lll} (A^TB)_{ij}&=\displaystyle{\sum_{k=1}^n (A^T)_{ik}B_{kj}=\sum_{k=1}^n A_{ki}B_{kj}}\\[12pt] tr(A^TB)&=\displaystyle{\sum_{m=1}^n(A^TB)_{mm}=\sum_{m=1}^n\left(\sum_{k=1}^n A_{km}B_{km}\right)=\bigl\langle A,B \bigr\rangle_F } \end{array}[/math]
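The same identity as a two-line NumPy check (arbitrary test matrices):

[code]
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.sum(A * B))        # <A, B>_F
print(np.trace(A.T @ B))    # trace(A^T B), the same number
[/code]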
The notation [imath] \operatorname{vec}(A) [/imath] denotes the vector we obtain from reading the matrix from left to right, and row by row, e.g. [math] \operatorname{vec}(X)=(x,y,u,v) \text{ in case }X=\begin{pmatrix}x&y\\u&v\end{pmatrix}.[/math]
The Frobenius product of two matrices [imath] A,B [/imath] is thus the inner product [imath] \operatorname{vec}(A)^T\cdot \operatorname{vec}(B) [/imath] because this product matches up the indices:
[math] \vec{v}^T\cdot \vec{w}=\displaystyle{\sum_{k=1}^{n}v_k w_k} [/math]or in our case
[math] \operatorname{vec}(A)^T\cdot \operatorname{vec}(B)=\displaystyle{\sum_{(i,j)=(1,1)}^{(n,n)}A_{ij} B_{ij}} [/math]Please note that the "T" at a matrix means the transposed matrix, whereas the "T" at a vector only means that we write it as a row. Vectors without that "T" are column vectors.
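In NumPy the row-by-row vec used here is the default (C-order) reshape, so the identity can be checked directly (arbitrary test matrices):

[code]
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(A.reshape(-1) @ B.reshape(-1))   # vec(A)^T vec(B), with row-major vec as above
print(np.sum(A * B))                   # <A, B>_F, the same number
[/code]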

Does that answer your question about the connection of the ordinary matrix product and the Frobenius product?
 
Now that I can look up what you actually mean, can we restart the discussion? What is your question?
Okay, I think we have most of the grounding now. Starting again, we have two formulas, [imath]dg = \bigl\langle \nabla_X (g),\ dX \bigr\rangle_F[/imath] and [imath]dg = J_X (g)\ dX[/imath] (the latter being the generalization of [imath]df\ =\ f'(x)\ dx[/imath] from ordinary calculus).

What I believe you have shown is that [imath]dg = \bigl\langle \nabla_X (g),\ dX \bigr\rangle_F[/imath] is equivalent to [imath]dg = vec(J_X (g))^T\ vec(dX)[/imath], where I'm amending your definition of vectorization to stacking the columns of a matrix into a column vector (I suspect this is what you wanted anyhow, or else some of your vector products appear to be outer products rather than inner products). I think the last two subquestions to answer the main question are...

1) What justifies vectorization in this way? I could buy the idea of vectorizing both sides of an equation, i.e., [imath]dg = J_X (g)\ dX\ \implies\ vec(dg) = vec(J_X (g)\ dX)[/imath]. I've done this in other contexts. But [imath]dg = J_X (g)\ dX\ \implies\ dg = vec(J_X (g))^T\ vec(dX)[/imath] is mysterious.

2) We still have the apparent contradiction that [imath]dg[/imath] is both a scalar and a matrix. Even vectorizing [imath]dg[/imath] wouldn't make it a scalar as needed. Are there two genuinely different concepts of [imath]dg[/imath] here, and if so, how is each to be interpreted?
 
1) What justifies vectorization in this way? I could buy the idea of vectorizing both sides of an equation, i.e., [imath]dg = J_X (g)\ dX\ \implies\ vec(dg) = vec(J_X (g)\ dX)[/imath]. I've done this in other contexts. But [imath]dg = J_X (g)\ dX\ \implies\ dg = vec(J_X (g))^T\ vec(dX)[/imath] is mysterious.

It is the same resulting number, only written differently.

[math]\begin{array}{lll} \bigl\langle A,B \bigr\rangle_F&=\displaystyle{ \sum_{i=1}^n \sum_{j=1}^n A_{ij} B_{ij} } \end{array}[/math]
and
[math]\begin{array}{lll} \operatorname{vec}(A)^T\cdot \operatorname{vec}(B)&=(A_{11},A_{12},\ldots,A_{1n},A_{21},A_{22},\ldots,A_{2n},\ldots,A_{n1},A_{n2},\ldots,A_{nn}) \cdot \begin{pmatrix}B_{11}\\B_{12}\\ \vdots \\B_{1n}\\ \vdots\\ \vdots \\B_{nn}\end{pmatrix} \\ &\\ &=A_{11}B_{11}+A_{12}B_{12}+\ldots+A_{1n}B_{1n}+\ldots+A_{n1}B_{n1}+A_{n2}B_{n2}+\ldots+A_{nn}B_{nn}\\[12pt] &=\displaystyle{ \sum_{i=1}^n \sum_{j=1}^n A_{ij} B_{ij} } \end{array}[/math]
2) We still have the apparent contradiction that [imath]dg[/imath] is both a scalar and a matrix. Even vectorizing [imath]dg[/imath] wouldn't make it a scalar as needed. Are there two genuinely different concepts of [imath]dg[/imath] here, and if so, how is each to be interpreted?

Yes, but this is only because the notation isn't rigorous. We have to decide what [imath] dg [/imath] means!

If we have a differentiable function [imath] g\, : \,\mathbb{R}^{n^2}\longrightarrow \mathbb{R} [/imath] then [imath] dg [/imath] is usually the linear function from one tangent space to the other, the derivative or the differential form here, in our case
[math] dg\, : \,T_p\left(\mathbb{R}^{n^2}\right)=\mathbb{R}^{n^2}\longrightarrow T_{g(p)}\left(\mathbb{R}\right)=\mathbb{R} [/math]This means that [imath] dg [/imath] is a linear transformation from an [imath] n^2 [/imath]-dimensional vector space into a one-dimensional vector space.
Hence, we can write it as a [imath] 1\times n^2 [/imath] matrix, or rearranged as an [imath] n\times n [/imath] matrix. This is a matter of convenience, and given that this is about programming, a matter of index management.
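In code, this rearrangement really is pure bookkeeping. A small sketch (the numbers are made up) showing the same derivative stored as a [imath] 1\times n^2 [/imath] row and as an [imath] n\times n [/imath] matrix:

[code]
import numpy as np

row = np.array([4.0, -3.0, -2.0, 1.0])   # 1 x n^2 form, variables ordered x, y, u, v
mat = row.reshape(2, 2)                  # the same data rearranged as an n x n matrix
dX = np.array([[0.1, 0.2],
               [0.3, 0.4]])

print(row @ dX.reshape(-1))              # row form applied to vec(dX)
print(np.sum(mat * dX))                  # matrix form via the Frobenius product, same number
[/code]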

Now, why can it be seen as a scalar? Well, it isn't, but the images under [imath] dg [/imath] are scalars. If we evaluate the derivative at a certain point [imath] p [/imath] and apply a direction - note that derivatives are always directional - say the direction [imath] dX [/imath] which lives in the [imath] n^2 [/imath]-dimensional vector space where our matrices live, then we get
[math] D_{p}(g)\cdot dX=\bigl\langle \nabla_{p}\,g\, , \,dX \bigr\rangle \in \mathbb{R}.[/math]
Abbreviating the derivative [imath] D [/imath] of [imath] g [/imath] at the location [imath] p [/imath] in direction [imath] dX [/imath] by simply writing it [imath] dg=D_{p}(g) [/imath] or even [imath] dg=D_{p}(g)\cdot dX [/imath] is sloppy. It should at least be something like [imath] dg(X) [/imath] or [imath] d_Xg. [/imath]

In the end, it is the same question as whether a notation like [imath] f(x) [/imath] means the function or the resulting number.
Look at the beginning of section 5.1 in your source where they defined [imath] df.[/imath] They have smuggled a [imath] [dX] [/imath] into the end of the line, indicating the multiplication with [imath] dX [/imath], which makes it a number.

Ask 3 scientists about what a derivative is and you get 5 different answers and 8 different notations, at least.
 

Alright, I think I understand it about as well as it can be understood. Thank you for the help!
 