Using the product rule for a partial derivative of a matrix / vector function...

datahead8888 (New member · joined Jul 28, 2013 · 5 messages)
Suppose we have a function consisting of a series of matrices multiplied by a vector:
f(X) = A * B * b
--where X is a vector containing elements that appear within A, B, and/or b,
--A and B are matrices, and b is a vector

Each matrix and the vector are expressed in terms of further variables, i.e.:
X = (x1, x2, x3)

A =
[ x1 + y1    y4         y7      ]
[ y2         x2 + y5    y8      ]
[ y3         y6         x3 + y9 ]

B =
[ y1         x2 + y4    x3 + y7 ]
[ x1 + y2    y5         y8      ]
[ y3         y6         y9      ]

b = [y1 y2 y3]' (' means transposed)

Now we want to find the Jacobian of f - ie the partial derivative of f wrt X.

One way to do this is to multiply the two matrices, then multiply the result by the vector, producing one 3x1 vector in which each element is an algebraic expression resulting from the matrix multiplication. The partial derivatives could then be computed per element to form a 3x3 Jacobian. This would be feasible in the above example, but the problem I'm actually working on is a lot more complicated (and so I would also have to look for patterns afterwards in order to simplify the result).
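For what it's worth, a computer algebra system can carry out this brute-force approach directly. A minimal sketch in Python with SymPy, using the example matrices above (the y's are treated as free symbolic constants; variable names are mine, not from any particular library convention):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
y = sp.symbols('y1:10')  # y1 .. y9, treated as symbolic constants

A = sp.Matrix([[x1 + y[0], y[3],      y[6]],
               [y[1],      x2 + y[4], y[7]],
               [y[2],      y[5],      x3 + y[8]]])
B = sp.Matrix([[y[0],      x2 + y[3], x3 + y[6]],
               [x1 + y[1], y[4],      y[7]],
               [y[2],      y[5],      y[8]]])
b = sp.Matrix([y[0], y[1], y[2]])

f = A * B * b                  # 3x1 vector of algebraic expressions
J = f.jacobian([x1, x2, x3])   # 3x3 Jacobian, computed element by element
print(J.shape)
```

This is exactly the per-element approach described above, just automated, so it sidesteps the pattern-hunting only in the sense that SymPy does the bookkeeping.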

I wanted to try using the chain rule and/or the product rule for partial derivatives if possible. However, with the product rule you end up with (dA/dX) * B * b + A * (dB/dX) * b + A * B * (db/dX), where each derivative is taken wrt the vector X. I understand that the derivative of a matrix wrt a vector is actually a 3rd-order tensor, which is not easy to deal with. Even if that is not correct, the terms still have to evaluate to matrices in order for the matrix addition to be valid. If I use the chain rule instead, I still end up with the derivative of a matrix wrt a vector.

Is there an easier way to break down a matrix calculus problem like this? I've scoured the web and cannot seem to find a good direction.
 

I think you're getting components of a vector and dummy variables mixed up.

If you have f[X] = A * B * b, the underlying dummy variables are x1, x2, x3, as you've noted.

J(f[X])[i,j] = partial derivative of the ith component of f[X] wrt the dummy variable xj

You're differentiating with respect to the scalar dummy variables, not with respect to the vector as a whole.

Using index notation for the matrix math, we see

f[X]_i = A_ik B_kj b_j (with summation over the repeated indices k and j)

Each component is just a sum of products of functions of x1, x2, x3, and you know how to differentiate those using the product rule.

This all ends up meaning that the product rule works fine for matrices as well. In this case, for example,

del/delx1 (f[X]) = del/delx1(A) B b + A del/delx1(B) b + A B del/delx1(b)
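This identity is easy to check symbolically. A sketch in Python with SymPy, using the matrices from the original post (the setup names are mine; b contains no x's, so its derivative term vanishes):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
y = sp.symbols('y1:10')  # y1 .. y9, symbolic constants

A = sp.Matrix([[x1 + y[0], y[3],      y[6]],
               [y[1],      x2 + y[4], y[7]],
               [y[2],      y[5],      x3 + y[8]]])
B = sp.Matrix([[y[0],      x2 + y[3], x3 + y[6]],
               [x1 + y[1], y[4],      y[7]],
               [y[2],      y[5],      y[8]]])
b = sp.Matrix([y[0], y[1], y[2]])

f = A * B * b

# Left side: differentiate the fully multiplied-out product wrt x1.
lhs = f.diff(x1)
# Right side: product rule term by term; b.diff(x1) is the zero vector here.
rhs = A.diff(x1) * B * b + A * B.diff(x1) * b + A * B * b.diff(x1)

print(sp.expand(lhs - rhs))  # zero vector, confirming the identity
```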
 
So instead of taking the derivative with respect to the vector X (which is what was producing 3rd-order tensors), you're taking the derivative with respect to the scalar component x1 (and then, in turn, x2 and x3)?

This occurred to me last night, but I had dismissed it as a big hack. To your point, it is probably the best way to do this problem, since it keeps each matrix a matrix and the vector a vector in the resulting derivative, while allowing the matrix calculus rules to hold.

This should make these problems much easier in the future, since now I can break it into pieces.
 
If you take the derivative of A...
A =
[ x1 + y1    y4         y7      ]
[ y2         x2 + y5    y8      ]
[ y3         y6         x3 + y9 ]

...wrt x1 you get...
[ 1 0 0 ]
[ 0 0 0 ]
[ 0 0 0 ]

...wrt x2 you get...
[ 0 0 0 ]
[ 0 1 0 ]
[ 0 0 0 ]

...wrt x3 you get...
[ 0 0 0 ]
[ 0 0 0 ]
[ 0 0 1 ]

...in this case it's not too bad, but with a more complex system, like the one I'm actually trying to work through right now, it gets much messier, and you still have to look for patterns in these matrices of 1's and 0's in order to consolidate them into a matrix/vector expression that yields the Jacobian.

Is there any easy way to handle this other than taking the derivative of each matrix wrt a vector and dealing with 3rd-order tensors instead?
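One way to avoid the tensor entirely is to assemble the Jacobian column by column: the kth column is the derivative of f wrt the scalar xk, each computed with the per-variable product rule. A sketch in Python with SymPy under the same setup as the original post (the column-stacking approach is my suggestion, not something from the thread):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
y = sp.symbols('y1:10')  # y1 .. y9, symbolic constants
X = [x1, x2, x3]

A = sp.Matrix([[x1 + y[0], y[3],      y[6]],
               [y[1],      x2 + y[4], y[7]],
               [y[2],      y[5],      x3 + y[8]]])
B = sp.Matrix([[y[0],      x2 + y[3], x3 + y[6]],
               [x1 + y[1], y[4],      y[7]],
               [y[2],      y[5],      y[8]]])
b = sp.Matrix([y[0], y[1], y[2]])

# Column k of the Jacobian is the per-variable product rule derivative wrt X[k].
cols = [A.diff(xk) * B * b + A * B.diff(xk) * b + A * B * b.diff(xk) for xk in X]
J = sp.Matrix.hstack(*cols)  # 3x3 Jacobian assembled column by column

# Agrees with differentiating the multiplied-out vector element by element.
print(sp.expand(J - (A * B * b).jacobian(X)))  # zero matrix
```

Each column stays an ordinary 3x1 vector throughout, so no 3rd-order tensor ever appears; the only per-problem work is writing down the derivative matrices of A, B, and b, which are the sparse 1's-and-0's matrices shown above.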
 