OPE Prop for the Output Layer
All modern models use more than one parameter in their architectures, so now it's time to discuss the OPE Prop algorithm for multiple parameters / weights.
The Main Formula
If we consider that:
i - is the inputs, as an array of arrays of numbers, where the first index selects the data row and the second selects the value within that row (so $i_{21}$ is the first value of the second row).
a - is the vector of answers, since each unit outputs only one value (y)
n - is the length of our dataset
w - is a vector of weights
b - is the bias
$e(w_1, w_2, \dots, b)$ - is our error function (MSE for basic purposes)
Then we've got the following formula for our error function $e(w_1, w_2, \dots, b)$:
$$e(w_1, w_2, \dots, b) = n^{-1} \left( (a_1 - w_1 i_{11} - w_2 i_{12} - \dots - b)^2 + (a_2 - w_1 i_{21} - w_2 i_{22} - \dots - b)^2 + \dots \right)$$
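If it helps to see the error function as code, here is a minimal sketch in plain Python. The function name `mse` and its parameter names are my own choices for illustration and are not taken from the OPEProp Library.

```python
def mse(inputs, answers, weights, bias):
    """Mean squared error of one linear unit (hypothetical helper).

    inputs  -- list of data rows; inputs[p][q] is value q of row p (i in the text)
    answers -- one target value per row (a in the text)
    weights -- one weight per input value (w in the text)
    bias    -- the bias b
    """
    n = len(inputs)
    total = 0.0
    for row, answer in zip(inputs, answers):
        # The unit's output for this row: w1*i1 + w2*i2 + ... + b
        prediction = sum(w * x for w, x in zip(weights, row)) + bias
        total += (answer - prediction) ** 2
    return total / n


inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
print(mse(inputs, answers, [1.0, 1.0], 0.0))  # prints 2.5
```

Note that Python lists start counting at 0, so `weights[0]` plays the role of $w_1$.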
Then the partial derivatives of the error function with respect to, for example, $w_1$ and to $b$ are:
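$$\frac{\partial e}{\partial w_1} = -2n^{-1} \left( (a_1 i_{11} - w_1 i_{11}^2 - w_2 i_{12} i_{11} - \dots - b\, i_{11}) + (a_2 i_{21} - w_1 i_{21}^2 - w_2 i_{22} i_{21} - \dots - b\, i_{21}) + \dots \right)$$
$$\frac{\partial e}{\partial b} = -2n^{-1} \left( a_1 + a_2 + \dots - nb - w_1 (i_{11} + i_{21} + \dots) - w_2 (i_{12} + i_{22} + \dots) - \dots \right)$$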
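The same two derivatives can be sketched in plain Python, computed for every weight at once rather than only for the first one. The name `gradients` is hypothetical and only meant to mirror the formulas above.

```python
def gradients(inputs, answers, weights, bias):
    """Partial derivatives of the MSE for each weight and for the bias.

    Returns (grad_w, grad_b), where grad_w[q] is de/dw for weight q.
    """
    n = len(inputs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, answer in zip(inputs, answers):
        # residual = a - w1*i1 - w2*i2 - ... - b for this row
        residual = answer - sum(w * x for w, x in zip(weights, row)) - bias
        for q, x in enumerate(row):
            grad_w[q] += -2.0 * residual * x / n
        grad_b += -2.0 * residual / n
    return grad_w, grad_b


inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
print(gradients(inputs, answers, [1.0, 1.0], 0.0))  # prints ([1.0, 0.0], -1.0)
```

Multiplying the residual by the input value before summing is the same thing as the expanded products in the formula for the derivative with respect to the first weight.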
Setting these derivatives to zero, the extrema are at:
$$w_1 (i_{11}^2 + i_{21}^2 + \dots) + w_2 (i_{12} i_{11} + i_{22} i_{21} + \dots) + \dots + b (i_{11} + i_{21} + \dots) = a_1 i_{11} + a_2 i_{21} + \dots$$
$$w_1 (i_{11} + i_{21} + \dots) + w_2 (i_{12} + i_{22} + \dots) + \dots + nb = a_1 + a_2 + \dots$$
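For readers who prefer summation signs over the "+ ..." notation, the same two conditions can be written more compactly. The index letters p (data row) and q (weight number) are my own choice here, picked so that they don't clash with any other names on this page.

```latex
w_1 \sum_{p=1}^{n} i_{p1}^2
  + \sum_{q \neq 1} w_q \sum_{p=1}^{n} i_{pq}\, i_{p1}
  + b \sum_{p=1}^{n} i_{p1}
  = \sum_{p=1}^{n} a_p\, i_{p1}

\sum_{q} w_q \sum_{p=1}^{n} i_{pq} + n b = \sum_{p=1}^{n} a_p
```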
So, to find the best-fitting weight ($w_1$ in this case) or bias while the rest of the model keeps its current values, we do the following:
$$w_1 = \frac{(a_1 i_{11} + a_2 i_{21} + \dots) - b (i_{11} + i_{21} + \dots) - w_2 (i_{12} i_{11} + i_{22} i_{21} + \dots) - \dots}{i_{11}^2 + i_{21}^2 + \dots}$$
$$b = \frac{(a_1 + a_2 + \dots) - w_1 (i_{11} + i_{21} + \dots) - w_2 (i_{12} + i_{22} + \dots) - \dots}{n}$$
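The bias formula is the shorter of the two, so here is a small sketch of it in plain Python. The name `solve_bias` is hypothetical, and the example data is made up so that the answers follow a = 2·(first input) + 3·(second input) + 1 exactly.

```python
def solve_bias(inputs, answers, weights):
    """Best-fitting bias for the given weights, following the formula for b."""
    n = len(inputs)
    answer_sum = sum(answers)
    # w1*(i11 + i21 + ...) + w2*(i12 + i22 + ...) + ...
    weighted_input_sum = sum(
        weights[q] * sum(row[q] for row in inputs)
        for q in range(len(weights))
    )
    return (answer_sum - weighted_input_sum) / n


# Made-up data generated from a = 2*x1 + 3*x2 + 1
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
answers = [9.0, 19.0, 32.0]
print(solve_bias(inputs, answers, [2.0, 3.0]))  # prints 1.0
```

Because the weights passed in are already the ones that generated the data, the formula gives back the bias of 1 that the data was built with.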
A Better Explanation
The long formula can be hard to interpret, so we can break it up into pieces that are easier to understand one at a time.
Assuming we want to find the best-fitting value for $w_r$, we will do the following:
Calculate the sum of each answer times the corresponding input of this weight, and name it s. Then s is equal to:
$$s = a_1 i_{1r} + a_2 i_{2r} + \dots$$
Then calculate a variable g, equal to this:
$$g = b (i_{1r} + i_{2r} + \dots)$$
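Then assign a variable k to the following, where $n_1, n_2, \dots$ are the numbers of all the weights except r (this n is a list of weight numbers, not the dataset length from above):
$$k = w_{n_1} (i_{1 n_1} i_{1r} + i_{2 n_1} i_{2r} + \dots) + w_{n_2} (i_{1 n_2} i_{1r} + i_{2 n_2} i_{2r} + \dots) + \dots$$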
And then assign j to the sum of the squares of all the inputs to this weight:
$$j = i_{1r}^2 + i_{2r}^2 + \dots$$
Now, the best-fitting value of $w_r$ is:
$$w_r = \frac{s - g - k}{j}$$
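The four pieces map almost one to one onto code. Below is a minimal sketch in plain Python, assuming the inputs are a list of rows; the function name `best_weight` is my own and is not the API of the OPEProp Library. The weight number r is counted from 0 here, as Python lists are.

```python
def best_weight(inputs, answers, weights, bias, r):
    """Best-fitting value of weight r, with the other weights and the bias fixed."""
    # s: each answer times the matching input of weight r
    s = sum(a * row[r] for row, a in zip(inputs, answers))
    # g: the bias times the sum of the inputs of weight r
    g = bias * sum(row[r] for row in inputs)
    # k: every other weight times the sum of (its input * input of weight r)
    k = sum(
        weights[m] * sum(row[m] * row[r] for row in inputs)
        for m in range(len(weights))
        if m != r
    )
    # j: the sum of the squared inputs of weight r
    j = sum(row[r] ** 2 for row in inputs)
    return (s - g - k) / j


# Made-up data generated from a = 2*x1 + 3*x2 + 1
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
answers = [9.0, 19.0, 32.0]
print(best_weight(inputs, answers, [0.0, 3.0], 1.0, 0))  # prints 2.0
print(best_weight(inputs, answers, [2.0, 0.0], 1.0, 1))  # prints 3.0
```

The current value of `weights[r]` is never read, so it doesn't matter what it is set to before the call. One thing this sketch does not handle: if every input of weight r is zero, then j is zero and the division fails.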
This is the main formula of the OPEProp algorithm. Now you can check out the simpler formula for the hidden layer, which uses the same algorithm as the output layer formula. If you just want a simple implementation in code and don't want to worry about all of these quirky equations I am explaining, you can use the OPEProp Library. But if you want to implement it yourself, you'll need to understand all of this weird math. Good luck, and I hope you create whatever you want!
OPE Prop formulas on this website are licensed under the CC BY-SA 4.0 License. More details here