Output Layer | OPE Prop

All modern models use more than one parameter in their architectures, so now it's time to discuss the OPE Prop algorithm for multiple parameters / weights.

If we consider that:

i - is the inputs, as an array of arrays of numbers, where the first index is the data row and the second is the input value within that row (so i₁₂ is the second input of the first row).

a - is the vector of answers, since each unit outputs only one value (y)

n - is the length of our dataset

w - is a vector of weights

b - is the bias

e(w₁, w₂, ..., b) - is our error function (MSE for basic purposes)

Then we've got the following formula for our error function e(w₁, w₂, ..., b):

e(w₁, w₂, ..., b) = n^-1 ((a₁ - w₁i₁₁ - w₂i₁₂ - ... - b)² + (a₂ - w₁i₂₁ - w₂i₂₂ - ... - b)² + ...)
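To make the error function concrete, here is a minimal sketch in plain Python. The function name `mse` and the sample data are assumptions made for illustration and are not part of OPE Prop itself.

```python
def mse(inputs, answers, weights, bias):
    """Mean squared error e(w1, w2, ..., b) of one linear unit.

    inputs  : list of rows, each row a list of input values (i)
    answers : list of expected outputs, one per row (a)
    weights : list of weights, one per input column (w)
    bias    : the bias (b)
    """
    n = len(inputs)
    total = 0.0
    for row, a in zip(inputs, answers):
        # Unit output for this row: w1*i_1 + w2*i_2 + ... + b
        prediction = sum(w * x for w, x in zip(weights, row)) + bias
        total += (a - prediction) ** 2
    return total / n


# Hypothetical sample data: two rows with two inputs each.
inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
error = mse(inputs, answers, [1.0, 1.0], 0.0)
print(error)  # 2.5
```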

Then the partial derivatives of the error function with respect to, for example, w₁ and to the bias b are:

de/dw₁ = -2n^-1 ( (a₁i₁₁ - w₁i₁₁² - w₂i₁₂i₁₁ - ... - bi₁₁) + (a₂i₂₁ - w₁i₂₁² - w₂i₂₂i₂₁ - ... - bi₂₁) + ... )

de/db = -2n^-1 (a₁ + a₂ + ... - nb - w₁(i₁₁ + i₂₁ + ...) - w₂(i₁₂ + i₂₂ + ...) - ...)
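The two derivatives can be checked numerically with a short sketch. It assumes the same MSE error and a single linear unit as above; the name `gradients` and the sample data are hypothetical.

```python
def gradients(inputs, answers, weights, bias):
    """Return (de/dw for every weight, de/db) for the MSE of one linear unit."""
    n = len(inputs)
    # Residual of each row: a - w1*i_1 - w2*i_2 - ... - b
    residuals = [
        a - (sum(w * x for w, x in zip(weights, row)) + bias)
        for row, a in zip(inputs, answers)
    ]
    # de/dw_r = -2/n * sum over rows of (residual * input r of that row)
    grad_w = [
        -2.0 / n * sum(res * row[r] for res, row in zip(residuals, inputs))
        for r in range(len(weights))
    ]
    # de/db = -2/n * sum of residuals
    grad_b = -2.0 / n * sum(residuals)
    return grad_w, grad_b


# Hypothetical sample data: two rows with two inputs each.
inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
grad_w, grad_b = gradients(inputs, answers, [1.0, 1.0], 0.0)
```

With this data the residuals are 2 and -1, so de/dw₁ is 1, de/dw₂ is 0 and de/db is -1.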

Setting both derivatives to zero gives the extrema:

w₁(i₁₁² + i₂₁² + ...) + w₂(i₁₂i₁₁ + i₂₂i₂₁ + ...) + ... + b(i₁₁ + i₂₁ + ...) = a₁i₁₁ + a₂i₂₁ + ...

w₁(i₁₁ + i₂₁ + ...) + w₂(i₁₂ + i₂₂ + ...) + ... + nb = a₁ + a₂ + ...

So, to find the best-fitting weight (w₁ in this case) or bias while the rest of the model keeps its current values, we solve the matching equation for it:

w₁ = ( (a₁i₁₁ + a₂i₂₁ + ...) - b(i₁₁ + i₂₁ + ...) - w₂(i₁₂i₁₁ + i₂₂i₂₁ + ...) - ... ) / (i₁₁² + i₂₁² + ...)

b = ( (a₁ + a₂ + ...) - w₁(i₁₁ + i₂₁ + ...) - w₂(i₁₂ + i₂₂ + ...) - ... ) / n
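The bias formula is the shorter of the two, so it makes a convenient first check. The sketch below assumes all weights are held at their current values; the name `best_bias` and the sample data are hypothetical.

```python
def best_bias(inputs, answers, weights):
    """Bias that zeroes de/db while every weight stays at its current value."""
    n = len(inputs)
    # Start from a1 + a2 + ...
    total = sum(answers)
    # Subtract w_r * (i_1r + i_2r + ...) for every weight
    for r, w in enumerate(weights):
        total -= w * sum(row[r] for row in inputs)
    return total / n


# Hypothetical sample data: two rows with two inputs each.
inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
b = best_bias(inputs, answers, [1.0, 1.0])
print(b)  # 0.5
```

With b = 0.5 the residuals become 1.5 and -1.5, which sum to zero, so de/db is zero as expected.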

The long formula can be hard to interpret, so we can break it up into pieces that are easier to understand.

Assuming we want to find the best-fitting value for w_r, we will do the following:

Calculate the sum of each answer times the corresponding input of this weight, and name it s. Then s is equal to:

s = a₁i_1r + a₂i_2r + ...

Then calculate a variable g, equal to this:

g = b (i_1r + i_2r + ...)

Then assign a variable k to this, assuming that n is a vector of all weight numbers except r (here n is this list of indices n₁, n₂, ..., not the dataset length defined earlier):

k = w_n₁ (i_1n₁i_1r + i_2n₁i_2r + ...) + w_n₂ (i_1n₂i_1r + i_2n₂i_2r + ...) + ...

And then assign j to the sum of the squares of all the inputs to this weight, equal to this:

j = i_1r² + i_2r² + ...

Now, the best-fitting value of w_r is:

w_r = (s - g - k) / j
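The four pieces s, g, k and j map directly onto code. This is a minimal sketch under the definitions above, not the OPEProp Library's API; the name `best_weight` and the sample data are hypothetical, and column indices are zero-based as usual in Python.

```python
def best_weight(inputs, answers, weights, bias, r):
    """Best-fitting value of weight r, with the bias and other weights fixed."""
    # s: each answer times the input this weight multiplies
    s = sum(a * row[r] for row, a in zip(inputs, answers))
    # g: bias times the sum of the inputs to this weight
    g = bias * sum(row[r] for row in inputs)
    # k: every other weight times the sum of (its input * this weight's input)
    k = sum(
        weights[m] * sum(row[m] * row[r] for row in inputs)
        for m in range(len(weights))
        if m != r
    )
    # j: sum of the squared inputs to this weight
    # (assumed non-zero here; an all-zero input column would divide by zero)
    j = sum(row[r] ** 2 for row in inputs)
    return (s - g - k) / j


# Hypothetical sample data: two rows with two inputs each.
inputs = [[1.0, 2.0], [3.0, 4.0]]
answers = [5.0, 6.0]
w_r = best_weight(inputs, answers, [1.0, 1.0], 0.0, 0)
print(w_r)  # 0.9
```

Here s = 23, g = 0, k = 14 and j = 10, so the first weight becomes (23 - 0 - 14) / 10 = 0.9.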

This is the main formula of the OPE Prop algorithm. From here you can check out the simpler formula for the hidden layer, which uses the same algorithm as the output layer formula. If you just want a simple implementation in code without working through these equations, you can use the OPEProp Library. But if you want to implement it yourself, you'll need to understand the math above. Good luck, and I hope you create whatever you want!

OPE Prop formulas on this website are licensed under the CC BY-SA 4.0 License. More details here