You have a function cleverly designed so that being zero is optimal. The closer to zero, the better. It has 1000 dials to control it, but otherwise it's just input and output.
So like an AWS Lambda with 1000 env vars!
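If it helps to see the shape of that, here's a tiny Python sketch. Everything in it is made up (the name `mystery_function`, the tanh inside it, the sizes); the real function could be anything with 1000 dials.

```python
import numpy as np

def mystery_function(x, dials):
    # A made-up stand-in for the cleverly designed function:
    # input in, 1000 dials to control it, one output out.
    return float(np.tanh(x @ dials))

dials = np.random.randn(1000) * 0.01  # the 1000 "env vars"
x = np.random.randn(1000)             # some input you like
print(mystery_function(x, dials))     # the actual output
```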
Some clever math gal designed it so if you do this gradient descent thing it learns! But let's not worry about that for now. We just want to understand gradient descent.
So you have an input you like, a desired output, this function that makes an actual output, and a way to turn that into a score of closeness. Closer to zero, better.
So you put in the input, run it with the env vars, compare the actual output to the desired one, and you get say 0.3.
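In code, that "score of closeness" step might look like this (same made-up stand-in as above; squared difference is just one common way to score closeness, not the only one):

```python
import numpy as np

def mystery_function(x, dials):
    return float(np.tanh(x @ dials))  # same made-up stand-in as before

def score(actual, desired):
    # 0 means perfect, bigger means more wrong
    return (actual - desired) ** 2

dials = np.random.randn(1000) * 0.01
x = np.random.randn(1000)   # the input you like
desired = 0.5               # the output you want
print(score(mystery_function(x, dials), desired))  # e.g. something like 0.3
```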
Not bad. But then you decide to wiggle an env var just a bit to see if it makes it better. 0.31 doh! Ok the other way. 0.29 yay! Ok so leave it there and do the next one and so on.
Now repeat with the next input and output pair.
And again with another.
Then do the whole set again!
You will find the average amount you're wrong by gets smaller!
This is sort of gradient descent.
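Here's that whole wiggle routine as a rough sketch (made-up dataset, made-up step sizes). Notice it calls the function twice per dial, per input, per sweep, just to adjust everything once; that's the cost of guessing.

```python
import numpy as np

def mystery_function(x, dials):
    return float(np.tanh(x @ dials))  # same made-up stand-in as before

def score(actual, desired):
    return (actual - desired) ** 2

EPS = 1e-6   # how far to wiggle when testing a dial
STEP = 1e-4  # how far to move a dial once you know the good direction

def wiggle_all_dials(x, desired, dials):
    # Try each dial: nudge it up, see if the score got better or worse,
    # undo the nudge, then shift it a step in whichever direction won.
    for i in range(len(dials)):
        before = score(mystery_function(x, dials), desired)
        dials[i] += EPS
        after = score(mystery_function(x, dials), desired)
        dials[i] -= EPS  # undo the test wiggle
        dials[i] += STEP if after < before else -STEP

dials = np.random.randn(1000) * 0.01
dataset = [(np.random.randn(1000), 0.5) for _ in range(20)]  # made-up pairs

for sweep in range(5):  # "then do the whole set again!"
    for x, desired in dataset:
        wiggle_all_dials(x, desired, dials)
    avg = np.mean([score(mystery_function(x, d_), d) for x, d in dataset for d_ in [dials]])
    print(f"sweep {sweep}: average wrongness {avg:.4f}")  # should drift down
```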
One extra trick. Using maths and calculus you can figure out how to adjust the env vars so you don't need to guess, and the amount you adjust them by will be closer to optimal.
Calculus is about the rate things change: if you compute A + B, then a change in A becomes the same change in A + B, and you can run this in reverse! That lets you calculate, not guess, the changes needed to the env vars.
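Here's the same thing with the guessing replaced by calculus (still the made-up tanh stand-in). The chain rule walks the change backwards from the score, through the tanh, to every dial at once, so one pass gives the exact slope for all 1000 dials instead of 2000 test wiggles:

```python
import numpy as np

def mystery_function(x, dials):
    return float(np.tanh(x @ dials))  # same made-up stand-in as before

def score(actual, desired):
    return (actual - desired) ** 2

LEARNING_RATE = 1e-4

def gradient_step(x, desired, dials):
    # Chain rule, run in reverse from the score:
    #   d(score)/d(actual) = 2 * (actual - desired)
    #   d(actual)/d(z)     = 1 - tanh(z)^2, where z = x . dials
    #   d(z)/d(dial_i)     = x_i
    # Multiply the three: the exact slope for every dial, no wiggling needed.
    actual = mystery_function(x, dials)
    grad = 2 * (actual - desired) * (1 - actual**2) * x
    dials -= LEARNING_RATE * grad  # step downhill by the slope, not a fixed guess

dials = np.random.randn(1000) * 0.01
dataset = [(np.random.randn(1000), 0.5) for _ in range(20)]  # made-up pairs

for sweep in range(5):
    for x, desired in dataset:
        gradient_step(x, desired, dials)
    avg = np.mean([score(mystery_function(x, d), d) for x, d in dataset])
    print(f"sweep {sweep}: average wrongness {avg:.4f}")  # should drift down
```

That reverse walk through the chain rule is the calculated version of the wiggling above, and it's why one pass can replace thousands of guesses.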