I think you’re actually massively overthinking it.

The state of a neural network is described entirely by its parameters, which usually consist of a long array (well, a matrix, or a tensor, or whatever…) of floating point numbers. What is being optimised when a network is trained is these parameters and nothing else.

When you evaluate a neural network on some input (often called performing ‘inference’), that is when the functions we’re talking about are used. You start with the input vector, and you apply all of those functions in order and you get the output vector of the network. The training process also uses these functions, because to train a network you have to perform evaluation repeatedly in between tweaking those parameters to make it better approximate the desired output for each input. Importantly, the functions do not change. They are constant; it’s the parameters that change. The functions are the architecture — not the thing being learned.

Essentially what the parameters represent is how likely each neuron is to be activated (have a high value) if others in the previous layer are. So you can think of the parameters as encoding strengths of connections between each pair of neurons in consecutive layers. Thinking about ‘what path to take through the neural layers’ is way too sophisticated — it’s not doing anything like that.
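To make that concrete, here is a rough sketch of a tiny two-layer network (illustrative TypeScript, not from any real library; the names, shapes and numbers are made up). The functions below are the fixed architecture; training would only ever rewrite the numbers stored in the params object:

    // Illustrative only: a network with one hidden layer.
    // The functions (matVec, relu, forward) never change;
    // training only changes the numbers inside a Params value.
    type Params = { W1: number[][]; b1: number[]; W2: number[][]; b2: number[] };

    // Matrix-vector product plus bias: one layer's worth of 'connection strengths'.
    function matVec(W: number[][], x: number[], b: number[]): number[] {
      return W.map((row, i) => row.reduce((sum, w, j) => sum + w * x[j], b[i]));
    }

    // A fixed nonlinearity applied elementwise.
    function relu(v: number[]): number[] {
      return v.map(x => Math.max(0, x));
    }

    // Inference: apply the same functions, in the same order, every time.
    function forward(p: Params, input: number[]): number[] {
      const hidden = relu(matVec(p.W1, input, p.b1));
      return matVec(p.W2, hidden, p.b2);
    }

Training loops over examples, calls forward, measures the error, and nudges the numbers in W1, b1, W2 and b2; it never swaps out forward, relu or matVec for different functions.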

> Though fundamentally I don't think there is distinction between functions in computer science and mathematics. The program as a whole is effectively a function.

You’re pretty much right about that, but there are two important problems/nitpicks:

(1) We can’t prove (in general) that a given program will halt and evaluate to something (rather than just looping forever) on a given input, so the ‘entire program’ is instead what’s called a partial function. This means that it’s still a function on its domain — but we can’t know what its precise domain is. Given an input, it may or may not produce an output. If it does, though, it’s well defined because it’s a deterministic process.
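A toy illustration of that (my own example, nothing deep): the function below halts for every positive integer anyone has tried, but nobody has proved that it halts for all of them, so we genuinely cannot write down its exact domain.

    // Collatz-style iteration: conjectured, but not proven, to terminate
    // for every positive integer n. As a mathematical object this is a
    // partial function whose precise domain is unknown.
    function collatzSteps(n: number): number {
      let steps = 0;
      while (n !== 1) {
        n = n % 2 === 0 ? n / 2 : 3 * n + 1;
        steps++;
      }
      return steps;
    }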

(2) You’re right to qualify that it’s the whole program that is (possibly) a function. If you take a function from some program that depends on some state in that same program, then clearly that function won’t be a proper ‘mathematical’ function. Sure, if you incorporate that extra state as one of your inputs, it might be, but that’s a different function. You have to remember that in mathematics, unlike in programming, a function consists essentially of three pieces of data: a domain, a codomain, and a ‘rule’. If you want to be set-theoretic and formal about it, this rule is just a subset of the cartesian product of its domain and codomain (it’s a set of pairs of the form (x, f(x))). If you change either of these sets, it’s technically a different function and there are good reasons for distinguishing between these. So it’s not right to say that mathematical functions and functions in a computer program are exactly the same.
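In code terms, roughly (an informal sketch of my own, using finite sets so the pairs can actually be written out): the same squaring rule over two different domains gives two different functions, because as sets of pairs they are not equal.

    // A finite mathematical function represented literally as its set of
    // pairs (x, f(x)). Same rule (squaring), different domains, hence
    // two distinct functions.
    const squareOnThree = new Map([1, 2, 3].map(x => [x, x * x] as [number, number]));
    const squareOnFive  = new Map([1, 2, 3, 4, 5].map(x => [x, x * x] as [number, number]));
    // They agree wherever both are defined, but they are not the same
    // set of pairs, so mathematically they are not the same function.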



I appreciate your responses; sorry, I hope I don't seem like I'm arguing for the sake of arguing.

> Essentially what the parameters represent is how likely each neuron is to be activated (have a high value) if others in the previous layer are. So you can think of the parameters as encoding strengths of connections between each pair of neurons in consecutive layers. Thinking about ‘what path to take through the neural layers’ is way too sophisticated — it’s not doing anything like that.

I'm a little confused. The discussion thus far has been about how neural networks are essentially just compositions of functions, but you are now saying that the function is static and only the parameters change.

But that aside, if these parameters change which neurons are activated, and this activation affects which neurons are activated in the next layer, are these parameters effectively not changing the path taken through the layers?

> Sure, if you incorporate that extra state as one of your inputs, it might be, but that’s a different function.

So say we have this program: "let c = 2; function sum3(a, b) { return a + b + c; } let d = sum3(3, 4);"

I believe you are saying, if we had constructed this instead as

"function(a,b,c) { return a+b+c } let d = 3sum(3,4,2) "

then, this is a different function.

Certainly, these are different in a sense, but at a fundamental level, when you compile this all down and run it, there is an equivalence in the transformation that is happening. That is, the two functions equivalently take some input state A (composed of a, b, c) and return the same output state B, while applying the same intermediary steps (add a to b, then add c to the result). Really, in the first case, where c is defined outside the scope of the function block, the interpreter is effectively producing the function sum3(x, y, c), as it has to, at some point, one way or another, inject c into a + b + c.
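Spelling out the equivalence I have in mind (just an illustrative sketch):

    // Closure form: c is captured from the enclosing scope.
    const c = 2;
    function sum3Closed(a: number, b: number): number {
      return a + b + c;
    }

    // Explicit form: c is passed in as an argument.
    function sum3Open(a: number, b: number, c: number): number {
      return a + b + c;
    }

    // Both perform the same transformation of the state (a, b, c):
    console.log(sum3Closed(3, 4));  // 9
    console.log(sum3Open(3, 4, 2)); // 9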

Similarly, I won't argue that the current formal definition of a function in mathematics is exactly that of a function as it's generally defined in programming.

Rather, what I am saying is that there is an equivalent way to think about and study functions that applies equally to both fields. That is, a function is simply a transformation from A to B, where A and B can be anything, whether bits, numbers, or any other construction in any system. The only primitive distinction to make here is whether A and B are the same thing or different.



