sparsity [#text mining] Document-term matrices built from word frequencies are huge, and most of their cells are zero. This problem is called sparsity, and it is mitigated with various techniques.
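A tiny illustration of the point (the corpus and counts below are invented for the example): even a three-sentence document-term matrix is already more than half zeros.

```python
from collections import Counter

# Toy corpus: three short "documents" (illustrative only).
docs = ["the cat sat", "the dog ran", "a cat and a dog"]

# Vocabulary = every distinct word across the corpus.
vocab = sorted({w for d in docs for w in d.split()})

# Document-term matrix of raw word frequencies.
counts = [Counter(d.split()) for d in docs]
matrix = [[c[w] for w in vocab] for c in counts]

zeros = sum(row.count(0) for row in matrix)
total = len(docs) * len(vocab)
print(f"{zeros}/{total} cells are zero ({zeros / total:.0%} sparse)")
```

Real corpora have tens of thousands of terms, so the zero fraction climbs far higher, which is why sparse storage formats and dimensionality reduction matter.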
kaggle datasets download benhamner/nips-papers
n-gram, modified skip-gram, spaCy
TODO Turn the math4IQB lectures into keywords
ANNs are mathematical machines: biology only gets us so far, and we need mathematics to extend what we learn from biology into a useful algorithm. The deeper the mathematics, the better the network we end up with. There are two questions we cannot answer biologically: what type of network should we use (the topology of real-world neural networks is far too complex to copy), and how do we estimate the synaptic weights and the thresholds theta_j? We'll start with the first, and as we learn more the answers will change: what we get in this lecture will be refined by deeper and deeper mathematics later.

First, suppose we used a minimally connected network, a tree. For instance, a decision tree where logistic regression is used for each decision is a neural network. (It's a fun exercise to set up: instead of using information gain at each node, use logistic regression.) But such an ANN turns out to be a linear classifier, and training would be via maximum entropy, and we have no indication of maximum-entropy training in our own brains. So minimally connected may not be the best approach; we don't escape linearity.

What about maximally connected? This actually has some utility. We'll look at an ANN on a complete graph with a discrete firing function; in particular, at what are called Hopfield networks. "On" corresponds to +1 and "off" to -1 (not 0), so the firing function jumps from -1 to +1 at some threshold theta_j. The network is completely connected (a complete graph), and we assume symmetric weights, w_ij = w_ji, and the
Hopfield network fires randomly: we go through, choose a neuron at random, update it, then choose another, and so on. In matrix terms, there is one value x_j per neuron, which is both that neuron's output and an input to every other neuron; each x_j is either +1 or -1. The weight matrix holds all the synaptic weights; neurons are not connected to themselves, and the matrix is symmetric.

Learning is Hebbian: it corresponds to modification of the synaptic weights using what is known as the Hebbian learning rule, named for the psychologist Donald Hebb, who proposed a learning theory based on the idea that learning takes place by reinforcing connections among learned states. To teach the network a pattern P that we want it to be able to recall (each entry p_i is +1 or -1), fix a learning rate epsilon and update each weight by the very simple rule

  w_ij_new = w_ij_old + epsilon * p_i * p_j.

In matrix form this is an outer product, a column times a row, which gives the Hebbian learning matrix, except that down the diagonal we have p_1^2, p_2^2, and so on. Each of these is 1, so subtracting the identity matrix removes the diagonal, and the matrix form of the rule is

  W_new = W_old + epsilon * (P P^T - I).

Simulation begins with an initial state, after which we select a neuron at random and fire it based on the firing rule

  x_i = +1 if sum_{j != i} w_ij x_j > theta_i, else -1

(notice that i is never equal to j, so no neuron is connected to itself), and we repeat until, hopefully, something useful
happens.

We'll look at this in terms of letters. We have rectangular grids where blue corresponds to +1 and white to -1 (think of blue as true and white as false, with -1 standing in for false). Imagine complete connectivity: every single cell is connected to every other cell, even though the edges aren't drawn. We then choose a neuron at random and calculate its new state.

In the simulation, we take an input pattern, a T, and learn it; this uses the Hebbian learning rule, updating the synaptic matrix with the matrix rule above. We learn a T and a C, and then an I. Now we test recall: we enter something that is not exactly an I but sort of looks like one, then fire ten neurons at a time, then a hundred at a time. As we randomly choose and fire neurons (we call that asynchronous updating), the state settles into one of the letters we learned.

The Hopfield network has an energy, defined (in the standard form) by

  E = -1/2 * sum_{i != j} w_ij x_i x_j + sum_j theta_j x_j,

and there is a theorem: the energy decreases each time a neuron fires. Let's prove that. Consider E_new - E_old after we have randomly selected a neuron i. Only x_i can change, because we selected x_i at random and everything else stays the same; so in that double sum the only terms that survive will
be the x_i terms. Notice the negative sign out in front; that will be important. Writing the difference as

  E_new - E_old = -(x_i_new - x_i_old) * (sum_j w_ij x_j - theta_i),

suppose first that x_i_new is greater than x_i_old. Then the first factor is positive, and the neuron fired up because the sum of the weighted inputs was greater than the threshold, so the second factor is also positive. Therefore E_new - E_old is a negative times a positive times a positive, which is negative: the energy decreased due to the firing. The other case is that x_i_new is less than x_i_old, in which case the first factor is negative, the threshold was larger than the weighted sum so the second factor is negative, and we get the product of three negatives; once again E_new < E_old. In either case the firing lowers the energy.

Let's look at this in action, with the simulation now showing the energy as it learns. Learning the letter T gives one energy; learning the letter C gives an energy of about -4498; learning the I again gives a negative energy (notice all these energies are negative). Now test recognition on something we think looks like an I: as we randomly choose and fire neurons, after ten firings the energy has gone down from that of the initial input pattern, and it keeps going down until it reaches a final value corresponding to something we learned. This works no matter what pattern we put in: we start at a higher energy, and as we asynchronously choose neurons at random and fire, the state settles in. But we also begin to see a problem
here: we might have said our input looked like a C, but the network thinks it's an I. The problem is that the energy surface, which lives in n dimensions, can have spurious states; it can also have rather broad valleys for some patterns and narrow valleys for others. We'll focus on the spurious states. We learn a pattern, the T, and get a minimum; we learn the C and get another minimum; we learn the I and get another. But in the course of learning these letters we start introducing other minima, local minima, places where the network can settle that are not things we actually taught it. They're spurious; they just popped up.

So can a Hopfield network correctly predict the class of any trained pattern, in other words, can we get f(pattern) = class to some high degree of accuracy? No, we can't, and the reason is that the more patterns we train with, the more the spurious states can overwhelm what we've learned, until the network eventually lacks the ability to correctly recall what it was taught. Let's look at an example. Teach it a new letter, an H, using the Hebbian learning rule (P P^T - I) to change the synaptic weight matrix. Now try to recognize something: it thinks it's an I. Try to recognize something else: again an I. It has a wide valley for the I, so it really thinks almost everything is an I, and that's because we reinforced the upper and lower parts of the I with three different patterns. Now enter another input pattern and it converges to something we never taught it. In fact, it's pretty easy to recreate the spurious state: we just make
a C; anything that looks like a C with some extra stuff will converge to the spurious state that has the extra piece. Put some junk inside the C and run the simulation, randomly selecting neurons ten at a time, then a hundred at a time: it converges down to a minimum energy, but a local minimum, a spurious state, not something we actually taught the network.

So what is the best network? We have to turn to mathematics to get that answer.
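The whole pipeline from the lecture can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes +/-1 states, zero thresholds, the Hebbian matrix rule W_new = W_old + eps*(P P^T - I), and asynchronous firing, with the grid size and corruption pattern invented here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25  # a 5x5 "letter" grid, flattened

def hebbian_learn(W, p, eps=1.0):
    # W_new = W_old + eps * (P P^T - I): outer product, diagonal removed.
    return W + eps * (np.outer(p, p) - np.eye(len(p)))

def energy(W, x, theta):
    # E = -1/2 * sum_{i,j} w_ij x_i x_j + sum_j theta_j x_j
    return -0.5 * x @ W @ x + theta @ x

def fire(W, x, theta, i):
    # Asynchronous update: neuron i jumps to +1 iff its weighted
    # input exceeds its threshold, otherwise to -1.
    x = x.copy()
    x[i] = 1.0 if W[i] @ x > theta[i] else -1.0
    return x

# Teach the network one random +/-1 pattern.
pattern = rng.choice([-1.0, 1.0], size=n)
W = hebbian_learn(np.zeros((n, n)), pattern)
theta = np.zeros(n)

# Recall from a corrupted copy: flip a few cells, fire asynchronously.
x = pattern.copy()
x[:5] *= -1
for _ in range(200):
    i = rng.integers(n)
    x_new = fire(W, x, theta, i)
    # The theorem from the lecture: firing never raises the energy.
    assert energy(W, x_new, theta) <= energy(W, x, theta) + 1e-9
    x = x_new
```

With one stored pattern the dynamics fall straight into its energy valley; teaching several overlapping patterns to a grid this small is exactly where the spurious local minima described above start to appear.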
A neural network: a decision tree where logistic regression is used for each decision.
Instead of using information gain at each node, you use logistic regression.
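A hedged from-scratch sketch of that idea for a single node (the data, learning rate, and helper names here are all invented for illustration): fit a logistic regression and use its decision boundary as the split, in place of an information-gain criterion.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_logistic(X, y, lr=0.5, steps=1000):
    """Plain gradient-descent logistic regression; bias folded into X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def decide(X, w):
    """The 'split' at a tree node: route samples by the logistic decision."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

# Two synthetic blobs standing in for the samples reaching one node.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = fit_logistic(X, y)
acc = (decide(X, w) == y).mean()
```

Each node's boundary is linear, which is the sense in which the lecture treats the minimally connected (tree) case as not escaping linear classification.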
vim +/"logistic regression$" "$NOTES/glossary.txt"
Thanks for reading!