sparsity
[#text mining]

Huge matrices are created based on word
frequencies (for example, document-term
matrices), with most cells having zero values.

This problem is called sparsity and is
mitigated using various techniques.
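
As a rough illustration of the definition above, here is a minimal sketch
that builds a tiny word-count matrix and measures how sparse it is. The
three-document corpus is made up for the example, and the dictionary at
the end is only one simple way to store the nonzero cells.

```python
from collections import Counter

# Toy corpus, invented for this illustration.
docs = [
    "the network learns a pattern",
    "the energy of the network decreases",
    "keyword extraction from articles",
]

# Vocabulary: one column per distinct word.
vocab = sorted({word for doc in docs for word in doc.split()})

# Dense document-term matrix of raw word counts (one row per document).
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

total_cells = len(docs) * len(vocab)
zero_cells = sum(row.count(0) for row in matrix)
sparsity = zero_cells / total_cells  # fraction of cells that are zero

# A sparse representation keeps only the nonzero cells.
sparse = {
    (i, j): count
    for i, row in enumerate(matrix)
    for j, count in enumerate(row)
    if count
}

print(f"{len(docs)} x {len(vocab)} matrix, sparsity = {sparsity:.2f}")
print(f"cells stored sparsely: {len(sparse)} of {total_cells}")
```

Even with three short documents more than half of the cells are zero, and
the fraction grows as the vocabulary grows.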


## Articles

### keyword extraction: nltk, sklearn

Automated Keyword Extraction from Articles using NLP

kag datasets download benhamner/nips-papers


### textrank: numpy, spacy

towardsdatascience.com/textrank-for-keyword-extraction-by-python-c0bae21bcec0

### ngram, modified skip-gram, spacy

Keywords Extraction with Ngram and Modified Skip-gram based on spaCy

## TODO Turn the math4IQB lectures into keywords

readsubs "https://www.youtube.com/watch?v=gfPUWwBkXZY"

ANNs are mathematical machines. The biology can
only get us so far: we really need math to
extend what we get from the biology into a
useful algorithm, and the deeper the math that
we use, the better the network we will actually
be working with. So, math and psychology.

Actually, there are two questions that we can't
answer biologically. What type of network
should we use? The network topology of
real-world NNs is just far too complex. And how
do we estimate the synaptic weights and these
thresholds $\theta_j$?

The answers to these questions will change, so
we'll get something that works, and as we do
deeper and deeper mathematics, these answers
that we're going to get in this lecture will be
modified. So, first off,

we could say, well, let's suppose we looked at
a minimally connected network, a tree. So, for
instance, we could use a decision tree where
logistic regression is used for each decision.
That is a NN, and as a matter of fact it's
something that's kind of fun to set up:

instead of using information gain at each
node, use logistic regression at each node. But

what we'll find out in such a case is that
such an ANN is actually a linear classifier,
and training would be via maximum entropy, and
we don't necessarily have any indication of
maximum entropy training in our own brains. So
minimally connected may not be the best
approach; we don't escape linear. What about
maximally connected? This actually has some
utility, so we're going to look at an ANN on a
complete graph with a discrete firing function,
and

in particular we'll look at what are called
Hopfield networks. On will correspond to one,
and off will correspond to negative one, not
zero,

and so our firing function is actually going
to go from negative one to one at some
threshold $\theta_j$, and the network will be
completely connected:

it's a complete graph,

and we're going to assume symmetric weights,

so $w_{ij}$ is equal to $w_{ji}$, and the
Hopfield network fires randomly:

so we'll go through and randomly choose a
neuron,

and we'll update it, and then randomly choose
another, and so on and so forth.

So we can look at this Hopfield network in
terms of matrices. The $x_j$ are our inputs:

there's one input for each neuron, and it is
also the output from the neuron and the input
to the other neurons, and each $x_j$ is either
1 or negative 1. The weight matrix

is just all the synaptic weights. The neurons
are not connected to themselves,

and it's a symmetric matrix.
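
The representation described above (a state vector of plus and minus ones,
a symmetric weight matrix with a zero diagonal, and one randomly chosen
neuron firing at a time) might be sketched as follows. The four-neuron
weights, the zero thresholds, and the helper name `fire` are hypothetical,
and sending a tie (weighted input exactly equal to the threshold) to
negative one is an assumption, since the lecture does not say how ties are
handled.

```python
import numpy as np

def fire(W, x, theta, j):
    """Update neuron j in place: +1 if its weighted input exceeds theta[j], else -1."""
    weighted_input = W[j] @ x  # W[j, j] == 0, so neuron j does not feed itself
    x[j] = 1 if weighted_input > theta[j] else -1
    return x

# Hypothetical 4-neuron network with hand-picked weights.
W = np.array([
    [ 0.0,  1.0, -1.0,  0.5],
    [ 1.0,  0.0,  0.5, -1.0],
    [-1.0,  0.5,  0.0,  1.0],
    [ 0.5, -1.0,  1.0,  0.0],
])
theta = np.zeros(4)            # thresholds theta_j, assumed zero here
x = np.array([1, -1, 1, -1])   # each x_j is +1 (on) or -1 (off)

is_symmetric = np.allclose(W, W.T)             # w_ij == w_ji
no_self_loops = bool(np.all(np.diag(W) == 0))  # neurons not connected to themselves

rng = np.random.default_rng(0)
j = rng.integers(len(x))       # randomly choose a neuron
fire(W, x, theta, j)           # and update it
print(j, x)
```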

Now we're going to use Hebbian learning.
Learning will correspond to modification of
the synaptic weights, and we'll do so using
what's known as a Hebbian learning rule. We get
this from the cognitive psychologist Donald
Hebb, who came up with a learning theory based
on the idea that learning takes place by
reinforcing connections among learned states.
So, to learn a pattern

(we want the network to be able to recall
this pattern that we're going to give it),

we're going to have each entry in the
pattern be either

a plus 1 or a minus 1.

We're going to fix a learning rate, epsilon,

and then we're going to update what we had
previously for the synaptic weights using this
very simple rule:

the new synaptic weights will be the old
synaptic weights plus epsilon times $p_i$
times $p_j$. In matrix form,

we're actually looking at what we call an
outer product, a column times a row, and this
gives us our Hebbian learning matrix, except
down the diagonal we have $p_1^2$, $p_2^2$,
and so on and so forth.

Each one of these is a 1, however, so that
means that we can subtract the identity matrix
and that will remove the diagonal, so that our
matrix form rule is: the new matrix will be
the old matrix plus epsilon times ($p\,p^T$
minus the identity), that is,
$W_{\text{new}} = W_{\text{old}} + \varepsilon\,(p\,p^{T} - I)$.

A simulation begins with an initial state,
after which we select a neuron at random and
fire based on this firing rule. Notice here
that $i$ will not be equal to $j$, so the
neuron's not connected to itself, and we repeat
until hopefully something useful happens. We're
going to look at this in terms of letters, in
some sense.
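
The matrix form of the Hebbian rule described above can be sketched in a
few lines. This is a minimal illustration, not the lecture's own code: the
function name `hebbian_update`, the six-entry pattern, and the learning
rate of 0.5 are all assumptions.

```python
import numpy as np

def hebbian_update(W, p, epsilon=1.0):
    """Return W + epsilon * (p p^T - I) for a pattern p with entries +1/-1."""
    p = np.asarray(p, dtype=float)
    outer = np.outer(p, p)  # column times row; the diagonal holds p_i^2 == 1
    return W + epsilon * (outer - np.eye(len(p)))

n = 6
W = np.zeros((n, n))                 # start with no learned connections
p = np.array([1, -1, 1, 1, -1, -1])  # hypothetical pattern to learn
W = hebbian_update(W, p, epsilon=0.5)

print(W)
```

Subtracting the identity leaves the diagonal at zero, so no neuron ends up
connected to itself, and the result stays symmetric because the outer
product of a vector with itself is symmetric.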

So we're going to have these rectangular grids,
and blue

will correspond to a 1 and white will
correspond to a 0, but remember that for us
blue is 1 and white is negative 1. Or we can
think of this as blue is true and white is
false, and we're using negative 1 for false.
Now we imagine complete connectivity with all
these weights. Now, I haven't shown all the
edges here; we just want to imagine that every
single one of these rectangles is connected to
every other rectangle,

and then we want to choose a neuron at
random and calculate its new state.

So here's the actual simulation of that. We'll
take our input pattern;

this is a T. We'll learn that input, and let's
teach it another one. So this is using the
Hebbian learning rule: we're updating that
matrix using this matrix learning rule to
update that synaptic matrix. So we've learned a
T and a C, and here we go with an I,

so we can learn that.

And so now we want to see if we can recall.

And so we put something in, and notice that
we're not going to put it in exactly,

but we're going to say that sort of looks like
an I.

And so now we're going to fire ten neurons at
a time,

and you'll notice that what happens (let's go
to a hundred at a time) is, as we randomly
choose, we get something that settles in to
one of the letters that we've learned.
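
The learn-then-recall loop of the demo can be imitated with a small
sketch. It is deliberately simpler than the demo: only one letter (a
hypothetical 5 x 5 "T") is stored, thresholds are assumed to be zero, and
the corrupted pixel positions, step count, and seed are arbitrary choices.
With a single stored pattern and only a few corrupted pixels, every firing
pushes the chosen neuron back toward the stored letter.

```python
import numpy as np

T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]

def to_vector(rows):
    """Flatten a grid of '#'/'.' characters into a +1/-1 state vector."""
    return np.array([1 if ch == "#" else -1 for row in rows for ch in row])

def recall(W, x, steps, rng):
    """Fire `steps` randomly chosen neurons, one at a time (thresholds assumed 0)."""
    x = x.copy()
    for _ in range(steps):
        j = rng.integers(len(x))
        x[j] = 1 if W[j] @ x > 0 else -1
    return x

p = to_vector(T)
n = len(p)
W = np.outer(p, p) - np.eye(n)   # Hebbian rule with epsilon = 1

noisy = p.copy()
noisy[[0, 7, 13, 24]] *= -1      # "sort of looks like" a T: four pixels flipped

rng = np.random.default_rng(0)
recalled = recall(W, noisy, steps=500, rng=rng)

for row in recalled.reshape(5, 5):
    print("".join("#" if v == 1 else "." for v in row))
```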

So we learned the letter C. As we randomly
select and fire

(we call that asynchronous),

we end up with this. Now, the Hopfield
network has an energy, and the energy is
defined as you see here,

and there's a theorem that the energy
decreases each time a neuron fires, and let's
actually prove that. So if we take the new
energy minus the old energy, then after we've
randomly selected a neuron $i$, only the $x_i$
can change, because we selected an $x_i$ at
random and everything else stays the same. So
therefore

in that double sum all we're left with is the
$x_i$ term.

So we can see that the only thing left from
the double sum will be the $x_i$. Now notice
that we have a negative here out in front;
that's going to be important. Now suppose that
the new value of $x_i$ is greater than the old
value of $x_i$. Well, that will imply that the
first term is positive. It'll also imply that
the sum of the weighted inputs was greater than
the threshold, which will imply that the second
term was positive, and so therefore
$E_{\text{new}} - E_{\text{old}}$ will be a
negative times a positive times a positive, and
that implies that $E_{\text{new}} - E_{\text{old}}$
is negative, or that the energy decreased due
to the firing from what it was previously. The
other case

is that the new value of $x_i$ is less than
the old value of $x_i$, in which case the
first term is negative, which implies the
threshold was larger than the weighted sum, and
therefore the second term was negative, and
therefore we get the product of three
negatives, and once again the new energy is
less than the old. So in either case we get
less energy, or a lower value of the energy.
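
The slide with the energy formula did not make it into these notes. The
argument above is consistent with the standard Hopfield energy, so the
following worked form is a reconstruction under that assumption rather
than a copy of the lecture's slide. The "negative out in front", the
"first term", and the "second term" of the proof are the three factors of
the last line.

```latex
% Assumed standard Hopfield energy (not taken from the lecture slide)
E = -\frac{1}{2}\sum_{i}\sum_{j \neq i} w_{ij}\, x_i x_j + \sum_{i} \theta_i x_i

% Only x_i changes when neuron i fires, and w_{ij} = w_{ji}, so
E_{\text{new}} - E_{\text{old}}
  = -\left(x_i^{\text{new}} - x_i^{\text{old}}\right)
     \left(\sum_{j \neq i} w_{ij}\, x_j - \theta_i\right)
```

If firing leaves the neuron's state unchanged, the first factor is zero
and the energy stays the same, so strictly speaking the energy never
increases and decreases only when a state actually flips away from a tie.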

So let's look at this in action. So now when I
learn things, it's actually going to show us
what the energy is.

So there's the energy for learning the letter
T, and now let's learn the letter C, and notice
when we hit the learn button that it's going
to have an energy of negative forty-four
ninety-eight. Notice all these energies are
negative. And now we're going to look at the I.
We'll learn the I, and once we've learned the
I, then once again we get a negative energy. So
if we recognize, or if we want to see if we can
recognize:

so, we think that looks like an I, don't you
think? Well, let's randomly choose neurons, and
notice what happens after ten is that the
energy is going down from the initial input
pattern. In particular,

it's going to keep going down until it reaches
a final value corresponding to something that
we've learned. So

this works no matter what pattern we put in:

we're going to start at a higher energy, and as
we asynchronously choose neurons at random and
fire,

it's going to settle in. Notice we also
begin to see a problem here, because we might
have said that looked like a C, but in reality
it thinks it's an I.
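
The energy bookkeeping shown in the demo can be checked numerically with a
short sketch. The energy function below is the standard Hopfield form,
which is an assumption because the lecture's formula is not in these
notes; the random patterns, network size, seed, and zero thresholds are
likewise made up for the illustration.

```python
import numpy as np

def energy(W, x, theta):
    """Assumed standard form: E = -1/2 * x^T W x + theta . x"""
    return -0.5 * x @ W @ x + theta @ x

rng = np.random.default_rng(1)
n = 16
patterns = rng.choice([-1, 1], size=(3, n))  # three random patterns to learn

# Hebbian learning with epsilon = 1 for each pattern.
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p) - np.eye(n)
theta = np.zeros(n)

# Start from an arbitrary input and fire randomly chosen neurons.
x = rng.choice([-1, 1], size=n)
energies = [energy(W, x, theta)]
for _ in range(200):
    j = rng.integers(n)
    x[j] = 1 if W[j] @ x > theta[j] else -1
    energies.append(energy(W, x, theta))

never_increased = all(b <= a for a, b in zip(energies, energies[1:]))
print(energies[0], energies[-1], never_increased)
```

The recorded energies form a non-increasing sequence: each firing either
lowers the energy or leaves it unchanged.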

And then the problem is because we have an
energy surface (this is in n dimensions) which
can have spurious states. It can also have
rather broad valleys for some patterns but
narrow valleys for others.

But we're going to focus on the spurious
states concept. We learn some kind of a
pattern, in this case the T,

and then we learn, say, a C,

and so we get another minimum,

and then we also learn I,

and we get another minimum.

But in the course of learning these letters we
start introducing other minima, local minima,

and these local minima are places where the
network could settle into,

but they're not things that we actually wanted,
that we taught the network.

They're spurious;

they just popped up.

So, can a Hopfield network correctly predict
the class of any trained pattern? In other
words, can we get $f(\text{pattern}) =
\text{class}$ to some high degree of accuracy?

No,

we can't, and the reason is that the more we
train with these patterns, the more the
spurious states can overwhelm what we've
learned, so that we eventually will lack the
ability to correctly recall what the network
was taught. Now let's look at an example of
that. So we'll teach it a new letter; we'll
teach it the letter H.

And so I teach it

the letter H, using our Hebbian learning rule
to change the synaptic weight matrix using
$p\,p^T - I$, and

now let's suppose we want to recognize
something,

and so we do that: it thinks that's an I.

Okay, we'll give it that. And now let's suppose
that we say we want to learn something else

(I mean, recognize something else).

Well, it thinks that's an I.

So it's got a wide valley for the I,

so it really kind of thinks everything's an I,
and if you'll notice, that's because we've
reinforced the upper and lower part of the I
with three different patterns. Now we enter
this input pattern,

and it converges to something that we didn't
teach it. In fact, it's pretty easy to recreate
the spurious state: we just make a C, and
anything that looks like a C with some extra
stuff is going to converge to the spurious
state that's got the extra thing there,

and you can see that.

And we'll put some junk in here inside the C,

and if we run it, randomly selecting neurons
ten at a time, now a hundred at a time,

it converges down to a minimum energy.

But this is a local minimum; this is a spurious
state.

This is not something we actually taught the
network.
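
Spurious states like the one in the demo can be reproduced in a tiny
network. The sketch below is an illustration under stated assumptions, not
the demo's letter grids: it stores three hand-picked, mutually orthogonal
8-neuron patterns with zero thresholds, then checks which states are fixed
points (states that no single firing would change). Two kinds of untaught
states turn out to be stable: the negation of each stored pattern, and the
"mixture" formed by a majority vote of the three patterns.

```python
import numpy as np

def is_fixed_point(W, x):
    """True if firing any single neuron (threshold 0) would leave x unchanged."""
    h = W @ x  # weighted input to every neuron
    return bool(np.all(np.where(h > 0, 1, -1) == x))

# Three hypothetical, mutually orthogonal patterns.
patterns = np.array([
    [1,  1,  1,  1, -1, -1, -1, -1],
    [1,  1, -1, -1,  1,  1, -1, -1],
    [1, -1,  1, -1,  1, -1,  1, -1],
])
n = patterns.shape[1]

# Hebbian learning with epsilon = 1.
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p) - np.eye(n)

# Majority vote of the three patterns; the sum of three +/-1 values is never 0.
mixture = np.sign(patterns.sum(axis=0))

taught_stable = [is_fixed_point(W, p) for p in patterns]
negated_stable = [is_fixed_point(W, -p) for p in patterns]
mixture_stable = is_fixed_point(W, mixture)

print(taught_stable, negated_stable, mixture_stable)
```

The mixture state matches none of the taught patterns or their negations,
yet the network will sit in it just as happily: it is a local minimum that
"just popped up" from learning the other three.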

So what is the best network?

Well, we have to turn to mathematics to get

vim +/"logistic regression$" "$NOTES/glossary.txt"