A non-symbolic approach to knowledge representation.
Artificial Neuron:

  i1--w1-->|----\
  i2--w2-->|     \
   ...    >|  f   >------> f(net)
  ik--wk-->|     /
   1 ---w->|----/              (f = activation function)
   ^
   Bias (constant value)

Domain for inputs (and outputs):
  (1) R    (2) [0,1]    (3) [-1,1]    (4) {0,1}

wj is the weight of input ij and w is the weight of the bias, so:

  i1.w1 + i2.w2 + ... + ik.wk + w.bias = net
  i.e.  net = Σ(1,k, ij.wj) + w.bias

The values of input and output must be in the same domain.

Example:

           |- 1 : net >= 0
  f(net) = |
           |- 0 : net <  0

  net = x.1 + y.1 - 2 = x + y - 2

  x -(+1)->|----\
           |     \
  y -(+1)->|  f   >------> f(net)
           |     /
  1 -(-2)->|----/

  D = {0,1}

   x | y | net | f(net)
  ---+---+-----+--------
   0 | 0 | -2  |   0
   1 | 0 | -1  |   0        (logical '^')
   0 | 1 | -1  |   0
   1 | 1 |  0  |   1

What if the bias weight -2 is replaced with -1?
(A small code sketch of this neuron follows the network diagram below.)

Neural Networks:

  [1]----+-|\
  [2]--+-|-| |----------|\
  [3]+-|-|-|/           | \
     | | |              |  >
     | | +----|\        | /
     | +------| |-------|/
     +--------|/

  (inputs can go to every neuron)
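The example above can be checked directly in code. The following is a minimal Python sketch (Python and the helper names are mine, not from the notes) of the threshold neuron; it also answers the question about replacing the bias weight -2 with -1: the same unit then computes logical OR.

# A minimal sketch of the threshold neuron above (illustrative only).
# net = sum of weighted inputs + bias weight; f(net) = 1 if net >= 0 else 0.

def f(net):
    return 1 if net >= 0 else 0

def neuron(inputs, weights, bias_weight):
    net = sum(i * w for i, w in zip(inputs, weights)) + bias_weight
    return f(net)

for x in (0, 1):
    for y in (0, 1):
        and_out = neuron((x, y), (1, 1), -2)   # bias weight -2 -> logical AND
        or_out  = neuron((x, y), (1, 1), -1)   # bias weight -1 -> logical OR
        print(x, y, and_out, or_out)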
Neural networks can be simplified so that there is only the output layer. Since these neurons do not depend on each other and they all have the same function, we can keep only one of them.
Example: Recognize the pattern "A"

Solution: use supervised training (of a perceptron) -- we want the perceptron
to return YES if the input is "A" and NO otherwise.
  +------------------------------------------------------+
  |  Non-A characters            +----------------+      |
  |                              | 'A' characters |      |
  |       +----------------------+------+         |      |
  |       |        Finite        |      |         |      |
  |       |     Training Set     |      |         |      |
  |       +----------------------+------+         |      |
  |                              +----------------+      |
  +------------------------------------------------------+
    D - domain of objects

Objective: after the training, the NN will, with high probability, classify
all the characters in the domain correctly.

Training set T:

  T = {(i1,an1),(i2,an2),...,(it,ant)}      (ij = input, anj = expected answer)

  - if all inputs in T are classified correctly, the training is done
  - otherwise we repeat until they all are
How to adjust weights?
For a training pair (i, an), we compare an with the actual output f(net):

  delta = an - f(net)

  wl := wl + C.delta.il       (for every weight wl; il is the l-th component of i)

where C is the learning rate, 0 < C < 1.
PERCEPTRON LEARNING ALGORITHM

INPUT:   initial weights w1,...,wk (and bias weight w)
         training set T = {(i1,an1),(i2,an2),...,(it,ant)}
         learning rate C, 0 < C < 1
OUTPUT:  modified weights w1,...,wk (and w)

METHOD:
  change := true
  while change do
    change := false
    for every training pair (is, ans) do        /* is = (is1,...,isk) */
      delta := ans - f(nets)
      if delta /= 0 then
        for every input weight wj do
          wj := wj + C.delta.isj
        w := w + C.delta                        /* bias weight */
        change := true
  output w1,...,wk (and w)
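Below is a minimal Python sketch of this algorithm (Python and the names perceptron_train and f are mine, not from the notes), trained on the logical '^' example from the beginning of the section; the loop structure mirrors the pseudocode above.

# Perceptron learning (illustrative sketch of the pseudocode above).

def f(net):
    # threshold activation: 1 if net >= 0 else 0
    return 1 if net >= 0 else 0

def perceptron_train(training_set, k, C=0.5):
    """training_set: list of (input_vector, expected_answer) pairs,
    each input_vector having k components."""
    weights = [0.0] * k      # w1..wk
    bias_w = 0.0             # w, the weight of the constant bias input 1
    change = True
    while change:
        change = False
        for inputs, an in training_set:
            net = sum(i * w for i, w in zip(inputs, weights)) + bias_w
            delta = an - f(net)
            if delta != 0:
                for j in range(k):
                    weights[j] += C * delta * inputs[j]
                bias_w += C * delta
                change = True
    return weights, bias_w

# Train on the logical '^' (AND) problem from the earlier example.
T = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]
w, b = perceptron_train(T, k=2)
print(w, b)   # learned weights and bias weight
print([f(sum(i * wj for i, wj in zip(x, w)) + b) for x, _ in T])  # [0, 0, 0, 1]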
A problem (in 2D) is any set of points in the plane, each marked as a positive or a negative instance.
Definition: A problem P is linearly separable if there exists a line l such that all the positive points of P lie on one side of l and all the negative points lie on the other side, i.e. l separates all the positive instances from all the negative instances of the problem.
  ^
  |\
  |+\    -
  |  \      -
  |+ +\-
  | +  \  -
  +-----\-------------->
Fact: A perceptron can learn only linearly separable problems.
Example (see the previous example)

Negative example:

  ^    -       -
  | -  #########   -
  |  - #+ + +  # -
  | -  # + +   #   -
  |  - #+ + +  #  -
  | -  #########    -
  |   -    -    -
  +-------------------->

The problem above cannot be learned by a perceptron.
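To see the fact in action, the sketch below (my own illustration, not from the notes) runs a bounded number of perceptron training epochs on the XOR problem, which is not linearly separable; the weights never reach a state that classifies all four points correctly.

# XOR is not linearly separable, so perceptron training never converges on it.
# The number of epochs is capped to keep the demo finite.

def f(net):
    return 1 if net >= 0 else 0

T = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]   # XOR

w, b, C = [0.0, 0.0], 0.0, 0.5
for epoch in range(1000):
    errors = 0
    for (x1, x2), an in T:
        delta = an - f(x1 * w[0] + x2 * w[1] + b)
        if delta != 0:
            w[0] += C * delta * x1
            w[1] += C * delta * x2
            b += C * delta
            errors += 1
    if errors == 0:
        break

print(errors)   # still > 0 after 1000 epochs: the perceptron cannot learn XOR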
Note: All activation functions in a BPNN (back-propagation neural network) are continuous (and differentiable: their derivative f' is used below).
Typically:
f(x) = 1 / ( 1 + e^-x )
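A short sketch (illustrative; the names are mine) of this sigmoid and its derivative. The identity f'(x) = f(x).(1 - f(x)) is what makes the error terms below cheap to compute.

import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # f'(x) = f(x) * (1 - f(x)); close to 0 where f is (nearly) flat,
    # which is why a flat region contributes (almost) no weight correction.
    fx = sigmoid(x)
    return fx * (1.0 - fx)

print(sigmoid(0.0), sigmoid_prime(0.0))    # 0.5  0.25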
Let in be the input to the BPNN and let out = (out1,...,outn) be the output. If out = an (note: (in,an) is a pair from the training set), there is no need to correct the weights.
Suppose that an /= out. Let N be an output neuron such that outN /= anN.
  i1--w1-->|\
  i2--w2-->| \
   ...    >| N >------> f(netN) = outN /= anN
  ik--wk-->| /
   1 --w-->|/
Let us consider the netN value.
Compute error term ΔN
for N: ΔN = ( anN - outN ) . f'(netN)
(Note that f'(netN) is (close to) zero where f(net) is flat, so ΔN, and hence the weight correction, is (close to) zero there.)
New weights: wj := wj + C.ΔN.ij
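A small numeric sketch (my own, with made-up values) of this output-neuron update: compute netN, outN, ΔN and the corrected weights for a single sigmoid neuron.

import math

def f(x):                    # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):              # f'(x) = f(x) * (1 - f(x))
    return f(x) * (1.0 - f(x))

# Made-up values for one output neuron N with two inputs plus bias.
i = [0.8, 0.3, 1.0]          # i1, i2 and the constant bias input 1
w = [0.5, -0.4, 0.1]         # w1, w2 and the bias weight w
anN, C = 1.0, 0.25           # expected answer and learning rate

netN = sum(ij * wj for ij, wj in zip(i, w))
outN = f(netN)
deltaN = (anN - outN) * f_prime(netN)                 # ΔN = (anN - outN) . f'(netN)
w = [wj + C * deltaN * ij for wj, ij in zip(w, i)]    # wj := wj + C.ΔN.ij

print(outN, deltaN, w)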
Let us consider a neuron M from the 2nd-to-last layer (the last hidden layer).
                           / V1   Δ1
  i1--w1-->|\             /
  i2--w2-->| \           /
   ...    >| M >--------+--- V2   Δ2      (some Δ /= 0)
  ik--wk-->| /           \
   1 --w-->|/             \
                           \ V3   Δ3

  outM = f(netM)

M computes its error term:

  ΔM = Σ(1,n, Vj.Δj) . f'(netM)

  (V1,...,Vn are the weights of the connections from M to the neurons of the
   next layer, and Δ1,...,Δn are the error terms already computed by those
   neurons.)
Correct weights: wj := wj + C.ΔM.ij
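Putting the two rules together, here is a compact sketch (my own illustration, with made-up initial weights) of one back-propagation step for a tiny 2-2-1 network of sigmoid units: forward pass, output error term ΔN, hidden error terms ΔM = (Σ V.Δ).f'(netM), and the weight corrections.

import math

def f(x):                     # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):               # f'(x) = f(x) * (1 - f(x))
    return f(x) * (1.0 - f(x))

C = 0.5                                   # learning rate
x = [0.0, 1.0]                            # one training input
an = 1.0                                  # its expected answer

# Made-up weights: two hidden neurons M1, M2 and one output neuron N.
# Each weight list is [w1, w2, bias weight].
w_hidden = [[0.2, -0.1, 0.05], [0.4, 0.3, -0.2]]
w_out = [0.1, -0.3, 0.2]

# ---- forward pass -----------------------------------------------------
net_hidden = [wm[0] * x[0] + wm[1] * x[1] + wm[2] for wm in w_hidden]
out_hidden = [f(n) for n in net_hidden]
net_out = w_out[0] * out_hidden[0] + w_out[1] * out_hidden[1] + w_out[2]
out = f(net_out)

# ---- error terms ------------------------------------------------------
delta_out = (an - out) * f_prime(net_out)                 # ΔN
delta_hidden = [w_out[m] * delta_out * f_prime(net_hidden[m])   # ΔM (n = 1 here)
                for m in range(2)]

# ---- weight corrections: wj := wj + C.Δ.ij ----------------------------
inputs_to_out = out_hidden + [1.0]                        # hidden outputs + bias input
w_out = [wj + C * delta_out * ij for wj, ij in zip(w_out, inputs_to_out)]

inputs_to_hidden = x + [1.0]
for m in range(2):
    w_hidden[m] = [wj + C * delta_hidden[m] * ij
                   for wj, ij in zip(w_hidden[m], inputs_to_hidden)]

print(out, delta_out, delta_hidden)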
            +--------+                       \
  i1--w1--> |        |--Δ1--> out1   an11     |
  i2--w2--> |        |--Δ2--> out2   an12     |
   ...     >|        |          ...            > an1
  in--wn--> |        |--Δk--> outk   an1k     |
   1 --w--> |        |                        |
            +--------+                       /
T = {(i1,an1), ..., (it,ant)}
  Δ = Σ(l=1,t) Σ(j=1,k) Δj²        (Δj computed for the l-th training pair)

  Δ
  ^
  | .
  |  .
  |   .            .
  |    .         .
  |      .  .  .
  +---------------------------> training iterations
The termination condition for BPNN training is the first minimum of Δ.
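A tiny sketch of just this stopping rule (the per-epoch errors below are made-up numbers standing in for the Δ values an actual training run would produce): training stops as soon as Δ stops decreasing, i.e. at its first minimum.

def train_until_first_minimum(epoch_errors):
    """epoch_errors yields the total error Δ after each training epoch;
    return the Δ at which training would stop (its first minimum)."""
    previous = None
    for delta in epoch_errors:
        if previous is not None and delta > previous:
            return previous     # Δ started to grow: the previous epoch was the first minimum
        previous = delta
    return previous             # Δ never grew within the observed epochs

# Made-up error values for illustration only.
print(train_until_first_minimum([4.1, 2.7, 1.9, 1.5, 1.6, 1.2]))   # -> 1.5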