
EEL5840: Machine Intelligence

Introduction to feedforward neural networks

1. Problem statement and historical context

A. Learning framework

Figure 1 below illustrates the basic framework that we will see in artificial neural network learning. We assume that we want to learn a classification task G with n inputs and m outputs, where,

y = G(x),   (1)

x = [x_1, x_2, …, x_n]^T and y = [y_1, y_2, …, y_m]^T.   (2)

In order to do this modeling, let us assume a model Γ with trainable parameter vector w, such that,

z = Γ(x, w)   (3)

where,

z = [z_1, z_2, …, z_m]^T.   (4)

Now, we want to minimize the error between the desired outputs y and the model outputs z for all possible inputs x . That is, we want to ﬁnd the parameter vector w∗ so that,

E(w∗) ≤ E(w), ∀w,   (5)

where E ( w ) denotes the error between G and Γ for model parameter vector w . Ideally, E ( w ) is given by,

E(w) = ∫_x ||y – z||² p(x) dx   (6)

where p ( x ) denotes the probability density function over the input space x . Note that E ( w ) in equation (6) is dependent on w through z [see equation (3)]. Now, in general, we cannot compute equation (6) directly; therefore, we typically compute E ( w ) for a training data set of input/output data,

{(x_i, y_i)}, i ∈ {1, 2, …, p},   (7)

[Figure 1: the learning framework; inputs x_1, x_2, …, x_n feed both the unknown mapping G, which produces the desired outputs y_1, y_2, …, y_m, and the trainable model Γ, which produces the model outputs z_1, z_2, …, z_m.]

where x_i is the n-dimensional input vector,

x_i = [x_i1, x_i2, …, x_in]^T   (8)

corresponding to the i-th training pattern, and y_i is the m-dimensional output vector,

y_i = [y_i1, y_i2, …, y_im]^T   (9)

corresponding to the i th training pattern, i ∈ { 1, 2, …, p } . For (7), we can deﬁne the computable error function E ( w ) ,

E(w) = (1/2) Σ_{i=1..p} ||y_i – z_i||² = (1/2) Σ_{i=1..p} Σ_{j=1..m} (y_ij – z_ij)²   (10)

where,

z_i ≡ Γ(x_i, w).   (11)

If the data set is well distributed over possible inputs, equation (10) gives a good approximation of the error measure in (6).

As we shall see shortly, artiﬁcial neural networks are one type of parametric model Γ for which we can minimize the error measure in equation (10) over a given training data set. Simply put, artiﬁcial neural networks are nonlinear function approximators, with adjustable (i.e. trainable) parameters w , that allow us to model functional mappings, including classiﬁcation tasks, between inputs and outputs.
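As a small illustration of equation (10), the sketch below computes the training-set error E(w) for an arbitrary parametric model; the `linear_model` stand-in for Γ and the toy data set are hypothetical choices, not from these notes:

```python
def mse_over_dataset(model, w, data):
    """Computable error E(w) from equation (10): half the summed squared
    error between desired outputs y_i and model outputs z_i = model(x_i, w)."""
    total = 0.0
    for x_i, y_i in data:
        z_i = model(x_i, w)
        total += sum((y - z) ** 2 for y, z in zip(y_i, z_i))
    return 0.5 * total

# Hypothetical linear model Gamma(x, w), used only for illustration.
def linear_model(x, w):
    return [sum(wi * xi for wi, xi in zip(w, x))]

data = [([1.0, 2.0], [3.0]), ([0.0, 1.0], [1.0])]
print(mse_over_dataset(linear_model, [1.0, 1.0], data))  # perfect fit: 0.0
```

With w = [1, 1] the model reproduces both training outputs exactly, so E(w) = 0; any other weight vector gives a positive error.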

B. Biological inspiration

So why are artificial neural networks called artificial neural networks? These models are referred to as neural networks because their structure and function are loosely based on biological neural networks, such as the human brain. Our brains consist of basic cells, called neurons, connected together in massive and parallel fashion. An individual neuron receives electrical signals from dendrites, connected from other neurons, and passes on electrical signals through the neuron's output, the axon, as depicted (crudely) in Figure 2 below.

[Figure 2: a biological neuron, with input dendrites and an output axon.]

A neuron's transfer function can be roughly approximated by a threshold function, as illustrated in Figure 3 below. In other words, a neuron's axon fires if the net stimulus from all the incoming dendrites is above some threshold. Learning in our brain occurs through adjustment of the strength of connection between neurons (at the axon-dendrite junction). [Note, this description is a gross simplification of what really goes on in a brain; nevertheless, this brief summary is adequate for our purposes.]

[Figure 3: a neuron's threshold transfer function; the axon output fires once the net stimulus from the dendrites exceeds a threshold.]


Now, artiﬁcial neural networks attempt to crudely emulate biological neural networks in the following important ways:

1. Simple basic units are the building blocks of artificial neural networks. It is important to note that artificial "neurons" are much, much simpler than their biological counterparts.

2. Individual units are connected massively and in parallel.

3. Individual units have threshold-type activation functions.

4. Learning in artiﬁcial neural networks occurs by adjusting the strength of connection between individual units. These parameters are known as the weights of the neural network.

We point out that artificial neural networks are much, much simpler than complex biological neural networks (like the human brain). According to the Encyclopedia Britannica, the average human brain consists of approximately 10^10 individual neurons with approximately 10^12 connections. Even very complicated artificial neural networks typically do not have more than 10^4 to 10^5 connections between, at most, 10^4 individual basic units.

As of September, 2001, an INSPEC database search generated over 45,000 hits with the keyword “neural network.” Considering that neural network research did not really take off until 1986, with the publication of the backpropagation training algorithm, we see that research in artiﬁcial neural networks has exploded over the past 15 years and is still quite active today. We will try to cover some of the highlights of that research. First, however, we will formalize our discussion above, clearly deﬁning what a neural network is, and how we can train artiﬁcial neural networks to model input/output data; that is, how learning occurs in artiﬁcial neural networks.

2. What makes a neural network a neural network?

A. Basic building blocks of neural networks

Figure 4 below illustrates the basic building block of artiﬁcial neural networks; the unit’s basic function is intended to roughly approximate the behavior of biological neurons, although biological neurons tend to be orders-of-magnitude more complex than these artiﬁcial units.

In Figure 4,

φ ≡ [φ_0, φ_1, …, φ_q]^T   (12)

represents a vector of scalar inputs to the unit, where the φ i variables are either neural network inputs x j , or the outputs from previous units, including the bias unit φ 0 , which is ﬁxed at a constant value (typically 1).

Also,

w ≡ [ω_0, ω_1, …, ω_q]^T   (13)

represents the input weights of the unit, indicating the strength of connection from the unit inputs φ_i; as we shall see later, these are the trainable parameters of the neural network. Finally, γ represents the (typically nonlinear) activation function of the unit, and ψ represents the scalar output of the unit, where,

ψ ≡ γ(w ⋅ φ) = γ( Σ_{i=0..q} ω_i φ_i )   (14)

[Figure 4: a single unit; the bias input φ_0 = 1 and the inputs φ_1, φ_2, …, φ_q are weighted by ω_0, ω_1, …, ω_q, summed, and passed through the activation function γ to produce the output ψ.]

Thus, a unit in an artiﬁcial neural network sums up its total input and passes that sum through some (in general) nonlinear activation function.
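Equation (14) translates directly into a few lines of code. The sketch below uses the sigmoid of equation (19) as the activation γ; the example inputs and weights are arbitrary illustrative values:

```python
import math

def unit_output(phi, w):
    """Equation (14): a unit sums its weighted inputs (phi_0 = 1 is the
    bias input) and passes the sum through a sigmoidal activation gamma."""
    net = sum(w_i * p_i for w_i, p_i in zip(w, phi))
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation, equation (19)

phi = [1.0, 0.5, -0.2]      # bias input plus two unit inputs
w = [0.1, 0.4, 0.3]         # omega_0 ... omega_2
print(unit_output(phi, w))  # a value in (0, 1)
```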

B. Perceptrons

A simple perceptron is the simplest possible neural network, consisting of only a single unit. As shown in Figure 6, the output unit's activation function is the threshold function,

γ_t(u) = { 1, u ≥ θ;  0, u < θ }   (15)

which we plot in Figure 5.

[Figure 5: the threshold function γ_t(u), which steps from 0 to 1 at u = θ.]

The output z of the perceptron is thus given by,

z = { 1, w ⋅ x ≥ 0;  0, w ⋅ x < 0 }   (16)

where,

x = [1, x_1, …, x_n]^T   (17)

and,

w = [ω_0, ω_1, …, ω_n]^T   (18)

A perceptron like that pictured in Figure 6 is capable of learning a certain set of decision boundaries, speciﬁcally those that are linearly separable. The property of linear separability is best understood geometrically.

[Figure 6: a simple perceptron; the bias unit (fixed at 1) and the inputs x_1, x_2, …, x_n connect through weights ω_0, ω_1, …, ω_n to a single threshold output unit z.]

[Figure 7: the OR function (linearly separable, e.g. by the line given by ω_0 = –0.5, ω_1 = ω_2 = 1) and the XOR function (not linearly separable), plotted in the (x_1, x_2) plane.]

Consider the two, two-input Boolean functions depicted in Figure 7, namely the OR and the XOR functions (filled circles represent 0, while hollow circles represent 1). The OR function can be represented (and learned) by a two-input perceptron, because a straight line can completely separate the two classes. In other

words, the two classes are linearly separable. On the other hand, the XOR function cannot be represented (or learned) by a two-input perceptron because a straight line cannot completely separate one class from the other. For three inputs and above, whether or not a Boolean function is representable by a simple perceptron depends on whether or not a plane (or a hyperplane) can completely separate the two classes.
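As a minimal sketch of equation (16), the perceptron below reproduces the OR function using the weights suggested by the Figure 7 labels (ω_0 = –0.5, ω_1 = ω_2 = 1); the function name is an illustrative choice:

```python
def perceptron(x, w):
    """Equation (16): a threshold unit on the augmented input [1, x_1, x_2]."""
    net = sum(wi * xi for wi, xi in zip(w, [1.0] + list(x)))
    return 1 if net >= 0 else 0

w_or = [-0.5, 1.0, 1.0]  # weights separating the OR classes (Figure 7)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w_or))  # prints the OR truth table: 0, 1, 1, 1

# No single weight vector reproduces XOR: its classes are not linearly separable.
```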

The algorithm for learning a linearly separable Boolean function is known as the perceptron learning rule, which is guaranteed to converge for linearly separable functions. Since this training algorithm does not generalize to more complicated neural networks, discussed below, we refer the interested reader to [2] for further details.

C. Activation function

In biological neurons, the activation function can be roughly approximated as a threshold function [equation (15)], as in the case of the simple perceptron above. In artificial neural networks that are more complicated than simple perceptrons, we typically emulate this biological behavior through nonlinear functions that are similar to the threshold function but are, at the same time, continuous and differentiable. [As we will see later, differentiability is an important and necessary property for training neural networks more complicated than simple perceptrons.] Thus, two common activation functions used in artificial neural networks are the sigmoid function,

γ(u) = 1 / (1 + e^(–u))   (19)

and the hyperbolic tangent function,

γ(u) = (e^u – e^(–u)) / (e^u + e^(–u))   (20)

These two functions are plotted in Figure 8 below. Note that the two functions closely resemble the threshold function in Figure 5 and differ from each other only in their respective output ranges; the sigmoid function's range is [0, 1], while the hyperbolic tangent function's range is [–1, 1].

[Figure 8: the sigmoid function (range [0, 1]) and the hyperbolic tangent function (range [–1, 1]), plotted over u ∈ [–10, 10].]

In some cases, when a system output does not have a predefined range, its corresponding output unit may use a linear activation function,

γ(u) = u   (21)

From Figure 8, the role of the bias unit φ 0 should now be a little clearer; its role is essentially equivalent to the threshold parameter θ in Figure 5, allowing the unit output ψ to be shifted along the horizontal axis.
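The two activation functions of equations (19) and (20) can be sketched directly; the sample inputs below are arbitrary, and the last line illustrates how a bias of –2 shifts the sigmoid's transition to u = 2:

```python
import math

def sigmoid(u):
    """Equation (19): output range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def tanh_act(u):
    """Equation (20): output range (-1, 1); equals 2*sigmoid(2u) - 1."""
    return (math.exp(u) - math.exp(-u)) / (math.exp(u) + math.exp(-u))

print(sigmoid(0.0))        # 0.5: the sigmoid's midpoint
print(tanh_act(0.0))       # 0.0: the hyperbolic tangent's midpoint
# A bias weight of -2 shifts the sigmoid so its transition is centered at u = 2,
# playing the role of the threshold theta in Figure 5.
print(sigmoid(2.0 - 2.0))  # 0.5
```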

D. Neural network architectures

Figures 9 and 10 show typical arrangements of units in artiﬁcial neural networks. In both ﬁgures, all connections are feedforward and layered; such neural networks are commonly referred to as feedforward multilayer perceptrons (MLPs). Note that units that are not part of either the input or output layer of the neural network are referred to as hidden units, in part since their output activations cannot be directly observed from the outputs of the neural network. Note also that each unit in the neural network receives as input a connection from the bias unit.

The neural networks in Figures 9 and 10 are typical of many neural networks in use today in that they arrange the hidden units in layers, fully connected between consecutive layers. For example, ALVINN, a neural network that learned how to autonomously steer an automobile on real roads by mapping coarse camera images of the road ahead to corresponding steering directions [3], used a single-hidden-layer architecture to achieve its goal (see Figure 11 below).

MLPs are, however, not the only appropriate or allowable neural network architecture. For example, it is frequently advantageous to have direct input-output connections; such connections, which jump hidden-unit layers, are sometimes referred to as shortcut connections. Furthermore, hidden units do not necessarily have to be arranged in layers; later in the course, we will, for example, study the cascade learning architecture, an adaptive architecture that arranges hidden units in a particular, non-layered manner. We will say more about neural network architectures later within the context of speciﬁc, successful neural network applications.

Finally, we point out that there also exist neural networks that allow cyclic connections; that is, connections from any unit in the neural network to any other unit, including self-connections. These recurrent neural networks present additional challenges and will be studied later in the course; for now, however, we will conﬁne our studies to feedforward (acyclic) neural networks only.

E. Simple example

Consider the simple, single-input, single-output neural network shown in Figure 12 below. Assuming sigmoidal hidden-unit and linear output-unit activation functions (equations (19) and (21), respectively), what values of the weights {ω_1, ω_2, …, ω_7} will approximate the function f(x) in Figure 12?

[Figure 9: a single-hidden-layer feedforward network; the inputs x_1, x_2, …, x_n and a bias unit feed the hidden-unit layer, which, together with another bias unit, feeds the output layer z_1, z_2, …, z_m; signal flow is feedforward.]

[Figure 10: a two-hidden-layer feedforward network; the input layer feeds hidden-unit layer #1, which feeds hidden-unit layer #2, which feeds the output layer z_1, z_2, …, z_m; each layer also receives a bias-unit connection.]

To answer this question, let us first express f(x) in terms of threshold activation functions [equation (15)]:

f(x) = c[γ_t(x – a) – γ_t(x – b)]   (22)

f(x) = cγ_t(x – a) – cγ_t(x – b)   (23)

Recognizing that the threshold function can be approximated arbitrarily well by a sigmoid function [equation (19)],

γ_t(u) → γ(ku) as k → ∞   (24)

we can rewrite (23) in terms of sigmoidal activation functions,

[Figure 11: ALVINN, a neural network for autonomous steering; a 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units spanning steering directions from Sharp Left through Straight Ahead to Sharp Right.]

f(x) ≈ cγ[k(x – a)] – cγ[k(x – b)] for large k.   (25)

[Figure 12: left, the network: the input x and a bias unit feed two sigmoidal hidden units through weights ω_1, ω_2, ω_3, ω_4, and the hidden units and bias feed the linear output z through ω_5, ω_6, ω_7; right, the target function f(x), a pulse of height c between x = a and x = b.]

Now, let us write down an expression for z, the output of the neural network. From Figure 12,

z = ω_5 + ω_6 γ(ω_1 + ω_2 x) + ω_7 γ(ω_3 + ω_4 x)   (26)

Comparing (25) and (26), we arrive at two possible sets of weight values for approximating f(x) with z:

weights:  ω_1   ω_2   ω_3   ω_4   ω_5   ω_6   ω_7
set #1:   –kb   k     –ka   k     0     –c    c
set #2:   –ka   k     –kb   k     0     c     –c
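The weight construction can be checked numerically. The sketch below evaluates equation (26) with weight set #2 for the illustrative (hypothetical) choices a = 1, b = 3, c = 2, and steepness k = 50:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def network(x, w):
    """Equation (26): z = w5 + w6*sigmoid(w1 + w2*x) + w7*sigmoid(w3 + w4*x)."""
    w1, w2, w3, w4, w5, w6, w7 = w
    return w5 + w6 * sigmoid(w1 + w2 * x) + w7 * sigmoid(w3 + w4 * x)

# Weight set #2 for a pulse of height c = 2 between a = 1 and b = 3,
# with steepness k = 50 (large k sharpens the sigmoid toward a threshold).
a, b, c, k = 1.0, 3.0, 2.0, 50.0
w = [-k * a, k, -k * b, k, 0.0, c, -c]

print(round(network(0.0, w), 3))  # ~0 (left of the pulse)
print(round(network(2.0, w), 3))  # ~2 (inside the pulse)
print(round(network(4.0, w), 3))  # ~0 (right of the pulse)
```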

3. Some theoretical properties of neural networks

A. Single-input functions

From the example in Section 2(E), we can conclude that a single-hidden-layer neural network can model any single-input function arbitrarily well given a sufficient number of hidden units, since any one-dimensional function can be expressed as a sum of localized "bumps." It is important to note, however, that typically a neural network does not actually approximate functions as a sum of localized bumps. Consider, for example, Figure 13. Here, we used a three-hidden-unit neural network to approximate a scaled sine wave. Note that even with only three hidden units, the maximum neural network error is less than 0.01.

[Figure 13: left, the scaled sine wave f(x) being approximated; right, the neural network approximation error, everywhere smaller than 0.01 in magnitude.]

B. Multi-input functions

Now, does this universal function approximator property for single-hidden-layer neural networks hold for multi-dimensional functions? No, because the creation of localized peaks in multiple dimensions requires an additional hidden layer. Consider, for example, Figure 14 below, where we used a four-hidden-unit network to create a localized peak. Note, however, that unlike in the single-dimensional example, secondary ridges are also present. Thus, an additional sigmoidal hidden unit in a second layer is required to suppress the secondary ridges while, at the same time, preserving the localized peak. This ad hoc "proof" indicates that any multi-input function can be modeled arbitrarily well by a two-hidden-layer neural network, as long as a sufficient number of hidden units are present in each layer. A formal proof of this is given by Cybenko [1].

[Figure 14: a localized peak in the (x_1, x_2) plane formed by four sigmoidal hidden units; secondary ridges radiate from the central peak.]

4. Neural network training

There are three basic steps in applying neural networks to real problems:

1. Collect input/output training data of the form:

{(x_i, y_i)}, i ∈ {1, 2, …, p},   (27)

where x_i is the n-dimensional input vector,

x_i = [x_i1, x_i2, …, x_in]^T   (28)

corresponding to the i-th training pattern, and y_i is the m-dimensional output vector,

y_i = [y_i1, y_i2, …, y_im]^T   (29)

corresponding to the i-th training pattern, i ∈ {1, 2, …, p}.

2. Select an appropriate neural network architecture. Generally, this involves selecting the number of hidden layers and the number of hidden units in each layer. For notational convenience, let,

z = Γ(w, x)   (30)

denote the m-dimensional output vector z for the neural network Γ, with q-dimensional weight vector w,

w = [ω_1, ω_2, …, ω_q]^T   (31)

and input vector x. Thus,

z_i = Γ(w, x_i)   (32)

denotes the neural network outputs z i corresponding to the input vector for the i th training pattern.

3. Train the weights of the neural network to minimize the error measure,

-9-

EEL5840: Machine Intelligence

1

E = -2

Introduction to feedforward neural networks

p

∑

i=1

yi – zi

2

1

= -2

p

m

∑ ∑ ( yij – zij ) 2

(33)

i = 1j = 1

which measures the difference between the neural network outputs z i and the training data outputs y i .

This error minimization is also frequently referred to as learning.

Steps 1 and 2 above are quite application speciﬁc and will be discussed a little later. Here, we will begin to investigate Step 3 — namely, the training of the neural network parameters (weights) from input/output training data.

A. Gradient descent

Note that since z i (as deﬁned in equation (32) above) is a function of the weights w of the neural network, E is implicitly a function of those weights as well. That is, E changes as a function of w . Therefore, our goal is to ﬁnd that set of weights w∗ which minimizes E over a given training data set.

The first algorithm that we will study for neural network training is based on a method known as gradient descent. To understand the intuition behind this algorithm, consider Figure 15 below, where a simple one-dimensional error surface is drawn schematically. The basic question we must answer is: how do we find the parameter ω∗ that corresponds to the minimum of that error surface (point d)?

Gradient descent offers a partial answer to this question. In gradient descent, we initialize the parameter ω to some random value and then incrementally change that value by an amount proportional to the negative derivative,

–dE/dω   (34)

Denoting ω(t) as the parameter ω at step t of the gradient descent procedure, we can write this in equation form as,

ω(t + 1) = ω(t) – η dE/dω(t)   (35)

where η is a small positive constant that is frequently referred to as the learning rate. In Figure 15, given an initial parameter value of a and a small enough learning rate, gradient descent will converge to the global minimum d as t → ∞. Note, however, that the gradient descent procedure is not guaranteed to always converge to the global minimum for general (non-convex) error surfaces. If we start at an initial ω value of b, iteration (35) will converge to e, while for an initial ω value of c, gradient descent will converge to f as t → ∞. Thus, gradient descent is only guaranteed to converge to a local minimum of the error surface (for sufficiently small learning rates η), not a global minimum.

[Figure 15: a schematic one-dimensional error surface E(ω) with local minima e and f and global minimum d; initial values a, b, and c lead gradient descent to d, e, and f, respectively.]

Iteration (35) is easily generalized to error minimization over multiple dimensions (i.e. parameter vectors w),

w(t + 1) = w(t) – η ∇E[w(t)]   (36)

where ∇E[w(t)] denotes the gradient of E with respect to w(t),

∇E[w(t)] = [∂E/∂ω_1(t), ∂E/∂ω_2(t), …, ∂E/∂ω_q(t)]^T   (37)

Thus, one approach for training the weights in a neural network implements iteration (36) with the error measure defined in equation (33).
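Iteration (35) can be sketched in a few lines. The toy error surface E(ω) = (ω – 3)² below is convex, so gradient descent finds its single minimum ω∗ = 3; the learning rate and step count are arbitrary illustrative choices:

```python
def gradient_descent(grad, w0, eta, steps):
    """Iteration (35)/(36): w(t+1) = w(t) - eta * grad(w(t))."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Toy convex surface E(w) = (w - 3)^2, with derivative dE/dw = 2(w - 3);
# on a convex surface, the local minimum found is also the global one.
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0, eta=0.1, steps=100)
print(round(w_star, 4))  # -> 3.0
```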

B. Simple example

Consider the simple single-input, single-output feedforward neural network in Figure 16 below, with sigmoidal hidden-unit activation functions γ and a linear output unit. For this neural network, let us, by way of example, compute,

∂E/∂ω_4   (38)

where,

E = (1/2)(y – z)²   (39)

for a single training pattern ⟨x, y⟩. Note that since differentiation is a linear operator, the derivative for multiple training patterns is simply the sum of the derivatives of the individual training patterns,

∂E/∂ω_j = ∂/∂ω_j [ (1/2) Σ_{i=1..p} (y_i – z_i)² ] = Σ_{i=1..p} ∂/∂ω_j [ (1/2)(y_i – z_i)² ].   (40)

Therefore, generalizing the example below to multiple training patterns is straightforward.

First, let us explicitly write down z as a function of the neural network weights. To do this, we define some intermediate variables,

net_1 ≡ ω_1 + ω_2 x   (41)

net_2 ≡ ω_3 + ω_4 x   (42)

which denote the net inputs to the two hidden units, respectively, and,

h_1 ≡ γ(net_1)   (43)

h_2 ≡ γ(net_2)   (44)

which denote the outputs of the two hidden units, respectively.

[Figure 16: the input x and a bias unit feed two sigmoidal hidden units through weights ω_1, ω_2, ω_3, ω_4; the hidden units and the bias unit feed the output z through ω_5, ω_6, ω_7.]

Thus,

z = ω_5 + ω_6 h_1 + ω_7 h_2 (linear output unit).   (45)

Now, we can compute the derivative of E with respect to ω_4. From (39), and remembering the chain rule of differentiation,

∂E/∂ω_4 = –(y – z) ∂z/∂ω_4   (46)

∂E/∂ω_4 = (z – y) (∂z/∂h_2)(∂h_2/∂net_2)(∂net_2/∂ω_4)   (47)

∂E/∂ω_4 = (z – y) ω_7 γ′(net_2) x   (48)

where γ′ denotes the derivative of the activation function. This example shows that, in principle, computing the partial derivatives required for the gradient descent algorithm simply requires careful application of the chain rule. In general, however, we would like to be able to simulate neural networks whose architecture is not known a priori. In other words, rather than hard-code derivatives with explicit expressions like (48) above, we require an algorithm that allows us to compute derivatives in a more general way. Such an algorithm exists, and is known as the backpropagation algorithm.
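Equation (48) can be checked against a numerical derivative. The sketch below assumes the network of Figure 16 with sigmoidal hidden units, for which γ′(u) = γ(u)(1 – γ(u)); the training pattern and weight values are made up for illustration:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w):
    """The network of Figure 16: two sigmoidal hidden units, linear output."""
    w1, w2, w3, w4, w5, w6, w7 = w
    return w5 + w6 * sigmoid(w1 + w2 * x) + w7 * sigmoid(w3 + w4 * x)

def dE_dw4(x, y, w):
    """Equation (48): dE/dw4 = (z - y) * w7 * gamma'(net2) * x,
    with gamma'(u) = gamma(u) * (1 - gamma(u)) for the sigmoid."""
    w1, w2, w3, w4, w5, w6, w7 = w
    z = forward(x, w)
    g = sigmoid(w3 + w4 * x)
    return (z - y) * w7 * g * (1.0 - g) * x

# Compare the analytic derivative against a central finite difference.
x, y = 0.7, 0.3
w = [0.1, -0.2, 0.3, 0.4, -0.1, 0.5, 0.2]
E = lambda wv: 0.5 * (y - forward(x, wv)) ** 2
eps = 1e-6
w_plus = list(w); w_plus[3] += eps
w_minus = list(w); w_minus[3] -= eps
numeric = (E(w_plus) - E(w_minus)) / (2 * eps)
print(abs(dE_dw4(x, y, w) - numeric) < 1e-6)  # -> True
```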

C. Backpropagation algorithm

The backpropagation algorithm was ﬁrst published by Rumelhart and McClelland in 1986 [4], and has since led to an explosion in previously dormant neural-network research. Backpropagation offers an efﬁcient, algorithmic formulation for computing error derivatives with respect to the weights of a neural network. As such, it allows us to implement gradient descent for neural network training without explicitly hard-coding derivatives.

In order to develop the backpropagation algorithm, let us first look at an arbitrary (hidden or output) unit in a feedforward (acyclic) neural network with activation function γ. In Figure 17, that unit is labeled j. Let h_j be the output of unit j, and let net_j be the net input to unit j. By definition,

h_j ≡ γ(net_j)   (49)

net_j ≡ Σ_k h_k ω_kj   (50)

Note that net_j is summed over all units feeding into unit j; unit i is one of those units.

[Figure 17: unit i, with output h_i, feeds unit j through weight ω_ij; unit j applies activation γ to its net input net_j to produce output h_j.]

Let us now compute,

∂E/∂ω_ij = (∂E/∂net_j)(∂net_j/∂ω_ij)   (51)

From equation (50),

∂net_j/∂ω_ij = h_i   (52)

since all the terms in summation (50) with k ≠ i are independent of ω_ij. Defining,

δ_j ≡ ∂E/∂net_j   (53)

we can write equation (51) as,

∂E/∂ω_ij = δ_j h_i   (54)

As we will see shortly, equation (54) forms the basis of the backpropagation algorithm in that the δ j variables can be computed recursively from the outputs of the neural network back to the inputs of the neural network.

In other words, the δ j values are backpropagated through the network (hence, the name of the algorithm).

D. Backpropagation example

Consider Figure 18, which plots a small part of a neural network. Below, we derive an expression for δ_k (output unit) and δ_j (hidden unit one layer removed from the outputs of the neural network). For a single training pattern, we can write,

E = (1/2) Σ_{l=1..m} (y_l – z_l)²   (55)

where l indexes the outputs (not the training patterns). Now,

δ_k ≡ ∂E/∂net_k = (∂E/∂z_k)(∂z_k/∂net_k)   (56)

Since,

z_k = γ(net_k)   (57)

we have that,

∂z_k/∂net_k = γ′(net_k)   (58)

[Figure 18: a small part of a network; unit i, with output h_i, feeds hidden unit j through weight ω_ij, and unit j, with output h_j, feeds output unit k through weight ω_jk; each unit applies γ to its net input.]

Furthermore, from equation (55),

∂E/∂z_k = (z_k – y_k)   (59)

since all the terms in summation (55) with l ≠ k are independent of z_k. Combining equations (56), (58) and (59), and recalling equation (54),

δ_k = (z_k – y_k) γ′(net_k)   (60)

∂E/∂ω_jk = δ_k h_j   (61)

Note that equations (60) and (61) are valid for any weight in a neural network that is connected to an output unit. Also note that h j is the output value of units feeding into output unit k . While this may be the output of a hidden unit, it could also be the output of the bias unit (i.e. 1) or the value of a neural network input (i.e. x j ).

Next, we want to compute δ_j in Figure 18 in terms of the δ values that follow unit j. Going back to definition (53),

δ_j ≡ ∂E/∂net_j = Σ_l (∂E/∂net_l)(∂net_l/∂net_j)   (62)

Note that the summation in equation (62) is over all the immediate successor units of unit j. Thus,

δ_j = Σ_l δ_l (∂net_l/∂net_j)   (63)

By definition,

net_l = Σ_s ω_sl γ(net_s)   (64)

So, from equation (64),

∂net_l/∂net_j = ω_jl γ′(net_j)   (65)

since all the terms in summation (64) with s ≠ j are independent of net_j. Combining equations (63) and (65),

δ_j = Σ_l δ_l ω_jl γ′(net_j)   (66)

δ_j = γ′(net_j) Σ_l δ_l ω_jl   (67)

∂E/∂ω_ij = δ_j h_i   (68)

Note that equation (67) computes δ j in terms of those δ values one connection ahead of unit j . In other words, the δ values are backpropagated from the outputs back through the network. Also note that h i is the output value of units feeding into unit j . While this may be the output of a hidden unit from an earlier hiddenunit layer, it could also be the output of a bias unit (i.e. 1) or the value of a neural network input (i.e. x i ).

It is important to note that (1) the general derivative expression in (54) is valid for all weights in the neural network; (2) the expression for the output δ values in (60) is valid for all neural network output units; and (3) the recursive relationship for δ j in (67) is valid for all hidden units, where the l -indexed summation is over all immediate successors of unit j .


E. Summary of backpropagation algorithm

Below, we summarize the results of the derivation in the previous section. The partial derivative of the error,

E = (1/2) Σ_{l=1..m} (y_l – z_l)²   (69)

(i.e. a single training pattern) with respect to a weight ω_jk connected to output unit k of a neural network is given by,

δ_k = (z_k – y_k) γ′(net_k)   (70)

∂E/∂ω_jk = δ_k h_j   (71)

where h_j is the output of hidden unit j (or the input j), and net_k is the net input to output unit k. The partial derivative of the error E with respect to a weight ω_ij connected to hidden unit j of a neural network is given by,

δ_j = Σ_l δ_l ω_jl γ′(net_j)   (72)

∂E/∂ω_ij = δ_j h_i   (73)

where h i is the output of hidden unit i (or the input i ), and net j is the net input to hidden unit j . The above results are trivially extended to multiple training patterns by summing the results for individual training patterns over all training patterns.
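The summary equations (70)-(73) can be turned into a short routine for the specific two-hidden-unit network of Figure 16; the function name and sample values below are illustrative choices, and the result can be verified against finite differences:

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def backprop_single_pattern(x, y, w):
    """Backpropagation, equations (70)-(73), for the two-hidden-unit,
    linear-output network of Figure 16 (weights w1..w7)."""
    w1, w2, w3, w4, w5, w6, w7 = w
    # Forward pass: net inputs, hidden outputs, and output, equations (41)-(45).
    net1, net2 = w1 + w2 * x, w3 + w4 * x
    h1, h2 = sigmoid(net1), sigmoid(net2)
    z = w5 + w6 * h1 + w7 * h2
    # Output delta, equation (70); gamma'(net) = 1 for the linear output unit.
    delta_k = z - y
    # Hidden deltas, equation (72), with gamma'(u) = gamma(u)(1 - gamma(u)).
    delta_1 = delta_k * w6 * h1 * (1.0 - h1)
    delta_2 = delta_k * w7 * h2 * (1.0 - h2)
    # Weight derivatives, equations (71) and (73); the bias output is 1.
    return [delta_1 * 1.0, delta_1 * x,   # dE/dw1, dE/dw2
            delta_2 * 1.0, delta_2 * x,   # dE/dw3, dE/dw4
            delta_k * 1.0,                # dE/dw5
            delta_k * h1, delta_k * h2]   # dE/dw6, dE/dw7

grads = backprop_single_pattern(0.7, 0.3, [0.1, -0.2, 0.3, 0.4, -0.1, 0.5, 0.2])
print([round(g, 4) for g in grads])
```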

5. Basic steps in using neural networks

So, now we know what a neural network is, and we know a basic algorithm for training neural networks (i.e. backpropagation). Here, we will extend our discussion of neural networks by discussing some practical aspects of applying neural networks to real-world problems. Below, we review the steps that need to be followed in using neural networks.

A. Collect training data

In order to apply a neural network to a problem, we must ﬁrst collect input/output training data that adequately represents that problem. Often, we also need to condition, or preprocess that data so that the neural network training converges more quickly and/or to better local minima of the error surface. Data collection and preprocessing is very application-dependent and will be discussed in greater detail in the context of speciﬁc applications.

B. Select neural network architecture

Selecting a neural network architecture typically requires that we determine (1) an appropriate number of hidden layers and (2) an appropriate number of hidden units in each hidden layer for our speciﬁc application, assuming a standard multilayer feedforward architecture. Often, there will be many different neural network structures that work about equally well; which structures are most appropriate is frequently guided by experience and/or trial-and-error. Alternatively, as we will talk about later in this course, we can use neural network learning algorithms that adaptively change the structure of the neural network as part of the learning process.

C. Select learning algorithm

If we use simple backpropagation, we must select an appropriate learning rate η . Alternatively, as we will talk about later in this course, we have a choice of more sophisticated learning algorithms as well, including the conjugate gradient and extended Kalman ﬁltering methods.


D. Weight initialization

Weights in the neural network are usually initialized to small, random values.

E. Forward pass

Apply a random input vector x i from the training data set to the neural network and compute the neural network outputs ( z k ) , the hidden-unit outputs ( h j ) , and the net input to each hidden unit ( net j ) .

F. Backward pass

1. Evaluate δ k at the outputs, where,

δ_k = ∂E/∂net_k   (74)

for each output unit.

2. Backpropagate the δ values from the outputs backwards through the neural network.

3. Using the computed δ values, calculate,

∂E/∂ω_i,   (75)

the derivative of the error with respect to each weight ω i in the neural network.

4. Update the weights based on the computed gradient,

w(t + 1) = w(t) – η ∇E[w(t)].   (76)

G. Loop

Repeat steps E and F (forward and backward passes) until training results in a satisfactory model.
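Steps D through G can be sketched as a single training loop for the network of Figure 16; the toy ramp data set, learning rate, and epoch count below are arbitrary illustrative choices:

```python
import math, random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train(data, eta=0.2, epochs=2000, seed=0):
    """Steps D-G: initialize small random weights, then repeat forward and
    backward passes, updating with gradient descent [equation (76)].
    Uses the two-hidden-unit network of Figure 16."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(7)]       # step D: initialization
    for _ in range(epochs):                              # step G: loop
        for x, y in data:
            net1, net2 = w[0] + w[1] * x, w[2] + w[3] * x  # step E: forward pass
            h1, h2 = sigmoid(net1), sigmoid(net2)
            z = w[4] + w[5] * h1 + w[6] * h2
            dk = z - y                                     # step F: backward pass
            d1 = dk * w[5] * h1 * (1 - h1)
            d2 = dk * w[6] * h2 * (1 - h2)
            grads = [d1, d1 * x, d2, d2 * x, dk, dk * h1, dk * h2]
            w = [wi - eta * gi for wi, gi in zip(w, grads)]  # equation (76)
    return w

def error(data, w):
    """Total error, equation (33), for the trained network."""
    E = 0.0
    for x, y in data:
        z = w[4] + w[5] * sigmoid(w[0] + w[1] * x) + w[6] * sigmoid(w[2] + w[3] * x)
        E += 0.5 * (y - z) ** 2
    return E

# Toy data set: a few samples of a smooth ramp on [0, 1].
data = [(x / 4.0, x / 4.0) for x in range(5)]
w0 = [0.05] * 7
w_trained = train(data)
print(error(data, w_trained) < error(data, w0))  # training reduces the error
```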

6. Practical issues in neural networks

A. What should the training data be?

Some questions that need to be answered include:

1. Is your training data sufficient for the neural network to adequately learn what you want it to learn? For example, what if, in ALVINN [3], we down-sampled to 10 × 10 images instead of 30 × 32 images? Such coarse images would probably not suffice for learning to steer the on-road vehicle with enough accuracy. At the same time, we must make sure that we don't include training data that is superfluous or irrelevant for our application (e.g., for ALVINN, the music played while driving). Poorly correlated or irrelevant inputs can easily slow down the convergence of, or completely sidetrack, neural network learning algorithms.

2. Is your training data biased? Suppose that for ALVINN, we trained the neural network on an oval race track. How would ALVINN drive on real roads? It would probably not have adequately learned right turns, since the race track consists of left turns only. The distribution of your training data needs to approximately reflect the expected distribution of input data where the neural network will be used after training.

3. Is your task deterministic or stochastic? Is it stationary or nonstationary? Nonstationary problems cannot be trained from ﬁxed data sets, since, by deﬁnition, things change over time.

We will have more on these concerns within the context of speciﬁc applications later.

B. What should your neural network architecture/structure be?

This question is largely task dependent, and often requires experience and/or trial-and-error to answer adequately. Therefore, we will have more on this question within the context of specific applications later. In general, though, it helps to look at similar problems that have previously been solved with neural networks, and apply the lessons learned there to our current application. Adaptive neural network architectures, which change the structure of the neural network as part of training, are also an alternative to manually selecting an appropriate structure.

C. Preprocessing of data

Often, it is wise to preprocess raw input/output training data, since it can make the learning (i.e. neural network training) converge much better and faster. In computer vision applications, for example, intensity normalization can remove variation in intensity — caused perhaps by sunny vs. overcast days — as a potential source of confusion for the neural network. We will have more on this question within the context of speciﬁc applications later.
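As one concrete and generic preprocessing step, inputs are often standardized to zero mean and unit variance per feature. The function and data below are an illustrative sketch, not a procedure prescribed in the text:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Scale each input feature to zero mean and unit variance.

    The statistics must come from the training data and be reused,
    unchanged, on any later cross-validation or test data."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps), mu, sigma

# Example: two raw features with wildly different ranges (made-up values).
X_raw = np.array([[0.001, 500.0],
                  [0.002, 900.0],
                  [0.003, 700.0]])
X_std, mu, sigma = standardize(X_raw)
```

The key design point is that mu and sigma are computed once from the training data and then applied unchanged to all later data, so every input passes through the same transformation.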

D. Weight initialization

Since the weight parameters w are learned through the recursive relationship in (76), we obviously need to initialize the weights [i.e. set w ( 0 ) ]. Typically, the weights are initialized to small, random values. If we were to initialize the weights to uniform (i.e. identical) values instead, the significant weight symmetries in the neural network would substantially reduce its effective parameterization, since many partial error derivatives would be identical at the beginning of training and remain so throughout. If we were to initialize the weights to large values, there is a high likelihood that many of the hidden-unit activations would be stuck in the flat areas of the typical sigmoidal activation functions, where the derivatives evaluate to approximately zero. As such, it could take quite a long time for the weights to converge.
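A minimal sketch of such an initialization follows; the ±0.5 range is an illustrative choice, not a prescription from the text:

```python
import numpy as np

def init_weights(n_out, n_in, scale=0.5, seed=None):
    """Small, random weights drawn uniformly from [-scale, +scale].

    Random values break the symmetry that identical initial weights
    would impose; keeping them small keeps sigmoidal units out of
    their flat, near-zero-derivative regions. (A common variant, not
    prescribed here, scales the range by 1/sqrt(n_in).)"""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, (n_out, n_in))

# Example: a 4-unit layer fed by 10 inputs.
W = init_weights(4, 10, seed=0)
```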

E. Select a learning parameter

If using standard gradient descent, we must select an appropriate learning rate η . This can be quite tricky, as the simple example below illustrates. Consider the trivial two-dimensional, quadratic “error” function,

   E = 20ω_1^2 + ω_2^2   (77)

which we plot in Figure 19 below. [Note that equation (77) could never really be a neural network error function, since a neural network typically has many hundreds or thousands of weights.]

For this error function, note that the global minimum occurs at ( ω_1, ω_2 ) = ( 0, 0 ) . Now, let us investigate how quickly gradient descent converges to this global minimum for different learning rates η ; for the purposes of this example, we will say that gradient descent has converged when E < 10^−6 . First, we must compute the derivatives,

   ∂E/∂ω_1 = 40ω_1 ,   (78)

and,

   ∂E/∂ω_2 = 2ω_2 ,   (79)

Figure 19: surface plot of the quadratic error function E = 20ω_1^2 + ω_2^2 over the (ω_1, ω_2) plane.


so that the gradient-descent weight recursion in (76) is given by,

   ω_1 ( t + 1 ) = ω_1 ( t ) − η ∂E/∂ω_1 ( t )   (80)

   ω_1 ( t + 1 ) = ω_1 ( t ) ( 1 − 40η )   (81)

and similarly,

   ω_2 ( t + 1 ) = ω_2 ( t ) ( 1 − 2η ) .   (82)

From an initial point ( ω 1, ω 2 ) = ( 1, 2 ) , Figure 20 below plots the number of steps to convergence as a function of the learning parameter η . Note that the number of steps to convergence decreases as a function of the learning rate parameter η until about 0.047 (intuitive), but then shoots up sharply until 0.05 , at which point the gradient-descent equations in (81) and (82) become unstable and diverge (counter-intuitive).

Figure 20: number of gradient-descent steps to convergence as a function of the learning rate η .

Figure 21 plots some actual gradient-descent trajectories for the learning rates 0.02 , 0.04 and 0.05 . Note that for η = 0.05 , gradient descent does not converge but oscillates about ω_2 = 0 . To understand why this is happening, consider the fixed-point iterations in (81) and (82). Each of these is of the form,

   ω ( t + 1 ) = cω ( t )   (83)

which will diverge for any nonzero ω ( 0 ) and |c| > 1 , and converge for |c| < 1 . Thus, equation (81) will converge for,

   |1 − 40η| < 1   (84)

   −1 < 1 − 40η < 1   (85)

Figure 21: gradient-descent trajectories in the (ω_1, ω_2) plane for learning rates η = 0.02 , η = 0.04 and η = 0.05 .

   0 < η < 0.05 .   (86)

Since recursion (82) generates the weaker bound,

   0 < η < 1 ,   (87)

the upper bound in (86) is controlling in that it determines the range of learning rates for which gradient descent will converge in this example.
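The behavior summarized in Figure 20 is easy to reproduce numerically. The sketch below runs gradient descent on the error function (77) from the starting point ( ω_1, ω_2 ) = ( 1, 2 ) and counts steps until E < 10^−6; the particular learning rates tried are illustrative:

```python
def steps_to_converge(eta, w1=1.0, w2=2.0, tol=1e-6, max_steps=100000):
    """Gradient descent on E = 20*w1**2 + w2**2 from (w1, w2).

    Returns the number of steps until E < tol, or None if the
    iteration fails to converge within max_steps."""
    for t in range(max_steps):
        if 20.0 * w1 ** 2 + w2 ** 2 < tol:
            return t
        w1 -= eta * 40.0 * w1   # dE/dw1 = 40*w1, per (78)
        w2 -= eta * 2.0 * w2    # dE/dw2 = 2*w2, per (79)
    return None

slow = steps_to_converge(0.01)    # small eta: converges, but slowly
fast = steps_to_converge(0.047)   # near the best fixed rate in Figure 20
stuck = steps_to_converge(0.05)   # at the bound (86): never converges
```

For η = 0.05 the ω_1 recursion has multiplier 1 − 40η = −1 , so ω_1 bounces between ±1 forever and the routine reports failure, consistent with the bound (86).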

We make a few observations from this specific example. First, “long, steep-sided valleys” in the error surface typically cause slow convergence with a single learning rate: gradient descent converges quickly down the steep sides of the valley, but takes a long time to travel along its shallow floor. Slow convergence of gradient descent is largely why we will study more sophisticated learning algorithms, with de facto adaptive learning rates, later in this course. In this example, convergence along the ω 2 axis is assured for larger η ; however, the upper bound in (86) prevents us from using a (fixed) learning rate greater than or equal to 0.05 .

Second, Figure 20, although drawn specifically for this example, is generally reflective of gradient-descent convergence rates for more complex error surfaces as well. If the chosen learning rate is too small, convergence can take a very long time, while learning rates that are too large will cause gradient descent to diverge. This is another reason to study more sophisticated algorithms: since selecting an appropriate learning rate can be quite frustrating, algorithms that do not require such a selection have a real advantage.

Finally, note that, in general, it is not possible to determine theoretical convergence bounds, such as those in (86), for real neural networks and error functions. Only the very simple error surface in (77) allowed us to do that here.

F. Pattern vs. batch training

In pattern training, we compute the error E and the gradient of the error ∇E for one input/output pattern at a time, and update the weights based on that single training example (Section 5 describes pattern training). It is usually a good idea to randomize the order of training patterns in pattern training, so that the neural network does not converge to a bad local minimum or forget training examples seen early in training.

In batch training, we compute the error E and the gradient of the error ∇E for all training examples at once, and update the weights based on that aggregate error measure.
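The two update schemes can be contrasted on a toy one-parameter linear model z = ωx with squared error; the data set and learning rate below are illustrative assumptions:

```python
import random

# Toy data consistent with y = 2*x (illustrative, noise-free).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
eta = 0.05

def grad(w, x, y):
    """dE/dw for the single-pattern error E = 1/2 * (y - w*x)**2."""
    return -(y - w * x) * x

# Pattern training: update after each randomly chosen example.
rng = random.Random(0)
w_pat = 0.0
for _ in range(200):
    x, y = rng.choice(data)
    w_pat -= eta * grad(w_pat, x, y)

# Batch training: sum the gradient over all examples, then update once.
w_bat = 0.0
for _ in range(200):
    w_bat -= eta * sum(grad(w_bat, x, y) for x, y in data)
```

Both schemes converge to ω = 2 here because the data are noise-free and consistent; the difference is that pattern training takes one (noisier) step per example, while batch training takes one exact gradient step per pass through the data.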

G. Good generalization

Generalization to examples not explicitly seen in the training data set is one of the most important properties of a good model, including neural network models. Consider, for example, Figure 22. Which is a better model, the left curve or the right curve? Although the right curve (i.e. model) has zero error over the speciﬁc data set, it will probably generalize more poorly to points not in the data set, since it appears to have modeled the noise properties of the speciﬁc training data set. The left model, on the other hand, appears to have abstracted the essential feature of the data, while rejecting the random noise superimposed on top.

Figure 22: two models fit to the same noisy data set; the left curve captures the underlying trend, while the right curve passes through every data point, noise included.


Figure 23: training and cross-validation error as a function of training time, with the early stopping point marked where the cross-validation error begins to increase.

There are two ways that we can ensure that neural networks generalize well to data not explicitly in the training data set. First, we need to pick a neural network architecture that is not over-parameterized — in other words, the smallest neural network that will perform its task well. Second, we can use a method known as cross-validation. In typical neural network training, we take our complete data set and split it in two. The first data set is called the training data set, and is used to actually train the weights of the neural network; the second data set is called the cross-validation data set, and is not explicitly used in training the weights; rather, it is reserved as a check on neural network learning to prevent overtraining. While training (with the training data set), we keep track of both the training data set error and the cross-validation data set error. When the cross-validation error no longer decreases, we should stop training, since that is a good indication that further learning will adjust the weights only to fit peculiarities of the training data set. This scenario is depicted in the generic diagram of Figure 23, where we plot neural network error as a function of training time. As we indicate in the figure, the training data set error will generally be lower than the cross-validation data set error; moreover, the training data set error will usually continue to decrease as a function of training time, whereas the cross-validation data set error will typically begin to increase at some point in the training.
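The early-stopping procedure described above amounts to a simple loop. The sketch below is generic: step() and cv_error() stand in for one epoch of training and the cross-validation error of whatever model is being trained, and all names and the toy error curve are illustrative:

```python
def train_with_early_stopping(step, cv_error, patience=10, max_epochs=1000):
    """Generic early-stopping loop (a sketch).

    `step` runs one epoch of training on the training data set;
    `cv_error` returns the current cross-validation error. Training
    stops once the cross-validation error has failed to improve for
    `patience` consecutive epochs."""
    best, since_best = float("inf"), 0
    for epoch in range(max_epochs):
        step()
        err = cv_error()
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best, epoch

# Toy stand-in: cross-validation error falls, bottoms out, then rises,
# as in Figure 23 (purely illustrative numbers).
t = [0]
def fake_step():
    t[0] += 1
def fake_cv_error():
    return (t[0] - 50) ** 2 / 1000.0 + 0.1

best, stopped_at = train_with_early_stopping(fake_step, fake_cv_error)
```

With the toy error curve, training halts shortly after the cross-validation error bottoms out, rather than running for all max_epochs epochs.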

[1] G. Cybenko, “Approximation by Superposition of a Sigmoidal Function,” Mathematics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303-314, 1989.

[2] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed., Chapters 5 and 6, John Wiley & Sons, New York, 2001.

[3] D. A. Pomerleau, “Neural Network Perception for Mobile Robot Guidance,” Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, 1992.

[4] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vols. 1 and 2, MIT Press, Cambridge, MA, 1986.
