Free Essay

Submitted By PSXjoy

Words 10623

Pages 43

Words 10623

Pages 43

Conditional Random Fields

Roman Klinger

Katrin Tomanek

Algorithm Engineering Report

TR07-2-013

December 2007

ISSN 1864-4503

Faculty of Computer Science

Algorithm Engineering (Ls11)

44221 Dortmund / Germany http://ls11-www.cs.uni-dortmund.de/ Classical Probabilistic Models and

Conditional Random Fields

Roman Klinger∗

Katrin Tomanek∗

Fraunhofer Institute for

Algorithms and Scientiﬁc Computing (SCAI)

Schloss Birlinghoven

53754 Sankt Augustin, Germany

Jena University

Language & Information Engineering (JULIE) Lab

F¨rstengraben 30 u 07743 Jena, Germany

Dortmund University of Technology

Department of Computer Science

Chair of Algorithm Engineering (Ls XI)

44221 Dortmund, Germany

katrin.tomanek@uni-jena.de

roman.klinger@scai.fhg.de roman.klinger@udo.edu Contents

1 Introduction

2

2 Probabilistic Models

2.1 Na¨ Bayes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ıve 2.2 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.3 Maximum Entropy Model . . . . . . . . . . . . . . . . . . . . . . . . . . .

3

4

5

6

3 Graphical Representation

10

3.1 Directed Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2 Undirected Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . 13

4 Conditional Random Fields

4.1 Basic Principles . . . . . . . .

4.2 Linear-chain CRFs . . . . . .

4.2.1 Training . . . . . . . .

4.2.2 Inference . . . . . . .

4.3 Arbitrarily structured CRFs .

4.3.1 Unrolling the graph .

4.3.2 Training and Inference

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

5 Summary

∗

The authors contributed equally to this work.

14

14

15

19

23

24

25

26

26

version 1.0

1 Introduction

1 Introduction

Classiﬁcation is known as the assignment of a class y ∈ Y to an observation x ∈ X . This task can be approached with probability theory by specifying a probability distribution to select the most likely class y for a given observation x.

A well-known example (Russell and Norvig, 2003) is the classiﬁcation of weather observations into categories, such as good or bad, Y = {good , bad }. For instance, let x be the weather observation on a special day, X = {Monday, Tuesday, . . .}. Now, x can be described by a set of features such as fcloudy (x) = 1, if and only if it is cloudy on day x, fcloudy (x) = 0 otherwise. Other features might be fsunny or frainy . In general, features do not necessarily have to be binary.

Modeling all dependencies in a probability distribution is typically very complex due to interdependencies between features. The Na¨ Bayes assumption of all features ıve being conditionally independent is an approach to address this problem (see Section 2.1).

In nearly all probabilistic models such independence assumptions are made for some variables to make necessary computations manageable.

In the structured learning scenario, multiple and typically interdependent class and observation variables are considered which implicates an even higher complexity in the probability distribution. This is the case for image or music data as well as for natural language text. As for images, pixels near to each other are very likely to have a similar color or hue. In music, diﬀerent succeeding notes follow special laws, they are not independent, especially when they sound simultaneously. Otherwise, music would not be pleasant to the ear. In text, words are not an arbitrary accumulation, the order is important and grammatical constraints hold.

A typical task in natural language processing is known as text segmentation which means the classiﬁcation of units of a textual sequence. Such units are words or other symbols. For each unit a category y ∈ Y has to be assigned. Categories might be linguistically motivated, such as part-of-speech tags, or person names or cities, as in the following text snippet:

[...]

Mister George W. Bush arrived in Rome together with [...]

Here, the task is to assign a ﬁtting label (also called output) sequence such as

[...]

O name name name O O city O O [...]

where each label represents an entity class.1

One approach for modeling linear sequence structures, as can be found in natural language text, are Hidden Markov Models (Rabiner, 1989). For the sake of complexity reduction, strong independence assumptions between the observation variables are made.

This impairs the accuracy of the model. Conditional Random Fields (CRFs, Laﬀerty et al.

(2001)) are developed exactly to ﬁll that gap: While CRFs make similar assumptions on the dependencies among the class variables, no assumptions on the dependencies among observation variables need to be made (see Section 4).

1

In this example the classes are name, city, and O where the latter is deﬁned as not being an entity of interest. 2

2 Probabilistic Models

CRFs have found application in many domains which deal with structured data.

Despite the frequent application of linear-chain CRFs, also other underlying structures have been used to model the respective dependencies among the class variables. Especially in natural language processing, CRFs are currently a state-of-the-art technique for many of its subtasks including basic text segmentation (Tomanek et al., 2007), part-of-speech tagging (Laﬀerty et al., 2001), shallow parsing (Sha and Pereira, 2003), the resolution of elliptical noun phrases (Buyko et al., 2007). CRFs have been proven to be very useful in named entity recognition, especially on documents from the biomedical domain

(Settles, 2004; McDonald and Pereira, 2005; Klinger et al., 2007a,b; McDonald et al.,

2004). Furthermore, CRFs have been applied to gene prediction (DeCaprio et al., 2007), image labeling (He et al., 2004) and object recognition (Quattoni et al., 2005), and also in telematics for intrusion detection (Gupta et al., 2007) and sensor data management

(Zhang et al., 2007).

This paper aims at giving an overview of the basic theory behind Conditional Random

Fields and illustrates how these are related to other probabilistic models. In Section 2, a brief overview of three classical and well-established probabilistic models is given: Na¨ ıve Bayes, Hidden Markov, and Maximum Entropy. The relations between and graphical representations of these diﬀerent approaches are discussed in Section 3. In Section 4, the basic concepts of CRFs (4.1) are explained. This section is mainly focused on the special case of linear-chain CRFs (4.2) and methods for training (4.2.1) and inference (4.2.2).

Moreover, building upon these explanations, a generalization to arbitrarily structured

CRFs is given in Section 4.3.

For further reading we recommend the tutorials of Wallach (2004) and Sutton and

McCallum (2007). They approach the theory behind CRFs from a diﬀerent perspective.

2 Probabilistic Models

In this section, some well-known probabilistic models are discussed. Conditional Random

Fields are founded on the underlying ideas and concepts of these approaches.

The Na¨ Bayes Model is an approach to classify single class variables in dependence ıve of several feature values. In that model, the input values are assumed to be conditionally independent. It is a so called generative approach, modeling the joint probability p(y, x) of the input values x and the class variable y. The Hidden Markov Model is an extension to the Na¨ Bayes Model for sequentially structured data also representing ıve the dependencies of the variables x and y as a joint probability distribution.

Modeling joint probabilities has disadvantages due to computational complexity. The

Maximum Entropy Model, in contrast, is based on modeling the conditional probability p(y|x). Like the Na¨ Bayes Model, it is an approach to classify a single class variable ıve in dependence of several feature values. The diﬀerence is the consideration of conditional probability p(y|x) instead of the joint probability.

While a Hidden Markov Model is a sequential extension to the Na¨ Bayes Model, ıve Conditional Random Fields can be understood as a sequential extension to the Maximum

Entropy Model. Both Maximum Entropy Models and Conditional Random Fields are

3

2 Probabilistic Models

NB

joint

conditional

single class

single class sequence sequence

HMM

ME

joint

conditional

CRF

Figure 1: Overview of probabilistic models: Na¨ Bayes Model (NB), Hidden Markov ıve Model (HMM), Maximum Entropy Model (ME), and Conditional Random

Field (CRF). Depicted aspects are joint versus conditional probability, single class prediction versus prediction on sequential data. known as discriminative approaches.

A graphical comparison of these models is given in Figure 1. In the following, a detailed explanation of these models is given based on Bishop (2006) and Russell and

Norvig (2003).

2.1 Na¨ Bayes ıve A conditional probability model is a probability distribution p(y|x) with an input vector x = (x1 , . . . , xm ), where xi (1 ≤ i ≤ m) are features and y is the class variable to be predicted. That probability can be formulated with Bayes’ law: p(y|x) =

p(y)p(x|y)

.

p(x)

(1)

The denominator p(x) is not important for classiﬁcation as it can be understood as a normalization constant which can be computed by considering all possible values for y.

The numerator can also be written as a joint probability p(y)p(x|y) = p(y, x) ,

(2)

which can be too complex to be computed directly (especially when the number of components in x is high). A general decomposition of that probability can be formulated applying the chain rule p(x1 , . . . , xm ) = m p(xi |xi−1 , . . . , x1 ): i=2 m

p(y, x) = p(y)

p(xi |xi−1 , . . . , x1 , y) .

(3)

i=2

In practice, it is often assumed, that all input variables xi are conditionally independent of each other which is known as the Na¨ Bayes assumption (Hand and Yu, 2001). That ıve 4

2 Probabilistic Models means that p(xi |y, xj ) = p(xi |y) holds for all i = j. Based on this simpliﬁcation, a model known as the Na¨ Bayes classiﬁer is formulated as2 ıve m

p(y|x) ∝ p(y, x) = p(y)

p(xi |y) .

(4)

i=1

This probability distribution is less complex than the one formulated in equation 3.

Dependencies between the input variables x are not modeled, probably leading to an imperfect representation of the real world. Nevertheless, the Na¨ Bayes Model ıve performs surprisingly well in many real world applications, such as email classiﬁcation

(Androutsopoulos et al., 2000a,b; Kiritchenko and Matwin, 2001).

2.2 Hidden Markov Models

In the Na¨ Bayes Model, only single output variables have been considered. To predict ıve a sequence of class variables y = (y1 , . . . , yn ) for an observation sequence x = (x1 , . . . , xn ), a simple sequence model can be formulated as a product over single Na¨ Bayes Models. ıve Dependencies between single sequence positions are not taken into account. Note, that in contrast to the Na¨ Bayes Model there is only one feature at each sequence position, ıve namely the identity of the respective observation: n p(y, x) =

p(yi ) · p(xi |yi ) .

(5)

i=1

Each observation xi depends only on the class variable yi at the respective sequence position3 . Due to this independence assumption, transition probabilities from one step to another are not included in this model. In fact, this assumption is hardly ever met in practice resulting in limited performance of such models. Thus, it is reasonable to assume that there are dependencies between the observations at consecutive sequence positions. To model this, state transition probabilities are added:4 n p(y, x) =

p(yi |yi−1 )p(xi |yi ) .

(6)

i=0

This leads to the well-known Hidden Markov model (HMM, Rabiner (1989)) n P (x) =

p(yi |yi−1 )p(xi |yi ) ,

(7)

y∈Y i=0

2

A ∝ B indicates that A is proportional to B. Here, proportionality is given because of omission the denominator. 3

Recall that xi are diﬀerent observations at diﬀerent sequence positions. In equation 4, in contrast, xi speciﬁes diﬀerent observations at the same position.

4

The initial probability distribution is assumed to be included as p(y0 |y−1 ) = p(y0 )

5

2 Probabilistic Models where Y is the set of all possible label sequences y.

Dependencies between output variables y are modeled. A shortcoming is the assumption of conditional independence (see equation 6) between the input variables x due to complexity issues. As we will see later, CRFs address exactly this problem.

2.3 Maximum Entropy Model

The two models introduced in Section 2.1 and 2.2 are trained to maximize the joint likelihood. In the following, the Maximum Entropy Model 5 is discussed in more detail as it is fundamentally related to CRFs. The Maximum Entropy Model is a conditional probability model. It is based on the Principle of Maximum Entropy (Jaynes, 1957) which states that if incomplete information about a probability distribution is available, the only unbiased assumption that can be made is a distribution which is as uniform as possible given the available information. Under this assumption, the proper probability distribution is the one which maximizes the entropy given the constraints from the training material. For the conditional model p(y|x) the conditional entropy H(y|x)

(Korn and Korn, 2000; Bishop, 2006) is applied, which is deﬁned as

H(y|x) = −

p(y, x) log p(y|x) .

(8)

(x,y)∈Z

The set Z = X × Y consists of X , the set of all possible input variables x, and Y, the set of all possible output variables y. Note that Z contains not only the combinations of x and y occurring in the training data, but all possible combinations.

The basic idea behind Maximum Entropy Models is to ﬁnd the model p∗ (y|x) which on the one hand has the largest possible conditional entropy but is on the other hand still consistent with the information from the training material. The objective function, later referred to as primal problem, is thus p∗ (y|x) = argmax H(y|x) ,

(9)

p(y|x)∈P

where P is the set of all models consistent with the training material. What is meant with “consistent” will be explained in detail later on page 8.

The training material is represented by features. Here, these are deﬁned as binaryvalued functions6 fi (x, y) ∈ {0, 1} (1 ≤ i ≤ m) which depend on both the input variable x and the class variable y. An example for such a function is: fi (x, y) =

1

0

if y = name and x = Mister otherwise 5

(10)

These explanations of Maximum Entropy Models are based on the Maximum Entropy tutorial by

Berger et al. (1996).

6

Such indicator functions are referred to as features instead of feature functions throughout the rest of the paper.

6

2 Probabilistic Models

The expected value of each feature fi is estimated from the empirical distribution p(x, y).

˜

The empirical distribution is obtained by simply counting how often the diﬀerent values of the variables occur in the training data:

˜

E(fi ) =

p(x, y)fi (x, y) .

˜

(11)

(x,y)∈Z

All possible pairs (x, y) are taken into account here. As the empirical probability for a

˜

pair (x, y) which is not contained in the training material is 0, E(fi ) can be rewritten as

1

˜

E(fi ) =

N

fi (x, y) .

(12)

(x,y)∈T

˜

The size of the training set is N = |T |. Thus, E(fi ) can be calculated by counting how often a feature fi is found with value 1 in the training data T ⊆ Z 7 and dividing that number by the size N of the training set.

Analogously to equation 11, the expected value of a feature on the model distribution is deﬁned as

E(fi ) =

p(x, y)fi (x, y) .

(13)

(x,y)∈Z

In contrast to equation 11 (the expected value on the empirical distribution), the model distribution is taken into account here. Of course, p(x, y) cannot be calculated in general because the number of all possible x ∈ X can be enormous. This can be addressed by rewriting E(fi ) by

E(fi ) =

p(x)p(y|x)fi (x, y)

(14)

(x,y)∈Z

and substituting p(x) with the empirical distribution p(x). This is an approximation

˜

to make the calculation of E(fi ) possible (see Lau et al. (1993) for a more detailed discussion). This results in

E(fi ) ≈

p(x)p(y|x)fi (x, y) ,

˜

(15)

(x,y)∈Z

which can (analogously to equation 12) be transformed into

E(fi ) =

1

N

p(y|x)fi (x, y) .

(16)

x∈T y∈Y

Only x values occurring in the training data are considered (x ∈ T ) while all possible y values are taken into account (y ∈ Y). In many applications the set Y typically contains

7

In fact, T is a multiset as the elements from Z can be contained several times. So the subset relation

T ⊆ Z only holds in a special case.

7

2 Probabilistic Models only a small number of variables. Thus, summing over y is possible here and E(fi ) can be calculated eﬃciently.

Equation 9 postulates that the model p∗ (y|x) is consistent with the evidence found in the training material. That means, for each feature fi its expected value on the empirical distribution must be equal to its expected value on the particular model distribution, these are the ﬁrst m constraints

˜

E(fi ) = E(fi ) .

(17)

Another constraint is to have a proper conditional probability ensured by p(y|x) ≥ 0 for all x, y

p(y|x) = 1 for all x.

(18)

y∈Y

Finding p∗ (y|x) under these constraints can be formulated as a constrained optimization problem. For each constraint a Lagrange multiplier λi is introduced. This leads to the following Lagrange function Λ(p, λ):

m

Λ(p, λ) =

H(y|x) primal problem

+

λi

˜

E(fi ) − E(fi )

+λm+1

p(y|x) − 1

i=1

(19)

y∈Y

!

=0

equation 9

!

constraints from

=0

equation 17

constraint from equation 18

This is maximized to get the model formulation p∗ (y|x) in equation 28 on page 10. In λ the following, a detailed derivation is given.

In the same manner as done for the expectation values in equation 15, H(y|x) is approximated: H(y|x) ≈ −

p(x)p(y|x) log p(y|x) .

˜

(20)

(x,y)∈Z

The derivation of equation 20 is given by

∂

p(y|x)

H(y|x) = −˜(x) log p(y|x) + p ∂p(y|x) p(y|x) = −˜(x) (log p(y|x) + 1) . p The derivation of the ﬁrst m constraints in equation 19 is

∂

∂p(y|x)

m

˜ λi E(fi ) − E(fi ) =

i=1

8

(21)

2 Probabilistic Models

∂

=

∂p(y|x)

i=1

λi

m

p(x)p(y|x)fi (x, y) −

˜

p(x, y)fi (x, y)

˜

(x,y)∈Z

(x,y)∈Z

m

=

λi p(x)fi (x, y) .

˜

(22)

i=1

The complete derivation of the Lagrange function from equation 19 is then: m ∂

Λ(p, λ) = −˜(x)(1 + log p(y|x)) + p ∂p(y|x)

λi p(x)fi (x, y) + λm+1 .

˜

(23)

i=1

Equating this term to 0 and solving by p(y|x) leads to m 0 = −˜(x)(1 + log p(y|x)) + p λi p(x)fi (x, y) + λm+1

˜

i=1

m

p(x)(1 + log p(y|x)) =

˜

λi p(x)fi (x, y) + λm+1

˜

i=1 m λi fi (x, y) +

λm+1 p(x) ˜

λi fi (x, y) +

1 + log p(y|x) =

λm+1

−1

p(x)

˜

i=1 m log p(y|x) = i=1 m

p(y|x) = exp

λm+1

−1

p(x)

˜

· exp

λi fi (x, y) i=1 .

(24)

The second constraint in equation 18 is given as p(y|x) = 1 .

(25)

y∈Y

Substituting equation 24 into 25 results in m y∈Y

λi fi (x, y)

· exp

λm+1

−1

p(x)

˜

=1

exp

exp

λm+1

−1

p(x)

˜

=

i=1

1

.

m

exp

(26)

λi fi (x, y) i=1 y∈Y

Substituting equation 26 back into equation 24 results in m p(y|x) = exp

λi fi (x, y)

1

· exp i=1 y∈Y 9

.

m

λi fi (x, y) i=1 (27)

3 Graphical Representation

This is the general form the model needs to have to meet the constraints. The Maximum

Entropy Model can then be formulated as p∗ (y|x) λ 1

=

exp

Zλ (x)

m

λi fi (x, y)

,

(28)

i=1

and Zλ (x) then is m Zλ (x) =

exp

λi fi (x, y)

.

(29)

i=1

y∈Y

This formulation of a conditional probability distribution as a log-linear model and a product of exponentiated weighted features is discussed from another perspective in

Section 3. In Section 4, the similarity of Conditional Random Fields, which are also loglinear models, with the conceptually closely related Maximum Entropy Models becomes evident. For a more detailed discussion of Maximum Entropy Models and related approaches we recommend the book by Pearl (1988) and the Maximum Entropy Tutorial by Berger et al. (1996).

In this section, two kinds of probabilistic models have been introduced. On the one hand generative models, such as Na¨ Bayes and Hidden Markov Models, which are based on ıve joint probability distributions. As can be seen in formula 4 and 6, in such models the observation variables xi topologically precede, also called “generate”, the input variables

y. This characteristic can be seen in the graphical representation (see Section 3), of these models in Figures 3(a) and 4(a). On the other hand, discriminative models, such as Maximum Entropy Models, are based on conditional probability distributions. In the next section, both groups of models are reviewed from a diﬀerent perspective: their graphical representations.

3 Graphical Representation

The underlying probability distributions of probabilistic models can be represented in a graphical form, this is why they are often called probabilistic graphical models. The following explanations are inspired by Bishop (2006).

A probabilistic graphical model is a diagrammatic representation of a probability distribution. In such a graph there is a node for each random variable. The absence of an edge between two variables represents conditional independence between those variables. Conditional independence means that two random variables a and b are independent given a third random variable c if they are independent in their conditional probability distribution, formally p(a, b|c) = p(a|b)p(b|c).8 From such graphs, also

8

Note that in contrast two random variables a and b are statistically independent if and only if p(a, b) = p(a)p(b).

10

3 Graphical Representation called independency graphs, one can read the conditional independence properties of the underlying distribution. Note that a fully connected independency graph does not contain any information about the probability distribution, only the absence of edges is informative: Conditional independence in the probability distribution does not induce the absence of the edge in the graph.9

Conditional independence is an important concept as it can be used to decompose complex probability distributions into a product of factors, each consisting of the subset of corresponding random variables. This concept makes complex computations (which are for example necessary for learning or inference) much more eﬃcient. In general, the decomposition, in fact a factorization of a probability distribution, is written as the product of its factors Ψs , with vs the subset of the respective random variables constituting such a factor p(v) =

Ψs (vs ) .

(30)

s

Let G = (V, E) be a graph with vertexes V and edges E. In an independency graph

(for example the one shown in Figure 2(a)), the vertexes V = X ∪ Y , with X and Y sets of random variables, are depicted by circles. X will typically be considered as the set of input or observation variables (shaded circles), and Y as the set of output variables (empty nodes). An independency graph can have directed or undirected edges, depending on the kind of graphical model it represents (see Sections 3.1 and 3.2).

In a factor graph (Kschischang et al., 2001), such as the one shown in Figure 2(b), the circles represent, as in an independency graph, the random variables of the underlying distribution, depicted by circles. Further, a factor graph contains factor nodes, depicted by small, ﬁlled squares, which represent the factors Ψs (compare with equation 30).10

In a factor graph, the edges are always undirected, linking the random variables to the factor nodes. A factor Ψs includes all random variables to which the respective factor node is directly connected by an edge. Thus, a factor graph represents more explicitly the factorization of the underlying probability distribution. Independency graphs of both directed and undirected graphical models can be transformed into factor graphs.

As an example, assume a probability distribution p(x1 , x2 , y) to factorize as p(x) = p(x1 )p(x2 )p(y|x1 , x2 ). It has the factors Ψ1 (x1 ) = p(x1 ), Ψ2 (x2 ) = p(x2 ), and Ψ3 (y) = p(y|x1 , x2 ). Here, x1 and x2 are conditionally independent given y. Figure 2 shows an independency graph and a factor graph representing this distribution.

In the following, directed and undirected graphical models are discussed. Na¨ Bayes ıve and Hidden Markov Models fall into the ﬁrst group, the Maximum Entropy Model falls into the second group of graphical models.

9

A graph is called dependency graph if the independence of two variables implicates separability of the corresponding nodes in the graph (Beierle and Kern-Isberner, 2003, pp. 337f.).

10

A factor graph consists of two sets of nodes: variable and factor nodes. There are no edges between nodes of the same set, so a factor graph is always bipartite.

11

3 Graphical Representation x2 x1

x2

x1

Ψ3

Ψ1

y

Ψ2

y

(a) Independency graph

(b) Factor graph

Figure 2: Directed graphical model

3.1 Directed Graphical Models

A joint distribution p(v) can be factorized into the product of conditional distributions for each node vk , so that each such conditional distribution is conditioned on its set of p parent nodes vk :

K

p p(vk |vk )

p(v) =

(31)

k=1

This is the same kind of factorization as shown in Figure 2 for the example distribution p(x1 , x2 , y). As another example, take the Na¨ Bayes classiﬁer which is discussed in ıve Section 2.1. Figure 3 graphically represents such a model with three observation variables.

The corresponding probability distribution factorizes as p(y, x1 , x2 , x3 ) = p(y) · p(x1 |y) · ıve p(x2 |y) · p(x3 |y), following the Na¨ Bayes assumption. Analogously, Figure 4 shows an

HMM classiﬁer for a sequence of three input variables x1 , x2 , x3 . The factorization is p(x1 , x2 , x3 , y1 , y2 , y3 ) = Ψ1 (y1 ) · Ψ2 (x1 , y1 ) · Ψ3 (x2 , y2 ) · Ψ4 (x3 , y3 ) · Ψ5 (y1 , y2 ) · Ψ6 (y2 , y3 ) which corresponds11 to the HMM (see equation 6). y y

Ψ1

Ψ2 x1 x2

x3

x2

x1

(a) Independency graph

Ψ3

(b) Factor graph

Figure 3: Na¨ Bayes classiﬁer ıve 11

A dedicated start value y0 = ⊥ is assumed, so that Ψ(y1 ) = p(y0 = ⊥, y1 )).

12

Ψ4 x3 3 Graphical Representation y1 y2

y3

y1

Ψ5

Ψ1

y2

x2

x3

x1

(a) Independency graph

Ψ4

Ψ3

Ψ2 x1 Ψ6 y3

x2

x3

(b) Factor graph

Figure 4: Independency and factor graph for the Hidden Markov Model.

3.2 Undirected Graphical Models

A probability distribution can be represented by an undirected graphical model using a product of non-negative functions of the maximal cliques of G. The factorization is performed in a way that conditionally independent nodes do not appear within the same factor, that means that they belong to diﬀerent cliques: p(v) =

1

Z

ΨC (vC ) .

(32)

C∈C

The factors ΨC ≥ 0 are so-called potential functions of the random variables vC within a clique C ∈ C.

The potential functions may be any arbitrary function. Due to this generality the potential functions do not necessarily have to be probability functions. This is in contrast to directed graphs where the joint distribution is factorized into a product of conditional distributions. Thus, normalization of the product of potential functions is necessary to achieve a proper probability measure. This is yielded by a normalization factor Z.

Calculating Z is one of the main challenges during parameter learning as summing over all possible variables is necessary:

Z=

ΨC (vC ) .

(33)

v C∈C

In Section 2.3 the Maximum Entropy model was discussed which can be formulated by such a product of non-negative potential functions (compare to equation 28) pλ (y|x) =

1

Zλ (x)

m

exp (λi fi (x, y)) .

(34)

i=1

In such log-linear models, potential functions are formulated as the exponential function of weighted features. Such a formulation is frequently used because it fulﬁlls the requirement of strict positivity of the potential functions. Figure 5(a) shows the independency graph for a Maximum Entropy classiﬁer with an observation variable x, a corresponding factor graph with three features is shown in Figure 5(b).

13

4 Conditional Random Fields y y

f1

x

f2

f3

x

(a) Independency graph (b) Factor graph

Figure 5: Maximum Entropy Classiﬁer

Directed and undirected graphical models diﬀer in the way the original probability distribution is factorized. The factorization into a product of conditional probability distributions as done in a directed graphical model is straight-forward. In undirected graphical models a factorization into arbitrary functions is done. This does not require an explicit speciﬁcation how the variables are related. But it comes at the expense of having to calculate the normalization factor.

4 Conditional Random Fields

In the previous section, some well-known probabilistic models were discussed from a mathematical perspective. Moreover, the graphical representation, which characterizes the underlying probability distribution of the model, was shown.

A Hidden Markov Model can be understood as the sequence version of a Na¨ Bayes ıve Model: instead of single independent decisions, a Hidden Markov Model models a linear sequence of decisions. Accordingly, Conditional Random Fields can be understood as the sequence version of Maximum Entropy Models, that means they are also discriminative models. Furthermore, in contrast to Hidden Markov Models, Conditional Random Fields are not tied to the linear-sequence structure but can be arbitrarily structured.

In the following, the idea and theoretical foundation of Conditional Random Fields is illustrated. First, a general formulation of Conditional Random Fields is given followed by an in-depth discussion of the most popular form of CRFs, those with a linear sequence structure. A main focus are aspects of training and inference. This section closes with a short discussion of arbitrarily structured CRFs.

4.1 Basic Principles

Introduced by Laﬀerty et al. (2001), Conditional Random Fields (CRF) are probabilistic models for computing the probability p(y|x) of a possible output y = (y1 , . . . , yn ) ∈ Yn given the input x = (x1 , . . . , xn ) ∈ Xn which is also called the observation. A CRF in

14

4 Conditional Random Fields general can be derived from formula 32: p(v) =

1

Z

ΨC (vC ) .

(35)

C∈C

The conditional probability p(y|x) can be written as p(x, y) p(x) p(x, y)

=

y p(y , x)

p(y|x) =

=

1

Z

1

Z

ΨC (xC , yC )

.

C∈C ΨC (xC , y C )

C∈C y (36)

From this, the general model formulation of CRFs is derived: p(y|x) =

1

Z(x)

ΨC (xC , yC ) .

(37)

C∈C

As described in Section 3, ΨC are the diﬀerent factors corresponding to maximal cliques in the independency graph (see Kschischang et al. (2001)). See Figure 6 for an example of a linear-chain CRF. Each factor corresponds to a potential function which combines diﬀerent features fi of the considered part of the observation and the output. The normalization follows from the denominator of equation 36:

Z(x) =

ΨC (xC , y ) . y (38)

C∈C

In fact, during both training and inference, for each instance a separate graph is used which is built from so-called clique templates. Clique templates make assumptions on the structure of the underlying data by deﬁning the composition of the cliques. Each clique is a set of putatively interdependent variables, namely those contained in the corresponding potential function (Sutton and McCallum, 2007). Examples for clique templates are given in Section 4.2 and 4.3.

4.2 Linear-chain CRFs

A special form of a CRF, which is structured as a linear chain, models the output variables as a sequence. Figure 6 shows the respective independency and factor graphs.

The CRF introduced in equation 37 can be formulated as p(y|x) =

1

Z(x)

15

n

Ψj (x, y) , j=1 (39)

4 Conditional Random Fields

yt

yt+1

yt+2

yt

yt+3

yt+1

yt+2

x

x

(a) Independency graph

yt+3

(b) Factor graph

Figure 6: A Linear Chain Conditional Random Field with n

Z(x) =

Ψj (x, y ) .

(40)

j=1

y

Given the factors Ψj (x, y) in the form m Ψj (x, y) = exp

λi fi yj−1 , yj , x, j

,

(41)

i=1

and assuming n + 1 to be the length of the observation sequence12 , a linear-chain CRF can be written as

n m

1

pλ (y|x) =

· exp

(42)

λi fi yj−1 , yj , x, j .

Zλ (x) j=1 i=1

The index j is needed in comparison to the Maximum Entropy Model because a label sequence is considered instead of a single label to be predicted. In equation 42, j speciﬁes the position in the input sequence x. Note that the weights λi are not dependent on the position j. This technique, known as parameter tying, is applied to ensure a speciﬁed set of variables to have the same value.

The normalization to [0, 1] is given by

Zλ (x) =

n

m

exp y∈Y λi fi yj−1 , yj , x, j .

(43)

j=1 i=1

Summation over Y, the set of all possible label sequences, is performed to get a feasible probability. 12

Note, that the number of factors is n because any two consecutive positions yj−1 and yj are combined in a factor.

16

4 Conditional Random Fields yt yt+1

yt+2

Ψ1 Ψ2 Ψ3

yt+3

...

Ψ4

Ψm

x

Figure 7: Alternative interpretation of a linear-chain CRF

In equation 42 a formulation of a linear-chain CRF is given. Moving the sum over the sequence positions in front of the exponential function, the actual factorization typically done for a CRF gets more evident:

pλ (y|x) =

1

·

Zλ (x)

n

m

exp

λi fi yj−1 , yj , x, j

j=1

.

(44)

i=1

The factor graph in Figure 6(b) corresponds to this factorization. One could also move the sum over the diﬀerent features in front of the exponential function

m n

1

pλ (y|x) =

·

exp λi fi yj−1 , yj , x, j .

(45)

Zλ (x) i=1 j=1

In this interpretation, the factors are not “running” over the sequence but over the fean tures. The factor graph with factors Ψi = exp corresponding j=1 λi fi yj−1 , yj , x, j to features fi is given in Figure 7. This interpretation is less intuitive but shows the relation to the Maximum Entropy model (in Figure 5).

The model can be interpreted with even more factors by moving both sums to the front of the exponential function

1

pλ (y|x) =

·

Zλ (x)

m

n

exp λi fi yj−1 , yj , x, j

.

(46)

i=1 j=1

The corresponding factor graph is not shown here because of the large number of factors in the graph.

The factorization based on maximal cliques (see equation 44) is the one usually applied for a linear-chain CRF. The other two factorizations (see equations 45 and 46) do not adhere to this maximality. In general, factorizing according to cliques consisting of less variable nodes than the maximal clique lead to inaccuracies because not all dependencies are correctly considered. In this case, however, it leads to redundant computations

17

4 Conditional Random Fields p12 (x)

p22 (x)

p11 (x)

S1

S2 p21 (x) p13 (x)

p32 (x) p23 (x)

p31 (x)

S3

p33 (x)

Figure 8: Example for a stochastic ﬁnite state automaton as can be seen in equation 46. The rest of the paper is based on the idea of the ﬁrst factorization. Linear chain CRFs have exactly one clique template C ∈ C: It speciﬁes the independency graph to consist of connections between yj and yj−1 and x: C = Ψj (yj , yj−1 , x) | ∀j ∈

{1, . . . , n} . Because of that special structure, it is possible to represent a linearchain CRF by a stochastic ﬁnite state automaton (SFSA) similar to Hidden Markov

Models. This is beneﬁcial for implementation purposes. In that automaton the transition probabilities depend on the input sequence x. Its structure is in general arbitrary but the most straight-forward approach is to use a fully connected automaton with states Sl where l ∈ Y. One state is used for every symbol in the label alphabet. Such automaton with |Y| = 3 is depicted in Figure 8.

As stated in equation 42, the features are dependent on the label sequence and herewith on the state transitions in the ﬁnite state automaton. So it is important to point out that only a subset of all features fi is used in every transition in the graph.

The strategy to build a linear-chain CRF can now be summarized as follows:

1. Construct an SFSA S = (S, T ) out of the set of states S (with transitions T =

(s, s) ∈ S 2 ). It can be fully connected but it is also possible to forbid some

˙

transitions.13

2. Specify a set of feature templates14 F = {g1 (x, j), . . . , gh (x, j)} on the input sequence. These are not used directly but for the generation of the features fi .

3. Generate set of features F = {∀s, s ∈ S. ∀go ∈ F : fk (s, s, go )}

˙

˙

13

Such a decision might depend on the training data and the transitions contained there.

(

1 if xj = V

14

An example for such a feature template is g1 (x, j) =

.

0 else

18

4 Conditional Random Fields

Until now, only ﬁrst-order linear-chain CRFs have been considered. To deﬁne linearchain CRFs with a higher order (see McDonald and Pereira, 2005), the features need to have the form fi (y, x, j) = fi (hj (y), x, j)

(47)

hj (y) = (yj−k+1 , . . . , yj ) .

(48)

with

The order is given by k. For higher orders15 (k > 2), the same probabilistic state automaton is used by combining diﬀerent previous output values yi in special states. For example, for k = 3 the set of states would be S = {(Si , Sj )} for all i, j ∈ {1, . . . , |S|}

(according to the ﬁrst-order SFSA in Figure 8).

For that special linear-chain structure of the CRF, training and inference are formulated in a similar way as for Hidden Markov Models as basic problems (Rabiner, 1989):

I) Given observation x and a CRF M: How to ﬁnd the most probably ﬁtting label sequence y ?

II) Given label sequences Y and observation sequences X : How to ﬁnd parameters of a CRF M to maximize p(y|x, M)?

Problem I is the most common application of a conditional random ﬁeld, to ﬁnd a label sequence for an observation. Problem II is the question how to train, to adjust the parameters of M which are especially the feature weights λi .

In discriminative approaches, the probability p(x|M) is not modeled. Estimating this is another basic question in context of Hidden Markov Models and not considered here.

4.2.1 Training

For all types of CRFs, as well as for Maximum Entropy Models, the maximum-likelihood method can be applied for parameter estimation. That means, that training the model is done by maximizing the log-likelihood L on the training data T :

¯

L(T ) =

log p(y|x)

(x,y)∈T

exp

log

=

(x,y)∈T

y

∈Y exp

n j=1 m i=1 λi fi n j=1

yj−1 , yj , x, j

m i=1 λi fi

yj−1 , yj , x, j λ2 .

(49)

To avoid overﬁtting the likelihood is penalized with the term − n 2σi2 . This technique i=1 is established for use in Maximum Entropy Models and can also be applied here (see

15

Note that ﬁrst order means k = 2.

19

4 Conditional Random Fields explanations by Chen and Rosenfeld, 2000). The parameter σ 2 models the trade-oﬀ between ﬁtting exactly the observed feature frequencies and the squared norm of the weight vector (McDonald and Pereira, 2005). The smaller the values are, the smaller the weights are forced to be, so that the chance that few high weights dominate is reduced.

For the derivation, the notation of the likelihood function L(T ) is reorganized:

L(T ) =

log

(x,y)∈T

y ∈Y

=

n

n j=1 exp

m i=1 λi fi

m i=1 λi fi

n j=1 exp

yj−1 , yj , x, j

m

−

yj−1 , yj , x, j

i=1

λ2 i 2σ 2

m

λi fi yj−1 , yj , x, j −

j=1 i=1

(x,y)∈T

n

− log n

m

exp y ∈Y m =

λi fi yj−1 , yj , x, j

m

−

j=1 i=1

i=1

λ2 i 2σ 2

λi fi yj−1 , yj , x, j −

(x,y)∈T j=1 i=1

A

−

n

log

exp y ∈Y

(x,y)∈T

m

λi fi yj−1 , yj , x, j

m

− i=1 j=1 i=1

λ2 i .

2σ 2

(50)

C

Zλ (x)

B

The partial derivations of L(T ) by the weights λk are computed separately for the parts A, B, and C. The derivation for part A is given by

∂

∂λk

n

m

n

λi fi yj−1 , yj , x, j =

(x,y)∈T j=1 i=1

fk yj−1 , yj , x, j .

(x,y)∈T j=1

20

(51)

4 Conditional Random Fields

The derivation for part B, which corresponds to the normalization, is given by

∂

∂λk

log Zλ (x) =

(x,y)∈T

(x,y)∈T

=

(x,y)∈T

1 ∂Zλ (x)

Zλ (x) ∂λk

1

Zλ (x)

n

m

exp y ∈Y

λi fi yj−1 , yj , x, j · j=1 i=1 n ·

fk (yj−1 , yj , x, j) . j=1

=

(x,y)∈T y ∈Y

1 exp

Zλ (x)

n

m

λi fi yj−1 , yj , x, j · j=1 i=1

=pλ (y|x)

see equation (42) n ·

fk (yj−1 , yj , x, j) j=1 n

=

pλ (y |x)

fk (yj−1 , yj , x, j)

(52)

j=1

(x,y)∈T y∈Y

Part C, the derivation of the penalty term, is given by

∂

∂λk

m

− i=1 λ2 i 2σ 2

=−

2λk λk =− 2.

2

2σ σ (53)

The log-likelihood function in equation 50 is concave: The ﬁrst term is linear (see equation 51) the second term belongs to the normalization. Hence, it does not change the concavity of the function and the last term is concave (see equation 53), so is the whole function.

Equation 51, the derivation of part A, is the expected value under the empirical distribution of a feature fi : n ˜

E(fi ) =

fi yj−1 , yj , x, j .

(54)

(x,y)∈T j=1

Accordingly, equation 52, the derivation of part B, is the expectation under the model distribution: n

E(fi ) =

pλ (y |x)

(x,y)∈T y ∈Y

fi (yj−1 , yj , x, j) . j=1 21

(55)

4 Conditional Random Fields

The partial derivations of L(T ) can also be interpreted as

∂L(T ) λk ˜

= E(fk ) − E(fk ) − 2 .

∂λk

σ

(56)

Note the relation of equations 54 and 55 to equations 12 and 16 which were formulated for the Maximum Entropy model. Besides the fact that for the CRF several output variables are considered, these equations correspond. A diﬀerence is the absence of the

1

factor N , which is irrelevant for ﬁnding the maximum by the approximation of the ﬁrst

˜

derivation E(fk ) − E(fk ) − λk = 0. σ2 ˜

Computing E(fi ) is easily done by counting how often each feature occurs in the training data (McDonald and Pereira, 2005). Computing E(fi ) directly is impractical because of the high number of possible tag sequences (|Y|). Recall, that for the Maximum

Entropy models, E(fi ) can be computed eﬃciently due to the small number of diﬀerent output variables y in most applications. In a CRF, sequences of output variables lead to enormous combinatorial complexity. Thus, a dynamic programming approach is applied, known as the Forward-Backward Algorithm originally described for Hidden Markov

Models (Rabiner, 1989). This algorithm can also be used for linear-chain Conditional

Random Fields in a slightly modiﬁed form.

According to McDonald and Pereira (2005), a function Tj (s) is deﬁned, which maps a single state s at an input position j to a set of allowed next states at position j + 1, and the inverse function Tj−1 (s), which maps the set of all states of possible predecessors to

s. Special states ⊥ and are deﬁned for start and end of the sequence. An example for the states in Figure 8 is Tj (S1 ) = {S1 , S2 , S3 }. Forward (α) and backward (β) scores will be used, which can be understood in general as messages sent over the network, in the following assumed to be a linear chain (Bishop, 2006): αj (s|x) =

αj−1 (s |x) · Ψj (x, s , s) βj+1 (s |x) · Ψj (x, s, s ) .

s

(57)

(58)

−1

∈Tj (s)

βj (s|x) = s ∈Tj (s)

In relation to the deﬁnition of the potentials in equation 41 the features are deﬁned on special states: Ψj (x, s, s ) = exp ( m λi fi (yj−1 = s, yj = s , x, j)) . i=1 The α functions are messages sent from the beginning of the chain to the end. The β functions are messages sent from the end of the chain to the beginning. They are initialized by α0 (⊥|x) = 1 β|x|+1 ( |x) = 1 .

(59)

(60)

With these messages, it is possible to compute the expectation under the model distribution eﬃciently (Laﬀerty et al., 2001; McDonald and Pereira, 2005) by

E(fi ) =

(x,y)∈T

1

Zλ (x)

n

fi (s, s , x, j) · αj (s|x)Ψj (x, s, s )βj+1 (s |x) . j=1 s∈S s ∈Tj (s)

22

(61)

4 Conditional Random Fields αj (s|x) s1 βj (s|x)

···

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

j

j+1

j+2

n

···

s s +1

···

s

+2

···

.

.

.

s|S|

···

1

Figure 9: Message passing in the Forward-Backward Algorithm. Each column represents one variable, each box in a row one possible value of that variable.

The underlined part of formula 61 can be understood as computing the potentials in all combinations of state sequences in the training data. A nice visualization is the so-called lattice diagram in Figure 9 (Bishop, 2006) in which possible paths of messages sent are depicted. The values for α and β are stored after one iteration so that they have to be computed only once.

The normalization factor is computed by

Zλ (x) = β0 (⊥|x) .

(62)

The Forward-Backward Algorithm has a run-time of O(|S|2 n), so it is linear in the length of the sequence and quadratic in the number of states.

4.2.2 Inference

The problem of inference is to ﬁnd the most likely sequence y for given observations

x. Note, that this is not about to choose a sequence of states, which are individually most likely. That would be the maximization of the number of correct states in the sequence. In contrast, for ﬁnding the most likely sequence the Viterbi Algorithm is applied (Rabiner, 1989). The Viterbi Algorithm is similar to the Forward-Backward

Algorithm. The main diﬀerence is, that instead of summing, a maximization is applied.

The quantity δj (s|x), which is the highest score along a path, at position j, which ends in state s, is deﬁned as δj (s|x) =

max

y1 ,y2 ,...,yj−1

p(y1 , y2 , . . . , yj = s|x) .

23

(63)

4 Conditional Random Fields

The induction step is δj+1 (s|x) = max δj (s ) · Ψj+1 (x, s, s ) . s ∈S

(64)

The array ψj (s) keeps track of the j and s values. The algorithm then works as follows:

1. Initialization:

The values for all steps from the start state ⊥ to all possible ﬁrst states s are set to the corresponding factor value.

∀s ∈ S :

δ1 (s) = Ψ1 (x, ⊥, s) ψ1 (s) = ⊥

(65)

2. Recursion:

The values for the next steps are computed from the current value and the maximum values regarding all possible succeeding states s .

∀s ∈ S : 1 ≤ j ≤ n :

δj (s) = max δj−1 (s )Ψ(x, s , s) s ∈S

ψj (s) = argmax δj−1 (s )Ψ(x, s , s) s ∈S

(66)

3. Termination: p∗ = max δn (s )

(67)

∗ yn = argmax δn (s )

(68)

s ∈S

s ∈S

4. Path (state sequence) backtracking:

Recompute the optimal path from the lattice using the track keeping values ψt .

∗

yt∗ = ψt+1 (yt+1 )

t = n − 1, n − 2, . . . , 1

(69)

Steps 1-3 are very similar to the Forward-Backward Algorithm. A lattice is ﬁlled with the “best” values. Step 4 reads the best path from that lattice.

4.3 Arbitrarily structured CRFs

In Section 4.2, eﬃcient training and inference for the special case of a linear-chain CRF have been discussed. In the following, CRFs with an arbitrary graphical structure, such as a tree or a grid structure, are explained. Diﬀerent CRF structures have been proposed by Sutton and McCallum (2007), Finkel et al. (2005), Laﬀerty et al. (2001), Sutton and

McCallum (2004), and Tsai et al. (2006).

Moving from a linear-chain CRF to a general CRF basically means to abolish the restrictions that the clique templates (see Section 4.1) model a linear structure. Hence, more general algorithms for training and inference have to be applied.

24

4 Conditional Random Fields

...

y1

y2

y3

y4

y5 . . .

y4

y5 . . .

x

(a) Linear Chain

...

y1

y2

y3

x

(b) Skip Chain

Figure 10: Examples for Structures of Conditional Random Fields

4.3.1 Unrolling the graph

In some publications, diﬀerent CRF structures are depicted (see citations above and examples shown in Figure 10). It has to be emphasized, that these structures are meant exemplarily because the actual graph is instantiated separately for each instance for training or inference with the help of the clique templates. Hence, the actual graph structure depends on the considered instance and the speciﬁc type of the CRF. The potential functions Ψj in the model (compare with equation 39) are associated with the clique templates, but not to the cliques in the graph. The process of building the graph for a speciﬁc instance is called “unrolling the graph”.

As already discussed in Section 4.2 the set of clique templates C for a linear-chain

CRF is given by

C = {C} with C = Ψj (yj , yj−1 , x) | ∀j ∈ {1, . . . , n} .

(70)

Accordingly, for a linear-chain CRF there is only one clique template resulting in a linear structure for every possible input value. Only because of this linear sequence structure, an SFSA can be used as a basis for an eﬃcient implementation.

Another possible set of clique templates is

C = {C1 , C2 } with C1 = Ψj (yj , yj−1 , x) | ∀j ∈ {1, . . . , n} and C2 = Ψab (ya , yb , x) | (a, b) ∈ {1, . . . , n}

(71)

2

,

(72)

where a and b are indexes that specify labels with special attributes on the input sequence.

For example in a sequence x = (x1 , . . . , xn ) ∈ Nn the indexes a and b could specify all items where b is divisible by a. Given a concrete sequence 2, 3, 4, 5, 6 this leads to a

CRF with the structure shown in Figure 11. In real life applications parameter tying is often applied, in this example, the value of the weights λj is the same for identical clique templates, independent of the position in the sequence.

25

5 Summary

. . . y2

y3

y4

y6 . . .

y5

x

(a) Independency Graph

Ψ36

Ψ26

Ψ24

. . . y2

y3

Ψ3

Ψ2

y4

Ψ4

x

y5

y6 . . .

Ψ5

(b) Factor Graph

Figure 11: Example for an unrolled skip-chain CRF for the sequence x = (2, 3, 4, 5, 6) according to equations 71 and 72.

4.3.2 Training and Inference

In sequence structures such as the HMM or the linear-chain CRF (which is a simple chain globally conditioned on the input values x) the Forward-Backward and the Viterbi

Algorithm, which are based on sending messages along the chain in the only two possible directions, can be applied. Besides conceptually diﬀerent algorithms, such as sampling methods, there are generalizations of these algorithms for tree-structured graphs, namely the sum-product and the max-sum algorithm (Kschischang et al., 2001).

The basic idea is that the messages sent along the graph are collected from diﬀerent directions before forwarding them. This generalization can also be used on arbitrary graphs. The basic idea is to compute a so-called junction tree from the original graph.

The algorithms can then be applied in a slightly modiﬁed form.

5 Summary

In this tutorial, a detailed overview of Conditional Random Fields and the theory behind and related to it is given. We started with a short recapitulation of well-known probabilistic models. Conditional Random Fields can be considered as an extension to the Maximum Entropy model (a structured learning model instead of a single-position classiﬁcation model) on the one hand, and the Hidden Markov Model (discriminative model instead of a generative model) on the other hand.

Probabilistic models can be represented graphically by means of independency and

26

References factor graphs. We distinguished between directed and undirected graphical models which result in diﬀerent types of factorization. Such factorizations can be read from the graphical representation of a model. They are often a crucial prerequisite for eﬃciently performing complex computations, as needed for training or inference. Aspects of how a CRF can be factorized and the resulting graphical representations are a focus of this paper. Based on the concepts of graphical representation and factorization we have introduced a general formulation of CRFs. For the special case of a CRF based on a linear sequence structure, an in-depth explanation of methods for training and inference was given.

These methods were originally developed for HMMs, but can be reused in a slightly modiﬁed way for linear-chain CRFs.

Finally, we shortly discussed the issues which arise for CRFs with an arbitrary graphical structure. This includes aspects of how the potential functions of the model can be deﬁned by means of clique templates and why the training and inference algorithms used for a linear-chain CRF cannot be used in a non-sequential scenario. For further reading and detailed explanations on algorithms for training and inference on general probabilistic graphical models the interested reader might refer to Bishop (2006); Lepar and Shenoy (1999); Jordan and Weiss (2002); Mooij and Kappen (2007); Borgelt and

Kruse (2002).

Acknowledgements

We would like to thank Christoph M. Friedrich, Juliane Fluck and Ernst G¨nter Schukatu

Talamazzini for discussions and motivations. Special thanks to G¨ nter Rudolph for u fruitful suggestions.

Funding for the work of Roman Klinger was provided by the Fraunhofer-Max-Planck

Cooperation “Learning and Inference Platform” (http://lip.fml.tuebingen.mpg.de/).

The work of Katrin Tomanek was funded by the European Commission within the BOOTStrep project (FP6-028099).

References

[Androutsopoulos et al. 2000a] Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D.: An evaluation of Naive Bayesian anti-spam ﬁltering. In: Potamias, G.; Moustakis, V.;

Someren, M. van (Editors.): Proceedings of the workshop on Machine Learning in the

New Information Age, 11th European Conference on Machine Learning. Barcelona,

Spain, 2000, pp. 9–17. – URL http://arxiv.org/abs/cs/0006013v1.

[Androutsopoulos et al. 2000b]

Androutsopoulos, Ion; Paliouras, Georgios;

Karkaletsis, Vangelis; Sakkis, Georgios; Spyropoulos, Constantine D.; Stamatopoulos, Panagiotis:

Learning to Filter Spam E-Mail: A Comparison of

27

References a Naive Bayesian and a Memory-Based Approach. In: Zaragoza, H.; Gallinari, P.; Rajman, M. (Editors.): Proceedings of the workshop ”Machine Learning and Textual Information Access”, 4th Conference on Principles and Practice of Knowledge Discovery in Databases. Lyon, France, 2000, pp. 1–13. – URL http://arxiv.org/abs/cs/0009009v1. [Beierle and Kern-Isberner 2003] Beierle, Christoph; Kern-Isberner, Gabriele:

Methoden wissensbasierter Systeme. 2nd. Wiesbaden, Germany: Vieweg, May 2003. – in german.

[Berger et al. 1996] Berger, Adam L.; Pietra, Stephen D.; Pietra, Vincent J. D.:

A Maximum Entropy Approach to Natural Language Processing. In: Computational

Linguistics 22 (1996), No. 1, pp. 39–71.

[Bishop 2006] Bishop, Christopher M.: Pattern Recognition and Machine Learning.

Springer, 2006. – ISBN 0-387-31073-8.

[Borgelt and Kruse 2002] Borgelt, Christian; Kruse, Rudolf: Graphical Models:

Methods for Data Analysis and Mining. Wiley & Sons, 2002.

[Buyko et al. 2007] Buyko, Ekaterina; Tomanek, Katrin; Hahn, Udo: Resolution of Coordination Ellipses in Named Entities in Scientiﬁc Texts. In: Proceedings of the

10th Conference of the Paciﬁc Association for Computational Linguistics (PACLING

2007). Melbourne, Australia, 2007, pp. 163–171.

[Chen and Rosenfeld 2000] Chen, Stanley F.; Rosenfeld, Ronald: A Survey of

Smoothing Techniques for ME Models. In: IEEE Transactions on Speech and Audio

Processing 8 (2000), No. 1, pp. 37–50.

[DeCaprio et al. 2007] DeCaprio, David; Vinson, Jade P.; Pearson, Matthew D.;

Montgomery, Philip; Doherty, Matthew; Galagan, James E.: Conrad: Gene

Prediction using Conditional Random Fields. In: GENOME RESEARCH 17 (2007),

No. 9, pp. 1389–1396.

[Finkel et al. 2005] Finkel, Jenny R.; Grenager, Trond; Manning, Christopher:

Incorporating Non-local Information into Information Extraction Systems by Gibbs

Sampling. In: Proceedings of the 43nd Annual Meeting of the Association for Computational Linguistics (ACL), 2005, pp. 363–370.

[Gupta et al. 2007] Gupta, Kapil K.; Nath, Baikunth; Ramamohanarao, Kotagiri:

Conditional Random Fiels for Intrusion Detection. In: AINAW ’07: Proceedings of the

21st International Conference on Advanced Information Networking and Applications

Workshops. Washington, DC, USA: IEEE Computer Society, 2007, pp. 203–208. –

ISBN 0-7695-2847-3.

[Hand and Yu 2001] Hand, David J.; Yu, Keming: Idiot’s Bayes – not so stupid after all? In: International Statistical Review 69 (2001), No. 3, pp. 385–399.

28

References

[He et al. 2004] He, Xuming; Zemel, Richard S.; Carreira-Perpinan, Migual A.:

Multiscale conditional random ﬁelds for image labeling. In: Proceedings of the 2004

IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Vol. 2, 2004, pp. 695–702.

[Jaynes 1957] Jaynes, Edwin T.: Information Theory and Statistical Mechanics. In:

Physical Review 106 (1957), May, No. 4, pp. 620–630.

[Jordan and Weiss 2002] Jordan, Michael I.; Weiss, Yair.: Chap. Graphical Models:

Probabilistic Inference. In: Arbib, Michael A. (Editors.): The Handbook of Brain

Theory and Neural Networks. Cambridge, MA: MIT Press, 2002. – URL http:

//www.cs.berkeley.edu/~jordan/papers/jordan-weiss.ps.

[Kiritchenko and Matwin 2001] Kiritchenko, Svetlana; Matwin, Stan: Email classiﬁcation with co-training. In: Stewart, Darlene A.; Johnson, J. H. (Editors.):

Proceedings of the conference of the Centre for Advances Studies on Collaborative research. Toronto, Ontario, Canada, November 2001, pp. 192–201.

[Klinger et al. 2007a] Klinger, Roman; Friedrich, Christoph M.; Fluck, Juliane;

Hofmann-Apitius, Martin: Named Entity Recognition with Combinations of Conditional Random Fields. In: Proceedings of the Second BioCreative Challenge Evaluation

Workshop. Madrid, Spain, April 2007, pp. 89–91.

[Klinger et al. 2007b] Klinger, Roman; Furlong, Laura I.; Friedrich, Christoph M.;

Mevissen, Heinz T.; Fluck, Juliane; Sanz, Ferran; Hofmann-Apitius, Martin:

Identifying Gene Speciﬁc Variations in Biomedical Text. In: Journal of Bioinformatics and Computational Biology 5 (2007), December, No. 6, pp. 1277–1296.

[Korn and Korn 2000] Korn, Granino A.; Korn, Theresa M.: Mathematical Handbook for Scientists and Engineers: Deﬁnitions, Theorems, and Formulas for Reference and

Review. 2 Revised. New York: Dover Publications Inc., 2000.

[Kschischang et al. 2001] Kschischang, Frank; Frey, Brendan J.; Loeliger, HansAndrea: Factor Graphs and the Sum-Product Algorithm. In: IEEE Transactions on

Information Theory 47 (2001), No. 2, pp. 498–519.

[Laﬀerty et al. 2001] Lafferty, John D.; McCallum, Andrew; Pereira, Fernando

C. N.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling

Sequence Data. In: Proceedings of the Eighteenth International Conference on Machine

Learning (ICML 2001), Morgan Kaufmann Publishers, 2001, pp. 282–289.

[Lau et al. 1993] Lau, Raymond; Rosenfeld, Ronald; Roukos, Salim: Adaptive language modeling using the maximum entropy principle. In: HLT ’93: Proceedings of the workshop on Human Language Technology. Morristown, NJ, USA: Association for Computational Linguistics, 1993, pp. 108–113.

29

References

[Lepar and Shenoy 1999] Lepar, Vasilica; Shenoy, Prakash P.: A Comparison of Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer Architectures for Computing

Marginals of Probability Distributions. In: Cooper, Gregory F.; Moral, Seraﬁn

(Editors.): Uncertainty in Artiﬁcial Intelligence Vol. 14. San Francisco, CA: Morgan

Kaufmann, 1999, pp. 328–337.

[McDonald and Pereira 2005] McDonald, Ryan; Pereira, Fernando: Identifying gene and protein mentions in text using conditional random ﬁelds. In: BMC

Bioinformatics 6 (Suppl 1) (2005), May, No. S6.

[McDonald et al. 2004] McDonald, Ryan T.; Winters, R. S.; Mandel, Mark; Jin,

Yang; White, Peter S.; Pereira, Fernando: An entity tagger for recognizing acquired genomic variations in cancer literature. In: Bioinformatics 20 (2004), pp. 3249–3251.

[Mooij and Kappen 2007] Mooij, Joris M.; Kappen, Hilbert J.: Suﬃcient conditions for convergence of the Sum-Product Algorithm. In: IEEE Transactions on Information

Theory 53 (2007), December, No. 21, pp. 4422–4437. – URL http://arxiv.org/abs/ cs/0504030v2. [Pearl 1988] Pearl, Judea: Probabilistic Reasoning in Intelligent Systems – Networks of Plausible Inference. Revised Second Printing. Morgan Kaufmann Publishers, Inc.,

1988.

[Quattoni et al. 2005] Quattoni, Ariadna; Collins, Michael; Darrell, Trevor:

Conditional Random Fields for Object Recognition. In: Saul, Lawrence K.; Weiss,

Yair; Bottou, Leon (Editors.): Advances in Neural Information Processing Systems

17. Cambridge, MA: MIT Press, 2005, pp. 1097–1104.

[Rabiner 1989] Rabiner, Lawrence R.: A Tutorial on Hidden Markov Models and

Selected Applications in Speech Recognition. In: Proceedings of the IEEE 77 (1989),

No. 2, pp. 257–286.

[Russell and Norvig 2003] Russell, Stuart; Norvig, Peter: Articiﬁal Intelligence –

A Modern Approach. Prentice Hall, 2003.

[Settles 2004] Settles, Burr: Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets. In: Collier, Nigel; Ruch, Patrick;

Nazarenko, Adeline (Editors.): COLING 2004 International Joint workshop on

Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)

2004. Geneva, Switzerland: COLING, 2004, pp. 107–110.

[Sha and Pereira 2003] Sha, Fei; Pereira, Fernando: Shallow parsing with Conditional

Random Fields. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology

(NAACL ’03). Morristown, NJ, USA: Association for Computational Linguistics,

2003, pp. 134–141.

30

References

[Sutton and McCallum 2004] Sutton, Charles; McCallum, Andrew: Collective

Segmentation and Labeling of Distant Entities in Information Extraction / Department of Computer Science, University of Massachusetts. 2004 (TR 04-49). – Technical

Report.

[Sutton and McCallum 2007] Sutton, Charles; McCallum, Andrew: An Introduction to Conditional Random Fields for Relational Learning. In: Getoor, Lise; Taskar,

Benjamin (Editors.): Introduction to Statistical Relational Learning. MIT Press,

November 2007, Chap. 4, pp. 93–127.

[Tomanek et al. 2007] Tomanek, Katrin; Wermter, Joachim; Hahn, Udo: A Reappraisal of Sentence and Token Splitting for Life Science Documents. In: Proceedings of the 12th World Congress on Medical Informatics (MEDINFO 2007). Brisbane,

Australia, August 2007, pp. 524–528.

[Tsai et al. 2006] Tsai, Richard Tzong-Han; Sung, Cheng-Lung; Dai, Hong-Jie;

Hung, Hsieh-Chuan; Sung, Ting-Yi; Hsu, Wen-Lian: NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition. In: BMC Bioinformatics 7(Suppl 5) (2006), No. S11.

[Wallach 2004] Wallach, Hanna M.: Conditional Random Fields: An Introduction.

/ Department of Computer and Information Science, University of Pennsylvania. 2004

(MS-CIS-04-21). – Technical Report.

[Zhang et al. 2007] Zhang, Xinhua; Aberdeen, Douglas; Vishwanathan, S. V. N.:

Conditional random ﬁelds for multi-agent reinforcement learning. In: ICML ’07:

Proceedings of the 24th international conference on Machine learning. New York, NY,

USA: ACM, 2007, pp. 1143–1150. – ISBN 978-1-59593-793-3.

31…...

Premium Essay

...Ethical/legal Dilemmas Indiana Wesleyan University Christine R. Johnson HCM-556 Health Care Issues Instructor: Mrs. Brenda Williams Ethical/Legal Dilemmas CRF First Choice’s mission is for each individual to experience all the rights granted to them as citizens. It is our goal as direct care staff to make sure that each individual experiences the highest quality of life possible, without depriving them of their privacy, rights, and privileges. Direct Care Staff (DCS) are always subject to a plethora of legal, ethical, and professional duties which can be very challenging. These duties are generally considered to be to respect a patient's confidentiality and to recognize the duty of care that is owed to all our individuals. Direct Care Staff duties are to always be professional; however there are legal implications if these duties are breached. We also must consider when it is okay as staff to breach these duties and therefore ethical issues arise. Ethics is a set of moral and practical guidelines that influences nursing decisions big and small. As DCS, one of our main priorities is to advocate for our clients. An advocate is “one who expresses and defends the cause of another.” (Merriam-Webster, 2004) In the nursing profession we advocate to protect our consumer’s rights. A consumer’s rights can vary from being responsible for their own care and making choices and decisions in that care to having the right to a staff-consumer relationship based on trust and respect,......

Words: 635 - Pages: 3

Premium Essay

...that provide a comprehensive range of Data Management services in Bioequivalence and Phase 1 studies. The agenda is to convert raw data to reliable, precise and consistent trial results in compliance with the regulatory services and policies. Furthermore, the clinical development programs focus on accelerating the regulatory submission process and lessen the timeline with the help of excellent service quality and advanced thinking. The multiple sections of Biometrics in CRO’s include the following services: Data Management ● Data Acquisition ● ● ● ● ● Data Reconciliation Discrepancy Management Medical Coding (MeDRA & WHODD) Database Lock Data Extraction for Reporting Database Programming ● ● ● ● ● CRF Design (paper & electronic) CRF Annotation Database Programming Validation Checks Metadata Repository Management Biostatistics ● ● ● ● ● ● Randomization Sample size estimation Trial Design Inputs Statistical Analysis Plan Statistical Analysis Biostatistics Subject Matter Expert Pharmacokinetic & Pharmacodynamic Studies ● ● ● ● ● Sampling Point Estimation Technical Document Review PK/PD Query Resolution PK/PD Subject Matter Expert Pharmacokinetic support across study engagement PK/PD Reporting ● ● ● SDMS Data Extraction PK Data Analysis PK Summary Reporting Statistical Reporting ● ● ● ● ● ● TLF Programming CDISC Data Mapping (SDTM & ADaM) Clinical Data Repository Pooled Data Analysis & Reporting Safety (ADR) Reporting......

Words: 495 - Pages: 2

Premium Essay

...forms (CRFs), both business-critical elements of the drug development process. The challenge GSK’s success is underpinned by the effective time to market of its new medicines. Spiralling research and development costs in the race to bring new drugs to market place primary focus on investing first and foremost in science, minimising business support costs where possible. The solution As GSK’s strategic partner for UK reprographics, Williams Lea rose to this challenge. A 12-month programme of process re-engineering was initiated to significantly improve management of clinical trial documents across the organisation, speeding up the process without compromising document quality and personalisation. With the objective of consolidating and centralising document output, we developed a digital, ‘hub-and-satellite’ model, working on a commercial rate basis. This hub-and-satellite approach enables central production management supported by smaller customer service centres A 12-month programme of process re-engineering was initiated to significantly improve management of clinical trial documents across the organisation For more information, visit www.williamslea.com Case study: GlaxoSmithKline A dedicated project team was established to implement and manage the new solution, which encompassed: Creation of a centre of excellence for clinical trial document production (operating 24 hours a day, five days a week) focussing on the production of all submission documents and CRFs......

Words: 611 - Pages: 3

Premium Essay

...strategy Dirt Bikes' management believes that the company could benefit from e-commerce. The company has sold motorcycles and parts primarily through authorized dealers. Dirt Bikes advertises in various magazines catering to dirt bike enthusiasts and maintains booths at important off-road motorcycle racing events. Web browser software Word processing software Web page development tool (optional) 3. Who are Dirt Bikes major competitors? How do their products compare in prices to those of Dirt Bikes? What are some of the product features they emphasize? Husqvarna : * WR 250 $6,298 * TE 510 $7,898 * TC 250 $6,298 * TC 510 $7,198 | KTM: * SX 250 $6,298 * XC 250 W $7,148 * XC 505 F $8,248 | Honda: * CRF® 250R $6,449 * CRF® 450R $7,199 | Yamaha: * WR 250F $6,399 * WR 450F $7,199 | 4. What are the competitive forces that can affect the industry? Some competitive forces that could affect the industry are: increased internet presence could allow both Dirt Bikes USA and it’s competitors to find cheaper parts and labor more easily, an economic downturn could occur and reduce peoples spending on leisure activities. Since Dirt Bikes USA came into the business with a product that claimed a better quality, they are ideally suited to pursue this as a competitive advantage. They can work on increasing quality of not only their products (the motorcycles) but they could also strive to be the best in customer service, and product innovation. ......

Words: 627 - Pages: 3

Premium Essay

...diversification of people moving in, and the economic advantages globalization has brought. His community has benefited from two large global companies, Honda Motor Corp and Emerson that reside in or near the area bringing employment and a diversification of people as more non-U.S. families are relocating to his community. Doyle’s perspective on globalization and its impact on his business have had a combination of both positive and negative attributes. His company, Thermoseal has been able to benefit from the support and international brand recognition from its European base company. With its exclusive licensee in North America of the Group of independent Klinger companies, Thermoseal can utilize name recognition and reputation. They also have access to research and development, technology and patents obtained by Klinger Group. Doyle has a negative impact of globalization due to increased obstacles from Asian imports of inexpensive, inferior knock offs of Thermoseal’s products. Doyle recognizes that while globalization can bring opportunities and potential market growth for his business, threats and greater competition are also brought to the forefront. Todd works in the wealth management business and globalization has positively impacted his business through an increase client base and from economic/investment standpoint. The mortgage crisis of 2008 amplifies how economies are tied together. According to Todd, it did not matter where the investments were located on the......

Words: 838 - Pages: 4

Premium Essay

...including instructions on Valuation, Customs Coding, and other related instructions that have a bearing on operations. − Be fully conversant with H.S. classification and custom duty and taxes − Identify all weaknesses/problems in the department and take appropriate corrective actions after consultations with the Manager. The Deputy Manager has the following specific responsibilities in the Certificate Control Section of the Operation department. − Ensure that the Certificate Control Section conducts its routine activities according to the scheme attached titled “Certificate Control Section – Working Procedures”. − Provide assistance/suggestions to RCs regarding HS classification in case of error in HS Code in draft CRFs/NNRFs. − If the draft copy of the CRFs/NNRFs contains mistakes, notify the relevant RC so that they can initiate corrective actions accordingly. − Send reminders to the RCs if corrective actions are not taken within 24 hours. − Ensure that e-mail/bank copies of the certificates are delivered to the Dispatch Section with proper endorsement and hologram. − Ensure strict confidentially of business information of importers. Under no circumstances shall information about one importer be revealed to another importer. − Ensure that Certificate Control Section is out of bound to all visitors at all times. − In the absence of the Manager, the Deputy Manager shall ensure that BV holograms are kept under lock and key. 2.3. Import Document Processing......

Words: 9778 - Pages: 40

Free Essay

...Chronic Renal Failure (CRF) start to really differ. ARF can be caused by many disease processes or injuries such as; dehydration, rhabdomyolysis, trauma, renal infection, drug toxicity, and sepsis. In fact research conducted by the staff at the May Clinic place causes of ARF into 3 categories: “Acute kidney failure can occur when on the following happens: you have a condition that slows blood flow to your kidneys, you experience direct damage to your kidneys, or your kidneys' urine drainage tubes (ureters) become blocked and wastes can't leave your body through your urine.” (Mayo Clinic). CRF on the other hand usually occurs over a long period of time and is caused by an underlying comorbidity. The top two leading causes of CRF in the United States are Diabetes and Hypertension. Ethnicity also plays a large role in CRF. The National Kidney Disease Education Program notes that “African Americans are almost four times as likely as Caucasians to develop kidney failure. While African Americans make up about 13 percent of the population, they account for 32 percent of the people with kidney failure in the United States. Diabetes and high blood pressure are the leading causes of kidney failure among African Americans” (NKDEP). CRF also has other causes than what is previously listed such as: recurrent kidney infections, prolonged renal obstructions, polycystic kidney disease, and glomerulonephritis (Mayo Clinic). Of course ARF can also transition into CRF as well if not......

Words: 1921 - Pages: 8

Premium Essay

...| |10/11 |No | |Baseline | |10/12 |No | |Baseline | |10/22 |Yes - verbal |CRF |Told he would get M&M’s | |10/23 |Yes – verbal |CRF |Told he would get M&M’s | |10/24 |Yes – verbal |CRF |Told he would get M&M’s | |10/25 |Yes |CRF | | |10/26 |Yes |CRF | | |10/29 |Yes |CRF | | |10/30 |Yes |CRF | | Brandon is required to take a backpack of drawing materials with him to lunch/recess each day. He does not like to eat lunch, but does like to draw. If he has his backpack with him, he can draw during lunch while everyone else eats. While he is completely able to do this on......

Words: 764 - Pages: 4

Free Essay

...slutning - "Hvem kan sove i Berlin?" 2. FORMEL BESKRIVELSE: Digtets grafiske fremtræden (strofer og vers, brud, korte eller lange verslinier, typografi, opsætning, hvide felter eller ”huller”) - 14 strofer - 4 vers 3. METRIK: RIM OG RYTME a b c b Tom Kristensen anvender trokæ i sit digt --> betonet og ubetonede stavelser --> betonet til sidst = mandlig udgang ubetonet til sidst = kvindelig udgang I digtet er der skiftevis en mandlig og en kvindelig udgang, hvilket skaber en harmonisk og traditionel rytme => harmonisk formside 4. SITUATION a. Hvad drejer det sig om i digtet? Situationen starter først rigtig i 6. strofe: Hør, de braser mod Hotellet og forcerer dets Entre. Langs Terrazzogulvet klinger Flugten af en skræmt Portier. Det fysiske rum: Berlin Indenfor < > udenfor <- realplan hotelværelset heste, kvinder Vindue (strofe 2) (sammenligning - "som et vindue") <- billedplan, mentalt Det er et surrealistisk digt bl.a. fordi der bliver brugt symbolsprog (f.eks. heste - det dyriske) b. Fortælles en begivenhed (med et vendepunkt) eller en situation? – undersøg digtet strofe for strofe. 5. HVEM TALER OG TIL HVEM a. Er der et jeg, der udtrykker sine tanker og forestillinger ? b. Bliver nogen instans (person eller begreb) tiltalt i digtet? c. TID: Angives der et tidsforløb? d. RUM: Er digtet geografisk lokaliseret? Opviser digtet overhovedet rumlighed? Opgaver i timen: I bliver inddelt i......

Words: 646 - Pages: 3

Premium Essay

...When the Fiat Research Center (CRF) changed it's mission, explain how the new mission led to an expansion into transaction costs? 1)In economics and related disciplines, a transaction cost is a cost incurred in making an economic exchange (restated: the cost of participating in a market). For example, most people, when buying or selling a stock, must pay a commission to their stockbroker; that commission is a transaction cost of doing the stock deal. To eliminate the price of transactions cost, companies are headed towards outsourcing and other methods to reduce costs. Throughout the 1990’s and early 2000’s it’s no question that the automobile industry was heading into economic chaos. Fiat is an Italian based manufacturer, who like most other companies, felt the hard times. For most car companies 20-30% of sales fell, and Fiat, having 12% of share in the European market, was hit hard. The CRF, Fiat’s research and development team, was very important to the Fiat Company, and sought to help it rise out of troubling economic times. CRF leader Gian Michellone aimed to radically innovate for the CRF, and lead Fiat as well as others out of economic hardships. In order to initiate is plans for innovation, the CRF greatly changed its mission statement. The new mission stated “Instead of simply selling research, CRF is dedicated to providing competitiveness to its customers as a matter of principal.” Michellone aimed towards “competitiveness for customers at competitive prices,”......

Words: 369 - Pages: 2

Free Essay

...Dent-Micallef, 1997). Third, the two ﬁrms display different strategy conﬁgurations: one focuses on international expansion, whereas the other hopes to extend its product portfolio and distribution channels. Therefore, we can investigate service coproduction and value co-creation phenomena in different strategic contexts. Methodologically, our analysis derives inspiration from the ideal of a continuous dialogue between academics and practitioners. Speciﬁcally, we use an iterative narration approach based on the constant interchange between grounded evidence and theoretical reasoning. Primary information was collected during 10 semi-structured interviews with senior managers and project managers from Binda and Cassa di Risparmio di Firenze (CRF) – the two companies that implemented SOA service systems – and IBM Global Value, and IBM Information Systems – the two units that acted as service providers. Each interview lasted approximately one hour and was tape recorded. Secondary archival data was provided by the organisations and employed to both gain additional information and verify some of the self-reported data. 292 In terms of data analysis, we examine the focal cases according to the following steps: – Determination of the business issues that led to the idea of employing SOA systems. – Discussion of the SOA concept and structure in the ﬁrm. – Analysis of the implementation processes from an organizational point of view. – Observation of......

Words: 5742 - Pages: 23

Free Essay

...1965). Saltcedar has substituted dominant stands of native cottonwood (Populus fremontii) in a number of regions due to which habitat of Neotropical migrant birds has reduced substantially (Anderson et al. 1977) and fluvial processes have altered (Blackburn et al. 1982). Saltcedar produces large number of seeds over a length of time (April through October). Germination percentage of the plant depends on seed collection and varies from 19% to 51.3% (Merkel and Hopkins, 1957). When the seedlings get established they grow faster than other native plants although their recruitment is very slow. Sometimes in absence of seeds, broken stems and branches of plants which are partially buried in sediment can produce roots by sprouting (Hulett and Tomanek, 1961). Waisel (1972) found that cuttings under different concentrations of salts result in a significant drop in the development of aerial parts above 0.3 M NaCl. Plants usually grow in soils with an average salinity of 6000ppm and can survive in sites with salinity from 18,000 to 36,000 ppm (Glenn and Nagler, 2005). Seed germination and seedling establishment of saltcedar is higher in areas where environmental conditions such as salinity and high temperature control soil water potential (Gadallah 1996). Saltcedar is disadvantageous once it gets established because it can spread salt in soil surface and subsurface. Upon establishment of saltcedar, native riparian vegetation can be replaced due to vast changes in channel morphology......

Words: 1495 - Pages: 6

Free Essay

...The Mediating Affect of CRF Within the Central Nucleus of the Amygdala in Rats 1.Summarize the background of the article, and how does it relate to the general article. Alcohol consumption is characterized by excessive consumption which leads to feelings of increased anxiety and other negative emotional states.Consequently, these emotions lead to further consumption.Similar to human alcoholics, ethanol- dependent animals show signs of anxiety like behaviors and self administrate ethanol during periods of withdrawal. CRF, also known as corticotropin releasing factor helps mediate the increased anxiety during withdrawal.Additionally,regions of the amygdala are comprised of the CRF system, although unknown where the specific sites are responsible for the CRF component of excessive drinking. Also, numerous studies have shown that the amygdala is involved in “mediating the behavioral and physiological responses associated with anxiety.” This relates to the general article which focuses on the consumption of drug and alcohol, triggering feelings of anxiety which can be quieted by consuming more of it.Specifically, the amygdala is the part of the brain where the feelings of anxiety are triggered. Moreover, new research is showing alcohol's involvement in the transformation of the chemical architecture of the brain, allowing the brain's stress response to contribute to its dependency. CRF, also known as the corticotrophin releasing factor, is a chemical that is involved in the......

Words: 847 - Pages: 4

Free Essay

...of police officers after a shooting incident vary. If the findings of this research were to be implemented, it would have a positive, personal effect on the police officers involved in shootings (Klinger, 2006). The findings showed that most officers experience physiological, psychological, and emotional reactions just before and as they fired a gun. Their recollections of the event were found to be vague, or in extreme cases, they could not remember the incident at all. Also, exceedingly few officers experienced long-lasting negative effects after a shooting. Their post-shooting reactions were influenced by actions and attitudes of family, friends, colleagues, and investigators. These reactions lessened as activity and attention about the incident diminished (Klinger, 2006). Many police officers lie to counselors about the occurrences of a shooting event. This is mainly because they feel the counselors are attached to the department. This is in contrast with officers willing to share their experiences with fellow colleagues involved in shootings. This implies that peer counseling has a more positive impact to these officers than mandatory, critical incident counseling. This suggests peer counseling has a significant personal impact on individual police officers (Klinger, 2006). The study found that not all police officers experienced negative emotions. In about one-third of the shootings, police officers reported emotions of elation that include residual......

Words: 805 - Pages: 4

Free Essay

...renal failure (CRF) are two of three medical conditions that create a cycle of worsening problems to the patient. I would not doubt that Mr. Hodges is also anemic because of his worsening symptoms; a third of patients who suffer from both are. Frequently, non-steroid anti-inflammatory drugs are given to ease symptoms of CHF, increasing symptoms of CRF (Iaina, 2005). 6. As his chronic renal failure worsens what other symptoms and signs might occur in his respiratory, digestive, nervous, and urinary systems? As he progresses into end stage renal failure, urea will build up in the body creating ammonia which in high concentration causes diarrhea, vomiting, and nausea (Zellman. 2010). Mental confusion will develop as ammonia concentrations in the blood increase, tissues will swell as the body can no longer get rid of excess waste, tingling of limbs and labored breathing can be expected as well. 7. What is causing Mr. Hodges’s kidney disease? Mr. Hodges’s kidney disease is most likely being caused by medications he is taking for congestive heart failure. Medications for congestive heart failure can lower blood pressure (Zellman, 2010) restricting blood flow to the kidneys. Congestive heart failure is also more aggressively treated in most cases than chronic renal failure (Iaina, 2006). 8. What are possible treatment options, and what is the prognosis? Possible treatment options are bed rest (Zellman, 2010) a variety of medications such as diuretics for CHF and CRF,......

Words: 744 - Pages: 3