
Kim Steenstrup Pedersen
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark
kimstp@di.ku.dk

Jan Kremer
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark
jan.kremer@di.ku.dk

Christian Igel
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark
igel@di.ku.dk

Abstract

In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes measuring the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs.

1 Introduction

In many applications of supervised learning in data mining, huge amounts of unlabeled data samples are cheaply available, while obtaining their labels for training a classifier is costly. To minimize labeling costs, we want to request labels only for potentially informative samples. These are usually the ones that we expect to improve the accuracy of the classifier to the greatest extent when used for training. Another consideration is the reduction of training time. Even when all samples are labeled, we may want to consider only a subset of the available data because training the classifier of choice using all the data might be computationally too demanding. Instead of sampling a subset uniformly at random, which is referred to as passive learning, we would like to select informative samples to maximize accuracy with less training data. Active learning denotes the process of autonomously selecting promising data points to learn from. By choosing samples actively, we introduce a selection bias. This violates the assumption underlying most learning algorithms that training and test data are identically distributed, an issue we have to address to avoid detrimental effects on the generalization performance.

In theory, active learning is possible with any classiﬁer that is capable of passive learning. This review focuses on the support vector machine (SVM) classiﬁer. It is a state-of-the-art method, which has proven to give highly accurate results in the passive learning scenario and which has some favorable properties that make it especially suitable for active learning: (i) SVMs learn a linear decision boundary, typically in a kernel-induced feature space. Measuring the distance of a sample to this boundary is straightforward and provides an estimate of its informativeness. (ii)

Efﬁcient online learning algorithms make it possible to obtain a sufﬁciently accurate approximation of the optimal SVM solution without retraining on the whole dataset. (iii) The SVM can weight the inﬂuence of single samples in a simple manner. This allows for compensating the selection bias that active learning introduces.

2 Active Learning

In the following we focus on supervised learning for classiﬁcation. There also exists a body of work on active learning with SVMs in other settings such as regression [15] and ranking [8, 48]. A discussion of these settings is, however, beyond the scope of this article.

The training set is given by L = {(x_1, y_1), ..., (x_ℓ, y_ℓ)} ⊂ X × Y. It consists of ℓ labeled samples that are drawn independently from an unknown distribution D. This distribution is defined over

X × Y, the cross product of a feature space X and a label space Y, with Y = {−1, 1} in the binary case. We try to infer a hypothesis f : X → Z mapping inputs to a prediction space Z for predicting the labels of samples drawn from D. To measure the quality of our prediction, we deﬁne a loss function L : Z × X × Y → R+ . Thus, our learning goal is minimizing the expected loss

R(f) = E_(x,y)∼D [ L(f(x), x, y) ] ,    (1)

which is called the risk of f . We call the average loss over a ﬁnite sample L the training error or empirical risk. If a loss function does not depend on the second argument, we simply omit it.

In sampling-based active learning, there are two scenarios: stream-based and pool-based. In stream-based active learning, we analyze incoming unlabeled samples sequentially, one sample at a time.

In contrast, in pool-based active learning we have access to a pool of unlabeled samples at once. In this case, we can rank samples based on a selection criterion and query the most informative ones.

Although some of the methods in this review are also applicable to stream-based learning, most of them consider the pool-based scenario. In the case of pool-based active learning, we have, in addition to the labeled set L, access to a set of m unlabeled samples U = {x_(ℓ+1), ..., x_(ℓ+m)}. We assume that there exists a way to provide us with a label for any sample from this set (the probability of the label is given by D conditioned on the sample). This may involve labeling costs, and the number of queries we are allowed to make may be restricted by a budget. After labeling a sample, we simply add it to our training set.
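The pool-based protocol just described can be sketched as a generic query loop. This is a minimal sketch, not a method from the text: the classifier, query strategy, and labeling oracle are placeholder callables, and all names are hypothetical.

```python
import numpy as np

def pool_based_active_learning(X_pool, oracle, fit, query_score, budget, seed=0):
    """Generic pool-based active learning loop.

    X_pool      : (m, d) array of unlabeled samples U
    oracle      : callable x -> label, stands in for the costly labeling step
    fit         : callable (X, y) -> trained model
    query_score : callable (model, X) -> informativeness per sample (higher = better)
    budget      : number of label queries we may make
    """
    rng = np.random.default_rng(seed)
    labeled_idx, X_train, y_train, model = [], [], [], None
    for _ in range(budget):
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled_idx]
        if model is None:
            i = int(rng.choice(unlabeled))         # no model yet: random seed query
        else:
            scores = query_score(model, X_pool[unlabeled])
            i = unlabeled[int(np.argmax(scores))]  # query the most informative sample
        labeled_idx.append(i)                      # "ask the oracle" for the label and
        X_train.append(X_pool[i])                  # add the sample to the training set
        y_train.append(oracle(X_pool[i]))
        model = fit(np.array(X_train), np.array(y_train))
    return model, labeled_idx
```

The concrete query strategies discussed in the remainder of the article differ only in the `query_score` they plug into such a loop.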

In general, we aim at achieving a minimum risk by requesting as few labels as possible. We can estimate this risk by computing the average error over an independent test set not used in the training process. Ultimately, we hope to require fewer labeled samples for inferring a hypothesis that performs as well as one generated by passive learning on L and a completely labeled U.

In practice, one can profit from an active learner if only a few labeled samples are available and labeling is costly, or when learning has to be restricted to a subset of the available data to render the computation feasible. A list of real-world applications is given in the general active learning survey [39] and in a review paper which considers active learning for natural language processing [29].

Active learning can also be employed in the context of transfer learning [41]. In this setting, samples from the unlabeled target domain are selected for labeling and included in the source domain.

A classiﬁer trained on the augmented source dataset can then exploit the additional samples to increase its accuracy in the target domain. This technique has been used successfully, for example, in an astronomy application [34] to address a sample selection bias, which causes source and target probability distributions to mismatch [33].


3 Support Vector Machine

Support vector machines (SVMs) are state-of-the-art classiﬁers [6, 12, 26, 36, 38, 40]. They have proven to provide well-generalizing solutions in practice and are well understood theoretically [42].

The kernel trick [38] allows for an easy handling of diverse data representations (e.g., biological sequences or multimodal data). Support vector machines perform linear discrimination in a kernel-induced feature space and are based on the idea of large margin separation: they try to maximize the distance between the decision boundary and the correctly classified points closest to this boundary.

In the following, we formalize SVMs to fix our notation; for a detailed introduction we refer to the recent WIREs articles [26, 36].

An SVM for binary classification labels an input x according to the sign of a decision function of the form

f(x) = ⟨w, φ(x)⟩ + b = Σ_{i=1}^ℓ α_i y_i κ(x_i, x) + b ,    (2)

where κ is a positive semi-definite kernel function [1] and φ : X → F is a mapping to a kernel-induced Hilbert space F such that κ(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩. We call F the feature space, which includes the weight vector w = Σ_{i=1}^ℓ α_i y_i φ(x_i) ∈ F. The training patterns x_i with α_i > 0 are called support vectors. The decision boundary is linear in F and the offset from the origin is controlled by b ∈ R.

The distance of a pattern (x, y) from the decision boundary is given by |f(x)|/‖w‖. We call yf(x) the functional margin and yf(x)/‖w‖ the geometric margin: a positive margin implies correct classification. Let us assume that the training data L is linearly separable in F. Then m(L, f) = min_{(x,y)∈L} yf(x) defines the margin of the whole data set L with respect to f (in the following we do not indicate the dependency on L and f if it is clear from the context). We call the feature space region {x ∈ X | |f(x)| ≤ 1} the margin band [9].

A hard margin SVM computes the linear hypothesis that separates the data and yields a maximum margin by solving

max_{w,b,γ} γ    (3)
subject to y_i(⟨w, φ(x_i)⟩ + b) ≥ γ , i = 1, ..., ℓ ,
‖w‖ = 1 ,

with w ∈ F, b ∈ R and γ ∈ R [40]. Instead of maximizing γ and keeping the norm of w fixed to one, one can equivalently minimize ‖w‖ and fix a target margin, typically γ = 1.

In general, we cannot or do not want to separate the full training data correctly. Soft margin SVMs mitigate the concept of large margin separation. They are best understood as the solutions of the regularized risk minimization problem

min_{w,b} (1/2)‖w‖² + Σ_{i=1}^ℓ C_i L_hinge(⟨w, φ(x_i)⟩ + b, y_i) .    (4)

Here, L_hinge(f(x_i), y_i) = max(0, 1 − y_i f(x_i)) denotes the hinge loss. An optimal solution w* = Σ_{i=1}^ℓ α_i* y_i φ(x_i) has the property that 0 ≤ α_i* ≤ C_i for i = 1, ..., ℓ. For soft margin SVMs, the patterns in L need not be linearly separable in F. If they are, increasing the C_i until an optimal solution satisfies α_i* < C_i for all i leads to the same hypothesis as training a hard margin SVM. Usually, all samples are given the same weight C_i = C.

4 Uncertainty Sampling

It seems intuitive to query labels for samples that cannot easily be classified using our current classifier. Consider the contrary: if we are very certain about the class of a sample, then we might regard any label that does not reflect our expectation as noise. On the other hand, observing the expected label would not make us modify our current hypothesis.


Uncertainty sampling was introduced by Lewis and Gale [23]. The idea is that the samples the learner is most uncertain about provide the greatest insight into the underlying data distribution.

Figure 1 shows an example in the case of an SVM. Among the three different unlabeled candidates, our intuition suggests asking for the label of the sample closest to the decision boundary: the labels of the other candidates seem to clearly match the class of the samples on the respective side, or they are otherwise simply mislabeled. In the following, we want to show how this intuitive choice can be justified and how it leads to a number of active learning algorithms that make use of the special properties of an SVM.

Figure 1: The three rectangles depict unlabeled samples while the blue circles and orange triangles represent positively and negatively labeled samples, respectively. Intuitively, the label of the sample xa might tell us the most about the underlying distribution of labeled samples, since in the feature space, φ(xa ) is closer to the decision boundary than φ(xb ) or φ(xc ).

4.1 Version Space

The version space is a construct that helps to keep track of all hypotheses that are able to perfectly classify our current observations [27]. Thus, for the moment, we assume that our data are linearly separable in the feature space. The idea is to speed up learning by selecting samples in a way that minimizes the version space rapidly with each labeling.

We can express the hard margin SVM classifier (3) in terms of a geometric representation of the version space. For this purpose, we restrict our consideration to hypotheses f(x) = ⟨w, φ(x)⟩ without bias (i.e., b = 0). The following results, however, can be extended to SVMs with b ≠ 0.

The version space V(L) refers to the subset of F that includes all hypotheses consistent with the training set L [27]:

V(L) := { w ∈ F | ‖w‖ = 1, yf(x) > 0 ∀(x, y) ∈ L }    (5)

In this representation, we can interpret the hypothesis space as the unit hypersphere given by ‖w‖ = 1. The surface of the hypersphere includes all possible hypotheses classifying samples that are mapped into the feature space F. We define Λ(V) as the area the version space occupies on the surface of this hypersphere. This is depicted in Figure 2: the hypothesis space is represented by the big sphere, and the white front, which is cut out by the two hyperplanes, depicts the version space.

Each sample x can be interpreted as defining a hyperplane through the origin of F with the normal vector φ(x). Each hyperplane divides the feature space into two half-spaces. Depending on the label y of the sample x, the version space is restricted to the part of the hypersphere surface that lies on the respective side of the hyperplane. For example, a sample x that is labeled with y = +1 restricts the version space to all w on the unit hypersphere for which ⟨w, φ(x)⟩ > 0, i.e., the ones that lie on the positive side of the hyperplane defined by the normal vector φ(x). Thus, the version space is defined by the intersection of all half-spaces and the surface of the hypothesis hypersphere.

Figure 2a illustrates this geometric relationship.


(a) The sphere depicts the hypothesis space. The two hyperplanes are induced by two labeled samples. The version space is the part of the sphere surface (in white) that is on one side of each hyperplane. The respective side is deﬁned by its label.

(b) The center (in black) of the orange sphere depicts the SVM solution within the version space.

It has the maximum distance to the hyperplanes delimiting the version space. The normals of these hyperplanes, which are touched by the orange sphere, correspond to the support vectors.

Figure 2: Geometric representation of the version space in 3D following Tong [43].

If we consider a feature mapping with the property ∀x, z ∈ X : ‖φ(x)‖ = ‖φ(z)‖, as for normalized kernels, including the frequently used Gaussian kernel, then the SVM solution (3) has a particularly nice geometric interpretation in the version space. Under this condition, the decision function f(x) = ⟨w, φ(x)⟩ maximizing the margin m(L, f) (i.e., the minimum y⟨w, φ(x)⟩ over L) also maximizes the minimum distance between w and any hyperplane defined by a normal φ(x_i), i = 1, ..., ℓ. The solution is a point within the version space, which is the center of a hypersphere, depicted in orange in Figure 2b. This hypersphere has the maximum radius possible without intersecting the hyperplanes that delimit the version space. The radius is given by r = y⟨w, φ(x)⟩/‖φ(x)‖, where φ(x) is any support vector. Changing our perspective, we can interpret the normals φ(x_i) of the hyperplanes touching the hypersphere as points in the feature space F. Then, these are exactly the support vectors, since they have the minimum distance to our decision boundary defined by w.

This distance is m(L, f ), which turns out to be proportional to the radius r.

4.2 Implicit Version Space

An explicit version space, as defined above, only exists if the data are separable, which is often not the case in practice. The Bayes optimal solution need not have vanishing training error (as soon as under D we have p(y_1|x) ≥ p(y_2|x) > 0 for some x ∈ X and y_1, y_2 ∈ Y with y_1 ≠ y_2). Thus, minimizing the version space might exclude hypotheses with non-zero training error that are in fact optimal. In agnostic active learning [2], we do not make the assumption of an existing optimal zero-error decision boundary. An algorithm that is theoretically capable of active learning in an agnostic setting is the A² algorithm [2]. Here, a hypothesis cannot be deleted due to its disagreement with a single sample. If, however, all hypotheses that are part of the current version space agree on a region within the feature space, this region can be discarded. For each hypothesis the algorithm keeps an upper and a lower bound on its training error (see the work by Balcan et al. [2] for details).

It subsequently excludes all hypotheses whose lower bound is higher than the global minimal upper bound. Despite being intractable in practice, this algorithm forms the basis of some important algorithms that compensate for the selection bias, as discussed below.

4.3 Uncertainty-Based Active Learning

(a) Simple Margin queries the sample that induces a hyperplane lying closest to the SVM solution. In this case, it would query sample xa.

(b) Here the SVM does not provide a good approximation of the version space area. Simple Margin would query sample xa, while xc might have been a more suitable choice.

Figure 3: The version space area is shown in white, the solid lines depict the hyperplanes induced by the support vectors, and the center of the orange circle is the weight vector w of the current SVM. The dotted lines show the hyperplanes that are induced by unlabeled samples. This visualization is inspired by Tong [43].

Although the version space is restricted to separable problems, it motivates many general active selection strategies. Which samples should we query to reduce the version space? As we have seen previously, each labeled sample that becomes a support vector restricts the version space to one side of the hyperplane it induces in F. If we do not know the correct label of a sample in advance, we should always query the sample that ideally halves the version space. This is a safe choice, as we will reduce it regardless of the label. Computing the version space in a high-dimensional feature space is usually intractable, but we can approximate it efficiently using the SVM. In the version space, the SVM solution w is the center of the hypersphere touching the hyperplanes induced by the support vectors. Each hyperplane delimits the version space. Assuming that the center of this hypersphere is close to the center of the version space, we can use it as an approximation. If we now choose a hyperplane that is close to this center, we approximately bisect the version space. Therefore, we want to query the sample x̂ that induces a hyperplane as close to w as possible:

x̂ = argmin_{x∈U} |⟨w, φ(x)⟩| = argmin_{x∈U} |f(x)|    (6)

This strategy queries the sample closest to the current decision boundary and is called Simple Margin [43]. Figure 3 shows this principle geometrically, projected to two dimensions.
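A minimal sketch of the Simple Margin criterion, assuming scikit-learn's SVC for the SVM (an assumption, not part of the original text; any SVM exposing its decision function values would do):

```python
import numpy as np
from sklearn.svm import SVC

def simple_margin_query(clf, X_unlabeled):
    """Simple Margin: return the index of the pool sample closest to the
    decision boundary, i.e. the argmin over the pool of |f(x)|."""
    return int(np.argmin(np.abs(clf.decision_function(X_unlabeled))))

# toy demo: a linear SVM and a pool with one candidate near the boundary
clf = SVC(kernel="linear", C=10.0).fit(np.array([[-2.0, 0.0], [2.0, 0.0]]),
                                       np.array([-1, 1]))
X_pool = np.array([[0.1, 0.5],    # close to the boundary, most uncertain
                   [3.0, 0.0],    # clearly positive
                   [-3.0, 0.0]])  # clearly negative
query = simple_margin_query(clf, X_pool)  # index of the most uncertain sample
```

Note that only |f(x)| is needed, so the criterion costs one kernel expansion per pool sample and no retraining.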

By querying the samples closest to the separating hyperplane, we try to minimize the version space by requesting as few labels as possible. However, depending on the actual shape of the version space, the SVM solution may not provide a good approximation and another query strategy would have achieved a greater reduction of the version space area. This is illustrated in Figure 3b. The strategy of myopically querying the samples with the smallest margin may even perform worse than a passive learner.

Therefore, we can choose a different heuristic to approximate the version space more accurately [37, 45]. For instance, we could train two SVMs for each sample: one for the case that we label it positively and one assuming a negative label. We can then compute the margins m⁺ and m⁻ of the two resulting classifiers and query the sample that attains the maximum value of min(m⁺, m⁻). This quantity will be very small if the corresponding version spaces are very different; thus, we take the maximum to obtain an approximately equal split. This strategy is called MaxMin Margin [43] and allows us to make a better choice in the case of an irregular version space area, as we can see in Figure 4. This, however, comes with the additional cost of computing the margin for each potential labeling.

Uncertainty sampling can also be motivated by trying to minimize the training error directly [9]. Depending on the assumptions made, we can arrive at different strategies. When the classifier has just been trained on few labeled data, we assume the prospective labels of the yet unlabeled data to be uncorrelated with the predicted labels. Therefore, we want to select the sample for which we can expect the largest error, namely

x̂ = argmax_{x∈U} (1/2) [ max(0, 1 − f(x)) + max(0, 1 + f(x)) ] ,    (7)

Figure 4: In this case, the MaxMin Margin strategy would query sample xc. Each of the two orange circles corresponds to an SVM trained with a positive or a negative labeling of xc [43].

where we assume the hinge loss and f(x) as defined in (2). Assuming a separable dataset, we are only interested in uncertain samples, i.e., those within the margin band. Under these constraints, any choice of x leads to the same value of the objective (7): we select the sample at random in this case. However, as soon as some labeled samples are available for SVM training, the prediction of the SVM for an unlabeled point x is expected to be positively correlated with the label of x. Thus, we assume correct labeling and look, for each sample, at the minimum error we obtain regardless of the labeling. We want to find the sample that maximizes this quantity, i.e.,

x̂ = argmax_{x∈U} min{ max(0, 1 − f(x)), max(0, 1 + f(x)) } = argmin_{x∈U} |f(x)| ,    (8)

which gives us the same selection criterion as in (6), the Simple Margin strategy. If all unlabeled samples meet the target margin (i.e., their hinge loss is zero) and we assume that the SVM labels them correctly (i.e., |f(x)| ≥ 1), it seems that we have arrived at a hypothesis that already generalizes well. Picking samples either near or far away from the boundary appears to be non-optimal. Therefore, we simply proceed by choosing a random sample from the unlabeled data.

In practice, we may start by training on a random subset and perform uncertainty sampling until the user stops the process or until all unlabeled samples meet the target margin. In this case, we query another random subset as a validation set and estimate the error. We may repeat the last step until we reach a satisfactory solution.

4.4 Expected Model Change

Instead of trying to minimize an explicit or implicit version space, we can ﬁnd the most informative sample by selecting it based on its expected effect on the current model, the expected model change.

In gradient-based learning, this means selecting the sample x ∈ U that, if labeled, would maximize the expected gradient length, where we take the expectation E_y over all possible labels y.

Non-linear SVMs are usually trained by solving the underlying quadratic optimization problem in its Wolfe dual representation [12, 38]. Let us assume SVMs without bias. If we add a new sample (x_(ℓ+1), y_(ℓ+1)) to the current SVM solution and initialize its coefficient with α_(ℓ+1) = 0, the partial derivative of the dual problem W(α) to be maximized with respect to α_(ℓ+1) is

g_(ℓ+1) = ∂W(α)/∂α_(ℓ+1) = 1 − y_(ℓ+1) Σ_{i=1}^ℓ α_i y_i κ(x_i, x_(ℓ+1)) = 1 − y_(ℓ+1) f(x_(ℓ+1)) .    (9)

As α_i is constrained to be non-negative, we only change the model if the partial derivative is positive, that is, if y_(ℓ+1) f(x_(ℓ+1)) < 1. Note that y_(ℓ+1) f(x_(ℓ+1)) > 1 implies that (x_(ℓ+1), y_(ℓ+1)) is already correctly classified by the current model and meets the target margin.
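The gradient in (9) is cheap to evaluate, since it only requires the current decision function value at the candidate. A small numeric sketch (the model state below, inputs, labels, and dual coefficients, is entirely hypothetical):

```python
import numpy as np

# hypothetical current model state: inputs, labels, dual coefficients
X = np.array([[0.0], [1.5]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.8, 0.8])

def gauss(a, b, gamma=0.5):
    """Gaussian kernel (a normalized kernel)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def f(x):
    """Decision function (2) without bias."""
    return sum(a_i * y_i * gauss(x_i, x) for a_i, y_i, x_i in zip(alpha, y, X))

x_new = np.array([1.0])
g_pos = 1 - (+1) * f(x_new)   # dual gradient (9) if the new label were +1
g_neg = 1 - (-1) * f(x_new)   # ... and if it were -1
# both gradients are positive here, so either label would change the model;
# note that g_pos + g_neg = 2 always, because the f(x) terms cancel
```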


Let us assume that our current model classifies any sample perfectly and that the dataset is linearly separable in the feature space:

p(y|x) = 1 if yf(x) > 0, and 0 otherwise .    (10)

If we just consider the partial derivative g_(ℓ+1) in the expected model change selection criterion, we arrive at selecting

x̂ = argmax_{x∈U} [ p(y = 1|x) |1 − f(x)| + p(y = −1|x) |1 + f(x)| ]
  = argmax_{x∈U} [ p(y = 1|x) (1 − f(x)) + p(y = −1|x) (1 + f(x)) ]
  = argmax_{x∈U} { 1 − f(x) if f(x) > 0 ; 1 + f(x) if f(x) < 0 }
  = argmin_{x∈U} |f(x)| .    (11)

Thus, uncertainty sampling can also be motivated by maximizing the expected model change. Next, we want to have a look at approaches that try to exploit the uncertainty of samples and simultaneously explore undiscovered regions of the feature space.

5 Combining Informativeness and Representativeness

By performing mere uncertainty sampling, we may pay too much attention to certain regions of the feature space and neglect other regions that are more representative of the underlying distribution.

This leads to a sub-optimal classifier. To counteract this effect, we could sample close to the decision boundary, but also systematically include samples that are farther away [18, 20].

In Figure 5, we see an example where uncertainty sampling can mislead the classifier. Although φ(xa) is closest to the separating hyperplane, it is also far away from all other samples in feature space and thus may not be representative of the underlying distribution. To avoid querying outliers, one would like to select samples not only based on their informativeness, but also based on their representativeness [13]. In our example, selecting sample xb would be a better choice, because it is located in a more densely populated region, where a correct classification is more important for obtaining an accurate model.

Figure 5: The white rectangles depict unlabeled samples. The blue circle and the orange triangle are labeled as positive and negative, respectively. In feature space, φ(xa) lies closer to the separating hyperplane than φ(xb), but is located in a region which is not densely populated. Using pure uncertainty sampling, e.g., Simple Margin, we would query the label of sample xa.

Informativeness is a measure of how much querying a sample would reduce the uncertainty of our model. As we have seen, uncertainty sampling is a viable method to exploit informativeness. Representativeness measures how well a sample represents the underlying distribution of unlabeled data [39]. By using a selection criterion that maximizes both measures, we try to improve our models with fewer samples than a passive learner while carefully avoiding a model that is too biased. In Figure 6 we can see a comparison of the different strategies using a toy example where we sequentially query six samples. Figure 6a shows a biased classifier as the result of uncertainty sampling.

While the solution in Figure 6b is closer to the optimal hyperplane, it also converges more slowly, as it additionally explores regions where we are relatively certain about the labels. Combining both strategies, as shown in Figure 6c, yields a decision boundary that is close to the optimal one (Figure 6d) with fewer labels.

(a) Uncertainty sampling.

(b) Selecting representative samples.

(c) Combining informativeness and representativeness.

(d) Optimal hyperplane, obtained by training on the whole dataset.

Figure 6: Binary classiﬁcation with active learning on six samples and passive learning on the full dataset.

5.1 Semi-Supervised Active Learning

To avoid oversampling unrepresentative outliers, we can combine uncertainty sampling and clustering [47]. First, we train an SVM on the labeled samples and then apply k-means clustering to all unlabeled samples within the margin band to identify k groups. Finally, we query the k medoids.

As only samples within the margin band are considered, they are all subject to high uncertainty.
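This clustering step can be sketched as follows. The sketch assumes scikit-learn, and it approximates each cluster medoid by the pool sample closest to the k-means centroid (an assumption for brevity; a true medoid computation would minimize the summed in-cluster distances):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

def cluster_margin_query(clf, X_pool, k):
    """Query k representatives: cluster the unlabeled samples inside the
    margin band (|f(x)| <= 1) and pick, per cluster, the pool sample
    closest to the centroid (a stand-in for the medoid)."""
    f_vals = clf.decision_function(X_pool)
    band = np.where(np.abs(f_vals) <= 1.0)[0]   # indices in the margin band
    if len(band) <= k:                          # too few uncertain samples
        return band.tolist()
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_pool[band])
    queries = []
    for center in km.cluster_centers_:
        d = np.linalg.norm(X_pool[band] - center, axis=1)
        queries.append(int(band[np.argmin(d)]))
    return queries

# toy demo: two uncertain groups inside the band, one confident sample outside
clf = SVC(kernel="linear", C=10.0).fit(np.array([[-3.0, 0.0], [3.0, 0.0]]),
                                       np.array([-1, 1]))
X_pool = np.array([[-1.0, 2.0], [-1.2, 2.1], [1.0, -2.0], [1.1, -2.2], [5.0, 0.0]])
queries = cluster_margin_query(clf, X_pool, k=2)  # one query per uncertain group
```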

The clustering ensures that the informativeness of our selection is increased by avoiding redundant samples. However, when using clustering, one has to decide what constitutes a cluster [14]. Depending on the scale and the selected number of clusters, different choices could be equally plausible. To avoid this dilemma, we can choose another strategy to incorporate density information [22]. We build on the min-max formulation [21] of active learning and request the sample

x̂ = argmin_{x_s∈U} max_{y_s∈{−1,+1}} min_{w,b} (1/2)‖w‖² + C Σ_{(x,y)∈L_s} L(f(x), x, y) ,    (12)

where L_s = L ∪ {(x_s, y_s)}. We take the minimum regularized expected risk when including the sample x_s ∈ U with the label y_s that yields the maximum error. Selecting the sample x̂ minimizing this quantity can be approximated by uncertainty sampling (e.g., using Simple Margin).

In this formulation, however, we base our decision only on the labeled samples and do not take into account the distribution of unlabeled samples. Assuming we knew the labels for each sample in U, we define the set L_su containing the labeled samples (x, y) for x ∈ U and the training set L. We select the sample

x̂ = argmin_{x_s∈U} min_{y_u∈{±1}^(n_u−1)} max_{y_s∈{−1,+1}} min_{w,b} (1/2)‖w‖² + C Σ_{(x,y)∈L_su} L(f(x), x, y) ,    (13)

where n_u = |U| and y_u is the label vector assigned to the samples x ∈ U without the label for x_s. Thus, we also maximize the representativeness of the selected sample by incorporating the possible labelings of all unlabeled samples. By using a quadratic loss function and relaxing y_u to continuous values, we can approximate the solution through the minimization of a convex problem [22].

Both clustering and label estimation are of high computational complexity. A simpler algorithm, which the authors call Hinted SVM, considers unlabeled samples without resorting to these techniques [24]. Instead, the unlabeled samples are taken as so-called hints that inform the algorithm of feature space regions which it should be less conﬁdent in. To achieve this, we try to simultaneously ﬁnd a decision boundary that produces a low training error on the labeled samples while being close to the unlabeled samples, the hints. This can be viewed as semi-supervised learning [10]. It is, however, in contrast to typical semi-supervised SVM approaches that push the decision boundary away from the pool of unlabeled samples.

The performance of this algorithm depends on the hint selection strategy. Using all unlabeled samples might be too costly for large datasets while uniform sampling of the unlabeled pool does not consider the information provided by labeled samples. Therefore, we can start with the pool of all unlabeled samples and iteratively drop instances that are close to already labeled ones. When the ratio of hints to all samples is below a certain threshold, we can switch to uncertainty sampling.

Another problem with uncertainty sampling is that it assumes our current hypothesis to be very certain about regions far from the decision boundary. If this assumption is violated, we will end up with a classifier worse than the one obtained using passive learning. One way to address this issue is to measure the uncertainty of our current hypothesis and to adjust our query strategy accordingly [28].

To achieve this, we compute a heuristic measure expressing the confidence that the current set of support vectors will not change if we train on more data. It is calculated as

c = 2/(|L_SV| · k) Σ_{(x,y)∈L_SV} min(k_x⁺, k_x⁻) ,    (14)

where L_SV is the set of support vectors and k_x⁺ and k_x⁻ are the numbers of positively and negatively labeled samples among the k nearest neighbors of (x, y) ∈ L_SV. In the extremes, we get c = 1 if k_x⁺ = k_x⁻ and c = 0 if k_x⁺ = 0 ∨ k_x⁻ = 0 for all (x, y) ∈ L_SV. We can use this measure to decide whether a labeled data point (x, y) should be kept for training by adding it to the training data set with probability

p(x) = c if yf(x) ≤ 1, and 1 − c otherwise .    (15)

This means that more samples are queried within the margin if we are very confident that the current hyperplane represents the optimal one. In the following, we discuss a related idea, which not only queries samples with a certain probability, but also subsequently incorporates this probability to weight the impact of the training set samples.

5.2 Importance-Weighted Active Learning

When we query samples actively instead of selecting them uniformly at random, the training and test samples are not independent and identically distributed (i.i.d.). Thus, the training set will have a sample selection bias. As most classiﬁers rely on the i.i.d. assumption, this can severely impair the prediction performance.

Assume that we sample the training data points from the biased sample distribution D̃ over X × Y, while our goal is minimizing the risk (1) with respect to the true distribution D. If we know the relationship between D̃ and D, we can still arrive at an unbiased hypothesis by re-weighting the loss for each sample [11, 49]. We introduce the weighted loss L_w(z, x, y) = w(x, y) · L(z, x, y) and define the weighting function

w(x, y) = p_D(x, y) / p_D̃(x, y)   (16)

reflecting how likely it is to observe (x, y) under D compared to D̃, under the assumption that the support of D is included in the support of D̃. This leads us to the basic result

E_{(x,y)∼D̃}[L_w(f(x), x, y)] = ∫_{X×Y} (p_D(x, y) / p_D̃(x, y)) · p_D̃(x, y) · L(f(x), y) d(x, y)
                              = ∫_{X×Y} p_D(x, y) · L(f(x), y) d(x, y) = E_{(x,y)∼D}[L(f(x), x, y)] .   (17)

Thus, by choosing appropriate weights, we can modify our loss function such that we can compute an unbiased estimator of the generalization error R(f). This technique for addressing the sample selection bias is called importance weighting [3, 4].
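The unbiasedness expressed by Eq. (17) can be checked numerically with a small Monte Carlo sketch; the Gaussian distributions and the quadratic stand-in for the loss are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# True distribution D = N(0, 1); biased sampling distribution D~ = N(1, 1).
# The support of D is contained in the support of D~, as Eq. (16) requires.
def pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def loss(x):
    return x ** 2                           # stand-in for L(f(x), y)

x_biased = rng.normal(1.0, 1.0, 200_000)    # samples drawn from D~
w = pdf(x_biased, 0.0) / pdf(x_biased, 1.0) # importance weights, Eq. (16)

est_weighted = np.mean(w * loss(x_biased))  # left-hand side of Eq. (17)
true_risk = 1.0                             # E[x^2] under D = N(0, 1)
# est_weighted is close to true_risk, unlike the unweighted mean (about 2.0)
```

The unweighted mean over the biased sample estimates the risk under D̃ instead, which illustrates why the correction matters.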

We define a weighted sample set L_w as the training set L augmented with non-negative weights w_1, ..., w_ℓ, one for each point in L. These weights are used to set w(x_i, y_i) = w_i when computing the weighted loss. For the soft margin SVM, minimizing the weighted loss can easily be achieved by multiplying each regularization parameter C_i in (4) with the corresponding weight, i.e., C_i = w_i · C [49]. While the weighting gives an unbiased estimator, it may be difficult to estimate the weights reliably, and the variance of the estimator may be very high. Controlling the variance is a crucial problem when using importance weighting.

The original importance-weighted active learning formulation works in a stream-based scenario and inspects one sample x_t at each step t ≥ 1 [3]. Iteration t of the algorithm works as follows:

1. Receive the unlabeled sample xt .

2. Choose pt ∈ [0, 1] based on all information available in this round.

3. With probability pt , query the label yt for xt , add (xt , yt ) to the weighted training set with weight wt = 1/pt , and retrain the classiﬁer.

In step 2 the query probability pt has to be chosen based on earlier observations: this could be, for instance, the probability that two hypotheses disagree on the received sample xt .
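The three steps above can be sketched as a stream loop; the uniform query probability in the toy run is a placeholder for a disagreement-based choice of p_t, and all names are our own:

```python
import numpy as np

rng = np.random.default_rng(1)

def iwal_stream(stream, query_prob, oracle):
    """Sketch of importance-weighted active learning on a stream.

    `query_prob(x, history)` returns p_t in (0, 1]; queried points get
    weight 1/p_t so that the weighted training set stays unbiased.
    """
    weighted_set = []                       # list of (x, y, weight)
    for x in stream:
        p = query_prob(x, weighted_set)
        if rng.random() < p:                # query the label w.p. p_t
            y = oracle(x)
            weighted_set.append((x, y, 1.0 / p))
            # ... retrain the classifier on `weighted_set` here ...
    return weighted_set

# Toy run: constant query probability 0.5, labels given by the sign of x.
data = rng.normal(size=50)
out = iwal_stream(data, lambda x, h: 0.5, lambda x: 1 if x > 0 else -1)
```

In a real implementation the retraining step would use the weighted soft margin SVM with C_i = w_i · C described above.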

This algorithm can also be adapted to the pool-based scenario [16]. In this case, we can simply deﬁne a probability distribution over all unlabeled samples in the pool. We set the probability for each point in proportion to its uncertainty, i.e., its distance to the decision boundary. This works well if we assume a noise-free setting. Otherwise, this method suffers from the same problems as other approaches that are based on a version space. Given a mislabeled sample (i.e., the label has a very low probability given the features), the active learner can be distracted and focus on regions within the hypothesis space which do not include the optimal decision boundary.

One way to circumvent these problems is to combine importance weighting with ideas from agnostic active learning [50]. We keep an ensemble of SVMs H = {f_1, ..., f_K} and train each on a bootstrap sample subset from L, which may be initialized with a random subset of the unlabeled pool U for which labels are requested. After the initialization, we choose points x ∈ U with selection probability

p_t(x) = p_threshold + (1 − p_threshold)(p_max(x) − p_min(x)) ,   (18)

where p_threshold > 0 is a small minimum probability ensuring p_t(x) > 0. Using Platt's method [31], we define p_i(x) ∈ [0, 1] to be the probabilistic output of SVM f_i, and set p_min(x) = min_{1≤i≤K} p_i(x) and p_max(x) = max_{1≤i≤K} p_i(x). Thus, p_t(x) is high if there is strong disagreement within the ensemble and low if all classifiers agree. This allows us to deal with noise, because no hypothesis gets excluded forever.
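Eq. (18) itself is a one-liner; here `probs` stands for the K Platt-scaled outputs p_i(x) of the ensemble for a single candidate x, and the default p_threshold is an arbitrary illustrative value:

```python
import numpy as np

def selection_prob(probs, p_threshold=0.05):
    """Eq. (18): p_t(x) = p_threshold + (1 - p_threshold) * (p_max - p_min),
    where `probs` holds the K probabilistic SVM outputs p_i(x) for one x."""
    p_min, p_max = np.min(probs), np.max(probs)
    return p_threshold + (1.0 - p_threshold) * (p_max - p_min)
```

Full agreement yields the floor probability p_threshold; maximal disagreement yields probability 1.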


6 Multi-Class Active Learning

The majority of research on active learning with SVMs focuses on the binary case, because dealing with more categories makes estimating the uncertainty of a sample more difﬁcult. Furthermore, multi-class SVMs are in general more time consuming to train. There are different approaches to extend SVMs to multi-class classiﬁcation. A popular way is to reduce the learning task to multiple binary problems. This is done by using either a one-vs-one [19, 32] or a one-vs-all [35, 46] approach.

However, performing uncertainty sampling with respect to each single SVM may cause the problem that one sample is informative for one binary task, but bears little information for the other tasks and, thus, for the overall multi-class classiﬁcation.

It is possible to extend the version space minimization strategy to the multi-class case [43]. The area of the version space is proportional to the probability of classifying the training set correctly, given a hypothesis sampled at random from the version space. In the one-vs-all approach for N classes, we maintain N binary SVM models f_1, ..., f_N, where the i-th SVM is trained to separate class i from all other classes. To select the next query, we minimize the maximum (over the hypothesized label y) of the product of all N version space areas:

x̂ = argmin_{x ∈ U} max_y ∏_{i=1}^{N} Λ(V^{(i)}_{x,y}) ,   (19)

where Λ(V^{(i)}_{x,y}) is the area of the version space of the i-th SVM if the sample (x, y), x ∈ U, were included in the training set. To approximate the area of the version space, we can use MaxMin Margin for each binary SVM. The margin of a sample in a single SVM, however, only reflects the uncertainty with respect to the specific binary problem and not in relation to the other classification problems. Therefore, we have to modify our approximation if we want to extend the Simple Margin strategy to the multi-class case.
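A sketch of the selection rule (19), assuming the linear approximation of the version space areas given in Eq. (20) with each baseline area Λ(V^{(i)}) normalized to 1; the matrix layout and function names are our own:

```python
import numpy as np

def versionspace_area(f_i_x, label_matches):
    """Eq. (20)-style relative version space area of one binary SVM after
    adding sample x with a hypothesized label (baseline area taken as 1)."""
    return 0.5 * (1 + f_i_x) if label_matches else 0.5 * (1 - f_i_x)

def select_query(F):
    """Eq. (19): pick the pool index minimizing, over hypothesized labels y,
    the maximum product of the N approximate version space areas.
    F[m, i] holds f_i(x_m) for pool sample m and one-vs-all SVM i."""
    n_pool, n_cls = F.shape
    scores = np.empty(n_pool)
    for m in range(n_pool):
        products = []
        for y in range(n_cls):                       # hypothesized label
            areas = [versionspace_area(F[m, i], y == i) for i in range(n_cls)]
            products.append(np.prod(areas))
        scores[m] = max(products)
    return int(np.argmin(scores))
```

A sample lying on all N decision boundaries (all f_i(x) = 0) halves every version space regardless of its label and is therefore preferred over a sample that one SVM already classifies confidently.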

(i)

Figure 7: Single version space for the multi-class problem with N one-vs-all SVMs [43]. The area Λ(Vx,y=i )

(i)

corresponds to the version space area if the label y = i for the sample we picked. The area Λ(Vx,y=i ) corresponds to the case where y = i. In the multi-class case, we want to measure both quantities to approximate the version space area.

We can interpret each f_i(x) as a quantity that measures to what extent x splits the version space. As we have N different classifiers that influence the area of the version space, we have to quantify this influence for each one. In Figure 7, we see the version space for one single binary problem, where we want to discriminate between class i and the rest. Thus, we want to approximate the area Λ(V^{(i)}_{x,y=i}). If we choose a sample x where f_i(x) = 0, we approximately halve the version space; for f_i(x) = 1, the area approximately stays the same; and for f_i(x) = −1, we obtain a zero area; similarly for the area Λ(V^{(i)}_{x,y≠i}). Therefore, we can use the approximation

Λ(V^{(i)}_{x,y}) = 0.5 · (1 + f_i(x)) · Λ(V^{(i)})   if y = i,
                   0.5 · (1 − f_i(x)) · Λ(V^{(i)})   if y ≠ i.   (20)

We can also employ one-versus-one multi-class SVMs [25]. Again, we use Platt's algorithm [31] to approximate posterior probabilities for our predictions. We simultaneously fit the probabilistic

model and the SVM hyperparameters via grid-search to derive a classification probability p_k(x), k = 1, ..., K, for each of the K SVMs given the sample x. We can use these probabilities for active sample selection in different ways. A simple approach is to select the sample with the least classification confidence. This corresponds to the selection criterion

x̂_LC = argmin_{x ∈ U} min_{k ∈ {1,...,K}} p_k(x) .   (21)

This approach suffers from the same problem we mentioned earlier: the probability is connected only to each single binary problem instead of providing a measure that relates it to the other classifiers. To alleviate this, we can choose another approach, called breaking ties [25]. Here, we select the sample with the minimum difference between the highest class confidences, namely

x̂_BT = argmin_{x ∈ U} min_{k,l ∈ {1,...,K}, k ≠ l} |p_k(x) − p_l(x)| .   (22)

This way, we prefer samples which two classifiers claim to be certain about, and thus avoid considering the uncertainty of one classifier in an isolated manner.
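Both criteria (21) and (22) reduce to a few lines of NumPy; `P` is an assumed pool-by-classifier matrix of Platt probabilities, and the function names are our own:

```python
import numpy as np

def least_confidence(P):
    """Eq. (21): select the pool sample whose lowest probability is smallest.
    P[m, k] is the probability p_k(x_m) of classifier k for sample m."""
    return int(np.argmin(P.min(axis=1)))

def breaking_ties(P):
    """Eq. (22): select the pool sample with the smallest difference
    between any two of its classifier probabilities."""
    n_pool = P.shape[0]
    scores = np.empty(n_pool)
    for m in range(n_pool):
        p = np.sort(P[m])
        scores[m] = np.min(np.diff(p))   # min |p_k - p_l| over k != l
    return int(np.argmin(scores))
```

Sorting makes the pairwise minimum cheap: the smallest |p_k − p_l| is always attained by two adjacent values in the sorted list.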

7 Efficient Active Learning

In passive learning, selecting a training set comes almost for free: we just pick a random subset of the labeled data. In active learning, we have to evaluate whether a sample should be added to the training set based on its ability to improve our prediction. A simple strategy to reduce the computational complexity is to select not just single samples but whole batches. To profit from this strategy, however, we have to make sure to create batches that minimize redundancy.

Another consideration that makes efficient computation necessary is that many algorithms require retraining the model on different training subsets. Usually, these subsets differ only by the one or few samples we consider for selection. Thus, we can employ online learning to train our model incrementally. This makes particular sense if we analyze the effect that adding a sample has on our model.

7.1 Online Learning

After we have selected a sample to be included in the training set, we have to retrain our model to reflect the additional data. Using the whole dataset for retraining is computationally expensive. A better option is to incrementally improve the model with each selected sample through online learning. LASVM [5, 17] is an SVM solver for fast online training. It is based on a decomposition method [30] that solves the learning problem iteratively by considering only a subset of the α-variables in each iteration. In LASVM, this subset includes one unlabeled sample (corresponding to a new variable) in every second iteration, which can, for instance, be picked by uncertainty sampling [5].
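LASVM's second-order SMO steps are beyond a short sketch, but the interleaving of uncertainty sampling with incremental updates can be illustrated with a simple hinge-loss SGD step standing in for the SMO update; all data and names here are our own toy constructions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Not LASVM itself -- just the interleaving idea: after each queried label,
# update the model incrementally instead of retraining from scratch.
w = np.zeros(2)                              # linear model in place of an SVM
pool = rng.normal(size=(100, 2))
labels = np.sign(pool[:, 0] + 1e-9)          # hidden oracle: sign of x_0

for _ in range(30):
    margins = np.abs(pool @ w)
    i = int(np.argmin(margins))              # uncertainty sampling
    x, y = pool[i], labels[i]
    if y * (x @ w) < 1:                      # single hinge-loss SGD step
        w += 0.1 * y * x
```

Each iteration touches only one sample, which is the property that makes interleaving selection and training cheap.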

7.2 Batch-Mode Active Learning

When confronted with large amounts of unlabeled data, estimating the effect of single samples with respect to the learning objective is costly. Besides online learning, we can also gain a speed-up by labeling samples in batches. A naive strategy is to just select the n samples that are closest to the decision boundary [44]. This approach, however, does not take into account that the samples within the batch might bear a high level of redundancy.

To counteract this redundancy, we can select samples not only according to their individual informativeness, but also such that they maximize the diversity within each batch [7]. One heuristic is to maximize the angle between the hyperplanes that the samples induce in version space:

|cos(∠(φ(x_i), φ(x_j)))| = |⟨φ(x_i), φ(x_j)⟩| / (‖φ(x_i)‖ · ‖φ(x_j)‖) = |κ(x_i, x_j)| / √(κ(x_i, x_i) · κ(x_j, x_j)) .   (23)

Let S be the batch of samples, which we initialize with one sample x_S. We subsequently add the sample whose corresponding hyperplane maximizes the minimal angle to the hyperplanes induced by the samples already in the batch, that is, minimizes the maximal absolute cosine. It is computed as

x̂ = argmin_{x ∈ U\S} max_{z ∈ S} |cos(∠(φ(x), φ(z)))| .   (24)

We can form a convex combination of this diversity measure and the well-known uncertainty measure (distance to the hyperplane) with a trade-off parameter λ ∈ [0, 1]. Samples to be added to the batch are then chosen iteratively as

x̂ = argmin_{x ∈ U\S} ( λ · |f(x)| + (1 − λ) · max_{z ∈ S} |cos(∠(φ(x), φ(z)))| ) .   (25)

We can choose λ = 0.5 to give equal weight to the uncertainty and diversity measures. An optimal value, however, might depend on how certain we are about the accuracy of the current classifier.
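The greedy batch construction of Eqs. (23)-(25) can be sketched directly from the kernel matrix; the seeding rule (most uncertain sample first) and the toy data are our own assumptions:

```python
import numpy as np

def kernel_cosine(K, i, j):
    """Eq. (23): |cos| of the angle between the hyperplanes induced by
    samples i and j in version space, computed from the kernel matrix K."""
    return abs(K[i, j]) / np.sqrt(K[i, i] * K[j, j])

def select_batch(K, f_values, n, lam=0.5):
    """Eqs. (24)-(25): greedily build a batch of size n, trading off
    uncertainty |f(x)| against redundancy with the current batch."""
    pool = list(range(len(f_values)))
    batch = [int(np.argmin(np.abs(f_values)))]   # seed: most uncertain sample
    pool.remove(batch[0])
    while len(batch) < n and pool:
        scores = [lam * abs(f_values[i])
                  + (1 - lam) * max(kernel_cosine(K, i, j) for j in batch)
                  for i in pool]
        best = pool[int(np.argmin(scores))]
        batch.append(best)
        pool.remove(best)
    return batch

# Toy run with a linear kernel on 2-D points: sample 1 is nearly parallel
# to sample 0 in version space, sample 2 is orthogonal to it.
X = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
K = X @ X.T
batch = select_batch(K, f_values=[0.1, 0.2, 0.3], n=2)
```

In the toy run the orthogonal sample is chosen over the redundant one despite its larger distance to the decision boundary, which is exactly the behavior the diversity term is meant to produce.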

8 Conclusion

Access to unlabeled data allows us to improve predictive models in data mining applications. If labeling costs permit labeling only a limited amount of the available data, we should choose this subset carefully and focus on the patterns carrying the information most helpful for enhancing the model. Support vector machines (SVMs) have convenient properties that make it easy to evaluate how unlabeled samples would influence the model if they were labeled and included in the training set. Therefore, SVMs are particularly well-suited for active learning. However, there are several challenges to address, such as learning efficiently, dealing with multiple classes, and counteracting the selection bias introduced by actively choosing the training data. Importance weighting seems most promising for counteracting this bias, and it can easily be incorporated into an active SVM learner.

Devising parallel algorithms for sample selection can speed up learning in many cases. Most of the research in active SVM learning so far has focused on binary decision problems. A challenge for future research is to develop efﬁcient active learning algorithms for multi-class SVMs that address the nature of the multi-class decision in a more principled way.

Acknowledgments.

The authors gratefully acknowledge support from The Danish Council for Independent Research through the project SkyML (FNU 12-125149).

References

[1] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.

[2] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 65–72. ACM Press, 2006.

[3] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 49–56. ACM Press, 2009.

[4] A. Beygelzimer, J. Langford, Z. Tong, and D. Hsu. Agnostic active learning without constraints. In Advances in Neural Information Processing Systems (NIPS), pages 199–207. MIT Press, 2010.

[5] A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research (JMLR), 6:1579–1619, 2005.

[6] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Workshop on Computational Learning Theory (COLT), pages 144–152. ACM Press, 1992.

[7] K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of the International Conference on Machine Learning (ICML), pages 59–66. AAAI Press, 2003.

[8] K. Brinker. Active learning of label ranking functions. In Proceedings of the International Conference on Machine Learning (ICML), pages 129–136. ACM Press, 2004.

[9] C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In Proceedings of the International Conference on Machine Learning (ICML), pages 111–118. Morgan Kaufmann, 2000.

[10] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, 2006.

[11] C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh. Sample selection bias correction theory. In N. Bshouty, G. Stoltz, N. Vayatis, and T. Zeugmann, editors, Algorithmic Learning Theory, pages 38–53. Springer, 2008.

[12] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[13] S. Dasgupta. Two faces of active learning. Theoretical Computer Science, 412(19):1767–1781, 2011.

[14] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 208–215. ACM Press, 2008.

[15] B. Demir and L. Bruzzone. A novel active learning method for support vector regression to estimate biophysical parameters from remotely sensed images. In L. Bruzzone, editor, Proceedings of SPIE 8537, Image and Signal Processing for Remote Sensing XVIII, volume 8537, page 85370L. International Society for Optics and Photonics, 2012.

[16] R. Ganti and A. Gray. UPAL: Unbiased pool based active learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 422–431, 2012.

[17] T. Glasmachers and C. Igel. Second-order SMO improves SVM online and active learning. Neural Computation, 20(2):374–382, 2008.

[18] I. Guyon, G. Cawley, G. Dror, and V. Lemaire. Results of the active learning challenge. Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings, 16:19–45, 2011.

[19] T. Hastie and R. Tibshirani. Classification by pairwise coupling. Annals of Statistics, 26(2):451–471, 1998.

[20] C.-H. Ho, M.-H. Tsai, and C.-J. Lin. Active learning and experimental design with SVMs. Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings, 16:71–84, 2011.

[21] S. Hoi, R. Jin, J. Zhu, and M. Lyu. Semi-supervised SVM batch mode active learning for image retrieval. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–7. IEEE, 2008.

[22] S.-J. Huang, R. Jin, and Z.-H. Zhou. Active learning by querying informative and representative examples. In Advances in Neural Information Processing Systems (NIPS), pages 892–900. MIT Press, 2010.

[23] D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proceedings of the SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 3–12. ACM Press, 1994.

[24] C.-L. Li, C.-S. Ferng, and H.-T. Lin. Active learning with hinted support vector machine. Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings, 25:221–235, 2012.

[25] T. Luo, K. Kramer, D. Goldgof, L. Hall, S. Samson, A. Remsen, and T. Hopkins. Active learning to recognize multiple types of plankton. In Proceedings of the International Conference on Pattern Recognition (ICPR), volume 3, pages 478–481. IEEE, 2004.

[26] A. Mammone, M. Turchi, and N. Cristianini. Support vector machines. Wiley Interdisciplinary Reviews: Computational Statistics, 1(3):283–289, 2009.

[27] T. M. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1982.

[28] P. Mitra, C. Murthy, and S. Pal. A probabilistic active support vector learning algorithm. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(3):413–418, 2004.

[29] F. Olsson. A literature survey of active machine learning in the context of natural language processing. Technical report, Swedish Institute of Computer Science, 2009.

[30] J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 12, pages 185–208. MIT Press, 1999.

[31] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–74. MIT Press, 1999.

[32] J. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems (NIPS), 12(3):547–553, 2000.

[33] J. Quionero-Candela, M. Sugiyama, A. Schwaighofer, and N. Lawrence. Dataset Shift in Machine Learning. MIT Press, 2009.

[34] J. W. Richards, D. L. Starr, H. Brink, A. A. Miller, J. S. Bloom, N. R. Butler, J. B. James, J. P. Long, and J. Rice. Active learning to overcome sample selection bias: Application to photometric variable star classification. The Astrophysical Journal, 744(2):192, 2012.

[35] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research (JMLR), 5:101–141, 2004.

[36] S. Salcedo-Sanz, J. L. Rojo-Álvarez, M. Martínez-Ramón, and G. Camps-Valls. Support vector machines in engineering: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(3):234–267, 2014.

[37] G. Schohn and D. Cohn. Less is more: Active learning with support vector machines. In Proceedings of the International Conference on Machine Learning (ICML), pages 839–846. Morgan Kaufmann, 2000.

[38] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

[39] B. Settles. Active Learning. Morgan & Claypool, 2012.

[40] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[41] X. Shi, W. Fan, and J. Ren. Actively transfer domain knowledge. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pages 342–357. Springer, 2008.

[42] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.

[43] S. Tong. Active Learning: Theory and Applications. PhD thesis, Stanford University, 2001.

[44] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proceedings of the International Conference on Multimedia (MM), pages 107–118. ACM Press, 2001.

[45] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR), 2:45–66, 2002.

[46] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, 1998.

[47] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang. Representative sampling for text classification using support vector machines. In Proceedings of the European Conference on Information Retrieval (ECIR), pages 393–407. Springer, 2003.

[48] H. Yu. SVM selective sampling for ranking with application to data retrieval. In Proceedings of the International Conference on Knowledge Discovery in Data Mining (SIGKDD), pages 354–363. ACM Press, 2005.

[49] B. Zadrozny, J. Langford, and N. Abe. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the International Conference on Data Mining (ICDM), pages 435–442. IEEE, 2003.

[50] L. Zhao, G. Sukthankar, and R. Sukthankar. Importance-weighted label prediction for active learning with noisy annotations. In Proceedings of the International Conference on Pattern Recognition (ICPR), pages 3476–3479. IEEE, 2012.


Words: 1354 - Pages: 6

Premium Essay

...new water feature. Please ensure that you address all the criteria contained within the ‘Assessment Task 1 Marking Sheet’. This assignment will be marked generally in accordance with the Marking Sheet where marks are deducted for non-conformities. Please be aware that simply mentioning the marking criteria/addressing it in a half-sentence or similar does not guarantee full marks; the thoroughness and completeness of how the marking criteria are addressed will determine how many marks for each separate criterion will be awarded; this can be either the full mark, or parts thereof. Furthermore, it is vital that you familiarise yourself with precisely what a Risk Assessment and Risk Treatment incorporate for ISO 31,000 – due to the word count limit, it is advisable to not deviate from the task given, while meeting the marking criteria. Word count limit: The body of this assignment will be in the range of 3000 to 5000 words, excluding any Appendices. You may need to simplify and define the boundaries carefully in order to achieve the word limit. A single hard copy will be submitted in class. The assignment must also be uploaded to the 49006 Turnitin folder within UTSOnline before the due date. Please make sure to submit the complete assignment including Cover Page, Table of Contents, Reference list and Appendices. Emailed assignments will not be accepted. Be aware that several students have fallen foul, in previous semesters, of the sophisticated systems in......

Words: 831 - Pages: 4

Free Essay

...Assignment front sheet Qualification Unit number and title Pearson BTEC Level 5 HND Diploma Business Unit 1: Business Environment Learner name Assessor name Nour Hawarneh Date issued Completion date Submitted on Nov 08, 2015 Jan 18, 2016 Assignment title Your company’s environment LO2 LO3 Assessment Criteria In this assessment you will have the opportunity to present evidence that shows you are able to: 1.1 organisational purposes of businesses Identify the purposes of different types of organisation 1 1.2 Describe the extent to which an organisation meets the objectives of different stakeholders 1 1.3 LO1 Learning outcome Understand the Learning Outcom e Explain the responsibilities of an organisation and strategies employed to meet them 2.1 Explain how economic systems attempt to allocate resources effectively 2 2.2 Assess the impact of fiscal and monetary policy on business organisations and their activities Evaluate the impact of competition policy and other regulatory mechanisms on the activities of a selected organisation 2 Understand the nature of the national environment in which businesses operate Understand the behaviour for oganisations in their market environment 2.3 3.1 LO4 2 Illustrate the way in which market forces shape organisational responses using a range of examples Judge how the business and cultural environments shape the behaviour of a selected organisation 3.3 Be able to assess the significance of the......

Words: 1690 - Pages: 7

Premium Essay

...|Assignment brief – QCF BTEC (L3 ONLY) | |Assignment front sheet | |Qualification |Unit number and title | |BTEC L3 Diploma/Ext. Dipl. – Business |UNIT 1 – BUSINESS ENVIRONMENT | |Learner name | Assessor name | | |MARY EC ZAFRA | |Date issued | Hand in deadline |Submitted on | |14 OCTOBER 2015 | 15 November 2015 |18NOV2015 | | | | |Assignment No. & title |Assignment 1/2 - The Businesses We See | |In this assessment you will have opportunities to provide......

Words: 952 - Pages: 4

Premium Essay

...Application Exercise (Assignment to be submitted) (90 min.) (Not exceeding five pages) |Apply the five forces analysis to your company/division and assess the attractiveness of your industry. | | |Compare the industry attractiveness five years back and today due to the shift in the forces. | | |Guidelines for the assignment | | | | | |Brief introduction of your company, its product portfolio and the markets/segments it caters to. If it | | |is in multiple industries, choose any one industry for the purpose of this assignment. (refer to 2008 | | |HBS note for definition of industry) | | | | | |Consider each threat individually. Take each factor in it and explain its role and significance in your | | |industry. Rate its effect based on the above explanation. (template if shown in class may be used for | | |structured approach but the spirit of the analysis matters more than......

Words: 254 - Pages: 2

Premium Essay

...and analysed to present their financial standing at the end of the year. Comparing the relationships of these ratios reveals that Orica decreased their liquidity, but suffered lower profitability with heavy influences from Minova’s impairment of goodwill. With high asset utilisation and stable efficiency, Orica should focus on improving their maintenance and reliability by addressing the Kooragang plant shutdown. Furthermore, Orica is financing more with debt than equity which introduces some risk to the company given their higher expenses for 2012. Finally, the investment ratios indicate that Orica could be poised for high growth with a stable return, but should first focus on maximising their plants and equipment. Sample Assignment: Part of the content removed II TABLE OF CONTENTS Executive Summary ............................................................................................................................. II Table of Contents ............................................................................................................................... III A. List of Tables ...............................................................................................................................IV 1. Introduction ................................................................................................................................... 1 2. Ratios....................................................................

Words: 2155 - Pages: 9