Free Essay


In: Science

Submitted By melosalz
Words 9180
Pages 37
Active Learning with Support Vector Machines

Kim Steenstrup Pedersen
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark Jan Kremer
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark Christian Igel
Department of Computer Science
University of Copenhagen
2200 Copenhagen, Denmark Abstract
In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels
(e.g., human annotations or results from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes measuring the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently estimate how strongly learning from a data point influences the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs.



In many applications of supervised learning in data mining, huge amounts of unlabeled data samples are cheaply available while obtaining their labels for training a classifier is costly. To minimize labeling costs, we want to request labels only for potentially informative samples. These are usually the ones that we expect to improve the accuracy of the classifier to the greatest extent when used for training. Another consideration is the reduction of training time. Even when all samples are labeled, we may want to consider only a subset of the available data because training the classifier of choice using all the data might be computationally too demanding. Instead of sampling a subset uniformly at random, which is referred to as passive learning, we would like to select informative samples to maximize accuracy with less training data. Active learning denotes the process of autonomously selecting promising data points to learn from. By choosing samples actively, we introduce a selection bias. This violates the assumption underlying most learning algorithms that training and test data are identically distributed: an issue we have to address to avoid detrimental effects on the generalization performance. 1

In theory, active learning is possible with any classifier that is capable of passive learning. This review focuses on the support vector machine (SVM) classifier. It is a state-of-the-art method, which has proven to give highly accurate results in the passive learning scenario and which has some favorable properties that make it especially suitable for active learning: (i) SVMs learn a linear decision boundary, typically in a kernel-induced feature space. Measuring the distance of a sample to this boundary is straightforward and provides an estimate of its informativeness. (ii)
Efficient online learning algorithms make it possible to obtain a sufficiently accurate approximation of the optimal SVM solution without retraining on the whole dataset. (iii) The SVM can weight the influence of single samples in a simple manner. This allows for compensating the selection bias that active learning introduces.


Active Learning

In the following we focus on supervised learning for classification. There also exists a body of work on active learning with SVMs in other settings such as regression [15] and ranking [8, 48]. A discussion of these settings is, however, beyond the scope of this article.
The training set is given by L = {(x1 , y1 ), ..., (x , y )} ⊂ X × Y. It consists of labeled samples that are drawn independently from an unknown distribution D. This distribution is defined over
X × Y, the cross product of a feature space X and a label space Y, with Y = {−1, 1} in the binary case. We try to infer a hypothesis f : X → Z mapping inputs to a prediction space Z for predicting the labels of samples drawn from D. To measure the quality of our prediction, we define a loss function L : Z × X × Y → R+ . Thus, our learning goal is minimizing the expected loss
R(f ) = E(x,y)∼D L(f (x), x, y) ,


which is called the risk of f . We call the average loss over a finite sample L the training error or empirical risk. If a loss function does not depend on the second argument, we simply omit it.
In sampling-based active learning, there are two scenarios: stream-based and pool-based. In streambased active learning, we analyze incoming unlabeled samples sequentially, one sample at a time.
Contrary, in pool-based active learning we have access to a pool of unlabeled samples at once. In this case, we can rank samples based on a selection criterion and query the most informative ones.
Although, some of the methods in this review are also applicable to stream-based learning, most of them consider the pool-based scenario. In the case of pool-based active learning, we have, in addition to the labeled set L, access to a set of m unlabeled samples U = {x +1 , ..., x +m }. We assume that there exists a way to provide us with a label for any sample from this set (the probability of the label is given by D conditioned on the sample). This may involve labeling costs, and the number of queries we are allowed to make may be restricted by a budget. After labeling a sample, we simply add it to our training set.
In general, we aim at achieving a minimum risk by requesting as few labels as possible. We can estimate this risk by computing the average error over an independent test set not used in the training process. Ultimately, we hope to require less labeled samples for inferring a hypothesis performing as well as a hypothesis generated by passive learning on L and completely labeled U.
In practice, one can profit from an active learner if only few labeled samples are available and labeling is costly, or when learning has to be restricted to a subset of the available data to render the computation feasible. A list of real-world applications is given in the general active learning survey [39] and in a review paper which considers active learning for natural language processing [29].
Active learning can also be employed in the context of transfer learning [41]. In this setting, samples from the unlabeled target domain are selected for labeling and included in the source domain.
A classifier trained on the augmented source dataset can then exploit the additional samples to increase its accuracy in the target domain. This technique has been used successfully, for example, in an astronomy application [34] to address a sample selection bias, which causes source and target probability distributions to mismatch [33].


Support Vector Machine

Support vector machines (SVMs) are state-of-the-art classifiers [6, 12, 26, 36, 38, 40]. They have proven to provide well-generalizing solutions in practice and are well understood theoretically [42].
The kernel trick [38] allows for an easy handling of diverse data representations (e.g., biological sequences or multimodal data). Support vector machines perform linear discrimination in a kernelinduced feature space and are based on the idea of large margin separation: they try to maximize the distance between the decision boundary and the correctly classified points closest to this boundary.
In the following, we formalize SVMs to fix our notation, for a detailed introduction we refer to the recent WIREs articles [26, 36].
An SVM for binary classification labels an input x according to the sign of a decision function of the form f (x) = w, φ(x) + b =

αi yi κ(xi , x) + b ,



where κ is a positive semi-definite kernel function [1] and φ(x) is a mapping X → F to a kernelinduced Hilbert space F such that κ(xi , xj ) = φ(xi ), φ(xj ) . We call F the feature space, which includes the weight vector w = i=1 αi yi φ(xi ) ∈ F. The training patterns xi with αi > 0 are called support vectors. The decision boundary is linear in F and the offset from the origin is controlled by b ∈ R.
The distance of a pattern (x, y) from the decision boundary is given by |f (x)/ w |. We call yf (x) the functional margin and yf (x)/ w the geometric margin: a positive margin implies correct classification. Let us assume that the training data L is linearly separable in F. Then m(L, f ) = min(x,y)∈L yf (x) defines the margin of the whole data set L with respect to f (in the following we do not indicate the dependency on L and f if it is clear from the context). We call the feature space region {x ∈ X | |f (x)| ≤ 1} the margin band [9].
A hard margin SVM computes the linear hypothesis that separates the data and yields a maximum margin by solving max w,b,γ



subject to yi ( w, φ(xi ) + b) ≥ γ w =1


i = 1, . . . ,

with w ∈ F, b ∈ R and γ ∈ R [40]. Instead of maximizing γ and keeping the norm of w fixed to one, one can equivalently minimize w and fix a target margin, typically γ = 1.
In general, we cannot or do not want to separate the full training data correctly. Soft margin SVMs mitigate the concept of large margin separation. They are best understood as the solutions of the regularized risk minimization problem min w,b

1 w 2



Ci Lhinge ( w, φ(xi ) + b, yi ) .



Here, Lhinge (f (xi ), yi ) = max(0, 1 − yi f (xi )) denotes the hinge loss. An optimal solution w∗ =

∗ i=1 αi yi φ(xi ) has the property that 0 ≤ αi ≤ Ci for i = 1, . . . , . For soft margin SVMs, the patterns in L need not be linearly separable in F. If they are, increasing the Ci until an optimal

solution satisfies αi < Ci for all i leads to the same hypothesis as training a hard margin SVM.
Usually, all samples are given the same weight Ci = C.


Uncertainty Sampling

It seems to be intuitive to query labels for samples that cannot be easily classified using our current classifier. Consider the contrary: if we are very certain about the class of a sample, then we might regard any label that does not reflect our expectation as noise. On the other hand, uncovering an expected label would not make us modify our current hypothesis.

Uncertainty sampling was introduced by Lewis and Gale [23]. The idea is that the samples the learner is most uncertain about provide the greatest insight into the underlying data distribution.
Figure 1 shows an example in the case of an SVM. Among the three different unlabeled candidates, our intuition may suggest to ask for the label of the sample closest to the decision boundary: the labels of the other candidates seem to clearly match the class of the samples on the respective side or are otherwise simply mislabeled. In the following, we want to show how this intuitive choice can be justified and how it leads to a number of active learning algorithms that make use of the special properties of an SVM.

Figure 1: The three rectangles depict unlabeled samples while the blue circles and orange triangles represent positively and negatively labeled samples, respectively. Intuitively, the label of the sample xa might tell us the most about the underlying distribution of labeled samples, since in the feature space, φ(xa ) is closer to the decision boundary than φ(xb ) or φ(xc ).


Version Space

The version space is a construct that helps to keep track of all hypotheses that are able to perfectly classify our current observations [27]. Thus, for the moment, we assume that our data are linearly separable in the feature space. The idea is to speed up learning by selecting samples in a way that minimizes the version space rapidly with each labeling.
We can express the hard margin SVM classifier (3) in terms of a geometric representation of the version space. For this purpose, we restrict our consideration to hypotheses f (x) = w, φ(x) without bias (i.e., b = 0). The following results, however, can be extended to SVMs with b = 0.
The version space V(L) refers to the subset of F that includes all hypotheses consistent with the training set L [27]:
V(L) := w ∈ F | w = 1, yf (x) > 0, ∀(x, y) ∈ L


In this representation, we can interpret the hypothesis space as the unit hypersphere given by w =
1. The surface of the hypersphere includes all possible hypotheses classifying samples that are mapped into the feature space F. We define Λ(V) as the area the version space occupies on the surface of this hypersphere. This is depicted in Figure 2. The hypothesis space is represented by the big sphere. The white front, which is cut out by the two hyperplanes, depicts the version space.
Each sample x can be interpreted as defining a hyperplane through the origin of F with the normal vector φ(x). Each hyperplane divides the feature space into two half-spaces. Depending on the label y of the sample x, the version space is restricted to the surface of the hypersphere that lies on the respective side of the hyperplane. For example, a sample x that is labeled with y = +1, restricts the version space to all w on the unit hypersphere for which w, φ(x) > 0, i.e., the ones that lie on the positive side of the hyperplane defined by the normal vector φ(x). Thus, the version space is defined by the intersection of all half-spaces and the surface of the hypothesis hypersphere.
Figure 2a illustrates this geometric relationship.

(a) The sphere depicts the hypothesis space. The two hyperplanes are induced by two labeled samples. The version space is the part of the sphere surface (in white) that is on one side of each hyperplane. The respective side is defined by its label.

(b) The center (in black) of the orange sphere depicts the SVM solution within the version space.
It has the maximum distance to the hyperplanes delimiting the version space. The normals of these hyperplanes, which are touched by the orange sphere, correspond to the support vectors.

Figure 2: Geometric representation of the version space in 3D following Tong [43].

If we consider a feature mapping with the property ∀x, z ∈ X : φ(x) = φ(z) , such as normalized kernels, including the frequently used Gaussian kernel, then the SVM solution (3) has a particularly nice geometric interpretation in the version space. Under this condition, the decision function f (x) = w, φ(x) maximizing the margin m(L, f ) (i.e., the minimum y w, φ(x) over L) also maximizes the minimum distance between w and any hyperplane defined by a normal φ(xi ), i = 1, . . . , . The solution is a point within the version space, which is the center of a hypersphere, depicted in orange in Figure 2b. This hypersphere yields the maximum radius possible without intersecting the hyperplanes that delimit the version space. The radius is given by r = y w,φ(x) , φ(x) where φ(x) is any support vector. Changing our perspective, we can interpret the normals φ(xi ) of the hyperplanes touching the hypersphere as points in the feature space F. Then, these are exactly the support vectors since they have the minimum distance to our decision boundary defined by w.
This distance is m(L, f ), which turns out to be proportional to the radius r.

Implicit Version Space

An explicit version space, as defined above, only exists if the data are separable, which is often not the case in practice. The Bayes optimal solution need not have vanishing training error (as soon as under D we have p(y1 |x) ≥ p(y2 |x) > 0 for some x ∈ X and y1 , y2 ∈ Y with y1 = y2 ). Thus, minimizing the version space might exclude hypotheses with non-zero training error that are in fact optimal. In agnostic active learning [2], we do not make the assumption of an existing optimal zeroerror decision boundary. An algorithm that is theoretically capable of active learning in an agnostic setting is the A2 -algorithm [2]. Here, a hypothesis cannot be deleted due to its disagreement with a single sample. If, however, all hypotheses that are part of the current version space agree on a region within the feature space, this region can be discarded. For each hypothesis the algorithm keeps an upper and a lower bound on its training error (see the work by Balcan et al. [2] for details).
It subsequently excludes all hypotheses which have a lower bound that is higher than the global minimal upper bound. Despite being intractable in practice, this algorithm forms the basis of some important algorithms compensating the selection bias as discussed below.

Uncertainty-Based Active Learning

Although the version space is restricted to separable problems, it motivates many general active selection strategies. Which samples should we query to reduce the version space? As we have seen previously, each labeled sample that becomes a support vector restricts the version space to one side of the hyperplane it induces in F. If we do not know the correct label of a sample in advance, we

(a) Simple Margin will query the sample that induces a hyperplane lying closest to the SVM solution. In this case, it would query sample xa .

(b) Here the SVM does not provide a good approximation of the version space area. Simple
Margin would query sample xa while xc might have been a more suitable choice.

Figure 3: The version space area is shown in white, the solid lines depict the hyperplanes induced by the support vectors, the center of the orange circle is the weight vector w of the current SVM. The dotted lines show the hyperplanes that are induced by unlabeled samples. This visualization is inspired by Tong [43].

should always query the sample that ideally halves the version space. This is a safe choice as we will reduce it regardless of the label. Computing the version space in a high-dimensional feature space is usually intractable, but we can approximate it efficiently using the SVM. In the version space, the
SVM solution w is the center of the hypersphere touching the hyperplanes induced by the support vectors. Each hyperplane delimits the version space. Assuming that the center of this hypersphere is close to the center of the version space, we can use it as an approximation. If we now choose a hyperplane that is close to this center, we approximately bisect the version space. Therefore, we want to query the sample x that induces a hyperplane as close to w as possible:
x = argmin | w, φ(x) | = argmin |f (x)|



This strategy queries the sample closest to the current decision boundary and is called Simple Margin [43]. Figure 3 shows this principle geometrically, projected to two dimensions.
By querying the samples closest to the separating hyperplane, we try to minimize the version space by requesting as few labels as possible. However, depending on the actual shape of the version space, the SVM solution may not provide a good approximation and another query strategy would have achieved a greater reduction of the version space area. This is illustrated in Figure 3b. The strategy of myopically querying the samples with the smallest margin may even perform worse than a passive learner.
Therefore, we can choose a different heuristic to approximate the version space more accurately [37,
45]. For instance, we could compute two SVMs for each sample: one for the case we labeled it positively and one assuming a negative label. We can then, for each case, compute the margins m+ = + w, φ(x) and m− = − w, φ(x) . Finally, we query the sample which gains the maximum value for min(m+ , m− ). This quantity will be very small if the corresponding version spaces are very different. Thus, we take the maximum to gain an equal split. This strategy is called MaxMin
Margin [43] and allows us to make a better choice in case of an irregular version space area, as we can see in Figure 4. This, however, comes with the additional costs of computing the margin for each potential labeling.
Uncertainty sampling can also be motivated by trying to minimize the training error directly [9].
Depending on the assumptions made, we can arrive at different strategies. Considering the classifier has just been trained on few labeled data, we assume the prospective labels of the yet unlabeled data to be uncorrelated with the predicted labels. Therefore, we want to select the sample for which we can expect the largest error, namely
max(0, 1 − f (x)) + max(0, 1 + f (x)) ,
x = argmax
2 x∈U 6

Figure 4: In this case, the MaxMin Margin strategy would query sample xc . Each of the two orange circles correspond to an SVM trained with a positive and a negative labeling of xc [43].

where we assume the hinge loss and f (x) as defined in (2). Assuming a separable dataset, we are only interested in uncertain samples, i.e., those within the margin band. Under these constraints, any choice of x leads to the same value of the objective (7): we select the sample at random in this case. However, as soon as some labeled samples are available for SVM training, the prediction of the SVM for an unlabeled point x is expected to be positively correlated with the label of x. Thus, we assume correct labeling and look for each sample at the minimum error we gain regardless of the labeling. We want to find the sample that maximizes this quantity, i.e., x = argmax min max(0, 1 − f (x)), max(0, 1 + f (x)) = argmin |f (x)| ,



which gives us the same selection criterion as in (6), the Simple Margin strategy. If all unlabeled samples meet the target margin (i.e., the hinge loss of the samples is zero) and if we assume the
SVM labels them correctly (i.e., |f (x)| ≥ 1), it seems that we have arrived at a hypothesis that already generalizes well. Both, picking samples near or far away from the boundary appears to be non-optimal. Therefore, we simply proceed by choosing a random sample from the unlabeled data.
In practice, we may start by training on a random subset and perform uncertainty sampling until the user stops the process or until all unlabeled samples meet the target margin. In this case, we query another random subset as a validation set and estimate the error. We may repeat the last step until we reach a satisfactory solution.

Expected Model Change

Instead of trying to minimize an explicit or implicit version space, we can find the most informative sample by selecting it based on its expected effect on the current model, the expected model change.
In gradient-based learning, this means selecting the sample x ∈ U that, if labeled, would maximize the expected gradient length, where we take the expectation Ey over all possible labels y.
Non-linear SVMs are usually trained by solving the underlying quadratic optimization problem in its Wolfe dual representation [12, 38]. Let us assume SVMs without bias. If we add a new sample
(x +1 , y +1 ) to the current SVM solution and initialize its coefficient with α +1 = 0, the partial derivative with respect to α +1 of the dual problem W (α) to be maximized is g +1


∂W (α)
∂α +1

αi yi κ(xi , x


+1 )


+1 f (x +1 )




As αi is constrained to be non-negative, we only change the model if the partial derivative is positive, that is, if y +1 f (x +1 ) < 1. Note that y +1 f (x +1 ) > 1 implies that (x +1 , y +1 ) is already correctly classified by the current model and meets the target margin.

Let us assume that our current model classifies any sample perfectly and that the dataset is linearly separable in the feature space:

p(y|x) =
If we just consider the partial derivative g arrive at selecting


if y f (x) > 0 otherwise .


in the expected model change selection criterion, we

x = argmax p(y = 1|x)|1 − f (x)| + p(y = −1|x)|1 + f (x)|

= argmax p(y = 1|x)(1 − f (x)) + p(y = −1|x)(1 + f (x)) x∈U = argmax x∈U 1 − f (x) if f (x) > 0
= argmin |f (x)| .
1 + f (x) if f (x) < 0 x∈U (11)

Thus, uncertainty sampling can also be motivated by maximizing the expected model change. Next, we want to have a look at approaches that try to exploit the uncertainty of samples and simultaneously explore undiscovered regions of the feature space.


Combining Informativeness and Representativeness

By performing mere uncertainty sampling, we may pay too much attention to certain regions of the feature space and neglect other regions that are more representative of the underlying distribution.
This leads to a sub-optimal classifier. To counteract this effect, we could sample close to the decision boundary, but also systematically include samples that are farer away [18, 20].
In Figure 5, we see an example where uncertainty sampling can mislead the classifier. Although φ(xa ) is closest to the separating hyperplane, it is also far away from all other samples in feature space and thus may not be representative of the underlying distribution. To avoid querying outliers, one would like to select samples not only based on their informativeness, but also based on representativeness [13]. In our example, selecting sample xb would be a better choice, because it is located in a more densely populated region where a correct classification is of more importance to gain an accurate model.

Figure 5: The white rectangles depict unlabeled samples. The blue circle and the orange triangle are labeled as positive and negative, respectively. In feature space, φ(xa ) lies closer to the separating hyperplane than φ(xb ), but is located in a region, which is not densely populated. Using pure uncertainty sampling, e.g., Simple
Margin, we would query the label of sample xa .

Informativeness is a measure of how much querying a sample would reduce the uncertainty of our model. As we have seen, uncertainty sampling is a viable method to exploit informativeness. Representativeness measures how well a sample represents the underlying distribution of unlabeled data [39]. By using a selection criterion that maximizes both measures, we try to improve our models with less samples than a passive learner while carefully avoiding a model that is too biased. In

Figure 6 we can see a comparison of the different strategies using a toy example where we sequentially query six samples. Figure 6a shows a biased classifier as the result of uncertainty sampling.
While the solution in Figure 6b is closer to the optimal hyperplane, it also converges slower, as it additionally explores regions where we are relatively certain about the labels. Combining both strategies, as shown in Figure 6c, yields a decision boundary that is close to the optimal (Figure 6d) with fewer labels.

(a) Uncertainty sampling.

(b) Selecting representative samples.

(c) Combining informativeness and representativeness.

(d) Optimal hyperplane, obtained by training on the whole dataset.

Figure 6: Binary classification with active learning on six samples and passive learning on the full dataset.


Semi-Supervised Active Learning

To avoid oversampling unrepresentative outliers, we can combine uncertainty sampling and clustering [47]. First, we train an SVM on the labeled samples and then apply k-means clustering to all unlabeled samples within the margin band to identify k groups. Finally, we query the k medoids.
As only samples within the margin band are considered, they are all subject to high uncertainty.
The clustering ensures that the informativeness of our selection is increased by avoiding redundant samples. However, when using clustering, one has to decide what constitutes a cluster [14]. Depending on the scale and the selected number of clusters, different choices could be equally plausible. To avoid this dilemma, we can choose another strategy to incorporate density information [22]. We build on the min-max formulation [21] of active learning and request the sample x = argmin
xs ∈U



ys ∈{−1,+1} w,b

1 w 2



L(f (x), x, y)


where Ls = L ∪ (xs , ys ). We take the minimum regularized expected risk when including the sample xs ∈ U with the label ys that yields the maximum error. Selecting the sample x minimizing
this quantity can be approximated by uncertainty sampling (e.g., using Simple Margin).
In this formulation, however, we base our decision only on the labeled samples and do not take into account the distribution of unlabeled samples. Assuming we knew the labels for each sample in U, we define the set Lsu containing the labeled samples (x, y) for x ∈ U and the training set L. We select the sample x = argmin
xs ∈U




yu ∈{±1}nu −1 ys ∈{−1,+1} w,b

1 w 2



L(f (x), x, y) ,



where nu = |U| and yu is the label vector assigned to the samples x ∈ U without the label for xs .
Thus, we also maximize the representativeness of the selected sample by incorporating the possible labelings of all unlabeled samples. By using a quadratic loss-function and relaxing yu to continuous values, we can approximate the solution through the minimization of a convex problem [22].
Both clustering and label estimation are of high computational complexity. A simpler algorithm, which the authors call Hinted SVM, considers unlabeled samples without resorting to these techniques [24]. Instead, the unlabeled samples are taken as so-called hints that inform the algorithm of feature space regions which it should be less confident in. To achieve this, we try to simultaneously find a decision boundary that produces a low training error on the labeled samples while being close to the unlabeled samples, the hints. This can be viewed as semi-supervised learning [10]. It is, however, in contrast to typical semi-supervised SVM approaches that push the decision boundary away from the pool of unlabeled samples.
The performance of this algorithm depends on the hint selection strategy. Using all unlabeled samples might be too costly for large datasets while uniform sampling of the unlabeled pool does not consider the information provided by labeled samples. Therefore, we can start with the pool of all unlabeled samples and iteratively drop instances that are close to already labeled ones. When the ratio of hints to all samples is below a certain threshold, we can switch to uncertainty sampling.
Another problem with uncertainty sampling is that it assumes our current hypothesis to be very certain about regions far from the decision boundary. If this assumption is violated, we will end up with a classifier worse than obtained using passive learning. One way to address this issue is to measure the uncertainty of our current hypothesis and to adjust our query strategy accordingly [28].
To achieve this, we compute a heuristic measure expressing the confidence that the current set of support vectors will not change if we train on more data. It is calculated as c= 2
|LSV | · k

+ − min(kx , kx ) ,




where LSV are the support vectors and kx and kx are the number of positively and negatively labeled
− samples within the k nearest neighbors of (x, y) ∈ LSV . In the extremes, we get c = 1 if kx = kx
− and c = 0 if kx = 0 ∨ kx = 0 for all (x, y) ∈ LSV . We can use this measure to decide whether a labeled data point (x, y) should be kept for training by adding it with probability

p(x) =

c if yf (x) ≤ 1
1 − c otherwise.


to the training data set. This means that more samples are queried within the margin if we are very confident that the current hyperplane represents the optimal one. In the following, we discuss a related idea, which not only queries samples with a certain probability, but also subsequently incorporates this probability to weight the impact of the training set samples.

Importance-Weighted Active Learning

When we query samples actively instead of selecting them uniformly at random, the training and test samples are not independent and identically distributed (i.i.d.). Thus, the training set will have a sample selection bias. As most classifiers rely on the i.i.d. assumption, this can severely impair the prediction performance.

Assume that we sample the training data points from the biased sample distribution D over X × Y, while our goal is minimizing the risk (1) with respect to the true distribution D. If we know the
relationship between D and D, we can still arrive at an unbiased hypothesis by re-weighting the loss for each sample [11, 49]. We introduce the weighted loss Lw (z, x, y) = w(x, y)L(z, x, y) and define the weighting function pD (x, y) w(x, y) =
pD (x, y)
˜ reflecting how likely it is to observe (x, y) under D compared to D under the assumption that the
˜ This leads us to the basic result support of D is included in D.


Lw (f (x), x, y) =
(x,y)∈X ×Y


pD (x, y)

pD (x, y)
L(f (x), y)d(x, y) pD (x, y)

pD (x, y)L(f (x), y)d(x, y) =
(x,y)∈X ×Y



L(f (x), x, y) . (17)

Thus, by choosing appropriate weights we can modify our loss-function such that we can compute an unbiased estimator of the generalization error R(f ). This technique for addressing the sample selection bias is called importance weighting [3, 4].
We define a weighted sample set Lw as the training set L augmented with non-negative weights w1 , . . . , w for each point in L. These weights are used to set w(xi , yi ) = wi when computing the weighted loss. For the soft margin SVM minimizing the weighted loss can easily be achieved by multiplying each regularization parameter Ci in (4) with the corresponding weight, i.e., Ci = wi · C [49]. While the weighting gives an unbiased estimator, it may be difficult to estimate the weights reliably and the variance of the estimator may be very high. Controlling the variance is a crucial problem when using importance weighting.
The original importance-weighted active learning formulation works in a stream-based scenario and inspects one sample xt at each step t > 1 [3]. Iteration t of the algorithm works as follows:
1. Receive the unlabeled sample xt .
2. Choose pt ∈ [0, 1] based on all information available in this round.
3. With probability pt , query the label yt for xt , add (xt , yt ) to the weighted training set with weight wt = 1/pt , and retrain the classifier.
In step 2 the query probability pt has to be chosen based on earlier observations: this could be, for instance, the probability that two hypotheses disagree on the received sample xt .
This algorithm can also be adapted to the pool-based scenario [16]. In this case, we can simply define a probability distribution over all unlabeled samples in the pool. We set the probability for each point in proportion to its uncertainty, i.e., its distance to the decision boundary. This works well if we assume a noise-free setting. Otherwise, this method suffers from the same problems as other approaches that are based on a version space. Given a mislabeled sample (i.e., the label has a very low probability given the features), the active learner can be distracted and focus on regions within the hypothesis space which do not include the optimal decision boundary.
One way to circumvent these problems is to combine importance weighting with ideas from agnostic active learning [50]. We keep an ensemble of SVMs H = {f1 , ..., fK } and train each on a bootstrap sample subset from L, which may be initialized with a random subset of the unlabeled pool U for which labels are requested. After the initialization, we choose points x ∈ U with selection probability pt (x) = pthreshold + (1 − pthreshold )(pmax (x) − pmin (x)) ,
where pthreshold > 0 is a small minimum probability to ensure that pt (x) > 0. Using Platt’s method [31], we define pi (x) ∈ [0, 1] to be the probabilistic interpretation of an SVM fi with pmin = min1≤i≤K pi (x) and pmax = max1≤i≤K pi (x). Thus, pt is high if there is a strong disagreement within the ensemble and low if all classifiers agree. This allows to deal with noise, because no hypothesis gets excluded forever.


Multi-Class Active Learning

The majority of research on active learning with SVMs focuses on the binary case, because dealing with more categories makes estimating the uncertainty of a sample more difficult. Furthermore, multi-class SVMs are in general more time consuming to train. There are different approaches to extend SVMs to multi-class classification. A popular way is to reduce the learning task to multiple binary problems. This is done by using either a one-vs-one [19, 32] or a one-vs-all [35, 46] approach.
However, performing uncertainty sampling with respect to each single SVM may cause the problem that one sample is informative for one binary task, but bears little information for the other tasks and, thus, for the overall multi-class classification.
It is possible to extend the version space minimization strategy to the multi-class case [43]. The area of the version space is proportional to the probability of classifying the training set correctly, given a hypothesis sampled at random from the version space. In the one-vs-all approach for N classes, we maintain N binary SVM models f1 , . . . , fN where the ith SVM model is trained to separate class i from the other classes. We consider minimizing the maximum product of all N version space areas to select

Λ(Vx,y )

x = argmin max





where Λ(Vx,y ) is the area of the version space of the i-th SVM if the sample (x, y) : x ∈ U was included in the training set. To approximate the area of the version space, we can use MaxMin
Margin for each binary SVM. The margin of a sample in a single SVM only reflects the uncertainty with respect to the specific binary problem and not in relation to the other classification problems.
Therefore, we have to modify our approximation if we want to extend the Simple Margin strategy to the multi-class case.


Figure 7: Single version space for the multi-class problem with N one-vs-all SVMs [43]. The area Λ(Vx,y=i )

corresponds to the version space area if the label y = i for the sample we picked. The area Λ(Vx,y=i ) corresponds to the case where y = i. In the multi-class case, we want to measure both quantities to approximate the version space area.

We can interpret each fi (x) as a quantity that measures to what extent x splits the version space.
As we have N different classifiers that influence the area of the version space, we have to quantify this influence for each one. In Figure 7, we can see the version space for one single binary problem where we want to discriminate between class i and the rest. Thus, we want to approximate the area
Λ(Vx,y=i ). If we choose a sample x where fi (x) = 0, we approximately halve the version space, for fi (x) = 1, the area approximately stays the same and for fi (x) = −1, we gain a zero area;
similarly for the area Λ(Vx,y=i ). Therefore, we can use the approximation
0.5 · (1 + fi (x)) · Λ(V (i) ) if y = i
0.5 · (1 − fi (x)) · Λ(V (i) ) if y = i
We can also employ one-versus-one multi-class SVMs [25]. Again, we use Platt’s algorithm [31] to approximate posterior probabilities for our predictions. We simultaneously fit the probabilistic
Λ(Vx,y ) =


model and the SVM hyperparameters via grid-search to derive a classification probability pk (x), k = 1, . . . , K, for each of the K SVMs given the sample x. We can use these probabilities for active sample selection in different ways. A simple approach is to just select the sample with the least classification confidence. This corresponds to the selection criterion xLC = argmin


pk (x) .



This approach suffers from the same problem we mentioned earlier: the probability is connected only to each single binary problem instead of providing a measure that relates it to the other classifiers. To alleviate this, we can choose another approach, called breaking ties [25]. Here, we select the sample with the minimum difference between the highest class confidences, namely xBT = argmin

min k,l∈{1,...,K},k=l pk (x) − pl (x) .


This way, we prefer samples which two classifiers claim to be certain about, and thus, avoid considering the uncertainty of one classifier in an isolated manner.


Efficient Active Learning

In passive learning, selecting a training set comes almost for free, we just select a random subset of the labeled data. In active learning, we have to evaluate whether a sample should be added to the training set based on its ability to improve our prediction. A simple strategy to reduce computational complexity is to not only select single samples, but to collect them in batches instead. However, to profit from this strategy, we have to make sure to create batches that minimize redundancy.
Another consideration that makes efficient computation necessary is that for many algorithms we have to retrain our model on different training subsets. Usually, these subsets differ only by one or the few samples we consider for selection. Thus, we can employ online learning to train our model incrementally. In particular, this makes sense if we analyze the effect that adding a sample has on our model.

Online Learning

After we have selected a sample to be included in the training set, we have to retrain our model to reflect the additional data. Using the whole dataset for retraining is computationally expensive. A better option would be to incrementally improve our model with each selected sample through online learning. LASVM [5, 17] is an SVM solver for fast online training. It is based on a decomposition method [30] solving the learning problem iteratively by considering only a subset of the α-variables in each iteration. In LASVM, this subset considers one unlabeled sample (corresponding to a new variable) in every second iteration, which can, for instance, be picked by uncertainty sampling [5].

Batch-Mode Active Learning

When confronted with large amounts of unlabeled data, estimating the effect of single samples with respect to the learning objective is costly. Besides online learning, we can also gain a speed-up by labeling samples in batches. A naive strategy is to just select the n samples that are closest to the decision boundary [44]. This approach, however, does not take into account that the samples within the batch might bear a high level of redundancy.
To counteract this redundancy, we can select samples not only due to their individual informativeness, but also if they maximize the diversity within each batch [7]. One heuristic is to maximize the angle between the hyperplanes that the samples induce in version space:
| cos(∠(φ(xi ), φ(xj ))| =

| φ(xi ), φ(xj ) |
φ(xi ) φ(xj )

|κ(xi , xj )| κ(xi , xi )κ(xj , xj )


Let S be the batch of samples, which we initialize with one sample xS . We subsequently add the sample x whose corresponding hyperplane minimizes the maximum angle between any other
hyperplane induced by a sample in the batch. It is computed as x = argmin max | cos(∠(φ(x), φ(z))| .
x∈U \S z∈S



We can form a convex combination of this diversity measure with the well-known uncertainty measure (distance to the hyperplane) with trade-off parameter λ ∈ [0, 1]. Then, samples that should be added to the batch are iteratively chosen as x = argmin λ|f (x)| + (1 − λ) max | cos(∠(φ(x), φ(z))|

x∈U \S



We can choose λ = 0.5 to give equal weight to the uncertainty and diversity measure. An optimal value, however, might depend on how certain we are about the accuracy of the current classifier.



Access to unlabeled data allows us to improve predictive models in data mining applications. If it is only possible to label a limited amount of the available data due to labeling costs, we should choose this subset carefully and focus on patterns carrying the information most helpful to enhance the model. Support vector machines (SVMs) have convenient properties that make it easy to evaluate how unlabeled samples would influence the model if they were labeled and included in the training set. Therefore, SVMs are particularly well-suited for active learning. However, there are several challenges we have to address, such as efficient learning, dealing with multiple classes, and that actively choosing the training data introduces a selection bias. Importance weighting seems to be most promising to counteract this bias, and it can be easily incorporated into an active SVM learner.
Devising parallel algorithms for sample selection can speed up learning in many cases. Most of the research in active SVM learning so far has focused on binary decision problems. A challenge for future research is to develop efficient active learning algorithms for multi-class SVMs that address the nature of the multi-class decision in a more principled way.
The authors gratefully acknowledge support from The Danish Council for Independent Research through the project SkyML (FNU 12-125149).
[1] N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical
Society, 68(3):337–404, 1950.
[2] M.-F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 65–72. ACM Press, 2006.
[3] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 49–56. ACM
Press, 2009.
[4] A. Beygelzimer, J. Langford, Z. Tong, and D. Hsu. Agnostic active learning without constraints. In Advances in Neural Information Processing Systems (NIPS), pages 199–207. MIT
Press, 2010.
[5] A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research (JMLR), 6:1579–1619, 2005.
[6] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In
Workshop on Computational Learning Theory (COLT), pages 144–152. ACM Press, 1992.
[7] K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of the International Conference on Machine Learning (ICML), pages 59–66. AAAI
Press, 2003.
[8] K. Brinker. Active learning of label ranking functions. In Proceedings of the International
Conference on Machine Learning (ICML), pages 129–136. ACM Press, 2004.
[9] C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In
Proceedings of the International Conference on Machine Learning (ICML), pages 111–118.
Morgan Kaufmann, 2000.

[10] O. Chapelle, B. Sch¨ lkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, 2006. o [11] C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh. Sample selection bias correction theory.
In N. Bshouty, G. Stoltz, N. Vayatis, and T. Zeugmann, editors, Algorithmic Learning Theory, pages 38–53. Springer, 2008.
[12] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[13] S. Dasgupta. Two faces of active learning. Theoretical Computer Science, 412(19):1767–1781,
[14] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the
International Conference on Machine Learning (ICML), pages 208–215. ACM Press, 2008.
[15] B. Demir and L. Bruzzone. A novel active learning method for support vector regression to estimate biophysical parameters from remotely sensed images. In L. Bruzzone, editor, Proceedings of SPIE 8537, Image and Signal Processing for Remote Sensing XVIII, volume 8537, page 85370L. International Society for Optics and Photonics, 2012.
[16] R. Ganti and A. Gray. Upal: Unbiased pool based active learning. In Proceedings of the
International Conference on Artificial Intelligence and Statistics (AISTATS), pages 422–431,
[17] T. Glasmachers and C. Igel. Second-order SMO improves SVM online and active learning.
Neural Computation, 20(2):374–382, 2008.
[18] I. Guyon, G. Cawley, G. Dror, and V. Lemaire. Results of the active learning challenge. Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings, 16:19–45,
[19] T. Hastie and R. Tibshirani.
26(2):451–471, 1998.

Classification by pairwise coupling.

Annals of Statistics,

[20] C.-H. Ho, M.-H. Tsai, and C.-J. Lin. Active learning and experimental design with SVMs.
Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings,
16:71–84, 2011.
[21] S. Hoi, R. Jin, J. Zhu, and M. Lyu. Semi-supervised SVM batch mode active learning for image retrieval. In Proceedings of the Conference on Computer Vision and Pattern Recognition
(CVPR), pages 1–7. IEEE, 2008.
[22] S.-J. Huang, R. Jin, and Z.-H. Zhou. Active learning by querying informative and representative examples. In Advances in Neural Information Processing Systems (NIPS), pages 892–900.
MIT Press, 2010.
[23] D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proceedings of the SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages
3–12. ACM Press, 1994.
[24] C.-L. Li, C.-S. Ferng, and H.-T. Lin. Active learning with hinted support vector machine. Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings, 25:221–
235, 2012.
[25] T. Luo, K. Kramer, D. Goldgof, L. Hall, S. Samson, A. Remsen, and T. Hopkins. Active learning to recognize multiple types of plankton. In Proceedings of the International Conference on Pattern Recognition (ICPR), volume 3, pages 478–481. IEEE, 2004.
[26] A. Mammone, M. Turchi, and N. Cristianini. Support vector machines. Wiley Interdisciplinary
Reviews: Computational Statistics, 1(3):283–289, 2009.
[27] T. M. Mitchell. Generalization as search. Artificial intelligence, 18(2):203–226, 1982.
[28] P. Mitra, C. Murthy, and S. Pal. A probabilistic active support vector learning algorithm.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(3):413–418, 2004.
[29] F. Olsson. A literature survey of active machine learning in the context of natural language processing. Technical report, Swedish Institute of Computer Science, 2009.
[30] J. Platt. Fast training of support vector machines using sequential minimal optimization. In
B. Sch¨ lkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector o Learning, chapter 12, pages 185–208. MIT Press, 1999.

[31] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In A. Smola, P. Bartlett, B. Sch¨ lkopf, and D. Schuurmans, editors, Advances o in Large Margin Classifiers, pages 61–74. MIT Press, 1999.
[32] J. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification.
Advances in Neural Information Processing Systems (NIPS), 12(3):547–553, 2000.
[33] J. Quionero-Candela, M. Sugiyama, A. Schwaighofer, and N. Lawrence. Dataset Shift in
Machine Learning. MIT Press, 2009.
[34] J. W. Richards, D. L. Starr, H. Brink, A. A. Miller, J. S. Bloom, N. R. Butler, J. B. James,
J. P. Long, and J. Rice. Active learning to overcome sample selection bias: Application to photometric variable star classification. The Astrophysical Journal, 744(2):192, 2012.
[35] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning
Research (JMLR), 5:101–141, 2004.
[36] S. Salcedo-Sanz, J. L. Rojo-Alvarez, M. Mart´nez-Ram´ n, and G. Camps-Valls. Support vector ı o machines in engineering: an overview. Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 4(3):234–267, 2014.
[37] G. Schohn and D. Cohn. Less is more: Active learning with support vector machines. In
Proceedings of the International Conference on Machine Learning (ICML), pages 839–846.
Morgan Kaufmann, 2000.
[38] B. Sch¨ lkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, o Optimization, and Beyond. MIT Press, 2002.
[39] B. Settles. Active Learning. Morgan & Claypool, 2012.
[40] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[41] X. Shi, W. Fan, and J. Ren. Actively transfer domain knowledge. In Proceedings of the
European Conference on Machine Learning and Knowledge Discovery in Databases (ECML
PKDD), pages 342–357. Springer, 2008.
[42] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
[43] S. Tong. Active learning: Theory and applications. PhD thesis, Stanford University, 2001.
[44] S. Tong and E. Chang. Support vector machine active learning for image retrieval. In Proceedings of the International Conference on Multimedia (MM), pages 107–118. ACM Press,
[45] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research (JMLR), 2:45–66, 2002.
[46] V. Vapnik. Statistical Learning Theory. John Wiley and Sons, 1998.
[47] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang. Representative sampling for text classification using support vector machines. In Proceedings of the European Conference on Information
Retrieval (ECIR), pages 393–407. Springer, 2003.
[48] H. Yu. Svm selective sampling for ranking with application to data retrieval. In Proceedings of the International Conference on Knowledge Discovery in Data Mining (SIGKDD), pages
354–363. ACM Press, 2005.
[49] B. Zadrozny, J. Langford, and N. Abe. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the International Conference on Data Mining (ICDM), pages
435–442. IEEE, 2003.
[50] L. Zhao, G. Sukthankar, and R. Sukthankar. Importance-weighted label prediction for active learning with noisy annotations. In Proceedings of the International Conference on Pattern
Recognition (ICPR), pages 3476–3479. IEEE, 2012.


Similar Documents

Premium Essay


...MANAGEMENT DEVELOPMENT INSTITUTE OF SINGAPORE TASHKENT INTERNATIONAL TRADE FINANCE – INDIVIDUAL ASSIGNMENT Course Module Lecturer Due Date Weightage : Bsc (Hons) in Banking and Finance : International Trade Finance : Ms Ratna Devi : 5 November 2012 : 30% Assignment —Individual This assignment for the module International Trade Finance carries 30% of the overall assessment grade. Case under Observation: This is a case study of a U.S. company called Cossco, Inc, depicted in different scenarios 1 through 4, carrying a maximum of 70 marks. Each scenario is presented in a way that encapsulates the topics in your syllabus for International Trade Finance. Students are advised to closely read each scenario and understand the issues faced by the CFO and financial analyst. Recommendations are to be given in a logical and concise manner. __________________________________________________________ Scenario 1 20 Marks Cossco, Inc is a U.S based company that has been incorporated in the United States for three years. It’s a small company with total assets worth $300 million. The company produces a single type of product, golf clubs. Cossco during the boom time, has been quite successful. However, the demand for “IRONS”, the company’s primary product in the United States, has been slowly decreasing since last year. Cossco’s shareholders have been pressuring the company to improve its performance. Cossco produces high-quality golf clubs and employs a unique production......

Words: 1782 - Pages: 8

Free Essay


...ASSIGNMENT ON MANAGEMENT OF TECHNOLOGY RAJESWARI.G ID NO. 2010HZ58075 ASSIGNMENT ON In your opinion, what are some important criteria which the firms should take to increase the quality and productivity of their products and service? Assignment on Productivity Improvement Page 2 Important criteria which the firms should take to increase the quality and productivity of their products and services There are few things to be considered for a Firm to increase the market value/productivity by improving the quality of the product, This is generally can be achieved, By improving the existing product, By introducing new products, Product launching / Advertising the same By improving the services for the newly or existing products, Immediate response for the complaints , querries Cost effective , easy accessible products etc., Let we discuss the points in detail, 4 Quick Steps to Improve an Existing Product You may have a niche marketing website that just isn’t producing sales for you at the rate at which you had hoped it would…..or maybe it isn’t producing any income for you at all or it could be that you haven’t actually figured out that what you are selling is, in fact, a niche market product. You might need to do a little ‘tweaking’ and modify your strategies somewhat to get the site performing better. There really are some things that you can do to improve your existing product. Step #1: Bill Cosby, the famous entertainer, once said, “I don’t know......

Words: 5382 - Pages: 22

Premium Essay


...© Deakin University MPE781/981 ASSIGNMENT, TRIMESTER 3, 2012 ECONOMICS FOR MANAGERS T3.2012 Assignment Due date: Nature: Assignment Overview: Monday, January 28, 2013. Individual assignment. This assignment is partly based on an article published in The Australian on April 26, 2012 entitled “Poor bear brunt of ‘nanny taxes’” by Adam Creighton. The article can be downloaded via the library database Newsbank: For your convenience, the article is also attached to this assignment. Please read the article carefully before attempting the questions. You will be required to demonstrate your understanding of concepts taught in the unit and relate them to the case in the article. This assignment is designed to encourage you to think about the application of concepts learned in this unit to real world scenarios. Although you can work in groups, this is not a group assignment and you must submit answers individually. You will be graded on your use of appropriate economic theory and concepts, clarity of exposition and overall quality of your answers. Your answers should follow “Guide to assignment writing and referencing”, available at this link: Answer all questions. Limit the total word count of your assignment to less than 3,000 words. Depth is encouraged over breadth: that is, it is more important that you demonstrate you......

Words: 2372 - Pages: 10

Premium Essay


...FST-01 ASSIGNMENT BOOKLET Foundation Course in Science and Technology Bachelor’s Degree Programme (BDP) (Valid from 1st July, 2013 to 31st March, 2014) It is compulsory to submit the assignment before filling in the exam form School of Sciences Indira Gandhi National Open University New Delhi (2013-14) Dear Student, We hope, you are familiar with the system of evaluation to be followed for the Bachelor's Degree Programme. At this stage you may probably like to re-read the section on assignments in the Programme Guide that was sent to you after your enrolment. A weightage of 30 percent, as you are aware, has been earmarked for continuous evaluation, which would consist of one tutor-marked assignment for this course. This assignment is based on all Blocks of this course i.e. Block 1-8. Instructions for Formatting Your Assignments Before attempting the assignments, please read the following instructions carefully: 1. On top of the first page of your answer sheet, please write the details exactly in the following format: ENROLMENT NO.: …………………... NAME: …………………........................ ADDRESS: …………………................. ……………………………………… COURSE CODE COURSE TITLE : ………………………………. : ……………………………….. ASSIGNMENT NO. : ………………………………... STUDY CENTRE : ……………………………..… (NAME AND CODE) PLEASE FOLLOW THE ABOVE FORMAT STRICTLY TO FACILITATE EVALUATION AND TO AVOID DELAY. 2. 3. 4. 5. 6. Use only foolscap size writing paper (but not of very thin variety) for writing your......

Words: 760 - Pages: 4

Premium Essay


...ASSIGNMENT MARKETING MANAGEMENT |CASE |INDIVIDUAL ASSIGNMENT |GROUP ASSIGNMENT | | |(Please select 1 question for your individual |(Please select 2 questions for your group assignment | | |assignment regardless how many questions are given in |regardless how many questions are given in each case) | | |each case) | | | |Submission deadline: JANUARY 4TH, 2014 | | |Submission to: ANHDANGLUCKY@GMAIL.COM | |ZENITH |What would you do to improve the reliability of the |What managerial and research problems that face Zenith?| | |market research, if you were the top management of the | | | |company? |Where could Zenith get the relevant information? | | |What would the best way be to discover the market |What would you do if you were Pearlman? | | |demand of the new product of Zenith? | ...

Words: 693 - Pages: 3

Free Essay


...FACULTY OF EDUCATION AND LANGUAGES OUMH1303: ENGLISH FOR ORAL COMMUNICATION SEMESTER : JANUARY 2010 COURSE ASSIGNMENT (35%) INSTRUCTIONS 1. The assignment would be evaluated on the basis of the accuracy of the answers given and the credibility of the supporting arguments, data and references. 2. Type your assignment using “Times New Roman”, font size 12 with 1.5 line spacing on A4 size paper. Your assignment must be submitted to your tutor before or by the 4th tutorial. 3. 4. Your assignment should be limited to 10 – 12 pages. 5. This assignment is worth 35 % of your final course grade. 6. Plagiarism in any form is prohibited. Plagiarised materials would not be accepted and zero (0) marks would be awarded for the work. 1 ASSIGNMENT QUESTION You are the president of the Parent-Teacher Association (PTA) at an urban school. At the last association meeting, many parents expressed their concern about the poor performance of their children, particularly in Mathematics, Science and the English Language. They felt that the school should work harder towards improving the teaching and learning of these subjects. The PTA could assist but the association does not have enough funds (money) to carry out its projects for the school. You wish to speak about this problem and suggest some solutions at the forthcoming meeting. (a) Which of the following speech types will best describe your speech: informative, persuasive, negotiation, or argumentative......

Words: 626 - Pages: 3

Premium Essay


...FINAL ASSIGNMENT |Programme Title |Edexcel BTEC Level 5 HND Diploma in Business (QCF) | |Unit Title |Marketing Principles | |Unit Code |F/601/0556 | |Assignment No |01 | |Level |Level-5 HND | |Credit value |15 credits | |Assessor | | |Deliverer | | |Handout Date | | |Hand in Date |31/07/2014 | Assignment Title: Making Marketing Decisions You have been......

Words: 1867 - Pages: 8

Premium Essay


...ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD (Department of Business Administration) Course: Human Resource Management (5532) Level: MBA Semester: Autumn, 2010 CHECKLIST This packet comprises the following material: 1) 2) 3) 4) 5) Note: Text book Assignments # 1 & 2 Course outlines Assignment 6 forms (2 sets) Assignment submission schedule In this packet, if you find anything missing out of the above-mentioned material, please contact at the address given below: The Mailing Officer Mailing Section, Block # 28 Allama Iqbal Open University, Sector H/8, Islamabad. Tel: (051) 9057611, 9057612 Mohammad Majid Mahmood Bagram Course Coordinator ALLAMA IQBAL OPEN UNIVERSITY, ISLAMABAD (Department of Business Administration) WARNING 1. 2. PLAGIARISM OR HIRING OF GHOST WRITER(S) FOR SOLVING THE ASSIGNMENT(S) WILL DEBAR THE STUDENT FROM AWARD OF DEGREE/CERTIFICATE, IF FOUND AT ANY STAGE. SUBMITTING ASSIGNMENTS BORROWED OR STOLEN FROM OTHER(S) AS ONE’S OWN WILL BE PENALIZED AS DEFINED IN “AIOU PLAGIARISM POLICY”. Course: Human Resource Management (5532) Level: MBA Semester: Autumn, 2010 Total Marks: 100 Pass Marks: 40 ASSIGNMENT No. 1 (Units: 1–4) Q. 1 Why HR is called the most important asset and competitive advantage of any organization in the world? (20) Your Solutions 2 Helping Material HR and Competitive Advantage In order to have an effective competitive strategy, the company must have one or more competitive advantage, factors that allow......

Words: 5443 - Pages: 22

Free Essay


...and uniform specifications. 1046.2 WEARING AND CONDITION OF UNIFORM AND EQUIPMENT Police employees wear the uniform to be identified as the law enforcement authority in society. The uniform also serves an equally important purpose to identify the wearer as a source of assistance in an emergency, crisis or other time of need. (a) Uniform and equipment shall be maintained in a serviceable condition and shall be ready at all times for immediate use. Uniforms shall be neat, clean, and appear professionally pressed. All peace officers of this department shall possess and maintain at all times, a serviceable uniform and the necessary equipment to perform uniformed field duty. Personnel shall wear only the uniform specified for their rank and assignment. The uniform is to be worn in compliance with the specifications set forth in the department's uniform specifications that are maintained separately from this policy. All supervisors will perform periodic inspections of their personnel to ensure conformance to these regulations. Civilian attire shall not be worn in combination with any distinguishable part of the uniform. Uniforms are only to be worn while on duty, while in transit to or from work, for court, or at other official department functions or events. If the uniform is worn while in transit, an outer garment shall be worn over the uniform shirt so as not to bring attention to the employee while he/she is off duty. Employees are not to purchase or drink alcoholic beverages......

Words: 2063 - Pages: 9

Premium Essay


...SUNWAY COLLEGE JOHOR BAHRU DIPLOMA IN BUSINESS ADMINISTRATION COURSEWORK / ASSIGNMENT (GROUP) Module Code Module Title Semester Issue Date Due Date Lecturer : BMGT 0304 : HUMAN RESOURCE MANAGEMENT : MARCH-JUNE 2015 : WEEK 2 : WEEK 6 (30th April 2015) : ANTHONY WONG INSTRUCTIONS 1. 2. 3. 4. 5. 6. There are SEVEN (7) pages in this assignment including the cover page. The assignment must be completed in groups as per instruction. The submitted assignment must include the Assignment Cover Page. References must be acknowledged accordingly. Plagiarism/cheating will result in the assignment being marked FAIL. The assignment must be submitted in hardcopy (printed) format and presented. IMPORTANT Assignments must be submitted on their due dates. If an assignment is submitted after its due date, the following penalisation will be imposed: ● ● ● One to two days late Three to five days late More than five days late 20% deducted from the total assignment marks 40% deducted from the total assignment marks Assignment will not be marked. 1 INTRODUCTION This assignment is a partial fulfillment of requirements leading to Diploma in Hotel Management/Business Admin for students taking a subject in Human Resource Management. The assignment will be done by students in suitable group size which approved by the lecturer. PURPOSES The purposes of this assignment are to assess a student’s ability to: 1. Understand the basic concepts or theories learned in the subject matter.......

Words: 1354 - Pages: 6

Premium Essay

Assignment water feature. Please ensure that you address all the criteria contained within the ‘Assessment Task 1 Marking Sheet’. This assignment will be marked generally in accordance with the Marking Sheet where marks are deducted for non-conformities. Please be aware that simply mentioning the marking criteria/addressing it in a half-sentence or similar does not guarantee full marks; the thoroughness and completeness of how the marking criteria are addressed will determine how many marks for each separate criterion will be awarded; this can be either the full mark, or parts thereof. Furthermore, it is vital that you familiarise yourself with precisely what a Risk Assessment and Risk Treatment incorporate for ISO 31,000 – due to the word count limit, it is advisable to not deviate from the task given, while meeting the marking criteria. Word count limit: The body of this assignment will be in the range of 3000 to 5000 words, excluding any Appendices. You may need to simplify and define the boundaries carefully in order to achieve the word limit. A single hard copy will be submitted in class. The assignment must also be uploaded to the 49006 Turnitin folder within UTSOnline before the due date. Please make sure to submit the complete assignment including Cover Page, Table of Contents, Reference list and Appendices. Emailed assignments will not be accepted. Be aware that several students have fallen foul, in previous semesters, of the sophisticated systems in......

Words: 831 - Pages: 4

Free Essay


...Assignment front sheet Qualification Unit number and title Pearson BTEC Level 5 HND Diploma Business Unit 1: Business Environment Learner name Assessor name Nour Hawarneh Date issued Completion date Submitted on Nov 08, 2015 Jan 18, 2016 Assignment title Your company’s environment LO2 LO3 Assessment Criteria In this assessment you will have the opportunity to present evidence that shows you are able to: 1.1 organisational purposes of businesses Identify the purposes of different types of organisation 1 1.2 Describe the extent to which an organisation meets the objectives of different stakeholders 1 1.3 LO1 Learning outcome Understand the Learning Outcom e Explain the responsibilities of an organisation and strategies employed to meet them 2.1 Explain how economic systems attempt to allocate resources effectively 2 2.2 Assess the impact of fiscal and monetary policy on business organisations and their activities Evaluate the impact of competition policy and other regulatory mechanisms on the activities of a selected organisation 2 Understand the nature of the national environment in which businesses operate Understand the behaviour for oganisations in their market environment 2.3 3.1 LO4 2 Illustrate the way in which market forces shape organisational responses using a range of examples Judge how the business and cultural environments shape the behaviour of a selected organisation 3.3 Be able to assess the significance of the......

Words: 1690 - Pages: 7

Premium Essay


...|Assignment brief – QCF BTEC (L3 ONLY) | |Assignment front sheet | |Qualification |Unit number and title | |BTEC L3 Diploma/Ext. Dipl. – Business |UNIT 1 – BUSINESS ENVIRONMENT | |Learner name | Assessor name | | |MARY EC ZAFRA | |Date issued | Hand in deadline |Submitted on | |14 OCTOBER 2015 | 15 November 2015 |18NOV2015 | | | | |Assignment No. & title |Assignment 1/2 - The Businesses We See | |In this assessment you will have opportunities to provide......

Words: 952 - Pages: 4

Premium Essay


...Application Exercise (Assignment to be submitted) (90 min.) (Not exceeding five pages) |Apply the five forces analysis to your company/division and assess the attractiveness of your industry. | | |Compare the industry attractiveness five years back and today due to the shift in the forces. | | |Guidelines for the assignment | | | | | |Brief introduction of your company, its product portfolio and the markets/segments it caters to. If it | | |is in multiple industries, choose any one industry for the purpose of this assignment. (refer to 2008 | | |HBS note for definition of industry) | | | | | |Consider each threat individually. Take each factor in it and explain its role and significance in your | | |industry. Rate its effect based on the above explanation. (template if shown in class may be used for | | |structured approach but the spirit of the analysis matters more than......

Words: 254 - Pages: 2

Premium Essay


...and analysed to present their financial standing at the end of the year. Comparing the relationships of these ratios reveals that Orica decreased their liquidity, but suffered lower profitability with heavy influences from Minova’s impairment of goodwill. With high asset utilisation and stable efficiency, Orica should focus on improving their maintenance and reliability by addressing the Kooragang plant shutdown. Furthermore, Orica is financing more with debt than equity which introduces some risk to the company given their higher expenses for 2012. Finally, the investment ratios indicate that Orica could be poised for high growth with a stable return, but should first focus on maximising their plants and equipment. Sample Assignment: Part of the content removed II TABLE OF CONTENTS Executive Summary ............................................................................................................................. II Table of Contents ............................................................................................................................... III A. List of Tables ...............................................................................................................................IV 1. Introduction ................................................................................................................................... 1 2. Ratios....................................................................

Words: 2155 - Pages: 9