The Learning-Curve Sampling Method Applied to Model-Based Clustering

Meek, Christopher; Thiesson, Bo; Heckerman, David
June 2002
Journal of Machine Learning Research;Summer2002, Vol. 2 Issue 3, p397
Academic Journal
We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computational cost of learning a model increases as a function of the sample size of the training data, whereas the accuracy of a model has diminishing improvements as a function of sample size. Thus, the learning-curve sampling method monitors the increasing costs and performance as larger and larger amounts of data are used for training, and terminates learning when future costs outweigh future benefits. In this paper, we formalize the learning-curve sampling method and its associated cost-benefit tradeoff in terms of decision theory. In addition, we describe the application of the learning-curve sampling method to the task of model-based clustering via the expectation-maximization (EM) algorithm. In experiments on three real data sets, we show that the learning-curve sampling method produces models that are nearly as accurate as those trained on complete data sets, but with dramatically reduced learning times. Finally, we describe an extension of the basic learning-curve approach for model-based clustering that results in an additional speedup. This extension is based on the observation that the shape of the learning curve for a given model and data set is roughly independent of the number of EM iterations used during training. Thus, we run EM for only a few iterations to decide how many cases to use for training, and then run EM to full convergence once the number of cases is selected.
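
The stopping idea in the abstract can be illustrated with a minimal sketch. It is not the paper's decision-theoretic criterion: it substitutes a simple rule that stops once the score gained per additional training case drops below a fixed threshold. The names `learning_curve_sampling`, `fit_and_score`, `schedule`, and `min_gain_per_case` are hypothetical, and the learning curve in the demo is synthetic rather than produced by EM.

```python
from typing import Callable, List, Sequence, Tuple


def learning_curve_sampling(
    fit_and_score: Callable[[int], float],
    schedule: Sequence[int],
    min_gain_per_case: float,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Grow the training sample along `schedule` and stop when the
    marginal benefit of more data no longer justifies its cost.

    fit_and_score(n): hypothetical callback that trains on n cases and
        returns a held-out score (higher is better).
    min_gain_per_case: assumed stand-in for the cost-benefit tradeoff,
        the smallest score gain per extra case worth paying for.
    """
    history: List[Tuple[int, float]] = []
    for n in schedule:
        score = fit_and_score(n)
        history.append((n, score))
        if len(history) >= 2:
            prev_n, prev_score = history[-2]
            marginal_gain = (score - prev_score) / (n - prev_n)
            if marginal_gain < min_gain_per_case:
                break  # future cost is judged to outweigh future benefit
    return history[-1][0], history


def synthetic_score(n: int) -> float:
    # Synthetic curve with diminishing returns, standing in for a
    # held-out log-likelihood after a few training iterations.
    return -1.0 - 50.0 / n


chosen_n, history = learning_curve_sampling(
    synthetic_score,
    schedule=[100, 200, 400, 800, 1600, 3200],
    min_gain_per_case=1e-4,
)
print(chosen_n, len(history))
```

For the extension described in the abstract, `fit_and_score` would run EM for only a few iterations while the sample size is being chosen, and a final fit on `chosen_n` cases would then run EM to convergence.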


