I am a research fellow funded by an EPSRC Doctoral Prize. My research is centred around the Algorithm Selection Problem and its applications in Artificial Intelligence. Read more in my research proposal.
Before that, I was a PhD student working on the Dominion project, which I am still involved with. My work was supervised by Ian Miguel and Ian Gent. My external SICSA supervisor was Derek Long. I was supported by a Scottish Informatics and Computer Science Alliance (SICSA) scholarship. The final version of my thesis can be found here (PDF).
I am a member of the School of Computer Science and the Centre for Interdisciplinary Research in Computational Algebra (CIRCA).
Have a look at my overview of the Algorithm Selection literature!
The library I use for creating CSS3 presentations is available as a github project here.
I have written a visualiser for Minion. It shows the domains of the variables of the problem in a matrix during search. It is included with Minion from version 0.8. An animation for Minion solving the magic square problem for a square of size 4 is here (large gif animation).
I have written a patch for RWeka to allow instance weights to be specified. It adds the new parameter "weight", which takes a list of weights for the instances as an argument, to the existing methods. I have tried contacting the package maintainer, but failed. Patch against version 0.4-1.
Machine learning is an established method of selecting algorithms to solve hard search problems. Despite this, to date no systematic comparison and evaluation of the different techniques has been performed and the performance of existing systems has not been critically compared with other approaches. We compare the performance of a large number of different machine learning techniques from different machine learning methodologies on five data sets of hard algorithm selection problems from the literature. In addition to well-established approaches, for the first time we also apply statistical relational learning to this problem. We demonstrate that there is significant scope for improvement both compared with existing systems and in general. To guide practitioners, we close by giving clear recommendations as to which machine learning techniques are likely to achieve good performance in the context of algorithm selection problems. In particular, we show that linear regression and alternating decision trees have a very high probability of achieving better performance than always selecting the single best algorithm.
Andreas Distler, Christopher Jefferson, Tom Kelsey, Lars Kotthoff, The Semigroups of Order 10, 18th International Conference on Principles and Practice of Constraint Programming, Québec, Canada, October 2012
The number of finite semigroups increases rapidly with the number of elements. Since existing enumeration formulae do not give the complete number of semigroups of given order up to symmetric equivalence, the remainder can only be found by careful search. We describe the use of mathematical results combined with distributed Constraint Satisfaction to show that the number of non-equivalent semigroups of order 10 is 12,418,001,077,381,302,684. This solves a previously open problem in Mathematics, and has directly led to improvements in Constraint Satisfaction technology.
Many state of the art Algorithm Selection systems use Machine Learning to either predict the run time or a similar performance measure of each of a set of algorithms and choose the algorithm with the best predicted performance or predict the best algorithm directly. We present a technique based on the well-established Machine Learning technique of stacking that combines the two approaches into a new hybrid approach and predicts the best algorithm based on predicted run times. We demonstrate significant performance improvements of up to a factor of six compared to the previous state of the art. Our approach is widely applicable and does not place any restrictions on the performance measure used, the way to predict it or the Machine Learning used to predict the best algorithm. We investigate different ways of deriving new Machine Learning features from the predicted performance measures and evaluate their effectiveness in increasing performance further. We use five different regression algorithms for performance prediction on five data sets from the literature and present strong empirical evidence that shows the effectiveness of our approach.
Lars Kotthoff, On Algorithm Selection, with an Application to Combinatorial Search Problems, PhD dissertation, St Andrews, 2012
The Algorithm Selection Problem is to select the most appropriate way
for solving a problem given a choice of different ways. Some of the most
prominent and successful applications come from Artificial Intelligence
and in particular combinatorial search problems. Machine Learning has
established itself as the de facto way of tackling the Algorithm
Selection Problem. Yet even after a decade of intensive research, there
are no established guidelines as to what kind of Machine Learning to use
This dissertation presents an overview of the field of Algorithm Selection and associated research and highlights the fundamental questions left open and problems facing practitioners. In a series of case studies, it underlines the difficulty of doing Algorithm Selection in practice and tackles issues related to this. The case studies apply Algorithm Selection techniques to new problem domains and show how to achieve significant performance improvements. Lazy learning in constraint solving and the implementation of the alldifferent constraint are the areas in which we improve on the performance of current state of the art systems. The case studies furthermore provide empirical evidence for the effectiveness of using the misclassification penalty as an input to Machine Learning.
After having established the difficulty, we present an effective technique for reducing it. Machine Learning ensembles are a way of reducing the background knowledge and experimentation required from the researcher while increasing the robustness of the system. Ensembles do not only decrease the difficulty, but can also increase the performance of Algorithm Selection systems. They are used to much the same ends in Machine Learning itself.
We finally tackle one of the great remaining challenges of Algorithm Selection — which Machine Learning technique to use in practice. Through a large-scale empirical evaluation on diverse data taken from Algorithm Selection applications in the literature, we establish recommendations for Machine Learning algorithms that are likely to perform well in Algorithm Selection for combinatorial search problems. The recommendations are based on strong empirical evidence and additional statistical simulations.
The research presented in this dissertation significantly reduces the knowledge threshold for researchers who want to perform Algorithm Selection in practice. It makes major contributions to the field of Algorithm Selection by investigating fundamental issues that have been largely ignored by the research community so far.
Dharini Balasubramaniam, Christopher Jefferson, Lars Kotthoff, Ian Miguel, Peter Nightingale, An Automated Approach to Generating Efficient Constraint Solvers, 34th International Conference on Software Engineering, Zurich, Switzerland, June 2012
Combinatorial problems appear in numerous settings, from timetabling to
industrial design. Constraint solving aims to find solutions to such
problems efficiently and automatically. Current constraint solvers are
monolithic in design, accepting a broad range of problems. The cost of
this convenience is a complex architecture, inhibiting efficiency,
extensibility and scalability. Solver components are also tightly
coupled with complex restrictions on their configuration, making
automated generation of solvers difficult.
We describe a novel, automated, model-driven approach to generating efficient solvers tailored to individual problems and present some results from applying the approach. The main contribution of this work is a solver generation framework called Dominion, which analyses a problem and, based on its characteristics, generates a solver using components chosen from a library. The key benefit of this approach is the ability to solve larger and more difficult problems as a result of applying finer-grained optimisations and using specialised techniques as required.
Lars Kotthoff, Ian Gent, Ian Miguel, A Preliminary Evaluation of Machine Learning in Algorithm Selection for Search Problems, Fourth Annual Symposium on Combinatorial Search, Barcelona, Spain, July 2011
Best Student Paper Award
The results presented for SATzilla are flawed because of a technical issue. Please have a look at the corrected results.
Machine learning is an established method of selecting algorithms to solve hard search problems. Despite this, to date no systematic comparison and evaluation of the different techniques has been performed and the performance of existing systems has not been critically compared to other approaches. We compare machine learning techniques for algorithm selection on real-world data sets of hard search problems. In addition to well-established approaches, for the first time we also apply statistical relational learning to this problem. We demonstrate that most machine learning techniques and existing systems perform less well than one might expect. To guide practitioners, we close by giving clear recommendations as to which machine learning techniques are likely to perform well based on our experiments.
Please note that this version fixes erroneous figure labels and captions in the published version.
We present preliminary results of an investigation into the suitability of virtualised hardware — in particular clouds — for running computational experiments. Our main concern was that the reported CPU time would not be reliable and reproducible. The results demonstrate that while this is true in cases where many virtual machines are running on the same physical hardware, there is no inherent variation introduced by using virtualised hardware compared to non-virtualised hardware.
Ian Gent, Chris Jefferson, Lars Kotthoff, Ian Miguel, Modelling Constraint Solver Architecture Design as a Constraint Problem, Annual ERCIM Workshop on Constraint Solving and Constraint Logic Programming, York, UK, April 2011
Designing component-based constraint solvers is a complex problem. Some components are required, some are optional and there are interdependencies between the components. Because of this, previous approaches to solver design and modification have been ad-hoc and limited. We present a system that transforms a description of the components and the characteristics of the target constraint solver into a constraint problem. Solving this problem yields the description of a valid solver. Our approach represents a significant step towards the automated design and synthesis of constraint solvers that are specialised for individual constraint problem classes or instances.
Dharini Balasubramaniam, Lakshitha de Silva, Chris Jefferson, Lars Kotthoff, Ian Miguel, Peter Nightingale, Dominion: An Architecture-driven Approach to Generating Efficient Constraint Solvers, 9th Working IEEE/IFIP Conference on Software Architecture (WICSA), June 2011
Constraints are used to solve combinatorial problems in a variety of industrial and academic disciplines. However most constraint solvers are designed to be general and monolithic, leading to problems with efficiency, scalability and extensibility. We propose a novel, architecture-driven constraint solver generation framework called Dominion to tackle these issues. For any given problem, Dominion generates a lean and efficient solver tailored to that problem. In this paper, we outline the Dominion approach and its implications for software architecture specification of the solver.
We report the first evaluation of Constraint Satisfaction as a computational framework for solving closest string problems. We show that careful consideration of symbol occurrences can provide search heuristics that provide several orders of magnitude speedup at and above the optimal distance. We also report the first analysis and evaluation – using any technique – of the computational difficulties involved in the identification of all closest strings for a given input set. We describe algorithms for web-scale distributed solution of closest string problems, both purely based on AI backtrack search and also hybrid numeric-AI methods.
Constraint problems can be trivially solved in parallel by exploring different branches of the search tree concurrently. Previous approaches have focused on implementing this functionality in the solver, more or less transparently to the user. We propose a new approach, which modifies the constraint model of the problem. An existing model is split into new models with added constraints that partition the search space. Optionally, additional constraints are imposed that rule out the search already done. The advantages of our approach are that it can be implemented easily, computations can be stopped and restarted, moved to different machines and indeed solved on machines which are not able to communicate with each other at all.
Ian Gent, Lars Kotthoff, Ian Miguel, Peter Nightingale, Machine learning for constraint solver design — a case study for the alldifferent constraint, 3rd Workshop on Techniques for implementing Constraint Programming Systems (TRICS), St Andrews, Scotland, September 2010
Constraint solvers are complex pieces of software which require many
design decisions to be made by the implementer based on limited
information. These decisions affect the performance of the finished
solver significantly. Once a design decision has been
made, it cannot easily be reversed, although a different decision may be
more appropriate for a particular problem.
We investigate using machine learning to make these decisions automatically depending on the problem to solve. We use the alldifferent constraint as a case study. Our system is capable of making non-trivial, multi-level decisions that improve over always making a default choice and can be implemented as part of a general-purpose constraint solver.
Lars Kotthoff, Ian Miguel, Peter Nightingale, Ensemble classification for constraint solver configuration, 16th International Conference on Principles and Practices of Constraint Programming, St Andrews, Scotland, September 2010
The automatic tuning of the parameters of algorithms and automatic selection of algorithms has received a lot of attention recently. One possible approach is the use of machine learning techniques to learn classifiers which, given the characteristics of a particular problem, make a decision as to which algorithm or what parameters to use. Little research has been done into which machine learning algorithms are suitable and the impact of picking the "right" over the "wrong" technique. This paper investigates the differences in performance of several techniques on different data sets. It furthermore provides evidence that by using a meta-technique which combines several machine learning algorithms, we can avoid the problem of having to pick the "best" one and still achieve good performance.
Ian Gent, Chris Jefferson, Lars Kotthoff, Ian Miguel, Neil Moore, Peter Nightingale, Karen Petrie, Learning When to Use Lazy Learning in Constraint Solving, 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 2010
Learning in the context of constraint solving is a technique by which previously unknown constraints are uncovered during search and used to speed up subsequent search. Recently, lazy learning, similar to a successful idea from satisfiability modulo theories solvers, has been shown to be an effective means of incorporating constraint learning into a solver. Although a powerful technique to reduce search in some circumstances, lazy learning introduces a substantial overhead, which can outweigh its benefits. Hence, it is desirable to know beforehand whether or not it is expected to be useful. We approach this problem using machine learning (ML). We show that, in the context of a large benchmark set, standard ML approaches can be used to learn a simple, cheap classifier which performs well in identifying instances on which lazy learning should or should not be used. Furthermore, we demonstrate significant performance improvements of a system using our classifier and the lazy learning and standard constraint solvers over a standard solver. Through rigorous cross-validation across the different problem classes in our benchmark set, we show the general applicability of our learned classifier.
Lars Kotthoff, Ian Gent and Ian Miguel, Using machine learning to make constraint solver implementation decisions, SICSA PhD conference, June 2010
Programs to solve so-called constraint problems are complex pieces of software which require many design decisions to be made more or less arbitrarily by the implementer. These decisions affect the performance of the finished solver significantly. Once a design decision has been made, it cannot easily be reversed, although a different decision may be more appropriate for a particular problem. We investigate using machine learning to make these decisions automatically depending on the problem to solve with the alldifferent constraint as an example. Our system is capable of making non-trivial, multi-level decisions that improve over always making a default choice.
Ian Gent, Chris Jefferson, Lars Kotthoff, Ian Miguel, Peter Nightingale, Specification of the Dominion Input Language Version 0.1, CIRCA preprint, November 2009
Lars Kotthoff, Dominion — A constraint solver generator, 15th International Conference on Principles and Practice of Constraint Programming Doctoral Consortium, Lisbon, Portugal, September 2009
This paper proposes a design for a system to generate constraint solvers that are specialised for specific problem models. It describes the design in detail and gives preliminary experimental results showing the feasibility and effectiveness of the approach.
Lars Kotthoff, Constraint solvers: An empirical evaluation of design decisions, CIRCA preprint, February 2009
This paper presents an evaluation of the design decisions made in
four state-of-the-art constraint solvers; Choco, ECLiPSe, Gecode,
and Minion. To assess the impact of design decisions, instances of
the five problem classes n-Queens, Golomb Ruler, Magic Square,
Social Golfers, and Balanced Incomplete Block Design are modelled
and solved with each solver. The results of the experiments are not
meant to give an indication of the performance of a solver, but
rather investigate what influence the choice of algorithms and data
The analysis of the impact of the design decisions focuses on the different ways of memory management, behaviour with increasing problem size, and specialised algorithms for specific types of variables. It also briefly considers other, less significant decisions.
Every year, a large amount of geographical data on Maya settlements is generated in the course of excavations, surveys and similar operations. Only some of those data are published in reports and most of it is not used more than once. We present a new effort that aims to make publicly available such data as the location of individual sites as well as detailed maps, excavation pictures and contextual information. All this is available through an easy-to-use web interface at www.mayamap.org that brings the power of traditional GIS systems to the layman user.
We present preliminary results of interdisciplinary efforts in determining the use and management of ancient Maya rural landscapes in northwestern Belize. Through our comprehensive data set combining archaeology, GIS, land survey and soil science, we provide a singular insight into the strategies used by the people that inhabited this site on the periphery of the Maya world. Georeferenced maps — the results of intensive pedestrian survey — have been combined with a NASA digital elevation model as well as hydrological and soil data into a regional geodatabase, which includes the results of ongoing test units, and larger excavations.
If I'm not in the office, it's possible that you can find me in the jungle of Belize excavating and/or mapping Maya ruins. Check out the interactive map. Failing that, I might be making fancy drinks while fighting spider monkeys.
In ancient history, I graduated from the University of Leipzig with a Diplom in Computer Science (roughly equivalent to a Master's degree). My thesis (pdf) was supervised by Gerhard Brewka and has the catchy title "Using Constraints to render Websites — Applications of Artificial Intelligence in E-Commerce environments". After that, I spent a year in Japan on the Vulcanus programme.