Computer Science @ University of St Andrews

University of St Andrews crest

PhD Projects

  • Academics in robes with Vint Cerf
  • Lecture on web technologies
  • First year lecture
  • MSc students in the lab
  • MSc students in the lab
  • Undergraduate tutorial
  • Lecture
  • MSc students enjoying a sunny day
  • MSc students at the summer BBQ
  • First year students in a lab-based exercise class
  • MSc students in the snow after November graduation
  • MSc students after graduation
  • Undergraduate students after summer graduation
  • Blue sky thinking
  • Lecture
  • The Jack Cole building
  • Staff discussion

This sample of projects gives you an idea of the type of research that we do. However, it is also possible to work in other areas and you should look at the range of research that is going on and decide on what is of most interest to you. Projects are divided according to the School's main research themes:

Some staff have projects in several themes.

If you are interested in any of the projects below, or have your own project idea, contact a potential supervisor (one of the people listed below). You will eventually need to write a proposal.

Systems Engineering

Dr Juliana K.F. Bowles

Parametric Model-to-Model Transformations

Modelling is a key element in reducing the complexity of software systems during their development and maintenance. In model-driven engineering (MDE), models and model transformations play essential roles in software development. A model is an abstract description of an artifact, and model transformations are used to generate new models out of existing ones. Model-to-model transformations, for instance, can be used to normalise, weave, optimise, simulate and refactor models, as well Here are two potential projects I'm interested in supervising. If you are interested in either, or want to talk about your own project related to anything concerning optimising compilers and runtime systems, send me an e-mail. I'm always happy to chat about potential projects before submission of a formal to translate between modelling languages.

This project is about specifying a choice of non-functional properties of a system (performance and /or dependability attributes such as reliability, security, safety, etc) as (parametric) model annotations of a software engineering model and implement a model-driven parametric transformation between the annotated model and the underlying model used in an analysis tool (e.g., UPPALL, PIPE2, etc). The idea is that different kinds of annotations (and hence their associated models), can be mapped onto different formal models for different analyses and approaches of formal verification.

Specification of dependability measures for distributed mobile systems

There is an increasing need for dedicated programming and specification formalisms that address aspects such as (code and agent/object) mobility, remote execution, dynamic distribution of resources over a network, security, reliability and performance. Ideally such high-level modelling languages allow us to select the best design for a system, and rank alternatives on the basis of particular metrics for non-functional properties of interest. For instance, in the case of performance common metrics include response time, throughput and utilisation. If specification formalisms allow the statement of required dependability and performance measures (the later for example are increasingly used for service-level agreements), efficient and reliable computation of these measures should then become common practice for distributed mobile application development. This projects proposes a high-level specification language and a formal framework for the verification and computation of dependability and performance measures.

Logical framework for mobile distributed systems

There are many different kinds of logics for specifying and reasoning about discrete-time and continuous-time Markov chains. Popular logics over CTMCs include CSL (continuous stochastic logic) that can be used as a specification formalism for performance and dependability properties. CSL is a state-based logic and can only describe state properties of systems. Extensions such as acCSL are both action and state based and can also capture properties over sequeces of actions. Neither of these (or similar existing) logics can, however, adequately express properties over distributed mobile systems reflecting the spatical structure of a model, computation mobility and location awareness.

This project aims to develop a distributed stochastic logic for specifying and reasoning about mobile distributed applications. The logic is interpreted over Continuous-time Markov Chains (CTMCs), a popular model used for performance and dependability evaluation. In addition, this project will implement efficient model checking algorithms that take advantage of the distributed nature of the CTMC, improving the efficiency of existing stochastic model checking tools.

Dr Ishbel Duncan

Cloud Security

A major concern in Cloud adoption is security and the US Government has just announced a Cloud Computing Security Group (Mar 4 2009) in acknowledgement of the expected problems such networking will entail. However, basic network security is flawed at best. Even with modern protocols, hackers and worms can attack a system and create havoc within a few hours. Within a Cloud, the prospects for incursion are many and the rewards are rich. Architectures and applications must be protected and security must be appropriate, emergent and adaptive. Should security be centralized or decentralized? Should one body manage security services? What security is necessary and sufficient? How do we deal with emergent issues?

There are many areas of research within the topic of Cloud Security from formal aspects to empirical research outlining novel techniques. Cloud Privacy and Trust are further related areas of potential research.

Cloud VV&T and Metrics

Verification, Validation and Testing are all necessary to basic system evaluation and adoption but when the system and data sources are distributed, these tasks are invariably done in an ad hoc or random manner. Normal test strategies for testing code, applications or architecture may not be applicable in a cloud; software developed for a non distributed environment may not work in the same way in a cloud and multiple thread, network and security protocols may inhibit normal working. The future of testing will be different under new environments; novel system testing strategies may be required to facilitate verification and new metrics will be required to describe levels of system competence and satisfaction.

There are many areas of research within the topic of Cloud VV&T from formal verification through to empirical research and metric validation of multi part or parallel analysis. Testing can be applied to systems, security, architecture models and other constructs within the Cloud environment. Failure analysis, taxonomies, error handling and recognition are all related areas of potential research.

Prof. Kevin Hammond

Abstractions of Cloud Computing

The MapReduce progamming model has proved to be highly successful for implementing a variety of real-world tasks on cloud computing systems. For example, Google has successfully demonstrated the use of MapReduce to automatically parallelise computations across the large-scale clusters that form the basic components of cloud computing systems [1] and IBM and Google are jointly promoting the Apache Hadoop system for University teaching/research. The key to this success is to develop powerful abstractions, or skeletons, that capture the essence of a pattern of computation, and which allow software algorithms to be plugged in to perform the required computations. By doing this, we can separate *what* the cloud computing system is required to do, from *how* it achieves that effect. This allows tremendous flexibility in placing computations within a cloud, including automatic mapping and reconfiguration to match dynamically changing cloud computing resources.

This PhD project will investigate new, advanced high-level programming models for cloud computing that will be even more powerful than Google's MapReduce. They will apply to a wider variety of problem, and deliver even greater parallel performance. This will be achieved by applying ideas from functional programming to cloud computing, creating very high levels of abstraction, including nested, hierarchical structures that can be mapped to cooperating groups of parallel computing clusters.

[1] MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean & Sanjay Ghemawat, Communications of the ACM, January 2008, 51:1, pp. 107-113

Dr Dharini Balasubramaniam

Capturing Uncertainty in Software Architectures

As the use of software becomes more widespread in a variety of domains, traditional assumptions about static requirements, operating conditions and resource availability no longer hold. Therefore it is important for modern software systems to be designed and implemented taking into account the possibility of uncertainty originating from stakeholders, business conditions and execution environments.

Software architecture is recognised as a vital basis for both the development and evolution of systems as well as the early analysis of reliability and consistency. If we are to anticipate and manage uncertainty in software then the architecture must also capture points and types of uncertainty at a suitable level of abstraction.

Most existing work on uncertainty with respect to software architecture focuses on determining or predicting reliability of systems based on structural and behavioural information from their architecture. Dynamic architecture description languages (ADLs) already allow some uncertainty to be expressed, such as variation in the cardinality and connectivity of components, while product line architectures identify explicit points of variation where one of a number of options may be chosen. However, achieving the goal of developing systems for uncertain environments requires that software architectures support a well-defined model of uncertainty, which can be used in later development and maintenance.

This project aims to express uncertainty as an integral and essential aspect of software architecture, specifying ideal and acceptable properties of systems. It will involve exploring possible sources and points of uncertainty at the architectural level, expressing these in a coherent model and implementing and evaluating the model. Grasp, a general purpose ADL designed and developed at St Andrews, can be used as a starting point for the work. The quality and potential uses of the uncertainty model may be evaluated in the context of a specific application domain.

A Framework for Incremental Software Development

Traditional software engineering practices, which require design decisions to be made early in the software life cycle, are increasingly over-restrictive as continuous evolution of requirements, models and code becomes the norm for software systems. An ideal incremental development framework will allow systems to be engineered from a partial or empty initial specification through refinement, with both the specification and the implementation being able to evolve consistently throughout the lifecycle.

The aim of this project is to build and evaluate a framework for incremental development based on an existing general purpose programming language such as Java. Specific issues of interest include support for placeholder elements in specification and code for expected but as yet unknown parts, the definition of execution semantics for these elements, and static and dynamic refinement mechanisms including the assignment of values to placeholders.

Dr Graham Kirby

Infrastructure to Support Complex and Evolving Socio-temporal Datasets

This project investigates classes of application systems that require new approaches to evolution. These include scientific applications, data mining, financial applications, multimedia applications, and heterogeneous databases. Such systems are characterised by the need to store large amounts of data whose structure must evolve as it is discovered incrementally. In database terms this requires that the data be mapped in complex ways to an evolving schema.

As a motivating example, imagine a virtual time capsule capturing university life during the 600th Anniversary celebration period. It would model the evolving temporal and anthropological relationships of as many aspects as possible of the university and the town of St Andrews. This project addresses the associated conceptual and technological demands.

Analysis and Linking of Large-Scale Genealogical Datasets [with Alan Dearle]

This project investigates methods for analysing and linking large sets of individual genealogical records. The core problem is to take a set of digitised records, and from this create a set of inter-linked pedigrees for the population. This specification may be usefully refined with the introduction of provenance and confidence, so that rather than producing a single set of pedigrees, a set of potential pedigrees is produced along with evidence for the relationships within them. From such a representation it is possible to project out various specific sets by defining appropriate criteria.

We thus need a computational process that will not only give results such as X is the mother of Y, but also that X may be the mother of Y based on information P,Q,R, and that Z may be the mother of Y based on S,T,V. This will provide a richer information source for future research, and requires new ways of approaching the problem. In the past, algorithms have been run on data sets to produce definitive pedigrees. We propose to attack this at a meta level with a reasoning engine that will not only produce results but also reasons for those results.

For flexibility and long-term usefulness, we do not envisage a one-off process in which a set of records is fed into an algorithm and a set of pedigrees is output. Instead, a continuous process will accept an indefinite stream of records to feed into and refine an established knowledge base. Similarly, the set of rules that govern the relationships between records may be refined and evolved over time. This evolutionary approach yields a highly flexible knowledge and reasoning engine; however, allowing rules and inferred relationships to change carries the danger of information being lost during the process. To prevent this, we propose a non-destructive append-only approach to the storage of data and meta-data.

It is hoped to be able to evaluate the techniques developed using the full set of birth/death/marriage records from Scotland 1850 to the present. A preliminary phase of the project is now under way.

Dr Alan Miller

Human Interaction: technology as a tool

Interacting with the past through stereoscopic headsets and natural gestures.

Stereoscopic headsets offer the ability to place the user within a historic scene at a greater level of immersion than is possible through screen based technologies. Natural gesture interaction enables users access to an environment without the mediation of intermediate devices. This study will investigate marrying the two to support intuitive exploration of, interaction with, communication within and modification of immersive environments. environment.

Technology and engagement

Quality of Service and Quality of Experience in Binocular Applications.

The emergence of consumer grade binocular headsets capable of being driven by consumer computation opens up the possibility of stereoscopic 3D applications moving out of the laboratory into the real world. To achieve this it is necessary to understand the relationship between the quality of service that these systems provide, frame rate, resolution and delay and users quality of experience. This study will involve lab based experiments and the application of stereoscopic headsets to real world scenarios. For example enhancing the appreciation of cultural heritage sites. It will result in a better understanding of the system requirements for these emergent applications in real world scenarios. It will utilise established collaborations with museums and heritage organisations.

Bridging Realities

Meeting the system and cognitive challenges of Cross Reality

Virtual reality differ from augmented reality in that rather than annotating the real world they enable exploration of an alternative one. There are many scenarios where simultaneous engagement with two realities is desirable. For example when visiting an historic site it is beneficial to appreciate the site both as it is now and as it was at one or more points in the past. To achieve this it is necessary to have mobile systems that are capable of presenting high quality virtual reality and to enable the user to easily navigate and switch their attention between the two realities. This study will address the associated system and cognitive challenges.

Heritage and Technology

Archaeology, 3D and the representation of data.

There is tremendous interest in archaeology and the insights that it can provide into the past. Yet there is a gap between the popular and often cursory treatment that is given in film and the often dry detail that can be found in survey reports. This project will investigate how 3D technologies can be used to create interfaces to archaeological data that are both engaging and provide intuitive context.

Causes of Ubiquity

New Technology, Old Habits investigating emergent narrative modes.

Starting from the premise that new technology is often used in traditional ways until sufficient expertise and familiarity is acquired. After which users develop their creative wings and are able to create new modes of interaction and engagement. This study will look at the early use of film and stereoscopic technologies. It will ask what are the consequences for narrative of liberating 3D from two dimensional frames and investigate emergent new modes of narrative.

Dr Susmit Sarkar

In general I am interested in applying formal techniques from programming languages and verification to low-level systems, both software and hardware. Particularly interesting in this space, and hard to get right, is concurrency. Here is a sampling of projects I will be interested in, but I am happy to discuss other ideas in this general area.

Program Logics for Low-Level Relaxed Memory Programs

There has been lots of work done on proving shared-memory concurrent programs correct, by the use of very sophisticated program logics such as Concurrent Separation Logic and RGsep. However, in the real world, shared-memory concurrent programs do not satisfy a key building block of such logics, an assumption that memory is sequentially consistent. Instead, when programming at the low-level in C or C++, or even in relatively higher-level languages such as Java, programmers have to deal with relaxed memory consistency. How and whether sophisticated program logics can scale up to this setting is an open research question.

Developing reasoning principles for relaxed memory consistency will involve applying high-level reasoning techniques to low-level code. If we have such a logic, we can build tools that can automatically analyse code and make them safe, efficient, and correct by suggesting appropriate fences or other mechanisms. With multiprocessors everywhere from personal mobile devices to servers, this is an important problem with a potential of high impact, both in theory and in practice.

Verification of programmer-oriented cache protocol designs

Shared-memory multiprocessors are ubiquitous, but programming them remains challenging. The programming model exposed by such many-core processors depends crucially on a "memory consistency model", a contract between the hardware and programmers. On the hardware side, one key mechanism to implement the memory consistency model is the cache-coherence protocol, which communicates memory operations between processors. However, the connection between the cache-coherence protocol and the memory consistency model is unclear, particularly for modern protocols and models. Collaborators at the University of Edinburgh are designing new cache coherence protocols with subtle invariants. This project is to formally verify the relationship between the hardware protocols and the programmer view of the memory consistency model they provide.

Efficient architectural level simulators for multiprocessors

Writing correct code for multiprocessors is hard, particularly in the presence of low level features of relaxed memory consistency models, hardware interrupt mechanisms, and memory accesses of different overlapping sizes. This project is to build testing
infrastructure and tools for multiprocessors. Previous work has clarified the programmer-visible model of ARM and IBM Power multiprocessors [PLDI'11], and a simulator (ARMMEM/PPCMEM) for small (~10 assembly instruction) fragments has already been found useful for systems programmers[LWN]. This project is to extend ARMMEM in several ways to be able to handle more realistic programs. There is ongoing work to integrate a larger instruction set (current ARMMEM only handles a small subset of the ISA) to handle a userspace program compiled from C.

The project will aim to use model-checking techniques to make the tool explore the state space in a smart and speedy fashion. There are several techniques, e.g. Dynamic Partial Order Reduction, and/or SMT solvers, that can profitably be applied to the problem. Depending on interest, there are two directions to make the tool even more useful. One is to add systems features such as hardware interrupts and page tables. This will involve close collaboration with the ISA architects at ARM/IBM to understand the various implementation constraints and interactions. Another is to work with the new ISO C/C++ concurrency model (C11), which is mostly untested, but programmers would like to exploit it in creating high performance portable parallel code. The tools can be specialised to C11 compiler output, thus testing compilers such as LLVM.

Computer Systems

Dr Colin Allison

Named Data Networking

NDN is a particular project that had made significant inroads into the design and implementation of networks that reflect the conceptual framework of content-centric and information-centric networks. PhD projects in any particular area of this framework are of interest. Themes include security, multimedia, mobile devices, real-time traffic.

Network Visualisation

The 3D Web (see below)

Dr Adam Barker

Big Data Lab

The era of Big Data is upon us – the Volume, Velocity and Variety of enterprise and scientific data are growing at an exponential rate and will continue to do so for the foreseeable future. Hidden inside a flood of heterogeneous raw data is the knowledge capable of having transformative impacts across virtually every area of society. Raw data must be converted into knowledge and understanding in order to stimulate scientific discoveries, economic growth and industrial innovation.

With the advent of cloud computing, resizable infrastructure for data analysis is now available to everyone via an on-demand pay-per-use model. Cloud computing provides one solution to Big Data infrastructure requirements, however writing scalable applications and algorithms that utilise cloud infrastructure is an extremely demanding and error prone task. Engineers still have no off-the shelf solutions that they can readily use to manage, move, analyse and visualise complex distributed data sets.

In order to unlock the potential of Big Data we need to overcome a significant number of research challenges including: managing diverse sources of unstructured data with no common schema, removing the complexity of writing auto-scaling algorithms, real time analytics, suitable visualisation techniques for Petabyte scale data sets etc.

Big Data Lab’s research mission is to identify, engineer and evaluate disruptive technologies that address current and future Big Data challenges. Our data-intensive research agenda is pursued by engineering technologies that extend the state-of-the-art and importantly address real problems in the Big Data space. We follow agile development practices and conduct regular stand-up meetings to report progress, plan ahead and foster collaboration. Much of our work follows a multi-disciplinary approach; we work on problems at the intersection of Science, Engineering, Finance, and Computer Science.

Below are a list of potential PhD projects within the broad areas of Distributed Systems, Data-intensive Systems and Big Data. I am looking for talented engineering focused PhD students. If you come to study a PhD under my supervision you will be integrated into a friendly agile research team working on the cutting edge of Big Data infrastructure research.

Interoperating Clouds

Many problems at the forefront of science, engineering and medicine require the integration of large-scale data and computing. The majority of these data sets are physically distributed from one another, owned and maintained by different institutions, scattered throughout the globe.

To meet the data storage and computing demands, data-centric applications are increasingly relying on cloud computing services. Clouds are however centrally hosted and managed; in order to run your application in the cloud sets of highly distributed data and computation must be moved to one centralised location for processing. In a world where data sets and computation are owned and maintained by different institutions this can be an unrealistic assumption.

This research will explore the challenges of building applications consisting of multiple standalone, distributed clouds. Challenges includes: cloud interoperability, reducing wide-area network transfer between interoperating clouds through proxies and in-network data caching, and cloud resource selection given metrics such as cost and performance.

Data Migration in the Cloud

The cost and time to move data around is currently one of the major bottlenecks in the cloud. Users with large volumes of data therefore may wish to specify where that data should be made available, when it may be moved around, etc. Furthermore, regulations, such as the data protection regulations, may place constraints on the movement of data and the national jurisdictions where it may be maintained. The aim of this project is to investigate the practical issues which affect data migration in the cloud and to propose mechanisms to specify policies on data migration and to use these as a basis for a data management system.

Wide-Area MapReduce

The MapReduce programming model has proved to be highly successful for implementing a variety of real-world tasks on cloud computing systems. Google is the most prevalent example but Hadoop has also been widely adopted in the open source community.

MapReduce operates well in a closely coupled cluster environment, typically run internally within an organization. The aim of this project is to investigate how the MapReduce model scales over very wide area networks where the nodes necessary to complete the computation are spread between physically distributed clouds/services.

Architectural Choreography Models

The scalability of service choreography as an architectural concept has not yet been effectively assessed. There are relatively few pure decentralised choreography languages and even fewer concrete implementations, the most prevalent being the Web Services Choreography Description Language (WS-CDL).

This PhD project intends to address the problems associated with designing and deploying large-scale choreographies where hundreds of independent, autonomous services need to communicate. Scenarios of this scale are realistic in the High Performance Computing domain and are typically not addressed by specifications and use-cases. Research into either adapting an existing, or creating a new choreography specification based on a solid formal foundation (such as pi-calculus) is essential when scaling up the interactions. Tool support, automation and visualisation technology are avenues of engineering research to ease the design complexity.

Prof. Saleem Bhatti

I generally undertake and supervise very practical research, involving building, breaking, measuring and analysing real systems as part of the experimental work related to the PhD. All the experimental work I supervise is based on development on Unix-like systems (usually FreeBSD and/or Linux), and I would expect applicants to have good technical skills in using and maintaining such systems. This would include associated skills, such as systems programming (e.g in C, bash, python etc), use of dev tools (such as make, git / mercurial etc) and documentation tools (latex, gnuplot etc).

Identifier-Locator Network Protocol

There are many PhD projects possible in this topic area.
All the PhD projects in this area are very practical and will require students with excellent skills in either Linux or FreeBSD. There is a growing team of students working in this area at St Andrews.

The Identifier-Locator Network Protocol represents a new approach to Internet architecture, where IP addresses are deprecated, and new namespaces -- Locators and Identifiers -- are used, with dynamic bindings. This has many potential benefits for the Internet in terms of functionality, scalability and security. (RFCs 6740-6748 contain more information and are linked at, along with research papers on ILNP.)

Possible topic areas include the list below (in no particular order), but there are others:

  • The scalability of large-scale multi-homing. Impact on the default-free zone (DFZ) of the Internet routing infrastructure and state displacement with ILNP from routing to DNS.
  • Mobility for individual hosts and whole networks (sites). Scalability and performance trade-offs between a purely end-to-end model and a site-controlled model.
  • Site-security enhancement. Site resilience to traffic-based and routing-based denial of service (DoS) attacks. New firewall and site-border protection capability.
  • Enabling multi-path transport for site-controlled traffic engineering. Use of a high-degree of multi-homing at a site or host to enable path resilience and load-balancing.
  • Localised addressing and site address management. Enabling robustness to dynamic re-numbering of sites. Managing IP-layer connectivity for a site via site-border routers.
  • New APIs for ILNP-aware transport protocols and applications. Investigating new application capabilities when identity and location are decoupled. The use of dynamic between between application sessions and transport flows.
  • Information-centric / content-centric / data-centric networking. Investigating the applicability and use of an identifier-locator split to an information centric approach, if applied at high layers.
  • New approaches to wide-area virtual machine (VM) image mobility and migration. Adopting the ILNP mobility model in VM image migration within and across network sites, data-centres and cloud services.

The impact of naming-based networking on DNS

The Identifier-Locator Network Protocol (ILNP) makes extensive use of DNS. While the initial context of the investigation is ILNP, it is very likely that results could be applicable to other application areas. PhD topics in this area will explore the limits of, and make proposals for improvement to, the scalability, performance, security, resilience and dynamicity of DNS when used in implementing new functionality, such as:

  • Support for both site and node multi-homing.
  • Native, scalable support for mobile nodes and mobile networks.
  • NAT-tolerant networking, by decoupling location from identity.
  • IP Security and systems security.
  • Site-controlled traffic engineering (TE) capability through naming.
  • Native support for multi-path transport protocols.
  • New approaches to supporting VM systems across network sites, data-centres and cloud services.

A new approach to Ubiquitous / Pervasive Systems

In Weiser's original vision of ubiquitous systems, the computer is invisible but also everywhere: computing is integrated into the workflows, processes and devices of everyday life. Could cloud systems coupled with multi-function hand-held devices, offers the potential to fulfil at least part of Weiser's vision? However, Weiser also had the foresight to realise that the Internet Protocol (IP), was not capable of delivering the communication capability that is required for pervasive systems. Today, almost 20 years later, this constraint remains true, both for IPv4 and IPv6. This PhD project will investigate the provision of cloud services based on the use of the Identifier Locator Network Protocol (ILNP), a next-generation Internet protocol, providing rich connectivity by design, rather than as engineering retro-fits that are visible in today's IPv4 and IPv6 networks.

Energy-Aware Applications and Systems

The use of energy will be a major factor for all aspects of life in the future. There is increasing use of computer systems in everyday life also, for example in cloud systems to provide always on services. Currently, it is estimated that, worldwide, ICT systems are second only to the aviation industry (and only just) in carbon output. The evaluation of energy usage in such systems is therefore crucial in proposing scalable solutions to the world's computing needs, while being efficient in use of energy. Taking a measurement-based approach, this PhD project will consider energy-usage in end systems and services, rather than in data-centres, to examine the potential for demand-side energy reduction. The aim is to enable users and applications to become energy-aware and energy-efficient in their use of ICT systems. This may involve examining:

  • measurement, monitoring and metrics for systems energy usage.

  • energy-usage feedback to, and energy-usage control features for, ICT users.

  • incentives and mechanisms for enabling ICT users to become energy-aware and energy-efficient.

  • enabling applications to make appropriate quality of service (QoS) / quality of experience (QoE) / performance trade-offs to become more energy efficient.

Professor Simon Dobson

Designing really low-power, pragmatic sensing systems

Engineering very small sensor devices is a hardware challenge for power management and communications - but also a significant programming challenge, in that applications must made some serious trade-offs in terms of the frequency of sensing, and amount of communication vs computation, the effects that these choices have on robustness, system lifetime, and the like. This project will explore the theory and practice of how to make these trade-offs in a principled way, demonstrated both in simulation and for real systems in the field.

Integrating complex networks with sensing

Complex networks are a new paradigm for modelling complex phenomena like traffic networks, disease propagation, and information flow. We can also instrument real-world phenomena to allow us to observe them better. How do these two approaches complement each other? Can we use sensors to validate simulations, or use simulations to control and guide a process going on in the real world? This project will build a software framework to explore these issues using a suitably-chosen application area, and make the two approaches work together.

Hiding the cloud

If you're a scientist, analyst, or in any way not a specialised computer scientist, programming with big data is a scary prospect. You have to convert from the familiar domain of Excel spreadsheets and GUIs to a world of Python programming and VM provisioning. It doesn't have to be like that, and this project will investigate what are the programming paradigms that best fit with cloud computing, demonstrating the ideas by building an environment that hides the cloud's complexity from scientific applications.

Prof. Ian Gent

An Experimental Laboratory in the Cloud

Computer Science is highly suited to experimental science. Unfortunately, many Computer Scientists are very bad at conducting high quality experiments. The goal of this project is to make experiments better by using the Cloud in a number of ways. The core idea is that experiments are formed as artifacts, for example as a virtual machine than can be put into the cloud. For example, a researcher might want to experiment on the speed of their algorithms in different operating systems. They would make a number of different virtual machines containing each version, which would be sent to the cloud, the experiment run, and the results collected. As well as the results of running the experiment being stored in the cloud, the experiment itself is also there, making the cloud into an experimental repository as well as the laboratory. This enables reproducibility of experiments, a key concept that has too often been ignored. While using the cloud, the project can feed back into research on clouds by investigating how experiments involving the cloud itself can be formulated for use in our new Experimental Laboratory.

Dr Tristan Henderson

I welcome novel and well-written research proposals in any of my current areas of interest, namely, wireless networks, privacy, social networks, network measurement, and data archiving. You can find out more about my current interests by looking at my papers. Even more information on my research, including tips for prospective PhD students, can also be found on my web pages. Potential applicants are advised to read these pages and then contact me.

In particular I am currently interested in supervising PhD projects in two areas:

Human-Data Interaction

Human-Data Interaction (HDI) is our term for the interaction between individuals and the data that are generated by and about them, and the users of such data, such as service providers.

Some example HDI questions that could become interesting PhD research topics include:

  • How can people reason about the complex systems that process and use their personal data? What are the visualisation, usability, feedback, or interaction challenges?
  • Can we build APIs that allow us to selectively share our personal data and also maintain privacy?
  • How can we improve on the current state of the Quantified Self? Can we improve data acquisition (e.g., context-aware/predictive/energy-efficient sensing, or collection of data from other sources) or data sharing (e.g., differentially private publication and querying)?
  • How can we best store personal data? Where should data be stored? Should they be stored? Can we build systems that enable HDI applications without requiring the release of data to all and sundry?
  • Advertising is a particular HDI application of personal interest. How can we improve on existing advertising systems, which involve much pervasive monitoring of potential customers to provide targeted advertisements?

If you are interested in any of the above topics I suggest that you read our HDI introductory paper and the references therein, before formulating some ideas of your own and then contacting me.

Reproducible mobile and wireless research

Reproducibility is a fundamental part of the scientific method, and various members of the School are interested in reproducible and replicable research. In particular you can see Ian Gent's recomputation project, and also the paper that we wrote as a result of a recent summer school on reproducible research.

I am particularly interested in improving the reproducibility of wireless network research. We run CRAWDAD, the world's largest wireless network data archive, with over 100 datasets and tools used by over 6,500 researchers worldwide. But sharing data is only part of the reproducibility story.

Some interesting questions about reproducibility in wireless network research include:

  • How can we build systems that automatically capture and document a wireless network experiment?
  • How can we verify that a wireless network experiment is reproducible? It is typically impossible to reproduce all of the features of a field experiment, but how can we verify or validate that those features that have been shared are correct?
  • How can we package a wireless network experiment and share it in a way that enables reproducibility? Are the VM-based solutions that have been proposed for theoretical or computational science appropriate for systems work?
  • How can we share wireless network data in a privacy-preserving manner? (You may notice that this is a common theme also explored in HDI)

Again if you are interested in this area then please formulate some ideas and then contact me.

Dr Graham Kirby

Ad-Hoc Clouds [with Alan Dearle]

The idea of an ad-hoc cloud is to deploy cloud services over an organization's existing infrastructure, rather than using dedicated machines within data centres. Key to this approach is the ability to manage the use of computational and storage resources on individual machines, by the cloud infrastructure, to the extent that the cloud is sufficiently non-intrusive that individuals will permit its operation on their machines. This project will investigate how this may be achieved.

Resource-Aware Distributed Databases [with Alan Dearle]

The aim of this project is to further develop Angus Macdonald's PhD work on workstation-based distributed databases. This investigated the issues in providing ACID semantics and automatic backup on a constantly shifting pool of non-dedicated machines within an organisation. Areas with potential for further research include autonomic optimisation of the placement of data and computation, the relaxation of strict consistency rules, and application to other database models.

Towards Pervasive Personal Data

This project will investigate techniques for enabling pervasive file data across all the storage resources available to an individual or an organization. These may encompass personal computers and mobile devices, machines available within a work environment, and commercial online services. The envisioned infrastructure observes file changes occurring in any particular storage location, and automatically propagates those changes to all other appropriate locations, similarly to existing services such as Windows Live Mesh, DropBox and SugarSync, but:

  • does not rely on external services
  • observes and exploits variations in machine and network capabilities
  • exploits disparate storage facilities, including those provided by machines not exclusively under the user’s control
  • exploits disparate data transfer mechanisms, including physical movement of passive devices
  • supports high-level user policies governing data placement and synchronization strategy

The main challenges to implementing such infrastructure are in detecting and exploiting patterns in resource availability; planning, executing and monitoring good routes and schedules for data propagation; and supporting users in visualizing current system state and its relation to the goals of their high-level policies.

Testing Distributed Software

The PlanetLab development platform allows the developer of distributed software to test it under real wide-area network conditions. However, there are significant administrative overheads involved in participation. This project would investigate the issues involved in developing a testbed for distributed applications, running on a number of local clusters, and allowing the experimenter to simulate wide-area latencies, bandwidths and machine failures. Such a testbed would need to automate the execution of repeatable experiments, and the recording and analysis of measurements. Aspects to be investigated include the system properties that can be measured, the accuracy to which real-world behaviour can be predicted, and the ease of use of the testbed.

Dr Ian Miguel

Constraint-Based Cloud Management [with Alan Dearle & Graham Kirby]

A cloud may be viewed as comprising the union of a dynamically changing set of cloudlets, each of which provides some particular functionality. Each cloudlet runs on a potentially dynamically changing set of physical machines. A given machine may host parts of multiple cloudlets. The mappings between cloudlets and physical resources must be carefully managed in order to yield desirable high-level properties such as performance, dependability and efficient resource usage. To be practical, such management must be automatic - but producing timely high-quality management decisions for a cloud of significant scale is a difficult task. The aim of this project is to apply constraint programming techniques to solve this problem efficiently.

Dr Juan Ye

Activity recognition

Sensor data comprehension and interpretation relies on the ability to infer human activities that give rise to a particular data stream. Activity recognition is a challenging research topic, far beyond traditional classification problems, which is mainly due to both the variety, heterogeneous, and noise of sensor data and the complexity and ambiguity of human behaviours. Machine learning and data mining techniques have been attempted to solve small-scale activity recognition problems in the past few years, but there are still lots of open areas that have not been (at least sufficiently) explored. Most machine learning and data mining techniques rely on training data; however, training data in pervasive environments are often difficult to collect or are poorly annotated. The main research question is how a new activity recognition technique can transfer the knowledge learned on a limited number of users to a much larger pool of users and adapt the learning in a long term. How can the technique leverage the use of open source knowledge like Wikipedia or WordNet, open social media data like twitter or Four-square, or more generally crowd-sourcing?

Uncertainty management and programming

Uncertainty is a pervasive factor in many sensor-enabled infrastructures, compromising the integrity of the resultant dataset. Incorrect calibration, deteriorating performance over time, or the omnipresent noise within the physical environment, may all reduce the quality and provenance of the data. Imperfect sensor data may result in incorrect inference of real-world phenomena, including human activity recognition for example, thereby reducing the perceived quality of service and experience. Thus a model for handling uncertainty is essential for robust and human-centric ambient intelligence services. The research question is how to address this critical issue of uncertainty in ambient intelligence, with an emphasis on strategies for detecting, resolving, and programming against uncertainty.

Behaviour-aware computing

Behaviour-aware computing is a relative new research topic, immediately following context- and activity-aware computing. It is not only about inferring or predicting human behaviours, but more about adapting to user behaviours and changing their behaviours. This is an interdisciplinary research, often involving how to apply theories and models in psychology to designing a feedback system. Earlier attempts reside in shaping pro-health and -environmental behaviours including diet, fitness, green transportation, energy consumption etc. The main research challenge is sustainability -- how to sustain user's "good" behaviour in a long-term period.

Dr John Thomson

Here are two potential projects I'm interested in supervising. If you are interested in either, or want to talk about your own project related to anything concerning optimising compilers and runtime systems, send me an e-mail. I'm always happy to chat about potential projects before submission of a formal application.

Powering up Performance for Emerging Programming Languages – A Machine Learning Approach

Computers are increasingly programmed in a plethora of new languages which have emerged recently in a relatively short time. These languages often offer something extra over traditional C-like programming, whether that is ease-of-use, rapid development, web integration, functional programming paradigms or provable correctness. Still more languages are being developed right now!

Unfortunately, the performance of programs written in these new languages often lags far behind what could be achieved by a C implementation, even though they are doing the same task! A programmer might be forced to write in C for performance reasons, even though a newer language would be much more suitable for the task at hand. Closing this gap is vital for emerging programming languages to be used in performance critical scenarios.

One of the primary explanations for this is the lack of engineering effort in optimising compilers and particularly parallel runtime systems for emerging languages, in comparison to the huge investment over time in traditional systems. New language developers simply cannot afford to employ armies of engineers to optimise their systems. Machine learning has previously been shown to be effective in increasing performance of computer systems, by modelling and reasoning about optimisation decisions rather than using expensive analysis or hand-tuning . This project is about employing sophisticated machine learning techniques in compilers and runtime systems to produce a new methodology for creating and optimising software tools for new languages, which produces excellent performance results (both in terms of speed and energy efficiency) but takes a fraction of the engineering time which would otherwise be required.

Optimisation for Multi-core Heterogeneous Systems and Resource Sharing

Today's modern computing systems contain highly heterogeneous processing elements (GPUs, CPUs, SIMD engines, accelerators – maybe FPGAs!). In order to fully exploit the huge potential performance of these devices, specialised runtime systems must be developed which can schedule between different heterogeneous elements. Additional complexity is added when we consider resource sharing on a heterogeneous platform.

From embedded tablet/phone processors to High Performance Computing, sharing of resources between multiple users/processes is becoming more and more critical. A HPC system may have many users, all competing for processing resources of varying nature. The challenge for the runtime system is to gain the best possible throughput for the system, while maintaining fairness. Energy usage is also critical. A good scheduler may want to turn off certain processing elements completely in order to save energy, sending processes to marginally less efficient processors but saving energy overall. The interaction of all these elements creates a challenging and exciting optimisation problem; this project is concerned with solving it.

Artificial Intelligence

Prof. Steve Linton

Computational Group Theory with Map-Reduce

The MapReduce skeleton, introduced by Google to provide a uniform framework for their massively parallel computations is proving remarkably flexible, and is being proposed as a uniform framework for high peformance computing in the cloud. This project would investigate a range of problems in the established area of computational abstract algebra in order to see whether, or how, they can be effectively parallelised using this framework.

Parallel algorithms for finitely presented groups

One of the most compact ways of describing many interesting finite and infinite groups is by a presentation — giving generators and relations between them that define the group. Many algorithms for studying groups given in this way are found in the literature, dating back to the 1930s, but almost all seem to be essentially, and often practically, sequential. This project would explore practically usable parallel algorithms for both multi-core and distributed memory systems. While some undergraduate mathematics would be useful, what is needed for this is not very deep, and programming, and especially parallel programming skills are probably more useful.

An architecture for large mathematical databases

Many large collections, some very large, of mathematical objects have been computed over the years. At the moment, each is stored in data files of its own format and accessed via bespoke interfaces. Simply using a “standard” database would not work well — the data formats, the inclusion of “implicit” data that is not stored directly and the very special forms of search needed would not be well served. This project would design build and evaluate a uniform framework for such databases. Possible extensions would include mechanisms for accepting and verifying contributions to such databases, tracking provenance, offering stable doi’s for past versions of the database, etc.

Dr Mark-Jan Nederhof

Extended Domain of Locality in Natural Language Parsing

Many systems for processing language assume that syntactic structure is projective, i.e. that all constituents in a sentence are composed of consecutive words. For languages such as English, this is a defensible viewpoint, but for many other languages, no meaningful projective structures can be build even for typical sentences. Formalisms such as linear context-free rewriting systems can model non-projective syntactic structure. The purpose of the project is to investigate the capabilities of such formalisms, from theoretical and practical perspectives.

Dr Mike Weir

Investigation of Learnt Coordination of Agile Robot Movements

Many robots are deliberately restricted to moving slowly enough so that the executed speed is not a planning problem. While this makes for one less major problem to solve, it restricts the range of tasks a robot is able to carry out relative to its physical capacity. As robots become more integrated into society, demands for robots to be more agile will inevitably increase to exploit their full physical capacity for a wider range of tasks. That is, agility may be required when the task is such that there are threats that can cause failure at the task unless the robot moves fast enough with the right move at the right time. Examples include the robot avoiding collision with moving objects, balancing unstable loads or itself, and in general coordinating movement of aspects or parts of itself. The project builds on recent advances made in the Intelligent Computation Group for mobile control at speed.

For agility to be successful, coordination of motion in the face of irregular perturbation is key. Such coordination may be provided through subgoals that synchronise the coordination in space-time within a generic homeostatic balancing system. The homeostatic objective is to keep the current state within acceptable limits for each component around a mobile local minimum in the overall travel surface, analogous to keeping the bubble in a spirit level centralised. A neural system is to provide a homeostatic mechanism that learns how to react to increased gradients in variables that threaten the balance of the coordinated activity at various times, The investigation is to research the extent to which successful coordination may be learnt and robustly and optimally maintained. The main deliverable is a generic module system applicable to the widest possible range of robots and tasks.

Dr Christopher Jefferson

Puzzle Game Generation with Constraints

Constraint Programming is an A.I. technique which is used to solve a range of combinatorial problems, from timetable generation to Sudoku solving. Constraint Programming has been used to automatically generate levels for an iPhone game ( The goal of this project is to use constraint programming to develop the tools required to describe any puzzle game and then automatically generate levels for that game, measuring each level's difficult and interestingness.

Computation Group Theory with Constraint Programming

Computation Group Theory (CGT) is the algorithmic study of groups ( ) with computers. Many problems in CGT are best solved using backtracking search and constraint programming. This project will extend constraint programming to handle the data structures and problems of interest to group theorists (groups and permutations in particular). This project does not require previous knowledge of group theory, although it will require a love of mathematics, proofs and algorithms.

Human Computer Interaction

Dr Colin Allison

Open Virtual Worlds for Cultural Heritage, Education and the 3D Web [with Alan Miller]

The Open Virtual Worlds group carries out research, development and deployment in all aspects of the emerging 3D immersive web. These include Future Internet, Scalability and Performance in Distributed Systems, protocol development, 3D Web, QoS & QoE, technology enhanced education, novel learning environments, mobile learning, Security and Trust in Hypergrids.

Please get in touch if you want to discuss a proposal. A general overview of Open Virtual Worlds research is given below.

Continuing advances in ICT capabilities in line with Moore’s Law have removed most of the low-level performance barriers to the routine use of immersive 3D multi-user environments, increasingly being referred to as the immersive 3D Web. In addition, the recent emergence of reasonably robust open source virtual world platforms - servers and clients - has also increased the feasibility of using these technologies for open learning resources. However, many technical barriers remain to bringing open virtual worlds into the mainstream of educational technology. If open virtual worlds on the 3D Web are compared with the familiar 2D Web there are many significant differences in accessibility and usability, for both creators and consumers. For example, it takes seconds at most for a browser to download a complex web page but it takes minutes for a 3D viewer to download a typical virtual world. Similarly, the quality of experience in-world can degrade if the network quality-of-service is not maintained at appropriate levels. Accessibility is another major issue – the 2D web only requires one well known port to be open at a firewall, whereas virtual worlds require hundreds of obscure ports to be open – a serious barrier for educational institutions which typically enforce comprehensive blocking on all but a very small number of essential ports such those used for the Web, e-mail and secure sockets. Scalability in terms of the number of concurrent users in an immersive learning environment can also be a major concern, especially if these were to be incorporated into a MOOC (Massive Online Open Courseware) which aims to scale to tens of thousands of learners.
These studies will show how the quality of experience for users relates to the network quality of service, the number of concurrent users and the computational capabilities of the servers. A key challenge is not simply scalability but the variance in load. For example, an induction session might require a hundred students to congregate in the same 3D open learning environment for an hour, but the normal load is less than ten. Could this challenge be met through use of Virtual Machines and Cloud Computing? In order to research the feasibility of this approach an appropriate benchmark must be developed and then incorporated into a testbed which can be used to predict the performance of any particular virtual world. Can virtual machines and the Cloud could act as an adaptive resource for open learning on the 3D Web?

Professor Aaron Quigley

Automatic User Interface distribution in mobile multi-display environments

[with Miguel Nacenta]

An increasing number of interactive displays of different sizes, portability and form factors are now in common use. We can also see people using multiple displays connected to a single machine or displays as part of different devices. However, applications are primarily designed to operate on a single computer with a single logical display. Instead applications might be realised that effectively take advantage of the wide range of input, affordances, and output capability of these multi-display, multi-device and multi-user environments. However, apart from building one-off software to support such multi-display, multi-device and even multi-user applications what are the tools, techniques and technologies needed to support software engineers, application designers, developers and interactions designers in realising applications which can logically spread from one device to many as and when they become available and can be appropriated.

We are looking for a student interested in the infrastructure and design challenges of applications in multi-display environments. This project will be undertaken in SACHI: The St Andrews Computer Human Interaction research group (a HCI Group). This group includes leading academics in the field of HCI. Our members have had central roles in conferences such as CHI, MobileHCI, UIST, ITS, LoCA and Pervasive. We publish in top-tier venues and work with leading researchers internationally. As a group we co-supervise our research students, collaborate on various funded projects and activities with industry and domain experts. We have an international HCI seminar series and our members have access to a dedicated HCI laboratory with specialist equipment including a Microsoft Surface 2 and 1, DiamondTouch, eye tracker, Optitrack and a range of mobile, tablet and novel interface technologies. Our areas of interest and PhD topics on offer include, Ubiquitous Computing, Information Visualisation, Tactile Interfaces, Multi-display Environments and Natural Language and Gestures. We seek students curious to advance HCI research with more sophisticated interfaces and interaction, which can harness evermore powerful, embedded and interconnected computation.

Dr. Miguel Nacenta

The projects indicated below provide only a general idea of what areas I am interested in (they are fairly abstract, too). If you have your own ideas, or want to explore further, I encourage you to contact me in advance of your application. I will be able to help you. The easiest way to contact me is by e-mail.

Visualisation of Text through Text

Following my work on FatFonts I am interested in working with students on developing typographical techniques to visualise data furture, including (but not limited to), tag-clouds and information typography. Students interested in this topic should have interest and/or previous experience in the following areas: computer graphics, typography, open-type, information visualisation and perception.

Perceptual Effects of User Interface Paradigms

Although user interfaces are evolving rapidly and there are a growing set of paradigms (e.g., tangible user interfaces, multitouch, freestyle gestures in the air) it is still very difficult to know which type of interface is better in which conditions and for which tasks. In this this project, we will investigate perceptual and cognitive aspects of existing interfaces to achieve a more fundamental understanding of what makes the paradigms different, and use this knowledge to a) create new paradigms of interaction, b) provide advice to practitioners in interaction design, c) design new alternative types of interactive devices.

We are looking for a student with two possible backgrounds: a) a solid background in psychology/perception that has also some interest in computer science, HCI and interactive systems, and who has (or is ready to acquire) good programming skills; or b) a computer scientist that has some background or interest in perception, psychology and psychophysics. This project will be undertaken within SACHI: The St Andrews Computer Human Interaction research group (a HCI Group). This group includes leading academics in the field of HCI. Our members have had central roles in conferences such as CHI, MobileHCI, UIST, ITS, LoCA and Pervasive. We publish in top-tier venues and work with leading researchers internationally. As a group we co-supervise our research students, collaborate on various funded projects and activities with industry and domain experts. We have an international HCI seminar series and our members have access to a dedicated HCI laboratory with specialist equipment including a Microsoft Surface 2 and 1, DiamondTouch, eye tracker, Optitrack and a range of mobile, tablet and novel interface technologies. Our areas of interest and PhD topics on offer include, Ubiquitous Computing, Information Visualisation, Tactile Interfaces, Multi-display Environments and Natural Language and Gestures.

User Interface Design for Optimal Workflow and Productivity

It is often difficult to stay engaged through a full day of work. Procrastination and distracted behaviour is common and has enormous costs to society. Although behavioural researchers are starting to understand the basis of this behaviour, the application of this knowledge is still difficult. This project is centred on the creation of interfaces that, using current knowledge from the behavioural sciences, can help us make us more productive and happier about our work.

We are looking for a student with two possible backgrounds: a) a solid background in psychology that has also some interest in computer science, HCI and interactive systems, and who has (or is ready to acquire) good programming skills; or b) a computer scientist that has some background or interest in human-behaviour and the psychology of self-management. This project will be undertaken within SACHI: The St Andrews Computer Human Interaction research group (a HCI Group). This group includes leading academics in the field of HCI. Our members have had central roles in conferences such as CHI, MobileHCI, UIST, ITS, LoCA and Pervasive. We publish in top-tier venues and work with leading researchers internationally. As a group we co-supervise our research students, collaborate on various funded projects and activities with industry and domain experts. We have an international HCI seminar series and our members have access to a dedicated HCI laboratory with specialist equipment including a Microsoft Surface 2 and 1, DiamondTouch, eye tracker, Optitrack and a range of mobile, tablet and novel interface technologies. Our areas of interest and PhD topics on offer include, Ubiquitous Computing, Information Visualisation, Tactile Interfaces, Multi-display Environments and Natural Language and Gestures.

Advanced interfaces for Information Visualization and Graphical Manipulation

This project intends to develop radically new interfaces for tasks in the domain of information visualization and manipulation of graphical information (including graphical editors). See, for example transmogrifiers. The main goal is to leverage relatively new interface paradigms (including multi-touch, pen+touch) to create interfaces that are flexible and allow practitioners to experiment very quickly with different representations of data.

We are looking for a student with strong graphics programming skills and an interest in information visualization and/or graphic design. This project will be undertaken within SACHI: The St Andrews Computer Human Interaction research group (a HCI Group). This group includes leading academics in the field of HCI. Our members have had central roles in conferences such as CHI, MobileHCI, UIST, ITS, LoCA and Pervasive. We publish in top-tier venues and work with leading researchers internationally. As a group we co-supervise our research students, collaborate on various funded projects and activities with industry and domain experts. We have an international HCI seminar series and our members have access to a dedicated HCI laboratory with specialist equipment including a Microsoft Surface 2 and 1, DiamondTouch, eye tracker, Optitrack and a range of mobile, tablet and novel interface technologies. Our areas of interest and PhD topics on offer include, Ubiquitous Computing, Information Visualisation, Tactile Interfaces, Multi-display Environments and Natural Language and Gestures.

Programming Languages

Prof. Kevin Hammond

Parallel Functional Programming

Purely functional programming languages, such as Haskell, offer many advantages for parallelism. Unlike more conventional approaches, based around the use of concurrency primitives, for example, parallel functional programming offers a high-level perspective where threads can be automatically introduced and managed in a safe way. One of the most fundamental advantages of the functional paradigm is purity. In a purely functional language, as exemplified by Haskell, all side effects are explicit. This means that parallelism is intrinsic in Haskell programs, creating many more opportunities for parallel execution than in languages with implicit side effects. It is also impossible for parallel computations to conflict with each other, causing deadlock for example. Purity provides deterministic parallelism: the result will always be the same, however many cores a program is executed on (so parallel programs can be debugged using sequential execution, for example). This gives an enormous saving in effort. Programmers can concentrate on solving the problem, rather than porting a sequential algorithm to an ill-defined parallel paradigm; there are simply no locks, deadlocks or race conditions to worry about.

This project will explore new and advanced parallelisation ideas based around parallel functional programming constructs, with a view to radically improving the programmability of parallel systems. There are many possible research directions (described in a forthcoming textbook), but the evaluation strategy approach, developed jointly at St Andrews and exploited by Microsoft Research is a particularly promising one.

ParaForming: Forming Parallel Programs using Advanced Software Refactoring

Historically, parallel programs were usually written in “boutique” style, with immense effort invested to produce efficient, or even working, parallel programs. Few applications programmers have the necessary skills, or time, to deal with such complexities, however. What is needed is a “low pain, high gain” solution. Refactoring tools, where the programmer is assisted in writing their parallel programs by a system that can automatically identify possible parallelisations is a new and exciting approach that can dramatically simplify the process of writing parallel programs. A refactoring-based methodology gives many advantages over unaided parallel programming: it helps identify general patterns of parallelism; it guides the programmers through the process of refining a parallel program, whether new or existing; it enforces separation of concerns between application programmers (those coding the business logic of the application) and system programmers (those implementing the fundamental parallel constructs); and it reduces time-to-deployment. All of these advantages help programmers understand how to write parallel programs.

This project will develop new refactoring-based tools and techniques for parallelism, extending and enhancing work in the 3.5M euro ParaPhrase project, that is coordinated by the University of St Andrews. Key issues include dealing with advanced, heterogeneous parallel architectures, involving combinations of GPUs and CPUs; providing good hygienic abstractions that cleanly separate components written in a variety of programming languages; identifying new high-level patterns of parallelism; developing new rule-based mechanisms for rewriting (refactoring) source-level programs based on those patterns etc.