Research Statement

The aim of my research is to make programmers' life easier by giving them better tools.

Compared to other engineering disciplines, the state of the art in software engineering is still somewhat immature. My research focuses on improving software engineering practices by producing better programming languages and better software engineering tools. My research approach is to analyze the needs of programmers by studying patterns in the software design or in the development process, to find appropriate applicable technologies or theoretical results, and to integrate them into better languages and/or tools. This requires solving problems arising from the integration of existing technologies or theories, possibly developing new theories, producing prototypes of compilers and software engineering tools, and evaluating how well the produced tools satisfy the needs of programmers. The long-term goal is to influence the development of future commercial programming languages, compilers, and software engineering tools.

Software engineering practices and problems differ widely between different application domains. I have been concentrating on language/tool development for two software development domains: scientific computation (in particular, computational chemistry) and object-oriented (OO) applications development. Recently, I have also started looking at embedded systems programming.

The majority of software for scientific computations is written in the low-level languages FORTRAN and C. The computational structure of some of this software, however, has sufficient underlying structure that it could benefit from special-purpose software engineering tools or domain-specific programming languages. E.g., electronic structure calculations in quantum chemistry and in physics involve large collections of tensor contractions (generalized matrix multiplications). Currently, chemists spend weeks or months manipulating formulas containing dozens or hundreds of terms with Mathematica, hand-optimizing the computation, and writing FORTRAN code by hand. The computation can take on the order of 1 TFLOP week or more and can require multiple TBs of storage.

We are developing a domain-specific language that allows chemists to specify the computation in a high-level Mathematica-style language. The compiler for this language, the Tensor Contraction Engine (TCE), searches for an optimal implementation and generates FORTRAN code. First, algebraic transformations are used to reduce the number of operations. We then minimize the storage requirements to fit the computation within the disk limits by fusing loops. We have designed an algorithm that finds the optimal evaluation order if intermediate arrays are allocated dynamically and are working on combining loop fusion with dynamic memory allocation. If the computation does not fit within the disk limits, recomputation must be traded off for a reduction in storage requirements. If the target machine is a multi-processor machine, we optimize the communication cost together with finding a fusion configuration for minimizing storage. Finally, we minimize the data access times by minimizing disk-to-memory and memory-to-cache traffic and generate FORTRAN code. We have completed a first prototype of the TCE and are working on implementing the communication minimization and data access optimization algorithms. In future research, we will extend this approach to handle common subexpressions, symmetric matrices, and sparse matrices.

Object-oriented languages provide abstraction mechanisms that help with structuring large software. Many commercial programs, such as desktop applications or games, are produced in an OO style, either in an OO language or by hand-coding OO techniques in a language such as C. However, as our analysis of OO design patterns shows, the abstraction mechanisms provided by the most widely used OO languages C++ and Java are still not sufficient. An OO programming style is the appropriate language mechanism for structuring data and if the software is expected to evolve by refining data representations. However, OO languages do not provide adequate support for introducing new abstractions retroactively or if the software evolves by adding functionality to a stable data structure. Many OO design patterns can be considered work-arounds for applying an OO style to problems that are more adequately solved in a different, e.g., a functional, programming style.

We have designed an extension of C++ with interfaces (then called signatures) and structural subtyping, which allow adding abstractions retroactively, and an extension of Java with structural subtyping. Our C++ extension has been part of the GCC distribution until Version 2.95. We implemented structural subtyping for Java as an extension of the Sun JDK 1.1.5 compiler. Currently, we are working on designing the language Brew as an extension of Java with an object model based on our analysis of design patterns and on implementing a compiler for the language in Brew. Until Brew and the Brew compiler are sufficiently far developed, we are framing our research results in the context of Java. For evolving software by adding functionality to a stable data structure, it would be desirable to have support for multimethods in the language. We developed an extension of Java with retroactive abstraction and multimethods, called Half & Half, and are currently working on implementing it. We have shown that multimethods together with a closure mechanism allow programming in a functional style similar to that in ML.

When developing OO software, it is necessary to ensure that the methods on an object are called in the correct order. We have designed language support for specifying the object protocols in class and interface declarations in Java. This language support allows the compiler to verify that the class protocol satisfies the interface protocol and to generate debugging code for monitoring the order of method calls at run time. We are implementing support for protocols as an extension of the Sun JDK compiler. We are also working on language support for mobility. Mobile object libraries for Java, such as the Aglets library do not allow migrating the execution state of methods executing in an agent. We have developed a translation mechanism and mobile threads that allow a multithreaded agent to migrate with all of its execution state. We are currently finishing the implementation as a preprocessor in the Brew compiler.

Embedded systems are becoming ubiquitous. From cell phones, to household appliances, to automobiles, many consumer products are being equipped with one or more processors that interact with the electronic or mechanical parts of the product. The software engineering practices for embedded systems, however, are still rather primitive. Many systems are programmed in assembly or C; other systems are programmed with high-level but inefficient languages such as Matlab/Simulink; there is little use of OO languages or techniques. For teaching embedded systems programming, the current alternatives are using real equipment in an expensive laboratory for small numbers of students or teaching a dry course.

We are developing a virtual testbed that employs an actual digital signal processor (DSP) but in which the external devices controlled by the DSP, such as power switches or motors, are simulated. We will use this testbed for developing a course sequence that allows teaching hands-on embedded systems programming to large numbers of students at low cost. This virtual testbed could also serve as a testing tool for testing individual embedded systems with part of the hardware attached but with other hardware or the communication with other processors being simulated. Once this virtual testbed is completed, we will use it as a research platform for developing language support and software engineering tool support for embedded systems programming.


Gerald Baumgartner
Last modified: Thu Jan 29 23:55:40 Eastern Standard Time 2004