Research Statement
The aim of my research is to make programmers' life easier by giving
them better tools.
Compared to other engineering disciplines, the state of the art in
software engineering is still somewhat immature. My research focuses
on improving software engineering practices by producing better
programming languages and better software engineering tools. My
research approach is to analyze the needs of programmers by studying
patterns in the software design or in the development process, to find
appropriate applicable technologies or theoretical results, and to
integrate them into better languages and/or tools. This requires
solving problems arising from the integration of existing technologies
or theories, possibly developing new theories, producing prototypes of
compilers and software engineering tools, and evaluating how well the
produced tools satisfy the needs of programmers. The long-term goal
is to influence the development of future commercial programming
languages, compilers, and software engineering tools.
Software engineering practices and problems differ widely between
different application domains.
I have been concentrating on language/tool development for two
software development domains: scientific computation (in particular,
computational chemistry) and object-oriented (OO) applications
development. Recently, I have also started looking at embedded
systems programming.
The majority of software for scientific computations is written in the
low-level languages FORTRAN and C. The computational structure of
some of this software, however, has sufficient underlying structure
that it could benefit from special-purpose software engineering tools
or domain-specific programming languages. E.g., electronic structure
calculations in quantum chemistry and in physics involve large
collections of tensor contractions (generalized matrix
multiplications). Currently, chemists spend weeks or months
manipulating formulas containing dozens or hundreds of terms with
Mathematica, hand-optimizing the computation, and writing FORTRAN code
by hand. The computation can take on the order of 1 TFLOP week or
more and can require multiple TBs of storage.
We are developing a domain-specific language that allows chemists to
specify the computation in a high-level Mathematica-style language.
The compiler for this language, the Tensor Contraction Engine (TCE),
searches for an optimal implementation and generates FORTRAN code.
First, algebraic transformations are used to reduce the
number of operations. We then minimize the storage requirements to
fit the computation within the disk limits by fusing loops. We
have designed an algorithm that finds the optimal evaluation order if
intermediate arrays are allocated dynamically and are
working on combining loop fusion with dynamic memory allocation. If
the computation does not fit within the disk limits, recomputation
must be traded off for a reduction in storage requirements.
If the target machine is a multi-processor machine, we optimize the
communication cost together with finding a fusion configuration for
minimizing storage. Finally, we minimize the data access times by
minimizing disk-to-memory and memory-to-cache traffic and
generate FORTRAN code. We have completed a first prototype of the TCE
and are working on implementing the communication minimization and
data access optimization algorithms. In future research, we will
extend this approach to handle common subexpressions, symmetric
matrices, and sparse matrices.
Object-oriented languages provide abstraction mechanisms that help
with structuring large software. Many commercial programs, such as
desktop applications or games, are produced in an OO style, either in
an OO language or by hand-coding OO techniques in a language such as
C. However, as our analysis of OO design patterns shows, the
abstraction mechanisms provided by the most widely used OO languages
C++ and Java are still not sufficient. An OO programming style
is the appropriate language mechanism for structuring data and if the
software is expected to evolve by refining data representations.
However, OO languages do not provide adequate support for introducing
new abstractions retroactively or if the software evolves by adding
functionality to a stable data structure. Many OO design patterns can
be considered work-arounds for applying an OO style to problems that
are more adequately solved in a different, e.g., a functional,
programming style.
We have designed an extension of C++ with interfaces (then called
signatures) and structural subtyping, which allow adding
abstractions retroactively, and an extension of Java with structural
subtyping. Our C++ extension has been part of the GCC
distribution until Version 2.95. We implemented structural
subtyping for Java as an extension of the Sun JDK 1.1.5 compiler.
Currently, we are working on designing the language Brew as an
extension of Java with an object model based on our analysis of design
patterns and on implementing a compiler for the language in Brew.
Until Brew and the Brew compiler are sufficiently far developed, we
are framing our research results in the context of Java. For evolving
software by adding functionality to a stable data structure, it would
be desirable to have support for multimethods in the language. We
developed an extension of Java with retroactive abstraction and
multimethods, called Half & Half, and are currently working on
implementing it. We have shown that multimethods together
with a closure mechanism allow programming in a functional style
similar to that in ML.
When developing OO software, it is necessary to ensure that the
methods on an object are called in the correct order. We have
designed language support for specifying the object protocols in class
and interface declarations in Java. This language support allows
the compiler to verify that the class protocol satisfies the interface
protocol and to generate debugging code for monitoring the order of
method calls at run time. We are implementing support for protocols
as an extension of the Sun JDK compiler. We are also working on
language support for mobility. Mobile object libraries for Java, such
as the Aglets library do not allow migrating the execution state of
methods executing in an agent. We have developed a translation
mechanism and mobile threads that allow a multithreaded agent
to migrate with all of its execution state. We are currently
finishing the implementation as a preprocessor in the Brew compiler.
Embedded systems are becoming ubiquitous. From cell phones, to
household appliances, to automobiles, many consumer products are being
equipped with one or more processors that interact with the electronic
or mechanical parts of the product. The software engineering
practices for embedded systems, however, are still rather primitive.
Many systems are programmed in assembly or C; other systems are
programmed with high-level but inefficient languages such as
Matlab/Simulink; there is little use of OO languages or techniques.
For teaching embedded systems programming, the current alternatives
are using real equipment in an expensive laboratory for small numbers
of students or teaching a dry course.
We are developing a virtual testbed that employs an actual digital
signal processor (DSP) but in which the external devices controlled by
the DSP, such as power switches or motors, are simulated. We will
use this testbed for developing a course sequence that allows teaching
hands-on embedded systems programming to large numbers of students at
low cost. This virtual testbed could also serve as a testing tool for
testing individual embedded systems with part of the hardware attached
but with other hardware or the communication with other processors
being simulated. Once this virtual testbed is completed, we will use
it as a research platform for developing language support and software
engineering tool support for embedded systems programming.
Gerald Baumgartner
Last modified: Thu Jan 29 23:55:40 Eastern Standard Time 2004