The Tensor Contraction Engine
The majority of software for scientific computations is written in the low-level languages FORTRAN and C. Some of this software, however, has enough underlying computational structure that it could benefit from special-purpose software engineering tools or domain-specific programming languages. For example, electronic structure calculations in quantum chemistry and in physics involve large collections of tensor contractions (generalized matrix multiplications). Currently, chemists spend weeks or months manipulating formulas containing dozens or hundreds of terms with Mathematica, hand-optimizing the computation, and writing FORTRAN code by hand. The computation can take on the order of 1 TFLOP week or more and can require multiple TBs of storage. We have developed a domain-specific language that allows chemists to specify the computation in a high-level Mathematica-style language. The compiler for this language, the Tensor Contraction Engine (TCE), searches for an optimal implementation and generates FORTRAN code. First, algebraic transformations are used to reduce the number of operations. We then minimize the storage requirements by fusing loops so that the computation fits within the disk limits. We have designed an algorithm that finds the optimal evaluation order if intermediate arrays are allocated dynamically, and we are working on combining loop fusion with dynamic memory allocation. If the computation does not fit within the disk limits, recomputation must be traded off for a reduction in storage requirements. If the target machine is a multiprocessor machine, we optimize the communication cost together with finding a fusion configuration for minimizing storage. Finally, we minimize the data access times by minimizing disk-to-memory and memory-to-cache traffic and generate FORTRAN code. We have completed a first prototype of the TCE and are working on implementing the communication minimization and data access optimization algorithms. In future research, we will extend this approach to handle common subexpressions, symmetric matrices, and sparse matrices.
The Tensor Contraction Engine (TCE) is the application of compiler optimization and source-to-source translation technology to craft a domain-specific language for many-body theories in chemistry and physics. The underlying equations of these theories are all expressed as contractions of many-dimensional arrays, or tensors. There may be many thousands of such terms in any one problem, but their regularity means that they can be translated into efficient massively parallel code that respects the boundedness of each level of the memory hierarchy and minimizes overall runtime by trading increased computation for reduced memory consumption where that pays off. The approach has been overwhelmingly successful, and NWChem now contains about 1M lines of human-generated code and over 2M lines of machine-generated code. The resulting scientific capabilities would have taken many man-decades of effort, and new theories and models can be tested in a morning on physically relevant systems instead of on small test systems after months of effort. In combination with the OCE (operator contraction engine), which turns Feynman-like diagrams into tensor expressions, the TCE represents perhaps the first end-to-end production-quality example of a solution to the semantic gap. We are currently working on generalizing our optimization approach to handwritten code by combining it with polyhedral model transformations. Motivated by the successes of the model-driven search-based optimization approach of the TCE and the polyhedral model-based parallelization of Pluto, we are developing an optimization infrastructure in the ROSE Compiler that combines the key aspects of the TCE and Pluto and provides the flexibility to continue research on optimizing tensor computations for parallel, distributed, and/or out-of-core computations for any machine architecture, including multicores and GPUs.
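The first two optimization steps described above, operation minimization by algebraic transformation and storage reduction by loop fusion, can be illustrated on a toy contraction. The following Python sketch is not TCE output or TCE code; the function names and the contraction R[i] = sum over j,k of A[i][j] * B[j][k] * v[k] are assumptions chosen only for illustration. It counts multiplications for a direct evaluation and for a factored evaluation through an intermediate T[j], and then fuses the two loops over j so that the intermediate array shrinks to a scalar.

```python
def contract_naive(A, B, v):
    """Evaluate R[i] = sum_{j,k} A[i][j] * B[j][k] * v[k] in one loop nest."""
    n = len(v)
    R = [0] * n
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                R[i] += A[i][j] * B[j][k] * v[k]
                mults += 2  # two multiplications per innermost iteration
    return R, mults


def contract_factored(A, B, v):
    """Same result, factored through the intermediate T[j] = sum_k B[j][k] * v[k]."""
    n = len(v)
    T = [0] * n  # intermediate array with n elements
    mults = 0
    for j in range(n):
        for k in range(n):
            T[j] += B[j][k] * v[k]
            mults += 1
    R = [0] * n
    for i in range(n):
        for j in range(n):
            R[i] += A[i][j] * T[j]
            mults += 1
    return R, mults


def contract_fused(A, B, v):
    """Factored form with the producer and consumer loops over j fused.

    Each T[j] is consumed right after it is produced, so the
    intermediate array is replaced by the scalar t.
    """
    n = len(v)
    R = [0] * n
    mults = 0
    for j in range(n):
        t = 0  # scalar stands in for T[j]
        for k in range(n):
            t += B[j][k] * v[k]
            mults += 1
        for i in range(n):
            R[i] += A[i][j] * t
            mults += 1
    return R, mults


if __name__ == "__main__":
    n = 4  # small illustrative size
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i * j + 1 for j in range(n)] for i in range(n)]
    v = list(range(1, n + 1))
    for f in (contract_naive, contract_factored, contract_fused):
        result, mults = f(A, B, v)
        print(f.__name__, result, mults)
```

For square n-by-n inputs the direct form performs 2n^3 multiplications while both factored forms perform 2n^2, and the fused form needs no intermediate array. Real many-body expressions have many more indices and terms, which is why a search over factorizations and fusion configurations is needed rather than a choice made by hand.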
For an overview of the project, see our Proceedings of the IEEE paper. For more information about version 1.0 of the TCE (the "Prototype" TCE), please see our Getting and Using the TCE page.
There are several components available as part of the TCE software. For details, please see our TCE Software page.
Collaborators
 J. Ramanujam, ECE Division, School of Electrical Engineering and Computer Science, Louisiana State University
 P. Sadayappan, Dept. of Computer Science and Engineering, Ohio State University
 David E. Bernholdt, Oak Ridge National Laboratory
 Robert J. Harrison, Oak Ridge National Laboratory
 So Hirata, Dept. of Chemistry, University of Illinois at Urbana-Champaign
 Marcel Nooijen, Dept. of Chemistry, University of Waterloo
 Russell M. Pitzer, Dept. of Chemistry, Ohio State University
Senior Personnel
 Alexander Auer, Dept. of Molecular Theory and Spectroscopy, Max Planck Institute
 Daniel Cociorva, FICO
 Venkatesh Choppella, International Institute of Information Technology, Hyderabad
 Chi-Chung Lam, Dept. of Computer Science and Engineering, Ohio State University
Students
 Kevin Hartline
 Vishnu Khaspa
 Arvind Saini
Former Students
 Pamela Bhattacharya (MS, August 2008), Microsoft
 Alina Bibireata (MS, March 2004), GE Healthcare
 Xiaoyang Gao, IBM
 Albert Hartono, Intel Labs
 Kit Hymel (BS, May 2013)
 Sandhya Krishnan, Google
 Sriram Krishnamoorthy, Pacific Northwest National Laboratory
 Qingda Lu, Intel
 Ajay R. Panyala (PhD, August 2014)
 Srinivas Pola (MS, December 2008)
 Brian Poulin (BS, May 2013), Postlethwaite & Netterville
 Swarup Kumar Sahoo, Dept. of Computer Science, University of Illinois at Urbana-Champaign
 Alexander Sibiriakov, MainConcept  DivX
 Vaidyanathan Sivaraman, Dept. of Mathematical Sciences, Binghamton University
 Vamshi Kodimala (MS, May 2012), Infosys
 SaiRaj Yalamanchili, LSU
Software

The Loop Fusion Algorithm:
Memory Minimization and Space-Time Trade-Offs
An implementation in ML, September 2010. 
Components of the TCE Software, January 2008. 
The Tensor Contraction Engine, Version 1.0
The "Prototype" TCE.
Funding
This material is based upon work supported by the National Science Foundation and the Louisiana Board of Regents under the following grants. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Louisiana Board of Regents.
 Louisiana Board of Regents Support Fund, Enhancement Program, Award #20130008669, July 2015 – June 2017.
 NSF CISE Computing Research Infrastructure (CRI) Program, Award #1059417, June 2011 – May 2016.
 NSF Experimental Program to Stimulate Competitive Research (EPSCOR), Award #1003897, Oct. 2014 – Sep. 2015.
 NSF Foundations of Computing Processes and Artifacts (CPA) Program, Award #0541409, May 2006 – Apr. 2011.
 NSF Computer Systems Research (CSR) Program, Award #0509442, Sept. 2005 – Aug. 2009.
 NSF Information Technology Research (ITR) Program, Award #0121676, Sept. 2001 – Sept. 2007.