The Tensor Contraction Engine
The majority of software for scientific computations is written in the low-level languages FORTRAN and C. The computational structure of some of this software, however, has sufficient underlying structure that it could benefit from special-purpose software engineering tools or domain-specific programming languages. E.g., electronic structure calculations in quantum chemistry and in physics involve large collections of tensor contractions (generalized matrix multiplications). Currently, chemists spend weeks or months manipulating formulas containing dozens or hundreds of terms with Mathematica, hand-optimizing the computation, and writing FORTRAN code by hand. The computation can take on the order of 1 TFLOP week or more and can require multiple TBs of storage. We have developed a domain-specific language that allows chemists to specify the computation in a high-level Mathematica-style language. The compiler for this language, the Tensor Contraction Engine (TCE), searches for an optimal implementation and generates FORTRAN code. First, algebraic transformations are used to reduce the number of operations. We then minimize the storage requirements to fit the computation within the disk limits by fusing loops. We have designed an algorithm that finds the optimal evaluation order if intermediate arrays are allocated dynamically and are working on combining loop fusion with dynamic memory allocation. If the computation does not fit within the disk limits, recomputation must be traded off for a reduction in storage requirements. If the target machine is a multi-processor machine, we optimize the communication cost together with finding a fusion configuration for minimizing storage. Finally, we minimize the data access times by minimizing disk-to-memory and memory-to-cache traffic and generate FORTRAN code. We have completed a first prototype of the TCE and are working on implementing the communication minimization and data access optimization algorithms. In future research, we will extend this approach to handle common subexpressions, symmetric matrices, and sparse matrices. The Tensor Contraction Engine (TCE) is the application of compiler optimization and source-to-source translation technology to craft a domain specific language for many-body theories in chemistry and physics. The underlying equations of these theories are all expressed as contractions of many-dimensional arrays or tensors There may be many thousands of such terms in any one problem but their regularity means that they can be translated into efficient massively parallel code that respects the boundedness of each level of the memory hierarchy and minimizes overall runtime with effective trade-off of increased computation for reduced memory consumption. The approach has been overwhelming successful and now NWChem contains about 1M lines of human-generated code and over 2M lines of machine generated code. The resulting scientific capabilities would have taken many man-decades of effort and new theories/models can be tested in a morning on physically relevant systems instead of on small test systems after months of effort. In combination with the OCE (operator contraction engine) that turns Feynman-like diagrams into tensor expressions the TCE represents perhaps the first end-to-end production quality example of a solution to the semantic gap. We are currently working on generalizing our optimization approach to handwritten code by combining it with polyhedral model transformations. Motivated by the successes of the model-driven search-based optimization approach of the TCE and the polyhedral model-based parallelization of Pluto, we are working on developing an optimization infrastructure in the ROSE Compiler that combines the key aspects of the TCE and Pluto and provides the flexibility to continue research on optimizing tensor computations for parallel, distributed, and/or out-of-core computations for any machine architecture, including multi-cores and GPUs. For an overview of the project, see our Proceedings of the IEEE paper.For more information about version 1.0 of the TCE (the "Prototype" TCE), please, see our Getting and Using the TCE page.
There are several components available as part of the TCE software. For details, please, see our TCE Software page.
Collaborators
- J. Ramanujam, ECE Division, School of Electrical Engineering and Computer Science, Louisiana State University
- P. Sadayappan Dept. of Computer Science and Engineering, Ohio State University
- David E. Bernholdt, Oak Ridge National Laboratories
- Robert J. Harrison, Oak Ridge National Laboratories
- So Hirata, Dept. of Chemistry, University of Illinois at Urbana-Champaign
- Marcel Nooijen, Dept. of Chemistry, University of Waterloo
- Russell M. Pitzer, Dept. of Chemistry, Ohio State University
Senior Personnel
- Alexander Auer, Dept. of Molecular Theory and Spectroscopy, Max Planck Institute
- Daniel Cociorva, FICO
- Venkatesh Choppella, International Institute of Information Technology, Hyderabad
- Chi-Chung Lam, Dept. of Computer Science and Engineering, Ohio State University
Students
- Kevin Hartline
- Vishnu Khaspa
- Arvind Saini
Former Students
- Pamela Bhattacharya (MS, August 2008), Microsoft
- Alina Bibireata (MS, March 2004), GE Healthcare
- Xiaoyang Gao, IBM
- Albert Hartono, Intel Labs
- Kit Hymel (BS, May 2013)
- Sandhya Krishnan, Google
- Sriram Krishnamoorthy, Pacific Northwest National Labs.
- Qingda Lu, Intel
- Ajay R. Panyala (PhD, August 2014)
- Srinivas Pola (MS, Decemer 2008)
- Brian Poulin (BS, May 2013), Postlethwaite & Netterville
- Swarup Kumar Sahoo, Dept. of Computer Science, University of Illinois at Urbana-Champaign
- Alexander Sibiriakov, MainConcept - DivX
- Vaidyanathan Sivaraman, Dept. of Mathematical Sciences, Binghampton University
- Vamshi Kodimala (MS, May 2012), Infosys
- SaiRaj Yalamanchili, LSU
Software
-
The Loop Fusion Algorithm:
Memory Minimization and Space-Time Trade-Offs
An implementation in ML, September 2010. -
The Tensor Contraction Engine
Components of the TCE Software, January 2008. -
The Tensor Contraction Engine, Version 1.0
The "Prototype" TCE.
Publications
2013
-
Model-Driven Search-Based Loop Fusion Optimization for
Handwritten Code
A. Panyala, P. Bhattacharya, G. Baumgartner, J. Ramanujam. In Proceedings of the 17th Workshop on Compilers for Parallel Computers (CPC '13), Lyon, France, 3-5 July 2013.
2012
-
Empirical Performance-Model Driven Data Layout Optimization
and Library Call Selection for Tensor Contraction Expressions
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam, P. Sadayappan. Journal of Parallel and Distributed Computing, Vol. 72, No. 3, March 2012, pp. 338-352, doi: 10.1016/j.jpdc.2011.09.006.
2011
-
Memory-Optimal Evaluation of Expression Trees Involving
Large Objects
C. Lam, T. Rauber, G. Baumgartner, D. Cociorva, P. Sadayappan. Computer Languages, Systems & Structures, Vol. 37, No. 2, July 2011, pp. 63-75, doi: 10.1016/j.cl.2010.09.003.
2009
-
Performance Optimization of Tensor Contraction Expressions for
Many-Body Methods in Quantum Chemistry
A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang, G. Baumgartner, D.E. Bernholdt, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan. Journal of Physical Chemistry A, Vol. 113, No. 45, September 2009, pp. 12715-12723.
2007
-
Efficient Search-Space Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan. Concurrency and Computation: Practice and Experience, Vol. 19, No. 18, December 2007, pp. 2425-2443.
2006
-
Identifying Cost-Effective Common Subexpressions to Reduce
Operation Count in Tensor Contraction Evaluations
A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen, G. Baumgartner, D.E. Bernholdt, V. Choppella, R.M. Pitzer, J. Ramanujam, A. Rountev, P. Sadayappan. In V.N. Alexandrov, G.D. van Albada, P.M.A. Sloot, J.J. Dongarra (eds.): Proceedings of the International Conference on Computational Science 2006 (ICCS '06), Part I, Reading, United Kingdom, 28-31 May 2006. Lecture Notes in Computer Science, Vol. 3991, Springer-Verlag, 2006, pp. 267-275. -
Layout Transformation Support for the Disk Resident Arrays
Framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha, P. Sadayappan. Journal of Supercomputing, Vol. 36, No. 2, May 2006, pp. 153-170. -
Efficient Synthesis of Out-of-Core Algorithms
Using a Nonlinear Optimization Solver
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan, V. Choppella. Journal of Parallel and Distributed Computing, Vol. 66, No. 5, May 2006, pp. 659-673. -
Memory Minimization for Tensor Contractions using
Integer Linear Programming
A. Allam, J. Ramanujam, G. Baumgartner, P. Sadayappan. In Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL '06), Rhodes Island, Greece, 29 April 2006. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '02), IEEE Computer Society Press, 8 pages. -
Efficient Parallel Out-of-Core Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan. International Journal on High Performance Computing and Networking, Vol. 2, No. 2/3/4, 2006, pp. 110-119. -
Automatic Code Generation for Many-Body Electronic Structure
Methods: The Tensor Contraction Engine
A. Auer, G. Baumgartner, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, A. Hartono, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan, A. Sibiryakov. Molecular Physics, Vol. 104, No. 2, January 2006, pp. 211-228. -
Efficient Search-Space Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan. In Proceedings of the 12th Workshop on Compilers for Parallel Computers (CPC '06), A Coruña, Spain, 9-11 January 2006, 15 pages.
2005
-
Efficient Search-Space Pruning for Integrated Fusion and
Tiling Transformations
X. Gao, S. Krishnamoorthy, S.K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan. In E. Ayguadé, G. Baumgartner, J. Ramanujam, P. Sadayappan (eds.), Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC '05), Hawthorne, New York, 20-22 October 2005. Lecture Notes in Computer Science, Vol. 4339, Springer-Verlag, 2006, pp. 215-229. -
Performance Modeling and Optimization of
Parallel Out-of-Core Tensor Contractions
X. Gao, S.K. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan. In Proceedings of the ACM SIGPLAN 2005 Symposium on Principles and Practice of Parallel Programming (PPoPP '05), Chicago, Illinois, 15-17 June 2005, pp. 266-276. -
Automated Operation Minimization of Tensor Contraction
Expressions in Electronic Structure Calculations
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D.E. Bernholdt, S. Hirata, C. Lam, R.M. Pitzer, J. Ramanujam, P. Sadayappan, In Proceedings of the International Conference on Computational Science 2005 (ICCS '05), Atlanta, Georgia, 22-25 May 2005, Part I. Lecture Notes in Computer Science, Vol. 3514, Springer-Verlag, pp. 155-164. -
Automated Operation Minimization of Tensor Contraction
Expressions in Electronic Structure Calculations
A. Hartono, A. Sibiryakov, M. Nooijen, G. Baumgartner, D.E. Bernholdt, S. Hirata, C. Lam, R.M. Pitzer, J. Ramanujam, P. Sadayappan, Technical Report No. OSU-CISRC-2/05-TR10, Dept. of Computer Science and Engineering, The Ohio State University, February 2005. -
Synthesis of High-Performance Parallel Programs for a Class
of Ab Initio Quantum Chemistry Models
G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan, A. Sibiryakov. Proceedings of the IEEE, Vol. 93, No. 2, February 2005, pp. 276-292.
2004
-
Efficient Layout Transformation Support for Disk-based
Multidimensional Arrays
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha, P. Sadayappan. In L. Bougé, V.K. Prasanna (eds.), Proceedings of the 11th Annual International Conference on High-Performance Computing (HiPC '04), Bangalore, India, 19-22 December 2004. In Lecture Notes in Computer Science, Vol. 3296, Springer-Verlag, pp. 386-398. -
Compiler Techniques for Efficient Parallelization of
Out-of-Core Tensor Contractions
X. Gao, S.K. Sahoo, Q. Lu, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan. Technical Report No. OSU-CISRC-12/04-TR67, Dept. of Computer Science and Engineering, The Ohio State University, December 2004. -
Layout Transformation Support for the Disk Resident Arrays
Framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha, P. Sadayappan. In Proceedings of the Los Alamos Computer Science Initiative Symposium (LACSI '04). Santa Fe, New Mexico. 12-14 October 2004, 14 pages. -
Empirical Performance-Model Driven Data Layout Optimization
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam, P. Sadayappan. In R. Eigenmann, Z. Li, S. Midkiff (eds.), Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC '04), West Lafayette, Indiana, 22-25 September 2004. Lecture Notes in Computer Science, Vol. 3602, Springer-Verlag, 2005, pp. 72-86. -
A High-Level Approach to Synthesis of High-Performance Codes
for Quantum Chemistry: The Tensor Contraction Engine
G. Baumgartner, D.E. Bernholdt, V. Choppella, J. Ramanujam, P. Sadayappan. In Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC '04), Chiemsee, Germany, 7-9 July 2004, pp. 281-290. -
Efficient Synthesis of Out-of-Core Algorithms Using a
Nonlinear Optimization Solver
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan, V. Choppella. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '04), Santa Fe, New Mexico, 26-30 April 2004. IEEE Computer Society Press, Abstract p. 34b, 10 pages. Best paper award.
2003
-
Data Locality Optimization for Synthesis of Efficient
Out-of-Core Algorithms
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella. In Proceedings of the International Conference on High-Performance Computing (HiPC '03), Hyderabad, India, 17-20 December 2003. Lecture Notes in Computer Science, Vol. 2913, Springer-Verlag, pp. 406-417. Best paper award. -
Efficient Parallel Out-of-Core Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster '03), Hong Kong, China, 1-4 December 2003. IEEE Computer Society Press, pp. 300-307. -
Memory-Constrained Data Locality Optimization for
Tensor Contractions
A. Bibireata, S. Krishnan, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, D.E. Bernholdt, V. Choppella. In L. Rauchwerger (ed.), Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC '03), College Station, Texas, 2-4 October 2003. Lecture Notes in Computer Science, Vol. 2958, Springer-Verlag, 2004, pp. 93-108. -
On Efficient Out-of-Core Matrix Transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan. Technical Report OSU-CISRC-9/03-TR52, Dept. of Computer and Information Science, The Ohio State University, September 2003. -
Global Communication Optimization for Tensor Contraction
Expressions under Memory Constraints
D. Cociorva, X. Gao, S. Krishnan, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam. In Proceedings of the 2003 International Parallel and Distributed Processing Symposium (IPDPS '03), Nice, France, 22-26 April 2003. IEEE Computer Society Press, Abstract p. 37b, 8 pages. -
Compile-Time Optimizations for Tensor Contraction Expressions
G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam. In Proceedings of the Workshop on Compilers for Parallel Computers (CPC '03), Amsterdam, The Netherlands, 8-10 January, 2003, pp 281-290.
2002
-
A High-Level Approach to Synthesis of High-Performance Codes for
Quantum Chemistry
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, S. Hirata, C. Lam, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan. In Proceedings of Supercomputing 2002, Baltimore, Maryland, 16-22 November 2002. IEEE Computer Society Press, Abstract p. 5, 10 pages. -
Memory-Constrained Communication Minimization for a Class of
Array Computations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam. In B. Pugh, C. Tseng (eds.), Proceedings of the 15th International Workshop on Languages and Compilers for Parallel Computing (LCPC '02), College Park, Maryland, 25-27 July 2002. Lecture Notes in Computer Science, Vol. 2481, Springer-Verlag, 2005, pp. 1-15. -
Automatic Synthesis of High-Performance Codes for Quantum
Chemistry Applications
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan. In Proceedings of the Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL '02), New York, New York, 22 June 2002, 10 pages. -
Space-Time Trade-Off Optimization for a Class of Electronic
Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (PLDI '02), Berlin, Germany, 17-19 June 2002, pp. 177-186. -
A Performance Optimization Framework for Compilation of Tensor
Contraction Expressions into Parallel Programs
G. Baumgartner, D.E. Bernholdt, D. Cociorva, R.J. Harrison, C. Lam, M. Nooijen, J. Ramanujam, P. Sadayappan. 7th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '02), In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '02), Fort Lauderdale, Florida, 15 April 2002, IEEE Computer Society, pp. 106-114.
2001
-
Towards Automatic Synthesis of High-Performance Codes for
Electronic Structure Calculations: Data Locality Optimization
D. Cociorva, J. Wilkins, G. Baumgartner, P. Sadayappan, J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison. In Proceedings of the International Conference on High-Performance Computing (HiPC '01), Hyderabad, India, 17-21 December 2001. Lecture Nodes in Computer Science, Vol. 2228, Springer-Verlag, pp. 237-248. -
Space-Time Trade-Off Optimization for a Class of Electronic
Structure Calculations
D. Cociorva, G. Baumgartner, C. Lam, P. Sadayappan, J. Ramanujam, M. Nooijen, D.E. Bernholdt, R.J. Harrison. Technical Report No. OSU-CISRC-11/01-TR24, Dept. of Computer and Information Science, The Ohio State University, November 2001. -
Loop Optimizations for a Class of Memory-Constrained
Computations
D. Cociorva, J. Wilkins, C. Lam, G. Baumgartner, P. Sadayappan, J. Ramanujam. In Proceedings of the 15th ACM International Conference on Supercomputing (ICS '01), Sorrento, Italy, 16-21 June 2001, pp. 103-113. -
Memory-Optimal Evaluation of Expression Trees Involving Large
Objects
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan. Technical Report No. OSU-CISRC-5/99-TR13, Dept. of Computer and Information Science, The Ohio State University, May 1999, updated March 2001.
1999
-
Memory-Optimal Evaluation of Expression Trees Involving Large
Objects
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan. In Proceedings of the 1999 International Conference on High Performance Computing (HiPC '99), Calcutta, India, 17-20 December 1999, IEEE Computer Society. Lecture Notes in Computer Science, Vol. 1745, Springer-Verlag, pp. 103-110. -
Optimization of Memory Usage Requirement for
a Class of Loops Implementing Multi-Dimensional Integrals
C. Lam, D. Cociorva, G. Baumgartner, P. Sadayappan, In J. Ferrante, L. Carter (eds.), Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing (LCPC '99), San Diego, California, 4-6 August 1999. Lecture Notes in Computer Science, Vol. 1863, Springer-Verlag, pp. 350-364.
Funding
This material is based upon work supported by the National Science Foundation and the Louisiana Board of Regents under the following grants. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Louisiana Board of Regents.- Louisiana Board of Regents Support Fund, Enhancement Program, Award #20130008669, July 2015 - June 2017.
- NSF CISE Computing Research Infrastructure (CRI) Program, Award #1059417, June 2011 - May 2016.
- NSF Experimental Program to Stimulate Competitive Research (EPSCOR), Award #1003897, Oct. 2014 - Sep. 2015.
- NSF Foundations of Computing Processes and Artifacts (CPA) Program, Award #0541409, May 2006 - Apr. 2011.
- NSF Computer Systems Research (CSR) Program, Award #0509442, Sept. 2005 - Aug. 2009.
- NSF Information Technology Research (ITR) Program, Award #0121676, Sept. 2001 - Sept. 2007.