MIT researchers propose a new programming language called “Exo” to write high-performance code on hardware accelerators

In a recent research paper, researchers from MIT CSAIL and UC Berkeley present a new programming language named “Exo” for writing high-performance programs on hardware accelerators. Exo is built on the principle of “Exocompilation,” which aims to make it easier to build high-performance libraries for specialized hardware. Exocompilation externalizes support for target-specific code generation and optimization rules to user-level code: Exo lets user libraries define custom hardware instructions, custom memories, and accelerator configuration state.
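To make the idea of “externalizing” hardware support concrete, here is a toy sketch in plain Python. This is not Exo’s real API; the instruction name, the registry, and the tiny interpreter are all hypothetical, invented only to show how a generic compiler core can stay hardware-agnostic while user-level code supplies the target-specific instruction definitions:

```python
# Toy illustration of "exocompilation" (not Exo's real API):
# hardware-specific instruction definitions live in user-level
# code, and a generic core merely looks them up.

# User-level library: defines a custom "instruction" for a
# hypothetical accelerator as ordinary code.
def accel_mul_add(dst, a, b):
    """Pretend hardware fused multiply-add over vectors."""
    for i in range(len(dst)):
        dst[i] += a[i] * b[i]

# Generic core: knows nothing about the accelerator; it only
# consults the user-registered instruction table.
INSTRUCTIONS = {"mul_add": accel_mul_add}

def run(program, state):
    for opname, args in program:
        INSTRUCTIONS[opname](*(state[name] for name in args))

state = {"c": [0.0, 0.0], "a": [1.0, 2.0], "b": [3.0, 4.0]}
run([("mul_add", ("c", "a", "b"))], state)
print(state["c"])  # [3.0, 8.0]
```

Supporting a new chip in this sketch means registering new entries in the user-level table, not modifying the core, which is the separation Exocompilation is after.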

Exo helps low-level performance engineers transform simple programs that specify what they want to compute into highly complex programs that compute the same thing far faster by exploiting these specialized accelerator chips.

Developing an accelerator, a piece of hardware to which programs offload certain computations to speed them up, requires building entirely new software support. Hardware accelerators are not ready to use out of the box: they can run certain tasks much faster than CPUs, but the software must use the accelerator’s instructions efficiently for the whole application to benefit. That translates into a great deal of engineering effort that must be maintained for every programming language that compiles to a new chip.

Engineers can use Exo, for example, to turn a simple matrix multiplication into a far more complicated program that runs orders of magnitude faster by exploiting these special accelerators.
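The following plain-Python sketch illustrates the kind of rewrite meant here. Both functions compute the same matrix product; the second restructures the loop nest into tiles (blocking) so the working set stays cache-friendly, which is one of the classic transformations a performance engineer applies. In Exo such rewrites are applied via scheduling and checked for equivalence; this sketch only shows the before/after shapes and is not Exo code:

```python
# Naive triple loop: defines *what* to compute.
def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Tiled version: same result, loops blocked by tile size t
# so each tile's data can stay resident in fast memory.
def matmul_tiled(A, B, n, t=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for kk in range(0, n, t):
            for jj in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for k in range(kk, min(kk + t, n)):
                        for j in range(jj, min(jj + t, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_naive(A, B, 2) == matmul_tiled(A, B, 2)
```

On real hardware the payoff comes from mapping tiles onto accelerator instructions and memories, which is exactly the part Exo lets libraries describe.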

Unlike other programming languages and compilers, Exo is built around the idea of Exocompilation. “Traditionally, a lot of research has focused on automating the optimization process for specific hardware,” the researchers note. “That is great for most programmers, but for performance engineers the compiler is more often a problem than a solution. Because the compiler optimizes on its own, if it makes a mistake and gives you 45% efficiency instead of 90%, there is no viable way to rectify it.”

Exocompilation gives control back to the performance engineer. Instead of the compiler, the performance engineer now decides which optimizations to apply, when to apply them, and in what order. This avoids wasting time fighting the compiler, and avoids doing everything by hand. At the same time, Exo takes responsibility for ensuring that all of these optimizations are correct. As a result, the performance engineer can spend their time improving performance rather than debugging complex, optimized code.
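The division of labor described above can be sketched in miniature. This is a hedged analogy, not Exo’s real scheduling system: the engineer hand-picks a rewrite (here, a hypothetical unroll-by-two), and a stand-in checker plays the role of Exo’s guarantee that the rewritten code still means the same thing as the original:

```python
# Toy sketch (not Exo's scheduling API): the engineer chooses the
# rewrite; the system checks it still matches the reference.

def reference(xs):
    # Simple spec: what we want to compute.
    return [2.0 * x + 1.0 for x in xs]

def rewrite_unrolled(xs):
    # "Optimized" variant the engineer chose: unroll by 2.
    out = []
    n = len(xs)
    for i in range(0, n - 1, 2):
        out.append(2.0 * xs[i] + 1.0)
        out.append(2.0 * xs[i + 1] + 1.0)
    if n % 2:
        out.append(2.0 * xs[-1] + 1.0)
    return out

def check_equivalent(f, g, inputs):
    # Stand-in for Exo's correctness checking of each rewrite.
    return all(f(xs) == g(xs) for xs in inputs)

tests = [[1.0, 2.0, 3.0], [], [5.0]]
assert check_equivalent(reference, rewrite_unrolled, tests)
```

In Exo the check is done formally on the program itself rather than by testing on sample inputs; the point of the sketch is only the workflow: engineer proposes, system verifies.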

The Exo compiler is parameterized over the hardware it targets; the same compiler can adapt to many different hardware accelerators. Instead of requiring you to write a ton of complicated C++ code to target a new accelerator, Exo provides a consistent, abstract way to describe the “shape” of the hardware you want to target. Then, rather than building an entirely new compiler from scratch, you adapt the existing Exo compiler to that new definition. This could have a significant impact, because it would let hardware developers experiment with and ship more designs without worrying about the cost of building a new compiler for each one. The industry could break its dependence on legacy hardware that remains profitable only because of ecosystem lock-in, and is therefore inefficient.

The fastest computer chips made today, such as Google’s TPU, Apple’s Neural Engine, and NVIDIA’s Tensor Cores, power scientific computing and machine learning applications by accelerating “key routines,” also called kernels or high-performance computing (HPC) routines.

Clunky terminology aside, these routines are crucial. For example, a “library,” or collection of subroutines, known as the Basic Linear Algebra Subprograms (BLAS) enables many machine learning tasks, including neural networks, weather forecasting, cloud computing, and drug discovery, all of which require linear algebra computations. These new processors, each of which takes hundreds of engineers to design, are only as good as the HPC software libraries that run on them allow.

That performance is achieved by hand, to ensure every last compute cycle on these chips is used. HPC routines routinely run at 90% or more of their theoretical peak efficiency, and engineers go to great lengths to squeeze out an additional five or ten percent toward those peaks. All of that hardware effort is wasted if the software is not aggressively optimized, which is precisely what Exo aims to prevent.

With Exocompilation, performance engineers can describe the new chips they want to optimize for without having to change the compiler. In the past, the hardware interface definition was maintained by the compiler developers, but for most of these new accelerator chips, the hardware interface is proprietary. Companies must maintain their own copy (or “fork”) of an entire conventional compiler, modified to support their particular chip, which requires hiring teams of compiler developers in addition to the performance engineers.

In Exo, the hardware-specific backend definitions are externalized from the Exocompiler itself. This draws a clearer line between Exo, which is an open-source project, and hardware-specific proprietary code. The researchers demonstrated that Exo enables the rapid creation of code that is as performant as Intel’s hand-optimized Math Kernel Library.

Looking ahead, the team plans to explore a more productive scheduling meta-language and to extend Exo’s semantics to support parallel programming models, so that it can target even more accelerators, including GPUs.

This article is written as a summary by Marktechpost Research Staff based on the research paper 'Exocompilation for Productive Programming of Hardware Accelerators'. All credit for this research goes to the researchers on this project. Check out the paper and the MIT article.


Scott R. Banks