Unlocking the Potential of Large Language Models for High-Performance Computing Code
Description
High-Performance Computing (HPC) has long been the driving force behind advancements in science, engineering, and beyond. Yet, realizing the full potential of HPC applications has often been hampered by the intricate nature of programming for the underlying parallel systems. In this keynote, we explore a transformative approach that bridges the gap between human ingenuity and computational power using the capabilities of large language models (LLMs).

Our research explores how cutting-edge LLMs can be tailored to the demanding domain of HPC, where computational speed and efficiency reign supreme. While LLMs have shown remarkable proficiency in understanding and generating code, their training data comes primarily from general-purpose codebases. HPC code, in stark contrast, involves intricate mathematical modeling, parallelism, and optimization, and therefore demands customized adaptation.

That is why our journey toward ‘HPC LLMs’ began with the collection of an extensive dataset, HPCorpus, a comprehensive corpus of HPC code in C, C++, and Fortran drawn from diverse domains. Armed with this resource, we embarked on an ambitious mission to enhance the capabilities of language models in the realm of HPC. The creation of Tokompiler, a pioneering HPC-specific code tokenizer, marked a pivotal turning point. Designed to preprocess code for language models, Tokompiler injects abstract syntax tree (AST) information into the source code itself, reshaping the way language models comprehend and generate code: the model perceives code as a compiler does, rather than as a human does.

Building upon this innovation, we undertook comprehensive pre-training with CompCoder, adapting transformer-based language models to the intricacies of HPC. This journey has culminated in novel downstream tasks, including the generation of OpenMP and MPI code, where our models shine by transforming serial code into efficient parallel code. Together, these milestones represent a great leap forward in the convergence of AI and HPC, promising to redefine the landscape of computational science.
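To make the serial-to-parallel task concrete, the sketch below shows the kind of transformation such a model targets: a serial C loop whose iterations are independent, annotated with an OpenMP work-sharing directive. This is an illustrative example under that assumption, not actual output from CompCoder.

    #include <stddef.h>

    /* Serial input: element-wise vector addition. */
    void vec_add(const double *a, const double *b, double *c, size_t n) {
        for (size_t i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    /* Parallel output: each iteration touches a distinct c[i], so the
     * loop can safely carry an OpenMP work-sharing directive.
     * Compile with OpenMP enabled, e.g. cc -fopenmp. */
    void vec_add_omp(const double *a, const double *b, double *c, size_t n) {
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

MPI generation is analogous in spirit, but instead of a single directive it requires distributing data and work across ranks, making it a substantially harder target.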

As we stand at the crossroads of AI and HPC, the possibilities are boundless. Our journey is merely the prologue, unveiling a multitude of untapped opportunities in HPC code comprehension, generation, and optimization. From refining domain-specific code to tackling complex simulations and accelerating scientific breakthroughs, the horizons are vast. The symbiotic partnership between LLMs and HPC promises to revolutionize how HPC practitioners write code. Looking ahead, we envision a future where LLMs for HPC become indispensable tools for researchers and developers in their quest for unprecedented speed, accuracy, and efficiency.
Event Type
Workshop
Time
Monday, 13 November 2023, 11:50am - 12:30pm MST
Location
601
Tags
Artificial Intelligence/Machine Learning
Software Engineering
Registration Categories
W