Navigating the Molecular Maze: A Python-Powered Approach to Virtual Drug Screening | | |
Simultaneous Evaluation of Mindful Fault Checking across the CPU and GPU | | |
Near-Optimal Reduce on the Cerebras Wafer-Scale Engine | | |
Comparative Study of the Cache Utilization Trends for Regional Scientific Data Caches | | |
ProxyStreams: Leveraging Lightweight Proxies for Portable Streams | | |
A Comparison of Deep and Shallow Residual Networks for Medical Imaging Classification | | |
Chasing Clouds with Donkeycar: Holistic Exploration of Edge and Cloud Inferencing Trade-Offs in E2E Self-Driving Cars | | |
A Reinforcement Learning-Based Backfilling Strategy for HPC Batch Jobs | | |
Road To Reliability: Optimizing Self-Driving Consistency With Real-Time Speed Data | | |
Using Deep Neural Networks to Classify Hot-Cold Data Storage | | |
Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pretraining | | |
Supercharging Scientific Serverless: Slashing Cold Starts with Python UniKernels | | |
Dynamic and First-Class Priorities | | |
How Much Noise Is Enough: On Privacy, Security, and Accuracy Trade-Offs in Differentially Private Federated Learning | | |
Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training | | |
A Formal Specification of Tensor Cores via Satisfiability Modulo Theories | | |
Accelerating Collective Communications with Lossy Compression on GPU | | |
Lossy and Lossless Compression for BioFilm Optical Coherence Tomography (OCT) | | |
File Aggregation for Asynchronous Multi-Level Checkpointing | | |
Fast Operations on Compressed Arrays without Decompression | | |
Utilizing Large Language Models for Disease Phenotyping in Obstructive Sleep Apnea | | |
Enabling Transparent, High-Throughput Data Movement for Scientific Workflows on HPC Systems | | |
ROI Preservation in Streaming Lossy Compression | | |
Better Data Splits for Machine Learning with Astartes | | |
Cloud Computing at Scale: Tracking 4.5 Million Heartbeats of 3D Coronary Flow via the Longitudinal Hemodynamic Mapping Framework | | |
Genome Assembly Using an Asynchronous Distributed Actor-Based Approach | | |
Accelerating CRUD with Chrono Dilation for Time-Series Storage Systems | | |
Incremental Graph Clustering in Parallel | | |
NetCDFaster: A Geospatial Cyberinfrastructure for Multi-Dimensional Scientific Datasets Full-Stack I/O and Visualization | | |
A Heterogeneous, In Transit Approach for Large Scale Cellular Modeling | | |
Sensitivity of Black-Box Statistical Prediction of Lossy Compression Ratios for 3D Scientific Data | | |
Seeing the Trees for the Forest: Describing HPC Filesystem Trees with the Grand Unified File-Index (GUFI) | | |
Cray EX40 Cluster Intrusion Detection System | | |
Job Level Communication-Avoiding Detection and Correction of Silent Data Corruption in HPC Applications | | |
Case Study for Performance Portability of GPU Programming Frameworks for Hemodynamic Simulations | | |
Fast Checkpointing of Large Language Models with TensorStore CHFS | | |
I/O Efficient Machine Learning | Artificial Intelligence/Machine Learning | |
Preemptive Intrusion Detection: Real-World Measurements, Bayesian-Based Detection, and AI-Driven Countermeasures | Artificial Intelligence/Machine Learning | |
High Performance Serverless for HPC and Clouds | | |
Corralling the Computing Continuum: Mobilizing Modern Distributed Resources for Machine Learning and Accessible Computing | | |
Scaling HPC Applications through Predictable and Reliable Data Reduction Methods | | |
Interactive In-Situ Visualization of Large Distributed Volume Data | Data Analysis, Visualization, and Storage | |
Enabling Reproducibility and Scalability of Scientific Workflows in HPC and Cloud | | |
Modernizing Simulation Software for the Exascale Era | | |
Charged Particle Track Reconstruction Algorithms for Massively Parallel Systems | | |
Design Automation Tools and Software for Quantum Computing | | |
Overcoming the Gap between Compute and Memory Bandwidth in Modern GPUs | | |
High Performance Computing for Optimization of Radiation Therapy Treatment Plans | | |
PanSim: A Performance-Portable Agent Based Model | | |
Ares – Simulating Type Ia Supernovae on Heterogeneous HPC Architectures | | |
Balancing Latency and Throughput of Distributed Inference by Interleaved Parallelism | | |
Scalable Algorithms for Analyzing Large Dynamic Networks Using CANDY | | |
Parallel Optimization Methods for Direct Numerical Simulation of High Reynolds Number Wall Turbulence with a Grid Size of 100 Billion | | |
Performant Low-Order Matrix-Free Finite Element Kernels on GPU Architectures | | |
Introducing Prefetching and Data Compression to Accelerate Checkpointing for Inverse Seismic Problems | | |
GPU-Accelerated Dense Covariance Matrix Generation for Spatial Statistics Applications | | |
ParLeiden: Boosting Parallelism of Distributed Leiden Algorithm on Large-Scale Graphs | | |
Scalable Reduced-Order Modeling for Three-Dimensional Turbulent Flow | | |
Unstructured Finite Element Models of Cardiac Electrophysiology Using a Deal.II-Based Library | | |
A Methodology for Accelerating Variant Calling on GPU | | |
Developing an Inverse Reinforcement Learning Methodology to Predict the Progression of Colorectal Cancer | | |
Accelerating Actor-Based Distributed Triangle Counting | | |
Scaling K-Path Centrality Using Optimized Distributed Data Structure | | |
Simulating Quantum Systems with NWQ-Sim on HPC | | |
A Hybrid Factorization Solver with Mixed Precision Arithmetic for Sparse Matrices | | |
Towards Enabling Digital Twins Capabilities for a Cloud Chamber | | |
High-Performance PMEM-Aware Collective I/Os | Architecture and Networks | |
An Early Case Study with Multi-Tenancy Support in SPDK’s NVMe-over-Fabric Designs | | |
Optimizing Workflow Performance by Elucidating Semantic Data Flow | | |
The Many Facets of a Dynamic Graph Processing System | | |
sys-sage: A Fresh View on Dynamic Topologies and Attributes of HPC Systems | | |
Simulating Application Agnostic Process Assignment for Graph Workloads on Dragonfly and Fat Tree Topologies | | |
Geospatial Filter and Refine Computations on NVIDIA Bluefield Data Processing Units (DPU) | | |
NeoRodinia: Evaluation of High-Level Parallel Programming Models and Compiler Transformation for GPU Offloading | | |
Integrating TEZIP into LibPressio: A Case Study of Integrating a Dynamic Application into a Static C Environment | | |
Characterizing GPU Effectiveness on NRP for IceCube fp32 Compute | | |
Exploring Userspace Memory Mapping for RDMA-Enabled Network-Attached Memory | | |
Minimizing Data Movement Using Distant Futures | | |
Why Wait!? Hades: An Active, Content-Aware System for Precalculating Derived Quantities | | |
Exploring Green Cryptographic Hashing Algorithms for Eco-Friendly Blockchains | | |
Automating HPC Model Selection on Edge Devices | | |
Graph Based Anomaly Detection in Chimbuko: Feasible or Fallible? | | |
Investigating Anomalies in Compute Clusters: An Unsupervised Learning Approach | | |
Temporal Classification of Allocations for Reduced Memory Usage | | |
Toward Inductive Synthesis of Compiler Heuristics: A Case Study with Register Allocation | | |
Neural Domain Decomposition for Variable Coefficient Poisson Solvers | | |
Software Development Case Study: The Acceleration of a Distributed Application Using GPUs | | |
Delivering Digital Skills Across the Digital Divide: Creating an Accessible On-Demand Self-Paced HPC Virtual Training Lab | | |
EE-HPC – A Framework for Energy Efficient HPC System Operation | | |
Real-Time Change Point Detection in Molecular Dynamics Streaming Data | | |
A High-Performance I/O Framework for Accelerating DNN Model Updates Within Deep Learning Workflow | | |
HPC Accelerated Generative Deep Learning Approach for Creating Digital Twins of Climate Models | | |
A Portable Software Environment for Ultrahigh-Resolution ELM Development on GPUs | | |
Optimizing Uncertainty Quantification of Vision Transformers in Deep Learning on Novel AI Architectures | | |
Two-Phase IO Enabling Large-Scale Performance Introspection | Performance Measurement, Modeling, and Tools | |
Characterizing One-/Two-Sided Designs in OpenSHMEM Collectives | | |
Modeling Parallel Programs Using Large Language Models | | |
MPI Performance Analysis in Vlasiator: Unraveling Communication Bottlenecks | | |
Exploring Julia as a Unifying End-to-End Workflow Language for HPC on Frontier | | |
Exploring the Impacts of Multiple I/O Metrics in Identifying I/O Bottlenecks | | |
Pipit: Simplifying Analysis of Parallel Execution Traces | | |
Characterizing the Performance of the Implicit Massively Parallel Particle-in-Cell iPIC3D Code | | |
Early Experience in Characterizing Training Large Language Models on Modern HPC Clusters | | |
Transfer Learning Workflow for High-Quality I/O Bandwidth Prediction with Limited Data | | |
DFToy: A New Proxy App for DFT Calculations | | |
Hybrid CPU-GPU Implementation of Edge-Connected Jaccard Similarity in Graph Datasets | | |
Preserving Data Locality in Multidimensional Variational Quantum Classification | Artificial Intelligence/Machine Learning | |
SCALABLE – Scalable Lattice Boltzmann Leaps to Exascale | | |
Improving Memory Interfacing in HLS-Generated Accelerators with Custom Caches | Programming Frameworks and System Software | |
Evaluating Performance Portability of GPU Programming Models | Performance Measurement, Modeling, and Tools | |
The Impact of Process Topology on RMA Programming Models: A Study on NERSC Perlmutter | | |
Scalable Fine-Grained Gang Scheduling for HPC Systems with Unreliable Broadcast Synchronization Mechanisms | | |
Sophisticated Tools for Performance Analysis and Auto-Tuning of Performance Portable Parallel Programming | | |
That's Right – The Same C++ STL Asynchronous Parallel Code Runs on CPUs and GPUs | | |
Simulating Larger Quantum Circuits with Circuit Cutting and Quantum Serverless | | |
Quantum Task Offloading with the OpenMP API | | |
Unleashing CGRA Potential for HPC | | |
Quantum Computing Case Study in Aerospace Field | | |
Radium: Transparent Distributed Execution via Process Virtualization | | |
QASM-to-HLS: A Framework for Accelerating Quantum Circuit Emulation on High-Performance Reconfigurable Computers | | |
Conversing Faults: The 2019 Ridgecrest Earthquake | Data Analysis, Visualization, and Storage | |
A Journey to the Center of the Milky Way: Stellar Orbits around Its Central Black Hole | Data Analysis, Visualization, and Storage | |
Visualizing Megafires: How AI Can Be Used to Drive Wildfire Simulations with Better Predictive Skill | Data Analysis, Visualization, and Storage | |
ExaWind at NREL: Upping the Ante | Data Analysis, Visualization, and Storage | |
Visualizing the Impact of the Asian Summer Monsoon on the Composition of the Upper Troposphere and Lower Stratosphere | Data Analysis, Visualization, and Storage | |