Presentation
PcMINER: Mining Performance-Related Commits at Scale
Description
Performance inefficiencies in software can severely impact application quality and resource utilization. Addressing these issues often requires significant developer effort, yet the lack of large-scale, open-source performance datasets hinders the development of effective mitigation strategies. To fill this gap, we present PcMINER, a tool that mines performance inefficiency-related commits from GitHub at scale. PcMINER uses PcERT-KD, a transformer model that classifies these commits with accuracy comparable to 7B-parameter LLMs but at lower computational cost, making it well suited to CPU cluster deployment. By mining GitHub repositories with a 50-node CPU cluster, PcMINER has generated a dataset of 162K performance-related commits in C++ and 103.8K in Python. This dataset promises to enhance data-driven approaches to detecting performance inefficiencies.
In the poster session, I will present the problem, motivation, methodology, and results, and give a brief oral overview; additional details may be accessible through a QR code.

Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Time
Tuesday, 19 November 2024, 12pm - 5pm EST
Location
B302-B305
TP
XO/EX