CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Daily Paper Cast

Jun 06, 2025•23 min•Ep. 874

--:--

Listen in podcast apps:

Apple Podcasts

Spotify

Download

Listen to this episode in Metacast mobile app

Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

🤗 Upvotes: 35 | cs.AR, cs.AI, cs.CL, cs.LG, cs.PL

Authors:
Ahmed Heakl, Sarim Hashmi, Gustavo Bertolo Stahl, Seung Hun Eddie Han, Salman Khan, Abdulrahman Mahmoud

Title:
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Arxiv:
http://arxiv.org/abs/2505.16968v3

Abstract:
We introduce CASS, the first large-scale dataset and model suite for cross-architecture GPU code transpilation, targeting both source-level (CUDA <--> HIP) and assembly-level (Nvidia SASS <--> AMD RDNA3) translation. The dataset comprises 70k verified code pairs across host and device, addressing a critical gap in low-level GPU code portability. Leveraging this resource, we train the CASS family of domain-specific language models, achieving 95% source translation accuracy and 37.5% assembly translation accuracy, substantially outperforming commercial baselines such as GPT-4o, Claude, and Hipify. Our generated code matches native performance in over 85% of test cases, preserving runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 16 GPU domains with ground-truth execution. All data, models, and evaluation tools are released as open source to foster progress in GPU compiler tooling, binary compatibility, and LLM-guided hardware translation.

For the best experience, listen in Metacast app for iOS or Android