Add examples for scaling LLM inference on ALCF systems#53

Draft
rickybalin wants to merge 51 commits into master from llm_inference

@rickybalin rickybalin commented Jun 9, 2026

Contributor

Add scripts showing how to scale LLM inference on Aurora and Polaris via vLLM. Included are examples with MPI, EnsembleLauncher, and Dragon, covering a few different types of workflows.

To Do:

  • MPI on Aurora
  • MPI on Polaris
  • EL batched on Aurora
  • EL batched on Polaris
  • EL request-driven on Aurora
  • EL request-driven on Polaris
  • Dragon request-driven on Aurora
  • Dragon request-driven on Polaris
  • Performance plot comparing approaches on Aurora
  • README with instructions for all approaches

Improvements:

  • Do not do explicit batching of the prompts for the MPI case.
  • In the MPI script, don't scatter prompts from rank 0. Either read them on a local rank and scatter within the node group, or read them directly on all ranks, since the prompt file is on /tmp.
  • In the MPI script, the HF token is likely not needed as an argument, since it is already set and checked in the submit script.
  • CPU bindings in EL batched case.
  • vLLM initialization time does not scale well on Polaris for either the MPI or the EL_batched approach.
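
As a rough illustration of the "read directly from all ranks" option above, here is a minimal sketch in which each rank reads the node-local prompt file itself and keeps a strided slice, so no scatter from rank 0 is needed. The function name `local_prompts`, the path `/tmp/prompts.txt`, and the one-prompt-per-line file format are assumptions made for this example and are not taken from the scripts in this PR. In a real MPI script the rank and world size would come from the communicator, for example via mpi4py.

```python
from pathlib import Path


def local_prompts(prompt_file, rank, world_size):
    """Return the prompts this rank should process (hypothetical helper).

    Every rank reads the whole file and keeps lines rank, rank + world_size,
    rank + 2 * world_size, and so on. Assumes one prompt per line.
    """
    lines = Path(prompt_file).read_text().splitlines()
    # Skip blank lines so trailing newlines do not produce empty prompts.
    prompts = [line for line in lines if line.strip()]
    return prompts[rank::world_size]


# Possible use inside an MPI script (assumes mpi4py is installed and the
# prompt file has already been staged to /tmp on every node):
#
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   mine = local_prompts("/tmp/prompts.txt", comm.Get_rank(), comm.Get_size())
```

A strided split keeps the per-rank prompt counts within one of each other. It also removes the explicit batching step, since each rank can pass its whole slice to the inference engine in one call.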

@rickybalin rickybalin self-assigned this Jun 16, 2026
rickybalin and others added 12 commits June 16, 2026 22:28
* added EL scripts

* Update README.md

* modified Readme

---------

Co-authored-by: harikrishna1410 <harikrishna1410@gmail.com>
Also fixed path on Aurora submit scripts
* modified request driven to match the Dragon and MPI format

* modified request driven to match the Dragon and MPI format

* Updated the request driven EL inference

* bugfixes from merge

2 participants