Add examples for scaling LLM inference on ALCF systems by rickybalin · Pull Request #53 · argonne-lcf/GettingStarted

rickybalin · 2026-06-09T21:45:05Z

Add scripts showing how to scale LLM inference on Aurora and Polaris via vLLM. Included are examples using MPI, EnsembleLauncher, and Dragon, covering a few different types of workflows.

To Do:

Improvements:

Do not do explicit batching of the prompts for the MPI case.
In the MPI script, don't scatter prompts from rank 0. Instead, read them on one local rank per node and scatter within the node group, or read them directly on all ranks, since the prompt file is on /tmp.
In the MPI script, the HF token likely isn't needed as an argument; it's already set and checked in the submit script.
CPU bindings in the EL batched case.
vLLM initialization time does not scale well on Polaris for either the MPI or the EL_batched approach.
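The MPI improvement above suggests reading the prompts on every rank instead of scattering them from rank 0. Below is a minimal sketch of that idea, assuming the prompt file holds one prompt per line and that ranks take contiguous, near-equal blocks. The helpers `shard_bounds` and `read_local_prompts` are hypothetical names for illustration, not functions from the PR's scripts.

```python
import os
import tempfile
from pathlib import Path


def shard_bounds(n_items: int, rank: int, size: int) -> tuple[int, int]:
    """Return the [start, stop) range of items owned by `rank`.

    Items are split into `size` contiguous blocks; the first
    `n_items % size` ranks get one extra item, so block sizes
    differ by at most one.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError(f"invalid rank {rank} for size {size}")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def read_local_prompts(path: str, rank: int, size: int) -> list[str]:
    """Hypothetical helper: every rank reads the node-local prompt file
    (e.g. a copy staged under /tmp) and keeps only its own block,
    so no scatter from rank 0 is needed."""
    lines = Path(path).read_text().splitlines()
    prompts = [line for line in lines if line.strip()]  # skip blank lines
    start, stop = shard_bounds(len(prompts), rank, size)
    return prompts[start:stop]


if __name__ == "__main__":
    # With mpi4py, rank and size would come from MPI.COMM_WORLD.Get_rank()
    # and MPI.COMM_WORLD.Get_size(); fixed values are used here instead.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(f"prompt {i}" for i in range(10)))
        path = f.name
    for r in range(4):
        print(r, read_local_prompts(path, r, 4))
    os.unlink(path)
```

Because each rank derives its block from `(rank, size)` alone, no communication is needed to distribute prompts. For the node-group variant, mpi4py's `MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)` gives a per-node communicator whose rank 0 could read the file and scatter only within the node.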


rickybalin added 4 commits May 29, 2026 22:09

Make llm_inference directory

51335c1

Add preliminary dragon example

e3c849e

Add MPI workflow for llm inference

d5d4070

Fix scaling issue with mpi script

66225da

rickybalin mentioned this pull request Jun 9, 2026

Update of docs for vLLM on Aurora and Polaris argonne-lcf/user-guides#1207 (draft, 9 tasks)

rickybalin and others added 12 commits June 10, 2026 16:02

Add info to README and bcast.c

6e2518a

Add polaris script for mpi approach

e024a19

Reduce gpu mem utilization

b0c8ff5

Add tp>1 support on polaris for mpi approach

24a459b

Small changes for Aurora mpi approach

0643782

Merge with remote

aacd30a

update single gpu script for mpi on aurora

946def2

Add tp>1 support for Aurora mpi approach

de6baa5

Add offline vllm variables

377b97d

Small change to submit script

91c361a

Add initial performance plot

9d34a81

Updates for dragon

cf8c0c1

rickybalin self-assigned this Jun 16, 2026

rickybalin and others added 12 commits June 16, 2026 22:28

Update dragon scripts

9ff56a5

small change to dragon scripts

92b913c

Add dragon tp>1 script for polaris

0b1dc9d

Merge branch 'llm_inference' of github.com:argonne-lcf/GettingStarted into llm_inference

65b4d3b

Update README

b4034e1

Add EnsembleLauncher scripts to LLM Inference examples (#54)

5981f4d

* added EL scripts
* Update README.md
* modified Readme

Co-authored-by: harikrishna1410 <harikrishna1410@gmail.com>

More docs for EL and initial changes to batched EL example

4205ba8

Other small fixes here and there

Add performance for Polaris

ad40489

More changes for EL

3077016

Also fixed path on Aurora submit scripts

(Broken) push structure of EL batch script

4df5b03

Batched EL runs on 1 Aurora node

c091c44

Integrate el2 (#55)

73e8ae1

* modified request driven to match the Dragon and MPI format
* modified request driven to match the Dragon and MPI format
* Updated the request driven EL inference
* bugfixes from merge

rickybalin and others added 23 commits June 22, 2026 21:00

EL batched 8B models runs on 2 nodes

a744776

Some fixes to batched EL

dabf87c

Minor change

6d852b0

Add EL batched submit script for Polaris

adab18a

Implement new get_gpus() from EL

7ac1b7b

Merge branch 'llm_inference' of github.com:argonne-lcf/GettingStarted into llm_inference

78efb3a

First pass support for EL batched on Polaris

8f9234a

Small changes for batched EL on Polaris

9a47ce1

Updates to EL request driven

4c6216a

Merge branch 'llm_inference' of github.com:argonne-lcf/GettingStarted into llm_inference

3be3f37

More changes to request driven EL

7a5400d

added vllm logging in batched el inference

64040b0

Progress with EL request driven

04539fd

Merge branch 'llm_inference' of github.com:argonne-lcf/GettingStarted into llm_inference

340a93d

Update README

72a5147

Increase wait timeout for futures in EL batched

d4ae783

Update performance plots

08f9a2a

Fix bug with request-driven EL scripts

64853d7

Also update performance plots

Small change

e106695

Increase ready timeout for request driven EL

b2b6cfa

Remove Dragon until next release

76bbb9d

Add Polaris support for EL request driven

737955d

Update scaling plot

89cc727

rickybalin wants to merge 51 commits into master from llm_inference
