Builder Daily

2026-05-10

Fine-tuning vision-language-action models on a single DGX Spark — what works in May 2026

Pi-0.5, OpenVLA-2, and RT-2-Edge can all be fine-tuned with LoRA on a single DGX Spark (128 GB unified memory) with 100-300 demos. An overnight run of under four hours gives you a deployable robot policy.

For roboticists running a DGX Spark (or a comparable 128 GB unified-memory consumer-tier rig), the May 2026 question of “can I fine-tune a vision-language-action model overnight on this thing?” has a clear yes for three model families. Here’s what actually works.

The three options

Model          Params                       LoRA peak memory   Training time (200 demos)   Inference on Spark
OpenVLA-2-7B   7B                           24 GB              3.5 hours                   8 Hz
Pi-0.5         3B (visual + action heads)   14 GB              1.8 hours                   22 Hz
RT-2-Edge      5B                           19 GB              2.7 hours                   14 Hz

All three fit comfortably in DGX Spark’s 128 GB unified memory with room for the full optimizer state. None require gradient checkpointing.
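
For intuition on where those peaks come from, here is a rough back-of-envelope sketch in Python. The rank-32 adapter fraction and the bf16/fp32 precision split are my assumptions, not figures from the model releases, and activations are ignored entirely; that gap is roughly what separates these totals from the measured peaks in the table.

# Back-of-envelope LoRA memory estimate (a sketch, not a profiler).
# Assumptions: frozen base weights in bf16 (2 bytes/param); rank-32
# adapters totalling ~1% of base params (the true fraction depends on
# which projections get adapters); AdamW keeping an fp32 master copy,
# fp32 grads, and two fp32 moments for the trainable params only.

def lora_footprint_gb(n_params: float, lora_frac: float = 0.01) -> float:
    frozen = n_params * 2                    # bf16 base weights
    trainable = n_params * lora_frac         # LoRA A/B matrices
    overhead = trainable * (4 + 4 + 4 + 4)   # master, grad, m, v (fp32)
    return (frozen + overhead) / 1e9

for name, n in [("OpenVLA-2-7B", 7e9), ("RT-2-Edge", 5e9), ("Pi-0.5", 3e9)]:
    print(f"{name}: ~{lora_footprint_gb(n):.1f} GB before activations")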

What you need before starting

Pi-0.5 is the easiest entry point — fastest training, smallest memory footprint, ships a public LoRA recipe.

# Clone the recipe repo and install the LoRA extras (quote the
# extras so zsh doesn't expand the brackets).
git clone https://github.com/physical-intelligence/pi-zero-five
cd pi-zero-five
pip install -e ".[lora]"

# LoRA fine-tune on your demos with the settings used in this post.
python tools/finetune_lora.py \
  --model pi-zero-five-base \
  --demos /path/to/your-task-demos \
  --output ./checkpoints/your-task \
  --rank 32 --steps 8000 --lr 1e-4 \
  --eval_every 1000 --eval_demos 20

Expect ~1.8 hours for 200 demonstrations at 8000 steps. The run saves a checkpoint every 1000 steps; deploy the eval-best checkpoint, not the final one.
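
Picking the eval-best checkpoint is worth scripting. Here is a minimal sketch, assuming (hypothetically) that finetune_lora.py drops a metrics.json with an "eval_success_rate" field into each checkpoint-<step> directory; adjust the filename and key to whatever the recipe actually writes.

# Scan checkpoint directories and return the one with the best eval
# score. The metrics.json layout and the "eval_success_rate" key are
# assumptions, not the documented pi-zero-five output format.
import json
from pathlib import Path

def best_checkpoint(run_dir: str, key: str = "eval_success_rate") -> Path:
    scored = []
    for metrics_file in Path(run_dir).glob("checkpoint-*/metrics.json"):
        metrics = json.loads(metrics_file.read_text())
        if key in metrics:
            scored.append((metrics[key], metrics_file.parent))
    if not scored:
        raise FileNotFoundError(f"no scored checkpoints under {run_dir}")
    return max(scored, key=lambda pair: pair[0])[1]  # highest success rate

print(best_checkpoint("./checkpoints/your-task"))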

What doesn’t work yet on Spark

Full fine-tuning of a 7B-class VLA. The weights alone fit easily, but full-FT optimizer state pushes right up against the 128 GB ceiling before activations enter the picture.

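The arithmetic is quick to check. The 18 bytes/param below assumes a standard mixed-precision AdamW setup (bf16 weights, fp32 master copy, fp32 gradients, two fp32 moments); exact layouts vary by trainer, but the conclusion doesn't.

# Full fine-tuning memory for a 7B model under mixed-precision AdamW.
# The per-param byte counts are a common setup, assumed here, not
# measured on Spark.
bf16_weights = 2   # forward/backward weights
fp32_master = 4    # optimizer's master copy
fp32_grads = 4
adam_m = 4         # first moment
adam_v = 4         # second moment

bytes_per_param = bf16_weights + fp32_master + fp32_grads + adam_m + adam_v
print(f"{7e9 * bytes_per_param / 1e9:.0f} GB")  # ~126 GB, before activations
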
Practitioner note

For robotics builders evaluating Spark as a fine-tune rig: it works, with caveats. LoRA on 3-7B VLAs is comfortable. Full fine-tuning is not. If your project requires full FT of a 7B+ VLA, plan for cloud GPU time on a separate budget line; if LoRA is sufficient (most policy adaptation tasks are), one Spark gets you through prototype-to-pilot for ~$3500 capex. Start with Pi-0.5 — it’s the fastest iteration loop and the policy quality is competitive with OpenVLA-2 on contact-rich tasks.
