Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA

Santa Clara, CA, United States | Full-time | June 03, 2026

Opportunity Description

NVIDIA Dynamo is a high-throughput, low-latency inference framework for serving generative AI and reasoning models across multi-node distributed environments. Built in Rust for performance and Python for extensibility, Dynamo orchestrates GPU shards, routes requests, and manages shared KV cache across heterogeneous clusters so that many accelerators feel like a single system at datacenter scale. As large language models rapidly outgrow the memory and compute budget of any single GPU, this platform enables efficient, resilient deployment of cutting-edge LLM workloads.

We are seeking a Principal Software Engineer to define the vision and roadmap for memory management in large-scale LLM inference and storage systems.

What you'll be doing:
+ Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference.
+ Architect and implement deep integrations w...

Location: Santa Clara, CA

Country: United States

Type: Full-time

Category: other-general

Posted: June 03, 2026

Deadline: June 07, 2026
