Software Development Manager, AWS Neuron SDK - Distributed Training

Amazon

Cupertino, CA, United States | Full-time | June 29, 2026

Opportunity Description

AWS Neuron is a software stack for the Annapurna Inferentia and Trainium machine
learning accelerators hosted inside AWS EC2 Trn1/2 and Inf1 servers.

As the Principal Engineer for the Neuron Distributed Training team, you will work hands-on with a strong team of engineers to help design and optimize ML on Neuron devices. Specifically, you will focus on bringing up a coherent solution across the stack to increase training resiliency for ultra clusters with thousands of nodes. You will scale and optimize the application stack for LLMs that use multi-modal input and output generation, such as text, vision, video, and audio. You will be responsible for the full development life cycle of Distributed Training support for multi-modal transformer models such as MM-Llama3.2, DiT/Pixart, and CLIP. You will develop scalability features and performance optimizations in the Neuron ML Framework components to enable them...
