As a Lead ML Engineer, your goal is to lead a capable team of Senior ML Engineers to design, architect, and optimize ML inference on a distributed network of heterogeneous GPUs. Beyond that, you will be a crucial part of scaling it into the world's largest inference network by FLOPS.
What your impact and day-to-day look like
- Scalability: You will enable large-scale deployments by optimizing model inference across a distributed network of heterogeneous GPUs. This ensures that our models can scale efficiently, regardless of the underlying hardware.
- Performance: Your work will minimize latency and maximize throughput, enabling faster and more reliable inference. This will directly affect user experience, making our applications more responsive and capable.
- Cost-Efficiency: By optimizing resource usage and reducing the overhead of running models on mixed GPU environments, you will help lower the operational costs associated with ML workloads.
- Design & architecture: You will push the boundaries of what's possible with GPU computing, exploring novel techniques for parallelism, load balancing, and model deployment that leverage the strengths of different GPU types.
- Production readiness: You will build production-grade logging and metrics systems to monitor network health and performance.
- Developer Experience: You will greatly improve the experience for developers bringing their GPUs to the Galadriel network, ensuring all features are clear, well-documented, and intuitive. Developers will find Galadriel easy to understand and build on, with minimal unexpected issues.
- R&D: You will stay up to date with the latest research in ML inference, applying cutting-edge advancements to Galadriel's network.
Experience
You have previously led teams that deployed and managed ML models at scale in production.
Requirements
- Strong problem-solving skills and ability to excel in an early-stage startup environment
- Strong in Python and C++
- Extensive experience with ML frameworks such as TensorFlow, PyTorch, ONNX, or JAX, particularly for deploying and optimizing models for inference.
- Strong experience with Linux
- Experience in optimizing code for GPU acceleration
- Ability to run experiments that validate a model and produce concrete numbers for the things we care about (e.g. latency, system breaking points, inference throughput vs. latency)
- Team player with strong communication skills, able to work closely with the other engineers on the team
Bonus: