As a Lead ML Engineer, your goal is to lead a capable team of Senior ML Engineers to design, architect, and optimize ML inference on a distributed network of heterogeneous GPUs. Beyond that, you will be a crucial part of scaling it into the world's largest inference network by FLOPS.
What your impact and day-to-day look like
- Scalability: You will enable large-scale deployments by optimizing model inference across a distributed network of heterogeneous GPUs. This ensures that our models can scale efficiently, regardless of the underlying hardware.
- Performance: Your work will minimize latency and maximize throughput, enabling faster and more reliable inference. This will directly affect user experience, making our applications more responsive and capable.
- Cost-Efficiency: By optimizing resource usage and reducing the overhead of running models on mixed GPU environments, you will help lower the operational costs associated with ML workloads.
- Design & architecture: You will push the boundaries of what's possible with GPU computing, exploring novel techniques for parallelism, load balancing, and model deployment that leverage the strengths of different GPU types.
- Production readiness: You will build production-grade logging and metrics systems to monitor network health and performance.
- Developer Experience: You will greatly improve the experience for developers bringing their GPUs to the Galadriel network, ensuring all features are clear, well-documented, and intuitive. Developers will find Galadriel easy to understand and build on, with minimal unexpected issues.
- R&D: You will stay up to date with the latest research in ML inference, applying cutting-edge advancements to Galadriel's network.
Experience
You have previously led teams that deployed and managed ML models at scale in production.
Requirements
- Strong problem-solving skills and ability to excel in an early-stage startup environment
- Strong in Python and C++
- Extensive experience with ML frameworks such as TensorFlow, PyTorch, ONNX, or JAX, particularly for deploying and optimizing models for inference.
- Strong experience with Linux
- Experience in optimizing code for GPU acceleration
- Ability to run experiments that validate a model and produce concrete numbers for the things we care about (e.g. latency, system breaking points, inference throughput vs. latency)
- Team player with strong communication skills, able to work closely with the other engineers on the team
Bonus: