We are seeking an experienced technical leader to head our collective communication library development team. This role involves leading a team of engineers in developing high-performance collective communication implementations for multi-NPU and multi-node AI workloads.
Key Responsibilities
Lead the design and development of collective communication primitives (All-Reduce, All-to-All, Gather/Scatter and etc)
Architect scalable communication protocols for multi-NPU and multi-node systems
Optimize communication performance for NPU architectures
Provide technical leadership to the team members in NPU programming, distributed systems, and communication protocols
Work with a success-driven worldwide international team (Network, NPU, QA, AI, DL/ML Framework)
Define project milestones, deliverables, and technical roadmaps
Ensure compatibility with major AI frameworks (PyTorch, TensorFlow, JAX).
Key Responsibilities
Lead the design and development of collective communication primitives (All-Reduce, All-to-All, Gather/Scatter and etc)
Architect scalable communication protocols for multi-NPU and multi-node systems
Optimize communication performance for NPU architectures
Provide technical leadership to the team members in NPU programming, distributed systems, and communication protocols
Work with a success-driven worldwide international team (Network, NPU, QA, AI, DL/ML Framework)
Define project milestones, deliverables, and technical roadmaps
Ensure compatibility with major AI frameworks (PyTorch, TensorFlow, JAX).
Requirements:
Required Qualifications
BSc/MSc in computer science/computer engineering or equivalent
8+ years of experience in systems programming and distributed computing
5+ years of leadership experience managing technical teams
Expert-level C/C++ programming with focus on performance optimization
Experience with NPU programming (Triton / CUDA / HIP / OpenCL)
Deep understanding of distributed systems, communication protocols, and network programming
Experience with DL/ML frameworks (PyTorch, TensorFlow) and distributed training / inferencing
Experience with performance profiling and optimization tools
Strong communication and interpersonal skills
Preferred Qualifications
Experience with NPU communication library development
Contributions to open-source projects (PyTorch, TensorFlow, communication libraries)
Familiarity with containerization and orchestration
Interoperability experience with partners, vendors and external teams.
Required Qualifications
BSc/MSc in computer science/computer engineering or equivalent
8+ years of experience in systems programming and distributed computing
5+ years of leadership experience managing technical teams
Expert-level C/C++ programming with focus on performance optimization
Experience with NPU programming (Triton / CUDA / HIP / OpenCL)
Deep understanding of distributed systems, communication protocols, and network programming
Experience with DL/ML frameworks (PyTorch, TensorFlow) and distributed training / inferencing
Experience with performance profiling and optimization tools
Strong communication and interpersonal skills
Preferred Qualifications
Experience with NPU communication library development
Contributions to open-source projects (PyTorch, TensorFlow, communication libraries)
Familiarity with containerization and orchestration
Interoperability experience with partners, vendors and external teams.
This position is open to all candidates.



















