Data Engineer – AI Infra Group
Tel Aviv · Full-time
The Job
We are on an expedition to find you: someone passionate about creating intuitive, out-of-this-world data platforms. You'll architect and ship our streaming lakehouse and data platform, turning billions of raw threat signals into high-impact, self-serve insights that protect countries in real time, all while building on top-of-the-line technologies such as Iceberg, Flink, Paimon, Fluss, LanceDB, ClickHouse, and more.
Responsibilities
Design and maintain agentic data pipelines that adapt dynamically to new sources, schemas, and AI-driven tasks
Build self-serve data systems that allow teams to explore, transform, and analyze data with minimal engineering effort
Develop modular, event-based pipelines across AWS environments, combining cloud flexibility with custom open frameworks
Automate ingestion, enrichment, and fusion of cybersecurity data including logs, configs, and CTI streams
Collaborate closely with AI engineers and researchers to operationalize LLM and agent pipelines within the CLM ecosystem
Implement observability, lineage, and data validation to ensure reliability and traceability
Scale systems to handle complex, high-volume data while maintaining adaptability and performance
Own the data layer end to end, including architecture, documentation, and governance
Requirements
5+ years of experience building large-scale distributed systems or platforms, preferably in ML or data-intensive environments
Proficiency in Python with strong software engineering practices, familiarity with data structures and design patterns
Deep understanding of orchestration systems (e.g., Kubernetes, Argo) and distributed computing frameworks (e.g., Ray, Spark)
Experience with GPU compute infrastructure, containerization (Docker), and cloud-native architectures
Proven track record of delivering production-grade infrastructure or developer platforms
Solid grasp of ML workflows, including model training, evaluation, and inference pipelines
This position is open to all candidates.