Who are we?
We are Spyne, redefining how cars are marketed and sold with cutting-edge Generative AI. What started as a bold idea—using AI-powered visuals to help dealers sell online faster—has evolved into a full-fledged AI-first automotive retail ecosystem.
Backed by $16M in Series A funding from Vertex Ventures, Accel, and other top investors, we're scaling fast:
✔ Expanded across the US & EU markets
✔ Launched industry-first AI-powered Image & 360° solutions
✔ Achieved a 5× revenue surge in 15 months, aiming for 3–4× growth this year
🚀 Know Our Journey
- 2020: Launched as a visual merchandising platform
- 2023: Pivoted to AI-driven automotive retail solutions
- 2024: Achieved 5× revenue growth in 15 months, aiming for a further 3–4×
- Today: Driving the GenAI revolution with AI-powered sourcing, pricing, CRM, and Agentic AI for dealerships
What Are We Looking For?
We're seeking a highly skilled Data Engineer to establish and own Spyne's first dedicated data engineering function. As Spyne AI accelerates market penetration across US rooftops and processes massive, high-velocity streams of unstructured computer vision payloads (Studio AI) and conversational state events (Vini AI), our data warehousing volume and complexity have grown rapidly.
This is not a maintenance role. You will actively restructure our entire data platform, taking over full ownership from our DevOps team and raising our data infrastructure to tech-industry standards. You'll build the foundational data layer that powers our BI platforms, ML observability, and long-term analytics roadmap.
📍 Location: Gurugram (Work from Office, 5 days a week)
🖥 Role: Full-Time, Data Engineer
What Will You Do?
- Data Warehousing Architecture & Modeling: Design, scale, and own our core enterprise Data Warehouse built on ClickHouse Cloud—implementing robust data modeling methodologies, efficient time-based partitioning, and the centralized foundational data layer that powers all downstream BI platforms.
- CDC & Schema Evolution: Spearhead our transition from self-hosted Debezium/Kafka to managed ClickPipes; design performant ELT pipelines using ClickHouse Materialized Views, JSONExtract, and arrayJoin functions to parse deep, complex MongoDB Atlas JSON arrays into clean, flattened analytical tables.
- Advanced ClickHouse Engine Tuning: Manage SharedReplacingMergeTree tables partitioned by time; handle complex edge cases including cross-partition physical deletions (MongoDB tombstone events) and eliminate Cartesian explosions during array joins and LEFT JOIN operations.
- Event Sourcing for ML Pipelines: Maintain and optimize our append-only observability architecture (SQS → Lambda → ClickHouse Async Inserts) to track GPU and CPU ML workloads orchestrated via AWS Step Functions and AWS Batch, leveraging AggregatingMergeTree and anyLast state combinators to unify partial state updates.
- Performance Optimization & OOM Prevention: Troubleshoot and optimize heavy analytical queries to prevent Out-Of-Memory crashes when Metabase dashboards run several large models concurrently; aggressively push down filters and use GLOBAL IN hash lookups to cut redundant distributed subquery work.
- AWS Data Networking: Navigate secure cross-account data transit within public-internet-denied cloud perimeters using AWS PrivateLink, VPC Lattice Service Networks, and MSK Multi-VPC connectivity secured via IAM authentication.
- Data Platform Ownership: Define and drive our long-term data warehousing roadmap; document architecture decisions, establish data quality standards, and reduce the data-platform workload currently falling on the DevOps team.
- Collaboration: Partner closely with ML Engineering, Product, and DevOps teams to ensure data pipelines are reliable, observable, and aligned with evolving product requirements.
What Must You Have?
- Experience: 3–5 years in a dedicated data engineering role, with proven ownership of production-grade data warehouse or analytics infrastructure.
- ClickHouse: Deep, hands-on expertise with ClickHouse—including engine selection (ReplacingMergeTree, AggregatingMergeTree), Materialized Views, partitioning strategies, and query optimization. ClickHouse Cloud experience is strongly preferred.
- ELT & CDC Pipelines: Demonstrated experience designing and operating Change Data Capture pipelines using Debezium, Kafka, or managed equivalents (ClickPipes, AWS DMS); strong command of schema evolution and data transformation patterns.
- Complex Data Modeling: Proficiency in parsing and flattening deeply nested JSON structures (JSONExtract, arrayJoin); experience modeling data from NoSQL sources (MongoDB Atlas) into analytical schemas.
- Event-Driven & Streaming Architectures: Hands-on experience with event sourcing patterns, async insert architectures, and streaming systems such as Apache Kafka / AWS MSK and AWS SQS/Lambda.
- AWS Infrastructure: Strong working knowledge of AWS data services—S3, Lambda, Step Functions, Batch, MSK—and AWS networking constructs including PrivateLink, VPC, and IAM-based authentication.
- Performance Debugging: Proven ability to diagnose and resolve OOM errors, slow analytical queries, and pipeline bottlenecks at scale.
- Scripting & Automation: Proficient in Python and SQL for pipeline development, data validation, and operational tooling.
- Multi-Database Expertise (Strong Plus): Experience architecting and performance-tuning MySQL, PostgreSQL, MongoDB, and Kafka; comfort navigating polyglot data environments is highly valued.
- Education: Bachelor's or master's degree in Computer Science, Data Engineering, or a related field.
Why is Spyne an Employee-Centric Company? 🚀
- Comprehensive Health & Life Coverage – GMC, GPA, and GTLI benefits for you and your family
- Performance-Driven Growth – Fast career progression, ownership from Day 1, and stock options for top performers
- Elevate Learning & Development – Access LinkedIn Learning, mentorship programs, and hands-on AI data projects to upskill daily
- Collaborative Office Culture – Thrive in our energetic, innovation-first workplace
Why Spyne?
- Strong Culture: A supportive, transparent environment with high autonomy
- Competitive Compensation: Market-leading salary, equity, and benefits
- Dynamic Growth: Join us at a pivotal growth stage—be the architect of our entire data platform, not just a contributor
- Cutting-Edge Tech: Work with ClickHouse Cloud, distributed ML pipelines, and real-time automotive AI data at a scale very few engineers encounter