Design, build, and maintain scalable, automated data pipelines (ETL/ELT) using tools such as Airflow, dbt, or Spark (a minimal pipeline sketch follows this list).
Create and optimize data warehouse schemas (star and snowflake models) to ensure performant querying for downstream users.
Implement and manage cloud-based data storage solutions (e.g., Snowflake, BigQuery, Redshift).
Develop validation frameworks to ensure data integrity, accuracy, and security across all layers (see the validation sketch after this list).
Monitor and tune system performance, identifying bottlenecks in complex queries or ingestion processes.
Partner with software engineers to integrate data from internal applications, and with business stakeholders to translate requirements into technical specifications.
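To give a flavor of the pipeline work described above, here is a minimal sketch using Airflow's TaskFlow API (Airflow 2.4+). The DAG name, source data, and business rule are hypothetical placeholders, not a description of the team's actual stack.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_etl():
    @task
    def extract() -> list[dict]:
        # In practice this would pull from an internal application's API or database.
        return [{"order_id": 1, "amount_usd": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Apply business rules; here, simply drop non-positive amounts.
        return [r for r in rows if r["amount_usd"] > 0]

    @task
    def load(rows: list[dict]) -> None:
        # In practice this would write to the warehouse (Snowflake, BigQuery, Redshift).
        print(f"loading {len(rows)} rows")

    load(transform(extract()))


daily_orders_etl()
```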
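Similarly, a starting point for the validation work might be a small pandas-based check like the one below; the table, column names, and thresholds are hypothetical stand-ins for rules a production framework would enforce at scale (for example via dbt tests or a dedicated data-quality tool).

```python
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data-quality failures (empty = clean)."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    if df["amount_usd"].lt(0).any():
        failures.append("amount_usd contains negative values")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:  # tolerate at most 1% missing customer IDs
        failures.append(f"customer_id null rate {null_rate:.1%} exceeds 1%")
    return failures


if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "order_id": [1, 2, 2],
            "amount_usd": [10.0, -5.0, 7.5],
            "customer_id": [100, None, 101],
        }
    )
    for failure in validate_orders(sample):
        print("FAIL:", failure)
```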
Required Skills & Qualifications
5+ years of experience in data engineering or a closely related role.
Expert-level proficiency in Python or Java/Scala, and advanced SQL skills.
Experience with distributed systems (Spark, Kafka, Flink) and containerization (Docker, Kubernetes).
Hands-on experience with AWS, GCP, or Azure data stacks.
Familiarity with CI/CD practices and infrastructure-as-code tooling (e.g., Terraform).
A knack for debugging complex data flows and an obsession with automation.