Data Engineering | Full-time | Hybrid

Senior Data Engineer (Apache Spark)

Lead the design of large-scale distributed data processing systems using Apache Spark and cloud platforms at APPIT Software in San Francisco.

San Francisco, USA

Full-time

Data Engineering

Responsibilities

Architect and optimize large-scale Spark jobs processing terabytes of data daily
Design data lake and lakehouse architectures on AWS S3 or Azure Data Lake Storage
Mentor junior data engineers on distributed computing best practices and Spark internals
Build and maintain real-time and batch data pipelines with robust fault tolerance
Partner with ML engineers to deliver feature stores and training data sets at scale
Drive performance tuning, including partitioning, caching, and shuffle optimization strategies

Requirements

6+ years of data engineering experience, with at least 3 years focused on Apache Spark
Deep understanding of distributed computing, MapReduce paradigms, and cluster resource management
Expert-level Python and/or Scala programming for Spark application development
Experience with cloud data services (AWS EMR, Glue, Redshift, or Azure Synapse)
Strong knowledge of data lake architectures, Delta Lake, or Apache Iceberg table formats
Proven ability to optimize Spark jobs for cost efficiency and processing speed

Nice to Have

Experience with Databricks Unified Analytics Platform
Knowledge of streaming with Spark Structured Streaming or Flink
Contributions to open-source data projects

Skills

Apache Spark, Python, Scala, AWS, Delta Lake, SQL, Data Lake, Airflow

Related Positions

Data Engineer (Python & SQL)

Snowflake Data Engineer

Senior DevOps Engineer

Senior AI/ML Engineer
