Senior Apache Spark Technical Lead - Scala, Python

HCL Technologies

Software Engineering, IT

Posted on May 29, 2026
Job Description
Ambattur, Tamil Nadu

Job Summary

Job Description: Senior Data Engineer (PySpark / Dataproc / GCP)

We are looking for a Senior Data Engineer with strong hands-on expertise in Python (PySpark) and Google Cloud Dataproc to design, develop, and operate scalable data pipelines on Google Cloud Platform. This role focuses on building reliable, production-grade data solutions across batch and streaming use cases.

Key Responsibilities

• Design, build, and optimize data pipelines using PySpark on Dataproc

• Develop performant, maintainable Spark jobs using Python, with a strong focus on reliability and cost efficiency

• Manage Dataproc clusters, including provisioning, tuning, autoscaling, and ephemeral cluster usage

• Design end-to-end data architectures from ingestion to analytics and downstream consumption

• Collaborate with data consumers, platform teams, and stakeholders to deliver scalable solutions

• Ensure data quality, observability, and operational excellence in production environments

Skill Requirements

Required Skills & Experience

Core Skills: PySpark & Dataproc

• Strong expertise in Python, with extensive hands-on experience using PySpark

• Deep experience developing, tuning, and optimizing Spark batch and streaming workloads

• Practical experience with Google Cloud Dataproc, including:

o Cluster lifecycle management

o Initialization actions and custom configurations

o Autoscaling policies and cost optimization

o Use of ephemeral clusters for job-based execution

• Solid understanding of Spark internals (execution plans, caching, partitions, joins, shuffles, checkpointing)

Google Cloud Platform (GCP)

• Strong working experience with core GCP services, including:

o BigQuery for analytics and data warehousing

o Google Cloud Storage (GCS) as a data lake

o Cloud Run for containerized data services and microservices

o Cloud SQL for relational and transactional workloads

o Pub/Sub for event-driven and streaming ingestion

• Familiarity with IAM, service accounts, and secure service-to-service communication

Programming Languages

• Advanced proficiency in Python for production data pipelines

• Experience with Scala and/or Java for Spark development is a plus

• Ability to write clean, testable, and well-documented code

Data Storage & Processing

• Proven experience designing data lakes on GCS, including:

o Partitioning strategies and lifecycle management

o Optimized file formats such as Parquet and Avro

• Strong experience integrating Spark pipelines with BigQuery

• Knowledge of data modeling concepts for analytics and reporting

Workflow Orchestration

• Experience orchestrating pipelines using:

o Apache Airflow (Cloud Composer), or

o Native Dataproc job submissions and workflow templates

• Familiarity with monitoring, alerting, retries, and dependency management

Data Pipeline Design

• Strong experience designing and developing end-to-end data pipelines

• Ability to build scalable, fault-tolerant, and maintainable systems

• Hands-on experience implementing data validation, error handling, logging, and monitoring

• Experience working with both batch and streaming processing patterns

Streaming & Event-Driven Processing

• Hands-on experience with streaming data pipelines

• Practical understanding of event-based ingestion and near real-time processing

Other Requirements

1. Relevant certifications in Apache Spark, Scala, or Python are a plus

Why HCLTech?

At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.

HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered around digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues as of 12 months ending December 2025 totaled $14.5 billion.

Benefits

At HCLTech, we believe in empowering our employees with comprehensive benefits that support their professional growth and enhance their well-being. When you sign up for a career with us, you gain access to:

Industry-benchmarked compensation

Best-in-class healthcare benefits

Personal time off

Maternity and paternity benefits

Access to skills and higher education programs and resources

Discounts on products and services via Benefit Box

Opportunities to participate in CSR programs and live life with a purpose

Opportunities to grow and advance your career

Note: The benefits listed above vary depending on the nature of your employment and the country where you work. Some benefits may be available in some countries but not in all.