Senior Apache Spark Technical Lead - Scala, Python
HCL Technologies
Software Engineering, IT
Job Summary
Job Description: Senior Data Engineer (PySpark / Dataproc / GCP)
We are looking for a Senior Data Engineer with strong hands-on expertise in Python (PySpark) and Google Cloud Dataproc to design, develop, and operate scalable data pipelines on Google Cloud Platform. This role focuses on building reliable, production-grade data solutions across batch and streaming use cases.
Key Responsibilities
• Design, build, and optimize data pipelines using PySpark on Dataproc
• Develop performant, maintainable Spark jobs using Python, with a strong focus on reliability and cost efficiency
• Manage Dataproc clusters, including provisioning, tuning, autoscaling, and ephemeral cluster usage
• Design end-to-end data architectures from ingestion to analytics and downstream consumption
• Collaborate with data consumers, platform teams, and stakeholders to deliver scalable solutions
• Ensure data quality, observability, and operational excellence in production environments
Required Skills & Experience
Core Skills: PySpark & Dataproc
• Strong expertise in Python, with extensive hands-on experience using PySpark
• Deep experience developing, tuning, and optimizing Spark batch and streaming workloads
• Practical experience with Google Cloud Dataproc, including:
o Cluster lifecycle management
o Initialization actions and custom configurations
o Autoscaling policies and cost optimization
o Use of ephemeral clusters for job-based execution
• Solid understanding of Spark internals (execution plans, caching, partitions, joins, shuffles, checkpointing)
Google Cloud Platform (GCP)
• Strong working experience with core GCP services, including:
o BigQuery for analytics and data warehousing
o Google Cloud Storage (GCS) as a data lake
o Cloud Run for containerized data services and microservices
o Cloud SQL for relational and transactional workloads
o Pub/Sub for event-driven and streaming ingestion
• Familiarity with IAM, service accounts, and secure service-to-service communication
Programming Languages
• Advanced proficiency in Python for production data pipelines
• Experience with Scala and/or Java for Spark development is a plus
• Ability to write clean, testable, and well-documented code
Data Storage & Processing
• Proven experience designing data lakes on GCS, including:
o Partitioning strategies and lifecycle management
o Optimized file formats such as Parquet and Avro
• Strong experience integrating Spark pipelines with BigQuery
• Knowledge of data modeling concepts for analytics and reporting
Workflow Orchestration
• Experience orchestrating pipelines using:
o Apache Airflow (Cloud Composer), or
o Native Dataproc job submissions and workflow templates
• Familiarity with monitoring, alerting, retries, and dependency management
Data Pipeline Design
• Strong experience designing and developing end-to-end data pipelines
• Ability to build scalable, fault-tolerant, and maintainable systems
• Hands-on experience implementing data validation, error handling, logging, and monitoring
• Experience working with both batch and streaming processing patterns
Streaming & Event-Driven Processing
• Hands-on experience with streaming data pipelines
• Practical understanding of event-based ingestion and near real-time processing
Why HCLTech?
At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.
HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered on digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues for the 12 months ending December 2025 totaled $14.5 billion.
Benefits
At HCLTech, we believe in empowering our employees with comprehensive benefits that support their professional growth and enhance their well-being. When you sign up for a career with us, you gain access to:
• Industry-benchmarked compensation
• Best-in-class healthcare benefits
• Personal time off
• Maternity and paternity benefits
• Access to skill-development and higher-education programs and resources
• Discounts on products and services via Benefit Box
• Opportunities to participate in CSR programs and live life with a purpose
• Opportunities to grow and advance your career
Note: The benefits listed above vary depending on the nature of your employment and the country where you work. Some benefits may be available in some countries but not in all.

