Senior Site Reliability Engineer Lead
HCL Technologies
Software Engineering
Job Summary
As a Site Reliability Engineer supporting the MQ and NATS/Event Broker platforms, you will be responsible for the stability and resilience of Mastercard’s messaging backbone. You will partner closely with application teams, platform engineering, infrastructure, and security teams to reduce operational risk, improve system reliability, and ensure issues are detected and resolved before they affect customers.
This is a production‑focused engineering role, not an application development role.
Key Responsibilities
- Ensure high availability, performance, and resilience of the MQ and NATS/Event Broker platforms across all environments.
- Participate in on‑call rotations and provide hands‑on support during production incidents.
- Lead or contribute to incident triage, mitigation, and service restoration.
- Perform root cause analysis (RCA) and drive corrective and preventive actions to closure.
- Design, implement, and maintain monitoring, alerting, and dashboards to enable proactive detection.
- Support and govern production changes, including upgrades, patching, certificate renewals, and configuration changes.
- Assess operational readiness for changes and ensure rollback and validation plans are in place.
- Automate operational tasks and workflows to reduce manual effort and improve recovery times.
- Partner with application teams to support onboarding, scaling, and operational best practices.
- Create and maintain runbooks, SOPs, and operational documentation.
- Contribute to continuous improvement of reliability, observability, and operational processes.
Skill Requirements
- Experience supporting mission‑critical production systems with on‑call responsibility.
- Strong understanding of distributed systems and messaging platforms.
- Hands‑on experience with MQ, NATS/Event Broker, or similar middleware technologies.
- Experience with monitoring, logging, and alerting tools.
- Proficiency in at least one scripting or programming language (e.g., Python, Bash, Java).
- Solid knowledge of Linux, networking fundamentals, and system troubleshooting.
- Ability to troubleshoot complex, multi‑component issues under pressure.
Other Requirements
- Experience operating enterprise‑scale messaging or event‑driven platforms.
- Familiarity with clustering, replication, persistence, and high‑availability patterns.
- Experience working in regulated environments with strong change management practices.
- Exposure to automation, reliability engineering, or SRE best practices.
Why HCLTech?
At HCLTech, you'll supercharge your potential. You'll find your career. And you'll find your spark. All at a place that knows that helping its customers stay on top starts by putting its people first.
HCLTech is a global technology company, home to more than 226,300 people across 60 countries, delivering industry-leading capabilities centered on digital, engineering, cloud and AI, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. Consolidated revenues for the 12 months ending December 2025 totaled $14.5 billion.
Benefits
At HCLTech, we believe in empowering our employees with comprehensive benefits that support their professional growth and enhance their well-being. When you sign up for a career with us, you gain access to:
- Industry-benchmarked compensation
- Best-in-class healthcare benefits
- Personal time off
- Maternity and paternity benefits
- Access to skills development and higher education programs and resources
- Discounts on products and services via Benefit Box
- Opportunities to participate in CSR programs and live with purpose
- Opportunities to grow and advance your career
Note: The benefits listed above vary depending on the nature of your employment and the country where you work. Some benefits may be available in some countries but not others.

