Senior Site Reliability Engineer

EPAM · salary not specified · location not specified · company website · published June 9, 2026

Job description

We are seeking a Senior Site Reliability Engineer to join our SRE/DevOps organization. The successful candidate will design and implement automated provisioning, deployment, management, and monitoring solutions for a large-scale, rapidly evolving portfolio of SaaS services. Working closely with architecture and development teams, the Senior Engineer will contribute to engineering standards and best practices, drive CI/CD and observability improvements, and support the team in delivering reliable, scalable infrastructure.
Responsibilities
Design and implementation of CI/CD pipelines leveraging Kubernetes, Flux, and related cloud-native technologies
Implementation and maintenance of monitoring and management solutions for cloud-based products using a combination of commercial off-the-shelf (COTS) and in-house tooling
Collaboration with architects and development teams on standardized, scalable approaches for log management, service components, and infrastructure elements
Evaluation and integration of AI-enabled tooling across observability, pipeline efficiency, and SRE troubleshooting workflows
Development of DevOps tooling that reduces manual toil, strengthens security posture, and minimizes human error
Build and maintenance of resilient, self-scaling systems that minimize customer impact while supporting a sustainable operational environment
Participation in incident response, root cause analysis, and post-incident review processes
Requirements
Minimum 7 years of professional experience in software development, DevOps, and/or Site Reliability Engineering
Minimum 3 years of experience with building and maintaining CI/CD pipelines and SRE automation within cloud environments at scale
Experience with monitoring and alerting platforms (e.g., PagerDuty, Prometheus, Grafana)
Hands-on experience with deployment and management of cloud infrastructure
Experience with Amazon Web Services (e.g., EC2, Elasticsearch, Lambda, CloudFormation)
Working experience with at least one additional cloud provider (GCP, Azure, or OCI)
Experience with CI/CD toolchains (Jenkins, Kubernetes, Flux)
Proficiency in one or more of the following languages: Python, Go, Java, or C
English proficiency at B2 level or higher
Nice to have
Experience with application of Generative AI or ML-based tooling within an operations context
Experience with DevSecOps practices and security automation
Background in architecture of monitoring and management systems for enterprise SaaS products

Skills

site reliability engineering
amazon web services
grafana
jenkins
kubernetes
pagerduty
prometheus
flux
go language
google cloud platform
java
microsoft azure
