Career Level: Manager
Join a dedicated team of Agile SRE and test automation engineers responsible for the end-to-end (E2E) infrastructure, support, and deployment of code into Lab and Production environments. The team is process-driven and strongly focused on uptime SLAs. It is expanding, along with its knowledge pool, and converging into a single entity that supports mobile and broadband services across different geographies.
What you'll do:
Design, develop, document, and implement a PaaS solution to onboard and integrate vendor-provided or requested applications with my client's telecommunications infrastructure.
As an SRE, be responsible for the engineering and support of production environments, including automating patches and upgrades and improving reliability and performance.
Take part in an on-call rota to act on symptoms before they become outages, and automate dashboards and reporting for the platform against SLOs, SLAs, and KPIs.
Develop assurance, monitoring, and management capabilities for the PaaS infrastructure using Zabbix, Prometheus, Grafana, and the ELK stack. Monitor and manage Linux VMs, containers, and applications.
Manage the operational playbook for the PaaS infrastructure and the services running within it, and create automated reports for those services and the infrastructure.
Provide support and lifecycle management for various applications and services, including patching, upgrades, updates, and troubleshooting.
What you'll bring:
Experience building and managing Kafka, ZooKeeper, Couchbase, PostgreSQL, and Consul clusters.
Linux system administration and configuration management, primarily with CentOS and Ubuntu.
Experience working with Git and performing code reviews.
Experience building and maintaining CI/CD pipelines (GitLab, Jenkins).
Experience with automation and orchestration using tools such as Ansible and Terraform.