Senior Site Reliability Engineer
Responsibilities
We’re seeking a Senior Site Reliability Engineer/DevOps Engineer who is passionate about building best-in-class infrastructure and keeping production systems healthy.
- Design and maintain scalable, secure, and reliable infrastructure to support Regie.ai's SaaS platform and AI/data workloads.
- Architect a unified monitoring and alerting system that lets engineering teams continuously track and improve system availability, reliability, and performance.
- Drive infrastructure automation and CI/CD improvements to reduce operational overhead and deployment risk.
- Optimize infrastructure costs, support compliance efforts (e.g., SOC 2), and enforce security best practices.
Required Skills & Qualifications
- 6+ years of experience in SRE, DevOps, or infrastructure engineering roles.
- Extensive hands-on experience with AWS and its core services.
- Strong experience with Terraform (or similar IaC tools), Docker and containerization, and modern CI/CD systems.
- Proficient in scripting or programming languages such as Python and Bash.
- Deep experience with monitoring and alerting tools (e.g., New Relic, Prometheus, Grafana, PagerDuty).
- Strong hands-on experience with both SQL and NoSQL databases (e.g., MongoDB, PostgreSQL, MySQL).
- Proven track record of designing and maintaining production-grade infrastructure with high availability and low latency.
- Excellent troubleshooting abilities, along with strong communication and collaboration skills.
- Solid understanding of cloud security and compliance best practices, including SOC 2 readiness and audit support.