Senior Site Reliability Engineer

Responsibilities

We’re seeking a Senior Site Reliability Engineer (SRE/DevOps) who is passionate about building best-in-class infrastructure and keeping our systems healthy.

Design and maintain scalable, secure, and reliable infrastructure to support Regie.ai's SaaS platform and AI/data workloads.
Architect a unified monitoring and alerting system that lets engineering teams continuously track and improve system availability, reliability, and performance.
Drive infrastructure automation and CI/CD improvements to reduce operational overhead and deployment risk.
Optimize infrastructure costs, support compliance efforts (e.g., SOC 2), and enforce security best practices.

Required Skills & Qualifications

6+ years of experience in SRE, DevOps, or infrastructure engineering roles.
Extensive hands-on experience with AWS and its core services.
Strong experience with Terraform (or similar IaC tools), Docker and containerization, and modern CI/CD systems.
Proficient in scripting or programming languages such as Python and Bash.
Deep experience with monitoring and alerting tools (e.g., New Relic, Prometheus, Grafana, PagerDuty).
Strong hands-on experience with both SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB).
Proven track record of designing and maintaining production-grade infrastructure with high availability and low latency.
Excellent troubleshooting abilities, along with strong communication and collaboration skills.
Solid understanding of cloud security and compliance best practices, including SOC 2 readiness and audit support.