You will be part of the Engineering team at ServiceTitan, helping to improve our products and build new ones. We provide exciting opportunities for engineers to come in and develop major features at a rapidly growing startup. We build for perfection, use the most modern DevOps tools on the Microsoft .NET platform, have an amazing culture, and love to solve complex problems.
As a Senior SRE Manager, you will lead a globally distributed team of world-class engineers in support of ServiceTitan’s infrastructure and systems automation. You will have the technical ability to support and resolve issues while building a team of intelligent engineers and empowering them to succeed. You will be a subject matter expert for ServiceTitan infrastructure and manage a variety of initiatives to improve reliability and stability.
ServiceTitan engineers support a culture of collaboration, curiosity, and openness to help us all achieve success. Site Reliability Engineering is a discipline that combines knowledge of software development and infrastructure engineering. We support a DevOps mentality and folks who take big risks to solve complex problems. The ability to provide creative solutions in high-pressure situations is a key attribute of a successful Site Reliability Engineering Manager.
As our Senior Manager, Site Reliability Engineering, you will:
- Lead a team of software engineers in support of ServiceTitan’s infrastructure and services; be an example for all engineers by leading with a focus on problem solving through automation
- Own end-to-end uptime metrics with a “lights-on” mentality to maintain mission-critical systems for our customers
- Improve reliability in all realms of infrastructure including systems availability, monitoring, metrics, load balancing, performance, and deployments
- Own cloud infrastructure
- Own and improve CI/CD processes, systems management, and failover procedures, and create redundancy at all levels
- Design, write, and own software to improve reliability, performance, latency, and availability of ServiceTitan’s world-class application
- Be responsible for owning and improving all aspects of cloud services (Azure, AWS, GCP), CI/CD, blue-green deployments, operating systems (Linux/Windows), container orchestration, networking, load balancing, service discovery, and security
To be successful in this role, you'll need:
- Expertise in software development in one or more of the following languages: C#, Python, or Golang
- Experience managing an engineering team supporting a 24/7 available web/SaaS application at scale
- BA/BS degree in Computer Science or related technical field, or equivalent practical experience
- Experience with Azure, Windows, and other Microsoft software/services, as well as Linux
- Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
- Demonstrated ability to debug, fix, and optimize code
- Full-stack troubleshooting skills across network, application, hardware, and distributed services layers
- Superb communication skills, both written and verbal
- Passionate about developing software to solve complex infrastructure challenges
- Excited about delivering a reliable, scalable, and performant infrastructure
- A highly motivated, smart, independent problem solver who thrives in a fast-paced, bottom-up environment
- Intensely eager to meet the needs of our customers and deliver best-of-breed SaaS solutions