Careers

Senior Site Reliability Engineer at ServiceTitan
Glendale, CA, US

You will be part of the Engineering team at ServiceTitan to help improve our products and build new ones. We provide exciting opportunities for engineers to come in and have a huge impact on a rapidly growing startup. We build for perfection, use the most modern tools on the Microsoft .NET platform, have an amazing culture, and love to solve complex problems.

At ServiceTitan, the SRE team engages with the entire software development lifecycle, from ideation to operating predictably at scale. As an SRE at ServiceTitan, you will identify and build software to improve uptime, performance, and the overall customer experience. You will collaborate with architects and software engineers to deliver a highly available and highly automated infrastructure.

As our Senior Site Reliability Engineer, you will:

  • Design, develop, and deliver the necessary software engineering solutions to manage Azure cloud environments to minimize failed customer interactions.
  • Own reliability, availability, and performance of ServiceTitan’s SaaS.
  • Proactively monitor, measure, and improve all areas of infrastructure and operations.
  • Increase efficiencies through automation, service delivery, and process improvements.

To be successful in this role, you'll need:

  • 4 years of scripting experience in at least two of the following: Python, PowerShell, Bash, Go, or DOS.
  • Experience running and maintaining a customer-facing, Internet-oriented production environment.
  • 2+ years of operational experience with widely used mobile applications.
  • BA/BS in Computer Science, Computer Engineering, or a related technical discipline.
  • Ability to craft beautiful infrastructure-as-code solutions.
  • Experience with Azure, Windows, Linux, and common Microsoft software and services.
  • Experience leveraging cloud architecture and applying site reliability principles.
  • Demonstrated sensitivity to operational concerns.
  • Demonstrated ability to debug code and troubleshoot outages.
  • Full-stack troubleshooting skills across all software and hardware layers.
  • Superb communication skills, both written and verbal.
  • Passionate about solving complex infrastructure challenges.
  • Excited about delivering a reliable high-quality product.
  • Highly motivated, smart, independent person who thrives in a fast-paced innovative environment.
  • Intensely eager to meet the needs of our customers and deliver best-of-breed SaaS solutions.
  • Experience using telemetry to understand throughput, limitations, and constraints in a service.
  • Basic understanding of architectural patterns to improve uptime.
  • Ability to monitor and improve site stability.
  • Passion for system, application and business metrics.