Senior Site Reliability Engineer
Sumo Logic
Warsaw, Poland
1 day ago

Sumo Logic is a SaaS machine data analytics platform for logs, metrics and events. We help DevOps, SecOps and ITOps teams solve complex observability problems.

Our customers, including Epic Games, Pokemon, Twitter, BBC and Toyota, choose our solution because it allows them to easily monitor and optimize their large-scale applications, systems and infrastructure.

Our microservices architecture in AWS ingests hundreds of TB daily across many geographic regions. We also have short release cycles and no legacy versions to maintain.

We write in Scala, Java, Golang and Python and use open-source technologies such as Kafka, Kubernetes and Cassandra.

As a Site Reliability Engineer, you will work on enhancing the reliability of the Sumo Logic product. Our customers rely not only on the product's rich feature set but also on it being always available; it is often their primary tool for maintaining their own software.

The SRE team is unique in Sumo Logic in that it doesn’t own any product service. Instead, you will work across the whole codebase of the Sumo Logic product.

You will identify the weakest links in either reliability or performance, research and benchmark possible improvements, and implement solutions in cooperation with the owning teams.

You will not focus narrowly: there’s a broad spectrum of topics and projects you might get involved in. You will not operate the software, but will create tools that help other teams increase the visibility, observability, and scalability of Sumo Logic services.

As a Senior Site Reliability Engineer, you will:

  • Deal with software which processes data at a huge scale
  • Identify reliability improvement areas based on past evidence of production incidents
  • Research, benchmark, optimize and implement solutions aiming at improving the performance and reliability of our product
  • Work with other teams in Sumo Logic Engineering to increase the observability of their services, share reliability knowledge, automate toil, improve their tooling and replace manual processes
  • Program in Scala

Example projects:

  • SLI (Service Level Indicator) monitoring
  • Performance measurement and visualization tooling (perf, eBPF, flame graphs)
  • Configuration as Code
  • Optimizing usage of cloud services (AWS)

You should have:

  • At least a BSc in Computer Science or a related field
  • 5+ years of professional experience, ideally in an SRE role
  • Strong coding skills in any language; object-oriented languages are preferred
  • Strong troubleshooting skills in complex systems
  • Ability to rapidly learn new software, frameworks, open source tools and development languages
  • Strong knowledge of large-scale internet service architecture (e.g. load balancing)
  • Strong understanding of Unix and TCP/IP fundamentals

Ideally, you will also have:

  • Experience with performance, scalability, and reliability issues of 24x7 commercial services
  • Proficiency with the AWS ecosystem
  • A self-driven, proactive attitude
  • Experience configuring and maintaining common infrastructure such as Apache ZooKeeper and HAProxy
  • Experience working in a test-driven environment

Why it’s worth applying:

  • Great people
  • Great salary
  • RSU (Restricted Stock Unit) package
  • $2,000/year educational budget (conferences, training)
  • Great offices in Warsaw and Krakow (of course, we are currently working 100% remotely)
  • Free individual English lessons with a native speaker
  • Medical insurance for you and your family / partner
  • Sports card