Senior Site Reliability Engineer at OneSignal | Powderkeg

Location: United Kingdom (Remote)

Employment Type: Full-time

Team: Engineering

OneSignal is a Remote First Collaboration Company, offering remote work for those who reside in London, UK. In collaboration with our partner Elements Global Services, an in-office experience is also available in London, UK.

Our blog contains more information about the OneSignal Engineering [career ladder](https://onesignal.com/blog/how-to-introduce-an-engineering-career-ladder-to-your-company/), [compensation model](https://onesignal.com/blog/how-to-build-a-competitive-and-robust-engineering-compensation-model/), remote-first culture, and our diverse team. Our salary bands are available on AngelList.

We have grown rapidly to where we are today, serving billions of HTTP requests and sending over 10 billion messages daily. We achieved this scale by writing scale-sensitive components in languages like Rust and Go. This potent combination of high performance and efficient resource utilization has given us an incredible competitive edge.

With our UK partner Elements Global Services, we are hiring SREs to help us continue to scale by operating and engineering the future of our infrastructure. We are maintaining 99.95% uptime today, and we are investing to ensure we maintain that as the business continues to grow and as the product evolves.

Your primary task will be software engineering with a focus on infrastructure, operations, and automation. You'll be building systems to run our product, improving internal services, and advising product teams on architecture as it relates to the operability of the service.

The systems you'll be responsible for include all of the services which power our product. These range from off-the-shelf services like haproxy, nginx, Redis, PostgreSQL, Kafka, Kubernetes, etc. to our in-house services such as the Rails web app, various Rust backend services, and our high-performance API layer written in Go.

You'll be working with Kubernetes to automate our data center operations and writing operational services to automate database operations. One of the key challenges in this role is not only to understand systems well enough to operate them by hand, but also to understand them in sufficient detail to write software that automates such operations.

For some additional context on how we think about SRE, please see the [_introductory chapter_](https://landing.google.com/sre/sre-book/chapters/introduction/) of the Google SRE book.

In a typical month, an SRE at OneSignal might:

  • Improve our CI/CD pipeline to improve deploy performance
  • Develop new tools to enable other developers to better spend their time
  • Add new code to the system to enable messaging users on a new platform
  • Help evaluate a new storage technology to further scale our stack
  • Provision and configure new hardware
  • Investigate network issues
  • Improve application and infrastructure monitoring

What you'll bring:

  • At least 3 years of experience working as a software engineer
  • Experience operating reliable production systems at scale
  • Knowledge of Linux systems internals
  • Easily bored by running tasks by hand, and able to automate such tasks
  • Experience with PostgreSQL

Preferred skills and experience:

  • Experience working with cloud providers (AWS/GCP/Azure)
  • Operational experience deploying and managing Kubernetes
  • Experience writing Kubernetes controllers and operators
  • Recent experience writing Go and/or Rust
  • Past experience as an SRE
  • Experience working with Layers 1-3 of the OSI networking model
  • Experience with any of Redis, Kafka, etcd, ZooKeeper, nginx, haproxy

Benefits and Perks

  • Flexible work hours
  • 20 days paid vacation + 8 holidays
  • Equity - as the company grows in value, you benefit
  • Yummy Foods: Lunch and snacks provided when in office
  • Choice of workstation!
  • Sweet Swag: You'll need another closet for all the OneSignal gear & jackets!

In keeping with our beliefs and goals, no employee or applicant will face discrimination/harassment based on: race, color, ancestry, national origin, religion, age, gender, marital or domestic partner status, sexual orientation, gender identity, disability status, or veteran status. Above and beyond discrimination/harassment based on 'protected categories,' we also strive to prevent other, subtler forms of inappropriate behavior (e.g., stereotyping) from ever gaining a foothold in our office. Whether blatant or hidden, barriers to success have no place in our workplace.

Job Summary
  • Job Title
    Senior Site Reliability Engineer
  • Company
    OneSignal
  • Location
    San Mateo, CA
  • Employment Type
    Full time