Staff Site Reliability Engineer
We’re transforming the software industry. We’re Flexera. With more than 50,000 customers across the world, we’re achieving that goal. But we know we can’t do any of that without our team. Ready to help us re-imagine the industry during a time of substantial growth and ambitious plans? Come and see why we’re consistently recognized by Gartner, Forrester and IDC as a category leader in the marketplace.
Flexera delivers Technology Value Optimization solutions that enable some of the largest companies in the world to inform their IT so they can transform their IT. From on-prem to the cloud, companies can get the IT asset data needed to rightsize, reallocate spend, reduce risk and maximize ROI.
We are seeking someone interested in working on a SaaS/Cloud product with an ongoing migration to a microservices architecture, in a DevOps culture, with a strong CI/CD approach.
What will you do?
- Help eliminate operational toil by automating repetitive operations work.
- Work with product development teams to ensure that our new features are able to meet SLAs.
- Own architectural design decisions for our SaaS environments on AWS.
- Apply a good understanding of FinOps to optimize costs on AWS.
- Help mature the delivery process for teams: whether defining CI/CD pipelines, designing canary release deploys, building in automated fallbacks or optimizing the build chain, you help craft the appropriate solution for the product.
- Optimize product service code to ensure that it's secure, scalable and performant.
- Optimize testing capabilities to increase the assurances we have with each release.
- Improve the fault detection for our services.
- Create dashboards which help communicate the metrics for a given product service.
- Work with product owners and product engineering teams to perform capacity planning.
- Work with product engineering teams to understand performance and behavior patterns.
- Be part of an on-call rotation for alerts that require engineering expertise to diagnose.
- Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again.
What are we looking for?
- Developer experience and a strong interest in both the delivery and ongoing operation of software.
- Experience producing robust, well-tested automation code, preferably in Terraform.
- Expertise working with containers (Docker) and container orchestration (Kubernetes/Elastic Container Service).
- Excellent communication skills including experience in writing good documentation.
- Expertise with popular AWS services (EC2, ECS, EKS, RDS, S3, etc.).
- Knowledge of tools and patterns around CI/CD (familiar with Jenkins, GitHub Actions, or similar).
- Knowledge of operations, including incident management, immutable infrastructure as code (esp. Terraform or CloudFormation), and problem solving.
- Observability knowledge (logs, tracing, metrics) and experience with a few of Elastic Stack, DataDog, Zipkin or Prometheus.
Flexera is proud to be an equal opportunity employer. Qualified applicants will be considered for open roles regardless of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by local/national laws, policies and/or regulations.
Flexera understands the value that results from employing a diverse, equitable, and inclusive workforce. We recognize that equity necessitates acknowledging past exclusion and that inclusion requires intentional effort. Our DEI (Diversity, Equity, and Inclusion) council is the driving force behind our commitment to championing policies and practices that foster a welcoming environment for all.