Staff Site Reliability Engineer

Flexera

Software Engineering
Bengaluru, Karnataka, India
Posted on Friday, October 27, 2023

We’re transforming the software industry. We’re Flexera. With more than 50,000 customers across the world, we’re achieving that goal. But we know we can’t do any of that without our team. Ready to help us re-imagine the industry during a time of substantial growth and ambitious plans? Come and see why we’re consistently recognized by Gartner, Forrester and IDC as a category leader in the marketplace.

Flexera delivers Technology Value Optimization solutions that enable some of the largest companies in the world to inform their IT so they can transform their IT. From on-prem to the cloud, companies can get the IT asset data needed to rightsize, reallocate spend, reduce risk and maximize ROI.

Flexera is looking for an experienced Staff Site Reliability Engineer to join our SRE team. We're a fast-growing, category-leading organization with ambitious objectives and a positive, inclusive culture. We're looking for passionate professionals who want to grow their talents and achieve great things. If that sounds like you, we want to talk to you about joining our team.

As a Site Reliability Engineer, you will be tasked with everything from helping with product design to diagnosing issues and writing automated scripts that remediate issues in our production systems. You will be driven to build fault-tolerant, scalable systems and automate away as much operational toil as you can. You align with the goals of the DevOps movement in improving collaboration between the development and operations disciplines.

We are seeking someone with extensive experience working on a SaaS/Cloud product with a microservices architecture.

Responsibilities:

  • Help to eliminate operational toil - seek to automate repetitive operations work
  • Work with product development teams to ensure that our new features are able to meet SLAs
  • Help mature the delivery process for teams: defining and managing automated deployment pipelines (such as Jenkins pipelines), designing canary release deploys, building in automated fallbacks, optimizing the build chain, and maintaining infrastructure and pipelines as code. You help craft the appropriate solution for the product
  • Optimize product service code to ensure that it's secure, scalable and performant
  • Optimize testing capabilities to increase the assurances we have with each release
  • Improve the fault detection for our services
  • Create dashboards which help communicate the metrics for a given product service
  • Work with product owners and product engineering teams to perform capacity planning
  • Work with product engineering teams to understand performance and behavior patterns
  • Be part of an on-call rotation for alerts that require engineering expertise to diagnose
  • Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again
  • Contribute to platform security

Education and Experience:

  • Computer Science degree, or related industry experience managing a mission-critical production system in AWS (or equivalent Azure/Google cloud) for at least 7 years.

Critical Skills / Competencies:

  • Hands-on Kubernetes Experience: Proven track record of designing, deploying, and managing multiple Kubernetes clusters in AWS running on EKS.
  • AWS Expertise: Deep knowledge of AWS services and their integration with Kubernetes as demonstrated by hands-on expertise from previous jobs.
  • Problem-Solving: Excellent problem-solving skills with the ability to diagnose and resolve complex Kubernetes issues.
  • Communication Skills: Strong communication and teamwork skills to collaborate effectively with cross-functional teams in various time zones around the world.
  • Documentation: Ability to maintain clear and organized documentation for reference.
  • Certifications: Relevant certifications in Kubernetes and AWS are a plus.
  • Agile software delivery methodologies
  • Experience designing scalable services
  • Experience designing distributed, fault-tolerant systems
  • A good understanding of SQL databases
  • A positive attitude and willingness to learn
  • Strong conflict resolution competence
  • Detail-oriented: the ideal candidate naturally digs as deep as they need to in order to understand the why

Bonus Skills:

The following items are not prerequisites for the role, but they may give you a better idea of what to expect in your SRE role at Flexera:

  • Python / Ruby / Golang / Java / C# / C / C++ / Bash experience
  • Experience with Monitoring systems such as Zabbix, New Relic, ELK, Prometheus, Datadog
  • Security background
  • SQL, NoSQL and Graph databases
  • Elasticsearch
  • Relevant certifications, e.g. AWS, GCP, Azure
  • Experience of Disciplined Agile Delivery (DAD)

Flexera is proud to be an equal opportunity employer. Qualified applicants will be considered for open roles regardless of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by local/national laws, policies and/or regulations.

Flexera understands the value that results from employing a diverse, equitable, and inclusive workforce. We recognize that equity necessitates acknowledging past exclusion and that inclusion requires intentional effort. Our DEI (Diversity, Equity, and Inclusion) council is the driving force behind our commitment to championing policies and practices that foster a welcoming environment for all.