Site Reliability Engineer
Precisely is the leader in data integrity. We empower businesses to make more confident decisions based on trusted data through a unique combination of software, data enrichment products and strategic services. What does this mean for you? For starters, it means joining a company focused on delivering outstanding innovation and support that helps customers increase revenue, lower costs and reduce risk. In fact, Precisely powers better decisions for more than 12,000 global organizations, including 99 of the Fortune 100. Precisely's 2,500 employees are unified by four company core values that are central to who we are and how we operate: Openness, Determination, Individuality, and Collaboration. We are committed to career development for our employees and offer opportunities for growth, learning and building community. With a "work from anywhere" culture, we celebrate diversity in a distributed environment with a presence in 30 countries as well as 20 offices across more than 5 continents. Learn more about why it's an exciting time to join Precisely!
This role is open only to candidates willing to work in hybrid mode from the Goa office.
Intro and job overview:
We are looking for a Site Reliability Engineer who will interface with senior management, Platform Engineering, QA and the Precisely development teams to continuously improve the stability, reliability and efficiency of our global SaaS platform. This individual will work with the team to architect and deploy the tools and systems that will make our production environment more resilient, in order to effectively respond to and address incidents.
Responsibilities and Duties:
- Partner closely with SaaS Development, Pipeline Engineering, and Platform Engineering teams to ensure that SRE is an integral part of Precisely’s Continuous Delivery model for SaaS applications.
- Design and build the tooling and automation needed to manage our cloud-native infrastructure in a reliable, maintainable, observable and secure way.
- Build a true one-team culture across globally dispersed teams.
- Establish a 24x7 incident response process that addresses Precisely’s SLA for SaaS Products through efficient alerting, playbook documentation and blameless postmortems.
- Build relationships across product management, development, and support organizations to socialize the culture of SRE.
- Drive the culture of observability through the SaaS development organization.
- Lead prioritization of reliability features and contribute to the design, development and delivery of effective tooling, alerts, and automated responses to identify and address reliability risks.
- Ensure appropriate security cloud tooling is planned for and implemented in the production environment.
- Regularly defend the quality, scalability and reliability of Precisely’s production SaaS environment.
Requirements and Qualifications:
- At least 2 years of experience in a global multi-tenant production environment.
- Hands-on skills with Kubernetes, AWS/GCP/Azure, and Terraform/CloudFormation/Ansible.
- Strong knowledge of Linux fundamentals and experience troubleshooting production issues.
- Experience working in a 24x7 production environment.
- Strong understanding of SRE and general SaaS service management principles.
- Past experience working with SRE teams and handling on-call coordination challenges.
- Strong collaboration, communication and interpersonal skills.
- The ability to operate calmly in challenging and stressful situations.
- A deep understanding of Kubernetes and Cloud Networking or previous experience in infrastructure.
- Exposure to any programming language (Go/Python/C/C++) is a big plus.