Lead / Senior Lead Site Reliability Engineer - WorkWave

IFS

Software Engineering
Colombo, Sri Lanka
Posted on Nov 20, 2024

Company Description

IFS is a billion-dollar revenue company with 6,000+ employees on all continents. Our leading AI technology is the backbone of our award-winning enterprise software solutions, enabling our customers to be their best when it really matters: at the Moment of Service™. Our commitment to internal AI adoption has allowed us to stay at the forefront of technological advancements, ensuring our colleagues can unlock their creativity and productivity, and our solutions are always cutting-edge.

At IFS, we’re flexible, we’re innovative, and we’re focused not only on how we can engage with our customers but on how we can make a real change and have a worldwide impact. We help solve some of society’s greatest challenges, fostering a better future through our agility, collaboration, and trust.

We celebrate diversity and understand our responsibility to reflect the diverse world we work in. We are committed to promoting an inclusive workforce that fully represents the many different cultures, backgrounds, and viewpoints of our customers, our partners, and our communities. As a truly international company serving people from around the globe, we realize that our success depends on the respect we have for those different points of view.

By joining our team, you will become part of a global, diverse environment and a winning team committed to sustainability, at a company where we get things done so that you can make a positive impact on the world.

We’re looking for innovative and original thinkers to work in an environment where you can #MakeYourMoment so that we can help others make theirs. With the power of our AI-driven solutions, we empower our team to change the status quo and make a real difference.

If you want to change the status quo, we’ll help you make your moment. Join Team Purple. Join IFS.

Job Description

The WorkWave Team is seeking an experienced Lead / Senior Lead Site Reliability Engineer (SRE) to drive reliability, scalability, and operational excellence across our cloud-based infrastructure. This role is crucial in ensuring high availability, monitoring, and streamlined deployment processes across various environments, including AWS and hybrid systems. The Lead / Senior Lead SRE will work closely with cross-functional teams to optimize system reliability and efficiency, actively contributing to a robust infrastructure that supports business growth.

Responsibilities

  • Design, manage, and optimize scalable infrastructure across cloud environments with a focus on reliability, availability, and performance. Implement comprehensive monitoring and observability systems to ensure proactive issue detection and resolution.

  • Lead incident response for critical infrastructure issues across cloud platforms, drive root cause analysis, and implement corrective measures to minimize recurrence.

  • Collaborate with cross-functional teams to create efficient, automated CI/CD pipelines that support cloud, hybrid, and on-prem deployments, enabling smooth and reliable delivery.

  • Apply IaC best practices across environments using tools that ensure consistent provisioning, configuration, and management of resources in cloud environments.

  • Ensure new services meet reliability and scalability requirements across all environments before deployment. Conduct capacity planning and performance tuning to adapt to business needs.

  • Develop and maintain comprehensive documentation for infrastructure, deployment workflows, monitoring configurations, and incident management procedures, providing clear guidance across teams.

  • Provide mentorship and technical guidance to team members, sharing knowledge of best practices in reliability engineering and infrastructure management.

  • Research and integrate new tools and technologies to improve the efficiency, scalability, and resilience of our SRE processes across cloud and hybrid infrastructures.

Qualifications

  • Bachelor’s or Master’s Degree in Computer Science, Information Technology, or a related field.

  • At least 4 to 5 years of experience in Site Reliability Engineering or DevOps with a focus on multi-environment infrastructure and cloud platforms.

  • Strong track record of managing and optimizing infrastructure in production environments, including incident management and system troubleshooting.

  • Proficient in CI/CD pipeline automation and infrastructure as code practices across cloud and hybrid environments.

Skills and Competencies

  • Expertise in monitoring, observability, and incident management using tools like Grafana, AWS X-Ray, and CloudWatch, with a focus on RCA and proactive alerting.
  • Proficiency in automation and scripting (e.g., Python, Bash) and Infrastructure as Code (IaC) tools such as Terraform or AWS CloudFormation.
  • In-depth knowledge of AWS services for reliability, including Auto Scaling, Elastic Load Balancing, RDS, and S3, with a focus on high availability and fault tolerance.
  • Hands-on experience with CI/CD pipelines using AWS CodePipeline, CodeBuild, or third-party tools integrated with AWS services.
  • Excellent communication and collaboration skills to drive system reliability and foster cross-functional teamwork in a cloud-first environment.

Additional Information

We believe that coming together as a community, in person, is important for innovation, connection and fostering a sense of belonging. Our roles have the right balance of remote and in-office working to enable flexibility for managing your life along with ensuring a real connection with your colleagues and the broader IFS community.