SHRUTHI NAIDU
Site Reliability Engineer
Profile

A site reliability engineer, looking to make sense out of chaos while driving necessary change to help build a reliable platform.

Skills
Cloud: AWS, GCP, Private Cloud
OS: Linux
Infra as Code: Terraform
Container Management: Docker, Kubernetes
Software and version control: Bash, Python, GitHub
Observability and Alerting: ELK Stack, Prometheus, Grafana, Honeycomb, BigPanda
Collaboration Tools: JIRA, Confluence, PagerDuty, Slack, Zendesk
Professional Experience

Workday

Site Reliability Engineer
03/2022 – present | Sydney, Australia

Building a reliable and highly available product through active engagement in incidents and postmortems, and guiding the team towards an SRE transformation by fostering a mindset focused on triage and root cause analysis. Additionally, taking up the role of SRE Product Owner within the ANZ region, with the aim of infusing agile methodology into daily operations via project work, while also actively working on roadmap creation and planning for the scrum team.

BlockFi

Site Reliability Engineer (Remote)
04/2021 – 12/2021 | Bangalore, India

Being an SRE evangelist, handling incidents as a first-line responder and spearheading blameless incident postmortems. Assisting service teams in adopting best practices for deployments, alerting and metrics while managing tooling and maintenance of the public cloud infrastructure.

Projects : WireGuard, Service Improvements, Observability stack

Criterion Networks

Site Reliability Engineer
07/2019 – 03/2021 | Bangalore, India

Ensuring customer success through incident management and maintaining client usage metrics, assisting the QA team in pre- and post-deployment testing, and advocating for the adoption of SRE practices in metrics and observability. Maintaining the hybrid cloud environment while developing good processes as well as product documentation.

Projects : Client Metrics

JCPenney Services India

Engineer 1 (Systems Engineer)
12/2016 – 06/2019 | Bangalore, India

Providing L1 support for all issues from 450+ store locations in the US, data centers and IBOs (international buying offices), handling hardware installation and troubleshooting. Leading the operations center (NOC) team on daily task allocation, incident queue management, process documentation and root cause analysis.

Achievements

Employee of the month

Criterion Networks
04/2020

Received this award for working on building the SRE team while maintaining customer success with all clients from onboarding to exit.

Warrior Spirit Award

JCPenney

Received this award for building the operations (NOC) team from the ground up including setting processes in place as well as training team members.

Certificate of appreciation - Best Performer - Q2 FY14

Hewlett Packard
04/2014

90% NPS / 90% Delight / 0% DSAT / 100% TPR / 90% FCR

Education

Master's Degree (MTECH)

09/2014 – 06/2016

Major : Digital Communication and Networking

School : The Oxford College of Engineering

Bachelor's Degree (BTECH)

08/2009 – 06/2013

Major : Electronics and Communication

School : New Horizon College of Engineering

Projects

Service Resiliency

Workday
present
  • Working with the Service health team to investigate vulnerabilities within the product and working closely with service teams to assist in the reduction of these errors.
  • Establishing a team process for conducting root cause analysis on the product's most error-prone services, aiming to enhance the overall customer experience.
SRE Bot

Workday
present
  • Developing a team-centric bot to ensure consistent interaction between the SRE team and external service teams.
  • By instilling the SRE mindset during product development, the initiative also focuses on enhancing the team's proficiency in agile methodology.
  • The ultimate goal of the product is to expand its operational capabilities, including features such as initial incident triage, automated callouts and JIRA queue management.
Service Improvement

BlockFi
  • Tasked with identifying areas of improvement for an internal application service team at a previous organisation.
  • Helped set up adequate monitoring and logging dashboards to measure cloud resource usage, service-specific metrics and alerting, while raising the need for in-code documentation and deployment runbooks, empowering the team to handle their production changes independently of the infrastructure team.
Client Usage Metrics

Criterion
  • Account- and client-specific information was collated and maintained in Metabase at a previous organisation.
  • The tool presented information on the users consuming the cloud resources at a specific time as well as over an input duration.
  • This helped uncover multiple discrepancies in the data maintained in the cloud database, which were mitigated with better querying techniques and backend process changes.
  • These dashboards also helped in understanding product resource cost at a higher level.
Interests

  • Singing
  • Sports enthusiast
  • Amateur guitarist
  • Part of the Toastmasters fraternity
  • Attending tech conferences and webinars

Courses

KCNA

ACG
present

Google Cloud Professional Certificate: Cloud Engineer

10/2021

Terraform Associate

ACG
06/2023