A site reliability engineer, looking to make sense out of chaos while driving necessary change to help build a reliable platform.
AWS, GCP
Linux
Docker, Kubernetes
Python, GitHub
ELK Stack, Prometheus, Grafana, Honeycomb, BigPanda
Jira, Confluence, PagerDuty, Slack, Zendesk
Building a reliable and highly available product through active engagement in incidents and postmortems, and guiding the team towards an SRE transformation by fostering a mindset focused on triage and root cause analysis. Additionally, worked as an SRE Product Owner within the ANZ region, aiming to infuse agile methodology into daily operations via project work. Managed and prioritized the product backlog, collaborating with engineering teams to define and implement SLOs/SLIs while also onboarding alerts onto the existing observability platform.
Projects : Service team SLO/SLI engagement, Mean Time to Detect reduction via alerting, Team SRE Bot.
Being an SRE evangelist, handling incidents as a first-line responder, and spearheading blameless incident postmortems. Assisting service teams in adopting best practices for deployments, alerting, and metrics while managing tooling and maintenance of the public cloud infrastructure.
Projects : Service Improvements - Observability setup
Ensuring customer success through incident management and maintaining client usage metrics, assisting the QA team in pre- and post-deployment testing, and advocating for adoption of SRE practices in metrics and observability. Maintaining the hybrid cloud environment while developing good processes as well as maintaining product documentation.
Projects : Client Usage Metrics and cloud costing
Providing L1 support for all issues from 450+ store locations in the US, data centers, and IBOs (international buying offices), handling hardware installation and troubleshooting. Leading the operations center (NOC) team on daily task allocation, incident queue management, process documentation, and root cause analysis.
Major : Digital Communication and Networking
School : The Oxford College of Engineering
Major : Electronics and Communication
School : New Horizon College of Engineering
Received this recognition from my managers for putting customers first while always looking to improve how our team works.
Issued by the service health manager for my curiosity and zest for learning.
Received this award for working on building the SRE team while maintaining customer success with all clients from onboarding to exit.
Received this award for building the operations (NOC) team from the ground up, including setting processes in place and training team members.