Monitoring the health of a large distributed system supporting user-facing analytics
Plug and play, composite alerting functionality
Full implementation in a matter of days
time-to-data for users
Behind every good user experience is a solid back-end ecosystem. SEO analytics provider Moz, renowned for its tools and accessible, community-centered approach, shows how seamless behind-the-scenes IT results in happy customers. Intuitive and easy to use, Moz analytics tracks everything from keyword ranking to link building to brand mentions. So what happens behind the scenes?
When Senior Engineer Tyler Murray took over leadership of Moz Analytics Platform Engineering, he was tasked with rebuilding the analytics platform from the ground up, which included finding the right monitoring and alerting solution.
From his deep experience in operations and analytics, Murray knew he needed an immediate, fully functional monitoring and alerting solution.
“The analytics distribution system has a lot of moving parts,” said Murray. “When we strategized our SLAs, we had to figure out how to guarantee visibility into system health.”
Murray and his team of 10+ chose Librato. “It made so much more sense than building an in-house monitoring solution,” Murray added.
“It was plug and play. We had metrics being reported on within a day. We were able to focus on making inferences, rather than putting in time to build and maintain a monitoring system of our own.”
Librato was the first item on the Moz analytics team’s daily agenda. Per Murray: “We’d come to work in the morning, glance at production dashboard and immediately gauge what was healthy and what wasn’t, and see how the day would go.” The dashboard combined a selection of instruments that together reflected the overall health and quality of the Moz app. It displayed stats from Moz’s homegrown queuing system, which sent job duration, individual failures, and failure rates to Librato.
Murray’s team used practically all of Librato’s functionality. They used Librato to track the sizes of different queues and time-to-data for customers, apply raw counters, and track computed metrics, among other things. Their two primary use cases were generating composite metrics from existing streams and alerting. Murray loved the ability to hook alerts into PagerDuty: engineers received Librato alerts based on indicators for system health, data quality and speed, which allowed them to identify triggers and promptly drill down to root causes.
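To make the idea of a composite metric with alerting concrete, here is a minimal sketch in plain Python. It is not Librato's API and not Moz's implementation: the `Sample` type, the `failure_rate` and `should_alert` functions, the sample values, and the 5% threshold are all hypothetical, chosen only to illustrate deriving a failure-rate stream from two existing streams and alerting when it stays above a threshold.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One measurement in a metric stream (hypothetical shape)."""
    timestamp: int  # seconds since epoch
    value: float


def failure_rate(failures: List[Sample], completed: List[Sample]) -> List[Sample]:
    """Composite stream: failures / completed for each shared timestamp."""
    completed_by_ts: Dict[int, float] = {s.timestamp: s.value for s in completed}
    rates: List[Sample] = []
    for f in failures:
        total = completed_by_ts.get(f.timestamp)
        if total:  # skip timestamps that are missing or have a zero total
            rates.append(Sample(f.timestamp, f.value / total))
    return rates


def should_alert(stream: List[Sample], threshold: float, consecutive: int) -> bool:
    """True if the last `consecutive` samples all exceed `threshold`."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    if len(stream) < consecutive:
        return False
    return all(s.value > threshold for s in stream[-consecutive:])


# Illustrative data only: per-minute job failures and completed jobs.
failures = [Sample(0, 1), Sample(60, 6), Sample(120, 8)]
completed = [Sample(0, 100), Sample(60, 100), Sample(120, 100)]

rates = failure_rate(failures, completed)
alert = should_alert(rates, threshold=0.05, consecutive=2)
```

Requiring several consecutive samples above the threshold is one common way to avoid paging an engineer for a single noisy data point; a real deployment would tune both the threshold and the window to its own SLAs.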
When asked whether Librato helped his personal mission to bring sanity to the process of software development, Murray affirmed: “Having an ecosystem within the app is priceless. No need to build and maintain things, vet new libraries for alerting ... Librato is plug and play, and about as sane as you can get with implementing a metrics monitoring system.”
“With Librato, we were able to quickly acquire data from users, process it and send it back to them. We were able to reduce our time-to-first-data for the user to under 30 minutes,” Murray told us. “When people caught on to the benefit, more people started to use Moz, and they used us for longer periods of time.” While the engineering team was certainly delighted with Librato, the big win was for Moz’s customers.