Seamless monitoring on Heroku
The ability to efficiently collect, visualize and alert in one place.
Centralized business and operational metrics
Changed the way the team thinks and works with monitoring data
Stitch Fix is a fashion retailer that pioneered a personalized online retail experience blending expert styling, proprietary technology and a unique product line with a simple philosophy: “buy what you like and return the rest.” Behind the complex warehouse, packing and shipping logistics is smart technology built by a group of 40 PhD statisticians and a star group of engineers.
We talked to Dave Copeland, Director of Engineering at Stitch Fix, to learn how he and his team ensure both that the web site experience is seamless and that each package goes out properly tagged and tracked.
Before Librato, Stitch Fix was using a hodgepodge of tools for monitoring on Heroku. This was difficult to manage, and the lack of a centralized place for alerts and visualization was an obvious flaw of the system. Copeland decided to try the Heroku Graphite add-on, but was unhappy because the integration required extensive code changes. “It was difficult to make it work, and frankly, it looked terrible,” said Copeland. He had first heard about Librato at a Heroku conference in San Francisco; after six to nine months of using Graphite, he decided to give Librato a try.
Copeland was looking for a clean, low-maintenance means of augmenting his operational visibility on Heroku without replacing his existing tools. “Librato was such a simple integration - a no-brainer,” said Copeland. The turnkey integration immediately begins monitoring myriad dyno-level metrics, so by just turning it on, the engineering team began to see live performance data from across their infrastructure. To these, Copeland added custom business metrics that he was interested in tracking and alerting on. This worked beyond expectations, and Stitch Fix ended up migrating all alerts to Librato.
Today, Copeland relies heavily on Librato for his operational metrics. In fact, the entire engineering team of 21 people uses Librato: the runbook for setting up every new application includes instructions for setting up Librato alerts. Having all alerts in one monitoring system on Heroku allows everyone to react to relevant issues, and ensures that alerts are routed to the right engineering teams. Librato alerts help Stitch Fix avoid alert fatigue and wasted resources without compromising the customer experience.
Librato also helps the Stitch Fix team alert on database connections and background workers. “So much of our business logic is in the background workers,” says Copeland. Copeland’s team uses the Resque Ruby library for creating jobs, and Librato monitors the health of the queues: for instance, how many workers are in each queue, how many jobs have failed, and whether queues are building up.
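The queue-health checks described above can be illustrated with a small sketch. It is written in plain Python rather than Ruby and does not use the Resque or Librato APIs. The `QueueSample` shape, the function names and the thresholds are all hypothetical, chosen only to show the kind of logic that sits behind "no workers on a queue", "too many failures" and "queue building up" alerts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSample:
    """One reading of a background-job queue (hypothetical shape)."""
    queue: str
    pending: int   # jobs waiting in the queue
    workers: int   # workers currently serving the queue
    failed: int    # cumulative count of failed jobs


def is_building_up(samples: List[QueueSample], min_growth: int = 1) -> bool:
    """True when the pending count grew on every consecutive reading."""
    if len(samples) < 2:
        return False
    return all(
        later.pending - earlier.pending >= min_growth
        for earlier, later in zip(samples, samples[1:])
    )


def health_alerts(samples: List[QueueSample], max_new_failures: int = 5) -> List[str]:
    """Return alert messages for a time-ordered series of readings."""
    alerts: List[str] = []
    if not samples:
        return alerts
    latest = samples[-1]
    # Jobs are waiting but nothing is processing them.
    if latest.workers == 0 and latest.pending > 0:
        alerts.append(f"{latest.queue}: jobs pending but no workers")
    # Failures accumulated over the observed window (assumed threshold).
    new_failures = latest.failed - samples[0].failed
    if new_failures > max_new_failures:
        alerts.append(f"{latest.queue}: {new_failures} new failed jobs")
    # The backlog keeps growing from one reading to the next.
    if is_building_up(samples):
        alerts.append(f"{latest.queue}: queue is building up")
    return alerts


if __name__ == "__main__":
    # Illustrative readings only; not real Stitch Fix data.
    readings = [
        QueueSample("shipping", pending=10, workers=2, failed=0),
        QueueSample("shipping", pending=25, workers=2, failed=1),
        QueueSample("shipping", pending=60, workers=2, failed=8),
    ]
    for alert in health_alerts(readings):
        print(alert)
```

In a hosted monitoring service the raw readings would normally be submitted as metrics and the thresholds configured as alerts there, not evaluated in application code; the sketch only makes the underlying conditions explicit.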
The engineering process has evolved so that the team thinks about monitoring before the code is even written, which has in turn enabled it to detect problems before they affect customers. “Whenever we build a new feature, there is always a discussion point: is there a metric here we want to track?” said Copeland. This shift in the development process, which Librato enabled, has helped the team ward off operational catastrophes many times.
“Every customer is important, and we want every Stitch Fix experience to be seamless: from using the web site to making sure the right piece of clothing is in the box,” says Copeland. Librato makes this easier by keeping all of the team's metrics in one place.
Stitch Fix uses many services, but, according to Copeland, “SaaS products often present issues that are confusing and make little sense. Librato is not like that - we love it because it is easy and intuitive to work with.”