
Mixmax is a platform for all your externally-facing communications. Just like you use Slack to talk within your team, you use Mixmax to talk to people outside of your immediate team, most notably folks in other organizations. One of our most popular features is the Mixmax Live Feed. It shows you how people interact with all the messages you and your team send, providing a chronological overview of all email sends, opens, clicks, downloads, meeting confirmations, replies, and more. It’s searchable (using natural language queries across recipients and subject), filterable, live-updating, and provides top-level aggregate statistics.

Our customers, such as Looker, Lever and Asana, rely on Mixmax as their primary communications platform, so you can imagine that the live feed crunches a lot of data. At last count, there were over a billion documents in the collections which make up the live feed, and our document storage grows at ~20% month-on-month. In addition, our users expect their feed to load near-instantly, especially as it’s the entry point to our app.

In May 2016, we began exploring how we’d build this feature. Our primary data store is Mongo, so our initial hope was that we’d be able to construct the feed directly from there. But that approach had a number of flaws: it would have required us to query documents from multiple collections, then interleave, sort and truncate them. That would obviously be inefficient, CPU-intensive and slow. Besides, Mongo couldn’t provide us with the natural language search features we wanted.
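
To make that flaw concrete, here’s a minimal sketch of the approach we rejected, using the official Node.js MongoDB driver. The collection names, field names and page size are illustrative, not our actual schema:

```typescript
// A sketch of the rejected fan-out approach: query every event collection,
// then interleave, sort and truncate in application code.
import { MongoClient, Document } from "mongodb";

const PAGE_SIZE = 50;
const EVENT_COLLECTIONS = ["opens", "clicks", "replies", "downloads"];

async function buildFeedPage(client: MongoClient, userId: string): Promise<Document[]> {
  const db = client.db("mixmax");

  // 1. Fan out: one query per event collection.
  const perCollection = await Promise.all(
    EVENT_COLLECTIONS.map((name) =>
      db
        .collection(name)
        .find({ userId })
        .sort({ timestamp: -1 })
        .limit(PAGE_SIZE) // each collection must over-fetch a full page...
        .toArray()
    )
  );

  // 2. Interleave, re-sort and truncate in the application.
  return perCollection
    .flat()
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, PAGE_SIZE); // ...only to throw most of it away.
}
```

Every page load fans out N queries, over-fetches N pages’ worth of documents, and does the sorting and truncation in the application rather than the database.
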
A couple of us had used Elasticsearch in the past, and it seemed like a good fit here. We could dump relevant events into a single index, and it would provide super-fast searching (including the natural language search we wanted), filtering, paging, sorting and aggregation out-of-the-box.
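
For a sense of what “out-of-the-box” means here, this is roughly what a single feed request could look like, sketched with the legacy `elasticsearch` Node.js client; the index, field and aggregation names are hypothetical stand-ins, not our real schema:

```typescript
// One request against one index gives us search, filtering, sorting,
// paging and aggregation all at once.
import * as elasticsearch from "elasticsearch";

const client = new elasticsearch.Client({ host: "localhost:9200" });

async function searchFeed(userId: string, queryText: string) {
  return client.search({
    index: "livefeed",
    body: {
      query: {
        bool: {
          must: [{ match: { subject: queryText } }], // natural language search
          filter: [{ term: { userId: userId } }],    // cheap, cacheable filter
        },
      },
      sort: [{ timestamp: { order: "desc" } }], // chronological feed ordering
      from: 0, // paging
      size: 50,
      aggs: {
        // top-level aggregate statistics, e.g. counts per event type
        byEventType: { terms: { field: "eventType" } },
      },
    },
  });
}
```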

Over the course of about two weeks, we built a prototype which satisfied our performance SLAs (server response and page load times), and launched.

As we wrote here, we began to notice some of our queries performed quite poorly (300-500ms), particularly during periods of peak load. While debugging on a sample cluster locally, we noticed that performance improved dramatically if we stored and queried upon our timestamps in seconds instead of milliseconds (our original choice of milliseconds was a reflection of our predominantly-JS stack). We implemented the schema change, which saw those 300-500ms queries drop to ~13ms. Across a test set of 500 million documents, we saw a 30x improvement.

Adrien Grand (Software Engineer at Elastic) helped explain the cause in more detail. The issue is isolated to Elastic versions pre-5.0. In those builds, Elastic indexed numeric fields using prefix terms. In particular, `long` values were indexed using 4 values: one that identified all bits, one that identified the first 48 bits, one that identified the first 32 bits and one that identified the first 16 bits. These 16-bit precision steps meant there could be up to 2^16 = 65536 values on the edges of a range query. However, once we used second precision, our values were all multiples of 1000. Suddenly, there were only ~66 unique values at most on the edges, which made the queries much faster.
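
A quick back-of-the-envelope check of that arithmetic (ours, not Adrien’s): a range edge spans at most one 16-bit precision step, i.e. a window of 2^16 consecutive values. With millisecond precision, any of those 65536 values can occur; at most 66 multiples of 1000 can fall in such a window.

```typescript
// Within one 16-bit precision step (a window of 2^16 consecutive long
// values), how many distinct values can a range edge contain?
const WINDOW = 2 ** 16; // 65536

// Millisecond timestamps: any value in the window may occur.
const msCandidates = WINDOW;

// Second-precision timestamps: only multiples of 1000 may occur.
function multiplesInWindow(step: number, start: number): number {
  let count = 0;
  for (let v = Math.ceil(start / step) * step; v < start + WINDOW; v += step) {
    count++;
  }
  return count;
}

console.log(msCandidates);               // 65536
console.log(multiplesInWindow(1000, 0)); // 66, matching the ~66 above
```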

As we look to the future and the chance to upgrade to Elasticsearch 5.x, we expect faster queries without this workaround, due to the BKD tree in Elasticsearch.

Our Elastic cluster ticked along happily for a few more months, before we began to notice nodes dropping more frequently than they should (once every few days).
