At Knowtworthy, we have been hard at work on our new Meeting Transcription service. This service gives users the ability to capture their conversations and automatically transcribe their meetings. The service will be launching soon, but here is a little technical look into what we have been cooking up.

Background

Knowtworthy is a meetings communication platform that helps teams organize their meetings and better understand the conversations they have. We do this through our web app, where a team's minutes are organized to maximize productivity. Additionally, our Meeting Transcription system provides real-time transcripts and gives meeting participants post-meeting feedback on the trends we saw over the course of the meeting.

The newest service that we are working to release is Meeting Transcription. This service consists of several interconnected micro-services that ingest and process audio data.

Today, I’m going to discuss one such micro-service, the Audio Ingestion service (sorry for the unappealing name). Additionally, I will talk about the problems we ran into and the solutions that we discovered along the way. The purpose of the Audio Ingestion service is to handle the buffers of audio data sent continuously over the course of a meeting. This service is arguably the most critical one in the whole Meeting Transcription pipeline, as it acts as the gateway for our users. Once through the ingestion service, data flows on to the rest of the pipeline. Bugs in this micro-service could corrupt the audio for all downstream services, detrimentally affecting our outputs.

Where We Started

At Knowtworthy, we use Node.js religiously. Having one language (JavaScript) for the frontend and the backend saves us a lot of development time and troubles when dealing with cross-language interactions (JSON, I’m looking at you). Since everyone on the team knows JS, our intuition was to maintain consistency when building this micro-service.

Our first prototype of the Audio Ingestion service was built on Node.js with Socket.IO to manage the streaming protocol. Initially, it worked great, passing all of our unit tests. But we soon came to realize that this solution had some problems.

The first and most significant issue was performance. As we started integrating the service into the web app, our development machines would run at nearly 100% CPU usage every time we started a stream. Having to use one vCPU per audio stream was not going to be cost-effective in the cloud. To run this on AWS, we estimated that a c5.large (2 vCPUs) would suffice to run one stream for certain and two if we were lucky. A quick cost breakdown showed we would be spending about $0.0425/hr, or $31.025/mo, per stream in the best case of two streams per instance. The AWS bill alone was a dealbreaker.
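The cost breakdown can be reproduced with a few lines of Go. This is only a sketch of the arithmetic: the $0.085/hr instance price is inferred from the $0.0425/hr best-case figure (two streams sharing one c5.large), and 730 hours per month is an assumed averaging convention. The names costPerStream, instanceHourlyUSD, and hoursPerMonth are hypothetical.

```go
package main

import "fmt"

const (
	// Assumed c5.large hourly price, inferred from the article's
	// best case of $0.0425/hr per stream with two streams per instance.
	instanceHourlyUSD = 0.085
	// Assumed average number of hours in a month.
	hoursPerMonth = 730.0
)

// costPerStream returns the hourly and monthly cost of a single stream
// when one instance is shared evenly by streamsPerInstance streams.
func costPerStream(streamsPerInstance int) (hourly, monthly float64) {
	hourly = instanceHourlyUSD / float64(streamsPerInstance)
	return hourly, hourly * hoursPerMonth
}

func main() {
	// One stream per instance is the safe case, two is the best case.
	for _, n := range []int{1, 2} {
		h, m := costPerStream(n)
		fmt.Printf("%d stream(s) per instance: $%.4f/hr, $%.3f/mo per stream\n", n, h, m)
	}
}
```

In the safe case of one stream per instance, the per-stream cost simply doubles.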

The second problem we ran into was Node.js’s concurrency model. Normally, this model is extremely helpful for designing I/O-focused services, but one question stayed on our minds: how do we maintain the order of the buffers for downstream services? Since each buffer triggered its own callback, we needed some way to track the order of the messages. We opted for a timestamp-based counter and predicated its success on the accuracy and consistency of Amazon’s Time Sync Service. But the concurrency model left room for the system to lock up if a prior buffer wasn’t processed fast enough, in which case the timestamps of the buffers that followed wouldn’t be considered. We didn’t see any way to mitigate this problem, let alone measure it to diagnose issues in the future.

With the high costs of AWS and concurrency models that prevented us from reliably tracking sequential messages, we were forced to look elsewhere.

Migrating to Golang

We started by looking for languages and tools that could solve these problems. Python looked promising, but we were afraid that it would suffer the same performance problems as Node.js. Nobody wanted to work with Java and, by extension, Spring (we take developer satisfaction seriously). C++ was proposed for maximum performance, but it came with a large time commitment. Lastly, Golang came up. Go had the performance profile we were looking for, close to C++, and had several well-tested WebSocket frameworks that allowed for sequential processing. Go looked like the solution we were looking for, and so we began to rebuild the ingestion service from the ground up.

Results

The new service built in Go solved our performance concerns. When running the Audio Ingestion service on our development machines, CPU usage rarely peaked above 2-4% and memory usage was 5MB for a single stream. Extrapolating from these results, we estimated 25-50 concurrent streams per vCPU. This made the prices on AWS significantly more palatable.
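The 25-50 figure follows from dividing a full core by the per-stream CPU usage. The snippet below is a rough sketch of that extrapolation; it assumes CPU usage scales linearly with the number of streams and ignores any fixed overhead, and streamsPerVCPU is a hypothetical helper.

```go
package main

import "fmt"

// streamsPerVCPU estimates how many streams fit on one vCPU when each
// stream uses cpuPercent of a core. It assumes linear scaling, which
// is a simplification.
func streamsPerVCPU(cpuPercent float64) int {
	return int(100 / cpuPercent)
}

func main() {
	// 4% per stream is the pessimistic end, 2% the optimistic end.
	fmt.Printf("estimated streams per vCPU: %d to %d\n",
		streamsPerVCPU(4), streamsPerVCPU(2))
}
```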

Go’s concurrency model of goroutines instead of promises and callbacks turned out to be a natural fit for the problem at hand. Having lightweight goroutines that can pass information around made the sequential processing easier. With this model, we read buffers from the connection, validated their contents, and pushed them to other goroutines for further processing. Furthermore, the service could process the sequential bits as the buffers arrived without blocking the main thread from reading new buffers. Lastly, we moved away from a timestamp-based counter to a standard counter that is not affected by the plethora of problems that timestamps suffer from, but that’s an article for another time.
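To make the pattern concrete, here is a minimal sketch of a read, validate, and hand-off pipeline with a plain sequence counter. It is not our production code: the Chunk type, the validate check, and the ingest and run functions are hypothetical, and a channel of byte slices stands in for the WebSocket connection.

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk is a hypothetical envelope: an audio buffer plus the sequence
// number it was given when it was read.
type Chunk struct {
	Seq  uint64
	Data []byte
}

var errEmpty = errors.New("empty audio buffer")

// validate is a stand-in check; a real service would inspect the
// audio format as well.
func validate(buf []byte) error {
	if len(buf) == 0 {
		return errEmpty
	}
	return nil
}

// ingest reads buffers in arrival order, numbers the valid ones with a
// plain counter, and passes them to the next stage. It closes out once
// the source is drained.
func ingest(in <-chan []byte, out chan<- Chunk) {
	defer close(out)
	var seq uint64
	for buf := range in {
		if err := validate(buf); err != nil {
			continue // skip invalid buffers without consuming a number
		}
		out <- Chunk{Seq: seq, Data: buf}
		seq++
	}
}

// run feeds buffers through the pipeline and returns the sequence
// numbers in the order the downstream stage received them.
func run(buffers [][]byte) []uint64 {
	in := make(chan []byte)
	// Buffered so a briefly slow consumer does not stall the reader.
	out := make(chan Chunk, 16)

	go func() { // simulated connection
		defer close(in)
		for _, b := range buffers {
			in <- b
		}
	}()
	go ingest(in, out)

	var seen []uint64
	for c := range out { // downstream processing would happen here
		seen = append(seen, c.Seq)
	}
	return seen
}

func main() {
	// The empty buffer is rejected, so three chunks come out in order.
	fmt.Println(run([][]byte{{1, 2}, {}, {3}, {4, 5, 6}})) // prints [0 1 2]
}
```

Because a single goroutine assigns the numbers as buffers arrive, ordering does not depend on clock accuracy, and the channel preserves that order for the consumer.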

Conclusion

This isn't meant to be some anti-Node.js article telling you to update every backend service to Go, but rather a lesson about finding the right tool for the job. Node.js excels at building lightweight concurrent web backends in a language many developers are familiar with. Our Audio Ingestion service represents the hottest path in our application and needed a tool that caters to those needs. Go turned out to be that tool. But even Go was far from perfect. We wrestled with integrating it with other tools like databases and APIs. Node.js really shone here, with a seemingly endless number of high-quality npm packages to leverage for any given task.

A key takeaway from this was to start with a small, simple prototype. The first version took about a day to complete, and the time saved on development was critical. Once the prototype is complete, look to optimize the hot paths with specialized tools that serve specific needs.

Overall, we were very pleased with the results of our new Audio Ingestion service. First, Go provided the performance gains we were searching for. Second, the significant departure from the Node.js performance numbers demonstrated that we weren’t using the right tool for the job when we set out. Lastly, goroutines fit our use case very well and allowed the implementation to manage the flow of data cleanly. We are looking forward to integrating more Go code into our more performance-critical micro-services in the future.