What I was doing
I was working on a webapp that needed to run some long-running, cpu-bound tasks, each of which pegged the cpu for about five minutes. I didn't want to run these tasks in the webapp itself: node is single threaded, so http requests and everything else would block while a job ran. One common solution is a work queue, with some dedicated worker processes that handle the jobs.
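To see why a cpu-bound job in the web process is so bad, here's a minimal sketch (the `busyWait` helper and the timings are mine, just for illustration): while a synchronous loop spins, no timers or i/o callbacks get a chance to run.

```javascript
// Spin synchronously for `ms` milliseconds, hogging the event loop.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // nothing else runs during this
}

let fired = false;
setTimeout(() => { fired = true; }, 10); // due in 10ms...
busyWait(100); // ...but the callback can't run while we spin

// Even though the timer is long overdue, `fired` is still false here;
// the callback only runs once we return control to the event loop.
```

A five-minute version of `busyWait` is effectively a five-minute outage for every other request in the process.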
What went wrong
I used rabbitmq for the work queue, but I noticed that the worker processes would hit connection errors with rabbitmq whenever a task took more than a few minutes. I used amqplib to connect to rabbitmq, and the only mention of timeouts I found in its documentation was the heartbeat setting:
"heartbeat: the period of the connection heartbeat, in seconds. Defaults to 0, meaning no heartbeat."

OMG no heartbeat!
My heartbeat setting was 0, so I assumed there was no heartbeat, and therefore this couldn't be the cause of my connection errors.
Sorting it out
The behavior of the heartbeat is a little hard to track down in the rabbitmq documentation. I eventually learned that there is also a server-side heartbeat setting, and the effective heartbeat is the minimum of the server and client settings (with 0 representing an infinite heartbeat interval). I was setting the client heartbeat to 0, and, unbeknownst to me, the server heartbeat was set to 120 seconds, so the effective heartbeat was 120 seconds. Since my process was cpu-bound for longer than that, the heartbeats couldn't be sent, and rabbitmq closed the connection.
I ended up invoking a setImmediate call in my cpu-bound code every 10 seconds or so, yielding to the event loop long enough for the heartbeats to go out. After that, the connection errors vanished.
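The fix can be sketched like this (the job body and names are hypothetical; the real task was different): awaiting a promise that resolves via setImmediate hands control back to the event loop, which gives pending i/o, including amqplib's heartbeat frames, a chance to run.

```javascript
// Resolve on the next "check" phase of the event loop.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// A stand-in cpu-bound job that yields roughly every `yieldEveryMs`.
async function runChunkedJob(totalIterations, yieldEveryMs = 10000) {
  let acc = 0;
  let lastYield = Date.now();
  for (let i = 0; i < totalIterations; i++) {
    acc += Math.sqrt(i); // pretend this is the real work
    if (Date.now() - lastYield >= yieldEveryMs) {
      await yieldToEventLoop(); // heartbeats (and other i/o) go out here
      lastYield = Date.now();
    }
  }
  return acc;
}
```

The yield interval just needs to be comfortably shorter than the effective heartbeat, so 10 seconds was plenty of margin against a 120-second heartbeat.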