Rack, CSV streaming, and Ruby's Enumerator
When the response body of an HTTP request gets big, it's a good idea to stream it. A classic example is a CSV download: you might get away without streaming for smaller response bodies, but once the CSV file grows beyond a few megabytes you're likely to see timeouts and performance issues.
We saw some of these issues with Mailmatch, and I'm going to take you through how we solved them by adding streaming support to our CSV downloads. Note that this tutorial is fairly specific to Ruby, Rack and Sinatra, although you should be able to apply the principles to your Rack-based framework of choice.
It turns out that streaming is baked into Rack's protocol. The body element of Rack's response array can be anything that responds to each(). In practice this is often an array containing the string response body. However, we can take advantage of this behavior by providing our own streaming object that responds to each().
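To make the body contract concrete, here is a minimal sketch that needs no gems. The CountdownBody class and the lambda app are hypothetical names invented for illustration; the only thing taken from Rack is the shape of the response (status, headers, and a body that responds to each and yields strings).

```ruby
# Hypothetical streaming body: it only needs to respond to #each
# and yield one string chunk at a time.
class CountdownBody
  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield "#{n}\n" }
  end
end

# A bare Rack-style app: anything that responds to #call(env) and
# returns [status, headers, body].
app = lambda do |_env|
  [200, { 'content-type' => 'text/plain' }, CountdownBody.new(3)]
end

# Roughly what a server does with the response: iterate the body
# and write each chunk out as it arrives.
status, headers, body = app.call({})
chunks = []
body.each { |chunk| chunks << chunk }
```

A server never needs the whole body in memory at once; it only ever holds the chunk currently being yielded.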
Our Sinatra route is going to look up a record, set up the content disposition headers, and return the result of List#as_csv (which we'll define later).
    get '/lists/:id/csv' do
      @list = List.first!(id: params[:id])
      attachment 'list.csv'
      @list.as_csv
    end
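The as_csv method is going to return an Enumerator. We're doing a little magic at the start of the method with enum_for to instantiate the Enumerator: when no block is given, the method returns an Enumerator that will call as_csv again, this time with a block, once something iterates over it.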
    class List
      def as_csv
        return enum_for(:as_csv) unless block_given?

        emails.each do |email|
          yield CSV.generate_line(email.as_csv)
        end
      end
    end
That's it! The Enumerator responds to each and makes sure our response is streamed to the client one CSV line at a time.
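The lazy behavior of enum_for can be checked outside of Sinatra. The sketch below is a standalone approximation, not the Mailmatch code: the Email struct, the constructor, and the generated counter are assumptions added only to observe when rows are produced.

```ruby
require 'csv'

# Hypothetical record type; the real Email model is not shown in this post.
Email = Struct.new(:address, :name) do
  def as_csv
    [address, name]
  end
end

class List
  attr_reader :generated # counts rows produced so far (illustration only)

  def initialize(emails)
    @emails = emails
    @generated = 0
  end

  def as_csv
    # Without a block, hand back an Enumerator that re-enters this method later.
    return enum_for(:as_csv) unless block_given?

    @emails.each do |email|
      @generated += 1
      yield CSV.generate_line(email.as_csv)
    end
  end
end

list = List.new([
  Email.new('ada@example.com', 'Ada'),
  Email.new('bob@example.com', 'Bob, Jr.')
])

body = list.as_csv                 # an Enumerator; no rows generated yet
generated_before = list.generated  # still 0 at this point

lines = []
body.each { |line| lines << line } # rows are generated only while iterating
```

Because nothing is generated until each is called, the route can return the Enumerator immediately and leave the actual row generation to whatever consumes the body.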
While in this case emails is just an array we're iterating over, in practice you'll want to paginate over large datasets so you don't have to load them all into memory.
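One way the pagination idea might look is sketched below. Everything here is hypothetical: FakeEmailStore stands in for whatever database or ORM you use, and page(offset:, limit:) is an invented method, not a real library call. The point is only that as_csv can fetch one page at a time and keep yielding lines.

```ruby
require 'csv'

# Hypothetical stand-in for a database table; a real app would run a
# LIMIT/OFFSET (or keyset) query through its ORM instead.
class FakeEmailStore
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  # Returns at most `limit` rows starting at `offset`.
  def page(offset:, limit:)
    @queries += 1
    @rows[offset, limit] || []
  end
end

class PagedList
  PAGE_SIZE = 2 # tiny on purpose so the example spans several pages

  def initialize(store)
    @store = store
  end

  def as_csv
    return enum_for(:as_csv) unless block_given?

    offset = 0
    loop do
      rows = @store.page(offset: offset, limit: PAGE_SIZE)
      break if rows.empty?

      # Only one page of rows is held in memory at any time.
      rows.each { |row| yield CSV.generate_line(row) }
      offset += rows.size
    end
  end
end

store = FakeEmailStore.new([
  ['a@example.com'], ['b@example.com'], ['c@example.com']
])
lines = PagedList.new(store).as_csv.to_a
```

Offset-based paging is used here only because it is the shortest to write; on large tables a keyset approach (paging by the last seen id) usually performs better.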