Overlay rendering performance seemed bottlenecked by drawImage calls. To
reduce both the number of calls and the number of different source
buffers, cache overlay buffers for squares of chunks. This adds a very
small extra cost for updates (one additional drawImage) and some cost
for drawing chunks outside of view, but this is more than made up for by
the savings.
By default, each aggregate buffer covers a 4x4 square of chunks.
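
As a rough illustration of the bookkeeping this implies, the sketch below maps a chunk coordinate to the aggregate that holds it and to the chunk's pixel offset inside that aggregate's buffer. It also counts how many aggregates a visible chunk range touches, which is the number of drawImage calls needed per frame. All names here (`AGGREGATE_SIZE`, `aggregateSlot`, `aggregatesInView`, `chunkPixels`) are hypothetical and not taken from the actual change; only the 4x4 default comes from the text above.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the real code.
const AGGREGATE_SIZE = 4; // chunks per aggregate side (the 4x4 default)

interface AggregateSlot {
  aggX: number;    // aggregate column containing the chunk
  aggY: number;    // aggregate row containing the chunk
  offsetX: number; // pixel x offset of the chunk inside the aggregate buffer
  offsetY: number; // pixel y offset of the chunk inside the aggregate buffer
}

// Locate a chunk inside its aggregate. Math.floor keeps negative chunk
// coordinates in the correct aggregate (chunk -1 belongs to aggregate -1).
function aggregateSlot(
  chunkX: number,
  chunkY: number,
  chunkPixels: number,
  size: number = AGGREGATE_SIZE,
): AggregateSlot {
  const aggX = Math.floor(chunkX / size);
  const aggY = Math.floor(chunkY / size);
  return {
    aggX,
    aggY,
    offsetX: (chunkX - aggX * size) * chunkPixels,
    offsetY: (chunkY - aggY * size) * chunkPixels,
  };
}

// Number of aggregates (one drawImage each) touched by an inclusive
// range of visible chunks.
function aggregatesInView(
  minChunkX: number,
  maxChunkX: number,
  minChunkY: number,
  maxChunkY: number,
  size: number = AGGREGATE_SIZE,
): number {
  const cols = Math.floor(maxChunkX / size) - Math.floor(minChunkX / size) + 1;
  const rows = Math.floor(maxChunkY / size) - Math.floor(minChunkY / size) + 1;
  return cols * rows;
}
```

Under these assumptions, an aligned 8x8 view of chunks needs 4 draws instead of 64. A 2x2 view that straddles aggregate borders still needs 4 draws and covers 8x8 chunks, which is the out-of-view drawing cost mentioned above.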