In the recent Communications of the ACM, Sachin Date considers the perennial question, “Should you upload or ship big data?”

The answer is, of course, “it depends”. That has always been the answer.

We even have an old term for this fundamental calculation, “Sneakernet”, referring to the option of hand-carrying a copy of the data to the destination rather than sending it over whatever network technology is currently available.

The “depends” part is actually pretty stable over the years, too. The network's transfer rate and latency are compared with the transfer rate and latency of writing to and reading from a storage medium, plus the time it takes to transport the physical medium from site to site.
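To make the comparison concrete, here is a minimal back-of-the-envelope sketch. The function names, the `efficiency` factor, and all of the example numbers are illustrative assumptions of mine, not figures from Date's article.

```python
def network_hours(data_tb, link_mbps, efficiency=0.8):
    """Hours to send data_tb terabytes over a link rated at link_mbps megabits/s.

    efficiency is an assumed fraction of the rated speed actually sustained.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


def shipping_hours(data_tb, disk_mb_per_s, transit_hours):
    """Hours to copy onto a medium, transport it, and copy it off again.

    Assumes the same disk throughput (megabytes/s) at both ends.
    """
    copy_seconds = data_tb * 1e12 / (disk_mb_per_s * 1e6)
    return 2 * copy_seconds / 3600 + transit_hours


if __name__ == "__main__":
    # Hypothetical case: 10 TB, a 100 Mbps link, 200 MB/s disks, overnight courier.
    net = network_hours(10, 100)
    ship = shipping_hours(10, 200, transit_hours=24)
    print(f"network: {net:.1f} h, ship: {ship:.1f} h")
    print("ship it" if ship < net else "upload it")
```

Changing any one input (a faster link, a slower courier, a smaller data set) can flip the answer, which is the whole point of “it depends”.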

I love this question because it is a great example of the need to think end to end, and it also requires a thorough understanding of the whole system. Inside the problem, there is a case of Amdahl’s law, because the system is no faster than the slowest component in it. You also have to think about network latency as well as bandwidth, and possibly about reliability and sustained performance.

Finally, you have to think about context, cost, and requirements. Is it necessary to transfer the data in a few hours, or would “tomorrow” be soon enough? How much money would a faster transfer be worth, and how much would it cost to achieve? What other activities will be inhibited by or waiting on the transfer?

This can be a critical inquiry. I recall one project that needed to transfer each day’s data to a remote location for next-day processing. But careful analysis of the technology available (assuming the current speed of light and seconds in a day) indicated that it would take more than 24 hours to transfer each 24 hours’ worth of data. Oops! (My report suggested that the project should invest in efforts to slow the rotation of the Earth, in order to lengthen the day.)
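The kind of sanity check that exposes this problem fits in a few lines. This is only a sketch: the data volumes and link speed below are made-up numbers, not the figures from that project.

```python
def daily_transfer_hours(daily_gb, sustained_mbps):
    """Hours needed to move one day's data at the sustained link rate."""
    bits = daily_gb * 1e9 * 8
    return bits / (sustained_mbps * 1e6) / 3600


def keeps_up(daily_gb, sustained_mbps):
    """True if one day's data can be transferred in 24 hours or less."""
    return daily_transfer_hours(daily_gb, sustained_mbps) <= 24


if __name__ == "__main__":
    # Hypothetical volumes over a link that sustains 100 Mbps.
    for daily_gb in (1000, 1200):
        hours = daily_transfer_hours(daily_gb, 100)
        verdict = "keeps up" if keeps_up(daily_gb, 100) else "falls behind"
        print(f"{daily_gb} GB/day: {hours:.1f} h per day of data, {verdict}")
```

Note that the sustained rate, not the advertised one, is what belongs in the calculation.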

In other words, this is a wonderful test of how to think about computer systems. It may be necessary to do some empirical research to determine the actual performance of the system (as opposed to what is advertised). Even better, the answer is often different from what naive intuition suggests going in.

Date works out these numbers for contemporary “cloud” technology and options for portable storage. Networks and storage have grown faster, but data has grown even faster, so the trade-off is still alive, though it now applies to much larger absolute sizes of data transfer than in earlier years.

You can look at the article for the details.

