Evaluating high-throughput reliable multicast for grid applications in production networks

Abstract

Grid computing can be characterized as a distributed infrastructure in which computing resources, within or across locations, are aggregated to act as a unified processing resource. In some anticipated Grid applications, the same data will be transmitted to multiple sites. It is widely accepted that this is, in theory, best achieved using reliable multicast protocols. This paper addresses the use of reliable multicast in Grid computing: it identifies applications and requirements, and discusses how these requirements are met by the main protocols being standardized by the Internet Engineering Task Force (IETF). The study emphasizes high-performance computing and communication, and 1-N reliable multicast using the NACK-Oriented Reliable Multicast (NORM) protocol family. The two existing implementations are evaluated and compared with TCP in scenarios that require 1-N transmission of data with a deadline. Unlike other work, our conclusions are backed by experimental evaluation of existing protocols in production environments. The results therefore portray the range of performance and cost that can be expected if Grid applications embrace reliable multicast. The evaluation shows that NORM protocols can greatly reduce the network overhead incurred in 1-N transfers using TCP, but currently fail to satisfy the high-throughput requirements of the Grid. © 2005 IEEE.

Publication
IEEE CCGrid