Cloud and Desktop Grid Computing

The performance variability of cloud resources makes it difficult to run certain scientific applications in the cloud because of their specific synchronization and communication requirements. The problem is similar to the one faced by desktop grids, except that cloud networks are more reliable. Applications with little or no communication between worker nodes (such as applications consisting of independent tasks) perform well in such computational environments, whereas applications that rely on frequent communication (such as distributed matrix multiplication) perform poorly.

We argue that by assigning individual tasks to (groups of) nodes with appropriate computational and communication characteristics, it is possible to achieve better performance for such applications. A centralized scheduler that considers performance information for a large number of nodes, however, would become a bottleneck.

Our solution to this problem is to employ decentralized scheduling algorithms for many-task applications that assign individual tasks to cloud nodes based on periodic performance measurements of the cloud resources.
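
To make "periodic performance measurements" concrete, here is a minimal Java sketch of how a node could smooth its measurements with an exponentially weighted moving average (EWMA), so that a single noisy sample does not swing scheduling decisions. The class name `PerformanceMonitor`, the smoothing factor, the 100 ms period, and the constant sample standing in for a real benchmark are all hypothetical; the measurement method actually used is not described here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-node estimator that smooths periodic samples with an EWMA. */
public class PerformanceMonitor {
    private final double alpha;             // weight of the newest sample, 0 < alpha <= 1
    private double estimate = Double.NaN;   // NaN until the first sample arrives

    public PerformanceMonitor(double alpha) {
        if (alpha <= 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in (0, 1]");
        this.alpha = alpha;
    }

    /** Folds one measurement (e.g., tasks completed per second) into the running estimate. */
    public synchronized void record(double sample) {
        estimate = Double.isNaN(estimate) ? sample : alpha * sample + (1 - alpha) * estimate;
    }

    public synchronized double estimate() {
        return estimate;
    }

    public static void main(String[] args) throws InterruptedException {
        PerformanceMonitor monitor = new PerformanceMonitor(0.5);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Stand-in for a real benchmark: report a constant throughput every 100 ms.
        timer.scheduleAtFixedRate(() -> monitor.record(4.0), 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        timer.shutdownNow();
        System.out.println("estimated throughput: " + monitor.estimate());
    }
}
```

A larger `alpha` reacts faster to changes in a node's performance; a smaller one filters out short-lived fluctuations.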

As a proof of concept, we have developed a vector-based scheduling algorithm that assigns tasks to nodes based on the measured compute performance and queue length of each node. Our experiments with a set of tasks in CloudLab show that the application proceeds in three distinct phases: a flooding phase, in which the cloud nodes are flooded with tasks; a steady state, in which all nodes are busy; and an end game, in which the remaining tasks are executed on the fastest nodes.
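
One plausible way to combine the two measurements into a decision is sketched below; it is a hypothetical scoring rule, not the algorithm used in the experiments. It assumes each node is summarized by a vector (speed in tasks per second, queue length), estimates how long a newly assigned task would take to finish as (queue length + 1) / speed, and picks the node with the smallest estimate. `VectorScheduler`, `NodeVector`, and `choose` are illustrative names; the code uses `record`, so it needs Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: each node is summarized by a two-component vector. */
public class VectorScheduler {

    /** speed: measured tasks per second; queueLength: tasks waiting at the node. */
    record NodeVector(String id, double speed, int queueLength) {
        /** Estimated seconds until a newly assigned task would finish on this node. */
        double expectedCompletion() {
            return (queueLength + 1) / speed;
        }
    }

    /** Chooses the node with the smallest expected completion time, if any. */
    static Optional<NodeVector> choose(List<NodeVector> candidates) {
        return candidates.stream()
                .filter(n -> n.speed() > 0) // ignore nodes with no measured throughput
                .min(Comparator.comparingDouble(NodeVector::expectedCompletion));
    }

    public static void main(String[] args) {
        List<NodeVector> nodes = List.of(
                new NodeVector("fast-but-busy", 4.0, 7),  // (7 + 1) / 4 = 2.0 s
                new NodeVector("slow-but-idle", 1.0, 0),  // (0 + 1) / 1 = 1.0 s
                new NodeVector("medium", 2.0, 2));        // (2 + 1) / 2 = 1.5 s
        System.out.println(choose(nodes).map(NodeVector::id).orElse("none")); // slow-but-idle
    }
}
```

Under this particular scoring, the three phases have a natural reading: at the start, tasks go to whichever node has the lowest estimate until all queues fill; in the steady state, queue length dominates and spreads the load; and once the queues drain in the end game, the estimate reduces to 1 / speed, so the remaining tasks go to the fastest nodes.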

In previous work, we developed a biologically inspired, fully decentralized approach to organizing computation in a desktop grid, based on the autonomous scheduling of strongly mobile agents on a peer-to-peer network. The approach achieves the following design objectives: near-zero knowledge of the network topology, zero knowledge of the system status, autonomous scheduling, distributed computation, and no specialized nodes. Every node is equally responsible for scheduling and computation, both of which are performed with practically no information about the system.
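
As an illustration of scheduling with only local information, the sketch below shows a hypothetical rule a node might use to maintain its children in an overlay tree: it keeps at most `maxChildren` children, knows only the throughput each child has reported, and replaces its slowest child when a strictly faster neighbor is offered. The class `OrganicNode`, the tree overlay, and the replacement rule are assumptions made for this example, not a description of the Organic Grid's actual agent behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical local rule: a node keeps at most maxChildren children and knows
 * only the throughput (tasks per second) each child has reported to it.
 */
public class OrganicNode {
    private final int maxChildren;
    private final Map<String, Double> childRates = new HashMap<>();

    public OrganicNode(int maxChildren) {
        if (maxChildren < 1) throw new IllegalArgumentException("maxChildren must be >= 1");
        this.maxChildren = maxChildren;
    }

    /** Updates the rate of an existing child, e.g., from a report piggybacked on results. */
    public void report(String child, double rate) {
        childRates.computeIfPresent(child, (id, old) -> rate);
    }

    /**
     * Offers a neighbor as a child. When the node is full, the neighbor replaces
     * the slowest current child only if it is strictly faster. Returns true if accepted.
     */
    public boolean offer(String neighbor, double rate) {
        if (childRates.containsKey(neighbor) || childRates.size() < maxChildren) {
            childRates.put(neighbor, rate);
            return true;
        }
        String slowest = null;
        for (Map.Entry<String, Double> e : childRates.entrySet()) {
            if (slowest == null || e.getValue() < childRates.get(slowest)) {
                slowest = e.getKey();
            }
        }
        if (rate <= childRates.get(slowest)) {
            return false; // keep the current children
        }
        childRates.remove(slowest); // the dropped child would have to find a new parent
        childRates.put(neighbor, rate);
        return true;
    }

    public Set<String> children() {
        return Set.copyOf(childRates.keySet());
    }

    public static void main(String[] args) {
        OrganicNode node = new OrganicNode(2);
        node.offer("a", 1.0);
        node.offer("b", 3.0);
        System.out.println(node.offer("c", 2.0)); // true: c replaces the slower child a
        System.out.println(node.children());      // b and c, in unspecified order
    }
}
```

Because the rule consults only the rates reported by the node's own children, no node needs a global view of the network.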

We have implemented an extension of Java with strong mobility, which allows multi-threaded agents to migrate together with their entire execution state; it works by translating Java with strong mobility into Java with weak mobility. We built a prototype grid infrastructure, the Organic Grid, in which an application is scheduled by encapsulating it in an agent together with a scheduler tailored to the application's characteristics. Like other desktop grids, the Organic Grid can be deployed as a screen saver.
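
The core idea of translating strong mobility into weak mobility can be shown on a single method: its local variables and its current position in the code (a program counter) are lifted into fields of a serializable object, so ordinary Java serialization, which carries only code and data, also carries the point where execution stopped. This is a hand-written, single-threaded sketch with hypothetical names (`SumAgent`, `resume`, the step budget); a real translator must also handle nested calls and multiple threads, which this example does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Hypothetical hand-translated agent. The original method would be
 *   long sum(long n) { long s = 0; for (long i = 1; i <= n; i++) s += i; return s; }
 * Its locals and program counter are lifted into fields so they survive serialization.
 */
public class SumAgent implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long n;
    private long i;       // former loop variable
    private long sum;     // former local accumulator
    private int pc = 0;   // 0: before the loop, 1: inside the loop, 2: finished

    public SumAgent(long n) {
        this.n = n;
    }

    /** Executes at most budget loop iterations; returns true once the method has finished. */
    public boolean resume(int budget) {
        if (pc == 0) { i = 1; sum = 0; pc = 1; }   // prologue of the original method
        while (pc == 1) {
            if (i > n) { pc = 2; break; }           // loop exit
            if (budget-- == 0) return false;        // migration point: all state is in fields
            sum += i;
            i++;
        }
        return true;
    }

    public long sum() {
        return sum;
    }

    static byte[] serialize(SumAgent agent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(agent);
        }
        return bytes.toByteArray();
    }

    static SumAgent deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SumAgent) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SumAgent agent = new SumAgent(10);
        while (!agent.resume(4)) {
            // Round-tripping through bytes stands in for shipping the agent to another host.
            agent = deserialize(serialize(agent));
        }
        System.out.println(agent.sum()); // 55
    }
}
```

Serializing and deserializing within one JVM only simulates migration; a real agent would send the bytes to another host and call `resume` there.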



Former Students