## Analysis of Load Distribution Strategies for Signature Search and Join Operation in Distributed Computing Systems

##### Abstract

Effective parallel processing of large computational loads in distributed computing systems is of great interest in distributed database systems and large-scale scientific experiments, where large amounts of data with weak dependencies among computational loads are found. In this dissertation research, we employ Divisible Load Theory (DLT) to find the optimal allocation of divisible computational load to multiple processors in two areas of distributed computing. First, we apply DLT to the search for a specific pattern, or signature, in a large amount of fine-grained data by strategically distributing the data to processors in two representative parallel architectures: the tree network and the linear daisy chain network. A closed-form solution for the average time to find a single signature or multiple signatures is presented for the case where multiple processors are used in parallel. Next, a load distribution strategy based on the statistics of the signatures is presented that minimizes the expected signature search time. In the second study, we consider the processing of join operations in a distributed database system, where the join operation on the database records is processed on multiple processors in parallel. Three types of join operations are examined on the tree network topology, namely Sort-Merge Join, Hash Join, and Grace Hash Join. For the tree network topology, communication media with and without broadcast support are considered. DLT is applied to obtain the optimal number of records to distribute to each processor so as to achieve the best speed-up for a given number of processors. It is shown that using DLT for data allocation gives better speed-up than the equal allocation scheme, in which the database records are allocated equally among the processors.
The order in which the processors receive records is also taken into account to obtain the optimal speed-up, and the result is compared with distributing records in a random sequence.
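
To make the comparison between DLT-based and equal allocation concrete, the following is a minimal sketch under a simplified model that is assumed here and is not taken from the dissertation: a single-level tree in which the root sends load to the child processors one after another, each child starts computing only after receiving its whole fraction, and both communication and computation time grow linearly with the load. The symbols `w` (inverse processor speeds), `z` (inverse link speeds), `t_cp`, and `t_cm` are illustrative names. Under these assumptions, the load fractions are chosen so that every processor finishes at the same instant.

```python
def dlt_fractions(w, z, t_cp=1.0, t_cm=1.0):
    """Load fractions that equalize finish times (assumed sequential-distribution model).

    w[i]: inverse computing speed of processor i (hypothetical parameter)
    z[i]: inverse speed of the link from the root to processor i (hypothetical)
    """
    # Equal finish times imply:
    #   alpha[i-1] * w[i-1] * t_cp = alpha[i] * (z[i] * t_cm + w[i] * t_cp)
    raw = [1.0]
    for i in range(1, len(w)):
        raw.append(raw[-1] * w[i - 1] * t_cp / (z[i] * t_cm + w[i] * t_cp))
    total = sum(raw)
    # Normalize so the fractions sum to one.
    return [r / total for r in raw]


def finish_times(alpha, w, z, t_cp=1.0, t_cm=1.0):
    """Finish time of each processor when load is sent in list order."""
    times = []
    comm_done = 0.0  # time at which the root finishes sending to the current processor
    for a, wi, zi in zip(alpha, w, z):
        comm_done += a * zi * t_cm
        times.append(comm_done + a * wi * t_cp)
    return times


def makespan(alpha, w, z, t_cp=1.0, t_cm=1.0):
    """Total processing time: the moment the last processor finishes."""
    return max(finish_times(alpha, w, z, t_cp, t_cm))


if __name__ == "__main__":
    w = [1.0, 1.0, 1.0, 1.0]   # four identical processors (illustrative values)
    z = [0.2, 0.2, 0.2, 0.2]   # four identical links (illustrative values)
    alpha_dlt = dlt_fractions(w, z)
    alpha_equal = [1.0 / len(w)] * len(w)
    print("DLT fractions:  ", alpha_dlt)
    print("DLT makespan:   ", makespan(alpha_dlt, w, z))
    print("Equal makespan: ", makespan(alpha_equal, w, z))
```

In this toy setting, processors served earlier receive larger fractions because they can start computing sooner, which is why the equal split leaves the last processor finishing late. Reordering the entries of `w` and `z` changes the makespan, which gives a simple way to experiment with the effect of processor sequencing.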