Data-Intensive Computing on Grid Computing Environment

Palak Raina, Hitali Shah

Authors

Palak Raina, Hitali Shah

Keywords:

Grid computing, Data management, Data integration, Heterogeneous resources, Data-intensive computing, Gfarm

Abstract

Grid computing raises challenging issues in many areas of computer science, bioinformatics, and high-energy physics, and especially in distributed computing, as computational Grids handle increasingly large data sets and networks and span many organizations. In a Grid computing environment, data-intensive applications incur large overhead costs when file accesses concentrate on common nodes. To avoid this problem in traditional distributed filesystems, users have to distribute file accesses manually. However, such a solution is difficult for users to apply in a Grid environment. We propose a data management mechanism for data-intensive computing on a Grid filesystem. Our technique improves file access performance by automatically scheduling file access and data management on the filesystem. The filesystem is based on dynamically configured node groups that correspond to the network topology. Using this configuration, it monitors file access to detect concentration, creates file replicas, and schedules their placement and access. We applied the proposed technique to Gfarm, a filesystem that scales to the Grid. We emulated real application workloads using a job scheduler and confirmed a speedup by a factor of 3.7 compared with a filesystem without automatic file access distribution.
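The monitor, replicate, and schedule cycle described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation and not the Gfarm API: the class `ReplicaScheduler`, its methods, the per-replica read `threshold`, and the node and group names are all hypothetical, and network topology is reduced to a plain mapping from group name to node list.

```python
from collections import defaultdict


class ReplicaScheduler:
    """Toy model (hypothetical): track active reads per replica, replicate on overload."""

    def __init__(self, node_groups, threshold=2):
        # node_groups: assumed mapping of group name -> list of node names,
        # standing in for groups derived from the network topology.
        self.node_groups = node_groups
        self.threshold = threshold          # assumed max concurrent reads per replica
        self.replicas = defaultdict(list)   # file path -> nodes holding a copy
        self.load = defaultdict(int)        # (file path, node) -> active reads

    def add_file(self, path, node):
        self.replicas[path].append(node)

    def group_of(self, node):
        for group, nodes in self.node_groups.items():
            if node in nodes:
                return group
        raise KeyError(node)

    def _pick_new_holder(self, path, preferred_group):
        # Candidate nodes: the client's own group first, then every other group.
        held = set(self.replicas[path])
        ordered = list(self.node_groups[preferred_group]) + [
            n for g, ns in self.node_groups.items() if g != preferred_group for n in ns
        ]
        for node in ordered:
            if node not in held:
                return node
        return None

    def open_for_read(self, path, client):
        """Return the node the client should read from, replicating if needed."""
        group = self.group_of(client)
        holders = self.replicas[path]
        # Monitoring step: which replicas are still below the read threshold?
        available = [n for n in holders if self.load[(path, n)] < self.threshold]
        if available:
            # Scheduling step: prefer a replica in the client's group, then the least loaded.
            chosen = min(
                available,
                key=lambda n: (self.group_of(n) != group, self.load[(path, n)]),
            )
        else:
            # Concentration detected: place a new replica near the client.
            chosen = self._pick_new_holder(path, group)
            if chosen is None:
                # Every node already holds a copy; fall back to the least loaded one.
                chosen = min(holders, key=lambda n: self.load[(path, n)])
            else:
                holders.append(chosen)
        self.load[(path, chosen)] += 1
        return chosen

    def close(self, path, node):
        self.load[(path, node)] -= 1


# Usage: five concurrent readers of one file that initially lives only on node a1.
sched = ReplicaScheduler({"siteA": ["a1", "a2"], "siteB": ["b1", "b2"]}, threshold=2)
sched.add_file("/data/x", "a1")
placements = [sched.open_for_read("/data/x", c) for c in ["a2", "a2", "b1", "b1", "b2"]]
```

In this run the first two readers share the original copy, and later readers trigger new replicas inside their own node group once every existing copy is at the threshold. A real Grid filesystem would also have to account for replication cost, storage limits, and replica consistency, none of which this sketch models.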
