Abstracts


Peer-to-Peer Desktop Grids in the Real World: The ShareGrid Project

ShareGrid is a peer-to-peer desktop grid aimed at satisfying the computing needs of the small research laboratories located in the Piedmont area in Northern Italy. ShareGrid adopts a cooperative approach, in which each participant allows the other ones to use his/her own resources on a reciprocity basis. ShareGrid is based on the OurGrid middleware, which provides a set of mechanisms enabling participating entities to quickly, fairly, and securely share their resources. In this paper we report our experience in designing, deploying, and using ShareGrid, and we describe the applications using it, as well as the lessons we learned, the problems that still remain open, and some possible solutions to them.

@article{ACGBRAG08,
author = {Cosimo Anglano and Massimo Canonico and Marco Guazzone and Marco Botta and Sergio Rabellino and Simone Arena and Guglielmo Girardi},
title = {Peer-to-Peer Desktop Grids in the Real World: The ShareGrid Project},
journal = {Cluster Computing and the Grid, IEEE International Symposium on},
volume = {0},
isbn = {978-0-7695-3156-4},
year = {2008},
pages = {609-614},
doi = {http://doi.ieeecomputersociety.org/10.1109/CCGRID.2008.23},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}

On the efficacy, efficiency and emergent behavior of task replication in large distributed systems

Large distributed systems challenge traditional schedulers, as it is often hard to determine a priori how long each task will take to complete on each resource, information that is input for such schedulers. Task replication has been applied in a variety of scenarios as a way to circumvent this problem. Task replication consists of dispatching multiple replicas of a task and using the result from the first replica to finish. Replication schedulers (i.e. schedulers that employ task replication) are able to achieve good performance even in the absence of information on tasks and resources. They are also of smaller complexity than traditional schedulers, making them better suited for large distributed systems. On the other hand, replication schedulers waste cycles with the replicas that are not the first to finish. Moreover, this extra consumption of resources raises severe concerns about the system-wide performance of a distributed system with multiple, competing replication schedulers. This paper presents a comprehensive study of task replication, comparing replication schedulers against traditional information-based schedulers, and establishing their efficacy (the performance delivered to the application), efficiency (the amount of resources wasted), and emergent behavior (the system-wide behavior of a system with multiple replication schedulers). We also introduce a simple access control strategy that can be implemented locally by each resource and greatly improves overall performance of a system on which multiple replication schedulers compete for resources.

@article{CBPGV07,
author = {Walfredo Cirne and Francisco Brasileiro and Daniel Paranhos and Lu\'{\i}s Fabr\'{\i}cio W. G\'{o}es and William Voorsluys},
title = {On the efficacy, efficiency and emergent behavior of task replication in large distributed systems},
journal = {Parallel Computing},
volume = {33},
number = {3},
year = {2007},
issn = {0167-8191},
pages = {213--234},
doi = {http://dx.doi.org/10.1016/j.parco.2007.01.002},
publisher = {Elsevier Science Publishers B. V.},
address = {Amsterdam, The Netherlands},
}
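
As a rough illustration of the replication idea described in the abstract above (dispatch several replicas of a task and keep the first result), the following Python sketch computes the completion time of a task and the cycles wasted by the replicas that do not finish first. It assumes, beyond what the abstract states, that all replicas start at the same time and are cancelled the moment the first one finishes. The function name and the sample runtimes are hypothetical.

```python
def run_with_replication(replica_runtimes):
    """Simulate one task dispatched as several replicas that start together.

    replica_runtimes: time each replica would need on its own resource.
    Returns (completion_time, wasted_cycles).
    """
    if not replica_runtimes:
        raise ValueError("need at least one replica")

    # The task is done as soon as the fastest replica finishes.
    completion_time = min(replica_runtimes)

    # Assumption: every other replica ran until that moment and was then
    # cancelled, so each of them wasted exactly completion_time cycles.
    wasted_cycles = completion_time * (len(replica_runtimes) - 1)
    return completion_time, wasted_cycles


# Hypothetical runtimes of three replicas of the same task.
completion, wasted = run_with_replication([12.0, 7.5, 30.0])
print(completion, wasted)
```

The two returned values loosely correspond to what the abstract calls efficacy (performance delivered to the application) and efficiency (resources wasted).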

Automatic grid assembly by promoting collaboration in peer-to-peer grids

Currently, most computational grids (systems allowing transparent sharing of computing resources across organizational boundaries) are assembled using human negotiation. This procedure does not scale well, and is too inflexible to allow for large open grids. Peer-to-peer (P2P) grids present an alternative way to build grids with many sites. However, to actually assemble a large grid, peers must have an incentive to provide resources to the system. In this paper we present an incentive mechanism called the Network of Favors, which makes it in the interest of each participating peer to contribute its spare resources. We show through simulations with up to 10,000 peers and experiments with software implementing the mechanism in a deployed system that the Network of Favors promotes collaboration in a simple, robust and scalable fashion. We also discuss experiences of using OurGrid, a grid based on this mechanism.

@article{ABCM07,
author = {Nazareno Andrade and Francisco Brasileiro and Walfredo Cirne and Miranda Mowbray},
title = {Automatic grid assembly by promoting collaboration in peer-to-peer grids},
journal = {Journal of Parallel and Distributed Computing},
volume = {67},
number = {8},
year = {2007},
issn = {0743-7315},
pages = {957--966},
doi = {http://dx.doi.org/10.1016/j.jpdc.2007.04.011},
publisher = {Academic Press, Inc.},
address = {Orlando, FL, USA},
}
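
The abstract above does not detail how the Network of Favors works. As a hedged sketch of the kind of local, reciprocity-based bookkeeping such a mechanism could use, the Python class below has each peer track the favors it has received from and donated to every other peer, and serve first the requester with the highest non-negative balance. The class name, the method names, and the balance formula `max(0, received - donated)` are assumptions made for illustration, not the paper's definition.

```python
from collections import defaultdict


class FavorLedger:
    """Per-peer record of favors, kept locally; nothing is shared with other peers."""

    def __init__(self):
        self.received = defaultdict(float)  # value of favors each peer gave us
        self.donated = defaultdict(float)   # value of favors we gave each peer

    def record_received(self, peer, value):
        self.received[peer] += value

    def record_donated(self, peer, value):
        self.donated[peer] += value

    def balance(self, peer):
        # Assumed formula: clamped at zero, so an unknown newcomer is never
        # ranked below a peer that has only consumed resources.
        return max(0.0, self.received[peer] - self.donated[peer])

    def pick_requester(self, requesters):
        # Serve the requester that has contributed most to us, net of what it took.
        return max(requesters, key=self.balance)


ledger = FavorLedger()
ledger.record_received("B", 10.0)  # B donated resources to us
ledger.record_donated("B", 4.0)    # we donated some back to B
ledger.record_donated("C", 5.0)    # C has only consumed so far
print(ledger.balance("B"), ledger.balance("C"))
print(ledger.pick_requester(["C", "B"]))
```

Under these assumptions, a peer that only consumes never accumulates a positive balance and is served only when no contributing peer is competing for the same resources.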

Towards applying content-based image retrieval in the clinical routine

Content-based image retrieval (CBIR) has been one of the most vivid research areas in the field of computer vision, and substantial progress has been made over the last years. As such, many have argued for the use of CBIR to support medical imaging diagnosis. However, the sheer volume of data produced in radiology centers has precluded the use of CBIR in the daily routine of hospitals and clinics. This paper aims to change this status quo. We here present a solution that applies Computational Grids to significantly speed up the CBIR procedure, while preserving the security of data in the clinical routine. This solution combines texture attributes and registration algorithms that together are capable of retrieving images with greater-than-90% precision, yet running in a few minutes over the Grid, making it usable in the clinical routine.

@article{OCM07,
author = {Marcelo Costa Oliveira and Walfredo Cirne and Paulo M. de Azevedo Marques},
title = {Towards applying content-based image retrieval in the clinical routine},
journal = {Future Generation Computer Systems},
volume = {23},
number = {3},
year = {2007},
issn = {0167-739X},
pages = {466--474},
doi = {http://dx.doi.org/10.1016/j.future.2006.06.009},
publisher = {Elsevier Science Publishers B. V.},
address = {Amsterdam, The Netherlands},
}

Allocation strategies for utilization of space-shared resources in Bag of Tasks grids

As the adoption of grid computing in organizations expands, the need for wise utilization of different types of resource also increases. A volatile resource, such as a desktop computer, is a common type of resource found in grids. However, efficiently using other types of resource, such as space-shared resources, represented by parallel supercomputers and clusters of workstations, is extremely important, since they can provide a great amount of computation power. Using space-shared resources in grids is not straightforward, since they require jobs to specify some parameters a priori, such as allocation time and number of processors. Current solutions (e.g. Grid Resource and Allocation Management (GRAM)) are based on the explicit definition of these parameters by the user. On the other hand, good progress has been made in supporting Bag-of-Tasks (BoT) applications on grids. This is a restricted model of parallelism in which tasks do not communicate among themselves, making recovering from failures a simple matter of reexecuting tasks. As such, there is no need to specify a maximum number of resources, or a period of time that resources must be executing the application, as required by space-shared resources. This state of affairs makes it hard for BoT applications running on a grid to leverage space-shared resources. This paper presents an Explicit Allocation Strategy, in which an adaptor automatically fits grid requests to the resource in order to decrease the turn-around time of the application. We compare it with another strategy described in our previous work, called Transparent Allocation Strategy, in which idle nodes of the space-shared resource are donated to the grid. As we shall see, both strategies provide good results. Moreover, they are complementary in the sense that they fulfill different usage roles. The Transparent Allocation Strategy enables a resource owner to raise its utilization by offering cycles that would otherwise go wasted, while protecting the local workload from increased contention. The Explicit Allocation Strategy, conversely, allows a user to benefit from the access she has to space-shared resources in the grid, enabling her to submit tasks natively without having to craft (time, processors) requests.

@article{dRTCCCF08,
author = {C\'{e}sar A. F. De Rose and Tiago Ferreto and Rodrigo N. Calheiros and Walfredo Cirne and Lauro B. Costa and Daniel Fireman},
title = {Allocation strategies for utilization of space-shared resources in Bag of Tasks grids},
journal = {Future Generation Computer Systems},
volume = {24},
number = {5},
year = {2008},
issn = {0167-739X},
pages = {331--341},
doi = {http://dx.doi.org/10.1016/j.future.2007.05.005},
publisher = {Elsevier Science Publishers B. V.},
address = {Amsterdam, The Netherlands},
}

BigBatch: a document processing platform for clusters and grids

BigBatch is an image processing environment designed to process batches of thousands of monochromatic documents. One of the flexibilities and pioneer aspects of BigBatch is offering the possibility of working in distributed environments such as clusters and grids. This paper presents the BigBatch tool and the results of a comparative analysis between cluster and grid configurations. The results obtained show almost no difference in total execution times, indicating that performance is not a primary criterion for choosing between the use of a cluster or a grid. However, there are other, qualitative, aspects that may impact this choice. This paper also considers these aspects and provides a general picture of how to successfully use BigBatch to process document images employing many computers for this task.

@inproceedings{MLFM08,
author = {Giorgia Mattos and Rafael Dueire Lins and Andrei de Ara\'{u}jo Formiga and Fernando M\'{a}rio Junqueira Martins},
title = {BigBatch: a document processing platform for clusters and grids},
booktitle = {SAC '08: Proceedings of the 2008 ACM symposium on Applied computing},
year = {2008},
isbn = {978-1-59593-753-7},
pages = {434--441},
location = {Fortaleza, Ceara, Brazil},
doi = {http://doi.acm.org/10.1145/1363686.1363792},
publisher = {ACM},
address = {New York, NY, USA},
}

Multi-environment software testing on the grid

We propose a solution to improve the confidence in the correctness of applications designed to be executed in heterogeneous environments, like a grid. Our solution is motivated by the observation that the traditional ways to qualify test processes are based on code coverage metrics. We believe that this approach is not adequate when dealing with applications that can (and do) fail when interacting with heterogeneous execution environments. Besides code coverage, tests must also cover possible environments. As a solution we propose the utilization of InGriD to describe and deploy test environments and GridUnit to coordinate and monitor the execution of test sets. By combining these two solutions we provide a cost-effective way to introduce environmental coverage to our test suites, which is complementary and orthogonal to traditional code coverage metrics. As a case study, we have shown how our solution could be applied to help test a grid application called MyPhotoGrid, which uses the grid to parallelize the generation of large photograph albums.

@inproceedings{DMBC06,
author = {Alexandre Duarte and Gustavo Wagner Mendes and Francisco Brasileiro and Walfredo Cirne},
title = {Multi-environment software testing on the grid},
booktitle = {PADTAD '06: Proceedings of the 2006 workshop on Parallel and distributed systems: testing and debugging},
year = {2006},
isbn = {1-59593-414-6},
pages = {61--68},
location = {Portland, Maine, USA},
doi = {http://doi.acm.org/10.1145/1147403.1147415},
publisher = {ACM},
address = {New York, NY, USA},
}

Accurate autonomous accounting in peer-to-peer Grids

We here present and evaluate an autonomous accounting scheme that provides accurate results even when the parties (consumer and provider) do not trust each other. Our accounting scheme relies on the observed relative performance among the parties. It is totally autonomous in the sense that it uses only local information, i.e. there is no exchange of information between the parties. This allows for the deployment of the autonomous accounting without requiring any sort of identification infrastructure, such as certificate authorities. Not requiring trust or sophisticated infrastructure makes our accounting scheme a perfect fit for peer-to-peer grids, which aim to scale much further than traditional grids by allowing free unidentified entry into the grid. Our results show that the proposed scheme performs very close to a perfect accounting scheme whose implementation is infeasible in most systems, including those we target. Although our autonomous accounting scheme was developed to work with OurGrid, it can also be useful for other systems. The basic requirement to use our accounting scheme is that resource consumers must also be resource providers.

@inproceedings{SACBA05,
author = {Robson Santos and Alisson Andrade and Walfredo Cirne and Francisco Brasileiro and Nazareno Andrade},
title = {Accurate autonomous accounting in peer-to-peer Grids},
booktitle = {MGC '05: Proceedings of the 3rd international workshop on Middleware for grid computing},
year = {2005},
isbn = {1-59593-269-0},
pages = {1--6},
location = {Grenoble, France},
doi = {http://doi.acm.org/10.1145/1101499.1101509},
publisher = {ACM},
address = {New York, NY, USA},
}

How web community organisation can help grid computing

OurGrid is a web-based community whose members can use each other's spare computing power. When an OurGrid member is not using his own computer, it can be used by any other member. This paper describes how the community aspects of OurGrid have been crucial for its success. It argues that grid economies, which provide another method for sharing computing power, might also benefit from having a web-based community structure.

@article{Mowbray07,
author = {Miranda Mowbray},
title = {How web community organisation can help grid computing},
journal = {Int. J. Web Based Communities},
volume = {3},
number = {1},
year = {2007},
issn = {1477-8394},
pages = {44--54},
doi = {http://dx.doi.org/10.1504/IJWBC.2007.013773},
publisher = {Inderscience Publishers},
address = {Geneva, Switzerland},
}

GridUnit: software testing on the grid

Software testing is a fundamental part of system development. As software grows, its test suite becomes larger and its execution time may become a problem to software developers. This is especially the case for agile methodologies, which preach a short develop/test cycle. Moreover, due to the increasing complexity of systems, there is the need to test software in a variety of environments. In this paper, we introduce GridUnit, an extension of the widely adopted JUnit testing framework, able to automatically distribute the execution of software tests on a computational grid with minimum user intervention. Experiments conducted with this solution have shown a speed-up of almost 70x, reducing the duration of the test phase of a synthetic application from 24 hours to less than 30 minutes. The solution does not require any source-code modification, hides the grid complexity from the user and provides a cost-effectiveness improvement to the software testing experience.

@inproceedings{DCBM06,
author = {Alexandre Duarte and Walfredo Cirne and Francisco Brasileiro and Patricia Machado},
title = {GridUnit: software testing on the grid},
booktitle = {ICSE '06: Proceedings of the 28th international conference on Software engineering},
year = {2006},
isbn = {1-59593-375-1},
pages = {779--782},
location = {Shanghai, China},
doi = {http://doi.acm.org/10.1145/1134285.1134410},
publisher = {ACM},
address = {New York, NY, USA},
}

Grid computing to make viable the content based medical image retrieval through the image registration techniques

Content-based image retrieval (CBIR) is of great interest to the medical community, because it is capable of retrieving similar images, stored in servers, that have known pathologies. However, an efficient and reliable CBIR solution has not been achieved yet, due to the complexity of medical images and the great volume of data they represent. This work proposes a new methodology, based on the high processing power provided by Grid Computing technology, to achieve CBIR using registration algorithms. The registration procedure uses two metrics, the square difference metric (SDM) and cross correlation (CC). Both metrics showed high efficiency: SDM obtained an average precision of 0.83% (breast image) and 0.94% (head image), while CC showed a precision of 0.81% (breast) and 0.52% (head). The high computational cost of the registration algorithms was amortized by Grid Computing, which was capable of ensuring data security and represents a low-cost solution for small clinics and public hospitals. Grid technologies open new opportunities to investigate the contribution of applying registration algorithms to CBIR, and new advances are expected.

@inproceedings{OCMM07,
author = {Marcelo Costa Oliveira and Walfredo Cirne and Jos\'{e} Fl\'{a}vio Mendes Junior and Paulo Mazzoncini de Azevedo Marques},
title = {Grid computing to make viable the content based medical image retrieval through the image registration techniques},
booktitle = {EATIS '07: Proceedings of the 2007 Euro American conference on Telematics and information systems},
year = {2007},
isbn = {978-1-59593-598-4},
pages = {1--7},
location = {Faro, Portugal},
doi = {http://doi.acm.org/10.1145/1352694.1352711},
publisher = {ACM},
address = {New York, NY, USA},
}

Sandboxing for a free-to-join grid with support for secure site-wide storage area

Grid computing enables different institutions to access each other's resources, and hence requires very strong security guarantees. We here explore how virtualization was used to provide security for OurGrid, an easy-to-use free-to-join grid that supports Bag-of-Tasks applications. OurGrid poses interesting security challenges. It is free-to-join (which means one runs unknown applications) and strives for simplicity (which means that configuration must be trivial). We show how we have dealt with such challenges by using Xen to virtualize a single machine, and VNET, OCFS2 and NFS to virtualize a site-wide shared file system, creating a sandboxing solution called SWAN. We evaluate SWAN's security and performance. Our results indicate that SWAN is efficient in the single machine virtualization, but less so for the shared file system. Yet, a site-wide file system enables grid jobs to reuse files already transferred to other machines in the site, avoiding expensive inter-site file transfer.

@inproceedings{CAGCB06,
author = {Edjozane Cavalcanti and Leonardo Assis and Matheus Gaudencio and Walfredo Cirne and Francisco Brasileiro},
title = {Sandboxing for a free-to-join grid with support for secure site-wide storage area},
booktitle = {VTDC '06: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing},
year = {2006},
isbn = {0-7695-2873-1},
pages = {11},
doi = {http://dx.doi.org/10.1109/VTDC.2006.11},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

A large scale fault-tolerant grid information service

Large scale grid systems may provide multitudinous services, from different providers, whose quality of service will vary. Moreover, services appear (and disappear) in the grid with no central coordination. Thus, to find out the most suitable service to fulfill their needs, grid users must resort to Grid Information Services (GISs). These services allow users to submit rich queries that are normally composed of multiple attributes and range operations. The ability to efficiently execute complex searches in a scalable and reliable way is a key challenge for current GISs. Scalability issues are normally dealt with by using peer-to-peer technologies. However, the more reliable peer-to-peer approaches do not cater for rich queries in a natural way. On the other hand, approaches that can easily support these rich queries are less robust in the presence of faults. In this paper we focus on peer-to-peer GISs that efficiently support rich queries. In particular, we thoroughly analyze the impact of faults in one representative of such GISs, named NodeWiz. We propose extensions that increase NodeWiz's resilience to faults.

@inproceedings{BCACBB06,
author = {Francisco Brasileiro and Lauro Beltrao Costa and Alisson Andrade and Walfredo Cirne and Sujoy Basu and Sujata Banerjee},
title = {A large scale fault-tolerant grid information service},
booktitle = {MGC '06: Proceedings of the 4th international workshop on Middleware for grid computing},
year = {2006},
isbn = {1-59593-581-9},
pages = {14},
location = {Melbourne, Australia},
doi = {http://doi.acm.org/10.1145/1186675.1186690},
publisher = {ACM},
address = {New York, NY, USA},
}

Faults in Grids: Why are they so bad and What can be done about it?

Computational Grids have the potential to become the main execution platform for high performance and distributed applications. However, such systems are extremely complex and prone to failures. In this paper, we present a survey with the grid community in which several people shared their actual experience regarding fault treatment. The survey reveals that, nowadays, users have to be highly involved in diagnosing failures, that most failures are due to configuration problems (a hint of the area's immaturity), and that solutions for dealing with failures are mainly application-dependent. Going further, we identify two main reasons for this state of affairs. First, grid components that provide high-level abstractions when working do expose all gory details when broken. Since there are no appropriate mechanisms to deal with the complexity exposed (configuration, middleware, hardware and software issues), users need to be deeply involved in the diagnosis and correction of failures. To address this problem, one needs a way to coordinate different support teams working at the grid's different levels of abstraction. Second, fault tolerance schemes today implemented on grids tolerate only crash failures. Since grids are prone to more complex failures, such as those caused by heisenbugs, one needs to tolerate tougher failures. Our hope is that the very heterogeneity that makes a grid a complex environment can help in the creation of diverse software replicas, a strategy that can tolerate more complex failures.

@inproceedings{MCBS03,
author = {Raissa Medeiros and Walfredo Cirne and Francisco Brasileiro and Jacques Sauv\'{e}},
title = {Faults in Grids: Why are they so bad and What can be done about it?},
booktitle = {GRID '03: Proceedings of the 4th International Workshop on Grid Computing},
year = {2003},
isbn = {0-7695-2026-X},
pages = {18},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Scheduling CPU-Intensive Grid Applications Using Partial Information

Scheduling parallel applications on computational grids is a difficult task. In order to map the parallel application's tasks onto resources in an efficient way, grid schedulers apply scheduling heuristics. The existing scheduling heuristics can be broadly classified into two approaches: i) bin-packing schedulers, and ii) replication schedulers. The first approach requires complete and accurate information about the applications and the grid environment. The second approach does not use any information but, instead, applies the principle of task replication to achieve good performance. Each of these approaches has drawbacks; attaining accurate and complete information about resources and applications is not always possible in a grid environment, while the redundancy of replication schedulers yields an extra consumption of resources. In this work, we investigate the trade-off between these two approaches. We propose scheduling heuristics that use any available information to perform efficient scheduling of bag-of-tasks applications, a subclass of parallel applications. Our results show that judicious use of whatever information is available leads to a reduction in resource consumption, without compromising the application's performance.

@inproceedings{NAB08,
author = {Nelson N\'{o}brega-J\'{u}nior and Leonardo Assis and Francisco Brasileiro},
title = {Scheduling CPU-Intensive Grid Applications Using Partial Information},
booktitle = {ICPP '08: Proceedings of the 2008 37th International Conference on Parallel Processing},
year = {2008},
isbn = {978-0-7695-3374-2},
pages = {262--269},
doi = {http://dx.doi.org/10.1109/ICPP.2008.40},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Scheduling in Bag-of-Task Grids: The PAUÁ Case

In this paper we discuss the difficulties involved in the scheduling of applications on computational grids. We highlight two main sources of difficulties: firstly, the size of the grid rules out the possibility of using a centralized scheduler; secondly, since resources are managed by different parties, the scheduler must consider several different policies. Thus, we argue that scheduling applications on a grid requires the orchestration of several schedulers, with possibly conflicting goals. We discuss how we have addressed this issue in the context of PAUÁ, a grid for Bag-of-Tasks applications (i.e. parallel applications whose tasks are independent) that we are currently deploying throughout Brazil.

@inproceedings{CBCPSAdRFMSJ04,
author = {Walfredo Cirne and Francisco Brasileiro and Lauro Costa and Daniel Paranhos and Elizeu Santos-Neto and Nazareno Andrade and Cesar De Rose and Tiago Ferreto and Miranda Mowbray and Roque Scheer and Joao Jornada},
title = {Scheduling in Bag-of-Task Grids: The PAU\'{A} Case},
booktitle = {SBAC-PAD '04: Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing},
year = {2004},
isbn = {0-7695-2240-8},
pages = {124--131},
doi = {http://dx.doi.org/10.1109/SBAC-PAD.2004.37},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Bridging the High Performance Computing Gap: the OurGrid Experience

High performance computing is currently not affordable for those users that cannot rely on having a highly qualified computing support team. To cater for these users' needs we have proposed, implemented and deployed OurGrid. OurGrid is a peer-to-peer grid middleware that supports the automatic creation of large computational grids for the execution of embarrassingly parallel applications. It has been used to support the OurGrid Community, a public free-to-join grid that has been in production since December 2004. In this paper we show how the OurGrid Community has been used to support the execution of a number of applications. Further, we discuss the main benefits brought by the system and the difficulties that have been faced by the system developers and by the users and managers of the OurGrid Community.

@inproceedings{BAVOF07,
author = {Francisco Brasileiro and Eliane Araujo and William Voorsluys and Milena Oliveira and Flavio Figueiredo},
title = {Bridging the High Performance Computing Gap: the OurGrid Experience},
booktitle = {CCGRID '07: Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid},
year = {2007},
isbn = {0-7695-2833-3},
pages = {817--822},
doi = {http://dx.doi.org/10.1109/CCGRID.2007.28},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Discouraging Free Riding in a Peer-to-Peer CPU-Sharing Grid

Grid computing has excited many with the promise of access to huge amounts of resources distributed across the globe. However, there are no largely adopted solutions for automatically assembling grids, and this limits the scale of today's grids. Some argue that this is due to the overwhelming complexity of the proposed economy-based solutions. Peer-to-peer grids have emerged as a less complex alternative. We are currently deploying OurGrid, one such peer-to-peer grid. OurGrid is a CPU-sharing grid that targets Bag-of-Tasks applications (i.e. parallel applications whose tasks are independent). In order to ease system deployment, OurGrid is based on a very lightweight autonomous reputation scheme. Free riding is an important issue for any peer-to-peer system. The aim of this paper is to show that OurGrid's reputation system successfully discourages free riding, making it in each peer's own interest to collaborate with the peer-to-peer community. We show this in two steps. First, we analyze the conditions under which a reputation scheme can discourage free riding in a CPU-sharing grid. Second, we show that OurGrid's reputation scheme satisfies these conditions, even in the presence of malicious peers. Unlike other distributed mechanisms for discouraging free riding, OurGrid's reputation scheme achieves this without requiring a shared cryptographic infrastructure or specialized storage.

@inproceedings{ABCM04,
author = {Nazareno Andrade and Francisco Brasileiro and Walfredo Cirne and Miranda Mowbray},
title = {Discouraging Free Riding in a Peer-to-Peer CPU-Sharing Grid},
booktitle = {HPDC '04: Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing},
year = {2004},
isbn = {0-7803-2175-4},
pages = {129--137},
doi = {http://dx.doi.org/10.1109/HPDC.2004.9},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

A Reciprocation-Based Economy for Multiple Services in Peer-to-Peer Grids

In this paper we study reciprocation-based mechanisms to encourage donation in peer-to-peer grids in which multiple services, such as processing power and data transfers, are shared explicitly. We have modeled such a system and established how peers should assess whether it is profitable to exchange services with another peer, an issue that is not present in the single service case. Unfortunately, this assessment relies on information provided by untrustworthy peers. As an alternative, we have extended, to the case of multiple services, a reciprocation-based mechanism which uses only reliable information gathered locally. We have assessed this mechanism by simulating scenarios in which services are exchanged that are combinations of two different basic services. In the explored scenarios the mechanism performs very well, and can marginalize free riders even when the cost to peers of donating a service is nearly as large as the utility gained by receiving it.

@inproceedings{MBASW06,
author = {Miranda Mowbray and Francisco Brasileiro and Nazareno Andrade and Jaindson Santana and Walfredo Cirne},
title = {A Reciprocation-Based Economy for Multiple Services in Peer-to-Peer Grids},
booktitle = {P2P '06: Proceedings of the Sixth IEEE International Conference on Peer-to-Peer Computing},
year = {2006},
isbn = {0-7695-2679-9},
pages = {193--202},
doi = {http://dx.doi.org/10.1109/P2P.2006.3},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Scalable Resource Annotation in Peer-to-Peer Grids

Peer-to-peer grids are large-scale, dynamic environments where autonomous sites share computing resources. Producing and maintaining relevant and up-to-date resource information in such environments is a challenging problem, due to the grid scale, the resource heterogeneity, and the variety of user demand. This work proposes a peer-to-peer annotation approach where users can freely annotate available resources as a solution to this problem. We advocate that the proposed approach (i) is scalable, as the job of updating the resource information is divided among users; (ii) will improve resources' utilization, by reducing the amount of resources which are allocated to users without matching their applications' constraints; and (iii) will allow resource allocators to increase users' utility, leveraging access to more detailed preference descriptions. The paper also discusses the challenges in implementing and deploying such an approach and presents solutions to tackle these challenges.

@inproceedings{ASB08,
author = {Nazareno Andrade and Elizeu Santos-Neto and Francisco Brasileiro},
title = {Scalable Resource Annotation in Peer-to-Peer Grids},
booktitle = {P2P '08: Proceedings of the 2008 Eighth International Conference on Peer-to-Peer Computing},
year = {2008},
isbn = {978-0-7695-3318-6},
pages = {231--234},
doi = {http://dx.doi.org/10.1109/P2P.2008.47},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Relative autonomous accounting for peer-to-peer Grids

Here we present and evaluate relative accounting, an autonomous accounting scheme that provides accurate results even when the parties (consumer and provider) do not trust each other. Relative accounting relies on the observed relative performance amongst the parties. As such, the basic requirement to use it is that resource consumers must also be resource providers. Relative accounting is totally autonomous in the sense that it uses only local information, i.e. there is no exchange of information between the parties. This allows for the deployment of the autonomous accounting without requiring any sort of identification infrastructure, such as certificate authorities. Not requiring trust or sophisticated infrastructure makes relative accounting a perfect fit for peer-to-peer Grids, which aim to scale much further than traditional Grids by allowing free unidentified entry into the Grid. Our results show that relative accounting performs very close to a perfect accounting, whose implementation is infeasible in most systems, including those we target. Relative accounting was developed to work with OurGrid, a peer-to-peer Grid in production since December 2004, but it can also be used in other peer-to-peer Grids.

@article{SACBA07,
author = {Robson Santos and Alisson Andrade and Walfredo Cirne and Francisco Brasileiro and Nazareno Andrade},
title = {Relative autonomous accounting for peer-to-peer Grids: Research Articles},
journal = {Concurr. Comput. : Pract. Exper.},
volume = {19},
number = {14},
year = {2007},
issn = {1532-0626},
pages = {1937--1954},
doi = {http://dx.doi.org/10.1002/cpe.v19:14},
publisher = {John Wiley and Sons Ltd.},
address = {Chichester, UK},
}

Collaborative Fault Diagnosis in Grids through Automated Tests

Grids have the potential to revolutionize computing by providing ubiquitous, on-demand access to computational services and resources. However, grid systems are extremely large, complex and prone to failures. A survey we have conducted reveals that fault diagnosis is still a major problem for grid users. When a failure appears on the user's screen, it is very difficult for the user to identify whether the problem is in his application, somewhere in the grid middleware, or even lower in the fabric that comprises the grid. To overcome this problem, we argue that current grid platforms must be augmented with a collaborative diagnosis mechanism. We propose that such a mechanism use automated tests to identify the root cause of a failure and propose the appropriate fix. We also present a Java-based implementation of the proposed mechanism, which provides a simple and flexible framework that eases the development and maintenance of the automated tests.

@inproceedings{DBCA06,
author = {Alexandre Duarte and Francisco Brasileiro and Walfredo Cirne and Jose Alencar Filho},
title = {Collaborative Fault Diagnosis in Grids through Automated Tests},
booktitle = {AINA '06: Proceedings of the 20th International Conference on Advanced Information Networking and Applications - Volume 1(AINA'06)},
year = {2006},
isbn = {0-7695-2466-4-01},
pages = {69--74},
doi = {http://dx.doi.org/10.1109/AINA.2006.127},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

When can an autonomous reputation scheme discourage free-riding in a peer-to-peer system?

We investigate the circumstances under which it is possible to discourage free-riding in a peer-to-peer system for resource-sharing by prioritizing resource allocation to peers with higher reputation. We use a model to predict conditions necessary for any reputation scheme to succeed in discouraging free-riding by this method. We show with simulations that for representative cases, a very simple autonomous reputation scheme works nearly as well at discouraging free-riding as an ideal reputation scheme. Finally, we investigate the expected dynamic behavior of the system.

@inproceedings{AMCB04,
author = {N. Andrade and M. Mowbray and W. Cirne and F. Brasileiro},
title = {When can an autonomous reputation scheme discourage free-riding in a peer-to-peer system?},
booktitle = {CCGRID '04: Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid},
year = {2004},
isbn = {0-7803-8430-X},
pages = {440--448},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

A Bag-of-Tasks Approach for State Space Exploration Using Computational Grids

A strategy for exploring distributed state spaces using computational grids that run bag-of-tasks applications is discussed. The main idea is to use computational grid tools as a layer between the verification tool and the distributed shared resources, aggregating the vast background developed by researchers in grid computing. Hence, the computational grid deals with resource scalability, computational speedup, and reliability in a transparent manner for the verification tool. Experimental results using a state space generation tool for object-oriented Petri nets and the OurGrid solution show that it is possible to achieve speedup when applying it to a private network environment. Moreover, when a widely distributed community is considered, the model size can be increased several times.

@inproceedings{RBCFG06,
author = {C\'{a}ssio L. Rodrigues and Paulo E. S. Barbosa and Jairson M. Cabral and Jorge C. A. de Figueiredo and Dalton D. S. Guerrero},
title = {A Bag-of-Tasks Approach for State Space Exploration Using Computational Grids},
booktitle = {SEFM '06: Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods},
year = {2006},
isbn = {0-7695-2678-0},
pages = {226--235},
doi = {http://dx.doi.org/10.1109/SEFM.2006.1},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

Labs of the World, Unite!!!

eScience is rapidly changing the way we do research. As a result, many research labs now need non-trivial computational power. Grid and voluntary computing are well-established solutions for this need. However, not all labs can effectively benefit from these technologies. In particular, small and medium research labs (which are the majority of the labs in the world) have a hard time using these technologies, as they demand high-visibility projects and/or highly qualified computer personnel. This paper describes OurGrid, a system designed to fill this gap. OurGrid is an open, free-to-join, cooperative Grid in which labs donate their idle computational resources in exchange for accessing other labs’ idle resources when needed. It relies on an incentive mechanism that makes it in the best interest of participants to collaborate with the system, employs a novel application scheduling technique that demands very little information, and uses virtual machines to isolate applications and thus provide security. The vision is that OurGrid enables labs to combine their resources in a massive worldwide computing platform. OurGrid has been in production since December 2004. Any lab can join it by downloading its software from https://ourgrid.org.

@article{CBACANM06,
author = {Walfredo Cirne and Francisco Brasileiro and Nazareno Andrade and Lauro Costa and Alisson Andrade and Reynaldo Novaes and Miranda Mowbray},
title = {Labs of the World, Unite!!!},
journal = {Journal of Grid Computing},
volume = {4},
number = {3},
year = {2006},
issn ={1570-7873},
pages = {225--246},
doi ={http://dx.doi.org/10.1007/s10723-006-9040-x},
publisher = {Springer},
address = {New York, USA},
}

Peer-to-peer grid computing with the OurGrid Community

For a number of research and commercial computational problems, it is possible to use as much computing power as is available to speed the resolution of the problem through parallel processing. Grid computing has done much to enable users to use the computing power of resources across administrative boundaries for solving this kind of problem. However, not much has been done to solve the preceding problem of gaining access to resources spread across several institutions. We have addressed this issue in the OurGrid Toolkit by developing the OurGrid Community, a peer-to-peer network for sharing computational power. The goal of this system is to provide easy access to large amounts of computational resources for anyone who needs them. All participants contribute idle resources to form a shared pool from which all can benefit. To motivate contribution to this pool, the OurGrid Community uses an allocation mechanism that rewards the peers that donate more to the system. This paper describes the OurGrid Community and its first deployment in a grid across Brazil called Pauá, which is presently being used by several Brazilian research institutes.

MyGrid – A complete solution for running Bag-of-Tasks Applications

MyGrid is a complete grid solution for running Bag-of-Tasks applications (i.e. parallel applications whose tasks are independent) over whatever resources are available to the user. The MyGrid middleware empowers users to interoperate with heterogeneous computational resources across geographic and administrative boundaries. Due to MyGrid’s flexible architecture, it is easy to add a new kind of machine to the grid through an abstraction called grid machine. The grid machine implementations currently available in MyGrid are user agent, grid script and globus grid machine. This work shows how MyGrid works and presents its main concepts. We present a parallel application, MyPhotoGrid, in order to demonstrate a practical use of MyGrid.

Building a User-Level Grid for Bag-of-Tasks Applications

We here discuss how to run Bag-of-Tasks applications (those parallel applications whose tasks are independent) on computational grids. Bag-of-Tasks applications are both relevant and amenable to execution on grids. However, few users currently execute their Bag-of-Tasks applications on grids. We investigate the reason for this state of affairs and introduce MyGrid, a system designed to overcome the identified difficulties. MyGrid provides a simple, complete and secure way for a user to run Bag-of-Tasks applications on all resources she has access to. Besides putting together a complete solution useful for real users, MyGrid embeds two important research contributions to grid computing. First, we introduce some simple working environment abstractions that hide the configuration heterogeneity of the machines that compose the grid from the user. Second, we introduce Work Queue with Replication (WQR), a scheduling heuristic that attains good performance without relying on information about the grid or the application, at the cost of consuming a few more cycles. Note that not depending on information makes WQR much easier to deploy in practice.

Running Bag-of-Tasks Applications on Computational Grids: The MyGrid Approach

Grid Computing for Bag of Tasks Applications

Bag-of-Tasks applications (those parallel applications whose tasks are independent) are both relevant and amenable to execution on computational grids. In fact, one can argue that Bag-of-Tasks applications are the applications most suited for grids, where communication can easily become a bottleneck for tightly-coupled parallel applications. In spite of such suitability, few users currently execute their Bag-of-Tasks applications on grids. This state of affairs is cause for concern. After all, if it is hard to use the grid even with Bag-of-Tasks applications, grids are not going to be of much use. This article investigates this very issue. We identify three key features (automatic access granting, grid working environment, and application scheduling) needed for the execution of Bag-of-Tasks applications on grids, and describe efforts to provide such functionality. Unfortunately, some practical hurdles make deploying these features much harder than one might think at first. Therefore, we discuss the four major problems (lack of end-to-end connectivity, protocol heterogeneity, security issues, and fault diagnosis) one faces when implementing the proposed functionality in practice.

OurGrid: An Approach to Easily Assemble Grids with Equitable Resource Sharing

Available grid technologies like the Globus Toolkit make it possible for one to run a parallel application on resources distributed across several administrative domains. Most grid computing users, however, don’t have access to more than a handful of resources on which they can use these technologies. This happens mainly because gaining access to resources still depends on personal negotiations between the user and each resource owner. To address this problem, we are developing the OurGrid resource-sharing system, a peer-to-peer network of sites that share resources equitably in order to form a grid to which they all have access. The resources are shared according to a network of favors model, in which each peer prioritizes those who have credit in their past history of bilateral interactions. The emergent behavior in the system is that peers that contribute more to the community are prioritized when they request resources. We expect, with OurGrid, to solve the access gaining problem for users of bag-of-tasks applications (those parallel applications whose tasks are independent).

@inproceedings{ACBR03,
address = {Seattle, WA, USA},
author = {Andrade, Nazareno and Cirne, Walfredo and Brasileiro, Francisco and Roisenberg, Paulo},
booktitle = {Proceedings of the 9th Workshop on Job Scheduling Strategies for Parallel Processing},
month = {June},
title = {OurGrid: An approach to easily assemble grids with equitable resource sharing},
year = {2003}
}
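
The network of favors idea in the abstract above (each peer prioritizes requesters that have credit in their bilateral history) can be illustrated with a small sketch. This is not the OurGrid implementation: the class name `FavorLedger`, its method names, and the rule that credit is floored at zero are assumptions made for illustration only.

```python
from collections import defaultdict


class FavorLedger:
    """Local record one peer keeps of its bilateral exchanges (hypothetical name)."""

    def __init__(self):
        self.received = defaultdict(float)  # favors received from each peer
        self.donated = defaultdict(float)   # favors donated to each peer

    def record_received(self, peer, amount):
        self.received[peer] += amount

    def record_donated(self, peer, amount):
        self.donated[peer] += amount

    def credit(self, peer):
        # Assumption: credit is what the peer gave us minus what we gave back,
        # floored at zero so unknown peers and debtors are treated alike.
        return max(0.0, self.received[peer] - self.donated[peer])

    def prioritize(self, requesters):
        # Highest credit first; sorted() is stable, so ties keep request order.
        return sorted(requesters, key=self.credit, reverse=True)


ledger = FavorLedger()
ledger.record_received("A", 5.0)
ledger.record_donated("A", 2.0)
ledger.record_received("B", 1.0)
ranking = ledger.prioritize(["C", "B", "A"])
```

Because the ledger uses only information the peer observed itself, no peer has to trust what others report about their own contributions.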

Converting Space Shared Resources into Intermittent Resources for use in Bag-of-Tasks Grids

Grid Computing is turning from promise into reality. Naturally, however, this is not happening for all applications at once. Bag-of-Tasks (BoT) applications (those parallel applications whose tasks are independent) are, due to their simplicity, the first class of applications to be massively executed on grids (e.g. SETI@home). BoT applications are especially amenable to grid execution because they can run on intermittent resources (i.e. resources with no guarantees on availability or reliability). In this scenario, good performance and reliable results are provided by eager schedulers, which rely on replication to overcome unfortunate task-to-processor assignments. Somewhat surprisingly, however, eager schedulers are not prepared to use space-shared resources (e.g. parallel supercomputers). This happens because using a space-shared resource involves submitting a detailed request to the resource scheduler, specifying the number of processors needed and the amount of time these processors are to be allocated, information that eager schedulers are not prepared to provide. This is unfortunate because space-shared computers are the most powerful computing resources available today and thus could greatly improve the execution time of BoT applications. This work proposes an automatic way to craft such requests in order to convert space-shared resources into intermittent ones, therefore rendering them naturally usable by eager schedulers. Such conversion is based on adaptive heuristics and allows eager schedulers to use such powerful computing resources without modifications.

@inproceedings{CCF05,
author = {Lauro Beltrao Costa and Walfredo Cirne and Daniel Fireman},
title = {Converting Space Shared Resources into Intermittent Resources for use in Bag-of-Tasks Grids},
booktitle = {SBAC-PAD '05: Proceedings of the 17th International Symposium on Computer Architecture and High Performance Computing},
year = {2005},
isbn = {0-7695-2446-X},
pages = {243--250},
doi = {http://dx.doi.org/10.1109/CAHPC.2005.20},
publisher = {IEEE Computer Society},
address = {Washington, DC, USA},
}

A Scheduling Algorithm for Running Bag-of-Tasks Data Mining Applications on the Grid

Data mining applications are composed of computing-intensive processing tasks, which are natural candidates for execution on high performance, high throughput platforms such as PC clusters and computational grids. Besides, some data-mining algorithms can be implemented as Bag-of-Tasks (BoT) applications, which are composed of parallel, independent tasks. Owing to their nature, the adaptation of BoT applications for the grid is straightforward. Accordingly, this work proposes a scheduling algorithm for running BoT data mining applications on grid platforms. The proposed algorithm is evaluated by means of several experiments, and the obtained results show that it improves both scalability and performance of such applications.

@inproceedings{SCH04,
author = {Fabr\'{\i}cio Alves Barbosa da Silva and S\'{\i}lvia Carvalho and Eduardo R. Hruschka},
title = {A Scheduling Algorithm for Running Bag-of-Tasks Data Mining Applications on the Grid},
booktitle = {Euro-Par},
year = {2004},
pages = {254--262},
}

Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids

Data-intensive applications executing over a computational grid demand large data transfers. These are costly operations. Therefore, taking them into account is mandatory to achieve efficient scheduling of data-intensive applications on grids. Further, within a heterogeneous and ever-changing environment such as a grid, better schedules are typically attained by heuristics that use dynamic information about the grid and the applications. However, this information is often difficult to obtain accurately. On the other hand, although there are schedulers that attain good performance without requiring dynamic information, they were not designed to take data transfer into account. This paper presents Storage Affinity, a novel scheduling heuristic for bag-of-tasks data-intensive applications running on grid environments. Storage Affinity exploits a data reuse pattern, common in many data-intensive applications, that allows it to take data transfer delays into account and reduce the makespan of the application. Further, it uses a replication strategy that yields efficient schedules without relying upon dynamic information that is difficult to obtain. Our results show that Storage Affinity may attain better performance than the state-of-the-art knowledge-dependent schedulers. This is achieved at the expense of consuming more CPU cycles and network bandwidth.

@inproceedings{SCBL04,
author = {Elizeu Santos-Neto and Walfredo Cirne and Francisco Brasileiro and Aliandro Lima},
title = {Exploiting Replication and Data Reuse to Efficiently Schedule Data-Intensive Applications on Grids},
booktitle = {JSSPP},
year = {2004},
pages = {210--232}
}

Trading Cycles for Information: Using Replication to Schedule Bag-of-Tasks Applications on Computational Grids

Scheduling independent tasks on heterogeneous environments, like grids, is not trivial. To make a good scheduling plan in this kind of environment, the scheduler usually needs some information such as host speed, host load, and task size. This kind of information is not always available and is often difficult to obtain. In this paper we propose a scheduling approach that does not use any kind of information but still delivers good performance. Our approach uses task replication to cope with the dynamic and heterogeneous nature of grids without depending on any information about machines or tasks. Our results show that task replication can deliver good and stable performance at the expense of additional resource consumption. By limiting replication, however, additional resource consumption can be controlled with little effect on performance.

@inproceedings{SCB03,
author = {Daniel Paranhos da Silva and Walfredo Cirne and Francisco Brasileiro},
title = {Trading Cycles for Information: Using Replication to Schedule Bag-of-Tasks Applications on Computational Grids},
booktitle = {Euro-Par},
year = {2003},
pages = {169--180},
}
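
The trade-off described in the abstract above (extra cycles in exchange for not needing information about hosts or tasks) can be made concrete with a toy simulation. This is a minimal sketch, not the authors' scheduler: the function name `simulate_wqr`, the FIFO task queue, the "least-replicated task first" rule, and cancelling the remaining replicas as soon as one finishes are all assumptions. The scheduler itself never looks at task sizes or host speeds; only the simulator uses them to compute finish times.

```python
import heapq


def simulate_wqr(task_sizes, host_speeds, max_replicas=2):
    """Return (makespan, wasted_host_time) for a replication scheduler."""
    n_tasks = len(task_sizes)
    queue = list(range(n_tasks))               # tasks with no replica started yet
    replicas = {t: 0 for t in range(n_tasks)}  # replicas started per task
    done = set()
    host_job = {}                              # host -> (task, start_time)
    events = []                                # heap of (finish_time, host, task)
    now, wasted = 0.0, 0.0

    def dispatch(host):
        if queue:
            task = queue.pop(0)
        else:
            # Queue is empty: replicate a running task, up to the limit.
            candidates = [t for t in range(n_tasks)
                          if t not in done and 0 < replicas[t] < max_replicas]
            if not candidates:
                return
            task = min(candidates, key=lambda t: replicas[t])
        replicas[task] += 1
        host_job[host] = (task, now)
        finish = now + task_sizes[task] / host_speeds[host]
        heapq.heappush(events, (finish, host, task))

    for host in range(len(host_speeds)):
        dispatch(host)

    while len(done) < n_tasks:
        now, host, task = heapq.heappop(events)
        if host not in host_job or host_job[host][0] != task:
            continue                           # stale event of a cancelled replica
        del host_job[host]
        done.add(task)
        freed = [host]
        for other, (t, start) in list(host_job.items()):
            if t == task:                      # cancel the losing replicas
                wasted += now - start
                del host_job[other]
                freed.append(other)
        for h in freed:
            dispatch(h)
    return now, wasted


with_replication = simulate_wqr([10.0], [1.0, 10.0], max_replicas=2)
without_replication = simulate_wqr([10.0], [1.0, 10.0], max_replicas=1)
```

In this example the single task first lands on the slow host; with replication the fast host picks up a second copy and finishes it, at the cost of the host time spent on the cancelled replica.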

Non-dedicated distributed environment: A solution for safe and continuous exploitation of idle cycles

The Non-Dedicated Distributed Environment (NDDE) aims to muster the idle processing power of interactive computers (workstations or PCs) into a virtual resource for parallel applications and grid computing. NDDE is novel in the sense that it allows for safe and continuous use of idle cycles. Unlike existing solutions, NDDE applications run inside a virtual machine rather than in the user environment. Besides safe and continuous cycle exploitation, this approach enables NDDE applications to run on an operating system other than that used interactively. Our preliminary results suggest that NDDE can in fact harvest most of the idle cycles and has almost no impact on the interactive user.

@inproceedings{NRSNJC03,
author = {Reynaldo C. Novaes and Paulo Roisenberg and Roque Scheer and Caio Northfleet and João H. Jornada and Walfredo Cirne},
title = {Non-dedicated distributed environment: A solution for safe and continuous exploitation of idle cycles},
booktitle = {Proceedings of the Workshop on Adaptive Grid Middleware},
year = {2003},
pages = {107--115}
}

Independently Auditing Service Level Agreements in the Grid

Service level management of Web and Grid Services is an important issue that has to be solved in order to achieve large-scale deployment of services. The need to negotiate, measure and audit the quality of the services provided (in terms, for instance, of performance, cost or security) will increase directly with the proliferation of services. This paper presents our on-going efforts towards developing an automatic and independent SLA (Service Level Agreement) auditing service to be used with Grid/Web Services. This solution has the goal of auditing services accurately without imposing a heavy performance penalty, while solving or easing the trust issues. We present some issues involved in SLA auditing for Web/Grid Services and some possible solutions to these problems, along with their advantages and drawbacks. Finally, we present a performance analysis of these scenarios.

@inproceedings{BSCC04,
author = {Ana Carolina Benjamim and Jacques Sauvé and Walfredo Cirne and Mirna Carelli},
title = {Independently auditing service level agreements in the grid},
booktitle = {Proceedings of the 11th HP OpenView University Association Workshop (HPOVUA 2004)},
year = {2004}
}

Fostering collaboration to better manage water resources

Good water management is literally vital for the arid and semi-arid regions of the planet. Yet good water management requires multidisciplinary expertise, since one must consider climatic, hydrological, economical and social aspects to make balanced decisions on water usage. We here present SegHidro, a Grid portal designed to foster scientific, technical and operational collaboration to improve water resources management. The portal targets researchers and decision makers, enabling them to execute and couple their computational models in a workflow. The portal provides a framework which allows seamless integration of the models, meaning that each phase of the flow may be executed by a different expert and that the resulting data are shared among other portal users. Due to the nature of these applications and the need to execute many prospective scenarios, their execution requires high computing power. However, we go beyond providing high-performance computational Grid capabilities. We also enable people to complement each other's expertise in understanding the trade-offs in the water allocation decisions. The SegHidro portal is about sharing: human expertise, data and computing power.

@article{VACGSC07,
author = {William Voorsluys and Eliane Ara\'{u}jo and Walfredo Cirne and Carlos O. Galv\~{a}o and Enio P. Souza and Enilson P. Cavalcanti},
title = {Fostering collaboration to better manage water resources: Research Articles},
journal = {Concurrency and Computation: Practice and Experience},
volume = {19},
number = {12},
year = {2007},
issn = {1532-0626},
pages = {1609--1620},
doi = {http://dx.doi.org/10.1002/cpe.v19:12},
publisher = {John Wiley and Sons Ltd.},
address = {Chichester, UK},
}

The SegHidro Experience: Using the Grid to Empower a Hydro-Meteorological Scientific Network

This paper describes our experience with SegHidro, a project that empowers hydro-meteorological researchers by (i) enabling collaborative work via the coupling of computer models, and (ii) providing access to massive grid-based computer resources. SegHidro researchers are geographically distributed and have different and sometimes complementary backgrounds. They have in common their interest in the Brazilian Northeast, a semi-arid region where irregular rainfall distribution causes many problems for the population. SegHidro was created out of the need to help decision makers better manage water resources. It relies on grid computing as the "glue" that holds together the community resources: data, computing power and human expertise. Our experience has shown that simplicity is key to the adoption of a solution and its success. Thus SegHidro is built over a simple infrastructure in both computation and data.

@inproceedings{ACMOSGM05,
author = {Eliane Araújo and Walfredo Cirne and Gustavo Wagner Mendes and Nigini Oliveira and Enio P. Souza and Carlos O. Galvão and Eduardo Sávio Martins},
title = {The SegHidro Experience: Using the Grid to Empower a Hydro-Meteorological Scientific Network},
booktitle ={Proceedings of the First International Conference on e-Science and Grid Computing},
year = {2005},
pages = {64--71},
doi = {http://doi.ieeecomputersociety.org/10.1109/E-SCIENCE.2005.80},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}

Running Bag-of-Tasks Applications on Computational Grids: The MyGrid Approach

We here discuss how to run Bag-of-Tasks applications on computational grids. Bag-of-Tasks applications (those parallel applications whose tasks are independent) are both relevant and amenable to execution on grids. However, few users currently execute their Bag-of-Tasks applications on grids. We investigate the reason for this state of affairs and introduce MyGrid, a system designed to overcome the identified difficulties. MyGrid provides a simple, complete and secure way for a user to run Bag-of-Tasks applications on all resources she has access to. Besides putting together a complete solution useful for real users, MyGrid embeds two important research contributions to grid computing. First, we introduce some simple working environment abstractions that hide machine configuration heterogeneity from the user. Second, we introduce Work Queue with Replication (WQR), a scheduling heuristic that attains good performance without relying on information about the grid or the application, at the cost of consuming a few more cycles. Note that not depending on information makes WQR much easier to deploy in practice.

@inproceedings{CPCSBSSBS03,
author = {Walfredo Cirne and Daniel Paranhos and Lauro Costa and Elizeu Santos-Neto and Francisco Brasileiro and Jacques Sauvé and Fabrício A. B. Silva and Carla O. Barros and Cirano Silveira},
title = {Running Bag-of-Tasks Applications on Computational Grids: The MyGrid Approach},
booktitle ={Proceedings of the 32nd International Conference on Parallel Processing},
year = {2003},
issn = {0190-3918},
pages = {407--418},
doi = {http://doi.ieeecomputersociety.org/10.1109/ICPP.2003.1240605},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
}

A Case for Event-Driven Distributed Objects

Much work has been done in order to make the development of distributed systems as close as sensible to the development of centralized systems. As a result, there are today good distributed object solutions that closely resemble centralized programming. However, this very attempt to mimic centralized programming implies that distributed objects create the illusion that threads traverse the whole distributed application. This brings all the problems related to multi-threaded programming, including the need to reason about the thread behavior of the whole application, which gets amplified by the large scale and inherent non-determinism of distributed systems. Moreover, distributed objects present other troubles when the application is not pure client-server, i.e., when the client has other things to do besides waiting for the server. As an alternative, there are a number of message-based non-blocking communication solutions. Unfortunately, these solutions were not designed to directly address the above-mentioned issue of multi-threading over the whole distributed application. In addition: (i) these solutions are not as well integrated with the programming language as distributed objects, and (ii) most of them do not provide a well-defined embedded failure detection mechanism, something that is crucial for the development of many distributed systems, and that is well solved by distributed objects (as they couple method invocation and failure detection). We here propose and evaluate an improvement on this status quo, named JIC (Java Internet Communication). JIC is an event-driven middleware that relies on a non-blocking communication model while providing semantics close to the object-oriented paradigm. JIC is designed to combine the best characteristics of distributed objects and message-based solutions. For instance, JIC defines a precise scope for the application's threads, promotes non-blocking communication, provides a failure detection service that is simple to use and has precise semantics, and has performance comparable to Java RMI. Furthermore, JIC is designed to be firewall and NAT friendly, greatly helping the deployment of JIC-based applications across multiple administrative domains.

@inproceedings{LCBF06,
author = {Aliandro Lima and Walfredo Cirne and Francisco Brasileiro and Daniel Fireman},
title = {A case for event-driven distributed objects},
booktitle = {Proceedings of 8th International Symposium on Distributed Objects and Applications},
year = {2006},
pages = {1705--1721},
publisher = {Springer-Verlag}
}

Using AOP to Bring a Project Back in Shape: The OurGrid Case

The design and development of distributed software is a complex task. This was no different in OurGrid, a project whose objective was to develop a free-to-join grid. After two years of development, it was necessary to redesign OurGrid in order to cope with the integration problems that emerged. This paper reports our experience in using Aspect-Oriented Programming (AOP) in the process of redesigning the OurGrid middleware. The essential direction of our approach was to get the project (and the software) back in shape. We discuss how the lack of separation of concerns created difficulties in the project design and development and how AOP has been introduced to overcome these problems. In particular, we present the event-based pattern designed to better isolate the middleware concerns and the threads. We also present the aspects designed for managing the threads and for aiding the testing of multithreaded code. Finally, we highlight the lessons learned in the process of regaining control of the software.

@article{DCS06,
author = {Ayla Dantas and Walfredo Cirne and Katia Saikoski},
title = {Using AOP to Bring a Project Back in Shape: The OurGrid Case},
journal = {Journal of the Brazilian Computer Society},
year = {2006},
volume = {11},
pages = {21--35}
}

Using the Computational Grid to Speed up Software Testing

Software testing practices can improve software reliability. This is a fact that the long history of disasters caused by poorly tested software should not allow us to forget. However, as software grows, its test suite becomes larger and its execution time may become a problem for software developers. This is especially true for agile methodologies, which advocate a short develop/test cycle. Moreover, due to the increasing complexity of systems, it is not enough to test software only in the development environment. We need to test the software in a variety of environments. In this paper we investigate how the Grid can help with both problems and present a framework to use the intrinsic characteristics of the Grid to speed up the software testing process.

An Approach for the Co-existence of Service and Opportunistic Grids: The EELA-2 Case

Most computational grids currently in production are either service grids or opportunistic grids. While a service grid provides high levels of quality of service, an opportunistic grid provides computing power only on a best-effort basis. Nevertheless, since opportunistic grids do not require resources to be fully dedicated to the grid, they tend to assemble larger numbers of resources. Moreover, these grids cater very well for the execution of the so-called embarrassingly parallel applications, a type of application that is frequently found in practice. In this paper we present an approach for supporting the co-existence of a service grid and an opportunistic grid on the same infrastructure. The advantages of this hybrid infrastructure are twofold: firstly, the co-existence allows idle resources belonging to the service grid to be used in an opportunistic way; secondly, the provision of an opportunistic grid allows shared resources to be added to the infrastructure, a feature that turns out to be very important for consortia in which many of the member institutions cannot afford the provision of dedicated resources. The proposed approach is currently being implemented in the context of the EELA-2 project (http://www.eu-eela.eu/), a project co-funded by the European Commission within its Seventh Framework Programme and involving more than 50 institutions both in Europe and Latin America.

Promoting Performance and Separation of Concerns for Data Mining Applications on the Grid

Grid Computing brought the promise of making high-performance computing cheaper and more easily available than traditional supercomputing platforms. Such a promise was very well received by the data mining (DM) community, as DM applications typically process very large datasets and are thus very resource intensive. However, since the Grid is very dynamic and parallel data mining is prone to load imbalance, obtaining good data mining performance on the Grid is hard. It typically requires the scheduler to understand the inner workings of the application, which brings two related problems. First, good Grid schedulers tend to be very specialized in the application they target. Second, changing the application may require changing the scheduler, which may be especially challenging when there is no clear separation between the application and the scheduler code. We propose and evaluate a knowledge-based approach that provides abstractions to the DM developer and optimizes the DM application on the Grid at runtime.

GerpavGrid: using the Grid to maintain the city road system

This paper presents and evaluates a governmental application that has been ported to run in a Grid. The Gerpav application is used in the city of Porto Alegre, located in the south of Brazil, to maintain and plan the investments in the city road system, and Grid technology brought substantial performance gains to its distributed version, called GerpavGrid. We describe several optimization strategies used in GerpavGrid to move the bottleneck of the sequential application from database to memory, which facilitates the distribution of tasks in the Grid. Results of its execution in a real Grid are also presented, and show that a Grid can also be an interesting execution platform for non-classical HPC applications that are database intensive.

@inproceedings{1174090,
 author = {Cesar A.  F. De Rose and Tiago C. Ferreto and Marcelo B. de Farias and Vladimir G. Dias and Walfredo Cirne and Milena P.  M. Oliveira and Katia Saikoski and Maria Luiza Danieleski},
 title = {GerpavGrid: using the Grid to maintain the city road system},
 booktitle = {SBAC-PAD '06: Proceedings of the 18th International Symposium on Computer Architecture and High Performance Computing},
 year = {2006},
 isbn = {0-7695-2704-3},
 pages = {73--80},
 doi = {http://dx.doi.org/10.1109/SBAC-PAD.2006.18},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA},
 }

Transparent Resource Allocation to Exploit Idle Cluster Nodes in Computational Grids

Clusters of workstations are one of the most suitable resources to assist e-scientists in the execution of large-scale experiments that demand processing power. The utilization rate of these machines is usually far from 100%, which should motivate administrators to share their clusters with Grid communities. However, exploiting these resources in Computational Grids is challenging and brings several problems. This paper presents a transparent resource allocation strategy to harness idle cluster resources aimed at executing grid applications. This novel approach does not make use of a formal allocation request to cluster resource managers. Moreover, it does not interfere with local cluster users, being non-intrusive, and hence motivating cluster administrators to publish their resources to grid communities. We present experimental results regarding the effects of the proposed strategy on the attendance time of both cluster and grid requests and we also analyze its effectiveness in clusters with different utilization rates.

@inproceedings{1107879,
 author = {Marco A.  S. Netto and Rodrigo N. Calheiros and Rafael K.  S. Silva and Cesar A.  F. De Rose and Caio Northfleet and Walfredo Cirne},
 title = {Transparent Resource Allocation to Exploit Idle Cluster Nodes in Computational Grids},
 booktitle = {E-SCIENCE '05: Proceedings of the First International Conference on e-Science and Grid Computing},
 year = {2005},
 isbn = {0-7695-2448-6},
 pages = {238--245},
 doi = {http://dx.doi.org/10.1109/E-SCIENCE.2005.83},
 publisher = {IEEE Computer Society},
 address = {Washington, DC, USA},
 }

A process for separation of crosscutting grid concerns

This paper describes how to explicitly separate crosscutting Grid concerns in a parallel Java application. This process, named GridAspecting, uses a restricted subset of the Java threads model for application decomposition, and aspect-oriented programming for allowing parallel execution of the application's threads as Grid tasks. As a result of the process, all Grid-related code is encapsulated in aspects, thus improving the application's modularity. In addition, by relying on Java's native concurrency abstractions the process simplifies the Grid programming model and makes it possible to test a Grid application even without the Grid.

@inproceedings{1141643,
 author = {Paulo Henrique M. Maia and Nabor C. Mendon\c{c}a and Vasco Furtado and Walfredo Cirne and Katia Saikoski},
 title = {A process for separation of crosscutting grid concerns},
 booktitle = {SAC '06: Proceedings of the 2006 ACM symposium on Applied computing},
 year = {2006},
 isbn = {1-59593-108-2},
 pages = {1569--1574},
 location = {Dijon, France},
 doi = {http://doi.acm.org/10.1145/1141277.1141643},
 publisher = {ACM},
 address = {New York, NY, USA},
 }

Evaluating architectures for independently auditing service level agreements

Web and grid services are quickly maturing as a technology that allows for the integration of applications belonging to different administrative domains, enabling much faster and more efficient business-to-business arrangements. For such an integration to be effective, the provider and the consumer of a service must negotiate a service level agreement (SLA), i.e. a contract that specifies what one party can expect from the other. But, since SLAs are just contracts, auditing is key to ensuring that they hold. However, auditing can be very challenging when the parties do not blindly trust each other, which is expected to be the common case for large grid deployments. We evaluate, both quantitatively and qualitatively, six architectures that perform SLA auditing. The quantitative evaluation focuses on the performance penalty that auditing introduces. The qualitative evaluation compares the architectures based on aspects such as intrusiveness, trust, use of extra requests, possibility of preferential treatment, possibility of auditing consumer load, and possibility of auditing encrypted messages. We conclude that no single architecture seems to be the best solution for all cases and indicate where each one is best suited.

@article{1149420,
 author = {Ana Carolina Barbosa and Jacques Sauv\'{e} and Walfredo Cirne and Mirna Carelli},
 title = {Evaluating architectures for independently auditing service level agreements},
 journal = {Future Gener. Comput. Syst.},
 volume = {22},
 number = {7},
 year = {2006},
 issn = {0167-739X},
 pages = {721--731},
 doi = {http://dx.doi.org/10.1016/j.future.2006.01.001},
 publisher = {Elsevier Science Publishers B. V.},
 address = {Amsterdam, The Netherlands},
 }

A WSDM-based Architecture for Global Usage Characterization of Grid Computing Infrastructures

Current solutions to characterize grid computing usage are limited in three important aspects. First, they do not provide a global, uniform view of the use of infrastructures comprised of heterogeneous grid middleware. Second, they do not allow the specification of policies to publicize the collected information. Third, they do not generate statistics about the applications that are executed on the grid. To fill this gap, we propose an architecture based on the Web Services Distributed Management standard and on access control policies to characterize global usage of grid computing infrastructures, even when such grids are formed by heterogeneous middleware packages. We introduce this architecture and present preliminary results obtained with a prototype.

@MISC{Ludwig_awsdm-based,
author = {Glauco Antonio Ludwig and Luciano Paschoal Gaspary and Gerson Geraldo Homrich Cavalheiro and Walfredo Cirne},
title = {A WSDM-based Architecture for Global Usage Characterization of Grid Computing Infrastructures},
year = {2006},
}