OurGrid 4.2.5
We are proud to announce that OurGrid version 4.2.5 has just been released. This version contains several bug fixes, most notably a fix for a blocker related to file transfer (http://redmine.lsd.ufcg.edu.br/issues/779) that caused executions to hang. Logging was also improved, and OurGrid Charts was released during the development of OurGrid 4.2.5. You can see the changelog for this version here.
Important!
Our XMPP server (xmpp.ourgrid.org) is now running ejabberd, due to an Openfire issue that caused out-of-order message delivery. Remote sites running Openfire will eventually fail and be disconnected from the OurGrid community, so we strongly recommend that OurGrid site administrators move to ejabberd.
Several changes to the default ejabberd configuration file were needed to make it work with OurGrid. We therefore provide a template configuration file: administrators can replace their default ejabberd configuration with it and change only the hostname and the IP address of the host. An ejabberd installation guide is available here.
Enjoy it!
OurGrid 4.2.4
We are glad to announce the release of version 4.2.4 of the OurGrid middleware. This version addresses some known issues, including the need to re-order XMPP messages because of a known bug in the Openfire Jabber server, and the output of the NoF balances in the peer status command. It also brings a refactoring of the Aggregator component and a new script and image for the installation of vserver-based workers.
The community status can be viewed at http://status.ourgrid.org. Please use our mailing lists for any suggestions, feedback or questions concerning OurGrid use and development.
Best regards.
OurGrid 4.2.3
We are glad to announce the release of version 4.2.3 of the OurGrid middleware. This version provides a new version of the Discovery Service that fixes some of its known bugs, and also fixes a known bug related to the permissions of transferred files.
The community status can be viewed at http://status.ourgrid.org. Please use our mailing lists for any suggestions, feedback or questions concerning OurGrid use and development.
Best regards.
OurGrid 4.2.2
In this version, we are glad to announce that the virtualized Worker is also available for VirtualBox on Windows machines. The VirtualBox image can be obtained from the Download area of the OurGrid site.
Some bugs that occurred in the Discovery Service communication and when the XMPP server failed were fixed.
We have also fixed a bug that prevented the Worker from detecting a Peer failure. As part of this fix, some remote interfaces were changed. Therefore, this version is incompatible with previous versions.
OurGrid 4.2.1
For this version, the main focus was to reduce the memory consumed by each OurGrid component. We used dynamic analysis tools and found several hot spots in the OurGrid and Commune source code. After refactoring them, we reduced the memory usage to less than 10% of the memory consumed in the previous version.
We have also worked to reduce the incidence of file transfer errors in this version.
The code refactoring affected some remote interfaces. Therefore, this version is incompatible with previous versions.
OurGrid 4.2.0
This version brings many improvements to the OurGrid software, as a consequence of increased maturity in both the development process and the source code. The result is a very stable version, which is ready to be used at a larger scale. Several known bugs were resolved, the communication between components is faster and the middleware overhead is much lower.
Beginning with this version, every OurGrid component must follow an internal reference architecture, which is well documented (see http://redmine.lsd.ufcg.edu.br/wiki/ourgrid/Development-architecture). We separated the remote communication code from the decision code (logic), so the code is more readable and bugs are easier to fix.
The build process was migrated from Ant to Maven, and the release process was simplified and formalized, so we will release new versions more frequently. Maven's dependency management has proven very useful to this end. If you want to contribute to the OurGrid code, you will not need to download all the resources. For example, the virtualized worker images contain hundreds of megabytes of data that are of no use to a system developer; you will download them only if you instruct Maven to generate a virtualized worker distribution.
We grouped the resolved bugs into the following parts:
- File transfer - the Smack version was updated to 3.1.0. This resolved the concurrent file transfer bug that crashed many replicas. We worked on the Smack code to solve the bug with empty files. A bug in the broker that was deleting some downloaded files was also fixed;
- Broker GUI - we fixed a concurrency bug in the Job Tree; this bug did not affect the component's behavior, but caused the broker to display some exception traces in the console;
- Cancel Job - we solved two bugs in the Job cancellation operation. The Job tree was not showing the right Job state after cancellation, and some workers were not disposed of by the broker and could not be used by other users.
There are also some enhancements to the OurGrid functionality.
The cleaning method of virtualized workers could take up to two minutes to complete, so in the previous version it delayed worker allocation for the end user. We optimized this process, which is now performed before the worker is allocated, so the user can now get virtualized workers instantaneously.
We added two new states to the OurGrid worker:
- Unavailable: a worker is in this state when it is offline. The semantics of the Contacting state are now the following: the worker is up but has not answered the Peer;
- Error: a worker goes to this state when an error occurs during its initialization or allocation; this is an indication that the site administrator should log in to the machine to resolve the problem that is causing the error.
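As a rough illustration of the worker states described above, the following Python sketch classifies a worker from three observations. The names (`WorkerState`, `classify_worker`, the boolean inputs), the `IDLE` fallback and the order in which the checks are applied are assumptions made for illustration; they are not taken from the OurGrid source code.

```python
from enum import Enum


class WorkerState(Enum):
    UNAVAILABLE = "unavailable"  # the worker is offline
    CONTACTING = "contacting"    # the worker is up but has not answered the Peer
    ERROR = "error"              # initialization or allocation failed
    IDLE = "idle"                # assumed fallback: up, answering, no error


def classify_worker(online: bool, answered_peer: bool, failed: bool) -> WorkerState:
    """Map three hypothetical observations to a worker state.

    The precedence (offline, then error, then contacting) is an assumption.
    """
    if not online:
        return WorkerState.UNAVAILABLE
    if failed:
        return WorkerState.ERROR
    if not answered_peer:
        return WorkerState.CONTACTING
    return WorkerState.IDLE
```

For example, `classify_worker(online=True, answered_peer=False, failed=False)` yields `WorkerState.CONTACTING`, which matches the revised meaning of that state.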
The Workers tab in the Broker GUI now shows more details, such as the Worker Specification and Workers grouped by job.
OurGrid 4.1.5
The OurGrid 4.1.5 release is a stable version. It is incompatible with the previous versions, because many internal interfaces were changed.
The changes since the OurGrid 4.1.4 version are:
- Bug fix: when an allocated Worker changed its state to OWNER, the master Peer did not allocate any new Worker to the Broker. This occurred because the Peer did not mark the Worker as deallocated;
- Bug fix: when a peer was restarted after a failure and its allocated workers had remained up, these workers were marked as OWNER in the Peer after the recovery. Now, these workers recognize the peer failure, stop any work and are marked as IDLE;
- A problem in the remote peers reference update was also fixed;
- Other bugs related to wrong error messages were fixed: when the XMPP server property changed, the peer database threw an exception; and the status command of any component printed an exception when the component had not been started beforehand;
- The Broker component was refactored in order to provide a Heuristic Facade which can be exposed as a Web service. The WQR heuristic was chosen to implement the Web service;
- In this version, the Discovery Service replication feature was implemented.
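The first bug fix in the list above (an allocated Worker that switches to OWNER must be marked as deallocated so that the Broker can receive another Worker) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Peer` class, its method names and the "hand out the first idle worker" policy are hypothetical and do not reflect the actual OurGrid Peer implementation.

```python
class Peer:
    """Toy model of a Peer tracking which worker serves which broker (hypothetical)."""

    def __init__(self, idle_workers):
        self.idle = list(idle_workers)  # workers available for allocation
        self.allocated = {}             # worker name -> broker name

    def allocate(self, broker):
        """Give the broker the first idle worker, or None if there is none."""
        if not self.idle:
            return None
        worker = self.idle.pop(0)
        self.allocated[worker] = broker
        return worker

    def on_worker_owner(self, worker):
        """Handle a worker being reclaimed by its owner (OWNER state).

        The worker is marked as deallocated first; only then can the broker
        that was using it be given a replacement.
        """
        broker = self.allocated.pop(worker, None)
        if broker is None:
            return None  # the worker was not allocated, nothing to replace
        return self.allocate(broker)
```

With `peer = Peer(["w1", "w2"])`, calling `peer.allocate("broker")` and then `peer.on_worker_owner("w1")` hands `w2` to the same broker, because `w1` is no longer counted as allocated.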
OurGrid 4.1.4
The OurGrid 4.1.4 code is the fourth 4.1 release candidate. This version is incompatible with the previous versions, because the peer - discovery service interface was changed.
We plan to release a 4.2 version in a few days with several changes in the Broker. After that, we will release a stable 4.2 version.
The changes since the OurGrid 4.1.2 version are:
- The peer - discovery service interface was updated to resolve a bug that was invalidating the references to remote peers. This interface change made the 4.1.4 version incompatible with the previous versions.
- When the discovery service properties were inconsistent, the peer start command was throwing an unclear exception. This was fixed to show a clear error message.
OurGrid 4.1.2
The OurGrid 4.1.2 code is the third 4.1 release candidate. This version will be tested for a few days and, if it works fine, we will release a stable version.
The changes since the OurGrid 4.1.1 version are:
- A major bug concerning worker allocation was fixed. When the Broker received more workers than it had tasks, all of them were allocated and the maximum number of simultaneous grid processes was reached. In this scenario, the scheduler kept trying to allocate a worker to a task even if it was already satisfied, and then the Broker crashed.
- A bug related to worker failure was fixed. When a DONATED worker failed, its status was not changed to CONTACTING. The Peer did not take any action if the worker had been delivered to some Remote Peer, because it expected that Peer to perform the disposeWorker operation.
- A bug concerning the PUT command was fixed. The second argument (destination file) of the PUT command was ignored: the worker ignored the destination file for the file transfer, so the destination file name was the same as the source file name.
- In this version, a VMware Server Executor was added to the Worker, so it can now use VMware Server (on both Linux and Windows) to run remote tasks in a virtualized environment.
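The worker allocation fix in the list above amounts to never assigning a worker to a task that is already satisfied and never exceeding the limit of simultaneous grid processes. The sketch below illustrates that idea only; the function name `allocate_workers`, its parameters and the first-come, first-served policy are assumptions, not the Broker's actual scheduler.

```python
def allocate_workers(pending_tasks, workers, max_processes):
    """Assign at most one worker per pending task, up to max_processes.

    Returns (assignments, surplus): a task -> worker mapping and the list of
    workers that were not needed and can be given back.
    """
    assignments = {}
    surplus = []
    remaining = list(pending_tasks)  # tasks that still need a worker
    for worker in workers:
        if remaining and len(assignments) < max_processes:
            assignments[remaining.pop(0)] = worker
        else:
            # No unsatisfied task left, or the process limit was reached:
            # stop allocating instead of retrying on satisfied tasks.
            surplus.append(worker)
    return assignments, surplus
```

With two tasks and three workers, the third worker ends up in `surplus` rather than being allocated to a task that already has one.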
OurGrid 4.1.1
The OurGrid 4.1.1 code is the second 4.1 release candidate; it is still unstable. We have closed the set of desired features for OurGrid 4.1 and will release an official, stable 4.1.x version after some beta tests and bug fixes.
The changes since the OurGrid 4.1.0 version are:
- Two bugs in remote Worker allocation were fixed. In the first one, the Worker did not call the Executor.beginAllcation process for remote workers. The second concerned the name of the remote method RemoteWorkerManagement.workForMygrid, which has been renamed to RemoteWorkerManagement.workForBroker.
- The worker, peer and broker acceptance tests were updated and are green.
OurGrid 4.1.0
The OurGrid 4.1.0 code is a release candidate. We have closed the set of desired features for OurGrid 4.1 and will release an official, stable 4.1.x version after some beta tests and bug fixes.
There are many changes in the OurGrid design since the 3.3.3 version:
- The main architectural change is the replacement of RMI by XMPP as the communication layer. XMPP (http://xmpp.org/) is an open and extensible protocol. It uses a server relay architecture, just like e-mail does, therefore ports need to be open in the firewall only for the XMPP servers.
- Event-driven architecture: the OurGrid components communicate with each other through asynchronous XMPP messages, which are queued at the receiver and consumed by a single thread. This simplified synchronization in OurGrid, reducing the number of residual bugs and making bug fixing much easier.
- The OurGrid components were renamed. OurGrid now has the following components: i) the Broker (old MyGrid), which is the user interface; ii) the Peer, which manages the site; iii) the Worker (old User Agent), which runs remote tasks; iv) the Discovery Service (old Core Peer), which is a rendezvous service used to form a community, connecting the Peers that belong to it; v) the Aggregator, which collects the Peers' statistics and aggregates them; and vi) the WebStatus, which queries the Aggregator data and shows it in a Web 2.0 interface.
- Virtualization for remote task execution - remote tasks are run on an Executor, a pluggable Worker module. There are many Executor implementations and others may be developed in the future. Currently we have one that runs tasks directly on the operating system (Windows and Linux) and three others that run tasks on Linux virtual machines with no access to the network. The second approach is recommended because it is more secure for the Worker machine. The current implementations for virtual machines are VServer for Linux and VMware Server for Windows and Linux.
- The OurGrid functionality was totally reengineered from the previous code, which was refactored from scratch. The unit tests were replaced by acceptance tests that do not break if internal changes occur. There are 530 acceptance test cases assuring the OurGrid functionality (6 for Aggregator, 190 for Broker, 23 for Discovery Service, 204 for Peer and 107 for Worker).
- Security portfolio - As OurGrid is a simple grid solution for heterogeneous sites, we cannot adopt a restrictive security scheme. We provide a security portfolio, from which you can choose the security mechanisms that fit your requirements. The portfolio includes:
- Message signature with asymmetric key pairs. It is mandatory but will not impact the site administrator because the keys are automatically generated in the component's installation;
- X.509 Certification. The OurGrid Peer can define the Certification Authorities it trusts. So, it will communicate just with the Peers certified by those CAs;
- Use of VOMS - Virtual Organization Membership Service. The OurGrid Peer can query a VOMS to communicate only with the Peers in the same Virtual Organization;
- Sabotage check - after downloading the result, a task can perform a check action to verify watermarks in the result.
- API: OurGrid has a new API. The old sync "UIManager API" has been replaced by two APIs: the Sync client (for console applications) and the Async client (for web and Swing applications).
- Dynamic WorkerSpec. Monitoring tools such as GMond can be attached to the Worker, and dynamic properties such as CPU utilization and free disk space can be set in the Worker Specification and used in job requirements.
- Extended priority definition - the site administrator can define Peers to which the donation of resources should be prioritized, using the peers' public keys. They will be prioritized over the other peers in the resource distribution. A hierarchy of priorities can be defined.
- Flexible statistics aggregation. The Aggregator component was created to store the community statistics. Currently, only the WebStatus uses the Aggregator; however, we are developing other tools that use the statistical data to help with grid management.
- The OurGrid Peer uses a JavaDB database to store its data and statistics.
- The remote request state is no longer persistent, because persisting it was the cause of several bugs; instead, the Broker periodically refreshes the requests that have not been fully served.
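The event-driven design listed above (asynchronous messages queued at the receiver and consumed by a single thread) can be sketched with Python's standard `queue` and `threading` modules. This is only an in-process analogy: the real components exchange XMPP messages, and the names `consume`, `demo` and `STOP` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer thread to finish


def consume(inbox, handler):
    """Single consumer: handle queued messages one at a time, in arrival order."""
    while True:
        message = inbox.get()
        if message is STOP:
            break
        handler(message)


def demo(n_senders=5):
    """Several sender threads post messages; exactly one thread handles them."""
    inbox = queue.Queue()
    handled = []  # only the consumer thread appends, so no lock is needed
    consumer = threading.Thread(target=consume, args=(inbox, handled.append))
    consumer.start()
    senders = [
        threading.Thread(target=inbox.put, args=(f"message-{i}",))
        for i in range(n_senders)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()
    inbox.put(STOP)  # queued after every message has been posted
    consumer.join()
    return handled
```

Because only one thread ever touches the component state (here, the `handled` list), the handlers need no locking; senders synchronize only through the queue. The arrival order of messages from different senders is not deterministic.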