From roy@cs.wisc.edu Fri Aug 17 16:31:19 2007
Date: Fri, 17 Aug 2007 15:23:02 -0500
From: Alain Roy
To: osg-int@OPENSCIENCEGRID.ORG
Subject: Clarification of use of Condor-Devel for site testing

Hi,

There has been some confusion about the use of the Condor-Devel package
for site testing. This is an extra version of Condor (currently 6.9.3)
that is installed for the OSG's CE package, and it is used only for
running the new site validation probes. We call these probes and their
scheduling with Condor the "RSV" work, which stands for "Resource and
Service Validation".

Let me address a few questions about this work.

===== Why do we use Condor? =====

Basic principle: Our goal is that software (like this site validation)
should be able to ensure that it does not interfere with the operation
of the system it is testing. We think that it's much harder to do that
with cron than with Condor cron.

In some sense, cron and Condor can be thought of as two examples of
batch job systems. All the details are different, but they both have a
mechanism to specify and run jobs.

Cron does very little management of jobs. Other than controlling when a
process is started, all policy decisions are left to the processes
themselves. Cron doesn't attempt to do anything to control them. It does
not:

  * Ensure there aren't too many processes running
  * Ensure that a recurring job doesn't overlap itself
  * Attempt to control the impact of the processes on the system load

All of those are left up to the processes themselves.

With Condor's cron implementation we can handle all of those things
without touching the tests themselves. Condor can:

  * Control the number of processes running
  * Ensure cron jobs don't overlap
  * Kill processes that take too many system resources

(Caveat: for the initial OSG site testing package, we don't actually
control the number of processes or monitor system resources, but we
could.)
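
To make the overlap point above concrete, here is a minimal Python
sketch of what a test would have to do for itself under plain cron to
avoid running on top of a previous run. It is only an illustration: the
lock-file path and the function names are made up and are not part of
the RSV package, and it assumes a Unix system where fcntl.flock is
available.

```python
import fcntl
import os
import tempfile

# Hypothetical lock-file location; not part of the RSV package.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "site-probe.lock")


def try_lock(path=LOCK_PATH):
    """Return an open, locked file object, or None if the lock is already held."""
    handle = open(path, "w")
    try:
        # Non-blocking exclusive lock: fails at once if another run holds it.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_probe():
    """Placeholder standing in for the real test."""
    return "probe ran"


def main():
    lock = try_lock()
    if lock is None:
        # The previous run has not finished, so skip this one.
        return "skipped"
    try:
        return run_probe()
    finally:
        # Release the lock so the next scheduled run can proceed.
        fcntl.flock(lock, fcntl.LOCK_UN)
        lock.close()


if __name__ == "__main__":
    print(main())
```

Every test started from plain cron would need to carry something like
this itself, which is the kind of logic we would rather keep out of the
tests.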

It may well be that Condor's cron implementation won't do these things
perfectly because it's new. But Miron is willing to put in the effort to
improve it if there are problems.

Recap: our goal is that software (like this site validation) should be
able to ensure that it does not interfere with the operation of the
system it is testing. We think that it's much harder to do that with
cron than with Condor cron.

===== Why do we install Condor 6.9.x? =====

The Condor cron functionality is new with Condor 6.9, and is not present
in older versions.

===== Will we always have Condor-Devel? =====

When the next stable version of Condor is released, it will go into the
next stable VDT. Hopefully in the VDT release due at Christmas time,
there will be a single Condor installation.

===== Why don't we use the site's existing Condor installation? =====

Theoretically we could, as long as the site was using Condor 6.9.3 or
later. However, this would introduce an additional code path (a
conditional installation), and it was hard enough to get everything done
without that added complexity. This is a good idea for the future!

===== Condor-Devel interfered with my existing Condor! =====

Before VDT 1.8.0d, we had a couple of bugs that caused Condor-Devel to
play badly with another Condor already on your site. Our sincere
apologies!! We think these were all fixed in VDT 1.8.0d, but if you see
any remaining problems do not hesitate to let us know.

Our expectation is that normally you should not see the Condor-Devel at
all, except as a couple of extra processes in your process list. If you
want to look at the Condor-Devel's queue, you need to change your
environment a bit to do so. (Source
$VDT_LOCATION/vdt/etc/condor-devel-env.SHELL)

===== I don't want to use Condor! =====

We don't require anyone to use Condor to run the tests. They can be run
by hand or via cron: feel free to do so if you disagree with our use of
Condor.
At the moment, we don't provide any documentation for getting everything
working (running tests, generating web pages, uploading results to
Gratia) without Condor because we haven't had time to pursue it. If this
is of interest to you, please document what you do and we can share it
around. The RSV team can also answer your questions (Arvind Gopu, Rob
Quick, Scot Kronenfeld, Alain Roy).

I hope this helps answer your questions. Let us know if you have any
more.

-alain