[hyades-dev] Notes on the Process Controller Interface document

Reading the Process Controller Interface document, I have some first-pass 
thoughts.

I won't be at the Thursday meeting this week, but Kim Coleman will be. I 
hope she'll guide the group through these points. With most of the 
questions listed below, besides wanting to know the answer, I think the 
answer should appear in the document. That way the NEXT reader doesn't 
have to ask the same questions.

1. The terminology in the command names is a little inconsistent. Why do 
we "add" an exe descriptor, but "create" a descriptor group? I think 
"create" should be the verb for any operation that requires a subsequent 
"delete." Also, I think EXEDESCRIPTOR should split into two words 
everywhere it appears: EXE_DESCRIPTOR. Finally, since "delete" is used for 
both exe descriptors and group descriptors, it should drop EXE from its 
command name.

2. If a command creates something that needs to be deleted, the "create" 
command's description should say so. In this draft, add_exedescriptor 
doesn't mention the need for a subsequent "delete." The description of 
create_descriptor_group does mention it.

3. Looking at CID_STOP_PROCESS, I think there's a lot more to say about 
that command and its potential interactions with other commands. During 
the waiting time, is the agent blocked from receiving and processing other 
commands? Or is the waiting done on another thread, meaning the response 
can come out-of-order with other commands and their responses? If you 
don't want to deal with this, you can take the time element away from this 
command. Define "stop" with two modes, "normal" and "force." Each returns 
success if the signal was successfully delivered to the process, 
regardless of whether it actually died as a result. If a client wants to, 
it can watch for the PROCESS_ENDED event, or use QUERY_PROCESS_NAME to 
poll for the continued existence of the process and decide whether to try 
killing it more forcefully.
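
To make point 3 concrete, here is a minimal sketch in Python of a "stop"
with no time element. It is only an illustration, not anything taken from
the draft. It assumes a POSIX system, and it assumes "normal" maps to
SIGTERM and "force" to SIGKILL. The names stop_process and
stop_with_escalation are hypothetical, and the is_alive callable stands in
for whatever the client really uses (watching for PROCESS_ENDED or polling
with QUERY_PROCESS_NAME).

```python
import os
import signal
import time

# Assumed mapping of the two proposed modes onto POSIX signals.
MODE_SIGNALS = {"normal": signal.SIGTERM, "force": signal.SIGKILL}


def stop_process(pid, mode="normal"):
    """Agent side: deliver the signal and report delivery only.

    Returns True if the signal was delivered, False otherwise. It never
    waits to see whether the process actually died.
    """
    if mode not in MODE_SIGNALS:
        raise ValueError("unknown stop mode: %r" % (mode,))
    try:
        os.kill(pid, MODE_SIGNALS[mode])
    except OSError:
        # No such process, or no permission to signal it.
        return False
    return True


def stop_with_escalation(pid, is_alive, grace_seconds=5.0, poll_interval=0.1):
    """Client side: try "normal", poll, then decide whether to use "force".

    is_alive is a hypothetical callable supplied by the client; it returns
    True while the process still exists.
    """
    if not stop_process(pid, "normal"):
        return "not-delivered"
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_alive():
            return "ended"
        time.sleep(poll_interval)
    stop_process(pid, "force")
    return "forced"
```

The point of the split is that all waiting happens in the client, so the
agent never has to block or answer out of order on behalf of a "stop."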

4. CID_QUERY_PROCESS_NAME seems awfully limited. What is a process "name" 
anyway? It'll vary from machine to machine. Is it the full path to the 
executable, or the basename, or something else? Does it include the 
command line arguments? I'd certainly like it to: the executable name 
alone is sometimes not helpful. Consider how useless it is to return the 
name "java" without the command line.
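
As a sketch of what a less limited answer to point 4 might carry, the
record below keeps the full path, the basename, and the arguments apart so
the client can pick what it needs. The class and field names are
hypothetical; nothing like this appears in the draft.

```python
import os
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessDescription:
    """Hypothetical reply payload for a richer process query."""

    executable_path: str
    arguments: List[str] = field(default_factory=list)

    @property
    def basename(self):
        # The short name alone, e.g. "java".
        return os.path.basename(self.executable_path)

    def command_line(self):
        # Full path plus arguments, quoted so it can be read back safely.
        return shlex.join([self.executable_path] + list(self.arguments))
```

For a JVM started as /usr/bin/java -jar agent.jar, basename gives only
"java", while command_line() also says which program the JVM is running.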

5. When using CID_LAUNCH_PROCESS to launch for a process group, the PID 
return value doesn't make sense. You need an array of PIDs. Either that, 
or CID_POSTRUN_EXE and CID_POSTRUN_GROUP should carry PIDs. Or something.

6. What's the relationship between the event CID_POSTRUN_EXE and the 
CID_PROCESS_LAUNCHED response to the CID_LAUNCH_PROCESS command? Maybe the 
event is only used for members of a group, not for an individual launch. 
If so, it should be clarified in the document. Also, this event doesn't 
include the PID of the just-started process, which leaves the client 
unable to (for example) attach to one of the constituent processes in a 
group later, or to identify or control that process through other means.

7. What's the relationship between the event CID_EXE_LAUNCH_FAILED and the 
response code CID_LAUNCH_FAILED for the command CID_LAUNCH_PROCESS? Again, 
maybe this is only used for the constituents of a group - this should be 
clarified in the document.

8. The "attach" command uses a process ID, so you can attach to processes 
that you didn't launch. But the CID_PROCESS_ENDED event uses a descriptor 
ID - this means it can only report on processes that were launched using 
descriptors. I think the PROCESS_ENDED event should carry a PID.

9. There isn't a command to send an arbitrary (numbered) signal to a 
process. I know a signal isn't a universal concept, but it's common enough 
and useful enough to have it around. For example, many JVMs will drop a 
heap dump in response to a certain "kill" signal, both on Unix and 
Windows. For me, that's reason enough to make it part of the protocol.

10. The document should indicate that the PIDs used in the protocol are 
truly the PIDs as they are known to the underlying system. Otherwise an 
implementation would be free to synthesize its own PIDs, and they wouldn't 
match the ones needed for other commands outside this protocol. (If the 
system doesn't use numeric PIDs that fit in 8 bytes, this won't work.) 

11. This document should discuss the dangers of using PIDs. You can use a 
PID to query for a process name (for example), then send a STOP command to 
kill that process, but in between it's possible that the original process 
died and a new one was started with the same PID. In that case you aren't 
killing the process you think you're killing. There's nothing to do about 
this, but users of this protocol should be reminded of the hazards.

12. I don't think the EnvironmentVars part of the ExeDescriptor is 
described completely enough. This is a big deal. When it's present, is the 
entire environment replaced with the one found here? What is the format of 
the string - how is one environment variable separated from another? I'd 
like the client to be able to query the default environment, the one that 
the agent itself was started with. Then the client can use that as a basis for 
building a new environment: keep most variables, prepend/append stuff to 
some, delete others. The ability to query the agent's environment also 
gives the agent a way to communicate arbitrary information to the client: 
put it in an environment variable. For example, today's RAC config file 
makes use of RASERVER_HOME to set things like LIBPATH and CLASSPATH.
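
The "use the agent's environment as a basis" idea in point 12 could look
roughly like this on the client side. This is a sketch under assumptions:
the function name and its parameters are hypothetical, the base
environment is whatever a (not yet existing) query command would return,
and list-valued variables are assumed to be joined with a path separator.

```python
import os


def build_environment(base, set_vars=None, prepend=None, append=None,
                      delete=(), sep=os.pathsep):
    """Derive a new environment from a base one without modifying the base.

    Order of operations: delete, then set, then prepend, then append.
    """
    env = dict(base)  # keep most variables as they are
    for name in delete:
        env.pop(name, None)
    for name, value in (set_vars or {}).items():
        env[name] = value
    for name, value in (prepend or {}).items():
        old = env.get(name)
        env[name] = value + sep + old if old else value
    for name, value in (append or {}).items():
        old = env.get(name)
        env[name] = old + sep + value if old else value
    return env
```

A client would query the agent's environment once, call something like
build_environment(agent_env, prepend={"CLASSPATH": "/home/me/b.jar"},
delete=["TMPDIR"]), and send the result in the ExeDescriptor. Whether that
result replaces the whole environment or is merged into it is exactly the
question the document needs to answer.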

-- Allan Pratt, apratt@xxxxxxxxxx
Rational software division of IBM


