[hyades-dev] Notes on the Process Controller Interface document
Reading the Process Controller Interface document, I have some first-pass
thoughts.
I won't be at the Thursday meeting this week, but Kim Coleman will be. I
hope she'll guide the group through these points. With most of the
questions listed below, besides wanting to know the answer, I think the
answer should appear in the document. That way the NEXT reader doesn't
have to ask the same questions.
1. The terminology in the command names is a little inconsistent. Why do
we "add" an exe descriptor, but "create" a descriptor group? I think
"create" should be the verb for any operation that requires a subsequent
"delete." Also, I think EXEDESCRIPTOR should split into two words
everywhere it appears: EXE_DESCRIPTOR. Finally, since "delete" is used for
both exe descriptors and group descriptors, it should drop EXE from its
command name.
2. If a command creates something that needs to be deleted, the "create"
command's description should say so. In this draft, add_exedescriptor
doesn't mention the need for a subsequent "delete." The description of
create_descriptor_group does mention it.
3. Looking at CID_STOP_PROCESS, I think there's a lot more to say about
that command and its potential interactions with other commands. During
the waiting time, is the agent blocked from receiving and processing other
commands? Or is the waiting done on another thread, meaning the response
can come out-of-order with other commands and their responses? If you
don't want to deal with this, you can take the time element away from this
command. Define "stop" with two modes, "normal" and "force." Each returns
success if the signal was successfully delivered to the process,
regardless of whether it actually died as a result. If a client wants to,
it can watch for the PROCESS_ENDED event, or use QUERY_PROCESS_NAME to
poll for the continued existence of the process and decide whether to try
killing it more forcefully.
4. CID_QUERY_PROCESS_NAME seems awfully limited. What is a process "name"
anyway? It'll vary from machine to machine. Is it the full path to the
executable, or the basename, or something else? Does it include the
command line arguments? I'd certainly like it to: the executable name
alone is sometimes not helpful. Consider how useless it is to return the
name "java" without the command line.
5. When using CID_LAUNCH_PROCESS to launch for a process group, the PID
return value doesn't make sense. You need an array of PIDs. Either that,
or CID_POSTRUN_EXE and CID_POSTRUN_GROUP should carry PIDs. Or something.
6. What's the relationship between the event CID_POSTRUN_EXE and the
CID_PROCESS_LAUNCHED response to the CID_LAUNCH_PROCESS command? Maybe the
event is only used for members of a group, not for an individual launch.
If so, it should be clarified in the document. Also, this event doesn't
include the PID of the just-started process, which leaves the client
unable to (for example) attach to one of the constituent processes in a
group later, or to identify or control that process through other means.
7. What's the relationship between the event CID_EXE_LAUNCH_FAILED and the
response code CID_LAUNCH_FAILED for the command CID_LAUNCH_PROCESS? Again,
maybe this is only used for the constituents of a group - this should be
clarified in the document.
8. The "attach" command uses a process ID, so you can attach to processes
that you didn't launch. But the CID_PROCESS_ENDED event uses a descriptor
ID - this means it can only report on processes that were launched using
descriptors. I think the PROCESS_ENDED event should carry a PID.
9. There isn't a command to send an arbitrary (numbered) signal to a
process. I know a signal isn't a universal concept, but it's common enough
and useful enough to have it around. For example, many JVMs will drop a
heap dump in response to a certain "kill" signal, both on Unix and
Windows. For me, that's reason enough to make it part of the protocol.
10. The document should indicate that the PIDs used in the protocol are
truly the PIDs as they are known to the underlying system. Otherwise an
implementation would be free to synthesize its own PIDs, and they wouldn't
match the ones needed for other commands outside this protocol. (If the
system doesn't use numeric PIDs that fit in 8 bytes, this won't work.)
11. This document should discuss the dangers of using PIDs. You can use a
PID to query for a process name (for example), then send a STOP command to
kill that process, but in between it's possible that the original process
died and a new one was started with the same PID. In that case you aren't
killing the process you think you're killing. There's nothing to do about
this, but users of this protocol should be reminded of the hazards.
12. I don't think the EnvironmentVars part of the ExeDescriptor is
described completely enough. This is a big deal. When it's present, is the
entire environment replaced with the one found here? What is the format of
the string - how is one environment variable separated from another? I'd
like the client to be able to query the default environment, the one that
the agent itself was started with. Then the client can use that as a basis for
building a new environment: keep most variables, prepend/append stuff to
some, delete others. The ability to query the agent's environment also
gives the agent a way to communicate arbitrary information to the client:
put it in an environment variable. For example, today's RAC config file
makes use of RASERVER_HOME to set things like LIBPATH and CLASSPATH.
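To make point 12 concrete, here is a rough sketch (in Python) of what a
client might do once it can query the agent's default environment. Nothing
in it comes from the document under review: the helper name build_env, its
parameters, and the sample paths are all hypothetical, and the separator is
passed in explicitly because the agent's machine may not use the same one
as the client's.

```python
def build_env(base, prepend=None, append=None, delete=(), sep=":"):
    """Derive a new environment from `base` (hypothetical helper).

    base    -- dict of variables, e.g. the agent's default environment
    prepend -- dict of name -> value to put in front of the existing value
    append  -- dict of name -> value to put after the existing value
    delete  -- names to drop entirely
    sep     -- list separator used on the agent's machine (an assumption)
    """
    env = dict(base)  # keep most variables unchanged
    for name, value in (prepend or {}).items():
        old = env.get(name)
        env[name] = value if not old else value + sep + old
    for name, value in (append or {}).items():
        old = env.get(name)
        env[name] = value if not old else old + sep + value
    for name in delete:
        env.pop(name, None)
    return env


# Made-up sample data standing in for the environment the agent reports.
agent_env = {
    "RASERVER_HOME": "/opt/rac",
    "CLASSPATH": "/opt/rac/lib/a.jar",
    "TMP": "/tmp",
}

new_env = build_env(
    agent_env,
    prepend={"CLASSPATH": "/home/me/app.jar"},
    append={"LIBPATH": agent_env["RASERVER_HOME"] + "/lib"},
    delete=["TMP"],
)
```

The client would then send new_env in the ExeDescriptor, in whatever
string format the document ends up specifying for EnvironmentVars.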
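Going back to point 3: if "stop" simply delivers the signal and returns,
the waiting moves to the client. A minimal sketch of that client-side
logic, in Python, might look like the following. The agent object and its
stop() and process_exists() methods are hypothetical stand-ins for the
stop command and for polling with QUERY_PROCESS_NAME; FakeAgent exists
only so the sketch runs on its own.

```python
import time


def stop_process(agent, pid, grace_seconds=5.0, poll_interval=0.5,
                 sleep=time.sleep):
    """Request a normal stop, poll for a while, then escalate to force.

    Polling by PID is subject to the PID-reuse hazard described in
    point 11; this sketch does nothing to guard against it.
    """
    agent.stop(pid, mode="normal")
    waited = 0.0
    while waited < grace_seconds:
        if not agent.process_exists(pid):
            return "stopped"
        sleep(poll_interval)
        waited += poll_interval
    if not agent.process_exists(pid):
        return "stopped"
    agent.stop(pid, mode="force")
    return "forced"


class FakeAgent:
    """Stand-in for an agent connection; its process ignores normal stops."""

    def __init__(self):
        self.alive = {1234}
        self.calls = []

    def stop(self, pid, mode):
        # "Success" here only means the request was delivered.
        self.calls.append((pid, mode))
        if mode == "force":
            self.alive.discard(pid)

    def process_exists(self, pid):
        return pid in self.alive


agent = FakeAgent()
result = stop_process(agent, 1234, grace_seconds=1.0, poll_interval=0.5,
                      sleep=lambda seconds: None)  # skip real sleeping
```

A client that prefers events could replace the polling loop with a wait
on PROCESS_ENDED; the escalation step would stay the same.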
-- Allan Pratt, apratt@xxxxxxxxxx
Rational software division of IBM