I tried updating my proxy to use a lock to protect the call that sends the event messages, and it makes no difference. Looking at the list.c code in org.eclipse.ptp.utils, it should not have made a difference, since the list is already protected with a lock and my function is just a call to proxy_svr_queue_msg, which in turn is nothing more than a call to AddToList. So it looks like something else is going wrong.
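Dave's reasoning can be checked with a small model. The sketch below is a hypothetical stand-in for a list with its own internal lock, in the spirit of list.c; the names msg_queue_t, queue_push, queue_pop, pusher_arg_t and push_many are invented for illustration, and the real AddToList and proxy_svr_queue_msg are not reproduced here. Because every push and pop takes the queue's own mutex, two threads enqueuing at once cannot corrupt the list, so wrapping a single push in a second lock changes nothing. Any corruption would have to come from somewhere else, such as how each message is built or the order in which messages are queued.

```c
#define _POSIX_C_SOURCE 200809L /* makes strdup visible */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical locked FIFO of heap-allocated message strings. */
typedef struct node {
    char *msg;
    struct node *next;
} node_t;

typedef struct {
    pthread_mutex_t lock; /* guards head, tail and count */
    node_t *head;
    node_t *tail;
    int count;
} msg_queue_t;

void queue_init(msg_queue_t *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->count = 0;
}

/* Appends msg; ownership passes to the queue. The list is only touched
 * with q->lock held, so concurrent callers are already serialized here
 * and an extra lock around this single call would be redundant. */
int queue_push(msg_queue_t *q, char *msg) {
    assert(msg != NULL);
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->msg = msg;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Removes and returns the oldest message, or NULL if the queue is empty.
 * The caller owns the returned string. */
char *queue_pop(msg_queue_t *q) {
    char *msg = NULL;
    pthread_mutex_lock(&q->lock);
    node_t *n = q->head;
    if (n != NULL) {
        q->head = n->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->count--;
        msg = n->msg;
    }
    pthread_mutex_unlock(&q->lock);
    free(n); /* free(NULL) is a no-op */
    return msg;
}

/* Hypothetical worker standing in for the main or the monitor thread. */
typedef struct {
    msg_queue_t *q;
    int n;
    const char *text;
} pusher_arg_t;

void *push_many(void *p) {
    pusher_arg_t *a = p;
    for (int i = 0; i < a->n; i++) {
        char *m = strdup(a->text);
        if (m != NULL && queue_push(a->q, m) != 0)
            free(m);
    }
    return NULL;
}
```

Running two push_many threads against one queue and then draining it should always yield every message exactly once, which is the property the extra lock was meant to guarantee.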
Dave
Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/30/2007 09:25 AM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>, ptp-dev-bounces@xxxxxxxxxxx
Subject: Re: [ptp-dev] Error on ProxyPacket
Greg
I am getting the failure intermittently. I ran my application a couple of times with my proxy, shut the proxy down, restarted it, and got the error. For some reason I'm seeing this fairly consistently this morning, so I've attached 5 logs. log5 is the simplest case: I just started a remote proxy, ran my program, and it failed immediately.
Looking at the logs, it appears something goes wrong just after the event containing the list of processes in the job is processed. I looked at my code, and the sequence at this point, once I've fork/exec'd the poe app, is that my main thread should send a new job event and then an ok event. In the meantime, a second thread I created is watching for the attach.cfg file that poe creates, so I can get the task/node/pid mapping for my program. Once I have that, I create and send the event with the task list. Then that thread exits, and at that point the next message I get should be a process change event with stdout text. I don't seem to get that. It's possible I'm doing something wrong, but this code has been running for a few months now without problems until recently.
The only possible problem I see is that I don't properly synchronize between my main thread, which issues the new job event, and the monitor thread, which sends the task list, so in theory I could send the task list before the new job message. In reality, I think that's almost impossible, since it would mean that the poe process has to be fork/exec'd, the application tasks created, and then the attach.cfg file created, all before my main thread issues a new job event and an ok event.
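Dave considers the missing synchronization between the main thread and the monitor thread unlikely to matter, but it is cheap to rule out. A minimal sketch, assuming POSIX threads and using hypothetical names (job_gate_t, gate_open, gate_wait, monitor_thread, and a small event_log_t used only to record the order of events), makes the monitor thread wait until the main thread has queued the new job event before it sends the task list. The event strings "NEWJOB", "OK" and "TASKLIST" are placeholders, not PTP event names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical one-shot gate: opened once the new job event is queued. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool new_job_sent;
} job_gate_t;

void gate_init(job_gate_t *g) {
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->new_job_sent = false;
}

/* Main thread: call right after queuing the new job event. */
void gate_open(job_gate_t *g) {
    pthread_mutex_lock(&g->lock);
    g->new_job_sent = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Monitor thread: call before queuing the task list. The loop guards
 * against spurious wakeups from pthread_cond_wait. */
void gate_wait(job_gate_t *g) {
    pthread_mutex_lock(&g->lock);
    while (!g->new_job_sent)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Thread-safe record of the order in which events were queued. */
#define MAX_EVENTS 8
typedef struct {
    pthread_mutex_t lock;
    const char *events[MAX_EVENTS];
    int n;
} event_log_t;

void elog_init(event_log_t *el) {
    pthread_mutex_init(&el->lock, NULL);
    el->n = 0;
}

void elog_add(event_log_t *el, const char *ev) {
    pthread_mutex_lock(&el->lock);
    assert(el->n < MAX_EVENTS);
    el->events[el->n++] = ev;
    pthread_mutex_unlock(&el->lock);
}

/* Returns the position of ev in the log, or -1 if it was never queued. */
int elog_index(event_log_t *el, const char *ev) {
    int idx = -1;
    pthread_mutex_lock(&el->lock);
    for (int i = 0; i < el->n; i++) {
        if (strcmp(el->events[i], ev) == 0) {
            idx = i;
            break;
        }
    }
    pthread_mutex_unlock(&el->lock);
    return idx;
}

typedef struct {
    job_gate_t *gate;
    event_log_t *elog;
} monitor_arg_t;

/* Hypothetical monitor thread: after finding attach.cfg (not modeled
 * here), it waits for the gate before sending the task list. */
void *monitor_thread(void *p) {
    monitor_arg_t *a = p;
    gate_wait(a->gate);
    elog_add(a->elog, "TASKLIST");
    return NULL;
}
```

With the gate in place the task list can never be queued ahead of the new job event, however the two threads are scheduled; the relative order of the ok event and the task list is still left open.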
What might be happening is a second potential race condition between this same monitor thread and the main thread, where the main thread is generating the process change event with stdout text. I don't have a lock on the function that enqueues the event message, so it's possible that both threads are trying to create events at the same time and one thread's message gets trashed. I tried running with my proxy redirecting stdout to a file, and I didn't see the problem. As soon as I ran with the proxy generating process change events again, the problem came back.
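One way a "trashed" message could arise, assuming (the encoder is not shown in this thread) that events are formatted into a buffer shared between threads: both threads write into the same bytes, and one frame's length header ends up next to another frame's body. The hypothetical encode_frame below avoids that by building each complete frame, length header plus body, in a buffer owned by the calling thread. The framing itself (8 hex digits of body length followed by the body, with the hypothetical constant LEN_DIGITS) is an assumption based on Dave's reading of ProxyPacket.read elsewhere in this thread, not the documented PTP wire format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN_DIGITS 8 /* assumed width of the hex length header */

/* Builds "<8 hex digits of body length><body>" in a freshly allocated
 * buffer. Each caller gets its own buffer, so two threads encoding at the
 * same time cannot overwrite each other's half-built frames (as they
 * could with a single static buffer). Returns NULL if the body is too
 * long for the header or allocation fails; the caller frees the result. */
char *encode_frame(const char *body) {
    assert(body != NULL);
    size_t len = strlen(body);
    if (len > 0xFFFFFFFFu)
        return NULL;
    size_t total = LEN_DIGITS + len + 1; /* header + body + NUL */
    char *frame = malloc(total);
    if (frame == NULL)
        return NULL;
    snprintf(frame, total, "%08lx%s", (unsigned long)len, body);
    return frame;
}
```

Handing the finished frame straight to a locked queue means no other thread ever sees it half-built; whether the real proxy encoder shares a buffer is something to check in its source.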
I'm not sure what proxy Clement is using. If that proxy also uses threads, that might explain what's going on.
Dave
Greg Watson <g.watson@xxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/28/2007 09:14 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc:
Subject: Re: [ptp-dev] Error on ProxyPacket
I just committed a check for NumberFormatException. Can you send the output?
Greg
On Nov 28, 2007, at 1:35 PM, Dave Wootton wrote:
Is there some sort of switch I can turn on, such as a compile flag, that will print the actual message data? Otherwise, the closest I think I can get is to paste the last event message that was logged in the console (before the failing message) and hope that gets close enough to where the problem is.
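No such switch is described in this thread. As a sketch of what one could look like on the C proxy side, assuming a hypothetical compile flag PROXY_DUMP_MSGS and hypothetical helpers escape_msg, DUMP_MSG and DUMP_STR (none of these are known to exist in PTP), each outgoing message is printed to stderr with non-printable bytes escaped and wrapped in brackets, so that a stray leading space like the one in " 00df:00" stays visible.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies msg[0..len) into out as printable text. Non-printable bytes and
 * the backslash itself are written as \xNN. Output is truncated to fit
 * and always NUL-terminated when outsz > 0. Returns characters written. */
size_t escape_msg(const char *msg, size_t len, char *out, size_t outsz) {
    assert(msg != NULL || len == 0);
    size_t o = 0;
    if (outsz == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)msg[i];
        if (isprint(c) && c != '\\') {
            if (o + 1 >= outsz)
                break;
            out[o++] = (char)c;
        } else {
            if (o + 4 >= outsz)
                break;
            snprintf(out + o, outsz - o, "\\x%02x", c);
            o += 4;
        }
    }
    out[o] = '\0';
    return o;
}

/* Hypothetical compile-time switch: build with -DPROXY_DUMP_MSGS to log
 * every message as "[...]" on stderr. Otherwise the macro expands to
 * ((void)0) and its arguments are not evaluated at all. */
#ifdef PROXY_DUMP_MSGS
#define DUMP_MSG(m, n)                                     \
    do {                                                   \
        char dump_buf_[1024];                              \
        escape_msg((m), (n), dump_buf_, sizeof dump_buf_); \
        fprintf(stderr, "proxy msg [%s]\n", dump_buf_);    \
    } while (0)
#else
#define DUMP_MSG(m, n) ((void)0)
#endif

/* Convenience wrapper for NUL-terminated messages. */
#define DUMP_STR(s) DUMP_MSG((s), strlen(s))
```

Calling DUMP_STR just before each message is queued would log exactly what the proxy sends, at no cost in builds without the flag.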
Dave
Greg Watson <g.watson@xxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/27/2007 01:41 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc:
Subject: Re: [ptp-dev] Error on ProxyPacket
Can you get the whole proxy message that caused the error? That way we'd at least know where it was coming from.
Greg
On Nov 26, 2007, at 3:43 PM, Dave Wootton wrote:
I can't recreate this consistently. Sometimes I run my application without problems, but then the next time I get the exception. My traceback differs slightly in the frames for Integer.parseInt and whatever it calls, probably due to a different Java runtime, but is identical before that. Looking at the code, it seems the ProxyPacket.read method is trying to parse what it thinks is an 8-hex-digit integer and failing when it sees the ' ' at the start of the string. Since I can't reliably recreate this, I'm not sure what's happening. A couple of possibilities are that whatever is generating the packet sometimes generates garbage for the length strings, or that the communication sequence is out of sync and the read method is reading something that is not really a length string.
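To make Dave's description concrete: a reader that expects the next 8 characters to be a hex length is not looking at a length field if any of them falls outside [0-9a-fA-F]. The C sketch below shows such a strict check; parse_len_header and hex_value are hypothetical names, the Java ProxyPacket.read code is not reproduced here, and the 8-digit hex format is Dave's reading rather than a confirmed specification. It rejects " 00df:00" at its very first character. Returning false instead of failing deep inside a parse routine leaves the caller free to log the raw bytes, which is the kind of context Greg asks for in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Converts one hex digit to its value, or returns -1 if c is not [0-9a-fA-F]. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical strict parser for an assumed 8-hex-digit length header.
 * Spaces, signs and separators are all rejected, so a reader that has
 * lost sync with the stream fails on the first bad character. */
bool parse_len_header(const char *buf, size_t avail, unsigned long *out) {
    assert(out != NULL);
    if (buf == NULL || avail < 8)
        return false;
    unsigned long v = 0;
    for (int i = 0; i < 8; i++) {
        int d = hex_value(buf[i]);
        if (d < 0)
            return false;
        v = v * 16 + (unsigned long)d;
    }
    *out = v;
    return true;
}
```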
Dave
Clement Kam Man Chu <clement.chu@xxxxxxxxxxxxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/26/2007 03:06 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc:
Subject: Re: [ptp-dev] Error on ProxyPacket
Dave Wootton wrote:
I've also been seeing this intermittently for a while. I updated from head today and just saw this again, using a remote proxy.
Dave
Hi Dave,
Do you know how to reproduce this error? I'm not sure myself, because it does not occur frequently. It has sometimes occurred after I launched a debug job with a large number of processes.
Clement
Clement Kam Man Chu <clement.chu@xxxxxxxxxxxxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/21/2007 10:38 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc:
Subject: [ptp-dev] Error on ProxyPacket
Hi,
I got the following error from the latest version of head.
java.lang.NumberFormatException: For input string: " 00df:00"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
	at java.lang.Integer.parseInt(Integer.java:447)
	at org.eclipse.ptp.proxy.packet.ProxyPacket.read(ProxyPacket.java:157)
	at org.eclipse.ptp.proxy.client.AbstractProxyClient.sessionProgress(AbstractProxyClient.java:354)
	at org.eclipse.ptp.proxy.client.AbstractProxyClient.access$8(AbstractProxyClient.java:352)
	at org.eclipse.ptp.proxy.client.AbstractProxyClient$2.run(AbstractProxyClient.java:297)
Clement
--
Clement Kam Man Chu
Research Assistant
Faculty of Information Technology
Monash University, Caulfield Campus
Ph: 61 3 9903 2355
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev
[Attachments: log, log5, log2, log3, log4]