Re: [ptp-dev] Error on ProxyPacket

The problem seems to occur only when receiving an attribute change event on a process. The string " 00e9:00" looks to me like the beginning of a CHANGE_PROCESS event that has been truncated to 8 characters. This seems to imply that either (a) processing the previous event consumed 8 characters too many (each packet should begin with an 8 character length field), or (b) the 8 character length field was not prepended to the packet when it was sent. It's hard to see how either of these could occur.
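To make the two possibilities concrete, here is a minimal Java sketch of length-prefixed framing as described above: an 8-character hexadecimal length field followed by the body. It is not the actual ProxyPacket code. The class name, the method names, and the example payload are assumptions made up for illustration. The sketch shows that if a body arrives without its length prefix (case b), the reader takes the first 8 characters of the body as the length and fails in Integer.parseInt.

```java
// Hypothetical sketch; not the real org.eclipse.ptp ProxyPacket implementation.
class FramingSketch {
    // Each packet starts with an 8-character hex length field (per the thread).
    static final int LENGTH_FIELD = 8;

    // Prepend the 8-character, zero-padded hex length to the body.
    static String frame(String body) {
        return String.format("%08x", body.length()) + body;
    }

    // Read the body of the packet that starts at 'offset' in 'stream'.
    // Throws NumberFormatException if the 8 characters at 'offset' are not hex.
    static String readBody(String stream, int offset) {
        String lengthField = stream.substring(offset, offset + LENGTH_FIELD);
        int length = Integer.parseInt(lengthField, 16);
        int start = offset + LENGTH_FIELD;
        return stream.substring(start, start + length);
    }

    public static void main(String[] args) {
        String first = frame("hello");
        // Made-up event body that was sent without its length prefix.
        String unframed = " 00e9:00 (rest of a hypothetical event)";
        String stream = first + unframed;

        System.out.println(readBody(stream, 0));
        try {
            readBody(stream, first.length());
        } catch (NumberFormatException e) {
            // The reader consumed " 00e9:00" as if it were the length field.
            System.out.println("out of sync: " + e.getMessage());
        }
    }
}
```

The same failure appears in case (a): if the previous read consumed 8 characters too many, the next read starts 8 characters into the following packet, which is again body text rather than a length field.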

I added some packet debugging. Can you run again and send me the trace?

Greg

On Nov 30, 2007, at 9:51 AM, Dave Wootton wrote:

I tried updating my proxy to use a lock to protect the call to send the event messages and it makes no difference. Looking at the list.c code in org.eclipse.ptp.utils, it should not have made a difference since the list is already protected with a lock and my function is just a call to proxy_svr_queue_msg, which is in turn nothing more than a call to AddToList. So it looks like something else is going wrong.

Dave



Dave Wootton/Poughkeepsie/IBM@IBMUS
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/30/2007 09:25 AM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>, ptp-dev-bounces@xxxxxxxxxxx
Subject: Re: [ptp-dev] Error on ProxyPacket

Greg
I am getting the failure intermittently. I ran my application a couple of times with my proxy, shut the proxy down, restarted, and got the error. For some reason, I'm seeing this fairly consistently this morning, so I've attached 5 logs. log5 is the simplest case, where I just started a remote proxy, ran my program, and it failed immediately.

Looking at the logs, it appears something is going wrong just after the event containing the list of processes in the job is processed. I looked at my code, and the sequence at this point, where I've fork/exec'd the poe app, is that I should be sending a new job event, then an ok event, from my main thread. In the meantime, a second thread I created is watching for the attach.cfg file poe creates so I can get the task/node/pid mapping for my program. Once I have that, I create and send the event with the task list.

Then that thread exits, and at that point, the next message I get should be a process change event with stdout text. I don't seem to get that. It's possible I'm doing something wrong, but this code has been running for a few months now without problems until recently. The only possible problem I have is that I don't properly synchronize between my main thread, which issues the new job event, and the monitor thread, which sends the task list, so in theory I could send the task list before the new job message. In reality, I think that's almost impossible, since it would mean that the poe process is fork/exec'd, the application tasks are created, and the attach.cfg file is created, all before my main thread issues a new job event and an ok event.

What might be happening is that I have a second potential race condition between this same monitoring thread and the main thread, where the main thread is generating the process change event with stdout text. I don't have a lock on the function that enqueues the event message, so it's possible that both threads are trying to create events and one thread's message gets trashed. I tried running with my proxy redirecting stdout to a file and I didn't see the problem. As soon as I ran with the proxy generating process change events again, I got the problem back.
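The kind of guard being considered here can be sketched as follows. The proxy itself is written in C, so this Java version is only an illustration of the idea, and every name in it (EventQueueSketch, enqueue, runTwoProducers) is hypothetical. Two threads, standing in for the main thread and the monitor thread, add events to one shared queue, and the enqueue method is synchronized so that each insertion completes before the other thread can start its own.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration; the real proxy queues messages from C code.
class EventQueueSketch {
    private final Queue<String> pending = new ArrayDeque<>();

    // Both producer threads call this; 'synchronized' makes each insertion
    // atomic with respect to the other thread.
    synchronized void enqueue(String message) {
        pending.add(message);
    }

    synchronized int size() {
        return pending.size();
    }

    // Start two producers that each enqueue 'perThread' events, wait for both
    // to finish, and return how many events ended up in the queue.
    static int runTwoProducers(int perThread) {
        EventQueueSketch queue = new EventQueueSketch();
        Runnable producer = () -> {
            for (int i = 0; i < perThread; i++) {
                queue.enqueue("event " + i);
            }
        };
        Thread mainEvents = new Thread(producer, "main-events");
        Thread monitorEvents = new Thread(producer, "monitor-events");
        mainEvents.start();
        monitorEvents.start();
        try {
            mainEvents.join();
            monitorEvents.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println(runTwoProducers(1000));
    }
}
```

With the lock in place no insertion is lost, so the queue holds exactly twice perThread entries. A lock of this kind only protects the queue itself; it does not order the events from the two threads relative to each other.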

I'm not sure what proxy Clement is using. If that proxy also uses threads, that might explain what's going on.

Dave




Greg Watson <g.watson@xxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/28/2007 09:14 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Subject: Re: [ptp-dev] Error on ProxyPacket

I just committed a check for NumberFormatException. Can you send the output?

Greg

On Nov 28, 2007, at 1:35 PM, Dave Wootton wrote:

Is there a switch of some sort I can turn on, such as a compile flag, that will print out the actual message data? Otherwise, the closest I think I can get is to paste the last event message that was logged in the console (before the failing message) and hope that gets close enough to where the problem is.
Dave



Greg Watson <g.watson@xxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/27/2007 01:41 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Subject: Re: [ptp-dev] Error on ProxyPacket

Can you get the whole proxy message that caused the error? That way
we'd at least know where it was coming from.

Greg

On Nov 26, 2007, at 3:43 PM, Dave Wootton wrote:

I can't recreate this consistently. Sometimes I run my application without problems, but then the next time I get the exception. My traceback differs slightly at the stack entries for Integer.parseInt and whatever it calls, probably due to a different Java runtime, but is identical before that. Looking at the code, it looks like the ProxyPacket.read method is trying to parse what it thinks is an 8 hex digit integer and failing when it sees the ' ' at the start of the string. Since I can't reliably recreate this, I'm not sure what's happening. A couple of possibilities are that whatever is generating the packet sometimes generates garbage for the length string, or that the communications sequence is out of sync and the read method is reading something which is not really a length string.
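One way to tell these two cases apart is to validate the length field before parsing it, so that the raw text can be logged when it is not a length string. The following Java sketch assumes only what is stated above, that the field is 8 hex digits. The class and method names are hypothetical and are not part of ProxyPacket.

```java
// Hypothetical helper; not part of the org.eclipse.ptp code base.
class LengthFieldCheck {
    static final int LENGTH_FIELD = 8;

    // True only if the field is exactly 8 characters, all hexadecimal digits.
    static boolean isValidLengthField(String field) {
        if (field.length() != LENGTH_FIELD) {
            return false;
        }
        for (int i = 0; i < field.length(); i++) {
            if (Character.digit(field.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Parse the field as hex, or return -1 so the caller can log the raw text
    // instead of failing inside the parser. A long is used because eight hex
    // digits can exceed Integer.MAX_VALUE.
    static long parseLengthOrMinusOne(String field) {
        return isValidLengthField(field) ? Long.parseLong(field, 16) : -1L;
    }

    public static void main(String[] args) {
        System.out.println(parseLengthOrMinusOne("000000df"));
        System.out.println(parseLengthOrMinusOne(" 00df:00"));
    }
}
```

A field such as " 00df:00" is rejected because of the leading space and the colon, which suggests the reader is positioned inside an event body and not at a packet boundary.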
Dave



Clement Kam Man Chu <clement.chu@xxxxxxxxxxxxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/26/2007 03:06 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Subject: Re: [ptp-dev] Error on ProxyPacket

Dave Wootton wrote:
I've also been seeing this intermittently for a while. I updated from head today and just saw this again, using a remote proxy.
Dave



Hi Dave,

Do you know how to reproduce this error? I am not sure how to, because it does not occur frequently. It has sometimes occurred after I launched a debug job with a large number of processes.

Clement

Clement Kam Man Chu <clement.chu@xxxxxxxxxxxxxxxxxxxxxx>
Sent by: ptp-dev-bounces@xxxxxxxxxxx
11/21/2007 10:38 PM
Please respond to: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Subject: [ptp-dev] Error on ProxyPacket

Hi,

I got the following error from the latest version of head.

java.lang.NumberFormatException: For input string: " 00df:00"
 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Integer.parseInt(Integer.java:447)
 at org.eclipse.ptp.proxy.packet.ProxyPacket.read(ProxyPacket.java:157)
 at org.eclipse.ptp.proxy.client.AbstractProxyClient.sessionProgress(AbstractProxyClient.java:354)
 at org.eclipse.ptp.proxy.client.AbstractProxyClient.access$8(AbstractProxyClient.java:352)
 at org.eclipse.ptp.proxy.client.AbstractProxyClient$2.run(AbstractProxyClient.java:297)

Clement




--
Clement Kam Man Chu
Research Assistant
Faculty of Information Technology
Monash University, Caulfield Campus
Ph: 61 3 9903 2355

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev


