Bug 190803 - [dstore] Canceling a long-running dstore job prints "InterruptedException" to stdout and makes all Eclipse freeze ui later on
Summary: [dstore] Canceling a long-running dstore job prints "InterruptedException" to...
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 2.0.1   Edit
Assignee: David McKnight CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords:
Depends on: 188160 190013
Blocks: 190750 176606 191280
  Show dependency tree
 
Reported: 2007-06-04 09:28 EDT by Martin Oberhuber CLA
Modified: 2007-07-10 09:03 EDT (History)
0 users

See Also:
mober.at+eclipse: pmc_approved+
ddykstal.eclipse: review+
mober.at+eclipse: review+
mober.at+eclipse: review+


Attachments
patch to not print stack traces (4.03 KB, patch)
2007-06-07 09:13 EDT, David McKnight CLA
no flags Details | Diff
updated patch according to Martin's comments (10.59 KB, patch)
2007-06-08 11:02 EDT, David McKnight CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2007-06-04 09:28:37 EDT
+++ This bug was initially created as a clone of Bug #190013 +++

When trying to reproduce bug #190013, I did this:
* connected to a Solaris dstore-unix server (szg-anar, Running).
* Selected file "hostname.txt file in the "My Home" folder (/folk/mober).
* Choose Right-click > Move.
  --> Move to: Dialog was shown
* In the Move to: dialog, I went down to the "Root" folder and expanded
  /folk/ which resulted in a very long-running operation (Showing "Pending...")
* While still pending, I tried to also expand "My Home"/aatmp
  which also showed Pending
* After a while, cancel the "Move To" dialog. Go to the Eclpise Progress view.
  Cancel job "RSE Subsystem Operation: Expand /folk"
  --> Get the exception below printed to stdout:
* Select "My Home"/hostname.txt" again, right-click > Move To
  --> Eclipse freezes with sandclock (unusable) for about 4 minutes
  --> After that, "Move To" dialog is shown but "My Home" is not expandable
      (there is no + sign below it)
* Choosing "Refresh" on the "My HOme" item in the move to dialog starts another
  VERY slow job -- perhaps it's trying to refresh too many parents? In which
  case this might be a duplicate of bug #188160.

Here is the InterruptedException printed:


java.lang.InterruptedException
	at org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:343)
	at org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:249)
	at org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:88)
	at org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:1231)
	at org.eclipse.rse.internal.services.dstore.files.DStoreFileService.getFilesAndFolders(DStoreFileService.java:1177)
	at org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.getFilesAndFolders(FileServiceSubSystem.java:293)
	at org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.listFoldersAndFiles(FileServiceSubSystem.java:331)
	at org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.listFoldersAndFiles(RemoteFileSubSystem.java:948)
	at org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.internalResolveOneFilterString(RemoteFileSubSystem.java:841)
	at org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.internalResolveFilterString(RemoteFileSubSystem.java:782)
	at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterString(SubSystem.java:2153)
	at org.eclipse.rse.internal.files.ui.view.SystemViewRemoteFileAdapter.internalGetChildren(SystemViewRemoteFileAdapter.java:692)
	at org.eclipse.rse.internal.files.ui.view.SystemViewRemoteFileAdapter.getChildren(SystemViewRemoteFileAdapter.java:567)
	at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:236)
	at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:109)
	at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:1968)
	at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:196)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 1 Martin Oberhuber CLA 2007-06-04 09:31:31 EDT
Eventually, the Refresh job for "My Home" returns but the My Home filter still has no children (no + icon shown). This makes dstore totally unusable for me.
Comment 2 Martin Oberhuber CLA 2007-06-04 09:32:27 EDT
On the Server side, no exceptions are printed, but also in the SystemView, a refresh on "My Home" is now unresponsive.
Comment 3 Martin Oberhuber CLA 2007-06-05 11:38:37 EDT
Discussed in committer meeting 5-jun-2007:

Problem is that the dstore server is still busy running listFiles() query.
It's running in the same thread as all threads of the file system miner.

Making it cancelable is straightforward (we actually do it for other queries): java.io.listFiles() would need to run in a separate Thread and respond to dstore cancellation messages.

Would need to add a new command for the miner, no change to dstore protocol or API. Unsure if doing this would lead to any performance issues with regular queries. 

Assigning RC3 in order to take out the "InterruptedException" printout; investigate but potentially defer to 2.0.1. Potentially reduce severity since it's dstore only.
Comment 4 David McKnight CLA 2007-06-07 09:13:08 EDT
Created attachment 70494 [details]
patch to not print stack traces

first part of this is to not show the stack traces
Comment 5 David McKnight CLA 2007-06-07 16:52:57 EDT
Please review the patch for not printing the exception. 
Comment 6 Martin Oberhuber CLA 2007-06-08 01:27:28 EDT
I'm OK with catching away InterruptedException without logging it. 
But other Exception cought should be logged.
Therefore, at least the first two occurrences in the patch should be changed.

Second, I'm wondering if there is any case where we have a progress monitor (that's probably not directly related to the InterruptedException) that we should set canceled:
   monitor.setCanceled()

Third, there is a problem with not being able to tell InterruptedException from operation cancellation. This will be addressed in the future through bug #190750. Therefore, I recommend the following for now: For the InterruptedException, please remove the commented-out stacktrace print and replace it by a comment e.g.
  //InterruptedException is used to report user cancellation, so no need to log
  //This should be reviewed (use OperationCanceledException) with bug #190750
Comment 7 David McKnight CLA 2007-06-08 11:02:35 EDT
Created attachment 70695 [details]
updated patch according to Martin's comments
Comment 8 David McKnight CLA 2007-06-08 11:06:14 EDT
(In reply to comment #6)
> I'm OK with catching away InterruptedException without logging it. 
> But other Exception cought should be logged.
> Therefore, at least the first two occurrences in the patch should be changed.

It turns out that those first two occurrences can only be InterruptedExceptions as well, so I've changed the catch to reflect that in my patch.

> 
> Second, I'm wondering if there is any case where we have a progress monitor
> (that's probably not directly related to the InterruptedException) that we
> should set canceled:
>    monitor.setCanceled()
> 

My new patch takes care of that if the monitor has not yet been canceled.

> Third, there is a problem with not being able to tell InterruptedException from
> operation cancellation. This will be addressed in the future through bug
> #190750. Therefore, I recommend the following for now: For the
> InterruptedException, please remove the commented-out stacktrace print and
> replace it by a comment e.g.
>   //InterruptedException is used to report user cancellation, so no need to log
>   //This should be reviewed (use OperationCanceledException) with bug #190750
> 

I've updated the catch statements with your suggestion.  Please review this again, thanks.
Comment 9 Martin Oberhuber CLA 2007-06-11 09:43:58 EDT
Hm... patch looks better now, but when I browse from DStoreFileService into DownloadListener.waitForUpdate(0), I see that the code contradicts what you write in the comment:

This method returns when the user canceled, but throws an InterruptedException when the network is down. I'm wondering if we shouldn't treat the "Network Down" situation differently.

I'm basically OK with applying the patch now since it doesn't seem to do harm to the codebase and improve the situation. But PLEASE review the docs to have them consistent with what the code does. Also, the docs "what's being done" should be BEFORE the "thing that's done" and not after it. 

You can make these changes through a separate bug that you file (e.g. "Review Documentation around DownloadStatusListener.waitForUpdate()") or try doing it right away through this bug. 

Since I'm asking for Doc changes only from here on, I'm approving the patch.
Comment 10 David McKnight CLA 2007-06-11 10:13:05 EDT
I've applied the patch for the interrupted exception.  I've lowered the priority of this defect and moved it's milestone to 2.0.1 to deal with the cancellation.  For the canceled vs network down stuff, I think I'll take care of this later in 2.0.1 since network down doesn't appear to be used right now and I may put the code for issuing the dstore cancel command in status monitor class.  The comments in the catch interrupted exception code are there to indicate why we're not logging anything, and are separate from the supplementary monitor.cancellation code.
Comment 11 David McKnight CLA 2007-07-09 14:13:09 EDT
Support for cancellation of query commands are now supported for dstore files.  For other operations, such as delete, rename, etc, we'll need to handle on a case-by-case basis.  For the particular problem reported by Martin, the issue isn't resolved by the cancel, since the more likely problem is that the query was successful but the volume of file info that is transferred back was so large that there appeared to be a hang.  Since the cancellation support is now here, I'll mark this as fixed.
Comment 12 Martin Oberhuber CLA 2007-07-10 07:13:37 EDT
Having cancellation support in RSE 2.0.1, what will happen if a 2.0.1 client operates against a 2.0 server? Or a 2.0 client against a 2.0.1 server? Could there be any issues that need to be looked at?
Comment 13 David McKnight CLA 2007-07-10 09:03:06 EDT
(In reply to comment #12)
> Having cancellation support in RSE 2.0.1, what will happen if a 2.0.1 client
> operates against a 2.0 server? Or a 2.0 client against a 2.0.1 server? Could
> there be any issues that need to be looked at?
> 

If a 2.0.1 client operates against a 2.0 server, when the user clicks cancel, the datastore status monitor looks for the cancel command to send to the host but can't find it, since, on the 2.0 server, the schema for file commands aren't described as allowing cancel.  Since no cancel command is found, no cancel command is issued and the behaviour remains as it did in 2.0.   

If a 2.0 client operations against a 2.0.1 server, when the user clicks cancel, the client does not attempt to send a cancel to the host.  The behaviour remains the same as it did in 2.0.