Bug 231430 - Proxy resolution failures should be cached
Summary: Proxy resolution failures should be cached
Status: VERIFIED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: XML/XMI (show other bugs)
Version: 2.4.0   Edit
Hardware: PC Windows XP
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: Ed Merks CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 278347 (view as bug list)
Depends on:
Blocks: 233165
  Show dependency tree
 
Reported: 2008-05-10 09:42 EDT by Ed Willink CLA
Modified: 2009-06-25 03:00 EDT (History)
4 users (show)

See Also:


Attachments
JUnit test demonstrating problem (2.85 KB, text/plain)
2008-05-15 16:30 EDT, Ed Willink CLA
no flags Details
Handle a failure to create an stream just as if the resource were loaded from an empty stream (2.21 KB, patch)
2008-05-15 18:33 EDT, Ed Merks CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ed Willink CLA 2008-05-10 09:42:01 EDT
If proxy resolution fails, resolution does not occur, so a further attempt to access the same object or another object with the same URI attempts resolution again and again and again.

In one ridiculous case in which an 'http:...' URI was unregistered, it a Junit test took 200 seconds to process a modest sized resource because of the repeated attempts to resolve the same bad proxy on the web.

If proxy resolution fails, all further resolutions of the same URI could usefully be bypassed, possibly by installing a bad resource place-holder in the URI map.
Comment 1 Ed Merks CLA 2008-05-10 17:11:06 EDT
To resolve a proxy, a resource is created and one attempt is made to load that resource.  No further attempts will be made to load the resource until it's unloaded first.  Of course if you run tests each one creating a new resource set, each one will fail, but I wouldn't entertain the idea of creating a global list of bad URIs never to try again.  Failures can be transient, and other processing can result in the resource being created.  So I don't really see a need for more than what we have, i.e., try once and cache the failed result.
Comment 2 Ed Willink CLA 2008-05-10 17:17:18 EDT
It may be that each individual proxy is blocked, but given a model with 100 proxies to another, an invocation of an algorithm such as resolve all proxies will fail 100 times rather than just once.

I'm certainly not advocating a global list, the URI map would be in the ResourceSet so each new ResourceSet starts afresh. Re-use of an existing ResourceSet re-uses accessed/failed accessed resources.
Comment 3 Ed Merks CLA 2008-05-10 17:28:13 EDT
An attempt to resolve a proxy that references a resource that has failed to load will just look up the fragment of the proxy URI within the resource with that trimmmed proxy URI.  This is a mapped lookup that returns null.  I don't see how checking some other map will improve anything...
Comment 4 Ed Willink CLA 2008-05-12 16:50:05 EDT
The failure is indeed cached by a resource, provided that the Resource.Factory resolution returns non-null.

The actual problem was my derived ResourceSet to support content type analysis. This is now redundant thanks to the EMF content types that seem pretty good.
Comment 5 Ed Willink CLA 2008-05-15 16:30:16 EDT
Created attachment 100544 [details]
JUnit test demonstrating problem

The problem is actaully far worse than I thought.

The attached example creates a model with a single bad proxy, then invokes EcoreUtil.resolveAll.

This results in 5 load exceptions per proxy being handled, rather than just one for all same-resource proxies.

The problem is that a Resource is created, but it is not loaded, so the 

        if (loadOnDemand && !resource.isLoaded())
        {
          demandLoadHelper(resource);
        }
        return resource;

test in ResourceSet.getResource keeps trying.
Comment 6 Ed Merks CLA 2008-05-15 18:33:33 EDT
Created attachment 100569 [details]
Handle a failure to create an stream just as if the resource were loaded from an empty stream
Comment 7 Ed Willink CLA 2008-05-16 02:25:36 EDT
Magic. This reduces one bad test case from 30 to 2 seconds.

I'm not sure why you leave the errors cleared. Surely the exception should be added to the errors so that an unsuccessful load can be externally diagnosed and distinguished from a successful empty load?

[I changed the resolution to INVALID because a bug in my obsolete code prevented the Resource getting added to the ResourceSet. I've reopened now that there is a definite problem in EMF as well.]
Comment 8 Ed Merks CLA 2008-05-16 09:47:01 EDT
The exception will be added as a diagnostic by ResourceSetImpl.handleDemandLoadException.
Comment 9 Dave Steinberg CLA 2008-05-16 18:34:36 EDT
What about if demand load isn't used?  What if you just create() it in the resource set and then load() it?  Won't this look as if it worked perfectly?
Comment 10 Ed Merks CLA 2008-05-16 18:41:54 EDT
If the client calls load directly, then they need to handle the IOException that's thrown by the call.  So they'll know a failure has occurred.  They can record it if they like.  Probably it would have been better for the load call itself to have recorded the exception, but since we can't know we're being called by demand load or directly, we're not in such a good position to change that behavior...

I doubt anyone will notice this change in behavior we're introducing except that broken proxies won't tank performance by repeatedly trying to open an input stream that can't be opened.
Comment 11 Dave Steinberg CLA 2008-05-16 18:47:19 EDT
Oops, I just missed the blatantly obvious: that the exception still gets thrown in the error case (which is obviously what triggers handleDemandLoadException() being called).  I'm good with this change.
Comment 12 Ed Merks CLA 2008-05-16 19:33:58 EDT
The fix is committed to CVS.
Comment 13 Nick Boldt CLA 2008-05-20 13:47:39 EDT
Fix available in HEAD: 2.4.0RC1 (S200805201049).
Comment 14 David Williams CLA 2008-05-22 13:52:17 EDT
I am re-opening this enhancement request since in bug 233165 Ed has agreed to hold off making this fix, until WTP has time/ability to react. (In addition, I think adopters _might_ have made similar assumptions about the meaning of 'isLoaded'. 

Perhaps too, this request could be done in a way that changed no existing behavior, but was more of an addition? After all, in comment #10 Ed says "I doubt anyone will notice this change in behavior" ... so, since someone has noticed this change in behavior, I'd suggest some re-thought and some wider, longer term communication to make sure no one else would "notice" it either. 

Comment 15 Ed Willink CLA 2008-05-22 15:00:46 EDT
I don't think there is a change.

A Resource in a ResourceSet has three states

'Indeterminate': !loaded
'Valid': loaded && errors.size() == 0
'Invalid': loaded && errors.size() != 0

The difference is that a failed proxy resolution left the Resource stuck in 'Indeterminate'. Now it is promoted to 'Invalid' a perfectly reasonable
state for rubbish content.

Any code that assumes loaded means 'Valid' was and still is wrong.
Comment 16 Ed Merks CLA 2008-05-22 15:18:44 EDT
Ed, I agree that this change is a good one, and that we should proceed with it, but given that it's happened late and folks were relying on the old faulty behavior, it seems best to defer this. I've set https://bugs.eclipse.org/bugs/show_bug.cgi?id=233165 to indicate that it's blocked by this. We'll commit this change at the start of the next release cycle.
Comment 17 David Williams CLA 2008-05-22 15:53:36 EDT
Thanks, I appreciate the discussion. 

I certainly don't know enough about EMF to be an expert on what's right and wrong, but from an out-sider (but indirect adopter :) point of view, where, is it documented that 

<quote>
A Resource in a ResourceSet has three states

'Indeterminate': !loaded
'Valid': loaded && errors.size() == 0
'Invalid': loaded && errors.size() != 0
</quote>

Even knowing that, reading through the JavaDoc it's not obvious to me that's what is meant. And ... it is probably there somewhere, I'm just saying I don't easily see it reading the JavaDoc for Resource and 'loaded' and 'errors'. 

Thanks again. 

Comment 18 Ed Merks CLA 2008-05-22 16:14:26 EDT
This is what it says for isLoaded:

  /**
   * ....
   * It will be set to <code>true</code> when the resource is {@link #load(Map) loaded}
   * and when {@link #getContents contents} are first added to a resource that isn't loaded.
   *
   */
  boolean isLoaded();

It makes it pretty clear that calling load should result in isLoaded becoming true.

In any case, I'm convinced that the change is the right one to make and I'm convinced that this is a bad time to do it.  If we do it at the start of the release cycle, WTP will have a year to react, so that seems best for the community as a whole, if not specifically to the EMF community.
Comment 19 David Williams CLA 2008-05-22 16:36:39 EDT
(In reply to comment #18)
> This is what it says for isLoaded:
> 

Yeah I read that, and was trying to say it's not clear to me that it means it should be true after a failed load with errors set.  

But, again, thanks for the education, and thanks for deferring. 

Comment 20 Ed Willink CLA 2008-05-23 01:43:51 EDT
I was trying to point out that hidden bugs are now perhaps being exposed. The kind of problem that is now visible is:

If the target of a proxy is rubbish:
   - bad XML syntax (SaxE)
   - bad URI (PNFE, HNFE, FNFE)
   - bad xmi:type (CCE)
and resolution occurs in casual code that cannot throw an exception, code
fails to observe the failure and so fails to diagnose the failure to the user leading to unhelpful behaviour. Now it can diagnose it.

So, while it may be appropriate to hold back the fix, it would be good for WTP to at least investigate the failures since there are probably good bugs to fix.
Comment 21 Ed Merks CLA 2008-05-23 09:19:27 EDT
Ed,

I totally agree.  It seems to me that the logic being used in WTP does reveal an error.  I.e., it's assuming the inability to create an input stream means the file doesn't exist when in fact it might mean that read permission isn't granted.  When it then reacts by trying to create the file, it might well overwrite an existing file.  So yes, WTP does need to investigate the assumptions in the impacted logic.  It's just a bad time for them to do that now.
Comment 22 Ed Merks CLA 2008-08-02 11:14:58 EDT
The change is committed to CVS for EMF 2.5.

David, note that WTP will need to take action on this now.
Comment 23 Nick Boldt CLA 2008-08-11 13:15:50 EDT
Fix available in HEAD: 2.5.0.I200808060700.
Comment 24 Ed Merks CLA 2009-05-29 06:27:13 EDT
*** Bug 278347 has been marked as a duplicate of this bug. ***
Comment 25 Dave Steinberg CLA 2009-06-25 03:00:25 EDT
Fix available in HEAD: 2.5.0 (R200906151043).