Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF
Summary: [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-debug-dsf (show other bugs)
Version: 6.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-23 16:47 EDT by Pawel Piech CLA
Modified: 2020-09-04 15:24 EDT (History)
8 users (show)

See Also:


Attachments
Patch with acpm utitlities prototype. (46.25 KB, patch)
2010-04-23 16:47 EDT, Pawel Piech CLA
john.cortell: iplog-
Details | Diff
Patch with test and ffix for the wait() bug. (20.89 KB, patch)
2010-10-13 18:09 EDT, Pawel Piech CLA
pawel.1.piech: iplog-
Details | Diff
Improvements (21.60 KB, patch)
2010-10-19 11:24 EDT, John Cortell CLA
no flags Details | Diff
Range cache. (51.10 KB, patch)
2010-10-19 16:51 EDT, Pawel Piech CLA
no flags Details | Diff
cache does validate (7.60 KB, patch)
2010-10-27 14:47 EDT, Ken Ryall CLA
ken.ryall: iplog-
Details | Diff
Unit tests for measuring exceptions performance in Transaction (29.15 KB, patch)
2010-11-12 19:16 EST, Pawel Piech CLA
no flags Details | Diff
Full test results. (8.27 KB, text/plain)
2010-11-12 19:18 EST, Pawel Piech CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pawel Piech CLA 2010-04-23 16:47:04 EDT
Created attachment 165970 [details]
Patch with acpm utitlities prototype.

ACPM is a programming model akin to how async. callback methods and single threaded executor is a programming model.  It is being promoted by the TCF project where it is used extensively in the TCF debugger integration with the UI.
The programming model tries to simplify the task of using async. callback based APIs while at the same time guaranteeing that the data retrieved through various async. methods is consistent with each other.

The basic idea behind ACPM is the following:
1) Each piece of data that is to be retrieved from an asyc. API is backed by a cache. 
2) The client of async. data operates only on a set of cache objects.
3) At the beginning of a transaction, client code verifies that all required caches are "valid", meaning that the cache data is up to date.
4) If all caches are valid, the transaction completes its calculation and returns.
4) If any of the cache objects are not valid, the transaction halts and any needed async requests are made to retrieve data from async. data sources.
5) When the async. data requests complete the transaction is re-started from the beginning (going back to step 3).

The main concern with this programming model is that there is a potential for the transaction never completing as a result of the cache objects being invalidated faster than their requests for data are fulfilled.  The current recommended solution for this situation is for the client to cancel such transaction.

---------------------
The above pattern can be implemented in various ways and there are a couple of implementations of it already in the TCF project.  E.g. org.eclipse.tm.tcf.uti.TCFDataCache in the org.eclipse.tm.tcf.core plugin.  A couple of weeks before EclipseCon I prototyped an implementation of a cache and transaction utilities to support ACPM, which would integrate well with existing DSF interfaces.  I haven't had time to work on this prototype since, but I want to post the work here so as not to lose it and to allow others to comment.
Comment 1 John Cortell CLA 2010-04-23 16:52:08 EDT
Glad you posted this. Bringing this up was on my TODO list ever since the TCF BOF at EclipseCON.
Comment 2 John Cortell CLA 2010-10-13 16:42:31 EDT
Pawel, is AbstractCache.wait(RequestMonitor) neglecting to set newWaitingList into fWaitingList?
Comment 3 Pawel Piech CLA 2010-10-13 18:09:22 EDT
Created attachment 180834 [details]
Patch with test and ffix for the wait() bug.

(In reply to comment #2)
> Pawel, is AbstractCache.wait(RequestMonitor) neglecting to set newWaitingList
> into fWaitingList?

You're correct.  This patch adds a couple of simple tests for having multiple clients, which prove the bug, and it adds a fix.

If I remember correctly, I left off at a point where I wrote the logic for handling multiple clients, but I didn't write the tests yet... and I think I got a little carried away with the logic for handling multiple clients. 

I'm also seeing a failure in one of the cancel tests.  Handling client cancellation proved to be a rather complicated problem, because there are policy questions, like: if all clients cancel their requests, should the cache cancel it's request to data provider?  

I'll try to find some time on Friday to revisit the state of this patch and at least provide some more information on what I was intending to do with it next.
Comment 4 John Cortell CLA 2010-10-13 18:27:48 EDT
(In reply to comment #3)
Thanks. BTW, I've been heavily documenting the code as I learn it. This will get messy given we're just dealing with patch files. Any objection to adding this stuff into cvs? If at the end of the day it's not usable, we can always remove it. I just think it'll make things a lot easier as I try to make use of this.
Comment 5 Pawel Piech CLA 2010-10-13 18:29:22 EDT
(In reply to comment #4)
> (In reply to comment #3)
> Thanks. BTW, I've been heavily documenting the code as I learn it. This will
> get messy given we're just dealing with patch files. Any objection to adding
> this stuff into cvs? If at the end of the day it's not usable, we can always
> remove it. I just think it'll make things a lot easier as I try to make use of
> this.

Sure, I have no problem with putting it in CVS.
Comment 6 CDT Genie CLA 2010-10-18 12:23:02 EDT
*** cdt cvs genie on behalf of jcortell ***
Bug 310345: [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF (committing my commented version of Pawel's file. Pawel will be adding the others).

[+] RequestCache.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/RequestCache.java?root=Tools_Project&revision=1.1&view=markup

[+] Transaction.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/Transaction.java?root=Tools_Project&revision=1.1&view=markup
Comment 7 John Cortell CLA 2010-10-18 12:43:29 EDT
I'm having a tough liking the name of the "wait" method in AbstractCache. I think it's misleading, given the expectations set by the widely known Object.wait(). The calling thread doesn't wait at all with AbstractCache.wait(), so I'd like to see a different name  used. However, I'm at the moment unable to come up with a suitable, curt name. Maybe someone more creative than I can think of something.
Comment 8 Pawel Piech CLA 2010-10-18 12:55:25 EDT
(In reply to comment #6)
> *** cdt cvs genie on behalf of jcortell ***
> ...
I committed the rest of the rest of the cache code in progress.  I also added more tests for the cancellation logic and found/fixed a couple of bugs.

(In reply to comment #7)
> I'm having a tough liking the name of the "wait" method in AbstractCache.
> ...
I agree, but I'm afraid I also don't have any better suggestions at the moment.  Unless we were to rename request(DaraRequestMonitor<V> rm) to requestData() and rename wait(RequestMonitor) to request, it would make them more consistent with each other.  We could even overload the two, but that might lead to frustrating bugs.
Comment 9 CDT Genie CLA 2010-10-18 13:23:02 EDT
*** cdt cvs genie on behalf of ppiech ***
Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF

[+] CacheTests.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/CacheTests.java?root=Tools_Project&revision=1.1&view=markup
[+] TransactionTests.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/TransactionTests.java?root=Tools_Project&revision=1.1&view=markup

[+] AbstractCache.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/AbstractCache.java?root=Tools_Project&revision=1.1&view=markup
[+] ICache.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/ICache.java?root=Tools_Project&revision=1.1&view=markup
[+] ImmediateInDsfExecutor.java  http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/ImmediateInDsfExecutor.java?root=Tools_Project&revision=1.1&view=markup
[*] RequestCache.java 1.2 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/RequestCache.java?root=Tools_Project&r1=1.1&r2=1.2
[*] Transaction.java 1.2 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/Transaction.java?root=Tools_Project&r1=1.1&r2=1.2
Comment 10 John Cortell CLA 2010-10-18 14:29:33 EDT
(In reply to comment #8)
> I agree, but I'm afraid I also don't have any better suggestions at the moment.
>  Unless we were to rename request(DaraRequestMonitor<V> rm) to requestData()
> and rename wait(RequestMonitor) to request, it would make them more consistent
> with each other.  

I think that would be a big improvement.

> We could even overload the two, but that might lead to frustrating bugs.

Yeah; I'd definitely avoid that.
Comment 11 John Cortell CLA 2010-10-18 15:00:17 EDT
(In reply to comment #8)
> I agree, but I'm afraid I also don't have any better suggestions at the moment.
>  Unless we were to rename request(DaraRequestMonitor<V> rm) to requestData()
> and rename wait(RequestMonitor) to request, it would make them more consistent
> with each other.  We could even overload the two, but that might lead to
> frustrating bugs.

Maybe lunch inspired me. Wouldn't "update" be a better name here than either "wait" or "request". In the end, that's what the caller is doing. It's asking that the cache update its stale/unset value from the source.
Comment 12 Pawel Piech CLA 2010-10-18 15:12:15 EDT
(In reply to comment #11)
> Maybe lunch inspired me. Wouldn't "update" be a better name here than either
> "wait" or "request". In the end, that's what the caller is doing. It's asking
> that the cache update its stale/unset value from the source.

Update could work, I also like it better than wait or request.  Its downsides are that it's used in a lot of places in flexible hierarchy and it may imply to someone that cache will re-fetch data even if it's not stale.  Also if we use "update" should we just leave "request" as it is?
Comment 13 John Cortell CLA 2010-10-18 15:25:01 EDT
(In reply to comment #12)
> (In reply to comment #11)
> > Maybe lunch inspired me. Wouldn't "update" be a better name here than either
> > "wait" or "request". In the end, that's what the caller is doing. It's asking
> > that the cache update its stale/unset value from the source.
> 
> Update could work, I also like it better than wait or request.  Its downsides
> are that it's used in a lot of places in flexible hierarchy and it may imply to
> someone that cache will re-fetch data even if it's not stale.  

As long as the javadoc makes it clear, I don't think that's much of an issue. An alternative is "updateIfInvalid".

> Also if we use "update" should we just leave "request" as it is?
Hm. The more I think of it, the messier the dual methods are. Can we not simply have a single RequestMonitor monitor method and document that a DataRequestMonitor should be passed if the caller wants to receive the freshly obtained data.
Comment 14 Pawel Piech CLA 2010-10-18 18:22:33 EDT
Oops, I forgot about this thread: 

(In reply to comment #13)
> As long as the javadoc makes it clear, I don't think that's much of an issue.
> An alternative is "updateIfInvalid".
OTOH update should be enough :-)

> > Also if we use "update" should we just leave "request" as it is?
> Hm. The more I think of it, the messier the dual methods are. Can we not simply
> have a single RequestMonitor monitor method and document that a
> DataRequestMonitor should be passed if the caller wants to receive the freshly
> obtained data.

Yes, I don't like this duality either.  Hiding it in the documentation is probably not a good idea though.  I'm fine with just removing it, it's simple enough to code around it if need be.
Comment 15 John Cortell CLA 2010-10-18 18:27:57 EDT
(In reply to comment #14)
> > > Also if we use "update" should we just leave "request" as it is?
> > Hm. The more I think of it, the messier the dual methods are. Can we not simply
> > have a single RequestMonitor monitor method and document that a
> > DataRequestMonitor should be passed if the caller wants to receive the freshly
> > obtained data.
> 
> Yes, I don't like this duality either.  Hiding it in the documentation is
> probably not a good idea though.  I'm fine with just removing it, it's simple
> enough to code around it if need be.

What do you mean by "removing it"? Removing support for the client getting both notified and the updated data? I'm fine with that as I don't see a need for it. It seems sufficient to me just being notified that the cache object has become valid. To do this, we would wantto remove not only that method, but the 'instanceof DataRequestMonitor<?>' checks elsewhere in the implementation. Are we on the same page there?
Comment 16 Pawel Piech CLA 2010-10-18 18:37:46 EDT
(In reply to comment #15)
> Are we on the same page there?
Yes :-). +1 on removing the request() method and the instanceof DataRequestMonitor checks.
Comment 17 John Cortell CLA 2010-10-19 11:24:20 EDT
Created attachment 181192 [details]
Improvements

This patch addresses various things:

1. Removes the dual methods to request the cache object to update. The DataRequestMonitor variant was meant to allow the client to not only be notified when the cache has been updated, but to receive the updated data at the same time. We have no use case for this, it complicates things (see comments above), and the potential use case could be handled differently

2. Change 'wait' method name to 'update' (see comments above)

3. Removes the disable state. Discussing this with Pawel off-line, this seems to be an implementation relic.
Comment 18 Pawel Piech CLA 2010-10-19 11:45:49 EDT
(In reply to comment #17)
> Created an attachment (id=181192) [details]
> Improvements

+1 Please go ahead and commit, and I'll merge into my changes.  I have only one request: rather than deleting the disable tests, they should be replaced with "set" tests.  I.e. calling set() when cache is invalid should be the same as calling disable().
Comment 20 Pawel Piech CLA 2010-10-19 16:51:14 EDT
Created attachment 181227 [details]
Range cache.

This patch adds a range cache.  I don't think we'll get very far with ACPM in DSF if we don't solve the problem of caching ranges of data (subsets of child expressions, ranges of memory, etc.).  In this patch I took a stab at solving this problem generically with a range-cache object.  The range cache has a 
    ICache<List<V>> getRange(final long offset, final int count) 
method for clients, and a 
    void retrieve(long offset, int count, DataRequestMonitor<List<V>> rm)
method for implementors.  Its job is to translate between the two.

The current implementation has some limitations that need to be addressed:
1) It uses longs and ints for data offsets and sizes.  I don't know what's appropriate here though.  BigInteger seems like an overkill.
2) It doesn't implement coalesing.  If two consecutive ranges get requested one after another, the cache should be able to combine them into a single request.

The patch also continues the AbstractCache simplification: it removes the parameters from reset(), so that reset() only changes cache state from valid to invalid.  Also, I added IllegalStateExceptions to getData(), getStatus(), and reset() if called in invalid state.
Comment 22 Pawel Piech CLA 2010-10-20 12:00:04 EDT
Some cleanup: I removed the remaining support for DataRqeustMonitor when calling AbstractCache.update().
Comment 24 CDT Genie CLA 2010-10-20 17:23:01 EDT
*** cdt cvs genie on behalf of jcortell ***
Bug 310345: [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF (reintroduced the &quot;disabled&quot; tests)

[*] CacheTests.java 1.5 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/CacheTests.java?root=Tools_Project&r1=1.4&r2=1.5
Comment 25 Pawel Piech CLA 2010-10-20 17:31:52 EDT
(In reply to comment #24)
/dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/CacheTests.java?root=Tools_Project&r1=1.4&r2=1.5

Reading your comment in the test:

// ... the 'valid'
// state is not a reflection of the quality of the data, but merely whether
// the cache object's representation of the data is stale or
// not.

makes me think that ICache.isValid() is just a misleading method name.  It implies that the data was retrieved without errors.  Maybe ICache.isCurrent() would be better?
Comment 26 John Cortell CLA 2010-10-20 17:38:38 EDT
(In reply to comment #25)
> (In reply to comment #24)
> /dsf/org.eclipse.cdt.tests.dsf/src/org/eclipse/cdt/tests/dsf/concurrent/CacheTests.java?root=Tools_Project&r1=1.4&r2=1.5
> 
> Reading your comment in the test:
> 
> // ... the 'valid'
> // state is not a reflection of the quality of the data, but merely whether
> // the cache object's representation of the data is stale or
> // not.
> 
> makes me think that ICache.isValid() is just a misleading method name.  It
> implies that the data was retrieved without errors.  Maybe ICache.isCurrent()
> would be better?

+1. How about the Transaction validate methods? Would we adjust those, too? 'checkCurrent'?
Comment 27 John Cortell CLA 2010-10-20 17:41:41 EDT
Also, the more we diverge from the TCF ACMP interfaces, the more confusing things will be for those playing with both frameworks.
Comment 28 Pawel Piech CLA 2010-10-20 17:47:24 EDT
(In reply to comment #27)
> Also, the more we diverge from the TCF ACMP interfaces, the more confusing
> things will be for those playing with both frameworks.

Good point, maybe it's better not to mess with it.
Comment 29 John Cortell CLA 2010-10-22 14:10:06 EDT
I think Transaction.validate() should be public. Currently, the assumption is that all the code that is involved in a transaction will reside in a Transaction method. This is too restrictive. What is I want to write a set of utility methods for use in ACPM transactions. These methods might, e.g., exist as static methods in a class called AcpmRegisterUtils. These methods would take a Transaction object and would need to call its validate method.
Comment 30 Pawel Piech CLA 2010-10-22 16:11:42 EDT
(In reply to comment #29)
> I think Transaction.validate() should be public....
Sure, that makes sense.  I ran into a similar need in RangeCache as well.
Comment 31 Ken Ryall CLA 2010-10-27 14:47:32 EDT
Created attachment 181865 [details]
cache does validate

I did a little work on top of this and have enclosed the resulting patch.

Validate is now handled by the cache object and the results packed up into the exception if there is a problem. I turned the Transaction validate functions into static utilities that can validate a collection of cache objects.

This lets any code that accesses the cache validate and throw without having a reference to the Transaction object since it may have been created many layers away from the use of the cache.

Pawel, if you approve I will clean up the javadoc etc before committing.
Comment 32 John Cortell CLA 2010-10-27 14:52:53 EDT
(In reply to comment #31)
> This lets any code that accesses the cache validate and throw without having a
> reference to the Transaction object since it may have been created many layers
> away from the use of the cache.

Note though that every method in the call stack will need to declare that it can throw Transaction.InvalidCacheException (and CoreException). I bring this up because this patch does not entirely lift ACPM's imposition on code that wants to use it. It just makes it a little less imposing (no need to pass the Transaction object). I think that's a worthwhile improvement.
Comment 33 Pawel Piech CLA 2010-10-28 00:47:53 EDT
At 04:40 PM 10/27/2010, John Cortell wrote:
> Pawel,
>
> The problem I ran into turned out to be the result of RangeCache.getRange() 
> always returning a new cache object. Thus, the subsequent attempt of my 
> transaction will again get an invalid cache object instead of the one that was 
> instantiated on the first go-around and which is now valid.
>
> John

I committed a fix for this problem.  However, you also pointed out a bigger issue: if user calls RangeCache.reset(), then the cache objects returned through RangeCache.getRange() will still be valid.  I think a fix for this will require a fresh implementation of ICache inside of RangeCache.
Comment 34 Pawel Piech CLA 2010-10-28 00:56:05 EDT
(In reply to comment #31)
> Created an attachment (id=181865) [details]
> cache does validate
I like it... I think :-)  I'm a little nervous about getting carried away with the use of checked exceptions.  Eugene's version of ACPM had explicit "if valid <do> else return" kind of logic.  I sprinkled in the exceptions, but creating exceptions is kind of expensive because they create a stack trace.  Since we expect a lot of invalid cache exceptions to be thrown, I added the static exception instance.  So maybe before we commit to this change, we should write a performance test to see what's the penalty of creating the exception every time... and maybe compare it to not using exceptions at all also.
Comment 36 John Cortell CLA 2010-10-28 17:49:40 EDT
I did some performance analysis of DSF ACPM. My tests involves 5000 requests to a dummy, minimal DSF service method (simply sticks a boolean in the given DRM). The test uses 5000 unique IDMContexts to ensure unique cache objects are used for each call. I then make the burst of requests using three methods: ACPM, Sequence and CountingRequestMonitor. Finally, I made sure to set my ACPM cache limit high so that no cleanup is performed. The results are interesting:

ACPM has a pretty high overhead compared to the other two techniques when the transaction makes its first run (all caches are invalid at that point)

   Elapsed time using ACPM transaction = 24.484 seconds
   Elapsed time using CountingRequestMonitor = 1.032 seconds
   Elapsed time using DSF Sequence = 6.187 seconds

A subsequent run, however, shows that ACPM blows away the other two techniques, as expected since it saves the client from having to make any asynchronous calls

   Elapsed time using ACPM transaction = 0.016 seconds
   Elapsed time using CountingRequestMonitor = 0.984 seconds
   Elapsed time using DSF Sequence = 4.672 seconds

Again, keep in mind that these are metrics based on a no-op DSF service. In the real world, the DSF service will represent a big chunk of the overall round-trip, so I think the ACPM overhead (when dealing with invalid caches) is not likely to be a big actor. A 10ms DSF service time will greatly reduce the relative delta between the three times:

   Elapsed time using ACPM transaction = 76.484 seconds
   Elapsed time using CountingRequestMonitor = 51.032 seconds
   Elapsed time using DSF Sequence = 56.187 seconds

Of course, throw in cache lifetime management and the numbers for ACPM will deteriorate.

I'll be contributing the test (action) with the next patch.
Comment 37 CDT Genie CLA 2010-10-28 19:23:04 EDT
*** cdt cvs genie on behalf of jcortell ***
Bug 310345: Made validate methods public

[*] Transaction.java 1.6 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/Transaction.java?root=Tools_Project&r1=1.5&r2=1.6
Comment 38 CDT Genie CLA 2010-11-01 17:23:03 EDT
*** cdt cvs genie on behalf of jcortell ***
Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM) utilities for DSF. The reset() method should null out the data reference and set status to INVALID_STATUS. The former is particularly important since we otherwise hold the memory hostage. Functionally there's no difference since neither the data nor status can be queried when in invalid state.

[*] AbstractCache.java 1.5 http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cdt/dsf/org.eclipse.cdt.dsf/src/org/eclipse/cdt/dsf/concurrent/AbstractCache.java?root=Tools_Project&r1=1.4&r2=1.5
Comment 39 Felix Burton CLA 2010-11-01 18:46:25 EDT
Hi John,

Interesting results.  I am curious about how the ACPM transaction test is implemented.  Does it call validate on each of the 5000 contexts before returning from the test loop or does it only validate the next context when all prior contexts are valid?

Thanks,
Felix
Comment 40 John Cortell CLA 2010-11-02 10:27:51 EDT
(In reply to comment #39)
> Hi John,
> 
> Interesting results.  I am curious about how the ACPM transaction test is
> implemented.  Does it call validate on each of the 5000 contexts before
> returning from the test loop or does it only validate the next context when all
> prior contexts are valid?

The latter. The test is meant to reflect a worst case scenario--a theoretical transaction that makes a ton of unique requests "on the fly" (i.e., the transaction doesn't know up front what data points it will need).
Comment 41 Felix Burton CLA 2010-11-02 12:20:32 EDT
(In reply to comment #40)
> (In reply to comment #39)
> > Hi John,
> > 
> > Interesting results.  I am curious about how the ACPM transaction test is
> > implemented.  Does it call validate on each of the 5000 contexts before
> > returning from the test loop or does it only validate the next context when all
> > prior contexts are valid?
> 
> The latter. The test is meant to reflect a worst case scenario--a theoretical
> transaction that makes a ton of unique requests "on the fly" (i.e., the
> transaction doesn't know up front what data points it will need).

What about  CountingRequestMonitor and DSF Sequence, are they also getting one piece of data at the time?  If not, this does not seem to be an apples to apples comparison, right?
Comment 42 John Cortell CLA 2010-11-02 12:36:16 EDT
(In reply to comment #41)
> (In reply to comment #40)
> > (In reply to comment #39)
> > > Hi John,
> > > 
> > > Interesting results.  I am curious about how the ACPM transaction test is
> > > implemented.  Does it call validate on each of the 5000 contexts before
> > > returning from the test loop or does it only validate the next context when all
> > > prior contexts are valid?
> > 
> > The latter. The test is meant to reflect a worst case scenario--a theoretical
> > transaction that makes a ton of unique requests "on the fly" (i.e., the
> > transaction doesn't know up front what data points it will need).
> 
> What about  CountingRequestMonitor and DSF Sequence, are they also getting one
> piece of data at the time?  If not, this does not seem to be an apples to
> apples comparison, right?

Well, they differ in that respect. CountingRequestMonitor (CRM) makes a subsequent async request without "waiting" for the previous one to complete--i.e., in parallel. The initiating code then has its monitor's handler run when all requests have completed. Sequence waits, thus it works serially. As with CRM, a Sequence's monitor handler runs only after all requests have completed. In all three cases, the overall operation doesn't complete until all the requests have completed, and so in that respect, I think it's fair to compare CRM and Sequence to ACPM. 

There are some simple ACPM transactions that could be coded using CRM and/or Sequence, and some that couldn't be coded with either. My analysis merely meant to reveal worst-case performance characteristics of ACPM relative to other options (feasible or not for any given situation) available in DSF.
Comment 43 Felix Burton CLA 2010-11-02 12:49:00 EDT
(In reply to comment #42)
> (In reply to comment #41)
> > (In reply to comment #40)
> > > (In reply to comment #39)
> > > > Hi John,
> > > > 
> > > > Interesting results.  I am curious about how the ACPM transaction test is
> > > > implemented.  Does it call validate on each of the 5000 contexts before
> > > > returning from the test loop or does it only validate the next context when all
> > > > prior contexts are valid?
> > > 
> > > The latter. The test is meant to reflect a worst case scenario--a theoretical
> > > transaction that makes a ton of unique requests "on the fly" (i.e., the
> > > transaction doesn't know up front what data points it will need).
> > 
> > What about  CountingRequestMonitor and DSF Sequence, are they also getting one
> > piece of data at the time?  If not, this does not seem to be an apples to
> > apples comparison, right?
> 
> Well, they differ in that respect. CountingRequestMonitor (CRM) makes a
> subsequent async request without "waiting" for the previous one to
> complete--i.e., in parallel. The initiating code then has its monitor's handler
> run when all requests have completed. Sequence waits, thus it works serially.
> As with CRM, a Sequence's monitor handler runs only after all requests have
> completed. In all three cases, the overall operation doesn't complete until all
> the requests have completed, and so in that respect, I think it's fair to
> compare CRM and Sequence to ACPM. 

It does not seem like an apples to apples comparison to me.  In Comment 40 you wrote

> The latter. The test is meant to reflect a worst case scenario--a theoretical
> transaction that makes a ton of unique requests "on the fly" (i.e., the
> transaction doesn't know up front what data points it will need).

If the test is meant to test the performance when you don't know upfront what data the next transaction  needs then I think the CRM and Sequence tests need to make the requests sequentially as well, no?
Comment 44 John Cortell CLA 2010-11-02 13:12:05 EDT
(In reply to comment #43)
> > The latter. The test is meant to reflect a worst case scenario--a theoretical
> > transaction that makes a ton of unique requests "on the fly" (i.e., the
> > transaction doesn't know up front what data points it will need).
> 
> If the test is meant to test the performance when you don't know upfront what
> data the next transaction  needs then I think the CRM and Sequence tests need
> to make the requests sequentially as well, no?

It's not meant to be an apples to apples comparison. I'll repeat my intent:

"There are some simple ACPM transactions that could be coded using CRM and/or
Sequence, and some that couldn't be coded with either. My analysis merely meant
to reveal worst-case performance characteristics of ACPM relative to other
options (feasible or not for any given situation) available in DSF."

The comment in #40 simply meant to convey that the test Transaction simulates logic that does not know what to ask for next--again, a worst-case scenario for ACPM (and any other mechanism). The metrics where meant for people who know the limitations of CRM and Sequence. CRM can be used without knowing how *many* requests you need up front. Sequence currently requires you to know, but a variation could be designed that doesn't.
Comment 45 Felix Burton CLA 2010-11-02 13:30:31 EDT
(In reply to comment #44)
> It's not meant to be an apples to apples comparison.

Okay, I still don't get how three measurements that are not "apples to apples" can be put next to each other and compared as done in Comment #36.
Comment 46 John Cortell CLA 2010-11-02 18:02:19 EDT
Pawel, if a transaction request is canceled (by the invoker of the request), the cache object(s) involved in the transaction are flipped into the valid state and are given IStatus.CANCEL. That makes sense. A subsequent use of one of those cache objects, by the same or any other transaction, causes the transaction attempt to fail with a CoreException that has IStatus.CANCEL set in it. That, too, is ok. The problem I'm seeing is that Transaction.execute() reacts to the CoreException by trying to put the exception's status into the Transaction's RM, and that is a no-no, since a RM should be given a cancel status only if the RM was canceled (by the client that made the request), and that hasn't happened. An assertion failure occurs in RequestMonitor.setStatus(IStatus).
Comment 47 John Cortell CLA 2010-11-03 18:30:10 EDT
Thinking this through more, I have a bigger concern. A cache object that is left in the valid but canceled state because of a canceled transaction presents a serious challenge. As I mentioned, any subsequent use of that object is a guaranteed failure until the cache is reset by a source notification. Thus, the cache manager needs to make sure it doesn't dish out that object while it's in that state. The easiest approach is to oust it from its pool of available cache objects. That is going to be messy, though. First of all, when should it do so? Ideally, when the cache object's RM is canceled, but that requires that the cache manager register itself as a cancel listener with the cache object's RM, which is easier said than done. It can instead defer the ousting until the next time it goes to dish out that object--producing instead a new one that replaces the canceled one. But that assumes another transaction will come around that asks for the same information. Then consider the added challenge with RangeCache, since it will have to do its own cache object management (given today's design). I'm hoping an easier solution than any of these options. I'm just not seeing it at this point...
Comment 48 Pawel Piech CLA 2010-11-09 01:33:57 EST
(In reply to comment #46)
> Pawel, if a transaction request is canceled (by the invoker of the request),
> the cache object(s) involved in the transaction are flipped into the valid
> state and are given IStatus.CANCEL. 
Perhaps this is the problem.  The solution would be not to allow a CANCEL state in a cache object.
Comment 49 Pawel Piech CLA 2010-11-09 01:35:27 EST
(In reply to comment #36)
> ...
> I'll be contributing the test (action) with the next patch.

Hi John,
I think it would be most useful to see the tests before taking this topic further.
Comment 50 John Cortell CLA 2010-11-09 09:05:49 EST
(In reply to comment #48)
> (In reply to comment #46)
> > Pawel, if a transaction request is canceled (by the invoker of the request),
> > the cache object(s) involved in the transaction are flipped into the valid
> > state and are given IStatus.CANCEL. 
> Perhaps this is the problem.  The solution would be not to allow a CANCEL state
> in a cache object.

I think it's more than that.  The real problem is that the object is flipped into the valid state. I.e., if we remove the cancel aspect of the current behavior, we end up with a valid cache object, and that's not good either, since it hasn't really been updated. I think the solution would be to leave the cache object as is but naturally its RM would have to move into the cancel state. Maybe that's what you're suggesting.
Comment 51 John Cortell CLA 2010-11-09 09:07:08 EST
(In reply to comment #49)
> (In reply to comment #36)
> > ...
> > I'll be contributing the test (action) with the next patch.
> 
> Hi John,
> I think it would be most useful to see the tests before taking this topic
> further.

The attached patch has the test.
Comment 52 Pawel Piech CLA 2010-11-09 13:30:57 EST
(In reply to comment #51)
> The attached patch has the test.

Could you specify which patch (comment #?).  I don't see any in this bug that have a performance test.
Comment 53 John Cortell CLA 2010-11-09 14:14:10 EST
(In reply to comment #52)
> (In reply to comment #51)
> > The attached patch has the test.
> 
> Could you specify which patch (comment #?).  I don't see any in this bug that
> have a performance test.

Sorry. I meant the " Sample and test ACPM transactions" patch attached to EDC bug 328417.
Comment 54 Ken Ryall CLA 2010-11-12 13:35:05 EST
(In reply to comment #34)
> (In reply to comment #31)
> > Created an attachment (id=181865) [details] [details]
> > cache does validate
> I like it... I think :-)  I'm a little nervous about getting carried away with
> the use of checked exceptions.  Eugene's version of ACPM had explicit "if valid
> <do> else return" kind of logic.  I sprinkled in the exceptions, but creating
> exceptions is kind of expensive because they create a stack trace.  Since we
> expect a lot of invalid cache exceptions to be thrown, I added the static
> exception instance.  So maybe before we commit to this change, we should write
> a performance test to see what's the penalty of creating the exception every
> time... and maybe compare it to not using exceptions at all also.

I created a test that runs the singleTransactionTest 100,000 times. On my system I get:

Using Static Exceptions: 33.800 seconds
Creating the Exception Each Time 34.1 seconds

So not much of a performance hit for the ability to create much more simple and readable code.

I didn't invest the time to write a test that doesn't use exceptions: Exceptions are fast in modern VMs and I think they are a requirement for any sort of complex, deeply nested, transactions.

Please let me know if I can go ahead and commit this change.
Comment 55 Pawel Piech CLA 2010-11-12 19:16:29 EST
Created attachment 183064 [details]
Unit tests for measuring exceptions performance in Transaction

I created a unit tests which hopefully can give us more reproducible performance measurements.

  There are two scenarios tested:
1) A transaction that validates a single cache. 
2) A transaction that vlaidates 100 caches individually (potentially triggering 100 exceptions before completing).  

The first scenario is probably a little more typical, but it obscures the performance difference by all the synchronization logic that is needed just to perform the test.  

The test also measures 3 different designs of the Transaction class:
a) No exceptions are used (validate returns a boolean).
b) A static instance of the exception object is thrown.
c) A new instance of an exception object is created and thrown every time.

The results shouldn't be surprising, but they are significant.  For scenario 1 (single validate per transaction),  a) is twice as fast as b) and b) is 20% faster than c), not a big deal.  However, for scenario 2 (100 validates), a is 10x faster than b) and b) is 10x faster than c).  

(In reply to comment #54)
> Please let me know if I can go ahead and commit this change.

We should probably at least discuss whether to not use exceptions at all.  They are very convenient in the code, but the conventional wisdom is that they should not be used for normal control flow and these tests kind of support that.  Also these results could very well explain the big performance hit in transactions that he saw.
Comment 56 Pawel Piech CLA 2010-11-12 19:18:43 EST
Created attachment 183065 [details]
Full test results.
Comment 57 John Cortell CLA 2010-11-13 00:12:10 EST
(In reply to comment #55)
Interesting. A 100x performance penalty is not something I think we can live with. A 10x penalty is at least debatable given that that the exception approach makes for much simpler transaction logic, especially when the logic goes deeper than one or two stack frames. If transactions tend to be shallow, then I think we should avoid the exceptions altogether. Ken, what's your experience so far with ACPM and EDC? Are you seeing call stacks more than a few frames deep in the transactions you're writing?
Comment 58 Ken Ryall CLA 2010-11-15 07:20:01 EST
(In reply to comment #57)
Pawel, thanks for writing proper performance tests (I wrote my little test while waiting at a Doctor's office). I get the point about using Exceptions for normal control flow. I agree it is not ideal but of course it's an attractive method of bailing out of the transaction.

I'm currently experimenting by rewriting the stack computation code in the EDC Windows debugger to not block the DSF thread while it's waiting for results from the TCF agent. I've done it using non static exceptions, next I plan to try it with no exceptions and then without using ACPM Transcations (moving the work to another thread) so I can compare the three.
Comment 59 Pawel Piech CLA 2010-11-15 11:38:05 EST
(In reply to comment #57)
> (In reply to comment #55)
> Interesting. A 100x performance penalty is not something I think we can live
> with. A 10x penalty is at least debatable given that that the exception
> approach makes for much simpler transaction logic, especially when the logic
> goes deeper than one or two stack frames. If transactions tend to be shallow,
> then I think we should avoid the exceptions altogether. 

I agree, but even 10x slow down is a big penalty when creating a utility to be used throughout the framework.  We could leave it up to the implementers both options:

boolelan validate();
void validateChecked() throws InvalidStackException, CoreException;

But this could just result in a lot of confusion in a set of classes that are already difficult to understand.
Comment 60 John Cortell CLA 2010-11-15 11:55:32 EST
(In reply to comment #59)
> I agree, but even 10x slow down is a big penalty when creating a utility to be
> used throughout the framework.  

Well, the thing is that the 10X is not linear. It's 10X if your transaction uses 100 cache objects and only validates one at a time. Is that the typical transaction? The lower the number of cache objects used in a transaction, the lower the relative overhead of the retry mechanism. Validate multiple objects at a time, and the overhead drops further. If 95% of transactions use 10 cache objects or less and validate more than one at a time, the actual overhead of using exceptions can be said to be much less than 10X. What's why I said it becomes debatable.

> We could leave it up to the implementers both
> options:
> 
> boolelan validate();
> void validateChecked() throws InvalidStackException, CoreException;
> 
> But this could just result in a lot of confusion in a set of classes that are
> already difficult to understand.

I absolutely agree; we don't want to make DSF ACPM more difficult to understand and use than it already is. I think we should pick one approach. I don't like the 10x factor at all, but if we use the return value of methods for the retry mechanism, that can really burden the clients if their transaction logic goes deep. So, I just think we need to think this through pretty carefully.
Comment 61 Pawel Piech CLA 2010-11-15 12:01:57 EST
(In reply to comment #60)
> Well, the thing is that the 10X is not linear. It's 10X if your transaction
> uses 100 cache objects and only validates one at a time. Is that the typical
> transaction? The lower the number of cache objects used in a transaction, the
> lower the relative overhead of the retry mechanism. Validate multiple objects
> at a time, and the overhead drops further. If 95% of transactions use 10 cache
> objects or less and validate more than one at a time, the actual overhead of
> using exceptions can be said to be much less than 10X. What's why I said it
> becomes debatable.

That's true.  However, even with a single transaction test, the penalty is 2x, and this time comparison includes a thread switch to retrieve the data, which in itself is not typical with every transaction.
Comment 62 John Cortell CLA 2010-11-15 12:19:28 EST
(In reply to comment #61)
> That's true.  However, even with a single transaction test, the penalty is 2x,
> and this time comparison includes a thread switch to retrieve the data, which
> in itself is not typical with every transaction.

With a granularity of 8ms, and the 2x being 31ms vs 50ms, I'm not putting a ton of confidence in those numbers. The point about the context switch is definitely something to be taken into account. 

I think it would be difficult to argue too much for the exception approach unless we have proof that writing transaction logic is very painful without it. At the moment, I can just see it possibly being so, but who knows. My vote is to switch DSF ACPM to not use exceptions but not completely close the door on the exception approach until we've put DSF ACPM to some real world use. The TCF-debug guys seem to be living with the boolean/return approach, so there's hope we can, too. :-)
Comment 63 Pawel Piech CLA 2011-11-30 16:45:03 EST
FYI, I an example on using the cache and transaction to the DSF examples.
Comment 64 CDT Genie CLA 2011-11-30 17:23:04 EST
*** cdt git genie on behalf of Pawel Piech ***

    Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM)
    utilities for DSF - Added an ACPM example to DSF examples plugin.

[*] http://git.eclipse.org/c/cdt/org.eclipse.cdt.git/commit/?id=58513ccf2d81c50492e2087a87d31cd15e9e60df
Comment 65 CDT Genie CLA 2011-12-01 11:23:04 EST
*** cdt git genie on behalf of Pawel Piech ***

    Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM) -
    Added missing files from last commit.

[*] http://git.eclipse.org/c/cdt/org.eclipse.cdt.git/commit/?id=02c9e05cd577223b8f7a6320cf600c38b7ae8207
Comment 66 Marc Khouzam CLA 2011-12-07 16:31:25 EST
The recent commits seem to cause the API-tooling to give an error.

There is a missing @since tag, which I will fix, but there is also a non-backwards-compatible change in Transaction.java.

"The type parameter T has been added to existing type parameters for org.eclipse.cdt.dsf.concurrent.Transaction.validate(ICache<?>[])"	


We spoke about doing a major version update for DSF for Bug 341660 but we didn't follow up on it.  It is not decided yet if CDT will be 8.1 or 9.0.  If CDT goes to 9.0 anyway, we could choose to update DSF as well.  Until we know, I'm not sure what to do about this breakage.
Comment 67 Pawel Piech CLA 2011-12-07 18:15:25 EST
(In reply to comment #66)
> The recent commits seem to cause the API-tooling to give an error.
My bad.  I had an API baseline defined, but it was empty for some reason.  So I didn't even get an error that it was missing.

> We spoke about doing a major version update for DSF for Bug 341660 but we
> didn't follow up on it.  It is not decided yet if CDT will be 8.1 or 9.0.  If
> CDT goes to 9.0 anyway, we could choose to update DSF as well.  Until we know,
> I'm not sure what to do about this breakage.

I have no problem moving to 3.0, but this doesn't seem like a good enough reason.  Transaction is not used anywhere except tests AFAIK, so I just added an API filter for the change.  At worse my change will cause a warning to be generated for the client.

Thanks,
Comment 68 CDT Genie CLA 2011-12-07 19:23:03 EST
*** cdt git genie on behalf of Pawel Piech ***

    Bug 310345 - [concurrent] Asynchronous Cache Programming Model (ACPM)
    utilities for DSF (Fixed API tooling errors).

[*] http://git.eclipse.org/c/cdt/org.eclipse.cdt.git/commit/?id=e59cb540dbd31d19cb0d2fdb7c62fc7005d8de4d