Community
Participate
Working Groups
3.0 JFace viewers (namely the TableViewer and CheckBoxTableViewer) should support SWT.VIRTUAL style. When this style is specified, SWT.SetData callbacks are fired to fill the items on demand. If SWT.VIRTUAL is specified on the table passed to its constructor, the viewer should avoid populating the table eagerly, but add an SWT.SetData event handler to do the work as needed. This should work with the existing content provider API, but this requires determining all model elements first, even if the SWT items are not eagerly updated. The existing content provider API can be used to determine the number of model elements, and configure the table appropriately (see Table.setItemCount(int)). Should also extend content providers to allow the number of elements to be requested independently of the model elements themselves.
Candidate uses: - problems view - search results (when in table mode)
Adding Stefan as this will likely affect some problems view work he is looking into.
Created attachment 14195 [details] Patch for work so far Here is a quick patch for what I have done so far. I am going to hold off releasing anything as we are hoping to revisit the Viewer story more completely in 3.1 as there are quite a number of things we wish to support more generically.
*** Bug 57795 has been marked as a duplicate of this bug. ***
Tod: I've been working on a patch that moves the sorting and filtering into a background thread -- it also speeds up insertions drastically. I'm still trying to get removals working fast enough, but sometime next week I'll try to attach a patch here that implements the algorithm to fix this one, bug 62002, bug 21284, and bug 28789...
I've had use cases where I don't really know the number of elements in the table or tree or text file until I go out and read them, which I don't want to do because it could take an arbitrarily long amount of time. I'm hoping this kind of case will be addressed by the new API. And yes, that makes scrolling... interesting. In other programs we've used either a fixed size thumb or a thumb that gets smaller and smaller as you scroll down (to some minimum size). And we left some slack at the end so there was room to click to page down under the thumb.
In Delphi from database tables, when it does not know exact number on records it just shows thumb in two states: top (when we are on first record), center (when we are on some other record). When you do page up, it scrolls on one page up, but stays thumb stays in center (if begin or record set is not finished). Once dataset is fully fetched, it starts to show also bottom state and any intermediate states.
Created attachment 14991 [details] Patch for o.e.ui.ide Adds a concurrent content provider for the ConcreteMarkers used in the Problems view (for testing)
Created attachment 14992 [details] o.e.jface patch Rough patch for JFace -- adds the core classes needed for incremental background sorting and SWT.VIRTUAL support. Does not include any changes to TableViewer, but includes all the algorithms and data structures for the concurrency bits. To use this stuff in TableViewer, do the following: 1. Create a ConcurrentTableManager as a member variable. (It encapsulates all the threading issues and nifty data structures that make this work). 2. Implement IConcurrentTable in TableViewer or an anonymous inner class, and use it to get callbacks from ConcurrentTableManager (all the callbacks will happen in the UI thread). 3. New content providers will need to implement IConcurrentContentProvider. This provides change notifications and uses a "requestUpdate" method which is essentially an asynchronous getElements. Content providers can read their input in a background thread and report whatever is known so far using a sequence of change notifications. 4. A standard IConcurrentContentProvider can be implemented to wrap old-style structured content providers -- it won't provide any change notifications, but TableViewer can delegate its add/remove/update methods to the versions on ConcurrentTableManager, so should be backwards-compatible. 5. TableViewer won't need its map of elements onto TableItems -- the new callbacks will include the TableItem index when updating an item, so no map is needed... but it should keep a map of elements onto the last known result of all the label/colour/font/whateverProviders -- otherwise there will be increased overhead due to label providers (the new algorithm avoids slow insertions/removals by overwriting the visible TableItems rather than inserting or removing items). This is enough to implement and test the changes to TableViewer, but there is still lots to do: - Lots of javadoc issues, unused code to clean up, etc. - No background filtering yet (currently, filtering needs to be implemented in the content provider -- we can probably worry about this in version 2. :-) ). - Scroll position needs some finessing. Currently, it remains on the same line (by index) when items are added/removed, but it should probably follow whatever item is at the top of the set of visible items. BTW, I've done some initial benchmarks and this is FAST. Check out the next attachment.
Created attachment 14993 [details] patch for o.e.ui.tests Adds tests and examples. In particular: - TestConcurrentTable is an example TableViewer-like class that demonstrates how to talk to the concurrency stuff. (It only has one column which displays the objects' toString() values) - ListContentProvider is an example IConcurrentContentProvider that is essentially a list-with-change-notifications - ConcurrentTableTestView is a view containing a concurrent table. It allows large numbers of random elements to be inserted/deleted interactively (to remove the 400-element limit, you need to comment out the line that says "table.setLimit(400)") - TestMarkerView is a view that displays all markers in the workbench using a concurrent table. It is somewhat faster than the problems view and can display new markers as they are discovered by the build rather than waiting.
Tod: I think this includes everything we've been discussing... but updating TableViewer itself is still going to be a scary task. Good luck. :-) Please let me know if I've overlooked anything... if everything's okay, I'll continue testing and cleaning up what I've attached here.
Regarding comment 6 and comment 7: This algorithm should be able to handle large tables much better than the current TableViewer. The content provider can search concurrently, and report records to the TableViewer as it finds them. Insertions will be very fast. Performance-wise, it should work quite well... but the scroll bar won't behave as you describe. As elements are inserted below the currently-visible range, the scrollbar will tend to move up. Note: these algorithms assume that the viewer is doing some sort of sorting. If the viewer was unsorted (ie: if the content provider is determining the order), then a faster algorithm would still be possible.
It looks like you're assuming a model where you start off displaying the top part of a table, and there is a background thread that is streaming data into the table as hard as it can. That's not necessarily going to be the case. I'd rather not read data that is not going to be needed. You can stream a screen full or two in, but don't try to slurp up the whole data set. Also it looks like you're trying to eventually have the whole data set in memory. ConcurrentTableUpdator.setTotalItems() does a "new Object[newTotal]" and a "System.arraycopy" to reallocate the knownObjects array. Two problems with this. First, suppose setTotalItems is called a bunch of times, each time with a total one more than the last time. You've got an O(n^2) algorithm then. Second problem, suppose you have a billion possible items in the table, and you've got a really fast data source, and the background thread is feeding you new rows like crazy. You're going to blow out your client memory limit very quickly. I think of virtual tables as showing you a window on your data, not as a device for lazy loading. There's a subtle difference. Before anybody says, showing a billion rows in a table is a bad user interface, I agree. Here's a use case. Say you have a database application that allows the user to enter a query and see the results. The user, perhaps unintentionally, enters a query that returns a billion rows. You don't want the system to lock up or beat up the database trying to return all those rows. Instead, you want to show the first so many of them and quiesce until he scrolls down to fetch more. By not eating the CPU, Memory, or Network, the system remains responsive in the typical case that the user will want to change the query and resubmit. BTW, Updator should be renamed Updater for consistency with spelling elsewhere.
Ed, thanks for taking the time to examine the code. If you're interested in more details about the algorithm, see bug 62002. The code does not assume that you are looking at the top elements (it will always zero in on whatever range you are currently viewing). However, you are correct in that it assumes you have a data source that is reading in items and dumping them into the viewer as fast as it can. It would be possible to only read a subset of the data from the data provider if the viewer was not sorting the input it got from the content provider. In that case, the Nth item from the content provider is also the Nth item in the viewer, and the viewer can ask the content provider specificially for the items it needs... but that is not the purpose of this algorithm. This algorithm is specifically for the case where the viewer is sorting or filtering the input coming from the content provider. In this case, there is no relationship betwen the order of visible items and the order of items provided by the content provider. For the case where you have an unsorted viewer with no filter, what you really need is a way to replace ConcurrentTableManager itself (this is the object that gets asked for items in a certain range and can choose not to compute items outside that range). It would make sense to write a common base class or interface for ConcurrentTableManager that clients can implement if they want to do this type of thing. However, it would probably make sense to implement this in a completely new TableViewer (since the current TableViewer, in general, uses sorting and filtering and would have a different kind of content provider). Basically, what I'm suggesting is this (excuse the stupid class names): - NewTableViewer takes its input from a SubsetContentProvider. SubsetContentProvider can return items in a particular range, and can be subclassed to handle situations where the content provider doesn't need to compute the entire set of elements in order to know the elements in the given range (as with your database example). - The current ConcurrentTableManager subclasses SubsetContentProvider. It talks with an IConcurrentContentProvider, which returns unordered (and unfiltered) items, sorts them, and filters them. Clients will subclass IConcurrentContentProvider in all the situations where they would currently subclass StructuredContentProvider (but the new version will give them background sorting and filtering for free). - The current TableViewer will use a NewTableviewer and a ConcurrentTableManager internally to do pretty much what it does now (only much faster). I *think* this should handle both your use-cases and mine quite efficiently. SubsetContentProvider (the thing you'd subclass to implement your content provider) would implement the methods setRangeOfInterest(int, int) and refresh(). It would send its output asynchronously to something very much like ConcurrentTableUpdator... However, all this doesn't need to happen at once. We could use the new classes in their current form (with package visibility) to make the current TableViewer much faster... then we could expose new API allowing ConcurrentTableManager to be replaced by a new content provider interface. Does this make everyone happy? :-)
Well I don't want to argue too much since you have code in hand. But I don't think a subsetting data provider is incompatible with sorting and filtering, especially with filtering. Clearly subclasses are the solution as you suggested but I don't want to have a special version of TableViewer, TreeViewer, and TableTreeViewer just for this. The strategy pattern of content providers should be sufficient to handle all these use cases. One content provider eagerly fetches all the data up front. One content provider gives you just enough data to get you started then eagerly fetches the rest in the background. And one content provider gives you just enough data to get you started, maybe a little read-ahead, and quiesces. The view is updated with whatever model data the content provider provides, modulo any filters and sorting that the view applies.
Re: comment 13 You can actually handle the "billion rows" case with the current algorithm using the element limit in ConcurrentTableManager. You set the element limit to... say... 100, and leave your content provider running in the background with low priority querying the database for all billion rows. The viewer will apply its sorting and only display the top 100 rows to the user (allowing everything else to be garbage collected as it arrives). The user can then change and resubmit their query, which will cancel the running query start computing the new elements. The viewer will only hold onto 100 elements for any length of time, eliminating the danger of OOMEs (and making the table easier to navigate).
So I could stream a lot of data from the data source only so it could be discarded? Hmmm... nah. That's kind of like HTTP streaming of media files, except if I'm viewing the start of a movie it's likely I'm going to watch the movie all the way through. But, if I'm viewing part of a huge data set it's UN- likely I'm going to scroll through all those rows. Maybe your ConcurrentContentProvider would be more aptly named StreamingContentProvider.
Re: comment 15 I'm not trying to argue: I just want to be certain we understand your use cases and have a good plan to solve them. If that's all you're looking for, then you can already handle these use cases with the algorithm I've attached here. The one you can't do is the content provider that provides enough data to get started with read-ahead, sleeps, and is then __woken up__ later when the viewer needs more information. The question is: how does the viewer know that it needs more information? As long as it isn't sorting or filtering, it can simply ask the content provider directly for the items at particular row numbers (as suggested in bug 21284)... but if it is permuting (AKA sorting) the items in the viewer then what question can it ask the content provider in order to request more elements, and when should it ask that question? It can't identify elements by their sorted index because the content provider has no knowledge of the sort order, and identifying items by any other means wouldn't allow the viewer to request items currently in the visible window. I believe comment 12 would solve this use-case, but it would require two different content provider interfaces (for ordered and unordered input). You seem to be suggesting that one content provider could do both... if so, what would it look like?
Re: comment 17 If you're sorting the data set, you can't know what the "first" 100 items are unless you've seen the entire data set. The fact that you will discard and garbage collect most of the rows doesn't mean you don't need to see them in the first place. Please clarify: are you saying you know of an algorithm that will allow us to identify the items that will be visible in a particular range after sorting is applied and request them from the content provider before we know the full set of items? (If so, please specify details) Or are you saying that sorting isn't interesting for your viewer and you want to take advantage of this in order to only request items from the content provider as they become visible?
Bug 21284 sounds like what I had in mind, thanks for the pointer. Structured viewers would query the content provider for ranges of elements and children, keyed by the unsorted, unfiltered row/child number. If there was a sort active, the viewer would have to keep querying until it had the whole data set or the action was aborted. If there was not a sort active then it could quit calling the content provider as soon as it had enough data to display (via LabelProvider and setData) all the currently visible elements. If the model changed, the content provider would call the viewer like it does now to make it update items that may or may not be currently known. Maybe that's what you're doing but I'm having a hard time looking at the code and relating it to the existing TableViewer and IStructuredContentProvider. See also org.eclipse.target.internal.ui.DeferredContentProvider. Just thinking "out loud" here but for more efficiency I've often thought that we needed something like an "ordered content provider" that pushes the sorting functionality from the viewer down into the content provider. Same goes for filtering I guess. For example, if the content is coming from a database, it's a simple matter to arrange for the data to come back in a certain order or with certain rows removed. Another example, if you're viewing a list of all compiler warnings from all projects in a workspace and you're filtering on a single project there's no point of even touching those other projects to enumerate the warnings and then throw them away. If you had one of these ordered content providers then you could query it by sorted/filtered row/child numbers which would be easier on the TableViewer.
Note that Stefans work is more related to Bug 60117 although it may also be helpful in the virtual case.
I'm going to leave this in its current state for a bit: I need to catch up on other PRs.
Created attachment 15187 [details] Updated table viewers and test suites Here is a zip file with an updated TableViewer to handle Stefans required changes and an updated version of ConcurrentTableTestView that uses it. The main issue I have with this is the need for listening to the scroll bars - we should decide on what the TableViewer needs to do to support this if it is required - especially in the case where there are going to be updates because of the SWT.VIRTUAL case. I am going to look into the virtual case with this in mind and continue from here.
Stefan (and everyone else) I am going to reapply these patches and move the concurrency discussion to Bug 60177 as these turn out to be fairly seperate problems and I want to keep discussion focussed here on the virtual issues which are somewhat different from Stefans work.
Sorry - Bug 60117
Created attachment 15205 [details] A version of the TableViewer to support virtual I am obsoleting these other patches as they are now in Bug 60117. I am attaching a work in progress version of TableViewer that supports SWT.VIRTUAL but I will not be releasing it until Bug 76391 is dealt with and I have been able to test this suffeciently.
Sounds good, Tod. I should have put these patches in bug 60117 to start with. BTW, why do you need to fix bug 76391 to get SWT.VIRTUAL working? The SetData callback shouldn't be necessary since you can use the current scroll position to determine (much more efficiently) what needs to be populated.
The lookup is O(1) for the remaining entries so whether I do it myself or after the callback there should be no difference in effeciency. The issue with doing it myself is that all of the TableViewer code currently relies on there being a non null data (the Object we are representing) in all of the entries. It is a much lower risk (and more effecient) fix for me if I don't have to add null checking to all of the API that uses the elementMap.
I have this ready now - it turns out I won't need any changes from SWT as it really only affected the new test suites I wrote. In the interest of safety I am not going to release this until tomorrow (October 19) so that it does not affect the integration build.
Please explain: how can you implement SetData in O(1) time when indexof(...) runs in O(n)?
TableViewer.java line 718 >items[i].setImage(new Image[table.getItemCount()]);//Clear all images seems to be a typo. I think >items[i].setImage(new Image[table.getColumnCount()]);//Clear all images makes more sense.
It is a hashtable lookup. There would have to be a lot of collision in the hashcodes for O(n)
I have released the virtual support with some new test suites into HEAD.
Verified in 20041102
Tod - is it possible to patch Eclipse 3.0.1 with the new TableViewer code? Does this feature depends on 3.1 functionality of other classes / plugins?
I would be reluctant to as I would like it to get more use first and have some feedback from SWT (who are doing this now). Usually we only patch a release with important fixes to defects rather than features.