Bug 304130 - Query API only allows expression queries
Summary: Query API only allows expression queries
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: 3.6   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.6 M6   Edit
Assignee: P2 Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: api
Depends on:
Blocks:
 
Reported: 2010-02-26 23:16 EST by John Arthorne CLA
Modified: 2010-03-12 08:57 EST (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Arthorne CLA 2010-02-26 23:16:02 EST
I thought we had agreed awhile ago that expression-based queries would be used as much as possible, but that simple queries without the expression language would still be permitted. With the latest API, the only legal way to write a query is via the expression language. I think this language creates a big barrier to understanding and using the p2 API - to do the simplest task with a p2 repository or profile the client must now understand a completely new domain-specific programming language with a limited range of commands. I.e., simple things are no longer simple, and it takes an expert to craft and debug expression-based queries in order to do anything.

I understand that expression-based queries are important for optimizing querying massive remote repositories (although I'm not aware of any open source repositories actually making use of this). But, not everything in p2 is remote and requires this level of optimization. When querying IProfile, PublisherResult, IProvisioningPlan, and likely several other IQueryables, optimization of remote queries is not needed and simple Java-based queries would be suitable and much easier to write. So, in order to address one particular (so far theoretical) use case we have made all query-related use cases much harder to work with. 

I would like to see us add back MatchQuery in the API so clients can write simple Java-based queries without the complexity of expressions for use cases where expressions are overkill.
Comment 1 Pascal Rapicault CLA 2010-02-26 23:54:18 EST
As expressed on the mailing list earlier today, I share the concern of understandability. 
As for adding back the MatchQuery, I don't think that this is necessary if we have a initial base of simple queries (e.g find me an IU with a given string in the id, find me an iu with a given capability, etc.) and we offer simple ways to compose them (Pipe, Compound).
Comment 2 Thomas Hallgren CLA 2010-02-27 05:09:21 EST
My biggest concern adding MatchQuery back is not that it will break remote queries etc. It's that the only way to evaluate such a query is to scan every candidate sequentially. The performance degradation that occurs when writing:

 IQuery<IInstallableUnit> q = new MatchQuery<IInstallableUnit>() {
   public isMatch(IInstallableUnit iu) {
     return "some.id".equals(iu.getId());
   }
 }

instead of:

 IQuery<IInstallableUnit> q = QueryUtil.createIUQuery("some.id'");

is brutal. A good solution might be to start with a pre-defined query and then apply a MatchIteratorFilter on the result iterator, like so:

 IQueryResult<IInstallableUnit> result = queryable.perform(predefQuery, null);
 Iterator<IInstallableUnit> iter = new MatchIteratorFilter<IInstallableUnit>(result.iterator()) {
   protected boolean isMatch(IInstallableUnit candidate) {
     // Do some additional filtering here
   }
 };

This approach can be used either because you don't want to learn p2QL or because you have filtering requirements that cannot be expressed as such.

Perhaps we should consider making the MatchIteratorFilter API as a partial replacement for MatchQuery?

Even developers with deep knowledge of p2 tend to skip over query optimization concerns (like indexing), a fact that sometimes can lead to poor performance. If I'm not mistaken, one good example of this is bug 300234 which has fixed itself over the last couple of days, just by using expressions. The sluggishness is now gone.

I have to add a disclaimer to that last sentence since although I did browse the code without finding anything, someone might have addressed this in some other way without mentioning it. Susan, did you change that code?
Comment 3 Susan McCourt CLA 2010-02-27 17:08:18 EST
(In reply to comment #2)

> 
> Even developers with deep knowledge of p2 tend to skip over query optimization
> concerns (like indexing), a fact that sometimes can lead to poor performance.
> If I'm not mistaken, one good example of this is bug 300234 which has fixed
> itself over the last couple of days, just by using expressions. The
> sluggishness is now gone.
> 
> I have to add a disclaimer to that last sentence since although I did browse
> the code without finding anything, someone might have addressed this in some
> other way without mentioning it. Susan, did you change that code?

The sluggishness appeared and disappeared without me doing anything specific.  There was so much refactoring going on in queries and elsewhere that I was never able to explain the original slowdown, and have not yet investigated.  The interface to filtered tree has remained pretty stable.  I'd be curious whether it has simply returned to its 3.6M4 performance or if it's really faster.
Comment 4 Thomas Hallgren CLA 2010-02-27 17:32:16 EST
Without putting much effort into analyzing the sluggishness, I'd say it was introduced with the addition of the CategoryQuery and CategorMemberQuery. They didn't exist in 3.6M4. Using them meant that the special use of IUMap as an index for InstallableUnitQuery was put out of commission making the queries sequential scans.
Moving to expressions and use of generic indexing rectified that problem.
Comment 5 Pascal Rapicault CLA 2010-02-28 21:50:54 EST
At this point, given that I can simply iterate over the result, I don't see a need for making MatchIteratorFilter API.
Comment 6 Thomas Hallgren CLA 2010-03-10 04:38:54 EST
John, do you mind if we close this one as a WONTFIX?
Comment 7 John Arthorne CLA 2010-03-10 15:09:47 EST
I'm still not seeing a reason to avoid having MatchQuery back in the API. There were a couple of different clients in the mailing list this week complaining about this and not knowing how to proceed with the expression query. One example was testing an IU id for a certain prefix. The suggested fix was:

QueryUtil.createMatchQuery("id ~= $0", SimplePattern.compile(prefix+'*'));

I don't know how a client would know how to write this, but imagine they did. I wrote a simple test case that ran this query over a very large local profile, and did the same query using a simple old match query. Here is the code:

IQuery<IInstallableUnit> query = new MatchQuery<IInstallableUnit>() {
	public boolean isMatch(IInstallableUnit candidate) {
		return candidate.getId().startsWith("com.ibm");
	}
};
long start = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
	profile.query(query, null);
}
long end = System.currentTimeMillis();
System.out.println("Simple query took: " + (end - start) + "ms");
	IQuery<IInstallableUnit> expressionQuery = QueryUtil.createMatchQuery("id ~= $0", SimplePattern.compile("com.ibm*"));
start = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
	profile.query(expressionQuery, null);
}
end = System.currentTimeMillis();
System.out.println("Expression query took: " + (end - start) + "ms");

And the result was:

Simple query took: 15ms
Expression query took: 32ms

So, we now have a query that is far more complicated to write, and 100% slower to run. Since it's running a profile, issues of remote execution aren't a factor. Why would a client ever want this? Who are we helping by *not* having some simple base class like MatchQuery available in the API?
Comment 8 Ian Bull CLA 2010-03-10 15:35:35 EST
I'm going to agree with John here.  While I understand the importance of RemoteQueries, we are weighing the ease of writing queries vs. the possibility to run your queries remotely. In the last day alone, I have seen three different people struggle with writing Expression Queries.

I think we should move IMatchQuery to an API package with a note that explains the consequences of using the interface.  Remember, some people are writing p2 queries against their own repositories in an entirely closed system and will not have a need for remote queries.

I think we should also strive to ensure that the platform (especially p2) uses expression queries.  This will ensure that repository providers can serve the eclipse community with highly optimized custom solutions.

Another problem, that I just noticed, is that clients will not be able to write their own "context queries".  In addition to moving IMatchQuery to a public package, I wonder if we should also provide an abstract base class Query. Again, this class would have the same note about how it cannot be run remotely.
Comment 9 John Arthorne CLA 2010-03-10 17:47:32 EST
We had a long discussion about this today on IRC and Skype between Ian, me, Susan, and Pascal. Our conclusion was roughly in line with Ian's comment: we want to strongly encourage expression-based queries, and within the platform itself use them exclusively. We want the platform to behave well in the presence of large remote or indexed repositories where the expression queries will make a huge difference. However there are a number of client use cases where the expressions are overkill (and slower), so a simple query base class for these use cases would serve them well. I have added back IMatchQuery and MatchQuery to the API package org.eclipse.equinox.p2.query.
Comment 10 Thomas Hallgren CLA 2010-03-10 19:09:41 EST
I think it is important to divide the need for filtering into two categories.

1. The filtering that can be done using a query evaluator.
2. Other types of filtering, and possibly also actions.

The #1 type filtering is commonly known (outside of p2) as queries. The query evaluator is an opaque entity that performs some magic in order to deliver results in the fastest way possible. This is what has been implemented in p2 now and it opens up for several ways forward. Remote queries is only one of them. Another is to actually use a database for the meta-data.

So far #1 has been so capable that it has been able to completely replace all MatchQuery occurrences except perhaps one which I rewrote as an initial query followed by an iteration.

#2 type filtering, as it stands today (without a MatchQuery), is a matter of using a query result. This type of filtering can be made both as the former ContextQuery (since the iterator is available) and by filtering each row. With generics, the code for such filtering is compact, concise, and easy to read.

And, no need for double iterations!  A "match all elements" query can just pass the queryable iterator on to the query result. No overhead whatsoever.

The only drawback with using #2 type filtering instead of concatenating that into #1 with a MatchQuery implementation, is that you don't get a class that represents the actual query and if the result is huge, you often build up a collector in memory. The solution for that is, as I mentioned in comment 2, to make the MatchIteratorFilter class public. It's a very generic class that filters an iterator using an isMatch() method. Nothing is collected in memory.

So here we have a mechanism to do #2 that a) introduces no overhead, and b) keeps the query evaluator free from concerns about hand coded Java queries.


(In reply to comment #7)
John wrote a query that shows that in some cases #1 imposes an overhead. I would argue that in many (most even) cases the outcome is the opposite and not seldom levels of magnitude worse.  If the query author doesn't want to learn enough to write an expression query, how can we expect that he will be able to tell when a hand coded MatchQuery will be faster?

As for complexity. We compare:

 QueryUtil.createMatchQuery("id ~= /com.ibm*/");

to:

 new MatchQuery<IInstallableUnit>() {
    public boolean isMatch(IInstallableUnit candidate) {
        return candidate.getId().startsWith("com.ibm");
    }
 };

I'm not sure everyone agrees that the former is more complex. It requires some learning for sure.
> Why would a client ever want this? Who are we helping by *not* having
> some simple base class like MatchQuery available in the API?

You are helping everyone that doesn't want to know what a MatchQuery is. Most people could do well without the IQuery as well. I think that ideally, an IQueryable should have a method:

 IQueryResult<IInstallableUnit> query(String expression, Object...args);

and then we could have predefined constants for commonly known expressions.

The IQuery IMO, is like a prepared statement in JDBC. It's mostly confusing to people. Adding an IMatchQuery and MatchQuery to the api doesn't make it simpler. It just exposes unnecessary details about the evaluator that makes it hard to improve things in the future.

(In reply to comment #8)
> ... I have seen three different people struggle with writing Expression Queries.
> 
I think that is an exceptionally good outcome given how much that has changed.

> Another problem, that I just noticed, is that clients will not be able to write
> their own "context queries".  In addition to moving IMatchQuery to a public
> package, I wonder if we should also provide an abstract base class Query.
> Again, this class would have the same note about how it cannot be run remotely.

What do you think about my proposal with #1 and #2 above?

I really want to keep the query api minimal. There are more ways then one to skin the 'keep it simple' cat. I came in late in the game and I obviously don't feel the same love for the MatchQuery. Different views to consider. For future users of p2, I'm not sure the old query concept is the most intuitive to build on.
Comment 11 Thomas Hallgren CLA 2010-03-10 19:10:41 EST
(In reply to comment #9)
> I have added back IMatchQuery and MatchQuery to the API package org.eclipse.equinox.p2.query.

That was quick. Sorry I wasn't in on the discussion...
Comment 12 Thomas Hallgren CLA 2010-03-10 19:47:11 EST
John's performance test made me wonder what had gone wrong. I've never seen 100% decrease in performance so I checked. And sure enough, I had missed one thing in the ExpressionMatchQuery which made things unnecessarily slow. Please try your queries again now. I see these numbers:

ExpressionQuery took: 2000 milliseconds
MatchQuery took: 1767 milliseconds

albeit, I don't compare a SimplePattern.isMatch() with a startsWith(). In all fairness I use a SimplePattern in both cases. The SimplePattern class could be more optimized then what it is today.
Comment 13 Thomas Hallgren CLA 2010-03-10 19:59:40 EST
I added a new test that compares an ExpressionMatchQuery with a MachQuery to the PerformanceTest class in org.eclipse.equinox.p2.tests.ql
Comment 14 Thomas Hallgren CLA 2010-03-11 02:44:15 EST
Pascal asked me to provide an example of my alternative approach. So here it is:

 IQueryResult<IInstallableUnit> everything = repo.query(
   QueryUtil.createIUAnyQuery(), new NullProgressMonitor());

 Iterator<IInstallableUnit> matchIter = new MatchIteratorFilter<IInstallableUnit>(everything.iterator()) {
   @Override
   protected boolean isMatch(IInstallableUnit candidate) {
     return true;
  }
 };
IQueryResult result = new QueryResult<IInstallableUnit>(matchIter);

The isMatch returns true because this example is part of a performance test where this approach is compared to a MatchQuery that also just returns true. A typical result from running this test:

 MatchFilter took: 762 milliseconds
 MatchQuery took: 996 milliseconds

which I contribute the overhead introduced by the MatchQuery itself when collecting the items. An approach that isn't ideal in many cases. The test is checked in. Feel free to play with it or add more tests.
Comment 15 John Arthorne CLA 2010-03-11 12:43:41 EST
The filter iterator looks roughly equivalent, except the client needs to learn a new concept rather than just writing another query. For example with a match query they can write:

queryable.query(someExpressionQuery).query(someMatchQuery) 

Instead of

new QueryResult(new MyMatchIterator(queryable.query(someExpressionQuery).iterator()))

As for performance, both approaches collect a list or set of all the results, although the match iterator only builds this during the first iteration over the results rather than upfront. Which one is faster depends on what you do with the result. If you only iterate over the result once the match iterator is about equal to match query, but doing anything else such as converting to set, unmodifiableSet, or multiple iterations, the match query becomes faster.

So, from a client point of view I don't see any advantage to MatchIterator over MatchQuery. The only advantage seems to preserving the conceptual purity of IQuery as only being expression-based, but I don't see what the big deal is here. I see IQuery being a much richer mechanism if it isn't so narrowly defined.
Comment 16 Thomas Hallgren CLA 2010-03-11 13:13:39 EST
(In reply to comment #15)
> The filter iterator looks roughly equivalent, except the client needs to learn
> a new concept rather than just writing another query. For example with a match
> query they can write:
> 
> queryable.query(someExpressionQuery).query(someMatchQuery) 
> 
> Instead of
> 
> new QueryResult(new
> MyMatchIterator(queryable.query(someExpressionQuery).iterator()))
> 
That's true. The counter argument is that you can also write:

for(iter = queryable.query(someExpressionQuery).iterator(); iter.hasNext();) {
  // do your match here
}

which doesn't require neither MatchQuery nor MatchIteratorFilter. I.e. no new concepts at all. So if we want to limit the number of concepts, why not keep things as they were before and have both internal?

> As for performance, both approaches collect a list or set of all the results,
> although the match iterator only builds this during the first iteration over
> the results rather than upfront.

Not true. The match iterator never builds a list or set. The original query might, but using the AnyQuery as in my example, it will not. It will never build more sets then a MatchQuery and hence, never be less efficient. In some cases though, as proven by the performance test, it will be more efficient.

> So, from a client point of view I don't see any advantage to MatchIterator over
> MatchQuery. The only advantage seems to preserving the conceptual purity of
> IQuery as only being expression-based, but I don't see what the big deal is
> here. I see IQuery being a much richer mechanism if it isn't so narrowly
> defined.

I see several advantages of keeping things pure and narrow:

1. We don't risk a plethora of MatchQuery implementations, some of them undoubtedly very inefficient.
2. The API is kept simple and can be made even simpler in the future.
3. If the user selects everything just to perform a match of his own, then it's very apparent that he does precisely that and that the query evaluator can do nothing at all to optimize. This means that the user will always have an incentive to use a pre-defined query to start with.
4. We don't burden the evaluator with considerations about an alien entity that it needs to consider at all times (keep in mind that queries can be piped and compound. This is highly optimized when using expressions).
5. Some queryables might use another more capable evaluator (such as a database), perhaps even capable of using an index when confronted with "id =~ /com.ibm*/".
Comment 17 John Arthorne CLA 2010-03-11 14:18:43 EST
(In reply to comment #16)
> > As for performance, both approaches collect a list or set of all the results,
> > although the match iterator only builds this during the first iteration over
> > the results rather than upfront.
> 
> Not true. The match iterator never builds a list or set. The original query
> might, but using the AnyQuery as in my example, it will not. It will never
> build more sets then a MatchQuery and hence, never be less efficient. In some
> cases though, as proven by the performance test, it will be more efficient.

Yes it does. After running your PerformanceTest the query result contains an instance of RepeatableIterator that contains an ArrayList with all 3465 values in it. It has built the same large collection of results as MatchQuery.
Comment 18 John Arthorne CLA 2010-03-11 14:26:35 EST
(In reply to comment #16)
> Not true. The match iterator never builds a list or set. The original query
> might, but using the AnyQuery as in my example, it will not. It will never
> build more sets then a MatchQuery and hence, never be less efficient. In some
> cases though, as proven by the performance test, it will be more efficient.

Also as I said, which one is better performance depends on what you do with the result. If I change the performance test line for both MatchQuery and MatchIterator from:

	assertEquals(queryResultSize(result), 3465);

to:

	assertEquals(result.toUnmodifiableSet().size(), 3465);

Then the result of five runs is clearly in favor of MatchQuery. This isn't an imaginary situation since we have quite a few callers of toUnmodifiableSet in our own code.

MatchFilter took: 829 milliseconds
MatchQuery took: 453 milliseconds

MatchFilter took: 578 milliseconds
MatchQuery took: 625 milliseconds

MatchFilter took: 626 milliseconds
MatchQuery took: 374 milliseconds

MatchFilter took: 672 milliseconds
MatchQuery took: 328 milliseconds

MatchFilter took: 562 milliseconds
MatchQuery took: 438 milliseconds
Comment 19 John Arthorne CLA 2010-03-11 14:37:13 EST
(In reply to comment #16)
> 1. We don't risk a plethora of MatchQuery implementations, some of them
> undoubtedly very inefficient.

Whether clients use MatchQuery, MatchIterator, or some custom way of gathering the set of things they need, the performance is about the same in all cases. The only difference is how complicated the client code is.

> 2. The API is kept simple and can be made even simpler in the future.

I don't see one extra class implementing IQuery as much extra complexity on the API.

> 3. If the user selects everything just to perform a match of his own, then it's
> very apparent that he does precisely that and that the query evaluator can do
> nothing at all to optimize. This means that the user will always have an
> incentive to use a pre-defined query to start with.

Currently I don't see any incentive for most clients to switch to expressions - if large, remote, indexed repositories don't come into play in their situation, either because they are not querying a repository or they are in a closed world where the repositories are known to be mirrored locally, etc. We're currently sitting here trying to debug a test failure involving a failing expression query, and we don't have the first clue about why it's failing or how to debug this. I wouldn't inflict this complexity on *anyone* who doesn't have an overwhelming need for it.

> 4. We don't burden the evaluator with considerations about an alien entity that
> it needs to consider at all times (keep in mind that queries can be piped and
> compound. This is highly optimized when using expressions).

There isn't any burden on the evaluator here. I added back MatchQuery without having to change a single evaluator, most of which eventually just call query.perform(someIterator).

> 5. Some queryables might use another more capable evaluator (such as a
> database), perhaps even capable of using an index when confronted with "id =~
> /com.ibm*/".

Yes, for any particular complex query there is a possibility that an IQueryable will exist some day to optimize it. If the client never uses such an IQueryable, it just doesn't matter.
Comment 20 Thomas Hallgren CLA 2010-03-11 19:14:52 EST
I think we talk passed each other. I'll try to clarify what I meant with my bullets.

(In reply to comment #19)
> (In reply to comment #16)
> > 1. We don't risk a plethora of MatchQuery implementations, some of them
> > undoubtedly very inefficient.
> 
> Whether clients use MatchQuery, MatchIterator, or some custom way of gathering
> the set of things they need, the performance is about the same in all cases.
> The only difference is how complicated the client code is.
> 
The "undoubtedly very inefficient" queries that I had in mind fall into two categories:

1. A MatchQuery will often compare things that are indexed but since this is completely hidden to the 
evaluator, no index will be used which means that the performance can be level of magnitues worse then it has to be.

2. I don't expect the quality of newly authored MatchQuerys to be much better then the ones authored in the past. Many of them are now replaced by expression queries that perform a lot better.

I agree that very carefully handcrafted MatchQueries that are used in limited environments will most likely perform very well.

> > 2. The API is kept simple and can be made even simpler in the future.
> 
> I don't see one extra class implementing IQuery as much extra complexity on the
> API.
> 
I agree, the API changes are not major. My objections are targeted against the implications that the extra class brings.

> > 3. If the user selects everything just to perform a match of his own, then it's
> > very apparent that he does precisely that and that the query evaluator can do
> > nothing at all to optimize. This means that the user will always have an
> > incentive to use a pre-defined query to start with.
> 
> Currently I don't see any incentive for most clients to switch to expressions -
> if large, remote, indexed repositories don't come into play in their situation,
> either because they are not querying a repository or they are in a closed world
> where the repositories are known to be mirrored locally, etc.

The fact that you get indexes, managed properties, etc. should be incentive enough I think. But that's not what I meant. I think that if you, in order to write something that does all the filtering by itself, must precede that with a query that explicitly tells you that it will return all rows, then you will intuitively consider alternatives. This is not that apparent with the MatchQuery approach.

You keep bringing up examples where a MatchQuery would cause no harm. I'm looking at the opposite. I want to get rid of a construct that very easily _can_ cause harm.

> We're currently sitting here trying to debug a test failure involving a failing expression
> query, and we don't have the first clue about why it's failing or how to debug
> this. I wouldn't inflict this complexity on *anyone* who doesn't have an
> overwhelming need for it.

I'm all for establishing a constructive dialog on how debugging can be made easier. Someone proposed that we add tracing. I'm not a great fan of that since I fear that it will introduce too much overhead. But perhaps we can do something. Your ideas are very welcome and I think this is regardless of the existence of MatchQuery.

> > 4. We don't burden the evaluator with considerations about an alien entity that
> > it needs to consider at all times (keep in mind that queries can be piped and
> > compound. This is highly optimized when using expressions).
> 
> There isn't any burden on the evaluator here. I added back MatchQuery without
> having to change a single evaluator, most of which eventually just call
> query.perform(someIterator).
> 
I was unclear. When I said burden, I meant current code in the evaluator that must cope with queries that are not expression based. You can look at:

QueryUtil.createCompoundQuery()
QueryUtil.createPipeQuery()
QLFactory.toExpression(IQuery<?> query)
WrappedIQuery

These can be simplified and the WrappedIQuery can be removed completely if everything is expression based.

Another "burden" is that an expression can be expressed as a string and recreated. Unless of course, it contains a wrapped query. All future code that wants to store a query need to cater for this as long as the MatchQuery is present. Simply because we cannot limit in what situations it will be used.

> > 5. Some queryables might use another more capable evaluator (such as a
> > database), perhaps even capable of using an index when confronted with "id =~
> > /com.ibm*/".
> 
> Yes, for any particular complex query there is a possibility that an IQueryable
> will exist some day to optimize it. If the client never uses such an
> IQueryable, it just doesn't matter.

Of course. But there's a rather significant 'if' in that sentence. What if the opposite happens? I'm coming back to that. Instead of trying to find cases where a MatchQuery would work, consider where it will cause trouble. If we can provide a solution that will cover all cases without it, why allow it?

(In reply to comment #17)
> > Not true. The match iterator never builds a list or set. The original query
> > might, but using the AnyQuery as in my example, it will not. It will never
> > build more sets then a MatchQuery and hence, never be less efficient. In some
> > cases though, as proven by the performance test, it will be more efficient.
> 
> Yes it does. After running your PerformanceTest the query result contains an
> instance of RepeatableIterator that contains an ArrayList with all 3465 values
> in it. It has built the same large collection of results as MatchQuery.

You are absolutely right (blush). In order to avoid that, the example should read:

 IQueryResult result = new QueryResult<IInstallableUnit>(matchIter, true);

Regarding the performance measurements using unModifiableSet(). Yes, that exposes a weakness in the QueryResult. Which is great because it's easy to fix. What happens is that the repeatable iterator builds a List when it iterates the first time. This List can then be used as the provider of subsequent iterators. That is of course not optimal since, as you point out, a call to unModifiableSet() is very common. It would be more optimal if a Set was used throughout.

I think the tests have shown three things:

1. The SimplePattern needs some optimizations to deal with xxx* and *xxx type patterns (using startsWith() and endsWith())
2. Iterator passing can be made more efficient.
3. The RepeatableIterator should use a Set and not a List for its backup.

Small changes that will benefit a very large amount of queries.

To everyone involved, please add more tests. Performance or function tests. Things can only get better.
Comment 21 John Arthorne CLA 2010-03-12 08:14:22 EST
(In reply to comment #20)
> I think we talk passed each other.

Yes, I think this is what's happening. Thinking about this more overnight, I think it's because we're coming at it from different perspective. I think you're looking at it mainly from the perspective of a large repository implementer. For such repositories, queries written in Java are brutal because there is no way you can do any processing on the server side, and the best you can do is return all elements to the client and pass them through the Java filter. This is the situation where MatchQuery is potentially dangerous and we want to encourage using expressions. I understand this perspective, which is why I'm not complaining that internally in p2 we are using expression queries exclusively. We want p2 and the Eclipse platform in general to perform well in the presence of such repositories.

I'm trying to look at this from the perspective of a client of the p2 API. Maybe they are writing an RCP app where the set of repositories is fixed and there is just one small repository for doing updates. Maybe they are a client trying to get information from a ProvisioningPlan, or from an IProfile. Hopefully in many cases one of the pre-canned queries from QueryUtil will suit their needs. If not, they need to learn this new language. They'll struggle to write the correct expression for their situation, they'll have trouble debugging evaluation of the expression. They'll undoubtedly curse p2 for imposing such a complex API on them. These are the clients who will benefit from the simplicity of MatchQuery.

In the end I don't think I'm proposing something unreasonable: just leave MatchQuery in the API, and let the *client* decide what is most appropriate for their product use case. If they start out with MatchQuery and then discover they have a need for using large remote repositories, they will quickly see the performance bottleneck and they can make the switch to expressions. The implicit assumption here that clients aren't capable of deciding what is the most appropriate API to use for their situation is a bit insulting to our API clients.
Comment 22 Thomas Hallgren CLA 2010-03-12 08:57:00 EST
(In reply to comment #21)
> In the end I don't think I'm proposing something unreasonable: just leave
> MatchQuery in the API, and let the *client* decide what is most appropriate for
> their product use case.

You cannot assume anything about where MatchQuery implementations will be used. A much safer assumption is that if we have it in the API, it will eventually be used in the wrong places. I really don't want to give the client that choice for no apparent reason.

We have a good solution for anyone that dislikes the expression language. Use pre-canned queries or query for all and iterate. It's plain, efficient, and simple. No magic whatsoever, and it's very apparent what happens. A MatchQuery gives a false impression that some magic is going on that improves on that situation.

> The implicit assumption here that clients aren't capable of deciding what is
> the most appropriate API to use for their situation is a bit insulting to our API
> clients.

I think the assumption that our clients are incapable to comprehend the query language is a bit insulting. We need good documentation, that's for sure.

I would very much appreciate your help describing how to use the expressions in ways that will lower the threshold (bug 305462). Adding a parallel way of doing it to the api is telling our clients - yeah, we have this new query language, but it's so complex that in order to protect you from it, we also provide this other way of doing queries.That is an approach that I'm convinced will do more damage then good to the relationship with our clients.