Re: [p2-dev] query performance

From: Mengxin Zhu <kane.zhu@xxxxxxxxxxxxx>
Date: Wed, 10 Aug 2011 14:12:22 +0800

Hi Thomas,

Thank you for analyzing the bottleneck of my query.

RequiredCapability doesn't support a 'namespace' member, so I used the query expression below for testing:

"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).collect(rc | select(iu | iu ~= rc)).flatten()})"

However, the result is surprising: it takes much more time to query the large number of IUs. I updated the document and benchmark spreadsheet listed below. What do you think?


Thanks.

Mengxin Zhu


On 08/09/2011 04:04 PM, Thomas Hallgren wrote:

Hi Mengxin,
I took a look at your query. Here are some comments and hints that might help you speed things up:
1. You use the method toSet() on the result instead of toUnmodifiableSet(). This will yield an extra (and probably unnecessary) copy. Please use toUnmodifiableSet() where possible.
2. You create an array from the incoming collection before you pass it to the query. You can avoid this extra copy by passing the Collection directly.
3. The statement:

"select(iu | $0.exists(iu2 | iu2.requirements.exists(r | iu ~= r )))"
suggests that you want to find all IUs that are required by some IU in the incoming collection. That's a one-step traversal. All those new IUs will introduce new requirements, and in order to find them all the way the planner does, you must continue evaluating this query until no more units are found. A better way to resolve this is to use a traverse query:
"$0.traverse(parent | parent.requirements.collect(rc | select(iu | iu ~= rc)).flatten())"
If $0 is a large collection, then it's likely that an initial 'unique' of all relevant requirements will improve performance significantly:
"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).collect(rc | select(iu | iu ~= rc)).flatten()})"
To really speed things up, you might also want to prune the unique list of requirements to only include those that have the desired namespace:
select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu')
The full query then becomes:
"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu').collect(rc | select(iu | iu ~= rc)).flatten()})"
If you try this out, please publish your results.
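As a rough illustration of the difference between the one-step query and the traverse query discussed above, here is a minimal plain-Java sketch. It does not use the p2 API. The `Unit` class, the string-based capability matching (standing in for `iu ~= rc`), and the sample data are all hypothetical, and the `seenRequirements` set only approximates the role of `unique(cache)`. Including the roots in the traversal result is a choice made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TraverseSketch {
    // Hypothetical stand-in for an installable unit: an id, one provided
    // capability, and a list of required capabilities.
    static final class Unit {
        final String id;
        final Set<String> provides;
        final List<String> requires;

        Unit(String id, String provided, String... required) {
            this.id = id;
            this.provides = Collections.singleton(provided);
            this.requires = Arrays.asList(required);
        }
    }

    // Made-up repository: A -> B -> C -> B (a cycle), and an unrelated D.
    static List<Unit> sampleRepo() {
        return Arrays.asList(
            new Unit("A", "a", "b"),
            new Unit("B", "b", "c"),
            new Unit("C", "c", "b"),
            new Unit("D", "d"));
    }

    // One-step lookup: units that satisfy a requirement of some root.
    static Set<String> oneStep(Collection<Unit> roots, Collection<Unit> repo) {
        Set<String> result = new TreeSet<>();
        for (Unit root : roots) {
            for (String req : root.requires) {
                for (Unit candidate : repo) {
                    if (candidate.provides.contains(req)) {
                        result.add(candidate.id);
                    }
                }
            }
        }
        return result;
    }

    // Traversal: keep following requirements until no new units turn up.
    // Each distinct requirement is matched against the repository only once.
    static Set<String> traverse(Collection<Unit> roots, Collection<Unit> repo) {
        Set<String> seenRequirements = new HashSet<>();
        Set<String> result = new TreeSet<>();
        Deque<Unit> queue = new ArrayDeque<>();
        for (Unit root : roots) {
            if (result.add(root.id)) {
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            Unit parent = queue.poll();
            for (String req : parent.requires) {
                if (!seenRequirements.add(req)) {
                    continue; // requirement already resolved earlier
                }
                for (Unit candidate : repo) {
                    if (candidate.provides.contains(req) && result.add(candidate.id)) {
                        queue.add(candidate); // newly found unit: visit it later
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Unit> repo = sampleRepo();
        List<Unit> roots = Collections.singletonList(repo.get(0));
        System.out.println(oneStep(roots, repo));  // prints [B]
        System.out.println(traverse(roots, repo)); // prints [A, B, C]
    }
}
```

With this sample data the one-step lookup stops at B, while the traversal also reaches C and terminates despite the B/C cycle, because C's requirement has already been seen.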

HTH,

Thomas Hallgren



On 2011-08-09 09:13, Mengxin Zhu wrote:
I find that the performance of the query language degrades significantly when querying a repository with a large number of IUs. I'm not sure whether it's a common case, but it is at least in my case.
I already have a list of non-installed root and group IUs, and I want to query the repository for the non-installed IUs that are required by those root and group IUs.
I compared three different methods for querying IU sets of different sizes: using the provisioning planner to resolve and query the required IUs, using the query language, and using a for loop.
I published my methods as a document [1], and the query benchmark as a spreadsheet [2].
Actually, I prefer to use the query language, since the code looks much cleaner. Does anybody know why the query language is so slow at handling a large number of IUs, or how to tune my query expression?
[1] https://docs.google.com/document/d/1wfnr2d2TF4vIYDCMmWPuYd0kQA32WiWaXTiaCoJovho/edit
[2] https://spreadsheets.google.com/spreadsheet/ccc?key=0AmxBoq-n1R8KdEZ4czdpQk9lMEpvR3pUbzZaZzltTGc