Summary: | Need advice for finding duplicate Types | ||||||
---|---|---|---|---|---|---|---|
Product: | [Eclipse Project] JDT | Reporter: | Dirk Baeumer <dirk_baeumer> | ||||
Component: | Core | Assignee: | Kent Johnson <kent_johnson> | ||||
Status: | VERIFIED FIXED | QA Contact: | |||||
Severity: | normal | ||||||
Priority: | P3 | CC: | martinae, philippe_mulet | ||||
Version: | 3.1 | Keywords: | performance | ||||
Target Milestone: | 3.1 M7 | ||||||
Hardware: | PC | ||||||
OS: | Windows XP | ||||||
Whiteboard: | |||||||
Attachments: |
|
Description
Dirk Baeumer
2005-02-02 09:39:15 EST
"We search for duplicate types for all types that are used unqualified in the source code to see if we have more than one match." Why? You are not going to remove every single type import, are you? From the bindings you should know which types are resolved to be from which packages & as a result from which existing on demand imports (if any). I can see us adding the multi-declaration query, but I'm wondering how often you need to call it. Also on the performance side, do you create a DOM AST for every file? And change/save every file? If these 2 operations take 2 seconds on average then does it matter if the search takes 200ms vs. 500ms? I updated the import rewriter to be able to switch between all type cache (of all types in the workspace) and use of type search in the project. I observed the same performance numbers as Dirk in comment 0. On a real world example that means: Organize imports on all (89) types of org.eclipse.compare (total 16497 types on class path) with all types cache (empty on startup): 14 s (initial search 4 s, then av. 0.15 s) with project search: 123 s (av. 1.4 s per search in project) The algorithm is: Given a set of type container names that will be star imported { pack1, pack2 } and all the simple type names that are in these containers and are referenced in the source code {A1 (from pack1), A2 (from pack1...), B1, B1 }. Iterate over all types in the project and find if there is another type that has one of the simple type (e.g. A1} name AND a container in our set but which is not the one we reference { e.g. pack2.A1 } I could give you the set of container names and/or the set of type names. But I wonder if that would be much faster than having me looking at all types myself. There is no real work done in the result call back, no element creation. The only solution I see that there is something kind of cache of all types in the project. I tought code assist uses something like that? In my example in comment #2, there is an AST build for each of the files (89 source files) and the file is changed/saved every time. You see that the search can't compare with the time the AST build takes. Created attachment 17632 [details]
the algorithm in code
"Iterate over all types in the project and find if there is another type that has one of the simple type" Why? It seems to me that you only need to search for collisions when you remove a single type import (because you added an on-demand import to cover it). So for each single type import being removed, you must check that it does not exist in any of the other on-demand imported packages. I see no reason to ever ask for every type defined in a package or project. To find if there is another type with the same simple name I currently iterate over all the types I got from the search engine (once), unless I want to do a search for each type. This is needed for all referenced types in the (new) on-demand package. That's mostly more than one. It even has to happen for all type references in all other (unchanged) on-demand packages. I would be very surprised if several searches are faster than one. But I can implement that if you think its worth it. I definitely don't expect it to be faster (now). But if you could get to the point that you're asking for declarations of: - all referenced types in the new on-demand package - all referenced types in the other unchanged on-demand packages AND the scope is limited to these on-demand packages and not the workspace or the project... Then I think we can combine these type declaration queries into a single query that will have decent enough performance. The problem with the scope is that it has to be constructed out of IJavaElements. I don't have the Java elements for the package and the types (in the case of type containers). There's potentially more than one IPackageFragment for the same package (e.g. in the multiple source folder scenario). OK - well then maybe we need to define a multi-searchAllTypeNames that lets you pass in mutiple simple type names AND multiple qualified package names. The scope would be the project. The answer would be every type declaration whose simple type name is in the simple name set & whose package is the qualified set. Something like: searchAllTypeNames( final char[][] packageNames, final char[][] typeNames, final int matchRule, int searchFor, IJavaSearchScope scope, final ITypeNameRequestor nameRequestor, int waitingPolicy, IProgressMonitor progressMonitor) Great, if with 'package name' you mean 'type container name'. I'm wondering, can the search engine really speed up knowing all that? Or would it just iterate over all types as well and do the same checks I just did? The would just save the call backs and they can't be so expensive. It will do it as a single query so each index file is only opened once. So assuming the time is ok for a project search of a single type in a qualified package/container... then the combined time for several types in several containers should be a bit more but not much if its still a single index pass. We would like to limit the number of index files further but we'll look into limiting the scope. At the very least, say there are 3 containers but 15 type names: its likely worth it to see if the containers exist in the index file before checking for each type name. So how is this? For organize imports, we add a call to SearchEngine: void searchAllTypeNames( final char[][] qualifiedNames, final char[][] typeNames, final TypeNameRequestor nameRequestor, int waitingPolicy, IProgressMonitor progressMonitor) It performs a case sensitive exact match to find all types whose name is included in typeNames & whose packageName + enclosingTypeName is included in qualifiedNames. It answers the matching types using: TypeNameRequestor.acceptType(modifiers, packageName, simpleTypeName, enclosingTypeNames, path) For the open type dialog, we add a couple calls to SearchEngine or a new object such as GlobalTypesCache: void searchAllTypeNames( final char[] typeName, int numberOfResultsToValidate, final TypeNameRequestor nameRequestor, int waitingPolicy, IProgressMonitor progressMonitor) It performs a case insensitive pattern match to find all types whose name matches typeName. It will call with first numberOfResultsToValidate matching types (sorted by typeName): TypeNameRequestor.acceptTypeName(modifiers, packageName, simpleTypeName, enclosingTypeNames, resourcePath) with remaining matching types, it will call: TypeNameRequestor.acceptPossibleTypeName(modifiers, packageName, simpleTypeName, enclosingTypeNames, indexPath) To validate that a possible matching type name still exists, you would need to find its resourcePath by calling: String locateType( char[] typeName, char[] packageName, char[][] enclosingTypeNames, String indexPath) The resulting path string would be null if the type can no longer be located. Actually change that, the 2 methods on TypeNameRequestor would be the existing method: TypeNameRequestor.acceptType(modifiers, packageName, simpleTypeName, enclosingTypeNames, resourcePath) & a new method: TypeNameRequestor.acceptPossibleType(modifiers, packageName, simpleTypeName, enclosingTypeNames, indexPath) Kent, up to now I only looked into getting the open type dialog implemented on top of search. However, we expose some of it functionality as public API. Furthermore we know more about the pattern (for example if it is exact or a pattern containing wild cards). So I would like to extend the API as follows: - a flag to control if the match should be case sensitive or case insensitive and if we want to do a pattern match. I know this information from the pattern the user typing in and if it speeds up things I can provide the information. - same for a package name. The user can provide a package name in the dialog which I could pass to the search query if the information can be used to filter things directly in the search engine. - the public API allows contraining the elements in the dialog to a certain IJavaSearchScope. Regarding case sensitive and pattern match I can refilter the results. Same for the package name since I get enough information in the call back. The scope looks a little bit more compilcated. To decide if a type is enclosed in a scope I need its path to call IJavaSearchScope#encloses(String resourcePath). Since I don't have this information for non validated entries I have to postpone the validation until I present them. This might for a scope that aonly matches a handful of types cause performance problems since I have to validate them all as soon as they get visible. Kent, is there a way to honor the scope during the search ? "we expose some of it functionality as public API" what does it look like? An example is JavaUI#createTypeDialog(Shell parent, IRunnableContext context, IJavaSearchScope scope, int style, boolean multipleSelection, String filter). Here a client can pass a search scope and a filter. The filter can contain wildcards and package components (although this isn't speced this way we treat it this way). Martin, I released the call in SearchEngine to handle multiple qualifications & typeNames. SearchEngine.searchAllTypeNames( final char[][] qualifications, final char[][] typeNames, IJavaSearchScope scope, final TypeNameRequestor nameRequestor, int waitingPolicy, IProgressMonitor progressMonitor) Just tried the new API. Good performance. A average search takes around 200ms per file, resulting in a good speed for doing organize imports on the whole project. I would suggest I switch to the new API! Martin, are you using the project scope to limit the number of index files that need to be checked? Yes, a project scope is used. This PR is now done since the AllTypesCache has been removed. We added the new API for organize imports. The Open Type dialog benefits from the caching of the category tables. Verified in I20050510-0010 |