Re: [imp-dev] Suggestions for improving the scheduling of analyses
Stan Sutton wrote:
Hi All,
I'd like to begin to do something to improve the mechanism by which
analyses are scheduled (and run) in IMP. I'm talking about the
analysis scheduling/invocation mechanism that is currently part of the
UniversalEditor, where "analyzers" (which might do anything) are
invoked whenever the source text is (re)parsed.
Currently the analyses are invoked in
UniversalEditor.ParserScheduler.notifyAstListeners(..). Each listener
(which invokes an analyzer) has a required analysis level that
indicates the level of analysis on which its associated analysis
depends. Nominally, an analysis is invoked only if the level required
by the analysis is less than the current level.
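The level-gated dispatch described above can be sketched roughly as follows. All names here (LevelGatedScheduler, AnalysisLevel, Listener) are illustrative stand-ins, not IMP's actual declarations, and the listener is invoked once its required level has been attained:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of level-gated listener dispatch; names are hypothetical.
public class LevelGatedScheduler {
    // Analysis levels, nominally forming a linear order.
    public enum AnalysisLevel {
        NONE, LEXICAL_ANALYSIS, SYNTACTIC_ANALYSIS, NAME_ANALYSIS, TYPE_ANALYSIS
    }

    public interface Listener {
        // The single level of analysis this listener's analyzer depends on.
        AnalysisLevel getAnalysisRequired();
        void update();
    }

    private final List<Listener> listeners = new ArrayList<>();

    public void addListener(Listener l) {
        listeners.add(l);
    }

    // Invoke only those listeners whose required level has been attained.
    public void notifyAstListeners(AnalysisLevel current) {
        for (Listener l : listeners) {
            if (l.getAnalysisRequired().compareTo(current) <= 0) {
                l.update();
            }
        }
    }
}
```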
We recognize that this approach, while a step in the right direction,
is significantly flawed. As noted in comments in IModelListener, which
defines the analysis levels:
// BROKEN!!!
// The following has no notion of the scope of analysis. E.g., providing a language
// service may require analysis within a much wider scope than a compilation unit
// (such as whole program analysis). Also, analyses don't really form a linear order.
And there are other concerns, for example, that a given analysis may
depend on more than one type of analysis, and that the costs and
benefits of an analysis are not considered.
A few points:
- Most compilers are not structured so that clients can easily select
the analyses they'd like to have performed. E.g., you can't typically
tell a compiler "I want to go only as far as type-checking." Moreover,
even relatively well-structured compilers like Polyglot, which has an
explicit goal-directed scheduler and does allow you to specify the
compilation goal, won't typically allow you to take an AST that's
previously been processed as far as one goal (say, name binding), and
later submit it for additional processing (say, type checking).
- It may not be easy to figure out how far the compiler got, if it
failed to get as far as you asked it to go. E.g., type-checking
failed, but did name binding succeed?
- In many languages and compilers, the analyses are intertwined, either
for efficiency or, frequently, in order to deal with apparent cycles
of reference and so on. As a result, compilers by necessity interleave
name binding in one part of the source with the type checking of
another part.
Because of the above, it's not easy to use most compilers to actually
schedule these kinds of front-end analyses. For this reason, the current
framework stops roughly at the point of identifying how far the compiler
got.
But even this isn't a particularly good design point, since it's possible
(common, even) for the compiler to succeed in type-checking one part
of a given class, but fail in some small part (say, one expression). Since
we don't want the IDE to become globally crippled by a local failure,
we want the compiler to proceed as far as it can everywhere it can,
and fail in as fine-grained a fashion as possible. Thus, it doesn't
really make sense to have a single indication per source file of how
far the compiler got.
At the same time (for the sake of interoperability with existing compiler
front ends, an important IMP design requirement), it's not feasible to
require that compilers behave in such a fine-grained manner.
As far as the cost of analysis is concerned, many of the most costly
analyses also have notoriously high variance in cost. This makes it
extremely hard to predict the cost of any given analysis instance.
People thus tend to only perform such analyses when needed for
explicitly-requested functionality.
So I'm tempted to say we shouldn't try to extend this part of the
framework.
That said, the new analysis framework we've been working on (in
org.eclipse.imp.pdb, not to be confused with org.eclipse.imp.analysis,
which contains the beginnings of a less general framework that we
may abandon) exposes such inter-analysis dependencies explicitly,
based on the types of their results. The framework requires that
analyses specify the type of data that they produce, which permits
the framework to locate the appropriate producers for a bit of
data when requested by a client. Clients in this case could be
visualizers, or other analyses, and simply ask for a piece of
data with a particular type and a particular scope. If the data
requested doesn't exist yet, the framework schedules the
appropriate analyzer to produce it.
In other words, I think there's significant overlap between your
proposal and what we've done in org.eclipse.imp.pdb. Take a
look at IAnalysisDescriptor, AnalysisManager, and FactBase.
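The demand-driven lookup just described might be sketched roughly as follows. The real types in org.eclipse.imp.pdb (IAnalysisDescriptor, AnalysisManager, FactBase) are considerably richer; everything here is an illustrative stand-in keyed on a (result type, scope) pair:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A toy stand-in for the pdb FactBase/AnalysisManager pair; names and
// shapes are hypothetical, for illustration only.
public class FactBaseSketch {
    // A fact is identified by the type of data produced and the scope analyzed.
    public record FactKey(String resultType, String scope) {}

    private final Map<FactKey, Object> facts = new HashMap<>();
    // One registered producer (analyzer) per result type; given a scope,
    // it computes the requested fact.
    private final Map<String, Function<String, Object>> producers = new HashMap<>();

    public void registerProducer(String resultType, Function<String, Object> analyzer) {
        producers.put(resultType, analyzer);
    }

    // Clients simply ask for data of a given type over a given scope; if the
    // data doesn't exist yet, the appropriate analyzer is scheduled (here,
    // simply run) to produce it, and the result is cached.
    public Object getFact(String resultType, String scope) {
        return facts.computeIfAbsent(new FactKey(resultType, scope),
                key -> producers.get(resultType).apply(scope));
    }
}
```

A client such as a visualizer would then call getFact("callGraph", someScope) without knowing which analyzer produces call graphs.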
Regarding the related issue that analyses may form a partial order,
we could generalize the current mechanism in the following way:
* Instead of having a single level of analysis that has been
attained, the scheduler can have a collection of analyses that
have been performed
* As each successful analysis is completed, a record of it is
added to the scheduler's set of performed analyses
Regarding the issue that an analysis may depend on multiple analyses
of the same unit or on analyses of multiple units:
* In the most general case, an analysis may depend not on some
analysis level but on an arbitrary predicate. For instance, in
IModelListener, we could replace "AnalysisRequired
getAnalysisRequired()" with "boolean evaluatePreconditions()"
(evaluatePreconditions() might be based on some set of analyses
required or on something else).
* As a compromise (or a step on the road to full generality) we
could change "AnalysisRequired getAnalysisRequired()" to
"Set<AnalysisRequired> getAnalysesRequired()."
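The two generalizations above (a set of performed analyses instead of a single attained level, and set-valued or predicate-valued preconditions) could be sketched like this; AnalysisRequired and the listener shapes are illustrative, not IMP's actual declarations:

```java
import java.util.EnumSet;
import java.util.Set;

// A sketch of partial-order scheduling with set- and predicate-based
// preconditions; all names here are hypothetical.
public class PartialOrderScheduling {
    public enum AnalysisRequired { LEXICAL, SYNTACTIC, NAME_BINDING, TYPE_CHECKING }

    // Compromise step: a listener declares a *set* of required analyses.
    public interface SetBasedListener {
        Set<AnalysisRequired> getAnalysesRequired();
    }

    // Fully general: an arbitrary predicate over the analyses performed so far.
    public interface PredicateBasedListener {
        boolean evaluatePreconditions(Set<AnalysisRequired> performed);
    }

    // The scheduler keeps a collection of completed analyses rather than a
    // single attained level.
    private final Set<AnalysisRequired> performed = EnumSet.noneOf(AnalysisRequired.class);

    // As each successful analysis completes, record it.
    public void recordCompleted(AnalysisRequired a) {
        performed.add(a);
    }

    public boolean ready(SetBasedListener l) {
        return performed.containsAll(l.getAnalysesRequired());
    }

    public boolean ready(PredicateBasedListener l) {
        return l.evaluatePreconditions(EnumSet.copyOf(performed));
    }
}
```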
Regarding the issues of costs and benefits:
Simply knowing that you can run an analysis doesn't tell you much
about whether it's appropriate to run an analysis. For instance,
suppose you have an AST and that enables you to run both a semantic
analyzer and a spell-checker for comments. Probably you want to run
the semantic analysis first (i.e., it has a higher benefit). But if
the spell-checker can run in 1/100th the time of the semantic
analyzer, then maybe you want to go ahead and run that just to get it
out of the way. (I'm not suggesting what the right approach is,
merely arguing that these considerations are relevant to deciding what
you want to do.)
Further regarding the benefits, I think we can distinguish intrinsic
versus extrinsic benefits. Intrinsic would be some level of benefit
that you can assign to an analysis based solely on the value of its
results, regardless of what other analyses may be available.
Extrinsic benefits are those that depend on considerations beyond the
analysis itself, such as what other analyses it might enable, the
value of those analyses, etc. (which depends on the particular context).
Of course, there are a number of issues regarding the possible use of
costs and benefits in analysis scheduling: they may be hard to
compute precisely, they may be dynamic, their computation represents
some additional cost (a kind of meta-analysis), they might be
monitored (which would impose some additional cost), and the best way
to use cost and benefit values in scheduling isn't clear (and might
itself vary by context or in time).
Still, I think there are some simple steps we could take to begin to
incorporate some of these elements into our analysis scheduling:
1. Forget the dynamic stuff to start with. We don't do anything
dynamic now. Perhaps we could put in some placeholders for
dynamic evaluation of costs and benefits but not implement them
in a complicated way.
2. Adopt some simple scales for costs and benefits (kind of like
bug severity in Bugzilla). Assume that we can derive some
benefit if the person who specifies the cost and benefit levels
gets those basically right most of the time.
3. Allow users to adjust the costs and benefits for particular
analyses via preferences.
4. Devise a scheduling algorithm that is parameterizable in terms
of the weight it puts on various factors, i.e., costs versus
benefits (or costs versus intrinsic benefits versus extrinsic
benefits).
5. Allow users to adjust the weights of the different factors via
preferences.
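Step 4 could be sketched as a scoring function parameterized by the weights given to cost, intrinsic benefit, and extrinsic benefit. The simple integer scales (cf. bug severity in Bugzilla) and the weights are illustrative assumptions; nothing here reflects an actual implementation:

```java
import java.util.Comparator;
import java.util.List;

// A minimal sketch of weight-parameterized cost/benefit scheduling;
// names, scales, and weights are hypothetical.
public class WeightedScheduler {
    // Costs and benefits on a simple fixed scale (e.g. 1-5), possibly
    // adjusted by users via preferences.
    public record Analysis(String name, int cost, int intrinsicBenefit, int extrinsicBenefit) {}

    private final double wCost, wIntrinsic, wExtrinsic;

    // The weights themselves could likewise be exposed as preferences.
    public WeightedScheduler(double wCost, double wIntrinsic, double wExtrinsic) {
        this.wCost = wCost;
        this.wIntrinsic = wIntrinsic;
        this.wExtrinsic = wExtrinsic;
    }

    // Higher score runs first; cost counts against an analysis.
    public double score(Analysis a) {
        return wIntrinsic * a.intrinsicBenefit()
                + wExtrinsic * a.extrinsicBenefit()
                - wCost * a.cost();
    }

    // Order candidate analyses best-first under the current weights.
    public List<Analysis> schedule(List<Analysis> candidates) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(this::score).reversed())
                .toList();
    }
}
```

With equal weights, a semantic analysis rated (cost 4, intrinsic 5, extrinsic 3) outscores a comment spell-checker rated (cost 1, intrinsic 1, extrinsic 0); raising the weight on cost shifts the balance toward cheap analyses.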
Cost/benefit-driven scheduling has been tried in many other contexts,
and rarely succeeds, except in limited cases (e.g. compiler
optimizations) with a limited suite of analyses whose costs are
well-understood a priori. It's too hard to estimate costs, and
"benefits" are also difficult to quantify. Also, for such information
to be useful requires a model of the available computing power,
relative to the computational costs, including CPU, memory and disk
space. [Otherwise, why not just schedule every analysis that has
non-zero benefit?]
I think the right way to approach this is not by designing a framework
that tries to schedule analyses "independent of need", but instead to
target a framework that:
1) performs analyses based on explicitly requested functionality (e.g.
the user asks for a call graph view, or triggers a certain
refactoring).
2) permits ahead-of-time computation of certain generally useful
results, such as search indices. [Most other analyses will be too
costly to compute before they're needed.] Let the choice of what to do
ahead of time be made by the IDE/analysis builders, not a general
framework.
--
Cheers,
-- Bob
--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center
IMP Team Lead (http://eclipse-imp.sourceforge.net)
X10: Productivity for High-Performance Parallel Programming (http://x10.sf.net)