
Re: [imp-dev] Suggestions for improving the scheduling of analyses

Stan Sutton wrote:

Hi All,

I'd like to begin to do something to improve the mechanism by which analyses are scheduled (and run) in IMP. I'm talking about the analysis scheduling/invocation mechanism that is currently part of the UniversalEditor, where "analyzers" (which might do anything) are invoked whenever the source text is (re)parsed.

Currently the analyses are invoked in UniversalEditor.ParserScheduler.notifyAstListeners(..). Each listener (which invokes an analyzer) has a required analysis level that indicates the level of analysis on which its associated analysis depends. Nominally, an analysis is invoked only if the level required by the analysis is less than the current level.
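
For concreteness, that dispatch amounts to something like the following. This is just a sketch: the enum constants, type names, and method signatures are illustrative stand-ins, not the actual declarations in UniversalEditor.ParserScheduler or IModelListener.

    // Illustrative sketch of the level-based dispatch described above;
    // all names and signatures here are stand-ins, not the real IMP API.
    import java.util.List;

    enum AnalysisRequired { LEXICAL_ANALYSIS, SYNTACTIC_ANALYSIS, NAME_ANALYSIS, TYPE_ANALYSIS }

    interface ModelListenerSketch {
        AnalysisRequired getAnalysisRequired();   // level this analyzer depends on
        void update();                            // run the analyzer
    }

    class ParserSchedulerSketch {
        private final List<ModelListenerSketch> astListeners;

        ParserSchedulerSketch(List<ModelListenerSketch> astListeners) {
            this.astListeners = astListeners;
        }

        // Invoke an analyzer only if the level it requires has been reached.
        void notifyAstListeners(AnalysisRequired currentLevel) {
            for (ModelListenerSketch listener : astListeners) {
                if (listener.getAnalysisRequired().compareTo(currentLevel) <= 0) {
                    listener.update();
                }
            }
        }
    }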

We recognize that this approach, while a step in the right direction, is significantly limited. As noted in comments in IModelListener, which defines the analysis levels:

   // BROKEN!!!
   // The following has no notion of the scope of analysis. E.g., providing a language
   // service may require analysis within a much wider scope than a compilation unit
   // (such as whole program analysis).

Also, analyses don't really form a linear order. And there are other concerns, for example, that a given analysis may depend on more than one type of analysis, and that the costs and benefits of an analysis are not considered.

A few points:

- Most compilers are not structured so that clients can easily select the analyses they'd like to have performed. E.g., you can't typically tell a compiler "I want to go only as far as type-checking." Moreover, even relatively well-structured compilers like Polyglot, which has an explicit goal-directed scheduler, and does allow you to specify the compilation goal, won't typically allow you to take an AST that's previously been processed as far as one goal (say, name binding), and later submit it for additional processing (say, type checking).

- It may not be easy to figure out how far the compiler got, if it failed to get as far as you asked it to go. E.g., type-checking failed, but did name binding succeed?

- In many languages and compilers, the analyses are intertwined, either for efficiency or, frequently, in order to deal with apparent cycles of reference and so on. As a result, compilers by necessity interleave the name binding of one part of the source with the type checking of another part.

Because of the above, it's not easy to use most compilers to actually
schedule these kinds of front-end analyses. For this reason, the current
framework stops roughly at the point of identifying how far the compiler
got.

But even this isn't a particularly good design point, since it's possible
(common, even) for the compiler to succeed in type-checking one part
of a given class, but fail in some small part (say, one expression). Since
we don't want the IDE to become globally crippled by a local failure,
we want the compiler to proceed as far as it can everywhere it can,
and fail in as fine-grained fashion as possible. Thus, it doesn't really
make sense to have a single indication per source file of how far the
compiler got.

At the same time (for the sake of interoperability with existing compiler
front ends, an important IMP design requirement), it's not feasible to
require that compilers behave in such a fine-grained manner.

As far as the cost of analysis is concerned, many of the most costly
analyses also have notoriously high variance in cost. This makes it
extremely hard to predict the cost of any given analysis instance.
People thus tend to only perform such analyses when needed for
explicitly-requested functionality.

So I'm tempted to say we shouldn't try to extend this part of the
framework.

That said, the new analysis framework we've been working on (in
org.eclipse.imp.pdb, not to be confused with org.eclipse.imp.analysis,
which contains the beginnings of a less general framework that we
may abandon) exposes such inter-analysis dependencies explicitly,
based on the types of their results. The framework requires that
analyses specify the type of data that they produce, which permits
the framework to locate the appropriate producers for a bit of
data when requested by a client. Clients in this case could be
visualizers or other analyses; they simply ask for a piece of
data with a particular type and a particular scope. If the data
requested doesn't exist yet, the framework schedules the
appropriate analyzer to produce it.

In other words, I think there's significant overlap between your
proposal and what we've done in org.eclipse.imp.pdb. Take a
look at IAnalysisDescriptor, AnalysisManager, and FactBase.
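
To make the flow concrete, the interaction looks roughly like this. The sketch borrows the spirit of AnalysisManager and FactBase, but every name and signature below is an illustrative assumption rather than the actual pdb API.

    // Demand-driven fact production in the style described above.
    // All signatures are illustrative; see IAnalysisDescriptor,
    // AnalysisManager, and FactBase for the real interfaces.
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    final class FactKey {
        final String type;    // e.g. "useDefChains"
        final String scope;   // e.g. a compilation unit or project
        FactKey(String type, String scope) { this.type = type; this.scope = scope; }
        @Override public boolean equals(Object o) {
            return (o instanceof FactKey)
                    && ((FactKey) o).type.equals(type)
                    && ((FactKey) o).scope.equals(scope);
        }
        @Override public int hashCode() { return Objects.hash(type, scope); }
    }

    interface AnalysisDescriptorSketch {
        String producedType();            // the type of data this analysis produces
        Object analyze(String scope);     // run the analysis over the given scope
    }

    class AnalysisManagerSketch {
        private final Map<String, AnalysisDescriptorSketch> producersByType = new HashMap<>();
        private final Map<FactKey, Object> factBase = new HashMap<>();   // stands in for FactBase

        void register(AnalysisDescriptorSketch d) { producersByType.put(d.producedType(), d); }

        // A client (a visualizer or another analysis) asks for data by type and
        // scope; if it isn't in the fact base yet, the producer is scheduled.
        Object getFact(String type, String scope) {
            return factBase.computeIfAbsent(new FactKey(type, scope),
                    k -> producersByType.get(type).analyze(scope));
        }
    }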

Regarding the related issue that analyses may form a partial order, we could generalize the current mechanism in the following way:

    * Instead of having a single level of analysis that has been
      attained, the scheduler can have a collection of analyses that
      have been performed
    * As each successful analysis is completed, a record of it is
      added to the scheduler's set of performed analyses (see the
      sketch just below)
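
A minimal sketch of that bookkeeping (the analysis identifiers and types are illustrative):

    // Instead of a single "level attained", record which analyses completed.
    import java.util.HashSet;
    import java.util.Set;

    class PerformedAnalysesSketch {
        private final Set<String> performed = new HashSet<>();   // e.g. "nameBinding", "typeChecking"

        void recordCompleted(String analysisId) { performed.add(analysisId); }

        // A listener's precondition can then be phrased as set containment.
        boolean allPerformed(Set<String> required) { return performed.containsAll(required); }
    }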


Regarding the issues that an analysis may depend on multiple analyses of the same unit or on analyses of multiple units:

    * In the most general case, an analysis may depend not on some
      analysis level but on an arbitrary predicate.  For instance, in
      IModelListener, we could replace "AnalysisRequired
      getAnalysisRequired()" with "boolean evaluatePreconditions()"
       (evaluatePreconditions() might be based on some set of analyses
      required or on something else).
    * As a compromise (or a step on the road to full generality) we
      could change "AnalysisRequired getAnalysisRequired()" to
      "Set<AnalysisRequired> getAnalysesRequired()."  (Both variants
      are sketched just below.)


Regarding the issues of costs and benefits:

Simply knowing that you can run an analysis doesn't tell you much about whether it's appropriate to run it. For instance, suppose you have an AST, and that enables you to run both a semantic analyzer and a spell-checker for comments. Probably you want to run the semantic analysis first (i.e., it has a higher benefit). But if the spell-checker can run in 1/100th of the time the semantic analyzer takes, then maybe you want to go ahead and run it just to get it out of the way. (I'm not suggesting what the right approach is, merely arguing that these considerations are relevant to deciding what you want to do.)

Further regarding the benefits, I think we can distinguish intrinsic versus extrinsic benefits. Intrinsic would be some level of benefit that you can assign to an analysis based solely on the value of its results, regardless of what other analyses may be available. Extrinsic benefits are those that depend on considerations beyond the analysis itself, such as what other analyses it might enable, the value of those analyses, etc. (which depends on the particular context).

Of course, there are a number of issues regarding the possible use of costs and benefits in analysis scheduling: they may be hard to compute precisely, they may be dynamic, their computation represents some additional cost (a kind of meta-analysis), they might be monitored (which would impose some additional cost), and the best way to use cost and benefit values in scheduling isn't clear (and might itself vary by context or in time).

Still, I think there are some simple steps we could take to begin to incorporate some of these elements into our analysis scheduling:

   1. Forget the dynamic stuff to start with.  We don't do anything
      dynamic now.  Perhaps we could put in some placeholders for
      dynamic evaluation of costs and benefits but not implement them
      in a complicated way.
   2. Adopt some simple scales for costs and benefits (kind of like
      bug severity in Bugzilla).  Assume that we can derive some
      benefit if the person who specifies the cost and benefit levels
      gets those basically right most of the time.
   3. Allow users to adjust the costs and benefits for particular
      analyses via preferences.
   4. Devise a scheduling algorithm that is parameterizable in terms
      of the weight it puts on various factors, i.e., costs versus
      benefits (or costs versus intrinsic benefits versus extrinsic
      benefits).  A sketch of this follows the list.
   5. Allow users to adjust the weights of the different factors via
      preferences.
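
For item 4, here is a minimal sketch of a weight-parameterized scoring function. The 1-5 scales echo item 2, the weights would come from the preferences in items 3 and 5, and everything here is a placeholder rather than a proposed API.

    // Weight-parameterized prioritization for item 4; scales and weights
    // are placeholders that would come from preferences.
    import java.util.Comparator;
    import java.util.List;

    class CostBenefitSchedulerSketch {
        static final class Candidate {
            final String analysisId;
            final int cost;               // 1 (cheap) .. 5 (expensive)
            final int intrinsicBenefit;   // 1 (low)   .. 5 (high)
            final int extrinsicBenefit;   // 1 (low)   .. 5 (high), e.g. enables other analyses
            Candidate(String id, int cost, int intrinsic, int extrinsic) {
                this.analysisId = id; this.cost = cost;
                this.intrinsicBenefit = intrinsic; this.extrinsicBenefit = extrinsic;
            }
        }

        private final double wCost, wIntrinsic, wExtrinsic;   // user-adjustable weights

        CostBenefitSchedulerSketch(double wCost, double wIntrinsic, double wExtrinsic) {
            this.wCost = wCost; this.wIntrinsic = wIntrinsic; this.wExtrinsic = wExtrinsic;
        }

        double score(Candidate c) {
            return wIntrinsic * c.intrinsicBenefit
                 + wExtrinsic * c.extrinsicBenefit
                 - wCost * c.cost;
        }

        // Order the currently runnable analyses by descending score.
        void prioritize(List<Candidate> runnable) {
            runnable.sort(Comparator.comparingDouble(this::score).reversed());
        }
    }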


Cost/benefit-driven scheduling has been tried in many other contexts, and it rarely
succeeds except in limited cases (e.g. compiler optimizations) involving a limited
suite of analyses whose costs are well understood a priori. It's too hard to
estimate costs, and "benefits" are also difficult to quantify. Also, for such
information to be useful, you need a model of the available computing power,
relative to the computational costs, including CPU, memory and disk space.
[Otherwise, why not just schedule every analysis that has non-zero benefit?]

I think the right way to approach this is not by designing a framework that
tries to schedule analyses "independent of need", but instead to target a
framework that:

1) performs analyses based on explicitly requested functionality (e.g. the user
   asks for a call graph view, or triggers a certain refactoring).
2) permits ahead-of-time computation of certain generally useful results, such as search indices. [Most other analyses will be too costly to compute before
   they're needed.] Let the choice of what to do ahead of time be made by
   the IDE/analysis builders, not a general framework.

--
Cheers,
 -- Bob

--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center

IMP Team Lead (http://eclipse-imp.sourceforge.net)
X10: Productivity for High-Performance Parallel Programming (http://x10.sf.net)



