
Re: [imp-dev] Suggestions for improving the scheduling of analyses

Stan Sutton wrote:

Hi All,

I'd like to begin to do something to improve the mechanism by which analyses are scheduled (and run) in IMP. I'm talking about the analysis scheduling/invocation mechanism that is currently part of the UniversalEditor, where "analyzers" (which might do anything) are invoked whenever the source text is (re)parsed.

Currently the analyses are invoked in UniversalEditor.ParserScheduler.notifyAstListeners(..). Each listener (which invokes an analyzer) has a required analysis level that indicates the level of analysis on which its associated analysis depends. Nominally, an analysis is invoked only if the level required by the analysis is less than the current level.
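
For concreteness, that dispatch amounts to something like the following. This is just a sketch: the enum constants, type names, and method signatures are illustrative stand-ins, not the actual declarations in UniversalEditor.ParserScheduler or IModelListener.

    // Illustrative sketch of the level-based dispatch described above;
    // all names and signatures here are stand-ins, not the real IMP API.
    import java.util.List;

    enum AnalysisRequired { LEXICAL_ANALYSIS, SYNTACTIC_ANALYSIS, NAME_ANALYSIS, TYPE_ANALYSIS }

    interface ModelListenerSketch {
        AnalysisRequired getAnalysisRequired();   // level this analyzer depends on
        void update();                            // run the analyzer
    }

    class ParserSchedulerSketch {
        private final List<ModelListenerSketch> astListeners;

        ParserSchedulerSketch(List<ModelListenerSketch> astListeners) {
            this.astListeners = astListeners;
        }

        // Invoke an analyzer only if the level it requires has been reached.
        void notifyAstListeners(AnalysisRequired currentLevel) {
            for (ModelListenerSketch listener : astListeners) {
                if (listener.getAnalysisRequired().compareTo(currentLevel) <= 0) {
                    listener.update();
                }
            }
        }
    }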

We recognize that this approach, while a step in the right direction, is significantly limited. As noted in comments in IModelListener, which defines the analysis levels:

   // BROKEN!!!
   // The following has no notion of the scope of analysis. E.g., providing a language
   // service may require analysis within a much wider scope than a compilation unit
   // (such as whole program analysis).

Also, analyses don't really form a linear order. And there are other concerns, for example, that a given analysis may depend on more than one type of analysis, and that the costs and benefits of an analysis are not considered.

A few points:

- Most compilers are not structured so that clients can easily select the analyses they'd like to have performed. E.g., you can't typically tell a compiler "I want to go only as far as type-checking." Moreover, even relatively well-structured compilers like Polyglot, which has an explicit goal-directed scheduler, and does allow you to specify the compilation goal, won't typically allow you to take an AST that's previously been processed as far as one goal (say, name binding), and later submit it for additional processing (say, type checking).

- It may not be easy to figure out how far the compiler got, if it failed to get as far as you asked it to go. E.g., type-checking failed, but did name binding succeed?

- In many languages and compilers, the analyses are intertwined, either for efficiency or, frequently, in order to deal with apparent cycles of reference and so on. As a result, compilers by necessity interleave the name binding of one part of the source with the type checking of another part.

Because of the above, it's not easy to use most compilers to actually
schedule these kinds of front-end analyses. For this reason, the current
framework stops roughly at the point of identifying how far the compiler
got.

But even this isn't a particularly good design point, since it's possible
(common, even) for the compiler to succeed in type-checking one part
of a given class, but fail in some small part (say, one expression). Since
we don't want the IDE to become globally crippled by a local failure,
we want the compiler to proceed as far as it can everywhere it can,
and fail in as fine-grained fashion as possible. Thus, it doesn't really
make sense to have a single indication per source file of how far the
compiler got.

At the same time (for the sake of interoperability with existing compiler
front ends, an important IMP design requirement), it's not feasible to
require that compilers behave in such a fine-grained manner.

As far as the cost of analysis is concerned, many of the most costly
analyses also have notoriously high variance in cost. This makes it
extremely hard to predict the cost of any given analysis instance.
People thus tend to only perform such analyses when needed for
explicitly-requested functionality.

So I'm tempted to say we shouldn't try to extend this part of the
framework.

That said, the new analysis framework we've been working on (in
org.eclipse.imp.pdb, not to be confused with org.eclipse.imp.analysis,
which contains the beginnings of a less general framework that we
may abandon) exposes such inter-analysis dependencies explicitly,
based on the types of their results. The framework requires that
analyses specify the type of data that they produce, which permits
the framework to locate the appropriate producers for a bit of
data when requested by a client. Clients in this case could be
visualizers or other analyses; they simply ask for a piece of
data with a particular type and a particular scope. If the data
requested doesn't exist yet, the framework schedules the
appropriate analyzer to produce it.

In other words, I think there's significant overlap between your
proposal and what we've done in org.eclipse.imp.pdb. Take a
look at IAnalysisDescriptor, AnalysisManager, and FactBase.
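
To make the flow concrete, the interaction looks roughly like this. The sketch borrows the spirit of AnalysisManager and FactBase, but every name and signature below is an illustrative assumption rather than the actual pdb API.

    // Demand-driven fact production in the style described above.
    // All signatures are illustrative; see IAnalysisDescriptor,
    // AnalysisManager, and FactBase for the real interfaces.
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    final class FactKey {
        final String type;    // e.g. "useDefChains"
        final String scope;   // e.g. a compilation unit or project
        FactKey(String type, String scope) { this.type = type; this.scope = scope; }
        @Override public boolean equals(Object o) {
            return (o instanceof FactKey)
                    && ((FactKey) o).type.equals(type)
                    && ((FactKey) o).scope.equals(scope);
        }
        @Override public int hashCode() { return Objects.hash(type, scope); }
    }

    interface AnalysisDescriptorSketch {
        String producedType();            // the type of data this analysis produces
        Object analyze(String scope);     // run the analysis over the given scope
    }

    class AnalysisManagerSketch {
        private final Map<String, AnalysisDescriptorSketch> producersByType = new HashMap<>();
        private final Map<FactKey, Object> factBase = new HashMap<>();   // stands in for FactBase

        void register(AnalysisDescriptorSketch d) { producersByType.put(d.producedType(), d); }

        // A client (a visualizer or another analysis) asks for data by type and
        // scope; if it isn't in the fact base yet, the producer is scheduled.
        Object getFact(String type, String scope) {
            return factBase.computeIfAbsent(new FactKey(type, scope),
                    k -> producersByType.get(type).analyze(scope));
        }
    }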

Regarding the related issue that analyses may form a partial order, we could generalize the current mechanism in the following way:

    * Instead of having a single level of analysis that has been
      attained, the scheduler can have a collection of analyses that
      have been performed
    * As each successful analysis is completed, a record of it is
      added to the scheduler's set of performed analyses (see the
      sketch just below)
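
A minimal sketch of that bookkeeping (the analysis identifiers and types are illustrative):

    // Instead of a single "level attained", record which analyses completed.
    import java.util.HashSet;
    import java.util.Set;

    class PerformedAnalysesSketch {
        private final Set<String> performed = new HashSet<>();   // e.g. "nameBinding", "typeChecking"

        void recordCompleted(String analysisId) { performed.add(analysisId); }

        // A listener's precondition can then be phrased as set containment.
        boolean allPerformed(Set<String> required) { return performed.containsAll(required); }
    }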


Regarding the issues that an analysis may depend on multiple analyses of the same unit or on analyses of multiple units:

    * In the most general case, an analysis may depend not on some
      analysis level but on an arbitrary predicate.  For instance, in
      IModelListener, we could replace "AnalysisRequired
      getAnalysisRequired()" with "boolean evaluatePreconditions()"
       (evaluatePreconditions() might be based on some set of analyses
      required or on something else).
    * As a compromise (or a step on the road to full generality) we
      could change "AnalysisRequired getAnalysisRequired()" to
      "Set<AnalysisRequired> getAnalysesRequired()."  (Both variants
      are sketched just below.)


Regarding the issues of costs and benefits:

Simply knowing that you can run an analysis doesn't tell you much about whether it's appropriate to run it. For instance, suppose you have an AST, and that enables you to run both a semantic analyzer and a spell-checker for comments. Probably you want to run the semantic analysis first (i.e., it has a higher benefit). But if the spell-checker can run in 1/100th of the time the semantic analyzer takes, then maybe you want to go ahead and run it just to get it out of the way. (I'm not suggesting what the right approach is, merely arguing that these considerations are relevant to deciding what you want to do.)

Further regarding the benefits, I think we can distinguish intrinsic versus extrinsic benefits. Intrinsic would be some level of benefit that you can assign to an analysis based solely on the value of its results, regardless of what other analyses may be available. Extrinsic benefits are those that depend on considerations beyond the analysis itself, such as what other analyses it might enable, the value of those analyses, etc. (which depends on the particular context).

Of course, there are a number of issues regarding the possible use of costs and benefits in analysis scheduling: they may be hard to compute precisely, they may be dynamic, their computation represents some additional cost (a kind of meta-analysis), they might be monitored (which would impose some additional cost), and the best way to use cost and benefit values in scheduling isn't clear (and might itself vary by context or in time).

Still, I think there are some simple steps we could take to begin to incorporate some of these elements into our analysis scheduling:

   1. Forget the dynamic stuff to start with.  We don't do anything
      dynamic now.  Perhaps we could put in some placeholders for
      dynamic evaluation of costs and benefits but not implement them
      in a complicated way.
   2. Adopt some simple scales for costs and benefits (kind of like
      bug severity in Bugzilla).  Assume that we can derive some
      benefit if the person who specifies the cost and benefit levels
      gets those basically right most of the time.
   3. Allow users to adjust the costs and benefits for particular
      analyses via preferences.
   4. Devise a scheduling algorithm that is parameterizable in terms
      of the weight it puts on various factors, i.e., costs versus
      benefits (or costs versus intrinsic benefits versus extrinsic
      benefits).  A sketch of this follows the list.
   5. Allow users to adjust the weights of the different factors via
      preferences.
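
For item 4, here is a minimal sketch of a weight-parameterized scoring function. The 1-5 scales echo item 2, the weights would come from the preferences in items 3 and 5, and everything here is a placeholder rather than a proposed API.

    // Weight-parameterized prioritization for item 4; scales and weights
    // are placeholders that would come from preferences.
    import java.util.Comparator;
    import java.util.List;

    class CostBenefitSchedulerSketch {
        static final class Candidate {
            final String analysisId;
            final int cost;               // 1 (cheap) .. 5 (expensive)
            final int intrinsicBenefit;   // 1 (low)   .. 5 (high)
            final int extrinsicBenefit;   // 1 (low)   .. 5 (high), e.g. enables other analyses
            Candidate(String id, int cost, int intrinsic, int extrinsic) {
                this.analysisId = id; this.cost = cost;
                this.intrinsicBenefit = intrinsic; this.extrinsicBenefit = extrinsic;
            }
        }

        private final double wCost, wIntrinsic, wExtrinsic;   // user-adjustable weights

        CostBenefitSchedulerSketch(double wCost, double wIntrinsic, double wExtrinsic) {
            this.wCost = wCost; this.wIntrinsic = wIntrinsic; this.wExtrinsic = wExtrinsic;
        }

        double score(Candidate c) {
            return wIntrinsic * c.intrinsicBenefit
                 + wExtrinsic * c.extrinsicBenefit
                 - wCost * c.cost;
        }

        // Order the currently runnable analyses by descending score.
        void prioritize(List<Candidate> runnable) {
            runnable.sort(Comparator.comparingDouble(this::score).reversed());
        }
    }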


Cost/benefit-driven scheduling has been tried in many other contexts, and it rarely
succeeds except in limited cases (e.g. compiler optimizations) involving a limited
suite of analyses whose costs are well understood a priori. It's too hard to
estimate costs, and "benefits" are also difficult to quantify. Also, for such
information to be useful, you need a model of the available computing power,
relative to the computational costs, including CPU, memory and disk space.
[Otherwise, why not just schedule every analysis that has non-zero benefit?]

I think the right way to approach this is not by designing a framework that
tries to schedule analyses "independent of need", but instead to target a
framework that:

1) performs analyses based on explicitly requested functionality (e.g. the user
   asks for a call graph view, or triggers a certain refactoring).
2) permits ahead-of-time computation of certain generally useful results, such as search indices. [Most other analyses will be too costly to compute before
   they're needed.] Let the choice of what to do ahead of time be made by
   the IDE/analysis builders, not a general framework.

--
Cheers,
 -- Bob

--------------------------------
Robert M. Fuhrer
Research Staff Member
Programming Technologies Dept.
IBM T.J. Watson Research Center

IMP Team Lead (http://eclipse-imp.sourceforge.net)
X10: Productivity for High-Performance Parallel Programming (http://x10.sf.net)



