Re: [imp-dev] Suggestions for improving the scheduling of analyses

Hi Bob,

Good points. I over-generalized the discussion a bit by including the
parser scheduler and other things (outliner, builder). These are
scheduled by the UniversalEditor in a hard-coded way, aren't they?
Then there are the analyses Stan is talking about, which are triggered
using AST listeners. The internal scheduling of the UE should still be
separated out, but I think you are already doing that on account of
the LPEX effort.

I think that the rest should be simple. There are several events
produced by the UniversalEditor: open, save, have (partial) AST, lost
AST, etc. Let's collect them in an interface. The events are just like
button actions, only triggered implicitly. The IDE developer should be
able to hook any action onto these events. Nothing more is needed.
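
Roughly, I imagine something like this (a sketch only; the names are
made up on the spot, not existing API):

    // Hypothetical interface collecting the editor's implicit events.
    public interface IEditorEventListener {
        void editorOpened();
        void sourceSaved();
        void astAvailable(Object ast, boolean partial); // have (partial) AST
        void astLost();                                 // e.g. after bad edits
    }

The UniversalEditor would then expose an addEventListener(..) so that
any Action can be hooked onto any of these.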

As far as the implementation of these Actions is concerned, I agree
that the analysis manager is our preferred method. If an analysis
needs information it does not produce itself, it asks the analysis
manager. Another analysis may have provided it before and cached it,
or it can be computed on the fly.
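
In other words (hypothetical method names, not the real
AnalysisManager API):

    // An analysis pulls the facts it needs instead of being pushed:
    Object typeEnv = analysisManager.getFact(TYPE_ENV, unitScope);
    // -> returns the cached fact if some analysis already produced it,
    //    or schedules the producing analysis on the fly.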

We still need to implement a feature for limiting the lifetime of a
fact in the pdb though. A 'weak reference' kind of solution could
work, such that the JVM removes entries as memory becomes scarce and
garbage collection is needed, but data is left in the store while
there is still enough memory...
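
In Java terms that is really a SoftReference rather than a
WeakReference (weak references are cleared at the next GC regardless
of memory pressure, soft ones only when memory gets tight). A minimal
sketch of such a fact store:

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    // Facts stay cached while memory lasts; the JVM clears soft
    // references only when it would otherwise run short of memory.
    public class SoftFactCache<K, V> {
        private final Map<K, SoftReference<V>> cache =
            new HashMap<K, SoftReference<V>>();

        public void put(K key, V fact) {
            cache.put(key, new SoftReference<V>(fact));
        }

        public V get(K key) {
            SoftReference<V> ref = cache.get(key);
            V fact = (ref == null) ? null : ref.get();
            if (fact == null)
                cache.remove(key); // collected: caller must recompute
            return fact;
        }
    }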

Cheers,

Jurgen


On 6/5/08, Robert M. Fuhrer <rfuhrer@xxxxxxxxxxxxxx> wrote:
> Stan Sutton wrote:
> >
> > Hi All,
> >
> > I'd like to begin to do something to improve the mechanism by which
> > analyses are scheduled (and run) in IMP.  I'm talking about the
> > analysis scheduling/invocation mechanism that is currently part of the
> > UniversalEditor, where "analyzers" (which might do anything) are
> > invoked whenever the source text is (re)parsed.
> >
> > Currently the analyses are invoked in
> > UniversalEditor.ParserScheduler.notifyAstListeners(..).  Each listener
> > (which invokes an analyzer) has a required analysis level that
> > indicates the level of analysis on which its associated analysis
> > depends.  Nominally, an analysis is invoked only if the level required
> > by the analysis is less than the current level.
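> >
> > Schematically (simplified; the accessor names are approximate, not
> > the literal source), that gate amounts to:
> >
> >     for (IModelListener listener : listeners) {
> >         // run only the listeners whose required analysis level has
> >         // already been reached for this compilation unit
> >         if (listener.getAnalysisRequired().getLevel() < currentLevel)
> >             listener.update(parseController, monitor);
> >     }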
> >
> > We recognize that this approach, while a step in the right direction,
> > is significantly limited.  As noted in comments in IModelListener,
> > which defines the analysis levels:
> >
> >    // BROKEN!!!
> >    // The following has no notion of the scope of analysis. E.g.,
> >    // providing a language service may require analysis within a much
> >    // wider scope than a compilation unit (such as whole program
> >    // analysis). Also, analyses don't really form a linear order.
> >
> > And there are other concerns, for example, that a given analysis may
> > depend on more than one type of analysis, and that the costs and
> > benefits of an analysis are not considered.
> >
>
> A few points:
>
> - Most compilers are not structured so that clients can easily select
>   the analyses they'd like to have performed. E.g., you can't typically
>   tell a compiler "I want to go only as far as type-checking." Moreover,
>   even relatively well-structured compilers like Polyglot, which has an
>   explicit goal-directed scheduler and does allow you to specify the
>   compilation goal, won't typically allow you to take an AST that's
>   previously been processed as far as one goal (say, name binding) and
>   later submit it for additional processing (say, type checking).
>
> - It may not be easy to figure out how far the compiler got, if it
>   failed to get as far as you asked it to go. E.g., type-checking
>   failed, but did name binding succeed?
>
> - In many languages and compilers, the analyses are intertwined, either
>   for efficiency or, frequently, in order to deal with apparent cycles
>   of reference and so on.  As a result, compilers by necessity
>   interleave name binding in one part of the source with the type
>   checking of another part.
>
> Because of the above, it's not easy to use most compilers to actually
> schedule these kinds of front-end analyses. For this reason, the current
> framework stops roughly at the point of identifying how far the compiler
> got.
>
> But even this isn't a particularly good design point, since it's possible
> (common, even) for the compiler to succeed in type-checking one part
> of a given class, but fail in some small part (say, one expression). Since
> we don't want the IDE to become globally crippled by a local failure,
> we want the compiler to proceed as far as it can everywhere it can,
> and fail in as fine-grained a fashion as possible. Thus, it doesn't really
> make sense to have a single indication per source file of how far the
> compiler got.
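>
> (Concretely, one could imagine recording status per program element
> rather than per file; the types below are purely illustrative:)
>
>     // How far analysis got, per AST node rather than per unit:
>     Map<Object /* AST node */, AnalysisLevel> attained =
>         new HashMap<Object, AnalysisLevel>();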
>
> At the same time (for the sake of interoperability with existing
> compiler front ends, an important IMP design requirement), it's not
> feasible to require that compilers behave in such a fine-grained manner.
>
> As far as the cost of analysis is concerned, many of the most costly
> analyses also have notoriously high variance in cost. This makes it
> extremely hard to predict the cost of any given analysis instance.
> People thus tend to only perform such analyses when needed for
> explicitly-requested functionality.
>
> So I'm tempted to say we shouldn't try to extend this part of the
> framework.
>
> That said, the new analysis framework we've been working on (in
> org.eclipse.imp.pdb, not to be confused with org.eclipse.imp.analysis,
> which contains the beginnings of a less general framework that we
> may abandon) exposes such inter-analysis dependencies explicitly,
> based on the types of their results. The framework requires that
> analyses specify the type of data that they produce, which permits
> the framework to locate the appropriate producers for a bit of
> data when requested by a client. Clients in this case could be
> visualizers, or other analyses, and simply ask for a piece of
> data with a particular type and a particular scope. If the data
> requested doesn't exist yet, the framework schedules the
> appropriate analyzer to produce it.
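>
> The shape of it is roughly this (the names below are illustrative
> only; see the actual interfaces for the real signatures):
>
>     // Hypothetical shape only; not the actual pdb signatures:
>     public interface IFactProducer {
>         FactType getResultType();            // kind of data produced
>         Object produce(IFactContext scope);  // compute it for a scope
>     }
>
>     // A client (a visualizer, or another analysis) just asks:
>     //   Object fact = manager.getFact(callGraphType, methodScope);
>     // If no producer has run yet, the manager schedules the matching
>     // one and caches the result for later clients.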
>
> In other words, I think there's significant overlap between your
> proposal and what we've done in org.eclipse.imp.pdb. Take a
> look at IAnalysisDescriptor, AnalysisManager, and FactBase.
>
> > Regarding the related issue that analyses may form a partial order,
> > we could generalize the current mechanism in the following way:
> >
> >    * Instead of having a single level of analysis that has been
> >      attained, the scheduler can have a collection of analyses that
> >      have been performed
> >    * As each successful analysis is completed, a record of it is
> >      added to the scheduler's set of performed analyses
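> >
> > In code, roughly (the element type is whatever record we keep per
> > analysis):
> >
> >     // Scheduler state: analyses completed so far for this unit.
> >     Set<AnalysisRequired> performed = new HashSet<AnalysisRequired>();
> >     // ...on each successful completion:
> >     performed.add(justCompleted);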
> >
> >
> > Regarding the issues that an analysis may depend on multiple analyses
> > of the same unit or on analyses of multiple units:
> >
> >    * In the most general case, an analysis may depend not on some
> >      analysis level but on an arbitrary predicate.  For instance, in
> >      IModelListener, we could replace "AnalysisRequired
> >      getAnalysisRequired()" with "boolean evaluatePreconditions()"
> >       (evaluatePreconditions() might be based on some set of analyses
> >      required or on something else).
> >    * As a compromise (or a step on the road to full generality) we
> >      could change "AnalysisRequired getAnalysisRequired()" to
> >      "Set<AnalysisRequired> getAnalysesRequired()" (sketched below).
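> >
> > That is, roughly (abridged; other IModelListener methods and the
> > predicate's exact signature are illustrative):
> >
> >     public interface IModelListener {
> >         // compromise: a set of prerequisites instead of one level
> >         Set<AnalysisRequired> getAnalysesRequired();
> >
> >         // full generality: an arbitrary predicate, e.g.
> >         // boolean evaluatePreconditions(Set<AnalysisRequired> done);
> >     }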
> >
> >
> > Regarding the issues of costs and benefits:
> >
> > Simply knowing that you can run an analysis doesn't tell you much
> > about whether it's appropriate to run it.  For instance, suppose you
> > have an AST, and that enables you to run both a semantic analyzer and
> > a spell-checker for comments.  Probably you want to run the semantic
> > analysis first (i.e., it has a higher benefit).  But if the
> > spell-checker can run in 1/100th the time of the semantic analyzer,
> > then maybe you want to go ahead and run it just to get it out of the
> > way.  (I'm not suggesting what the right approach is, merely arguing
> > that these considerations are relevant to deciding what you want to
> > do.)
> >
> > Further regarding the benefits, I think we can distinguish intrinsic
> > versus extrinsic benefits.  Intrinsic benefits are those you can
> > assign to an analysis based solely on the value of its results,
> > regardless of what other analyses may be available.  Extrinsic
> > benefits are those that depend on considerations beyond the analysis
> > itself, such as what other analyses it might enable, the value of
> > those analyses, etc. (which depends on the particular context).
> >
> > Of course, there are a number of issues regarding the possible use of
> > costs and benefits in analysis scheduling: they may be hard to compute
> > precisely, they may be dynamic, their computation represents some
> > additional cost (a kind of meta-analysis), they might be monitored
> > (which would impose some additional cost), and the best way to use
> > cost and benefit values in scheduling isn't clear (and might itself
> > vary by context or in time).
> >
> > Still, I think there are some simple steps we could take to begin to
> > incorporate some of these elements into our analysis scheduling:
> >
> >   1. Forget the dynamic stuff to start with.  We don't do anything
> >      dynamic now.  Perhaps we could put in some placeholders for
> >      dynamic evaluation of costs and benefits but not implement them
> >      in a complicated way.
> >   2. Adopt some simple scales for costs and benefits (kind of like
> >      bug severity in Bugzilla).  Assume that we can derive some
> >      benefit if the person who specifies the cost and benefit levels
> >      gets those basically right most of the time.
> >   3. Allow users to adjust the costs and benefits for particular
> >      analyses via preferences.
> >   4. Devise a scheduling algorithm that is parameterizable in terms
> >      of the weight it puts on various factors, i.e., costs versus
> >      benefits (or costs versus intrinsic benefits versus extrinsic
> >      benefits); see the sketch after this list.
> >   5. Allow users to adjust the weights of the different factors via
> >      preferences.
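> >
> > The scoring in item 4 could start as simple as (illustrative):
> >
> >     // Higher score = run earlier; the weights come from preferences
> >     // (items 3 and 5), the estimates from the scales of item 2.
> >     double score = wIntrinsic * intrinsicBenefit
> >                  + wExtrinsic * extrinsicBenefit
> >                  - wCost * estimatedCost;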
> >
> >
>
> Cost/benefit-driven scheduling has been tried in many other contexts,
> and rarely succeeds, except in limited cases (e.g. compiler
> optimizations) with a limited suite of analyses with costs that are
> well-understood a priori. It's too hard to estimate costs, and
> "benefits" are also difficult to quantify. Also, for such information
> to be useful requires a model of the available computing power,
> relative to the computational costs, including CPU, memory and disk
> space. [Otherwise, why not just schedule every analysis that has
> non-zero benefit?]
>
> I think the right way to approach this is not by designing a framework
> that tries to schedule analyses "independent of need", but instead to
> target a framework that:
>
> 1) performs analyses based on explicitly requested functionality (e.g.
>    the user asks for a call graph view, or triggers a certain
>    refactoring).
> 2) permits ahead-of-time computation of certain generally useful
>    results, such as search indices. [Most other analyses will be too
>    costly to compute before they're needed.] Let the choice of what to
>    do ahead of time be made by the IDE/analysis builders, not a general
>    framework.
>
> --
> Cheers,
>  -- Bob
>
> --------------------------------
> Robert M. Fuhrer
> Research Staff Member
> Programming Technologies Dept.
> IBM T.J. Watson Research Center
>
> IMP Team Lead (http://eclipse-imp.sourceforge.net)
> X10: Productivity for High-Performance Parallel Programming
> (http://x10.sf.net)
>
>
>
> _______________________________________________
> imp-dev mailing list
> imp-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/imp-dev
>

