481271 – [tern] The tooling should ignore paths that reference node_modules (or handle them better)

Bug 481271 - [tern] The tooling should ignore paths that reference node_modules (or handle them better)

Summary: [tern] The tooling should ignore paths that reference node_modules (or handle...

Status:	RESOLVED FIXED

Alias:	None

Product:	Orion (Archived)
Classification:	ECD
Component:	JS Tools (show other bugs)
Version:	10.0
Hardware:	PC Mac OS X

Importance:	P2 major (vote)
Target Milestone:	12.0
Assignee:	Olivier Thomann
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Depends on:	459277
Blocks:
	Show dependency tree

Reported:	2015-11-02 13:53 EST by Michael Rennie
Modified:	2016-04-21 15:15 EDT (History)
CC List:	7 users (show)

See Also:	Gerrit Change

Attachments
Local nodejs project containing checked-out modules (24.91 MB, application/x-zip-compressed) 2015-11-12 12:53 EST, Mark Macdonald	no flags	Details
POC for resolving node_modules (10.19 KB, patch) 2016-04-11 22:25 EDT, Michael Rennie	no flags	Details \| Diff
re-created patch (10.00 KB, patch) 2016-04-12 12:05 EDT, Michael Rennie	no flags	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Rennie

2015-11-02 13:53:13 EST

Mark hit a case while using the 'References' support, where he has a project that has a deep hierarchy of module dependencies. He reported that the tooling would continually make file requests and not come back to a usable state.

We should handle this better, by either:

1. limiting the number of requests / debugging if we are sending too many
2. just ignore commonly known paths that are used for config, etc, like 'node_modules'

Comment 1 Steve Northover

2015-11-02 14:37:19 EST

Mike M, I believe you but let's have the repeatable case captured here.

Comment 2 Mark Macdonald

2015-11-12 12:53:34 EST

Created attachment 257906 [details]
Local nodejs project containing checked-out modules

Here are some reproducible steps that should capture the problem

Using Windows 7 + Orion 10 on a local server

1. Create a new projet: File > New > Project > Basic
2. Drag in the zip attached to this bug. Expand it when prompted
3. Open app.js, go to line 50
4. Type "app." then press Ctrl+Space or allow content assist to open
5. At this point the editor quickly becomes unusable:

* I start to see "No response from server. Check your internet connection and try again." errors (presumably because we're too busy with search requests to handle autosaves.)
* java.exe consumes 75% of CPU

It eventually settles down, but the next time content assist opens, the same thing happens again.

Comment 3 Steve Northover

2015-11-12 13:37:50 EST

Is this the 'References' support or is it just general tooling slowness?  Mark's test case uses code assist, not 'References'.  Mark?

Comment 4 Mark Macdonald

2015-11-12 13:52:02 EST

I was using content assist to complete a dot expression, not searching for references.

Comment 5 Steve Northover

2015-11-12 14:48:33 EST

A work around might be to use the .tern-project file to ignore the path.  Have not tried it ...

Comment 6 Steve Northover

2015-11-16 13:29:37 EST

FYI:  The zip will not extract with the built in Windows zip support (need to use another unzipper)

Comment 7 Michael Rennie

2015-11-16 15:06:18 EST

We are asking to find a lib from the workspace (one that is not indexed by default) and the file client search ends up crunching away through the entire huge project.

As far as I know there is no way to tell the file client to not search a particular location, but on the tooling side, I can add an additional check in the nodejs plugin (after we check if a lib is indexed) that will not try to find it in the workspace unless it is a relative path.

For example, say 'tty' is a lib in the node_modules dir, but is not indexed. We will not even try to resolve it, just ignore it (infer it as the 'any' type), whereas something like '../mylib' would be searched for.

var lib = require('tty');

vs.

var lib = require('../mylib');

A couple of problems with this approach:
1. we lose the ability to correctly find the library type infos if they are in the project / workspace
2. this still does not solve the obvious scaling problem of the file client searching for and resolving library names in very large projects

Comment 8 Steve Northover

2015-11-16 15:11:34 EST

SSQ, is this the problem that we talked about on Friday?  (We search the entire file system because we don't "have a main()" -- ie. there is a RequireJS file somewhere that tells use what the base is for a relative path)

Comment 9 Michael Rennie

2015-11-16 16:08:14 EST

(In reply to Steve Northover from comment #8)
> SSQ, is this the problem that we talked about on Friday?  (We search the
> entire file system because we don't "have a main()" -- ie. there is a
> RequireJS file somewhere that tells use what the base is for a relative path)

This problem / project is all about node.js modules.

Comment 10 Steve Northover

2015-11-16 16:20:56 EST

Sorry about that.  Is it the same problem?  (ie. we do a full search because we don't have the base for a relative path)

Comment 11 Michael Rennie

2015-11-16 16:54:39 EST

(In reply to Steve Northover from comment #10)
> Sorry about that.  Is it the same problem?  (ie. we do a full search because
> we don't have the base for a relative path)

No worries, no its not really related to that, although perhaps we could do something smart here and specifically look for a node_modules folder, then look for the lib folder, rather than blindly search for the lib name.

in the meantime, I pushed a partial fix to not search for non-indexed, on-relative libs:

http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=83f2b3b478c23c3fb6340f300cec738acdf7a07d

Using the attached project (imported into my orion.eclipse.org workspace), the time to open content assist + use the tools was reasonable - not lightning fast the first time, but very fast once the graph completes.

For libs that are no longer searched for, I infer them as 'any'.

Comment 12 Steve Northover

2015-11-17 09:30:39 EST

Mark, are you able to try out the new code to see whether it helps?

Comment 13 Steve Northover

2016-03-28 14:55:56 EDT

0) Run the electron app adn build it such taht node_modules exists
1) Make "Undeclared member expression reference" an error
2) Orion Content/org.eclipse.orion.client/modules/orionode/server.js
3) Line 128: show an error for 'on'

if (args.ui) {
	var electron = require('electron');
	var mainWindow = null;
	electron.app.on('ready', function() {
...

Should I not find the electron module that I have loaded in node_modules and know about "on"?

Comment 14 Michael Rennie

2016-03-28 17:08:49 EDT

(In reply to Steve Northover from comment #13)
> 0) Run the electron app adn build it such taht node_modules exists
> 1) Make "Undeclared member expression reference" an error
> 2) Orion Content/org.eclipse.orion.client/modules/orionode/server.js
> 3) Line 128: show an error for 'on'
> 
> if (args.ui) {
> 	var electron = require('electron');
> 	var mainWindow = null;
> 	electron.app.on('ready', function() {
> ...
> 
> Should I not find the electron module that I have loaded in node_modules and
> know about "on"?

No, the commit mentioned in comment 11 turned that off. Crawling the node_modules directory is very very very bad for performance - everything will time out and the type infos will be worthless.

Comment 15 Steve Northover

2016-03-28 17:10:22 EDT

So we used to do this and we died?

Comment 16 Michael Rennie

2016-03-29 09:17:18 EDT

(In reply to Steve Northover from comment #15)
> So we used to do this and we died?

Correct.

Perhaps we could be a lot smarter here when looking for things in node_modules. For example when we encounter something like 'require('electron');', we know its not relative (so don't search in the workspace / project) and instead ask for it by name (or add a prefix to the search, if thats possible).

So, for example, instead of asking for the fileClient to search for 'electron' in the project, we could ask it to find '/node_modules/electron/index.js'. Or perhaps we could get the project from the project service / tern project manager and ask for '<project path>/node_modules/electron/index.js'. If that lookup fails try: '<project path>/node_modules/electron/electron.js'. Failing that we would have to read package.json to find where the modules main is.

Comment 17 Steve Northover

2016-03-29 10:00:04 EDT

If we know we are using Node, where else would something like 'express' be found?  How hard would it be to fix this?  Maybe get the coop to look at this.

Comment 18 Michael Rennie

2016-03-29 10:48:46 EDT

(In reply to Steve Northover from comment #17)
> If we know we are using Node, where else would something like 'express' be
> found?  How hard would it be to fix this?  Maybe get the coop to look at
> this.

It could come from:

1. a sub-project in your project
2. another project in the workspace
3. node_modules in your project
4. a definition file in your workspace / Tern settings
5. a plugin in Tern

I shouldn't be too hard to fix, since we already handle 1, 2, 4 and 5.

Comment 19 Steve Northover

2016-03-29 10:56:22 EDT

Ok, let's see what changing this does.  Does our code assist / cross-file linting get better?  Do we grind to a halt?

The idea of project/sub-project is interesting and requires more discussion elsewhere.  My feeling is that "project is dead" and sniffing is in.  If the directory contains a .git sub-directory, then is it a "project" (etc.).

Comment 20 Troy Tao

2016-03-29 16:08:10 EDT

Do we need to consider the option of actually runing the code? For example, as contentAssistant wants to find out what methods "app" obj has, we can run

eval('require("anyModule")')

and just check the returned obj.

In this way, we take advantage of Nodejs own module system to search through the node_modules.

Comment 21 Steve Northover

2016-03-29 16:12:15 EDT

I think exec'ing the code is usually a failing idea.  First off, we are running on the client not the server.

Comment 22 Troy Tao

2016-04-05 01:06:06 EDT

After reading through relative codes, I think there are several places that require our attention for a fix. 

1. Inside ternAssist.js (line 566-568), the function sortProposals choose to hide proposals of dependencies like "http", "express" and etc. Specifically, it ignores proposals for modules that does not have "/" and not inside envs array. If we comment out line 566-568 and test it with something like:

var http = require("http");
http.

Correct proposals will be displayed. A better way may be adding node dependencies to envs.

2. From my understanding, there are these steps that Tern took to prepare for the "completion" query:
 (1)Tern receives app.js and start the parse process.
 (2)When the parse finished, Tern will go through all the "postParse" passes provided by tern plugins. For the "node.js" plugin, its pass will call resolver.doPostParse (resolver.js line 116) where the resolver goes through ast.dependencies and resolves them into _resolved variable. This process also involves the Server instance in tern.js (for getting file contents).
 (3)After the postParse stage finished, the registered function "nodeRequire" in node.js (line 89) will be called and starts to calculate the returned type. During the process, files inside _resolved will be used and sent to Tern as separate file.

As Mike mentioned, what we need to do here is that instead of greedily loading all files inside "node_modules", we should find module's main js. To implement this, I tried to change the "resolver.doPostParse" to loading each modules' package.json first to fetch the location of main js for these modules and apply a second time "resolveDependencies" after these locations are retrieved. In this way, we can put contents of the main js of each module inside the _resolved variable. For example, if app.js is like this:

var async = require("async");

then "node_module/async/lib/async.js" will be added to _resolved like this:

{"async" : { file:"node_module/async/lib/async.js", logical: "async", contents: String}} (not exactly same as in the code)

In this way, I hijacked the step (2) and changed the _resolved variable for step (3). However, this change did not have expected effects: it did not correctly load "async" to the "scope" of "app.js" in the Tern.fileMap as a dependency. Instead, the "async" object has an empty "types" which should be the main place to retrieve available completions. So I wondered what happened after or before step (3) and tried to find out. But I failed to find where other objs like "http" gets correct "types" and it seems all the "addType" methods inside Infer.js (multiple lines) are not called and the search in "def.js" to see whether it comes from there also failed. To now, the problem still remains. 

3. Another possible way to think about 2. is to think about query stage, I thought the reason that I did not see correct proposals may due to problems from the query side instead of the preparation before sending the "completions" query. So I checked the process when we send the request: 

ContentAssist.js -> TernAssit.js -> TernWorkerCore.js -> Tern.js -> Infer.js

and it seems during the process, we are just retrieving the information without making any changes. To make sure of this, I tried to set up a project where I can see a correct output and this is the place where I found out the whole nodejs module system in Orion seems not working. For example, I first tried to use relative paths as Mike mentioned in comment 11:

var async = require("./node_modules/async/lib/async.js");
async.

the result is "No Proposals Found". So I tried to set up a test.js where:

module.exports = {
	test : function() {
		return "test";
	}
}

and app.js where:

var test = require("./test.js");
test.

the result is still "No Proposals Found". 

So here comes a question, besides all the default modules and envs modules, can we find the completions of any other nodejs modules when they are used as dependencies of app.js?

4. After all the previous failures, I started to carefully go through the TernJs code and docs. Then I found out in its doc, when described the "Node.js resolve plugin", it said that "Node.js resolve plugin" actually depends on the "Modules.js" plugin which I did not found inside Orion. Is there any chance that this is the cause of the problem in 3.? and if not, do we need to consider apply this "Modules.js" plugin?

5. At line 122 of node.js, "_f.content" should be "_f.contents".

Comment 23 Michael Rennie

2016-04-11 22:25:55 EDT

Created attachment 260878 [details]
POC for resolving node_modules

Here is a POC for finding node_module types without searching.

The patch basically works like this:

1. in the node plugin add modules to the graph with the fully qualified name during the initial load / parse pass
2. when asked for later - there is a check to translate the cached path to module name so we can find the type
3. resolver simply passes on the logical name of the module (i.e. 'async') and a flag stating its looking for a node name. In the reading code, check the name is not relative, and explicitly look for <project-path>/node_modules/<logical-name>/package.json
4. if found in (3) parse the contents to find the 'main' and do a fileclient lookup on that. 
5.Failing any of the lookups from (3) or (4), set the 'any' type and move on.

With this patch the test case in the smokeTests project works as expected - content assist is shown for the 'async' node module.

I have not tested this to see how it scales to the example huge project, but since there is no searching it should scale pretty well.

Comment 24 Eclipse Genie

2016-04-12 10:54:29 EDT

New Gerrit change created: https://git.eclipse.org/r/70481

Comment 25 Michael Rennie

2016-04-12 12:05:00 EDT

Created attachment 260898 [details]
re-created patch

Something was wrong with some of the diffs in the other patch

Comment 26 Michael Rennie

2016-04-12 12:45:16 EDT

I Pushed my patch:

http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=7b795299ff6456ce35a5a67e24601ebf3661d835

All tests pass.

There is more polish to do:

1. handle variations in package.json for 'main' names
2. really test the scaling - Olivier and I have both run it against the test project Mark gave and it works very very well.
3. The new library infos will have to be incorporated into bug 485219 to make the linting rule smarter
4. make sure we properly (and safely) are pruning the module dependency graph - we don;t want to load deps from node_modules deps, for example sub-node_modules inside a node_modules dependency. This would take us right back to timeouts / failures.

Comment 27 Michael Rennie

2016-04-12 12:48:09 EDT

Related, bug 491512 - that can be used to better cache the lib infos directly in the AST

Comment 28 Michael Rennie

2016-04-12 13:58:38 EDT

(In reply to Michael Rennie from comment #26)

> There is more polish to do:
> 
> 3. The new library infos will have to be incorporated into bug 485219 to
> make the linting rule smarter

done in: http://git.eclipse.org/c/orion/org.eclipse.orion.client.git/commit/?id=2c6800ad8083b80ca8f39167dba1355c278d2b22

Comment 29 Olivier Thomann

2016-04-13 13:31:25 EDT

Delivered.

Comment 30 Troy Tao

2016-04-21 15:15:10 EDT

When I tested my solution to bug 491512, I noticed that the current loading module solution cannot support a "multiple-level" loading. For example, in a test.js:

/*eslint-env node, async, react */
var asy = require("async");
var react = require("react");
react.
asy.

While Orion can give completions for "asy" correctly, it fails to find completions for "react". The reason seems to be that for async, there are no more modules required loading but for react, there are other modules that need to be loaded to give completions. The second (or third and so on) loading process happened after tern calculate the type info of react which means tern did not have enough information about react during the calculation. Moreover, the later loading failed to update the previous type info result, so when user send a completion request, Orion cannot give correct answers.