Getting the content of large remote directories (2000+ files/folders) can be extremely slow, especially when the network is slow or the distance is huge -- e.g. some shops have developers in India connecting to servers somewhere in North America. One way to help with this is to get the content in "groups": if a directory contains more than a certain (configurable) number of children, we divide the children into buckets. The number of children per bucket should also be configurable (e.g. 500 or 1000 children per bucket):

Folder1
+-- 1..1000
+-- 1001..2000
+-- 2001+

Expanding a range will get the children within that range (bucket) only. In the initial implementation, no actions will be allowed on the range nodes, and the content of a range/bucket does NOT need to be auto-refreshed when content changes (i.e. files deleted/added/etc.). For example, if a file is deleted in the range 1..1000, that range will really have 999 files/folders; if a file is added instead, range 1..1000 will really have 1001 files. A refresh will have to be issued on the parent directory in order to re-fetch the buckets/ranges.
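The bucket partitioning described above could be sketched as follows; the class and method names here are mine for illustration, not from any existing RSE/EFS API:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a directory's child count into "1..N" style bucket labels. */
public class BucketRanges {

    /** Returns labels like "1..500", "501..1000", "1001..1234";
     *  the last bucket is truncated to the actual child count. */
    public static List<String> labels(int childCount, int bucketSize) {
        List<String> result = new ArrayList<>();
        for (int start = 1; start <= childCount; start += bucketSize) {
            int end = Math.min(start + bucketSize - 1, childCount);
            result.add(start + ".." + end);
        }
        return result;
    }

    public static void main(String[] args) {
        // A directory with 1234 children and a configured bucket size of 500:
        System.out.println(labels(1234, 500)); // [1..500, 501..1000, 1001..1234]
    }
}
```

Expanding a range node would then fetch only the children whose (sorted) index falls inside that label's interval.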
I find the "bucket" idea problematic in terms of usability, and likely hard to implement. Before *any* further action on this, we need more exact data about how long it takes, and with what communication protocol. Samuel, I assume you are talking about dstore; can you try the same folders with SSH for comparison?

Note that for putting stuff into buckets, you'll need to sort items on the remote side anyways. You can only do that with an agent-based solution like dstore (ssh or ftp won't guarantee the output to be sorted... in what bucket should the user look?). So, you'll need to query all names in the remote folder anyways. Assuming 2000 names of 20 characters each, that's 40 KBytes to transfer. On a slow modem (14 kbps), transferring this data takes about 30 seconds, which I personally find acceptable if it doesn't happen too often.

I see 3 useful possibilities for improvement:

1. "Stream"-like transfer of directory contents, such that results 1-n can already be displayed in the viewer while the transfer is still ongoing. The proposed Java 7 FileSystem APIs go that way for this kind of problem, and I'm in favor of adopting this for EFS as well.

2. On "Refresh", don't throw away the entire contents while the refresh is still ongoing, but allow the user to continue working with the old data. Replace the data with new contents as it is received. There is an existing bug for this.

3. On initial transfer, transfer the file names only but not the stat data (modtime, permissions). We *will* need to know isFile vs isFolder though, so I'm not sure how much we gain from this.

Thoughts?
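For possibility 1, the `java.nio.file.DirectoryStream` from the Java 7 FileSystem proposal is a lazy iterator, so a viewer can display entries as they arrive rather than waiting for the complete listing. A minimal local sketch (the directory path is just a placeholder):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: stream directory entries one at a time instead of
 *  collecting the full listing before showing anything. */
public class StreamingList {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // Each entry could be pushed to the viewer immediately here,
                // while the underlying transfer is still in progress.
                System.out.println(entry.getFileName());
            }
        }
    }
}
```

An EFS implementation would back such a stream with the remote protocol, yielding names as they come off the wire.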