Bug 472914 - Consider using Iterable or List instead of arrays in API method return types
Summary: Consider using Iterable or List instead of arrays in API method return types
Status: RESOLVED FIXED
Alias: None
Product: Handly
Classification: Technology
Component: Core (show other bugs)
Version: 0.4   Edit
Hardware: All All
: P3 enhancement
Target Milestone: 0.4   Edit
Assignee: Vladimir Piskarev CLA
QA Contact:
URL:
Whiteboard:
Keywords: api
Depends on: 474001
Blocks:
  Show dependency tree
 
Reported: 2015-07-17 03:50 EDT by Vladimir Piskarev CLA
Modified: 2015-11-03 00:56 EST (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vladimir Piskarev CLA 2015-07-17 03:50:01 EDT
Need to consider using Iterable or List instead of arrays in API return types. This is in part inspired by a discussion on handly-dev:

https://dev.eclipse.org/mhonarc/lists/handly-dev/msg00024.html
Comment 1 Vladimir Piskarev CLA 2015-07-31 05:37:47 EDT
Another option (in Java 8) would be Stream.
Comment 2 Vladimir Piskarev CLA 2015-07-31 09:39:48 EDT
Interesting discussion:

http://alexruiz.developerblogs.com/?p=2519
Comment 3 Vladimir Piskarev CLA 2015-07-31 10:30:51 EDT
It looks like all these options are flawed in one way or another. I agree with Jens comment that the real problem is that there is no immutable interface for collections in Java. Arrays are mutable and need to be copied defensively if clients cannot be trusted. Also, they cannot be "virtualized". Iterable is immutable, but anemic. Collection exposes modification methods even when it is unmodifiable.

Stream-oriented processing looks promising wrt scalability, but it does not mix well with checked exceptions (including the omnipresent CoreException) and iterables. Obviously, it requires Java 8. I think we might consider providing stream-oriented views in the future, but not as the sole option.

One solution could be to keep the methods that return arrays and provide the counterpart methods that would return streams (such as getChildren and getChildrenStream). We would have the array-based methods use defensive copying and provide them with default implementations that delegate to corresponding stream-based methods. The array-based methods would be kept for backwards compatibility and convenience (e.g. ease of interoperability with array-based Eclipse Platform APIs, iteration with a for loop, etc.) The stream-based methods would facilitate functional-style programming and scalability.
Comment 4 Vladimir Piskarev CLA 2015-08-04 11:20:14 EDT
A note about covariance. Arrays are covariant, e.g. String[] is a subtype of Object[]. This fact allows clients to override methods that return arrays and restrict the return type to a subtype (i.e. exploit covariant return types). For example:

public interface IJavaElementDelta
   extends IHandleDelta
{
   @Override
   IJavaElementDelta[] getAffectedChildren() throws JavaModelException;
   ...
}

If we treat arrays as immutable, this is never a problem and in fact very useful.

By contrast, generics are invariant, e.g. List<String> is NOT a subtype of List<Object>. This is actually an advantage for *modifiable* collections. But it gets in the way in our case. To allow clients still exploit covariant return types, we would have to use method signatures such as

   List<? extends IHandleDelta> getAffectedChildren()

which is generally considered to be an anti-pattern, since it forces clients to use "beauties" like

   List<? extends IHandleDelta> children = delta.getAffectedChildren();

However, there are relatively recent Java platform APIs that do exactly this for the very same reason: to allow covariance. See, for example, javax.lang.model.util.Elements and javax.lang.model.element.Element.
Comment 5 Vladimir Piskarev CLA 2015-08-05 02:20:40 EDT
Perhaps I need to elaborate on "this is never a problem" above. "This" is array covariance. "Problem" is that if we have, say, an array of type String[], we can assign it to a variable of type Object[] and then do something like this

    String[] ss = ...
    Object[] oo = ss;
    oo[0] = 0; // no compile error, ArrayStoreException at runtime

This is never a problem if arrays are treated as immutable, i.e. clients must not mofify the returned array. This is also never a problem for generic collections, since you cannot assign, say, List<String> to List<Object>.
Comment 6 Vladimir Piskarev CLA 2015-08-05 02:57:46 EDT
A note about reification. Generics are not reified, while arrays are. This can have an impact on efficiency in some cases. Let's again consider the case of method

   @Override
   IJavaElementDelta[] getAffectedChildren()

Its implementation cannot just do a cast

   return (IJavaElementDelta[])super.getAffectedChildren(); // won't compile

It has to copy the original array, even if we are certain that each element of the array is an IJavaElementDelta:

    IHandleDelta[] array = super.getAffectedChildren();
    IJavaElementDelta[] result = new IJavaElementDelta[array.length];
    System.arraycopy(array, 0, result, 0, array.length);
    return result;

By contrast, a generic version of this method can just do an unchecked cast in this case, without copying any data:

    @SuppressWarnings("unchecked")
    List<IJavaElementDelta> result =
        (List<IJavaElementDelta>)super.getAffectedChildren();
    return result;
Comment 7 Vladimir Piskarev CLA 2015-08-05 03:06:20 EDT
Even in cases where some filtering of the result is needed, the generic version can still avoid *some* copying:

    List<IHandleDelta> list = super.getAffectedChildren();
    ArrayList<IJavaElementDelta> result = new ArrayList<>(list.size());
    for (IHandleDelta delta : list)
    {
        if (delta instanceof IJavaElementDelta)
            result.add(delta);
    }
    return result; // can return the result list directly

Should the method return an array, we'd need to copy the result list into a new array:

    return result.toArray(new IJavaElementDelta[result.size()]);

However, if we were to minimize the storage of the returned list, it would require copying its data into a new array internally:

    result.trimToSize(); // calls Arrays.copyOf(elementData, size)
    return result;
Comment 8 Vladimir Piskarev CLA 2015-08-05 03:52:01 EDT
It has been mentioned in comment 4 that generic collections and covariant return types don't mix well.

I think it should not be that much a problem for streams, since they are most often used in chained calls, with type inference. So a method such as

    Stream<? extends IHandleDelta> getAffectedChildren()

would usually be used in something like this:

    List<IHandleDelta> children = delta.getAffectedChildren().filter(
        child -> child.getKind() == ADDED).collect(Collectors.toList());
Comment 9 Vladimir Piskarev CLA 2015-08-05 03:57:41 EDT
It is interesting to note that JDT does not use defensive copying for the returned arrays (nor does it usually specify that clients must not modify the result), while the Workspace (Core Resources) always copies the returned arrays defensively.
Comment 10 Vladimir Piskarev CLA 2015-08-05 04:21:33 EDT
To sum up, my current thinking is that we should keep array-oriented view and consider providing stream-oriented view when we adopt Java 8, as it has been suggested in comment 3. This might be the best of both worlds.

The unfortunate problem with streams, though, is that they don't mix well with checked exceptions. It is especially bad because the checked CoreException is omnipresent in Eclipse Platform APIs and Handly. ThrowingStream [1] might be a solution.

[1] https://github.com/JeffreyFalgout/ThrowingStream
Comment 11 Vladimir Piskarev CLA 2015-08-05 08:44:02 EDT
A note about iteration performance. My experiments consistently show that the standard list view of the array (i.e. Arrays.asList(array)) is about 2x slower to iterate with the enhaced for loop than the array itself -- probably due to comodification checks in the list iterator. An optimized custom immutable list view of the array (that need not check for comodification) is still about 25-30% slower to iterate with the for-each construct than the naked array.

Iteration over the elements of the retun value is arguably the most common use of the kind of methods we're discussing. Probably another reason to keep arrays.

See also a note of Joshua Bloch on why they decided to return an array instead of list in enum values() method, and still consider it the right decision:

http://www.oracle.com/technetwork/articles/java/bloch-effective-08-qa-140880.html
Comment 12 Vladimir Piskarev CLA 2015-08-05 10:13:03 EDT
However, if we account for the need of defensive copying of the returned array, it is not as clear cut. My measurements indicate that iteration over the standard list view of the array with the for-each loop takes about the same time as both cloning the array and iterating over the clone. (A custom unmodifiable list view could be considerably faster.) Arrays can still win even when defensive copying is involved if one needs to iterate over the result more than once.

I think, though, that in cases where scalability is the most important factor, streams would be hard to beat as they can provide both lazyness and parallelization. So the write-up in comment 10 is still relevant.
Comment 13 Vlad Dumitrescu CLA 2015-08-05 14:13:48 EDT
Great analysis of all pros and cons. 

As the original poster of this suggestion, my use case is for usage of the model that falls outside Handlys area: manipulating the model in memory (for example refactorings). For this case, the speed advantage of arrays is less important, as there are many other things happening.

However, the adapter solution that you introduce in https://pisv.wordpress.com/2015/07/09/handly-support-for-legacy-models/ would probably work fine in this case too and then I believe we can have the best of both worlds.

[As a side note, I am using Xtend a lot in my code, where it is easy to follow a mostly functional paradigm and where lists and arrays are transparently converted into each other when needed. This is both nice and a bit dangerous, as conversions can happen too many times and that slows things down considerably.]
Comment 14 Vladimir Piskarev CLA 2015-08-06 03:06:53 EDT
(In reply to Vlad Dumitrescu from comment #13)

Thanks Vlad for your comment and the original suggestion that catalyzed this discussion. Unfortunately, I had not been able to properly address your request within the 0.3 time frame (sorry!), but the 0.4 focus on API looks like a nice fit for it.

I must say that now it is not as clear cut to me as it seemed initially. What we have here is basically a set of tradeoffs, and it appears to be pretty hard to choose one over another. But taking into account that array-based APIs are already in use, it is probably better to keep them, if only not to disrupt existing clients.

You are absolutely right -- and I should have mentioned it in my consideration, thanks for pointing it out! -- that the model adaptation facility introduced in 0.4 will probably make this decision a bit less important, since adopters can now choose to treat the Handly-based model as implementation detail that will not leak into client API (with some performance penalty paid for the adapter model). Another benefit of the adapter model (at least until Handly core API is truly finalized) is that it will minimize possible disruptions caused by Handly evolution.

Also, I concur with you that Xtend can make it all more palatable (e.g. the above mentioned "beauty" in Java <code>List<? extends IHandleDelta> children = delta.getAffectedChildren()</code> translates into <code>val children = delta.affectedChildren</code> in Xtend), but my design target is (understandably) first of all Java.

Regarding implicit conversions of arrays to lists in Xtend: I suspect they use Arrays#asList internally, which is slower to iterate and is not unmodifiable. Probably you would be better off with a custom immutable list view on the returned array for a small pay in terms of increased verbosity. I have implemented a prototype that uses lists instead of arrays [1]. You can take a look at the Arrays2 class [2] for an example of such a view.

[1] https://github.com/pisv/handly-playground/tree/lists
[2] https://github.com/pisv/handly-playground/blob/lists/org.eclipse.handly/src/org/eclipse/handly/util/Arrays2.java
Comment 15 Vladimir Piskarev CLA 2015-08-06 03:27:32 EDT
(In reply to Vladimir Piskarev from comment #14)
> (In reply to Vlad Dumitrescu from comment #13)
> 
> the model adaptation facility introduced in 0.4

Just to clarify: the 0.4 version is not required to implement the adapter model (it's just a design pattern), but will bring some infrastructure support for it such as a content adapter (bug 474840). For example, you will be able to use a content provider based on elements from the adapter model in Handly-based outline view.
Comment 16 Vladimir Piskarev CLA 2015-08-06 03:28:50 EDT
I meant bug 472840, sorry.
Comment 18 Vladimir Piskarev CLA 2015-08-07 05:31:26 EDT
Two more notes:

* Array-oriented view should specify that clients must not modify the returned array, whatever the actual implementation is (defensive copying or not). Otherwise, array covariance can lead to bugs. For example:

    IJavaElementDelta delta = ...
    IHandleDelta[] children = delta.getAffectedChildren();
    children[0] = new HandleDelta(element); // ArrayStoreException

* List-oriented view that allows covariance restricts mutability in a rather weird way. For example:

    List<? extends IHandleDelta> children = delta.getAffectedChildren();
    // you can remove
    children.remove(0);
    children.remove(delta);
    children.clear();
    // but not add or set (except for null)
    children.add(delta); // won't compile
    children.add(null); // you can only add null
    children.set(0, delta); // won't compile
    children.set(0, null); // you can only set null

There is absolutely no sense to return a modifiable list from such a view. 
It (more or less) fits only unmodifiable lists. Unfortunately, Java doesn't have a standard interface for unmodifiable lists.
Comment 19 Vladimir Piskarev CLA 2015-08-07 06:34:58 EDT
I don't plan to take any further action in 0.4 regarding it, but leave this enhacement request open to allow for broader community feedback. If you feel strongly about it, please leave a comment. Meanwhile, based on my current observations, I think that the best policy here would be "when in doubt, excercise restraint" (or "if it ain't broke, don't fix it"). I must admit I'm a bit exhausted.
Comment 20 Vladimir Piskarev CLA 2015-11-03 00:56:03 EST
Resolving as FIXED, although no code changes were made. Please reopen if you feel very strongly about it.