Bug 474042 - [null] Syntactic analysis of fields should generate 100% null-safe reads
Summary: [null] Syntactic analysis of fields should generate 100% null-safe reads
Status: NEW
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 4.6
Hardware: PC All
Importance: P3 enhancement
Target Milestone: ---
Assignee: JDT-Core-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-31 09:59 EDT by Timo Kinnunen CLA
Modified: 2015-09-02 02:09 EDT
CC List: 2 users

See Also:


Attachments

Description Timo Kinnunen CLA 2015-07-31 09:59:22 EDT
The compiler setting "Use syntactic null analysis for fields" in Errors/Warnings could be improved. The bytecode generated with and without that setting turned on is identical, which means the code can still trigger an NPE with the feature enabled and without a warning being displayed. I propose that the compiler use this setting as permission to take advantage of the freedoms granted by the Java memory model and, where possible, generate fully null-safe code automatically.

Using the code from bug 473970 as example, the compiler could transform from this:

@NonNullByDefault public class NullTest1 {
	@Nullable public String[] strings;

	public void print() {
		if(strings != null) {
			for(String string : strings) {
				System.out.println(string);
			}
		}
	}
}

into this version:

@NonNullByDefault public class NullTest2 {
	@Nullable public String[] strings;

	public void print() {{
		String string; int __i; int __n; String[] __strings2; if((__strings2 = strings) != null) {
			for(__n = __strings2.length, __i = 0; __i < __n; __i++) { string = __strings2[__i];
				System.out.println(string);
			}
		}
	}}
}
(unnamed local variables are prefixed with underscores)

The generated bytecode differs only at the start of the method:
     0  aload_0 [this]           -
     1  getfield [strings]       2  aload_0 [this]
     4  ifnull 41                3  getfield [strings]
     7  aload_0 [this]           6  dup
     8  getfield [strings]       7  astore 4
    11  dup                      9  ifnull 41
    12  astore 4                12  aload 4
    14  arraylength             14  arraylength
    15  istore_3                15  istore_3
    16  iconst_0                16  iconst_0
    17  istore_2                17  istore_2
    18  goto 36                 18  goto 36
    21  aload 4                 21  aload 4
    23  iload_2                 23  iload_2
    24  aaload                  24  aaload
    25  astore_1 [string]       25  astore_1 [string]
    26  getstatic [System.out]  26  getstatic [System.out]
    29  aload_1 [string]        29  aload_1 [string]
    30  invokevirtual [println] 30  invokevirtual [println]
    33  iinc 2 1                33  iinc 2 1
    36  iload_2                 36  iload_2
    37  iload_3                 37  iload_3
    38  if_icmplt 21            38  if_icmplt 21
    41  return                  41  return

Note that the field is read only once in the new method. This means there is only one racy read of that field, and it can now be covered by the single null check. The generated bytecode is also 2 bytes shorter.

According to my reading of the memory model in §17.4 and in particular the example §17.4-C from http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4-C
this transformation is allowed as long as the field is not volatile or subject to synchronization and the reading thread doesn't write to the field between the two reads. 

Without an enforced ordering between a potential write to this shared field and the reads from this method, both reads can appear to have happened at the exact same time. Then they both will have seen the same value and so the single check for null is sufficient to cover both of them.
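For illustration, the same single-read guarantee is what the familiar "cache the field in a local" idiom gives at the source level; a minimal sketch (the class name NullTest3 is made up here, and this is hand-written source, not the proposed compiler output):

```java
public class NullTest3 {
	public String[] strings; // shared field, may be written by other threads

	public void print() {
		String[] local = strings;   // the one and only read of the field
		if (local != null) {        // this single check covers every later use
			for (String s : local) {
				System.out.println(s);
			}
		}
	}
}
```

Since `local` cannot change after the check, no interleaving write to `strings` can reintroduce an NPE in this method.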
Comment 1 Stephan Herrmann CLA 2015-07-31 17:50:01 EDT
Interesting proposal.

In which way would this help people struggling with syntactic analysis for fields? At face value you only avoid the remaining danger of data races, right? People wanting any analysis for fields tend to not care about this residual risk. But: Are you suggesting that with this transformation behind the scenes, the information resulting from the null-check should remain valid longer? Currently, we cancel that information after the next instruction.

Can you clarify which exact goal you pursue with this proposal?


Inspired by this bug I scribbled down a new candidate for our planned null utilities, see bug 442103 comment 6.

To generalize over both proposals, we could say, we'd like to emulate a missing language feature, which - in analogy to t-w-r - could look like this:

   withNonNull (String[] nnStrings = strings) {
      for (String string : nnStrings)
         System.out.println(string);
   }

I.e., Java lacks a block statement that combines capturing a value in a local variable and performing a check on that local. If syntax were more elegant than an explicit introduction of new local variable, things would look a lot nicer without much jumping through hoops. The functional variant in bug 442103 comment 6 goes a long way, I believe, except for the obvious restrictions regarding access to other local variables (no writes, must be effectively final). WDYT?
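The functional variant can be approximated today with an ordinary helper method; a minimal sketch (withNonNull here is a hypothetical helper written for this comment, not an existing JDT utility):

```java
import java.util.function.Consumer;

public class WithNonNull {
	// Hypothetical helper: reads the value exactly once at the call site
	// and runs the body only when that captured value is non-null.
	static <T> void withNonNull(T value, Consumer<? super T> body) {
		if (value != null) {
			body.accept(value); // value is provably non-null here
		}
	}

	public String[] strings;

	public void print() {
		withNonNull(strings, nnStrings -> {
			for (String s : nnStrings) {
				System.out.println(s);
			}
		});
	}
}
```

As noted, the body is subject to the usual lambda restrictions: captured locals must be effectively final.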
Comment 2 Stephan Herrmann CLA 2015-07-31 19:48:58 EDT
(In reply to Stephan Herrmann from comment #1)
> Can you clarify which exact goal you pursue with this proposal?

This bug per se is very clear, actually. I was just puzzled about your announcement in bug 473970 comment 3:

> ... I think some improvements could be made here, especially since this 
> use-case comes up so frequently.

I don't recall users asking for more safety, only for fewer warnings/errors ;P
Comment 3 Timo Kinnunen CLA 2015-08-02 18:01:35 EDT
(In reply to comment #2)
> (In reply to Stephan Herrmann from comment #1)
> > Can you clarify which exact goal you pursue with this proposal?
> 
> This bug per se is very clear, actually. I was just puzzled about your
> announcement in bug 473970 comment 3:
> 
> > ... I think some improvements could be made here, especially since this
> > use-case comes up so frequently.
> 
> I don't recall users asking for more safety, only for fewer warnings/errors ;P

Well at least I'm asking for it :) 

Anyway, my thinking here was: if this change turns out to be safe (still to be determined?) and not to cause performance regressions (it shouldn't; the same idiom is often used as a performance optimization), then in the future this transformation and syntactic analysis itself could be folded in with all the other transformations, making syntactic analysis of fields the default and so resulting in fewer errors/warnings being emitted after all.
Comment 4 Timo Kinnunen CLA 2015-09-02 02:09:22 EDT
(In reply to comment #3)
> if this change turned out to be safe (still to
> be determined?)

This question turned out to be quite surprising. The good news is that volatile reads won't need any special handling. The bad news is that the Java verifier and binary compatibility allow a read from a field to be volatile or non-volatile depending on the other classfiles present at link time, so every read can potentially execute as volatile. This isn't as bad as it sounds, and there's actually a simple solution that applies to all fields. As an example, here's some code where any field could be volatile:

	public boolean hasValidAddressByRead_orig() {
		return address != null && address.validFrom != null && address.validFrom.isBefore(Instant.now());
	}

Writing out all of the reads, there are 5 reads in total and 3 locations where an NPE could occur, as only two of the reads are covered by a null check:

	public boolean hasValidAddressByRead_0() {
		@Nullable Address r1; @Nullable Address r2; @Nullable Instant r3; @Nullable Address r4; @Nullable Instant r5;
		r1 = address;
		if(r1 != null) {
			r2 = address; r3 = r2.validFrom;
			if(r3 != null) {
				r4 = address; r5 = r4.validFrom;
				return r5.isBefore(Instant.now());
			}
		}
		return false;
	}

The problem is that later reads can contradict earlier reads. As any or all of the reads could be volatile, the reads can't be reordered and combined freely. What we would like is a memory snapshot where r1==r2==r4 and r3==r5 and the reads happened in the order r1,r2,r3,r4,r5.

Luckily the Java Memory Model doesn't prohibit a compiler from creating lots of unnecessary reads, as long as such reads don't violate the memory model or have observable effects on program execution. We can use this to brute-force our memory snapshot by repeating the five reads over and over until they are consistent. An unoptimized, computer-generated version could look like this:

	public boolean hasValidAddressByRead_1() {
		boolean value;
		out: while(true) {
			point1: {
				point2: {
					Instant temp;
					retrying: while(true) {
						@Nullable Address r1 = address; if(r1 == null) break point1;
						@Nullable Address r2 = address; if(r2 != r1) continue retrying;
						@Nullable Instant r3 = r1.validFrom; if(r3 == null) break point2;
						@Nullable Address r4 = address; if(r4 != r1) continue retrying;
						@Nullable Instant r5 = r1.validFrom; if(r5 != r3) continue retrying;
						temp = r3;
						break retrying;
					}
					value = temp.isBefore(Instant.now()); break out;
				}
				value = false; break out;
			}
			value = false; break out;
		}
		return value;
	}

If we can make the inner loop execute so amazingly fast that changes to physical memory can't keep up, we will have 5 consistent reads in no time at all. 

After the inner loop has finished by assigning temp it will be the case that r1==r2, r1==r4 and r3==r5, those reads occurred in the correct order, and so we don't really need to perform the r2, r4 and r5 reads. With those reads replaced by values from r1 and r3 the inner loop now only runs once, and after some further cleanup this is the optimized version we have generated:

	public boolean hasValidAddressByRead_2() {
		@Nullable Address r1 = address;
		if(r1 != null) {
			@Nullable Instant r3 = r1.validFrom;
			if(r3 != null) {
				return r3.isBefore(Instant.now());
			}
			return false;
		}
		return false;
	}

I originally thought that I would have to examine every control flow path for each recognized code pattern to prove a transform is legal, but this retry approach seems more general and a lot simpler.
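The core of the retry approach can be sketched as a generic helper that re-reads a location until two consecutive reads agree (stableRead is a hypothetical name invented for this sketch, not part of any proposed API):

```java
import java.util.function.Supplier;

public class StableRead {
	// Keep re-reading until two consecutive reads return the identical
	// reference. The agreed-upon value can then stand in for both reads,
	// just as r1 stands in for r2 and r4 in the example above.
	static <T> T stableRead(Supplier<T> read) {
		T first = read.get();
		while (true) {
			T second = read.get();
			if (second == first) {
				return first; // consistent pair of reads
			}
			first = second;   // value changed between reads; retry
		}
	}
}
```

When the underlying field is never written concurrently, the loop exits after the second read, which is what lets the optimized version above drop the retries entirely.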