Bug 320170 - [compiler] [null] Whitebox issues in null analysis
Summary: [compiler] [null] Whitebox issues in null analysis
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.7   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 3.7 M2   Edit
Assignee: Ayushman Jain CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 292478
  Show dependency tree
 
Reported: 2010-07-17 11:20 EDT by Stephan Herrmann CLA
Modified: 2010-09-15 04:13 EDT (History)
2 users (show)

See Also:


Attachments
proposed patch (3.24 KB, patch)
2010-07-17 11:28 EDT, Stephan Herrmann CLA
no flags Details | Diff
updated patch (5.01 KB, patch)
2010-07-20 12:42 EDT, Stephan Herrmann CLA
no flags Details | Diff
updated patch with one more test (8.84 KB, patch)
2010-07-29 10:13 EDT, Stephan Herrmann CLA
amj87.iitr: iplog+
amj87.iitr: review+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Stephan Herrmann CLA 2010-07-17 11:20:45 EDT
While working on the code for null analysis (implementing bug 292478)
I found two issues in the code:

* NullInfoRegistry#mitigateNullInfoOf(..) has a potential typo:
  local variables s1 ... are used like this:
  (s1 = source.nullBit1) & (s3 = source.nullBit3) & (s4 = source.nullBit4)
  BUT s2 is assigned from this.nullBit2 (should be source.nullBit2 ??).
  I marked it as a TODO in the patch. Although I tried hard to find a test
  where it makes a difference, I could not find any, because the change would
  only affect bits 3 and 4, but clients of mitigateNullInfoOf hardly seem to
  be interested in these bits.

  How was the computation of these bits developed? If the boolean formulae
  are an optimized version of some truth tables, would the original form
  still be available/documented somewhere?

* The same method as above cannot handle null bits 0111. This pattern is
  undocumented in UnconditionalFlowInfo, but it actually occurs, saying:
  "potentially null potentially non-null and potentially unknown".
  I created test0536_try_finally() to witness this issue. Without the patch
  null-analysis detects nothing, but with the patch it reports a potential
  null pointer access.
  The point is that mitigateNullInfoOf should collect all uncertainties from
  any statement in the try block, but currently "x=bar()" overrides previous
  uncertainties with status "definitely unknown" thus losing valuable 
  information.
Comment 1 Stephan Herrmann CLA 2010-07-17 11:28:08 EDT
Created attachment 174558 [details]
proposed patch

One new test and its fix, plus a TODO.
Comment 2 Ayushman Jain CLA 2010-07-19 03:11:53 EDT
(In reply to comment #0)

> * NullInfoRegistry#mitigateNullInfoOf(..) has a potential typo:
>   local variables s1 ... are used like this:
>   (s1 = source.nullBit1) & (s3 = source.nullBit3) & (s4 = source.nullBit4)
>   BUT s2 is assigned from this.nullBit2 (should be source.nullBit2 ??).

It seems to me that the bits a2...a4 are used to record this.nullBits while bits s1...s4 are for source. So yes s2 should be assigned source.nullBit2. I'm not sure if its a typo here or if the variable that needs to used there should be this.nullBit2 indeed.

>   How was the computation of these bits developed? If the boolean formulae
>   are an optimized version of some truth tables, would the original form
>   still be available/documented somewhere?

You can find the state table in org.eclipse.jdt.core.tests.compiler.regression.NullReferenceImplTests.java:71 onwards. And since only the 4 higher bits are really relevant, we encode this state table in these 4 bits as given in UnconditionalFlowInfo.java:57. This is all we have regarding these bits i guess. Can you check if changing s2 to be assigned source.nullBit makes any tests fail?
 
> * The same method as above cannot handle null bits 0111. This pattern is
>   undocumented in UnconditionalFlowInfo, but it actually occurs, saying:
>   "potentially null potentially non-null and potentially unknown".
>   I created test0536_try_finally() to witness this issue. Without the patch
>   null-analysis detects nothing, but with the patch it reports a potential
>   null pointer access.
>   The point is that mitigateNullInfoOf should collect all uncertainties from
>   any statement in the try block, but currently "x=bar()" overrides previous
>   uncertainties with status "definitely unknown" thus losing valuable 
>   information.

This sounds good. But can you guide me through the logic that you followed in your patch (and it'll be good if you can also document it through comments at the place where you made the change so that somebody looking at it in the future would understand easily) ? 
Also add the new state that your'e capturing at both the above mentioned files to document it. 
A small comment about the patch - we usually leave a space after a line comment
//<space>Comment goes here
So please incorporate this change as well in the next patch. Thanks! :)
Comment 3 Stephan Herrmann CLA 2010-07-20 12:42:14 EDT
Created attachment 174767 [details]
updated patch

(In reply to comment #2)
> It seems to me that the bits a2...a4 are used to record this.nullBits while
> bits s1...s4 are for source. So yes s2 should be assigned source.nullBit2. I'm
> not sure if its a typo here or if the variable that needs to used there should
> be this.nullBit2 indeed.

<shrug>
I'm leaving the TODO in the patch, unable to decide.
 
> >   How was the computation of these bits developed? If the boolean formulae
> >   are an optimized version of some truth tables, would the original form
> >   still be available/documented somewhere?
> 
> You can find the state table in
> org.eclipse.jdt.core.tests.compiler.regression.NullReferenceImplTests.java:71
> onwards. And since only the 4 higher bits are really relevant, we encode this
> state table in these 4 bits as given in UnconditionalFlowInfo.java:57.

Thanks for the pointer, I hadn't seen NullReferenceImplTests before, but this
table indeed doesn't say more than the one in UnconditionalFlowInfo.java:57.

I briefly looked at the tests in NullReferenceImplTests, but sadly the
tables in NullReferenceImplTransformations didn't help much to enlighten me.
I tried to interpret the hex-tables as a translation of the comments but with
not much success.

>  This is all we have regarding these bits i guess.

That's a pity, because I can't see how anybody can manually make sense of 
computations like this:
	m3 = s1	& (s2 & (ns3 = ~s3) & (ns4 = ~s4) & (a3 | a4)
				| (ns2 = ~s2) & s3 & ns4 & (a2 | a4)
				| ns2 & ns3 & s4 & (a2 | a3));
I don't see a connection between this and anything in NullReferenceImplTests.

> Can you check if changing s2 to be assigned source.nullBit makes any tests 
> fail?

The tests in NullReferenceTests are unaffected by this change.
As I mentioned, I tried hard to find *anything* that would be affected by
the changed, but failed.

> > * The same method as above cannot handle null bits 0111. This pattern is
> >   undocumented in UnconditionalFlowInfo, but it actually occurs, saying:
> >   "potentially null potentially non-null and potentially unknown".
> >   I created test0536_try_finally() to witness this issue. Without the patch
> >   null-analysis detects nothing, but with the patch it reports a potential
> >   null pointer access.
> >   The point is that mitigateNullInfoOf should collect all uncertainties from
> >   any statement in the try block, but currently "x=bar()" overrides previous
> >   uncertainties with status "definitely unknown" thus losing valuable 
> >   information.
> 
> This sounds good.

Yea, I'm much more positive about this issue ;)

> But can you guide me through the logic that you followed in
> your patch (and it'll be good if you can also document it through comments at
> the place where you made the change so that somebody looking at it in the
> future would understand easily) ? 

I added some comments. Please let me know if more is needed.

> Also add the new state that your'e capturing at both the above mentioned files
> to document it. 

done.

> A small comment about the patch - we usually leave a space after a line comment
> //<space>Comment goes here
> So please incorporate this change as well in the next patch. Thanks! :)

I usually do, too.
I must have copied that style from other sinners within the test class :)
Comment 4 Ayushman Jain CLA 2010-07-21 07:27:16 EDT
(In reply to comment #3)
> Created an attachment (id=174767) [details]
> updated patch

Thanks for updating the patch. 

It'll be good if you can help me understand the logic behind this fix, which I still dont really get (since I'm not familiar with NullInfoRegistry at all). :)

It seems to me that you handle the 0111 case in a special way, but I dont see such special handling for any other pattern. Also can you tell me why you did what you did, and how your fix results in the given warning to be ellicited? Thanks!
Comment 5 Stephan Herrmann CLA 2010-07-21 18:02:54 EDT
(In reply to comment #4)

I fully understand your hesitation, I had hard times figuring out this issue.
BTW: Is it true that no original developer of this code is active in JDT/Core
any more? Quite a pity.

Yes, I'm handling the 0111 case specially, because that case is not correctly
covered by the preceding formulae. Apparently the formulae were developed
under the assumption that 0111 never occurs, which my test proves wrong.
If I would fully understand how the formulae for computing m1, m2 ... were
constructed I would have woven my change into those, but frankly, I didn't
dare to touch them. However, I think it is safe to add this as a rule that
simply overrides the previously computed value, because
 - we know that the formulae simply miss this case
 - we know the expected result (see below)
The code looks like an optimized version of a complex if-then-else structure.
If optimization is relevant, we could actually swap order to:
	if ((m = (m1 | m2 | m3)) != 0) {
		long x = ~this.nullBit1 & a2 & a3 & a4; // x is set for all variable ids that have state 0111 (pot n, pot nn, pot un)
		if (x != 0) {
			// restore state 0111 for all variable ids in x:
			source.nullBit1 &= ~x;
			source.nullBit2 |= x;
			source.nullBit3 |= x;
			source.nullBit4 |= x;
		} else {
			newCopy = true;
			source = source.unconditionalCopy();
			...
		}
	}
because we only need to evaluate either then or else branch.

As to the logic itself: If you look at test0536_try_finally() the
try block creates three kinds of uncertainty regarding "x":
- x can be null due to "x = null"
- x can be non-null due to "x = new X()"
- x can be unknown due to "x = bar()"
During TryStatement.analyseCode(..) all three pieces of information are
collected in handlingContext.initsOnFinally, which records the possible
states when entering the finally block - assuming that any statement in
the try block could raise an exception thus jumping to the finally block.
In this situation NullInfoRegistry.mitigateNullInfoOf(flowInfo) is called with:
 - this = handlingContext.initsOnFinally
 - flowInfo = merged info from more potential jumps to finally (like from a
   return within the try block).
I understand the method as merging all uncertainties that have been analysed
for the various potential paths into finally, so that
finallyContext.complainOnDeferredChecks(result) will actually complain about
all possible issues of the finally block.
The test case breaks the current implementation, because mitigateNullInfoOf
is called on a flowInfo with bits 0111 for which it was not designed.
Without the patch the sum of all uncertainties is computed as 0000 which
means: nothing to complain about, which is wrong. mitigateNullInfoOf must
not drop any information about potential problems downstream.

When passing the special pattern 0111 into complainOnDeferredChecks(..)
we correctly detect potential null-dereference within the finally block.
Question: Can pot. non-null and pot. unknown also trigger warnings in this
combination? If so, maybe we should add more tests but I couldn't find
any statement that would trigger this (other statements in finally that
should cause a warning).

It seems were are talking about the following difference: 
 - 0000 encodes we know nothing about x, so we better keep quiet.
 - 0111 encodes that we positively know that x could be in any possible state
   so we better raise a warning ("we know that we know nothing" :) )

OK that's a lot of words. Some of this you may already know, but its 
certainly good to double check. I hope it's a bit clearer now?
Comment 6 Stephan Herrmann CLA 2010-07-21 19:38:22 EDT
(In reply to comment #5)

one correction:

> If optimization is relevant, we could actually swap order to:

that part was wrong because we need to merge the information
if one variable has pattern 0111 and other variables have other patterns

So the patch from comment 3 is still the best I can offer.
Comment 7 Ayushman Jain CLA 2010-07-22 09:58:25 EDT
(In reply to comment #5)
> (In reply to comment #4)

Thanks Stephen. I now have more context about this. Btw, i tried using a variable x2 along with x, with all the operations that are done on x also done on x2. The NullInfoRegistry then read 0333. And when I used a third variable x3 with just one action done on it in try block ie. x3 = new X();, I obtained a registry 7043. So now I'm confused again. :)
 
> I fully understand your hesitation, I had hard times figuring out this issue.
> BTW: Is it true that no original developer of this code is active in JDT/Core
> any more? Quite a pity.

Maxime Daniel used to work on it. But not any longer. I wish this area were a bit more documented. :)

> When passing the special pattern 0111 into complainOnDeferredChecks(..)
> we correctly detect potential null-dereference within the finally block.
> Question: Can pot. non-null and pot. unknown also trigger warnings in this
> combination? 

As far as I know, we dont warn for pot non null, and pot unkown.
Comment 8 Stephan Herrmann CLA 2010-07-22 12:35:50 EDT
(In reply to comment #7)
> (In reply to comment #5)
> > (In reply to comment #4)
> 
> [...] Btw, i tried using a
> variable x2 along with x, with all the operations that are done on x also done
> on x2. The NullInfoRegistry then read 0333. And when I used a third variable x3
> with just one action done on it in try block ie. x3 = new X();, I obtained a
> registry 7043. So now I'm confused again. :)

Hm, with this class:
public class X {
 X bar () { return null; }
 void foo() {
   X x = new X();
   X x2 = new X();
   X x3 = new X();
   try {
     x = null;
     x2 = null;
     x = new X();
     x2 = new X();
     x3 = new X();
     x = bar();
     x2 = bar();
   } finally {
     x.toString();
     x2.toString();
     x3.toString();
   }
 }
}
I get a combined info of 4373, ie., x and x2 have 0111 and x3 has 1010.
Looks correct to me!

Do you still see the wrong result? What test exactly?

> Maxime Daniel used to work on it. But not any longer. I wish this area were a
> bit more documented. :)

Maybe he still has some bits on his computer that could help explain the
background?
Comment 9 Ayushman Jain CLA 2010-07-26 03:31:15 EDT
(In reply to comment #8)

> I get a combined info of 4373, ie., x and x2 have 0111 and x3 has 1010.
> Looks correct to me!
> 
> Do you still see the wrong result? What test exactly?

Ok so 7043 is the null bits on the flowInfo and not the registry. Sorry quoted the wrong no, here. I'm getting 0373 with the same testcase in the above comment. Even if x and x2 have 0111, and x3 has 1010, how does it combine into 0373?
Comment 10 Stephan Herrmann CLA 2010-07-29 10:13:05 EDT
Created attachment 175498 [details]
updated patch with one more test

(In reply to comment #9)
> (In reply to comment #8)
> 
> > I get a combined info of 4373, ie., x and x2 have 0111 and x3 has 1010.
> > Looks correct to me!
> > 
> > Do you still see the wrong result? What test exactly?
> 
> Ok so 7043 is the null bits on the flowInfo and not the registry. Sorry quoted
> the wrong no, here. I'm getting 0373 with the same testcase in the above
> comment. Even if x and x2 have 0111, and x3 has 1010, how does it combine into
> 0373?

Mh, I don't exactly know what you are seeing, but here's an updated patch
including the above test. When stepping through test0537_try_finally I see:

- this (NullInforRegistry) has 0373:
  x1 & x2 have 0111: 
    the try block potentially sets these to null, to non-null or to unknown
  x3 has 0010: 
    the try block potentially sets x3 to non-null
- the "flowInfo" argument has 7043:
  this combines the states before the try, at the end of the try and at returns
  it says: x1 & x2 are def unknown, x3 is def non-null
- these are combined into the variable "source" (funny name, but this is
  the result of the function):
  Here I indeed see 4373:
  x1 & x2 are 0111: potentially anything
  x3 is 1010: def non-null (unchanged from "flowInfo").

I don't see a problem (with my patch).

BTW: If also updated the copyright headers.
Comment 11 Ayushman Jain CLA 2010-07-29 10:27:32 EDT
(In reply to comment #10)
> Created an attachment (id=175498) [details]
> updated patch with one more test

Stephen, all this is fine. All I wanted to understand is
>Here I indeed see 4373:
>  x1 & x2 are 0111: potentially anything
>  x3 is 1010: def non-null (unchanged from "flowInfo").
How is the no. "4373" calculated from the infos of x1, x2 and x3? I believe this was calculated using those complicated formulae that both of us didn't understand right? If yes, then how did you derive the infos of x1, x2 as 0111, and of x3 as 1010, using just the no. "4373"?

I'm sorry for being so inquisitive but since you have a better understanding of this code, I want to take this opportunity to increase my understanding as well :)
Comment 12 Stephan Herrmann CLA 2010-07-29 12:14:52 EDT
I'm answering your comment out of order, to explain the general scheme first:

(In reply to comment #11)
> If yes, then how did you derive the infos of x1, x2 as 0111,
> and of x3 as 1010, using just the no. "4373"?

The connection is by the id of the local variables:
  x1 has id==0, x2 has id==1, x3 has id==2
This means the state for x1 is found by taking the 1-bits of all four fields
nullBit1, nullBit2 etc. Think of 4373 as a matrix:

           x3
           |
           |x2
           ||
           ||x1
           |||
nullBit1: 0100 = 4
nullBit2: 0011 = 3
nullBit3: 0111 = 7
nullBit4: 0011 = 3

Just rotate the matrix 90 degree counter-clockwise (turning columns into rows) 
and you get the states for the local variables:

0111 - state for variable with id==0 (x1)
0111 - state for variable with id==1 (x2)
1010 - state for variable with id==2 (x3)
0000 - irrelevant, because no variable with id==3

> How is the no. "4373" calculated from the infos of x1, x2 and x3? I believe
> this was calculated using those complicated formulae that both of us didn't
> understand right?

The original formulae compute 4040 (don't ask me how :). Then I just replace
the slices for variables x1 and x2. 
Local variable "x" (of mitigateNullInfoOf(), not the example program, oops) 
is the mask with one bit set for each affected variable 
(the set of locals x1 and x2 is encoded by the mask 0x3).
Now this line
  source.nullBit1 &= ~x
blanks those bits from nullBit1, and
  source.nullBit2 |= x 
and following *set* the bit in nullBit2 etc.
Using the bitmask 0x3 I only change the two rightmost columns in the first
matrix above, which are the first two rows in the second matrix.

In the example: before my adjustment the 4040 looks like this:
nullBit1: 0100 = 4
nullBit2: 0000 = 0
nullBit3: 0100 = 4
nullBit4: 0000 = 0

Blanking bits 0x3 in nullBit1 has no effect here, remains at 4, setting bits
0x3 changes nullBit2, nullBit4 0->3 and nullBit3 4->7. Yields our 4373, voila!

Is this clearer now?
Comment 13 Ayushman Jain CLA 2010-07-29 13:26:01 EDT
(In reply to comment #12)
> Is this clearer now?
Perfectly clear. :)
Thanks also for updating copyrights and test. I'll release your fix soon.
Comment 14 Ayushman Jain CLA 2010-07-30 02:02:21 EDT
Targetting 3.7M2
Comment 15 Ayushman Jain CLA 2010-08-19 06:48:53 EDT
Released in HEAD for 3.7M2.
Added regression tests NullReferenceTest#test0536_try_finally() and NullReferenceTest#test0537_try_finally().
Comment 16 Satyam Kandula CLA 2010-09-15 04:13:56 EDT
Verified for 3.7M2 using build I20100909-1700