That loadAllDetached method is fairly complicated, judging by the woven output you included, so pinpointing where the stack height becomes inconsistent would be time-consuming. The amount of cflow-related weaving also makes me think we could tighten the pointcut so that it weaves less (though I'm not totally clear on your use case, so maybe not):
cflow(within(*..*EnhancerByMockito*))
will weave the counter code into every single join point in that Enhancer class: every field get/set, handler, etc. If you can get away with it, you'd be better off with this:
cflow(within(*..*EnhancerByMockito*) && execution(* *(..)))
or maybe:
cflow(execution(* *..*EnhancerByMockito*.*(..)))
Either form weaves the counter code only into the method execution join points.
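To make the cost difference concrete, here is a rough plain-Java model of the cflow bookkeeping. It is not AspectJ's actual generated code, just a sketch of the idea: a per-thread counter is bumped on entry to each tracked join point and dropped on exit, and "in the cflow" simply means the counter is above zero. The names (CflowCounterSketch, tracked, broadStyle, narrowStyle) are made up for illustration, and the lambdas stand in for the field get/set and method execution join points.

```java
import java.util.function.Supplier;

public class CflowCounterSketch {
    // Per-thread nesting depth; > 0 means "inside the tracked control flow".
    private static final ThreadLocal<int[]> DEPTH = ThreadLocal.withInitial(() -> new int[1]);
    // How many times the enter/exit bookkeeping was executed.
    private static int bookkeepingCalls;
    // Stand-in for a field of the generated proxy class.
    private static int field = 1;

    // Models the counter code woven around one join point.
    static <T> T tracked(Supplier<T> joinPoint) {
        bookkeepingCalls++;
        DEPTH.get()[0]++;
        try {
            return joinPoint.get();
        } finally {
            DEPTH.get()[0]--;
        }
    }

    static boolean inCflow() {
        return DEPTH.get()[0] > 0;
    }

    // Some code called from the proxy method that asks "am I in the cflow?".
    static boolean calleeSeesCflow() {
        return inCflow();
    }

    // Models cflow(within(...)): execution, field get and field set are all tracked.
    static boolean broadStyle() {
        return tracked(() -> {
            int v = tracked(() -> field);   // field get join point
            tracked(() -> field = v + 1);   // field set join point
            return calleeSeesCflow();
        });
    }

    // Models cflow(within(...) && execution(* *(..))): only the execution is tracked.
    static boolean narrowStyle() {
        return tracked(() -> {
            int v = field;
            field = v + 1;
            return calleeSeesCflow();
        });
    }

    // Runs a body and reports how many counter updates it cost.
    static int countBookkeeping(Supplier<Boolean> body) {
        bookkeepingCalls = 0;
        if (!body.get()) {
            throw new IllegalStateException("callee was not inside the cflow");
        }
        return bookkeepingCalls;
    }

    public static void main(String[] args) {
        System.out.println("broad:  " + countBookkeeping(CflowCounterSketch::broadStyle));
        System.out.println("narrow: " + countBookkeeping(CflowCounterSketch::narrowStyle));
    }
}
```

In this model the broad style pays for three counter updates and the narrow style for one, while the callee sees itself inside the cflow either way. In a real generated class the gap would be larger, since there are many more non-execution join points.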
AspectJ weaving is built on recognizing patterns, specifically the bytecode patterns typically produced by compilers. When it needs to weave a particular instruction, it recognizes that instruction and the surrounding ones and knows what to do. If something else is generating the code, the pattern may be odd and cause problems in the weaving process. Here, for example, cglib is generating the bytecode; if cglib isn't producing quite what a compiler would produce to achieve the same thing, it can cause problems like this.
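For anyone unsure what an inconsistent stack height means: the JVM requires that every instruction is reached with the same operand stack depth, whichever path leads to it. Below is a minimal sketch of that check over a toy instruction list. The Insn type, the instruction names and the two sample sequences are all invented for illustration; this is neither the real verifier nor anything AspectJ uses, and it does not read actual class files.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class StackHeightSketch {
    // Toy instruction: net stack effect, optional branch target (-1 = none),
    // and whether control can continue to the next instruction.
    static final class Insn {
        final String name;
        final int delta;
        final int branchTarget;
        final boolean fallsThrough;

        Insn(String name, int delta, int branchTarget, boolean fallsThrough) {
            this.name = name;
            this.delta = delta;
            this.branchTarget = branchTarget;
            this.fallsThrough = fallsThrough;
        }
    }

    // Returns the index of the first instruction reached with two different
    // stack heights, or -1 if every path agrees.
    static int firstInconsistency(Insn[] code) {
        if (code.length == 0) {
            return -1;
        }
        int[] height = new int[code.length];
        Arrays.fill(height, -1);              // -1 marks "not visited yet"
        Deque<Integer> work = new ArrayDeque<>();
        height[0] = 0;
        work.push(0);
        while (!work.isEmpty()) {
            int pc = work.pop();
            int after = height[pc] + code[pc].delta;
            int[] successors = { code[pc].fallsThrough ? pc + 1 : -1, code[pc].branchTarget };
            for (int next : successors) {
                if (next < 0 || next >= code.length) {
                    continue;
                }
                if (height[next] == -1) {
                    height[next] = after;     // first arrival: record the height
                    work.push(next);
                } else if (height[next] != after) {
                    return next;              // second arrival disagrees
                }
            }
        }
        return -1;
    }

    // Both branches push one value before meeting at index 5.
    static Insn[] consistentExample() {
        return new Insn[] {
            new Insn("iload", +1, -1, true),
            new Insn("ifeq", -1, 4, true),
            new Insn("iconst_1", +1, -1, true),
            new Insn("goto", 0, 5, false),
            new Insn("iconst_0", +1, -1, true),
            new Insn("ireturn", -1, -1, false),
        };
    }

    // Same shape, but one branch forgets to push before the merge at index 5.
    static Insn[] inconsistentExample() {
        Insn[] code = consistentExample();
        code[4] = new Insn("nop", 0, -1, true);
        return code;
    }

    public static void main(String[] args) {
        System.out.println("consistent:   " + firstInconsistency(consistentExample()));
        System.out.println("inconsistent: " + firstInconsistency(inconsistentExample()));
    }
}
```

The first sequence reports -1 (no problem) and the second reports index 5, the merge point where one path arrives with an empty stack and the other with one value on it. In a large woven method with many branches, finding which path got out of step is the time-consuming part.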
Andy