Community
Participate
Working Groups
The vararg function objc_msgSend() is called incorrectly from os.c in macOS SWT. In os.c, it is invoked by casting it to a specific function signature before calling it, for example: rc = (jintLong)((jintLong (*)(jintLong, jintLong, jintLong, jintLong, jintLong))objc_msgSend)(arg0, arg1, arg2, arg3, arg4); However, objc_msgSend is a vararg function and the calling convention between a vararg function and a non-vararg function is not guaranteed to be the same. In particular, on x86_64 it is different. From the ABI spec (https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf): "For calls that may call functions that use varargs or stdargs (prototype-less calls or calls to functions containing ellipsis (. . . ) in the declaration) %al16 is used as hidden argument to specify the number of vector registers used. The contents of %al do not need to match exactly the number of registers, but must be an upper bound on the number of vector registers used and is in the range 0–8 inclusive." So, the above specifies that for vararg function calls, al (which is part of eax and part of thus eax and thus rax) have a meaning. This same requirement does not exist for non-vararg calls. This can be seen by compiling these two functions with GCC: jintLong func1(void) { jintLong a = 1, b = 2, c = 3, d = 4, e = 5; return (jintLong)objc_msgSend(a, b, c, d, e); } jintLong func2(void) { jintLong a = 1, b = 2, c = 3, d = 4, e = 5; return (jintLong)((jintLong (*)(jintLong, jintLong, jintLong, jintLong, jintLong))objc_msgSend)(a, b, c, d, e); } If you look at their disassembly, you can see that it is different: Dump of assembler code for function func1: 0x00000001006f72f0 <func1+0>: push %rbp 0x00000001006f72f1 <func1+1>: mov %rsp,%rbp 0x00000001006f72f4 <func1+4>: mov $0x5,%r8d 0x00000001006f72fa <func1+10>: mov $0x4,%ecx 0x00000001006f72ff <func1+15>: mov $0x3,%edx 0x00000001006f7304 <func1+20>: mov $0x2,%esi 0x00000001006f7309 <func1+25>: mov $0x1,%edi 0x00000001006f730e <func1+30>: xor %eax,%eax 0x00000001006f7310 <func1+32>: leaveq 0x00000001006f7311 <func1+33>: jmpq 0x100791dc8 <dyld_stub_objc_msgSend> End of assembler dump. Dump of assembler code for function func2: 0x00000001006f72cc <func2+0>: push %rbp 0x00000001006f72cd <func2+1>: mov %rsp,%rbp 0x00000001006f72d0 <func2+4>: mov $0x5,%r8d 0x00000001006f72d6 <func2+10>: mov $0x4,%ecx 0x00000001006f72db <func2+15>: mov $0x3,%edx 0x00000001006f72e0 <func2+20>: mov $0x2,%esi 0x00000001006f72e5 <func2+25>: mov $0x1,%edi 0x00000001006f72ea <func2+30>: leaveq 0x00000001006f72eb <func2+31>: jmpq 0x100791dc8 <dyld_stub_objc_msgSend> End of assembler dump. The difference is the extra "xor %eax,%eax" call in func1. This is to clear the "al" register to specify that vector registers aren't being used. Without it, whatever value is left over in that register would be in effect, which can cause a crash. The fix is to change os.c to not cast the objc_msgSend function and just call it normally, as is done in func1.
Is anybody looking at this? Calling conventions for C vararg functions in general may be different from non-vararg functions. That has "always" been that way; IIRC at least since ANSI C 89. Calling a vararg function through a non-vararg prototype has always been wrong. It may or may not work; the standard calls this "undefined behavior". See [1], p. 82: "If a function that accepts a variable number of arguments is defined without a parameter type list that ends with the ellipsis notation, the behavior is undefined." I wonder if this might be the cause of various strange bugs that have been reported over the years with objc_msgSend and objc_msgSendSuper calls (typically causing Eclipse to hang), such as bug 485337, bug 476140, bug 483379, bug 511901, bug 418021, bug 486875, bug 443949, and possibly others. [1] https://www.pdf-archive.com/2014/10/02/ansi-iso-9899-1990-1/ansi-iso-9899-1990-1.pdf
Thomas, if you have a Mac, feel free to contribute a patch. There aren't so many SWT developers with Macs.
(In reply to Andrey Loskutov from comment #2) > Thomas, if you have a Mac, feel free to contribute a patch. There aren't so > many SWT developers with Macs. Hmph. OK, I have a working patch to MacGenerator and NativesGenerator that generate these files (the Cocoa OS.java and os.c). The correct solution is to leave the casts in because objc_msgSend_stret and objc_msgSendSuper_stret must be casted in any case. However, I changed the cast to a varargs prototype as in ((jintLong (*)(jintLong, jintLong, ...)) objc_msgSend)(a, b, c, d, e) That also produces the correct code including clearing %al, and does so also for the *_stret calls. Now, can someone tell me how the SWT Jenkins builds work? What do I have to push to Gerrit to have this work? Do I have to push the changed generator and the new generated OS.java and os.c? Or only the generator? And do I have to also push the new generated binary libraries? And what needs to pushed together or what comes after what?
Also, job eclipse.platform.swt-Gerrit seems to run on a CentOS node. How and where would these OS X-specific changes be built and be automatically tested at all? Which job builds the OS X jnilibs?
Hi Thomas, Thanks for working on this! (In reply to Thomas Wolf from comment #3) > > Now, can someone tell me how the SWT Jenkins builds work? What do I have to > push to Gerrit to have this work? Do I have to push the changed generator > and the new generated OS.java and os.c? Or only the generator? And do I have > to also push the new generated binary libraries? And what needs to pushed > together or what comes after what? Please push the changes to the Generator and the generated OS.java and os.c files to gerrit in the same gerrit change. You *don't* have to push the newly built binaries. They'll be built by the SWT build jobs (https://hudson.eclipse.org/releng/view/SWT%20Natives/) as part of the I-build. Specifically, https://hudson.eclipse.org/releng/view/SWT%20Natives/job/cocoa_x86_64/ builds the Cocoa native libraries. (In reply to Thomas Wolf from comment #4) > Also, job eclipse.platform.swt-Gerrit seems to run on a CentOS node. How and > where would these OS X-specific changes be built and be automatically tested > at all? Which job builds the OS X jnilibs? Unfortunately, the Mac (and Windows) changes are not built or automatically tested as part of the Gerrit jobs. They'll be tested as part of the I-build after the changes are merged. Please run the tests locally to make sure they pass.
New Gerrit change created: https://git.eclipse.org/r/136214
BTW it's not just about clearing %al, but also about setting %al correctly when values _are_ passed in vector registers. (All objc_msgSend* calls that have NSPoint or NSRect arguments; in the C code the *_JJD* variants.) For those %al is not set unless a varargs prototype is used. With a varags prototype, %al is set correctly now.
Gerrit change https://git.eclipse.org/r/136214 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=94bbab9593afe86192ce9ff1c7d23f9003c3ef5e
Thanks for the patch, Thomas! Marking as fixed.
Thanks Alexei for reporting this!
Verified in I20190218-1800.
I'm sorry, but this change has to be reverted. It's based on an incorrect premise that functions of objc_msgSend family are variadic. They are NOT. objc_msgSend perfectly forwards function arguments for arbitrary signatures. Its formal C declaration is not meant to be used directly. Instead, it should always be cast to whatever the target method signature is, exactly like the old code did before this change. See the official documentation, section "Cast Objective-C Messages in Proper Form": https://developer.apple.com/documentation/uikit/core_app/updating_your_app_from_32-bit_to_64-bit_architecture/managing_functions_and_function_pointers?language=objc Also, if you look at compiler generated msgSend calls, you'll see that they are not variadic (unless the target method itself is variadic).
(In reply to Nikita Nemkin from comment #12) You need to file a new bug as 4.11 is done and we don't want to change history.
Re: comment 12 I have observed that if objc_msgSend is called with a garbage value in eax (i.e. left over from some other code), it can indeed crash on x86_64, due to presumably using that value for its vararg meaning. The vararg calling convention does clear eax which solves the problem (whereas the casted version doesn't - see examples in the original issue). So not sure about the linked docs - what I see in practice is this does matter. Although this may not happen with OpenJDK and those specific JNI calls because it seems OpenJDK might actually clear eax before invoking JNI and the os.c JNI functions don't currently clobber eax. But eax being cleared when invoking JNI is not guaranteed to be the case - for example Avian jvm does not clear it.
(In reply to Alexei Svitkine from comment #14) > Re: comment 12 Can you demonstrate these crashes you're seing? What you're describing is a total breach of ABI rules and is very hard to believe. SWT worked just fine before this change. ObjC compiler doesn't initialize rax when calling non-variadic methods: https://godbolt.org/z/PRx1M5, I see no reason why SWT should do so.
"Can you demonstrate these crashes you're seing?" I can try to get a repro. It's in the context of a bigger project, so will try to isolate. "What you're describing is a total breach of ABI rules and is very hard to believe." Actually the ABI states that variadic functions should be setting eax according to the number of vector registers being passed. The discussion is around whether objc_msgSend() actually follows the ABI per its C signature. "SWT worked just fine before this change." I mentioned I was seeing the issues with Avian VM, not with other VMs (presumably because their JNI calling impls might already have been clearing eax?). I assume you've not actually tested SWT with Avian on x86_64, right? So this statement is not useful. "ObjC compiler doesn't initialize rax when calling non-variadic methods: https://godbolt.org/z/PRx1M5, I see no reason why SWT should do so." Thanks for the example, it does seem there's some inconsistency with the results I was seeing here. Let me see if I can get a small repro for what I was seeing. Perhaps its dependent on the signature of what's being called.
In the context of the project where I originally encountered the issue, I can longer reproduce the crashes I was seeing that were fixed when I modified my local copy of SWT's os.c to use variadic calling conventions. However, the current build is using a newer version of Avian VM than what was being used when I was seeing those issues. So maybe something was changed in that VM that is responsible for the discrepancies. So at this point, given that I can't repro the problems anymore and the evidence that Nikita provided that casting to a non-variadic signature is actually correct, it does seem like this should be reverted back. Sorry for the trouble. :(
(In reply to Alexei Svitkine from comment #16) Wasn't Avian abandoned a long time ago? To be honest, I didn't expect Eclipse to work on Avian at all.
(In reply to Nikita Nemkin from comment #18) > (In reply to Alexei Svitkine from comment #16) > > Wasn't Avian abandoned a long time ago? To be honest, I didn't expect > Eclipse to work on Avian at all. To clarify, this was for SWT being used as a library in a different project, not all of Eclipse. It works just fine. :) It's nice to ship a Java-based app with 2MB of runtime instead of 200MB. :)
Actually, looks like I spoke too soon. Had some user error in my testing and am actually seeing the crashes from before after reverting my manual os.c changes. Let's see if I could get a repro.
Okay, so looks like the offending call was +[NSString stringWithFormat:], which was being called via JNI objc_1msgSend__JJJJ on x86_64. So the issue is more subtle than the original report. The original code was correct for most calls - but variadic calls themselves (like stringWithFormat) were being handled incorrectly. So the right fix would be to revert the general case changes back to non-variadic calls, but introduce variadic calls for the cases that need them like stringWithFormat:.
@Nikita, https://git.eclipse.org/r/#/c/140108/ - only reverts the changes in this bug. But, according to comment#21, another fix is required for completion. Do you agree? Do you plan to provide the fix?
(In reply to Lakshmi Shanmugam from comment #22) > @Nikita, > https://git.eclipse.org/r/#/c/140108/ - only reverts the changes in this > bug. > But, according to comment#21, another fix is required for completion. Do you > agree? Do you plan to provide the fix? I plan to remove all NSString.stringWithFormat calls from SWT. There are only a few (in Display code). This should fix Alexei's crashes on Avian. I don't plan to add variadic support to JNIGen.
(In reply to Nikita Nemkin from comment #23) > (In reply to Lakshmi Shanmugam from comment #22) > > @Nikita, > > https://git.eclipse.org/r/#/c/140108/ - only reverts the changes in this > > bug. > > But, according to comment#21, another fix is required for completion. Do you > > agree? Do you plan to provide the fix? > > I plan to remove all NSString.stringWithFormat calls from SWT. There are > only a few (in Display code). This should fix Alexei's crashes on Avian. > > I don't plan to add variadic support to JNIGen. Can you please provide a patch for removing NSString.stringWithFormat? I think we should first fix this before reverting the patch. This would only fix the crash caused by NSString.stringWithFormat. I've not checked if there are other similar functions in use in SWT, but if there are any, then they would crash too after reverting the patch, right?
(In reply to Alexei Svitkine from comment #21) Alexei, I figured out why you only get crashes on Avian. The code using stringWithFormat (Display.createMainMenu) never runs on Oracle Java and derivatives. It only runs when both the JVM and the app bundle lack DefaultApp.nib file, which SWT expects to provide the default menu. (In reply to Lakshmi Shanmugam from comment #24) SWT declares a few variadic ObjC functions, but never actually uses them. I don't expect any new crashes to surface. After all, Cocoa port did fine for 12 years without the "fix" that's being reverted.
New Gerrit change created: https://git.eclipse.org/r/140652
(In reply to Nikita Nemkin from comment #25) > (In reply to Lakshmi Shanmugam from comment #24) > > SWT declares a few variadic ObjC functions, but never actually uses them. I > don't expect any new crashes to surface. After all, Cocoa port did fine for > 12 years without the "fix" that's being reverted. Thanks for the patch. I didn't say or imply that new crashes would surface on reverting the patch. The patch was created in the first place as a possible fix for the crashes/hang issues seen in Cocoa port in the past (see comment#1 for details). So, my question was to clarify whether calls to other variadic functions would crash after the patch was reverted. Since the patch removes the variadic functions in use, I think there is no problem (atleast for now).
(In reply to Nikita Nemkin from comment #23) > (In reply to Lakshmi Shanmugam from comment #22) > > @Nikita, > > https://git.eclipse.org/r/#/c/140108/ - only reverts the changes in this > > bug. > > But, according to comment#21, another fix is required for completion. Do you > > agree? Do you plan to provide the fix? > > I plan to remove all NSString.stringWithFormat calls from SWT. There are > only a few (in Display code). This should fix Alexei's crashes on Avian. > Opened Bug 546585 to track this. (In reply to Eclipse Genie from comment #26) > New Gerrit change created: https://git.eclipse.org/r/140652 Moved the above gerrit to Bug 546585.
(In reply to Alexei Svitkine from comment #21) > Okay, so looks like the offending call was +[NSString stringWithFormat:], > which was being called via JNI objc_1msgSend__JJJJ on x86_64. > > So the issue is more subtle than the original report. The original code was > correct for most calls - but variadic calls themselves (like > stringWithFormat) were being handled incorrectly. > > So the right fix would be to revert the general case changes back to > non-variadic calls, but introduce variadic calls for the cases that need > them like stringWithFormat:. Opened Bug 546586 to fix the JNI Generator.
Nikita's patch [1] will revert the initial changes for this bug. https://git.eclipse.org/r/140652 already removed the variadic function calls in SWT code, so reverting the initial changes should not cause any problems. I plan to push this ([1]) today as there were no other concerns/objections. [1] - https://git.eclipse.org/r/#/c/140108/
(In reply to Lakshmi Shanmugam from comment #30) > Nikita's patch [1] will revert the initial changes for this bug. > https://git.eclipse.org/r/140652 already removed the variadic function calls > in SWT code, so reverting the initial changes should not cause any problems. > > I plan to push this ([1]) today as there were no other concerns/objections. > > [1] - https://git.eclipse.org/r/#/c/140108/ Merged the revert patch - https://git.eclipse.org/r/#/c/140108/ to master. Revert is tracked by Bug 546144.