This defect comes from our Hebrew tester. We embed the JDT Java editor into another editor, but this defect appears to happen in the plain Java editor in 3.0.x. If it is fixed in 3.1 please let us know. Here is the text of his report:
---
Enter the following definition. Capital letters stand for BiDi chars.
    java.lang.String INPUT
    java.lang.String OUTPUT
    INPUT = "HELLO";
    OUTPUT = "WORLD";
    if INPUT != OUTPUT then INPUT = "HELLO world";
Result: The actual display will be as follows:
    java.lang.String INPUT
    java.lang.String OUTPUT
    HELLO" = INPUT";
    WORLD" = OUTPUT";
    if OUTPUT =! INPUT then HELLO" = INPUT world";
If you type the same text in Eclipse Java Code Editor you get correct display.
---
We fixed one of the bugs above and got this second report:
---
I still see
    if OUTPUT =! INPUT
instead of
    if INPUT != OUTPUT
In addition please implement some basic function accepting one String and one integer argument, abc (String, Int), then call it using a Latin and a BiDi variable:
    abc (latin,5);
    abc (BIDI,4);
---
We determined that this is the behaviour of the JDT editor in 3.0.2. If you need more information please let me know and we'll try to get it for you.
Please provide more info. The comment says: " If you type the same text in Eclipse Java Code Editor you get correct display." What should be fixed then?
See the second section, where we fixed the bug in our embedded editor. He then describes a remaining problem, which appears in both our embedded editor and in the JDT editor.
We currently only support BIDI for strings. The Java editor itself is LTR for all languages.
This is the "Hebrew tester" speaking, who originally opened this defect against WebSphere Integration Developer 6.0.0.0 in tordadt CMVC (46189,46231). My name is Tomer Mahlin. I am leading a development group working on BiDi enablement in WebSphere Adapters, WebSphere Process Server and WebSphere Integration Developer (WID). As part of this project we are testing WID with BiDi data. You can contact me via email at tomerm@il.ibm.com if any additional clarification is required. To clarify the 2 issues described above:
1. In the Eclipse editor and in WID, the following text
    TUPNI != TUPTUO
   appears in the wrong order:
    TUPTUO =! TUPNI
2. Create a sample function accepting 2 arguments. The first one is of type String, the second one is of type integer. Put into the code a call to this function, once using a variable with a Latin name and once using a variable with a BiDi name, e.g.
    sample_function(latin_var,3);
    sample_function(BIDI_VAR,4); // capital letters stand for BiDi chars
   Result: The call using the BiDi argument is incorrectly displayed:
    sample_function(4,BIDI_VAR);
   In other words, the order of the arguments supplied to the function is switched.
This is the normal behavior of each text widget. Steps to reproduce:
0. select Hebrew as input language and use your real keyboard layout
1. open empty file in text editor
2. type "if ("
3. switch to Hebrew with Hebrew keyboard layout
4. type ללל
5. switch to Hebrew with your real keyboard layout
6. type " != "
7. type עעע
==> it gets reversed: if (עעע != ללל)
Why do you think this should be different in the Java editor?
This is not what happens in WID 6.0 based on Eclipse 3.0. Capital chars stand for BiDi chars.
Type: if (INPUT != OUTPUT)
You expect to see: if (TUPNI != TUPTUO)
But what you see is: if (TUPTUO =! TUPNI)
The problem is not with the BiDi text itself; each BiDi word is reversed, as it should be. The problem is with the relative order of the terms in the logical expression. The expected order is the same as in the Latin case. The order I get is different.
Should the logical expressions only be treated like this in the Java editor or is the Latin case order expected in all text widget in Hebrew?
Daniel, You ask a complex question. First of all, let me mention that no matter what, each single BiDi word should be correctly transformed before it is displayed. So for example let us look at the word HELLO (capital chars stand for BiDi chars). If you look at the Java String buffer holding the word you will see that the order in which it appears in the buffer is the same as the order in which it was typed. In other words, the first cell of the buffer will hold the letter H. However, when you display the word the order is "reversed", so we see OLLEH.
When you take as an example a sentence having both BiDi and Latin words, the order is not trivial. Even if each BiDi word is "reversed", the relative order of Latin and BiDi words depends on the type of BiDi transformation applied to the buffer. In a Hebrew sentence not only is the order of letters in each word right to left, but the order of words is right to left as well. However, if you take a look at an English sentence with embedded BiDi words, the direction might change depending on the context inside the sentence.
Now, back to logical expressions. Logical expressions and mathematical formulas have a general left-to-right orientation, exactly like in English. This applies to logical expressions and function calls used in a programming language like Java. However, again, the order of letters inside each BiDi word is "reversed". In addition, if we use constants with mixed text, the rules for a usual sentence apply inside the constant; however, for the general expression in which the constant is used, the direction is still left to right, just like in the English case.
We can continue this discussion via regular mail notes. Tomer.
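The effect described above (the buffer keeps typing order, while the operator between two BiDi words is pulled into their right-to-left run) can be observed with the JDK's own `java.text.Bidi` class. The following is a minimal sketch for illustration only, not code from the JDT or SWT. The class name `BidiLevelDemo`, the helper `levelAt`, and the sample line are hypothetical, and two Hebrew letters stand in for each of the thread's capital-letter identifiers.

```java
import java.text.Bidi;

final class BidiLevelDemo {
    // Hypothetical stand-ins for the thread's INPUT and OUTPUT identifiers.
    static final String INPUT = "\u05D0\u05D1";   // two Hebrew letters
    static final String OUTPUT = "\u05D2\u05D3";  // two more Hebrew letters

    /**
     * Resolved embedding level of the character at the given index, with the
     * whole line analysed as one left-to-right paragraph. Even levels are
     * laid out left to right, odd levels right to left.
     */
    static int levelAt(String line, int index) {
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_LEFT_TO_RIGHT);
        return bidi.getLevelAt(index);
    }

    public static void main(String[] args) {
        // Logical (typing) order: if INPUT != OUTPUT
        String line = "if " + INPUT + " != " + OUTPUT;
        int op = line.indexOf('!');
        System.out.println("level of 'i' in \"if\": " + levelAt(line, 0));
        System.out.println("level of '!': " + levelAt(line, op));
    }
}
```

The operator consists only of neutral characters sitting between two right-to-left words, so the bidi algorithm should give it an odd level. It is then reordered together with both identifiers, which is the `OUTPUT =! INPUT` display reported in this bug.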
I think you did not answer my question from comment 7 ;-) Let's rephrase it: you would expect the same behavior if typing the stuff into the normal text editor, is that correct? >We can continue this discussion via regular mail notes. We should not discuss bug reports in a private room unless of course you want to make some comment that can't be shared here.
Daniel, Yes, I do expect to see the same in the text editor. However keep in mind that as of now different text editors exhibit different behavior. For example, Notepad does not keep the correct order while WordPad does. Since as you know in the software world the final decision is not always based on the technical considerations, it is not always certain what should be expected. Bottom line, from the technical and customer perspective the answer is yes. I wanted to provide you more information on how similar cases are handled or suggested to be handled in WID. However this information is IBM confidential and thus I suggested to use regular mail. Please feel free to contact me at tomerm@il.ibm.com Tomer.
Moving to Platform SWT since text editors use the StyledText widget which should support this.
I believe StyledText has all the support the application needs to implement/fix this request. Two options:
1. The application should add bidi control characters to control the reordering of text.
2. Use the bidi segments listener of StyledText, see the example in BidiSegmentEvent.
/**
 * This event is sent to BidiSegmentListeners when a line is to
 * be measured or rendered in a bidi locale. The segments field is
 * used to specify text ranges in the line that should be treated as
 * separate segments for bidi reordering. Each segment will be reordered
 * and rendered separately.
 * <p>
 * The elements in the segments field specify the start offset of
 * a segment relative to the start of the line. They must follow
 * the following rules:
 * <ul>
 * <li>first element must be 0
 * <li>elements must be in ascending order and must not have duplicates
 * <li>elements must not exceed the line length
 * </ul>
 * In addition, the last element may be set to the end of the line
 * but this is not required.
 *
 * The segments field may be left null if the entire line should
 * be reordered as is.
 * </p>
 * A BidiSegmentListener may be used when adjacent segments of
 * right-to-left text should not be reordered relative to each other.
 * For example, within a Java editor, you may wish multiple
 * right-to-left string literals to be reordered differently than the
 * bidi algorithm specifies.
 *
 * Example:
 * <pre>
 * stored line = "R1R2R3" + "R4R5R6"
 *   R1 to R6 are right-to-left characters. The quotation marks
 *   are part of the line text. The line is 13 characters long.
 *
 * segments = null:
 *   entire line will be reordered and thus the two R2L segments
 *   swapped (as per the bidi algorithm).
 *   visual line (rendered on screen) = "R6R5R4" + "R3R2R1"
 *
 * segments = [0, 5, 8]
 *   "R1R2R3" will be reordered, followed by [blank]+[blank] and
 *   "R4R5R6".
 *   visual line = "R3R2R1" + "R6R5R4"
 * </pre>
 */
Tomer, can you use the above to fix the problem you are having?
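To make the segment rules in the Javadoc concrete, here is a minimal sketch of how the `[0, 5, 8]` array from its example could be computed for a line containing string literals. It uses only the JDK so that it can be run without SWT. The class `LiteralSegments` and the method `segmentsFor` are hypothetical names, not part of SWT or the JDT, and the quote scanning is deliberately naive (no char literals, no comments). In a real listener the resulting array would presumably be assigned to the `segments` field of the `BidiSegmentEvent`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LiteralSegments {
    /**
     * Returns segment start offsets for one line: 0, then the offset of each
     * opening quote and the offset just past each closing quote. Offsets are
     * ascending, free of duplicates, and below the line length, as the
     * BidiSegmentEvent Javadoc requires.
     */
    static int[] segmentsFor(String line) {
        List<Integer> offsets = new ArrayList<>();
        offsets.add(0); // the first element must be 0
        boolean inLiteral = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inLiteral && c == '\\') {
                i++; // skip the escaped character, e.g. \" inside a literal
                continue;
            }
            if (c != '"') {
                continue;
            }
            // An opening quote starts a segment at i; a closing quote ends
            // the literal's segment, so the next one starts at i + 1.
            int boundary = inLiteral ? i + 1 : i;
            inLiteral = !inLiteral;
            if (boundary > offsets.get(offsets.size() - 1) && boundary < line.length()) {
                offsets.add(boundary);
            }
        }
        return offsets.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Same shape as the Javadoc example: two 5-character literals joined by " + ".
        System.out.println(Arrays.toString(segmentsFor("\"ABC\" + \"DEF\"")));
    }
}
```

For the 13-character sample line this yields the offsets 0, 5 and 8, matching the Javadoc example: each literal is reordered on its own and the two literals keep their left-to-right order.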
Felipe, I can not do that since WID (WebSphere Integration Developer) code is not available to me. My team is denied the access and so we can only test. I think this is suggestion James should try out. Tomer.
Tomer, We don't own the Java editor code. If I understand this thread correctly, there is an observation that the JDT editor behaves differently than a regular Eclipse text editor. Felipe proposes that existing APIs can be called to change this behaviour, so they should be called by the JDT editor, not by us. We bundle the JDT editor as is, so if it does not behave correctly in a BiDi environment, then this defect should be transferred back to them and they should address it.
>If i understand this thread correctly, there >is an observation that the JDT editor behaves differently than a regular eclipse >text editor. I interpreted Tomer's comment 8 differently, namely - he expects this to work in all editors / text widgets - currently it doesn't work in both the Java and the text editor. Tomer?
Created attachment 26892 [details] BiDI text sample in the Eclipse Java editor BiDi text is incorrectly reordered.
Created attachment 26893 [details] BiDI text sample in the WID text editor BiDi text is incorrectly ordered.
Can you please also try the standard/default text editor?
or is this the WID editor?
Daniel is correct. In the context of the 2 samples (if condition and function call) both editors (JDT used in WID and Eclipse Java editor) behave the same way. The display is incorrect, as shown in the screen captures I attached to this defect. The examples discussed in this defect are only samples. There might be many more. I assume that in other cases the behavior of JDT and Eclipse might indeed be different. However, in this specific case the behavior is the same.
>both editors (JDT used in WID and Eclipse Java editor) behave the same >way That's as expected since they embedded the Java editor. My question is: how does the normal/default text editor behave? Can you please test that for us.
This was partially fixed in bug 92105. As I understand it, the fix there was to make string literals separate Bidi segments in the Java editor, and Tomer is requesting that identifiers should also be separate Bidi segments. In the Java editor in 3.1, the source line
    OUTPUT = "SHALOM";
is displayed as expected:
    TUPTUO = "MOLAHS";
but the source line
    OUTPUT = INPUT;
is not segmented, and is displayed:
    TUPNI = TUPTUO;
Created attachment 26897 [details] BiDI text sample in the Text Editor
The normal/default text editor behaves exactly the same way. Please see the new attachment.
According to comment 22 this problem belongs to Text. StyledText offers the API to do what is being requested.
Personally I think having BIDI identifiers in java code is not correct.
>Personally I think having BIDI identifiers in java code is not correct. I would also think so, but I'm not an expert here. The swapping of method arguments looks wrong though. There has been a lot of discussion in this bug, so let me try to summarize:
1. The main problem is still the swapping of arguments when calling a method.
2. The Text editor and the Java editor currently both incorrectly swap the arguments.
Correct? The main question now is: do you want only the Java editor to get fixed, or also the Text editor? We might be able to do something for the Java editor but won't touch the Text editor, which should reflect what StyledText gives us out of the box.
This is in answer to comment #26 where Felipe Heidrich states: "Personally I think having BIDI identifiers in java code is not correct." First of all, the Java language allows identifiers to be written using most Unicode characters, including Arabic or Hebrew letters, so it is inelegant to restrict users from using valid syntactical features just because our editors are not up to the task. Besides, many modern tools ("visual" designers of various kinds) generate Java code based on business data entered by users. Such data will often include entity names in the user's native language, which may be Arabic or Hebrew. These entity names often appear in the generated code as variable or method names (or part of those), and we cannot deprecate such use without harming considerably the user-friendliness of the tools.
This is in answer to comment #27 and the question asked by Daniel Megert. My opinion is that any editor which is not designed to handle specifically Java code may use the general Bidi algorithm. However, editors specializing in Java code should display that code in a meaningful way even if the code contains Bidi letters in any syntactical construct where Java syntax allows it. As Tomer Mahlin indicated in comment #8, this means that the Bidi algorithm should be applied to each token separately, but tokens must be laid out on the line from left to right. What I call "token" means any of keyword, identifier, operator, punctuation, literal, comment (and whatever else I may have forgotten, but I hope that the intention is clear).
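The layout rule requested here (reorder inside each token, but place the tokens left to right) can be simulated with the thread's own convention that capital letters stand for BiDi characters. The sketch below is purely illustrative and assumes that convention; `TokenLayoutSketch` and `expectedDisplay` are hypothetical names, and string literals and comments, which should keep ordinary bidi treatment, are not handled.

```java
final class TokenLayoutSketch {
    /** Placeholder BiDi characters in this thread: A-Z, plus '_' joining them. */
    static boolean isPlaceholderRtl(char c) {
        return (c >= 'A' && c <= 'Z') || c == '_';
    }

    /**
     * Maps a line in logical (typing) order to the display the reporters
     * expect: every run of placeholder BiDi characters is reversed in place,
     * and everything else keeps its left-to-right position.
     */
    static String expectedDisplay(String logical) {
        StringBuilder visual = new StringBuilder();
        int i = 0;
        while (i < logical.length()) {
            int start = i;
            while (i < logical.length() && isPlaceholderRtl(logical.charAt(i))) {
                i++;
            }
            if (i > start) {
                // A BiDi token: reverse its letters but keep its slot in the line.
                visual.append(new StringBuilder(logical.substring(start, i)).reverse());
            } else {
                // Latin letters, operators, punctuation, digits: copy unchanged.
                visual.append(logical.charAt(i));
                i++;
            }
        }
        return visual.toString();
    }

    public static void main(String[] args) {
        System.out.println(expectedDisplay("if (INPUT != OUTPUT)"));
        System.out.println(expectedDisplay("sample_function(BIDI_VAR,4);"));
        System.out.println(expectedDisplay("private CLASSTYPE[] VARNAME;"));
    }
}
```

The first line comes out as `if (TUPNI != TUPTUO)`, the display asked for earlier in this thread. The operator and the argument order stay where the Latin case would put them.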
Can someone attach a test file here? That would be easier than the screen shots. Let's see whether we can make this right for 3.4.
Created attachment 73557 [details] Sample including 2 BiDi classes Hello Daniel, I am attaching a zip with an exported sample Java project. It includes 2 BiDi classes with BiDi names in the IBM.COM package (capital characters stand for BiDi characters, of course). The Java code, which includes problematic samples for the cases mentioned in this defect and some additional cases, should include the following comment at the top:
/* Display was verified on July 11, 2007 by Tomer Mahlin (tomerm@il.ibm.com)
 * using Eclipse 3.3.0 Build id: I20070601-1539
 * */
The samples provided in this file should by no means be considered comprehensive coverage of the possible problematic cases. However, I think they are representative enough. All samples are UTF-8 encoded. Please let me know if you would like me to provide snapshots of the expected (as opposed to the current, incorrect) display. Tomer.
> Please let me know if you would like me to provide a snapshots of expected >(as opposed to current incorrect) display. That would be perfect.
Created attachment 73630 [details] This is how the BiDi code looks like in Eclipse 3.3 now I am actually attaching 3 images. 1. BiDi code display in Eclipse 3.3 (current situation) 2. The same BiDi code display in Notepad (the worst case) 3. How BiDi code should look like (the best case)
Created attachment 73631 [details] This is how the same BiDi code looks like in Notepad Please notice that I attach this image for illustrative purposes only. There is no expectation of Notepad or any ->general<- Eclipse text editor to display Java code with BiDi characters in correct layout. Correct display of Java code with BiDi characters is expected only from Java editor.
Created attachment 73632 [details] Finally, this is how BiDI code should look like This is the expected display. It was forcefully created using introduction of LRM invisible Unicode characters into the code text itself (this results in compilation errors since LRM should be introduced not into the text buffer but for display purposes only).
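The LRM technique mentioned above can be sketched as a transformation applied to a display copy of a line, so that the text buffer handed to the compiler stays untouched. This is a minimal sketch under stated assumptions, not the fix that went into the editor. `BidiDisplayMarks` and `withDisplayMarks` are hypothetical names, a mark is inserted after every run of right-to-left letters, and identifiers that mix RTL letters with `_` or digits, string literals, and comments are not treated specially.

```java
import java.text.Bidi;

final class BidiDisplayMarks {
    private static final char LRM = '\u200E'; // LEFT-TO-RIGHT MARK, invisible

    static boolean isRtl(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** Returns a display-only copy with an LRM after each run of RTL letters. */
    static String withDisplayMarks(String line) {
        StringBuilder out = new StringBuilder();
        boolean previousRtl = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean rtl = isRtl(c);
            if (previousRtl && !rtl) {
                out.append(LRM); // close the RTL run before the next token
            }
            out.append(c);
            previousRtl = rtl;
        }
        if (previousRtl) {
            out.append(LRM);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two Hebrew letters stand in for each BiDi identifier.
        String source = "if \u05D0\u05D1 != \u05D2\u05D3";
        String display = withDisplayMarks(source);
        Bidi bidi = new Bidi(display, Bidi.DIRECTION_LEFT_TO_RIGHT);
        System.out.println("level of '!' in display copy: "
                + bidi.getLevelAt(display.indexOf('!')));
    }
}
```

With the mark in place, the operator is bounded by a strong left-to-right character on one side, so it should resolve to an even level and stay between the two identifiers. Only `display` carries the extra characters; `source` is unchanged.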
Another example of inadequate presentation of Hebrew characters inside Java code: the declaration of array-type variables.
    type - CLASSTYPE
    variable name - VARNAME
    expected result - private CLASSTYPE[] VARNAME;
    current result - private VARNAME []CLASSTYPE;
Added a fix to HEAD. Available in builds > N20080417-2000. Tomer, I would appreciate if you could download the next N- or I-build (http://download.eclipse.org/eclipse/downloads/) and try it out. Thanks! If the fix is good we also need to apply it to the JDISourceViewer.
Created attachment 97751 [details] The same code displayed using I20080422-0800 build This is an important breakthrough in handling Bidi text in the Java code editor !!! Thank you so much for this patch !!! I still need to work on the comprehensive list of test cases (which I will publish here). At this time all patterns mentioned in this defect above are correctly displayed. The only problematic thing is the display of constants and comments. Both are considered free text and thus no special order of tokens inside comments or constants should be enforced. It looks like the blank space between words in comments and constants is treated as a separator, and thus the order of words does not conform to the UBA. For example: "HELLO WORLD" is expected to be displayed as "DLROW OLLEH" while what I see is "OLLEH DLROW". Similarly, /* HELLO WORLD */ is expected to be displayed as /* DLROW OLLEH */ while what I see is /* OLLEH DLROW */.
Thanks for the feedback Tomer. I suggest we open separate bugs with examples for additional issues not covered here.
Verified in I20080427-2000.
I am OK with opening new defects. However, please notice that before you introduced your fix the display in comments and constants was OK. Consequently I conclude that this regression (incorrect display of Bidi text in comments and constants) is a result of the latest fix you provided. This is the reason why I mentioned the issue here. I think I will start with creating the comprehensive list of test cases before I open any additional defects. I simply want to methodically verify correct display of Bidi text inside the Java code editor. Thank you VERY MUCH again for the provided fix !!!
Ah I see. So basically, I should not touch any comment (single line, Javadoc, block) and make string constant rendering like it used to be?
Yes. Exactly.
I filed bug 229226 to track this.
Hello James, As far as I am concerned this defect can be closed. If additional issues (i.e. the one reported in bug 229226) come up I will open separate defects. Thanks !
Sounds good, I don't even remember what this is about anymore. :-)