106638 – [misc] Support BiDi chs in logical expr in java editor

Bug 106638 - [misc] Support BiDi chs in logical expr in java editor

Summary: [misc] Support BiDi chs in logical expr in java editor

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Text (show other bugs)
Version:	3.0.2
Hardware:	PC Windows XP

Importance:	P3 enhancement (vote)
Target Milestone:	3.4 M7
Assignee:	JDT-Text-Inbox
QA Contact:

URL:
Whiteboard:
Keywords:	nl

Depends on:
Blocks:

Reported:	2005-08-10 11:39 EDT by James Moody
Modified:	2008-05-05 08:52 EDT (History)
CC List:	7 users (show)

See Also:

Attachments
BiDI text sample in the Eclipse Java editor (150.65 KB, image/jpeg) 2005-09-07 07:30 EDT, Tomer Mahlin	no flags	Details
BiDI text sample in the WID text editor (126.31 KB, image/jpeg) 2005-09-07 07:31 EDT, Tomer Mahlin	no flags	Details
BiDI text sample in the Text Editor (119.68 KB, image/jpeg) 2005-09-07 08:37 EDT, Tomer Mahlin	no flags	Details
Sample including 2 BiDi classes (3.94 KB, application/octet-stream) 2007-07-11 11:48 EDT, Tomer Mahlin	no flags	Details
This is how the BiDi code looks like in Eclipse 3.3 now (207.26 KB, image/jpeg) 2007-07-12 03:47 EDT, Tomer Mahlin	no flags	Details
This is how the same BiDi code looks like in Notepad (90.80 KB, image/jpeg) 2007-07-12 03:49 EDT, Tomer Mahlin	no flags	Details
Finally, this is how BiDI code should look like (187.73 KB, image/jpeg) 2007-07-12 03:54 EDT, Tomer Mahlin	no flags	Details
THe same code displayed using I20080422-0800 build (96.18 KB, image/jpeg) 2008-04-28 03:06 EDT, Tomer Mahlin	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description James Moody

2005-08-10 11:39:47 EDT

This defect comes from our Hebrew tester. We embed the JDT java editor into
another editor, but this defect appears to happen in the plain Java editor in
3.0.x. If it is fixed in 3.1 please let us know.

Here is the text of his report:

---
Enter following definition. Capital letters stand for BiDi chars

	java.lang.String INPUT
	java.lang.String OUTPUT
	
	INPUT = "HELLO";
	OUTPUT = "WORLD";

	if INPUT != OUTPUT then
		INPUT = "HELLO world";

Result: The actual display will be as follows:

	java.lang.String INPUT
	java.lang.String OUTPUT
	
	HELLO" = INPUT";
	WORLD" = OUTPUT";

	if OUTPUT =! INPUT then
		HELLO" = INPUT world";

 If you type the same text in Eclipse Java Code Editor you 
get correct display.
---

We fixed one of the bugs above and got this second report:

---
I still see 	if OUTPUT =! INPUT instead of if INPUT != OUTPUT .
In addition please implement some basic function accepting one String and one
integer argument
abc (String, Int), then call it using Latin and BiDi variable:
abc (latin,5);
abc (BIDI,4);
---

We determined that this is the behaviour of the JDT editor in 3.0.2.

If you need more information please let me know and we'll try to get it for you.

Comment 1 Dani Megert

2005-08-10 13:55:47 EDT

Please provide more info. The comment says:

" If you type the same text in Eclipse Java Code Editor you 
get correct display."

What should be fixed then?

Comment 2 James Moody

2005-08-10 14:00:28 EDT

See the second section, where we fixed the bug in our embedded editor. He then
describes a remaining problem, which appears in both our embedded editor and in
the JDT editor.

Comment 3 Dani Megert

2005-08-11 05:56:11 EDT

We currently only support BIDI for strings. The Java editor itself is LTR for
all languages.

Comment 4 Tomer Mahlin

2005-08-25 08:40:52 EDT

 This is the "Hebrew tester" speaking who originaly opened this defect against 
WebSphere Integration Developer 6.0.0.0 in tordadt CMVC (46189,46231).
 My name is Tomer Mahlin. I am leading a development group working on BiDi 
enablement in WebSphere Adpaters, WebSphere Process Server and WebSphere 
Integration Developer (WID).
   As part of this project we are testing WID with BiDi data. You can contact 
me via email tomerm@il.ibm.com if any additional clarifications are required.

  To clarify on the 2 issues described above. 
1. In the Eclipse editor and in WID following text TUPNI != TUPTUO appear in 
the wrong order: TUPTUO =! TUPNI

2. Create a sample function accepting 2 arguments. First one is of type String, 
second one is of type integer. Put into the code a call to this function using 
one time variable with Latin name and another time with BiDI name
   e.g. sample_function(latin_var,3);
        sample_function(BIDI_VAR,4); // capital letters stands for BiDi chars

Result: The call using BiDi argument is incorrectly displayed:
        sample_function(4,BIDI_OBJ); 
 In other words the order of arguments supplied to the function is switched.

Comment 5 Dani Megert

2005-08-25 09:13:56 EDT

This is the normal behavior of each text widget:

Steps to reproduce:
0. select Hebrew as input language and use your real keyboard layout
1. open empty file in text editor
2. type "if ("
3. switch to Hebrew with Hebrew keyboard layout
4. type &#1500;&#1500;&#1500;
5. switch to Hebrew with your real keyboard layout
6. type " != "
7. type &#1506;&#1506;&#1506;
==> it gets inversed
if (&#1506;&#1506;&#1506; != &#1500;&#1500;&#1500;)

Why do you think this should be different in the Java editor?

Comment 6 Tomer Mahlin

2005-08-25 09:42:37 EDT

This is not what happens in WID 6.0 based on Eclipse 3.0
Capital chars stand for BiDI chars:
  Type: if (INPUT != OUTPUT)
  You expect to see: if (TUPNI != TUPTUO)
  But what you see is: if (TUPTUO =! TUPNI)
The problem is not with BiDI text. BiDi text is inversed. The problem is 
with relative order of logical expression. The expected order is the same as 
in Latin case. The order I get is different.

Comment 7 Dani Megert

2005-08-25 09:53:16 EDT

Should the logical expressions only be treated like this in the Java editor or
is the Latin case order expected in all text widget in Hebrew?

Comment 8 Tomer Mahlin

2005-08-25 11:35:15 EDT

Daniel,

   You ask a complex question. First of all let me mention that no matter what 
each single BiDi word should be correctly transformed before it is displayed. 
So for example let us look at the word HELLO (capital chars stand for biDi 
chars) . If you look at the Java String buffer holding the word you will see 
that the order in which it appears in the buffer is the same as the order in 
which it was typed. In other words, first cell of the buffer will hold letter 
H. However when you display the word the order is "reversed", so we see OLLEH. 
When you take as an example a sentence having both BiDi and Latin words, the 
order is not trivial. So, even if each BiDi word is "reversed" the relative 
order of Latin and BiDi words depends on the type of BiDi transformation 
applied on the buffer. In Hebrew sentence not only the order of letters in each 
word is Right to left but the order of words is right to left as well. However 
if you take a look at the English sentence with embedded BiDi words the 
direction might change depending on the context inside the sentence.
  
  Now, back to logical expressions. Logical expressions and mathematical 
formulas have general Left to Riht orientation exactly like in English. This is 
applicable for logical expressions and function calls used in programming 
language like Java. However again the order of letters inside each BiDi word 
is "reversed". In addition if we use constants with mixed text the rules for 
usual sentence are applicable inside the constant, however for general 
expression in which the constant is used the direction is still Left To Right 
just like in English case.

   We can continue this discussion via regular mail notes.

Tomer.

Comment 9 Dani Megert

2005-08-25 11:54:24 EDT

I think you did not answer my question from comment 7 ;-) Let's rephrase it: you
would expect the same behavior if typing the stuff into the normal text editor,
is that correct?

>We can continue this discussion via regular mail notes.
We should not discuss bug reports in a private room unless of course you want to
make some comment that can't be shared here.

Comment 10 Tomer Mahlin

2005-08-28 04:05:54 EDT

Daniel,

  Yes, I do expect to see the same in the text editor. However keep in mind 
that as of now different text editors exhibit different behavior. For example, 
Notepad does not keep the correct order while WordPad does. Since as you know 
in the software world the final decision is not always based on the technical 
considerations, it is not always certain what should be expected.
  Bottom line, from the technical and customer perspective the answer is yes.
  I wanted to provide you more information on how similar cases are handled or 
suggested to be handled in WID. However this information is IBM confidential 
and thus I suggested to use regular mail.

  Please feel free to contact me at tomerm@il.ibm.com

Tomer.

Comment 11 Dani Megert

2005-08-29 04:28:37 EDT

Moving to Platform SWT since text editors use the StyledText widget which should
support this.

Comment 12 Felipe Heidrich

2005-09-06 11:21:16 EDT

I believe StyledText has all the support the application needs to 
implement/fix this request.
Two options:
1. The application should add bidi control characters to control the 
reordering of text.
2. Use the bidi segments listener of StyledText, see the example in 
BidiSegmentEvent.

/**
 * This event is sent to BidiSegmentListeners when a line is to
 * be measured or rendered in a bidi locale.  The segments field is 
 * used to specify text ranges in the line that should be treated as 
 * separate segments for bidi reordering.  Each segment will be reordered 
 * and rendered separately.
 * <p>
 * The elements in the segments field specify the start offset of 
 * a segment relative to the start of the line. They must follow
 * the following rules:
 * <ul>
 * <li>first element must be 0
 * <li>elements must be in ascending order and must not have duplicates
 * <li>elements must not exceed the line length
 * </ul>
 * In addition, the last element may be set to the end of the line 
 * but this is not required.
 *
 * The segments field may be left null if the entire line should 
 * be reordered as is.
 * </p>
 * A BidiSegmentListener may be used when adjacent segments of 
 * right-to-left text should not be reordered relative to each other. 
 * For example, within a Java editor, you may wish multiple 
 * right-to-left string literals to be reordered differently than the
 * bidi algorithm specifies.  
 *
 * Example:
 * <pre>
 * 	stored line = "R1R2R3" + "R4R5R6"
 *		R1 to R6 are right-to-left characters. The quotation marks
 * 		are part of the line text. The line is 13 characters long.
 * 
 * 	segments = null: 
 * 		entire line will be reordered and thus the two R2L segments 
 * 		swapped (as per the bidi algorithm). 
 *		visual line (rendered on screen) = "R6R5R4" + "R3R2R1"
 * 
 * 	segments = [0, 5, 8]	
 * 		"R1R2R3" will be reordered, followed by [blank]+[blank] and 
 * 		"R4R5R6". 
 *		visual line = "R3R2R1" + "R6R5R4"
 * </pre>
 */


Tomer, can you use the above to fix the problem you are having ?

Comment 13 Tomer Mahlin

2005-09-06 11:48:13 EDT

Felipe,

 I can not do that since WID (WebSphere Integration Developer) code is not 
available to me. My team is denied the access and so we can only test. I think 
this is suggestion James should try out.

Tomer.

Comment 14 James Moody

2005-09-06 11:51:57 EDT

Tomer,

We don't own the java editor code. If i understand this thread correctly, there
is an observation that the JDT editor behaves differently than a regular eclipse
text editor. Felipe proposes that existing APIs can be called to change this
behaviour, so they should be called by the JDT editor, not by us. We bundle the
JDT editor as is, so if it does not behave correctly in a BiDi environment, then
this defect should be transferred back to them and they should address it.

Comment 15 Dani Megert

2005-09-06 12:03:21 EDT

>If i understand this thread correctly, there
>is an observation that the JDT editor behaves differently than a regular eclipse
>text editor.

I interpreted Tomer's comment 8 differently, namely
- he expects this to work in all editors / text widgets
- currently it doesn't work in both the Java and the text editor.

Tomer?

Comment 16 Tomer Mahlin

2005-09-07 07:30:46 EDT

Created attachment 26892 [details]
BiDI text sample in the Eclipse Java editor

BiDi text is incorrectly reordered.

Comment 17 Tomer Mahlin

2005-09-07 07:31:44 EDT

Created attachment 26893 [details]
BiDI text sample in the WID text editor

BiDi text is incorrectly ordered.

Comment 18 Dani Megert

2005-09-07 07:35:12 EDT

Can you please also try the standard/default text editor?

Comment 19 Dani Megert

2005-09-07 07:35:32 EDT

or is this the WID editor?

Comment 20 Tomer Mahlin

2005-09-07 07:43:22 EDT

 Daniel is correct. In the context of 2 samples (if condition and function 
call) both editors (JDT used in WID and Eclipse Java editor) behave the same 
way . The display is incorrect as shown on the screen captures I attached to 
this defect. 
 The examples discussed in this defect are only samples. There are might be 
many more. I assume that indeed in other cases the behavior of JDT and Eclipse 
might be different. However in this specific case the behavior is the same.

Comment 21 Dani Megert

2005-09-07 07:48:09 EDT

>both editors (JDT used in WID and Eclipse Java editor) behave the same 
>way 
That's as expected since they embedded the Java editor. My question is: how does
the normal/default text editor bahave. Can you please test that for us.

Comment 22 Simon Montagu

2005-09-07 08:03:20 EDT

This was partially fixed in bug 92105. As I understand it, the fix there was to
make string literals separate Bidi segments in the Java editor, and Tomer is
requesting that identifiers should also be separate Bidi segments.

In the Java editor in 3.1, source line
  OUTPUT = "SHALOM";
is displayed as expected:
  TUPTUO = "MOLAHS";

but source line
  OUTPUT = INPUT;
is not segmented, and is displayed:
  TUPNI = TUPTUO;

Comment 23 Tomer Mahlin

2005-09-07 08:37:14 EDT

Created attachment 26897 [details]
BiDI text sample in the Text Editor

Comment 24 Tomer Mahlin

2005-09-07 08:38:38 EDT

Normal/default text editor bahaves exatly the same way. Please see the new 
attachment.

Comment 25 Felipe Heidrich

2007-07-09 14:51:42 EDT

According to comment 22 this problem belongs to Text.
StyledText offers the API to do what is being request.

Comment 26 Felipe Heidrich

2007-07-09 15:23:50 EDT

Personally I think having BIDI identifiers in java code is not correct.

Comment 27 Dani Megert

2007-07-10 02:43:09 EDT

>Personally I think having BIDI identifiers in java code is not correct.
I would also think so, but I'm not an expert here. The swapping of method arguments looks wrong though.

There was lots of discussions in this bug, so let me try to summarize:
1. main problem is still the swapping of arguments when calling a method
2. Text editor and Java editor currently both incorrectly swap the arguments

Correct? Main question is now: do you want only the Java editor to get fixed or also the Text editor? We might be able to do something for the Java editor but won't touch the Text editor who should reflect what StyledText gives us out of the box.

Comment 28 Matitiahu Allouche

2007-07-10 03:40:11 EDT

This is in answer to comment #26 where Felipe Heidrich states: "Personally I think having BIDI identifiers in java code is not correct."

First of all, the Java language allows identifiers to be written using most Unicode characters, including Arabic or Hebrew letters, so it is inelegant to restrict users from using valid syntactical features just because our editors are not up to the task.
Besides, many modern tools ("visual" designers of various kinds) generate Java code based on business data entered by users.  Such data will often include entity names in the user's native language, which may be Arabic or Hebrew.  These entity names often appear in the generated code as variable or method names (or part of those), and we cannot deprecate such use without harming considerably the user-friendliness of the tools.

Comment 29 Matitiahu Allouche

2007-07-10 03:56:26 EDT

This is in answer to comment #27 and the question asked by Daniel Megert.

My opinion is that any editor which is not designed to handle specifically Java code may use the general Bidi algorithm.
However, editors specializing in Java code should display that code in a meaningful way even if the code contains Bidi letters in any syntactical construct where Java syntax allows it. As Tomer Mahlin indicated in comment #8, this means that the Bidi algorithm should be applied to each token separately, but tokens must be laid out on the line from left to right. What I call "token" means any of keyword, identifier, operator, punctuation, literal, comment (and whatever else I may have forgotten, but I hope that the intention is clear).

Comment 30 Dani Megert

2007-07-11 10:39:13 EDT

Can someone attach a test file here? That would be easier than the screen shots.

Let's see whether we can make this right for 3.4.

Comment 31 Tomer Mahlin

2007-07-11 11:48:34 EDT

Created attachment 73557 [details]
Sample including 2 BiDi classes

Hello Daniel,

  I am attaching a zip with exported sample Java project. It includes 2 BiDi classes with BiDi names in IBM.COM package (capital characters stand for BiDi characters of course). Java code including problematic samples for cases mentioned in this defects and some additional cases should include following comment at the top:
/*  Display was verified on July 11, 2007 by Tomer Mahlin (tomerm@il.ibm.com)
 * using Eclipse 3.3.0 Build id: I20070601-1539
 * */

  The samples provided in this file are by all means should not be considered as a comprehensive coverage of possible problematic cases. However, I think they are representative enough.

  All samples are UTF-8 encoded.

  Please let me know if you would like me to provide a snapshots of expected (as opposed to current incorrect) display.

Tomer.

Comment 32 Dani Megert

2007-07-11 11:50:08 EDT

> Please let me know if you would like me to provide a snapshots of expected
>(as opposed to current incorrect) display.
That would be perfect.

Comment 33 Tomer Mahlin

2007-07-12 03:47:25 EDT

Created attachment 73630 [details]
This is how the BiDi code looks like in Eclipse 3.3 now

I am actually attaching 3 images.
1. BiDi code display in Eclipse 3.3 (current situation)
2. The same BiDi code display in Notepad (the worst case)
3. How BiDi code should look like (the best case)

Comment 34 Tomer Mahlin

2007-07-12 03:49:29 EDT

Created attachment 73631 [details]
This is how the same BiDi code looks like in Notepad

 Please notice that I attach this image for illustrative purposes only. There is no expectation of Notepad or any ->general<- Eclipse text editor to display Java code with BiDi characters in correct layout.
  Correct display of Java code with BiDi characters is expected only from Java editor.

Comment 35 Tomer Mahlin

2007-07-12 03:54:05 EDT

Created attachment 73632 [details]
Finally, this is how BiDI code should look like

 This is the expected display. It was forcefully created using introduction of LRM invisible Unicode characters into the code text itself (this results in compilation errors since LRM should be introduced not into the text buffer but for display purposes only).

Comment 36 Ira Fishbein

2007-07-18 08:29:25 EDT

Another example of inadequate presentation of Hebrew characters inside Java code - declaration of Array-type variables.
type - CLASSTYPE
variable name - VARNAME

expected result - private CLASSTYPE[] VARNAME;
current result - private VARNAME []CLASSTYPE;

Comment 37 Dani Megert

2008-04-18 06:37:14 EDT

Added a fix to HEAD.
Available in builds > N20080417-2000.

Tomer,
I would appreciate if you could download the next N- or I-build (http://download.eclipse.org/eclipse/downloads/) and try it out. Thanks!

If the fix is good we also need to apply it to the JDISourceViewer.

Comment 38 Tomer Mahlin

2008-04-28 03:06:24 EDT

Created attachment 97751 [details]
THe same code displayed using I20080422-0800 build

 This is an important break through in handling Bidi text in Java code editor !!! Thank you so much for this patch !!!
 I still need to work on the comprehensive list of test cases (which I will publish here). At this time all patterns mentioned in this defect above are correctly displayed.
 The only problematic thing is display of constants and comments. Both are considered free text and thus no special order of tokens inside comments or constants should be enforced. It looks like the blank space between words in comments and constants is considered as separator thus the order of words is not conformant to UBA. For example:
  "HELLO WORLD" is expected to be displayed as "DLROW OLLEH" while what I see is "OLLEH DLROW".
  Similarly, /* HELLO WORLD */ is expected to be displayed as /* DLROW OLLEH */ while what I see is /* OLLEH DLROW */.

Comment 39 Dani Megert

2008-04-28 03:17:28 EDT

Thanks for the feedback Tomer. I suggest we open separate bugs with examples for additional issues not covered here.

Comment 40 Dani Megert

2008-04-28 05:01:20 EDT

Verified in I20080427-2000.

Comment 41 Tomer Mahlin

2008-04-28 06:38:00 EDT

 I am OK with opening new defects. However, please notice that before you introduced your fix the display in comments and constants was OK. Conseqeuntly I conclude that this regression (incorrect display of Bidi text in comments and constants) is a result of latest fix you provided. This is the reason why I mentioned the issue here. 
 I think I will start with creating the comprehensive list of test cases before I open any additional defects. I simply want to methodically verify correct display of Bidi text inside Java code editor.

  Thank you VERY MUCH again for the provided fix !!!

Comment 42 Dani Megert

2008-04-28 06:46:45 EDT

Ah I see. So basically, I should not touch any comment (single line, Javadoc, block) and make string constant rendering like it used to be?

Comment 43 Tomer Mahlin

2008-04-28 07:33:09 EDT

Yes. Exactly.

Comment 44 Dani Megert

2008-04-29 02:24:56 EDT

I filed bug 229226 to track this.

Comment 45 Tomer Mahlin

2008-05-05 02:11:58 EDT

 Hello James,

  As far as I am concerned this defect can be closed. If additional issues (i.e. the one reported in bug 229226) come up I will open separate defects.

  Thanks !

Comment 46 James Moody

2008-05-05 08:52:43 EDT

Sounds good, I don't even remember what this is about anymore. :-)