Bug 306751 - [BiDi] NLS class could add BiDi marks when loading the properties file
Summary: [BiDi] NLS class could add BiDi marks when loading the properties file
Status: NEW
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Compendium (show other bugs)
Version: 3.3   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: equinox.compendium-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 307307
Blocks: 307161 310834
  Show dependency tree
 
Reported: 2010-03-22 14:24 EDT by Ira Fishbein CLA
Modified: 2019-09-06 15:37 EDT (History)
11 users (show)

See Also:


Attachments
Search view (17.13 KB, image/jpeg)
2010-03-22 14:32 EDT, Ira Fishbein CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ira Fishbein CLA 2010-03-22 14:24:02 EDT
Build Identifier: 3.6.0 - I20100313-1044

BIDI3.6_BDL: ComplexExpression-Bidi data displayed unreadable in Search view
1. Open Eclipse, create Java project and java class inside the project
2. Create Hebrew-named method which recieves two parameters of Hebrew-named type
3. Perform Search and make sure the method declaration line will be found
4. Examine the results
Result : Hebrew text became mixed with English (see screenshot)

Reproducible: Always
Comment 1 Ira Fishbein CLA 2010-03-22 14:32:11 EDT
Created attachment 162706 [details]
Search view
Comment 2 Dani Megert CLA 2010-03-23 12:38:45 EDT
I think if we test this with "-nl he" then we have to do this with the corresponding language pack, otherwise the outcome is really random as can be seen in the screenshot and I don't think we can do anything on our side here when those things are mixed in the label: the string which shows the matches cannot be hard-coded to LTR because it will fail in when the string is indeed NLSed in the language pack.

Tomer, do you agree?
Comment 3 Tomer Mahlin CLA 2010-03-24 04:27:50 EDT
First of all to make sure we understand the problem:

The English example would look like:
   myfunc (firstarg_type, secondarg_type) (2 matches)
The Hebrew example should look like:
   CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES) (2 matches)

At this stage let us analyze the problem. As far as I understand the string is constructed from 2 pieces:
piece #1 = MYFUNC (FIRSTARG_TYPE, SECARG_TYPE)
piece #2 = ({0} matches)
   {0} signifies a placeholder which at run time is replaced by number of matches (in the example above it is 2).

Piece #1 is end user originated text and can include arbitrary characters from any language. What is important is that it's syntax is well know - it is Java language syntax.

Piece #2 includes text which is translated based on the current locale of Eclipse. In the example discussed in this bug it is English (but it can be Arabic or Hebrew as well). 

Direction of complex expression (relative order or piece #1 and piece #2) should follow GUI direction. In other words:
* for not mirrored GUI we should see:
   CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES) (2 matches)
* for mirrored GUI we should see:
   (2 matches) CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES)
   Since when GUI is mirrored it should also be translated to Arabic / Hebrew we will actually expect to see: 
   (SEHCTAM 2) CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES)
notice that the word 'matches' is translated in this case.

 We have 2 problems:
a. resolve problem of complex expression in piece #1. Meaning enforce Java code syntax in this portion of text
b. Make sure the relative order of piece #1 and piece #2 follows GUI direction.

 For resolution of problem (a) we can use TextProcessor with following list of separators: "(),". However, I believe it would be best to use next generation of TextProcessor solution: see bug 183164. I am copying Mati (author of this solution) who can help us with some code snippet.
 For resolution of problem (b) we need to figure out the GUI direction. I believe this is easy since it can be explicitly specified via -dir command line parameter and thus I presume it is stored in some generally available variable (probably getDisplay().getShell().getOrientation()). Based on GUI direction we should act as follows:
  GUI direction = LTR =>  result = <piece #1> + LRM + <piece #2>
  GUI direction = RTL =>  result = <piece #1> + RLM + <piece #2>

where 
   LRM = \u200E
   RLM = \u200F
Comment 4 Dani Megert CLA 2010-03-24 04:38:57 EDT
The string which says that the match count is put into parenthesis comes from the NL pack (i.e. the parenthesis are in there):
    {0} ({1} matches)
and hence I think we must not assume anything about that string in the code except putting LTR marks for the Java element label. The Hebrew language pack should make sure that the string is correctly rendered by either not using "(...)" or by adding BiDi marks at the appropriate places.
Comment 5 Tomer Mahlin CLA 2010-03-24 04:47:23 EDT
1. For problem (a) we need to resolve the problem of complex expressions for {0} from "{0} ({1} matches)" as was suggested earlier.

2. For problem (b) we can act as follows:
 For Arabic  / Hebrew language pack we should alter the message template to be:
    {0} <RLM> ({1} matches)
 For all LTR language (i.e. English, French, Russian ...) we should alter the message template to:
    {0} <LRM> ({1} matches)

Alternatively, if we don't wish to touch the message templates for all languages,  the <LRM> / <RLM> can be added programatically at the end of {0} as part of handling problem (a).
Comment 6 Dani Megert CLA 2010-03-24 04:52:25 EDT
> For all LTR language (i.e. English, French, Russian ...) we should alter the
>message template to:
Doing this for every properties file string which has special delimiters like '(' in it or at all code places doesn't make sense. If we really want to support that mixed mode then this would have to go to a central place in the platform e.g. when loading the strings from a properties file.
Comment 7 Tomer Mahlin CLA 2010-03-24 04:54:37 EDT
Important point to pay your attention to. Mixed Bidi text can appear in the
discussed context in following cases: 
1. Usage of English message template and Arabic / Hebrew end user originated
text (as part of {0})
2. Usage of Arabic / Hebrew message template and English end user originated
text (as part of {0})
3. Combination of cases 1 / 2

Thus both characters of the message template and characters used as part of
text replacing a placeholder in the message template may affect the display of
entire message. This means that in general case if we tend to resolve the
problem (b) by manipulating message template, we will have to alter all
versions of this message template in all language packs (not just in Hebrew /
Arabic ones). 
 Alternatively, if we want to resolve problem (b) in the code, we can do so by
appending LRM/RLM to the text replacing placeholder {0}.
Comment 8 Tomer Mahlin CLA 2010-03-24 05:11:17 EDT
It is interesting idea to address enforcing direction of a message template in one central place. This means addressing problem (b) in one central place for all messages. Please notice that addressing complex expressions for place holders (in this case {0}) should still be done on the case by case basis (since not all place holders are replaced with structured text and when they are, type of complex expression can be different for different cases).

 To provide a general solution we need 2 things:
1. Based on the language (which is associated with loaded NL pack) define natural base text direction. This will allow us to decide if we need to insert LRM or RLM into message template.
2. We need to understand if enforcement of direction in message template makes sense for all templates. It does make sense for the case discussed in this bug. But I am not sure there are no exceptions. 

  The general approach would be as follows:
  * parse out the message template and identify all place holders
  * replace each place holder as follows:
      if natural base text direction for message template language is LTR:
         {x} -> {x}<LRM>
      if natural base text direction for message template language is RTL:
         {x} -> {x}<RLM>
   For the message discussed in this defect we will have:
      LTR => {0}<LRM> ({1}<LRM> matches)
      RTL => {0}<RLM> ({1}<RLM> matches)

How can we figure out the natural base text direction for given language ? Here is the sample code using ICU4J which can help: 

import com.ibm.icu.text.UnicodeSetIterator;
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.lang.UScript;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.lang.UCharacter;
import java.lang.Character;

public class ICUTest {

    /**
     * @param args
     */
    public static void main(String[] args) {

        final ULocale en_locale = new ULocale("en");
        final ULocale[] locales = {
                /* test locale */
                en_locale,
                new ULocale("sr"),
                new ULocale("fr"),
                new ULocale("ta"),
                new ULocale("ru"),
                new ULocale("ja"),
                new ULocale("zu"),
                new ULocale("hi"),
                new ULocale("he"),
                new ULocale("iw"),
                new ULocale("ar"),
                new ULocale("fa")};

        for (int locIdx = 0; locIdx < locales.length; locIdx++) {
            String locName = locales[locIdx].getDisplayName(en_locale);
            System.out.println("locale " + locName + " starting...");
            String result = null;
            int script[] = UScript.getCode(locales[locIdx]);
            for (int scrIdx = 0; scrIdx < script.length; scrIdx++) {
                String scrName = UScript.getShortName(script[scrIdx]);
                if (result == null) {
                    System.out.println("   script " + scrName + ": checking...");
                    String pattern ="[[:" + scrName + ":]]";
                    UnicodeSet ucs = new UnicodeSet(pattern);
                    for (UnicodeSetIterator it = new UnicodeSetIterator(ucs); it.next(); ) {
                        if (it.codepoint != UnicodeSetIterator.IS_STRING) {
                            byte cdir = UCharacter.getDirectionality(it.codepoint);
                            if (cdir == Character.DIRECTIONALITY_LEFT_TO_RIGHT ) {
                                result = "LTR";
                                break;
                            }
                            if (cdir == Character.DIRECTIONALITY_RIGHT_TO_LEFT ||
                                cdir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                                result = "RTL";
                                break;
                            }
                        }
                    }
                } else {
                    System.out.println("   script " + scrName + ": skipped");
                }
            }
            System.out.println("   locale " + locName + "  -  " + result);
        }
    }
}
Comment 9 Dani Megert CLA 2010-03-26 04:36:48 EDT
Given where the discussion is going I'm using this bug for the general solution and filed bug 307161 for the Java search result rendering.

Most likely this would happen when loading the properties file. In Eclipse this is normally done via org.eclipse.osgi.util.NLS.initializeMessages(String, Class).
Comment 10 Markus Keller CLA 2010-03-26 06:42:31 EDT
This additional processing should only be enabled at the user's request in the LTR case. E.g. an application started in an "en" locale or in a "de" locale with "-nl en" should not get the additional directional marks out of the box.

Maybe attach it to the -dir flag (i.e. only insert the marks if -dir is explicitly used)?
Maybe also enable it if the explicit or the system default locale is an RTL one?
Comment 11 Tomer Mahlin CLA 2010-03-28 10:35:24 EDT
I don't think that locale is the right criteria to condition the discussed
support on. I shared my view on the subject via bug 307307.
Comment 12 Ross Wagner CLA 2010-04-05 11:28:15 EDT
Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep with standard format that my query tool relies on?

Thanks
Comment 13 Oleg Besedin CLA 2010-04-05 17:07:41 EDT
(In reply to comment #12)
> Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep
> with standard format that my query tool relies on?

What does the "[BiDi3.6]" mean? From what I've seen we use the "[BiDi]" tag.
Comment 14 Dani Megert CLA 2010-04-06 04:35:23 EDT
>Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep
>with standard format that my query tool relies on?
Our (Eclipse) standard is to use [BiDi]. If your query tool needs specific marks then I suggest you either add it at the end of the summary or into the Whiteboard field.
Comment 15 Eclipse Webmaster CLA 2019-09-06 15:37:30 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.