306751 – [BiDi] NLS class could add BiDi marks when loading the properties file

Bug 306751 - [BiDi] NLS class could add BiDi marks when loading the properties file

Summary: [BiDi] NLS class could add BiDi marks when loading the properties file

Status:	NEW

Alias:	None

Product:	Equinox
Classification:	Eclipse Project
Component:	Compendium (show other bugs)
Version:	3.3
Hardware:	All All

Importance:	P3 enhancement (vote)
Target Milestone:	---
Assignee:	equinox.compendium-inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:	307307
Blocks:	307161 310834
	Show dependency tree

Reported:	2010-03-22 14:24 EDT by Ira Fishbein
Modified:	2019-09-06 15:37 EDT (History)
CC List:	11 users (show)

See Also:

Attachments
Search view (17.13 KB, image/jpeg) 2010-03-22 14:32 EDT, Ira Fishbein	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Ira Fishbein

2010-03-22 14:24:02 EDT

Build Identifier: 3.6.0 - I20100313-1044

BIDI3.6_BDL: ComplexExpression-Bidi data displayed unreadable in Search view
1. Open Eclipse, create Java project and java class inside the project
2. Create Hebrew-named method which recieves two parameters of Hebrew-named type
3. Perform Search and make sure the method declaration line will be found
4. Examine the results
Result : Hebrew text became mixed with English (see screenshot)

Reproducible: Always

Comment 1 Ira Fishbein

2010-03-22 14:32:11 EDT

Created attachment 162706 [details]
Search view

Comment 2 Dani Megert

2010-03-23 12:38:45 EDT

I think if we test this with "-nl he" then we have to do this with the corresponding language pack, otherwise the outcome is really random as can be seen in the screenshot and I don't think we can do anything on our side here when those things are mixed in the label: the string which shows the matches cannot be hard-coded to LTR because it will fail in when the string is indeed NLSed in the language pack.

Tomer, do you agree?

Comment 3 Tomer Mahlin

2010-03-24 04:27:50 EDT

First of all to make sure we understand the problem:

The English example would look like:
   myfunc (firstarg_type, secondarg_type) (2 matches)
The Hebrew example should look like:
   CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES) (2 matches)

At this stage let us analyze the problem. As far as I understand the string is constructed from 2 pieces:
piece #1 = MYFUNC (FIRSTARG_TYPE, SECARG_TYPE)
piece #2 = ({0} matches)
   {0} signifies a placeholder which at run time is replaced by number of matches (in the example above it is 2).

Piece #1 is end user originated text and can include arbitrary characters from any language. What is important is that it's syntax is well know - it is Java language syntax.

Piece #2 includes text which is translated based on the current locale of Eclipse. In the example discussed in this bug it is English (but it can be Arabic or Hebrew as well). 

Direction of complex expression (relative order or piece #1 and piece #2) should follow GUI direction. In other words:
* for not mirrored GUI we should see:
   CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES) (2 matches)
* for mirrored GUI we should see:
   (2 matches) CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES)
   Since when GUI is mirrored it should also be translated to Arabic / Hebrew we will actually expect to see: 
   (SEHCTAM 2) CNUFYM (EPYT_GRATSRIF, EPYT_GRADNOCES)
notice that the word 'matches' is translated in this case.

 We have 2 problems:
a. resolve problem of complex expression in piece #1. Meaning enforce Java code syntax in this portion of text
b. Make sure the relative order of piece #1 and piece #2 follows GUI direction.

 For resolution of problem (a) we can use TextProcessor with following list of separators: "(),". However, I believe it would be best to use next generation of TextProcessor solution: see bug 183164. I am copying Mati (author of this solution) who can help us with some code snippet.
 For resolution of problem (b) we need to figure out the GUI direction. I believe this is easy since it can be explicitly specified via -dir command line parameter and thus I presume it is stored in some generally available variable (probably getDisplay().getShell().getOrientation()). Based on GUI direction we should act as follows:
  GUI direction = LTR =>  result = <piece #1> + LRM + <piece #2>
  GUI direction = RTL =>  result = <piece #1> + RLM + <piece #2>

where 
   LRM = \u200E
   RLM = \u200F

Comment 4 Dani Megert

2010-03-24 04:38:57 EDT

The string which says that the match count is put into parenthesis comes from the NL pack (i.e. the parenthesis are in there):
    {0} ({1} matches)
and hence I think we must not assume anything about that string in the code except putting LTR marks for the Java element label. The Hebrew language pack should make sure that the string is correctly rendered by either not using "(...)" or by adding BiDi marks at the appropriate places.

Comment 5 Tomer Mahlin

2010-03-24 04:47:23 EDT

1. For problem (a) we need to resolve the problem of complex expressions for {0} from "{0} ({1} matches)" as was suggested earlier.

2. For problem (b) we can act as follows:
 For Arabic  / Hebrew language pack we should alter the message template to be:
    {0} <RLM> ({1} matches)
 For all LTR language (i.e. English, French, Russian ...) we should alter the message template to:
    {0} <LRM> ({1} matches)

Alternatively, if we don't wish to touch the message templates for all languages,  the <LRM> / <RLM> can be added programatically at the end of {0} as part of handling problem (a).

Comment 6 Dani Megert

2010-03-24 04:52:25 EDT

> For all LTR language (i.e. English, French, Russian ...) we should alter the
>message template to:
Doing this for every properties file string which has special delimiters like '(' in it or at all code places doesn't make sense. If we really want to support that mixed mode then this would have to go to a central place in the platform e.g. when loading the strings from a properties file.

Comment 7 Tomer Mahlin

2010-03-24 04:54:37 EDT

Important point to pay your attention to. Mixed Bidi text can appear in the
discussed context in following cases: 
1. Usage of English message template and Arabic / Hebrew end user originated
text (as part of {0})
2. Usage of Arabic / Hebrew message template and English end user originated
text (as part of {0})
3. Combination of cases 1 / 2

Thus both characters of the message template and characters used as part of
text replacing a placeholder in the message template may affect the display of
entire message. This means that in general case if we tend to resolve the
problem (b) by manipulating message template, we will have to alter all
versions of this message template in all language packs (not just in Hebrew /
Arabic ones). 
 Alternatively, if we want to resolve problem (b) in the code, we can do so by
appending LRM/RLM to the text replacing placeholder {0}.

Comment 8 Tomer Mahlin

2010-03-24 05:11:17 EDT

It is interesting idea to address enforcing direction of a message template in one central place. This means addressing problem (b) in one central place for all messages. Please notice that addressing complex expressions for place holders (in this case {0}) should still be done on the case by case basis (since not all place holders are replaced with structured text and when they are, type of complex expression can be different for different cases).

 To provide a general solution we need 2 things:
1. Based on the language (which is associated with loaded NL pack) define natural base text direction. This will allow us to decide if we need to insert LRM or RLM into message template.
2. We need to understand if enforcement of direction in message template makes sense for all templates. It does make sense for the case discussed in this bug. But I am not sure there are no exceptions. 

  The general approach would be as follows:
  * parse out the message template and identify all place holders
  * replace each place holder as follows:
      if natural base text direction for message template language is LTR:
         {x} -> {x}<LRM>
      if natural base text direction for message template language is RTL:
         {x} -> {x}<RLM>
   For the message discussed in this defect we will have:
      LTR => {0}<LRM> ({1}<LRM> matches)
      RTL => {0}<RLM> ({1}<RLM> matches)

How can we figure out the natural base text direction for given language ? Here is the sample code using ICU4J which can help: 

import com.ibm.icu.text.UnicodeSetIterator;
import com.ibm.icu.text.UnicodeSet;
import com.ibm.icu.lang.UScript;
import com.ibm.icu.util.ULocale;
import com.ibm.icu.lang.UCharacter;
import java.lang.Character;

public class ICUTest {

    /**
     * @param args
     */
    public static void main(String[] args) {

        final ULocale en_locale = new ULocale("en");
        final ULocale[] locales = {
                /* test locale */
                en_locale,
                new ULocale("sr"),
                new ULocale("fr"),
                new ULocale("ta"),
                new ULocale("ru"),
                new ULocale("ja"),
                new ULocale("zu"),
                new ULocale("hi"),
                new ULocale("he"),
                new ULocale("iw"),
                new ULocale("ar"),
                new ULocale("fa")};

        for (int locIdx = 0; locIdx < locales.length; locIdx++) {
            String locName = locales[locIdx].getDisplayName(en_locale);
            System.out.println("locale " + locName + " starting...");
            String result = null;
            int script[] = UScript.getCode(locales[locIdx]);
            for (int scrIdx = 0; scrIdx < script.length; scrIdx++) {
                String scrName = UScript.getShortName(script[scrIdx]);
                if (result == null) {
                    System.out.println("   script " + scrName + ": checking...");
                    String pattern ="[[:" + scrName + ":]]";
                    UnicodeSet ucs = new UnicodeSet(pattern);
                    for (UnicodeSetIterator it = new UnicodeSetIterator(ucs); it.next(); ) {
                        if (it.codepoint != UnicodeSetIterator.IS_STRING) {
                            byte cdir = UCharacter.getDirectionality(it.codepoint);
                            if (cdir == Character.DIRECTIONALITY_LEFT_TO_RIGHT ) {
                                result = "LTR";
                                break;
                            }
                            if (cdir == Character.DIRECTIONALITY_RIGHT_TO_LEFT ||
                                cdir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                                result = "RTL";
                                break;
                            }
                        }
                    }
                } else {
                    System.out.println("   script " + scrName + ": skipped");
                }
            }
            System.out.println("   locale " + locName + "  -  " + result);
        }
    }
}

Comment 9 Dani Megert

2010-03-26 04:36:48 EDT

Given where the discussion is going I'm using this bug for the general solution and filed bug 307161 for the Java search result rendering.

Most likely this would happen when loading the properties file. In Eclipse this is normally done via org.eclipse.osgi.util.NLS.initializeMessages(String, Class).

Comment 10 Markus Keller

2010-03-26 06:42:31 EDT

This additional processing should only be enabled at the user's request in the LTR case. E.g. an application started in an "en" locale or in a "de" locale with "-nl en" should not get the additional directional marks out of the box.

Maybe attach it to the -dir flag (i.e. only insert the marks if -dir is explicitly used)?
Maybe also enable it if the explicit or the system default locale is an RTL one?

Comment 11 Tomer Mahlin

2010-03-28 10:35:24 EDT

I don't think that locale is the right criteria to condition the discussed
support on. I shared my view on the subject via bug 307307.

Comment 12 Ross Wagner

2010-04-05 11:28:15 EDT

Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep with standard format that my query tool relies on?

Thanks

Comment 13 Oleg Besedin

2010-04-05 17:07:41 EDT

(In reply to comment #12)
> Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep
> with standard format that my query tool relies on?

What does the "[BiDi3.6]" mean? From what I've seen we use the "[BiDi]" tag.

Comment 14 Dani Megert

2010-04-06 04:35:23 EDT

>Could you please change [BiDi] in the title of this bug to [BiDi3.6], to keep
>with standard format that my query tool relies on?
Our (Eclipse) standard is to use [BiDi]. If your query tool needs specific marks then I suggest you either add it at the end of the summary or into the Whiteboard field.

Comment 15 Eclipse Webmaster

2019-09-06 15:37:30 EDT

This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.