Bug 563123 - double-click strategy behaves differently since removal of icu (bug 562047)
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI
Version: 4.16
Hardware: PC Windows 10
Importance: P3 normal
Target Milestone: 4.16 RC1
Assignee: Alexander Kurtakov CLA
Depends on: 562047
 
Reported: 2020-05-13 07:27 EDT by Sebastian Ratz CLA
Modified: 2021-01-14 04:32 EST

Description Sebastian Ratz CLA 2020-05-13 07:27:29 EDT
Double-click strategy was affected by the removal of ICU in bug 562047:

Now that
com.ibm.icu.text.BreakIterator
has been replaced with
java.text.BreakIterator
the double-click strategy behaves differently.

Consider for example:
  public class A {
    int foo;
    int foo_1;
  }

Open this class with the Generic Text Editor.

Double click on foo_1:

Before: foo_1 was selected.
After:  foo is selected.

This does *not* affect the JDT editors, only the generic editors.

But since the generic editor is used for many different languages, this especially affects languages where underscores are more common.
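
A minimal sketch for reproducing the word selection outside the IDE, using only java.text.BreakIterator. The class name DoubleClickProbe and the helper wordAt are hypothetical names chosen for illustration; this is not the code in DefaultTextDoubleClickStrategy, and the result for the underscore case may depend on the JDK in use, so no output is claimed here.

```java
import java.text.BreakIterator;

class DoubleClickProbe {

    /**
     * Returns the text between the two word boundaries that enclose the
     * character at the given offset. Hypothetical helper, not Eclipse API.
     */
    static String wordAt(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(text);
        int end = it.following(offset); // first boundary strictly after offset
        int start = it.previous();      // boundary immediately before 'end'
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "    int foo_1;";
        int click = line.indexOf("foo_1"); // emulate a double click on the 'f'
        System.out.println("selected: [" + wordAt(line, click) + "]");
    }
}
```

Running main prints whatever the JDK iterator considers the word under the emulated click, which can then be compared with the "Before" and "After" behavior described above.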
Comment 1 Thomas Wolf CLA 2020-05-14 02:39:25 EDT
Evidently the same effect as reported in bug 563121.
Comment 2 Eclipse Genie CLA 2020-05-21 10:00:24 EDT
New Gerrit change created: https://git.eclipse.org/r/163360
Comment 3 Sebastian Ratz CLA 2020-05-21 10:02:02 EDT
I played around quite a bit with this in org.eclipse.jface.text.DefaultTextDoubleClickStrategy.

All my attempts to post-process what the BreakIterator yields have failed for corner cases.

However, I have come up with a neat solution: simply never give the BreakIterator any '_' that it could interpret wrongly.

What do you think?
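
One possible reading of that idea, as a rough sketch only: mask every '_' with an ordinary letter before handing the text to the BreakIterator, then apply the resulting boundaries to the original text. This is not taken from the Gerrit change; the class name UnderscoreMaskingSketch, the helper wordAt and the choice of 'a' as the replacement character are all assumptions made for illustration.

```java
import java.text.BreakIterator;

class UnderscoreMaskingSketch {

    /**
     * Finds the word around the given offset while hiding underscores
     * from the BreakIterator. Hypothetical helper, not Eclipse API.
     */
    static String wordAt(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        // Assumption: replacing '_' with a letter of the same length keeps
        // all offsets valid and makes the iterator treat it as a word character.
        String masked = text.replace('_', 'a');
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(masked);
        int end = it.following(offset); // first boundary strictly after offset
        int start = it.previous();      // boundary immediately before 'end'
        // Boundaries were computed on the masked copy; cut the original text.
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(wordAt("foo_bar baz", 1)); // click inside foo_bar
    }
}
```

A side effect of this approach is that '_' always counts as part of a word, including in places where a language might not want that, so it would still need checking against the corner cases mentioned above.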
Comment 4 Thomas Wolf CLA 2020-05-22 03:01:19 EDT
java.text.BreakIterator appears to behave quite differently than the ICU one anyway.

Paste this into a text editor: 我喜欢吃苹果。

(I can't speak, read or write Chinese. This is from [1] and means, according to Google Translate, "I like to eat apples".)

According to [1], this should be split into 我 喜欢 吃 苹果 .

The ICU BreakIterator does so, even with a Western locale. (As shown by double clicking.)

The JDK BreakIterator doesn't: a double click selects the whole sequence.

Inserting a '_': 我喜_欢吃苹果。yields

ICU: 我 喜 _ 欢 吃 苹果
JDK: 我喜 _ 欢吃苹果

In the Java editor, add in some class

  public static final String 我喜欢吃苹果 = "我喜欢吃苹果"; //$NON-NLS-1$

With 2020-06: double clicking anywhere inside the variable name or inside the string selects the whole sequence.

With 2020-03: double clicking in the variable name selects the whole sequence. Double clicking inside the string selects "words": 我 喜欢 吃 苹果 .

With ICU, double-clicking on 吃 never selects that but always either the two before or the two after.

(All tests on OS X, English locale.)

[1] https://stackoverflow.com/questions/42219292/how-does-breakiterator-work-in-android/42219474#42219474
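
To check the JDK side of this comparison without an editor, the segments can be listed directly. The following is a small sketch under stated assumptions: WordSegments and segments are hypothetical names, the default string is the sample sentence from this comment written as Unicode escapes so the source stays ASCII-only, and no output is claimed because it depends on the JDK's break rules. The ICU side would need com.ibm.icu.text.BreakIterator on the classpath and is not shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordSegments {

    /** Splits text at every boundary reported by the JDK word iterator. */
    static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Escaped form of the sample sentence; pass another string as the
        // first argument to test a different input (for example one with '_').
        String text = args.length > 0
                ? args[0]
                : "\u6211\u559C\u6B22\u5403\u82F9\u679C\u3002";
        for (String segment : segments(text, Locale.ENGLISH)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Each printed bracket pair corresponds to one range that a double click would select if the strategy used the iterator's boundaries unchanged.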
Comment 5 Julian Honnen CLA 2020-05-22 03:20:17 EDT
'#' is similarly affected (also in JDT editors).

Example: A#123
Double clicking the digits used to select 123; now it also selects the hash sign.

Other tools (e.g. Word, Notepad++) also treat '#' as a separator, i.e. Eclipse now behaves differently from them.
Comment 6 Alexander Kurtakov CLA 2020-05-22 03:37:50 EDT
I'll revert to the ICU BreakIterator for 4.16. The investigation into its removal can continue for the next release, starting with a test case for the strategy so we know if something breaks.
Comment 7 Eclipse Genie CLA 2020-05-22 03:42:27 EDT
New Gerrit change created: https://git.eclipse.org/r/163410
Comment 8 Alexander Kurtakov CLA 2020-05-22 04:12:59 EDT
I've created https://bugs.eclipse.org/bugs/show_bug.cgi?id=563465 and its deps for the further work on the topic.