Bug 183164 - [Implementation for] Display of Complex Expressions Containing Bidirectional Text
Summary: [Implementation for] Display of Complex Expressions Containing Bidirectional ...
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Components (show other bugs)
Version: 3.5   Edit
Hardware: PC All
: P3 major (vote)
Target Milestone: Juno M6   Edit
Assignee: equinox.components-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 179191 179321 307307
Blocks: 229010 303177 288554 359627
  Show dependency tree
 
Reported: 2007-04-19 07:37 EDT by Tomer Mahlin CLA
Modified: 2012-02-15 11:23 EST (History)
28 users (show)

See Also:


Attachments
General implementation of support for complex expressions (166.07 KB, application/octet-stream)
2009-10-07 03:39 EDT, Tomer Mahlin CLA
no flags Details
Code and javadoc for a new proposed plug-in to handle Bidi Complex Expressions (178.38 KB, application/octet-stream)
2010-02-01 14:20 EST, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details
Adding extensibility (185.93 KB, patch)
2010-02-03 15:05 EST, Oleg Besedin CLA
no flags Details | Diff
Refactoring out the environment (62.90 KB, patch)
2010-02-08 16:06 EST, Oleg Besedin CLA
no flags Details | Diff
Minor additions/changes to the API and corresponding code changes (6.44 KB, patch)
2010-02-23 19:07 EST, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
JUnits for testing the bidi.complexp plugin (69.85 KB, patch)
2010-02-23 19:11 EST, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Adjusting JUnits 1 (9.15 KB, patch)
2010-02-24 10:53 EST, Oleg Besedin CLA
no flags Details | Diff
Fix compiler warnings and some line formatting (118.08 KB, patch)
2010-03-02 17:01 EST, Matitiahu Allouche CLA
no flags Details | Diff
ComplExpUtilTest#testComplExpUtil() failure stack (3.43 KB, text/plain)
2010-03-03 16:37 EST, Oleg Besedin CLA
no flags Details
Minor bug fixes for ComplExpBasic and ComplExpUtil (119.68 KB, patch)
2010-03-16 19:22 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Adjust JUnits according to guidelines in comment #48 (93.99 KB, patch)
2010-03-16 19:28 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
changes to ComplExpBasic (5.76 KB, text/plain)
2010-04-10 17:39 EDT, Matitiahu Allouche CLA
no flags Details
Patch generating new version of complex expression package (238.99 KB, patch)
2011-01-04 13:42 EST, Matitiahu Allouche CLA
no flags Details | Diff
Updated JUnits corresponding to patch_bidi_20110104 (98.72 KB, patch)
2011-01-04 16:47 EST, Matitiahu Allouche CLA
no flags Details | Diff
New version of Complex Expression package + StringRecord class (260.35 KB, patch)
2011-01-11 08:45 EST, Matitiahu Allouche CLA
no flags Details | Diff
minor changes to JUnits needed for demo-ing new class StringRecord (98.73 KB, patch)
2011-01-11 08:53 EST, Matitiahu Allouche CLA
no flags Details | Diff
Complete replacement for bidi complex expression support (184.60 KB, application/octet-stream)
2011-02-06 12:50 EST, Matitiahu Allouche CLA
no flags Details
new version for bidi complex expression support (263.50 KB, patch)
2011-02-06 12:53 EST, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
new version for bidi complex expression tests (160.77 KB, text/plain)
2011-02-06 12:56 EST, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details
Slightly updated "new version for bidi complex expression" patches (484.58 KB, patch)
2011-02-09 16:28 EST, Oleg Besedin CLA
no flags Details | Diff
new version for bidi complex expression support (212.38 KB, patch)
2011-04-26 12:47 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
new version for bidi complex expression tests (84.02 KB, patch)
2011-04-26 12:51 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
New version of Complex Expression package (58.36 KB, patch)
2011-05-09 11:05 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
minor changes to test programs following implementation of comment #108 (6.73 KB, patch)
2011-05-09 11:10 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
new version of package for structured text support (374.85 KB, patch)
2011-05-25 12:41 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
tests for new version of package for structured text support (138.44 KB, patch)
2011-05-25 12:45 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
new version of package for structured text support (156.55 KB, patch)
2011-07-04 15:46 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
tests for new version of package for structured text support (55.47 KB, patch)
2011-07-04 15:56 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Fix for problem in STextStringRecord (17.17 KB, patch)
2011-07-05 08:50 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Patch - minor API changes and initial Javadoc cleanup (134.85 KB, patch)
2011-07-18 14:23 EDT, Oleg Besedin CLA
no flags Details | Diff
Minor comment fix in a test program (526 bytes, patch)
2011-07-20 10:41 EDT, Matitiahu Allouche CLA
no flags Details | Diff
A possible change for direction cache (50.82 KB, patch)
2011-07-26 14:37 EDT, Oleg Besedin CLA
no flags Details | Diff
Clean up after adding STextDirections (35.08 KB, patch)
2011-07-31 08:36 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Clean up test programs after adding STextCharTypes (6.05 KB, patch)
2011-07-31 08:38 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Add getDirection variant without STextCharTypes parameter for a few processor types (1.83 KB, patch)
2011-07-31 09:01 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Change STextEngine to STextHandler which can be instantiated (137.39 KB, patch)
2011-08-02 11:55 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Update test programs after changes in attachment 200742 (28.35 KB, patch)
2011-08-02 11:57 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Remove 2 fields in STextProcessorData, fix STextCharTypes (23.48 KB, patch)
2011-08-03 08:11 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Update test programs after fixes to STextCharTypes (2.22 KB, patch)
2011-08-03 08:13 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Add STextOffsets, some code fixes, rename dirProp to charType (49.74 KB, patch)
2011-08-04 07:51 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
update test programs after adding STextOffsets (3.34 KB, patch)
2011-08-04 07:53 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Cleanup, Rename and Merge Impl (187.29 KB, patch)
2011-08-13 20:08 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Tests after renaming and merging (187.29 KB, patch)
2011-08-13 20:10 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Cleanup, Rename, Merge Impl (128.51 KB, patch)
2011-08-13 23:58 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Tests after Cleanup, Rename, Merge Impl (58.85 KB, text/plain)
2011-08-14 00:11 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details
CHange state to Object (19.17 KB, patch)
2011-08-14 00:35 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Tests after changing state to Object (2.58 KB, patch)
2011-08-14 00:36 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Pass ISTextExpert instance as argument to STextTypeHandler methods, and other various changes (45.08 KB, patch)
2011-08-17 17:30 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Tests after passing ISTextExpert instance as argument to STextTypeHandler methods, and other various changes (9.63 KB, patch)
2011-08-17 17:33 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Move insertMarks() from STextProcessor to STextImpl, and other various changes (22.04 KB, patch)
2011-08-19 08:19 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Tests after moving insertMarks to STextImpl (34.33 KB, patch)
2011-08-19 08:21 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
Mainly javadoc update (97.45 KB, patch)
2011-08-25 15:39 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
tests after changing mainly javadoc (4.26 KB, patch)
2011-08-25 15:41 EDT, Matitiahu Allouche CLA
ob1.eclipse: iplog+
Details | Diff
change bidi support in bug 299031 to use STextProcessor instead of (old) TextProcessor (2.58 KB, patch)
2011-10-23 08:44 EDT, Matitiahu Allouche CLA
no flags Details | Diff
fix bug 234451 (2.32 KB, patch)
2011-10-23 08:50 EDT, Matitiahu Allouche CLA
no flags Details | Diff
Javadoc updates, no changes in executing code (37.95 KB, patch)
2011-11-08 08:08 EST, Matitiahu Allouche CLA
no flags Details | Diff
article describing what is structured text and how it is handled in Eclipse (47.77 KB, text/html)
2012-02-07 10:48 EST, Matitiahu Allouche CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tomer Mahlin CLA 2007-04-19 07:37:49 EDT
This defect is a placeholder for general discussion on implementation of design for Display of Complex Expressions Containing Bidirectional Text provided in bug 179191. It will also include links to separate defects opened on implementation of specific types of complex expressions.

 As of April 19th 2007, the general approach for implementation of design for Display of Complex Expressions Containing Bidirectional Text as articulated by Eclipse team members is as follows:

1. Static vs. Dynamic cases
------------------------------
 According to bug 179515 dynamic cases of complex expressions (see description of static / dynamic cases in design in bug 179191) are not planned to be addressed due to:
  * involved performance issues.
  * Eclipse should not battle the platform (so to speak) in those cases. In 
    case platform provides a better support so will do Eclipse.

2. Should directional characters be copied to clipboard ?
----------------------------------------------------------
  There is no clearly articulated position on this issue. The issue is tracked via bug 179526. According to design (bug 179191) the characters should not be copied to clipboard (in case the complex expression text can be copied). This is for consistency sake with respect to the target to which the text is going to be pasted. 

3. Conditions under which support for complex expressions is invoked
-----------------------------------------------------------------------
  Currently invocation of support for complex expression is planned only for environment in which default locale is set to BiDi locale (e.g. Arabic/Hebrew). This can occur for example if:
   * Eclipse is ran without any command line arguments and default locale on OS 
     level is set to BiDI locale. In this case Eclipse will be in non-mirrored 
     mode (aka LTR mode)
   * Eclipse is ran with -nl iw (or -nl he, or -nl ar etc.) option. In this 
     case Eclipse will be in mirrored mode (aka RTL) mode.
 Notice that running Eclipse with -dir rtl option does not automatically invoke support since it does not affect locale definition. This is also documented on this site: http://todcreaseyeclipse.blogspot.com/2007/03/bidirectional-support-faq.html. Using -nl parameter on the other hand affects both locale definition and mirroring mode in case parameter refers to BiDI - iw, ar, he etc. If this is the case, then Eclipse is ran in mirrored mode (aka RTL mode). 
 Notice (though it is not very relevant in this context) that since -dir parameter has higher priority over -nl parameter it also possible to invoke support for complex expressions when Eclipse is in non-mirrored mode (aka LTR mode) and regardless of OS level locale definition by using both -dir ltr and -nl iw (or he, ar etc.) parameters. 
 
Disclaimer: Notice that currently (as of April 19th, 2007) the above policy for invocation of support for complex expressions is applicable for File Path type of complex expressions (the only type of complex expressions addressed in current implementation). Having said that it seems that this is going to be a general policy for all types of complex expressions to be supported.
 In this context please notice that partial support for complex expressions in Java code editor currently does not depend on locale. In this sense the invocation mechanism is inconsistent across different types of complex expressions.

  Some additional relevant information:

  * The locale dependency issue is tracked via bug 179724. 
  * Adding he locale to the set of supported locales is tracked via bug 179321


4. Basis for adequate solution is correct parsing: Simple vs. Complex languages
--------------------------------------------------------------------------------
  Based on the design it is clear that the approach for addressing complex expressions issue is based on the complex expression type language and its syntax. To correctly address the problem the pattern should be correctly tokenized based on the appropriate language syntax.
 In that respect it worth mentioning that in most general terms all languages can be broadly divided in two groups:

   * Group 1: include languages which can be adequately parsed into tokens using oversimplified technique of parsing - tokenization based on list of separators. This technique provides adequate results for such simple languages as FilePath 

   * Group 2: includes more complex languages for which above technique does not provide adequate results. To this group belong such languages as Java code, XPath etc. The simplest example which can be used to prove that tokenization based on list of separators does not produce adequate results for complex languages is an example of constants. Constants in XPath might include separators which are part of the language syntax as well. However, we would like to see the constants as not well structured text but rather as simple unstructured text. Thus constants should be considered as context in which tokenization based on separators must not happen while in the rest of expression it must occur. See a good example in bug 140390, comment 5.

  Consequently the approach for implementing support for complex expressions might not be centralized as was pointed out in bug 140390, comment 9. Indeed 
for simple case (in which parsing based on list of separators based tokenization provides an adequate solution) the implementation of support might be isolated to some class providing parsing services. However, for more complex languages (e.g. Java code, XPath etc.) there are already tools supporting them with built-in tokenizers (which for example are used to support a colorization). Thus the implementation of support for complex expressions in those cases should be part of those tools. 

5. Layer(s)/components in which support for complex expressions should be implemented
---------------------------------------------------------------------------
  For group 1 languages the general strategy would be to make the support as general as possible without asking each component owner to patch the component In this respect following 2 items worth attention:
   * Martin Aeschlimann in bug 146220 suggests that platform adds label 
     provider for complex expressions at least for FilePath / URI type)
   * It is preferable to implement the support on JFace level rather than on 
     SWT according to bug 179191, comment 7
Comment 1 Tomer Mahlin CLA 2007-04-30 11:59:46 EDT
 Implementation of solution for File Path type of complex expression is discussed in bug 184722
Comment 2 Tomer Mahlin CLA 2007-05-02 11:25:30 EDT
 Another suggestion for general implementation layer/component was articulated in bug 178819, comment 37. In this comment it is suggested to combine the approach of invisible characters injection into string buffer with the label decorators that can be plugged in via extension points.
Comment 3 Ed Merks CLA 2007-05-02 11:49:23 EDT
Only the application producing the labels knows their internal structure and hence where control characters would be needed.  Anything that's just receiving a String will not have the necessary knowledge.
Comment 4 Tomer Mahlin CLA 2007-09-09 11:47:47 EDT
Implementation of solution for XPath type of complex expression is
discussed in bug 202752 
Comment 5 Tomer Mahlin CLA 2008-05-05 05:05:15 EDT
Verification of solution for Java code type of complex expression is
discussed in bug 230162 
Comment 6 Tomer Mahlin CLA 2009-10-07 03:39:28 EDT
Created attachment 148967 [details]
General implementation of support for complex expressions

Proposed solution
--------------------------

  Several attempts were made to enhance TextProcessor. However, it seems that
it would be much easier to provide a new package including a comprehensive and
easily extendable solution for addressing the problem of display for text with
internal complex structure including Bidi text. This package is not meant to
replace TextProcessor (since it is widely used in many products). The goal is
to provide long term extendable solution which comprehensively addresses the
problem.
   Bidi architect (Mati  Allouche) recently came up with such a solution. It is
actually an implementation of design (authored also by Mati) which was
published long time ago via bug 179191.  Similarly to TextProcessor the
provided solution is a stand alone and does not depend on anything except for
core Java. However, as opposed to TextProcessor the proposed solution: a.
easily extendable, b. addresses great variety of types of complex expressions,
c. takes into account the direction of complex expression syntax.

We would be very much interested in your opinion on possibility to integrate
the proposed solution in Eclipse 3.5 (the code will be of course contributed via
public bugzilla). 
  The code is going to be supported and tested: a. by the author b. in the
framework of annual Eclipse Bidi testing effort.


ComplexExpressions.zip provided by Mati (Bidi architect) includes following items:
   implementation (directory \src\com\ibm\bidiTools\complexExpressions)
   test programs (directory \src\Test*)
   documentation (directory \doc).

The TestAll.java program in \src executes the various specific test programs.
Comment 7 Tomer Mahlin CLA 2009-10-07 04:54:09 EDT
Just a clarification on the following sentence: "...This package is not meant to
replace TextProcessor (since it is widely used in many products)...". The long term goal is to deprecate TextProcessor. However, in the beginning both solution will coexist. The preference is of course to use a new more flexible and configurable solution.
Comment 8 Ed Merks CLA 2009-10-20 09:28:38 EDT
I guess this is really asking about integration with Eclipse 3.6 given 3.5 is out the door already...

I wonder if it makes sense that everyone will be writing what amounts to a parser/lexer for the concrete syntax of their language.  After all, Java already has such a thing and it's not trivial to write one that performs well.  It's not clear that it makes sense to apply this type of solution to the Java editor itself, and that's really the most important place to apply it.  I can imagine that it's impossible to apply this solution recursively to the contents of quoted Java literal strings or Java comments given that they are free form, so it's seems still very easy for things to get messed up at that level.  

Better line separator support will be needed given that Eclipse editors support all the various line feed conventions in an editor/file specific way.

        if (lineSep == null) {
            lineSep = System.getProperty("line.separator", "\n");
        }

I find it hard to imagine more than a handful of Eclipse projects with the resource to spare on the uphill battle of turning mixed LTR and RTL text into meaningful content. In fact, I wonder if IBM itself has such resource to spare given the sparse Eclipse investments these days.  E.g., it's hard to find an active Modeling committer.
Comment 9 Mike Wilson CLA 2009-10-20 10:11:23 EDT
I'm confused, why does EMF's inability to get enough investment have anything to do with the fact that Eclipse doesn't have the base support it needs in order to be a first-class bidi platform. We can argue about how long it will take for this support to be adopted till we're both blue in the face, but one thing I *do* know is that if we don't provide the base support, it won't get adopted.
Comment 10 Tomer Mahlin CLA 2009-10-20 10:52:56 EDT
In reply to comment #8.
 The proposed solution is not meant to be used in the context of various multiline editors specializing in editing specific language (i.e. Java code, XML content etc.). Indeed those editors have already build in parsers which should be leveraged. The solution for such editors (which in overwhelming majority of cases are based on StyledText widget) is to use BidiSegmentListener (integrated into StyledText widget). This approach was already implemented in Eclipse 3.4 for Java code editor via bug 106638 and bug 229226
Comment 11 Ed Merks CLA 2009-10-20 10:56:35 EDT
I said modeling, not EMF, and that's a whole top level project.

Also, as I've said many times, the distinction between being able to develop excellent BIDI applications, i.e., being a first class bidi platform, is very different from the tools themselves supporting mixed RTL and LTR scripts in their development languages. After all, you can develop a world class BIDI application without a single BIDI Java class name.  So I've argued many times that we're simply focused on the wrong things. Folks like Steve Francisco went and talked to real customers and asked if they would create Hebrew classes or file names and their reaction was, no, that would create a complete mess. After all, try to explore the file system with the windows file explorer and you indeed end up with a disaster. Use any other editor to view anything.  It's just not pleasant and I expect we'll all be blue in the face long before it ever is pleasant.  

Of course we're all free to spend our precious resource on the highest priority items as we see fit to prioritize them and we all like to think we're choosing wisely...
Comment 12 Adir Pekarevich CLA 2009-10-20 14:25:10 EDT
(In reply to comment #11)
> I said modeling, not EMF, and that's a whole top level project.
> Also, as I've said many times, the distinction between being able to develop
> excellent BIDI applications, i.e., being a first class bidi platform, is very
> different from the tools themselves supporting mixed RTL and LTR scripts in
> their development languages. After all, you can develop a world class BIDI
> application without a single BIDI Java class name.  So I've argued many times
> that we're simply focused on the wrong things. Folks like Steve Francisco went
> and talked to real customers and asked if they would create Hebrew classes or
> file names and their reaction was, no, that would create a complete mess. After
> all, try to explore the file system with the windows file explorer and you
> indeed end up with a disaster. Use any other editor to view anything.  It's
> just not pleasant and I expect we'll all be blue in the face long before it
> ever is pleasant.  
> Of course we're all free to spend our precious resource on the highest priority
> items as we see fit to prioritize them and we all like to think we're choosing
> wisely...

We also are working with customers, and I would confirm your glue on BIDI Java class name, if not modeling tools (not meaning EMF in this case). There are products like iLOG BRMS that allow end users (busenss managers) to define business rules in National language. iLOG than builds JAVA code based on that
(of course with BIDI JAVA class names). I thought this can be direction for many products , that more and more code will be generated based on end user input, written probably in a Hebrew or Arabic language. That for JAVA. About EMF, I am also little concerned. Is not is possible that EMF based or EMF like modeling tools also will be used by end users, and generate XML /JAVA code based on their input ? Or maybe iLog case is exception that does not requite our attention ?
Comment 13 Ed Merks CLA 2009-10-21 03:55:12 EDT
Yes, it's certainly possible that people can use EMF exactly how you've described: i.e., generate any number of artifacts from the names in their Ecore models.  The generated Java used to be a total disaster; perhaps it's quite a bit better now, but I imagine that even if the JDT made this less of a disaster, one would then generate Javadoc from it and the HMTL that's produced would have the same original disaster reflected in its rendering.  Will we get the browsers of the world to do complex expression analysis or get the Javadoc generator to insert the right control characters?  

Also, if you used any tool other than Eclipse to work with any of the artifacts, e.g., editors for serialized instance files or even just the OS tools for viewing the file system, it would also be nothing short of a complete mess.  The names would also be used in OCL expressions, which of course is yet another example of the complex expression issues.  The list of issues just grows.  Meanwhile, IBM doesn't even have committers remaining on the many of these affected technologies...

It all makes me wonder if venturing down this path is actually a good strategy given the dark forest into which it leads. Of course I understand fully the desire to bring light to that dark forest---it's commendable in fact---but to deliver on that desire requires time, effort, expertise, money, and other resources from a vast group of people.  Unfortunately I personally don't have customers who will pay for improved BIDI support.  In fact, while I was at IBM, I was painfully aware that any resource invested in EMF was effectively considered the same as money flushed down a drain and while EMF is definitely used in hundreds of IBM products, apparently not a single customer dollar was ever generated to help fund the EMF work.

Of course this bugzilla is about solving technical problems, not solving funding problems.  I just feel compelled to point out that without solving the funding problems, there will be no resource available to solve such technical problems.  Take note for example of the almost exclusively @ibm.com CC list and the complete lack of technical feedback even from IBM itself.
Comment 14 Lina Kemmel CLA 2009-10-21 05:13:53 EDT
(In reply to comment #13)
> ... one would then generate Javadoc from it and the HMTL that's produced
> would have the same original disaster reflected in its rendering.  Will we get
> the browsers of the world to do complex expression analysis or get the Javadoc
> generator to insert the right control characters?

Ed,
Suppose we offer rich text format in our product. The fact that it woun't be honored by plain text editors doesn't stop us from continuing supporting it.

As for the particular case with the browsers, yes, we use to attach DOM event listeners and do this analysis...
Comment 15 Ed Merks CLA 2009-10-21 06:17:25 EDT
My understanding is that a Javadoc generator produces HTML and once you have that HTML, understanding (reverse engineering) how to format the "arbitrary" text it contains according to its original structural meaning becomes a daunting and I would argue unsolvable task.  The right approach would be to change the Javadoc generator itself to produce text that requires no subsequent processing to be rendered in a desirable way.  It's only at this generation point that we know the actual meaning of what's being generated. This aspect of fixing the logic that composes the text rather than trying to post-process "arbitrary" text to recover the structural knowledge is something I've tried to point out repeatedly, but it seems to get lost again each time.  Consider for example how a "," would need to be processed differently depending on whether it's used in the middle of a Hebrew textual description verses when it's used as a parameter separator as part of Java's syntax.  Within an arbitrary Java comment, Java String literal, or HMTL content, the right choice for how to ensure that this portion of text is meaningfully rendered will never been clear.

I'm certainly not suggesting you stop trying to support whatever you think is important for the customers of your products.  If some capability helps generate more revenue from more customers, by all means give them exactly what they want.  Just keep in mind that product revenue doesn't always work its way back to fund the basic frameworks that underpin those products.
Comment 16 Felipe Heidrich CLA 2009-10-21 09:29:06 EDT
Ed, the way I see it, the bidi improvement work for 3.6 (which includes this and several other bugs) is a contribution from IBM teams in Israel and Egypt to Eclipse.
*They* are doing (nearly) all the work, which is important to them and their customers.

IMO, this is good for the entire Eclipse community, who will get a even better bidi support in the next release.

Shall we go back using bugzilla for technical discussions now ?
Comment 17 Oleg Besedin CLA 2009-12-01 09:53:42 EST
As far as packaging considerations go, I'd suggest creating a new equinox bundle "org.eclipse.equinox.text" and its companion "org.eclipse.equinox.text.tests".

Such bundle could be included in the Equinox builds / downloads as it becomes stable and later, when Eclipse SDK starts consuming it, included in the SDK.

===
(Why?
(a) it could be useful for runtime components and console-only apps => hence, it has to be positioned at Equinox level or below;
(b) the code most likely will need to be extendable to add new templates => it will need to use extension registry;
(c) the code shapes up to be a sizable contribution (the current attachment is 160K zipped archive) => it should be a separate bundle (plus: if it turns out to be active, it is a bit easier to transform it into a subproject).

From those considerations the best fit would be a new Equinox bundle that would depend, possibly, on OSGi, equinox.common, extension registry, and nothing else. 
===
Comment 18 John Arthorne CLA 2009-12-01 11:21:12 EST
(In reply to comment #17)
> As far as packaging considerations go, I'd suggest creating a new equinox
> bundle "org.eclipse.equinox.text" and its companion
> "org.eclipse.equinox.text.tests".

We discussed this in the Equinox call today and agreed a separate bundle makes sense given the size of the implementation. Just a minor detail, but we weren't sold on the bundle name, since as far as I understand this is strictly about bidi support and not general text infrastructure (like org.eclipse.text). Perhaps something like org.eclipse.equinox.bidi would be more appropriate, unless we're intending to widen the scope of what it provides.
Comment 19 Tomer Mahlin CLA 2009-12-03 03:29:24 EST
In order to address cases described in bug 229010 - Messages with placeholder - I believe we need to add a function to com.ibm.bidiTools.complexExpressions.ComplExpUtil class. 

function String handleMsgWithPlaceholders (
      String msg, 
         int msg_dir, 
    String[] placeholders,
       int[] placeholders_dir,
       int[] placeholders_typeOfComplExp)

This function will act as follows:
  1. For each placeholder having meaningful type of complex expression the syntax will be enforced (using info from placeholders_typeOfComplExp).
  2. For each placeholder not having meaningful type of complex expression the base text direction will be enforced (using info from placeholders_dir)
  3. Using standard Java tools the placeholders in message template will be replaced with appropriate placeholders' values
  3. For the entire text (referred by msg input parameter), base text direction (referred by msg_dir input parameter) will be enforced. 
  4. Return processed text.

Assumptions: 
   msg - the template of message. 
     For example:  ''{U0}'' - {U1} matches found inside  ''{U2}'' ({U3})
   msg_dir - base text direction of message. LTR for Latin languages, RTL for 
     Arabic, Hebrew.
   placeholders - array of strings which should replace the placeholders (i.e. 
     U0, U2, U3) in the message template.
   placeholders_dir - array of values which can be either 
     ComplExpProcessor.DIRECTION_LTR or
     ComplExpProcessor.DIRECTION_RTL 
   placeholders_typeOfCompExp - array of values which can be:
     ComplExpUtil.PATH
     ComplExpUtil.EMAIL 
     ComplExpUtil.REGEXP 
     ComplExpUtil.URL 
     etc.
Comment 20 John Arthorne CLA 2010-01-07 15:26:16 EST
Moving to Equinox as this is where the functionality will likely end up.
Comment 21 Thomas Watson CLA 2010-01-08 13:57:39 EST
Originally we planned on this for 3.6, marking the milestone so we don't loose track of it.  What are the next steps to make this happen for 3.6?
Comment 22 Tomer Mahlin CLA 2010-01-10 04:15:51 EST
If I am not mistaken, Mati (Bidi architect and the developer of suggested implementation) is following Oleg's and Paul's instructions and working on converting the Java project (current format of implementation) into Eclipse plugin format. The goal is to address following comment from Paul: 
" ... the implementation as an eclipse plugin.  Right now it is a java project, but to be consumed by eclipse it would need to be a plugin with minimal dependencies, with a well defined split between API and internals, and with an extension point to allow other bundles to add new processors..."
Comment 23 Matitiahu Allouche CLA 2010-02-01 14:20:56 EST
Created attachment 157821 [details]
Code and javadoc for a new proposed plug-in to handle Bidi Complex Expressions

-> This code was developed by myself, Matitiahu Allouche.

-> I work for IBM and my employer agrees to contribute this code to Eclipse.

-> The code is contributed under the Eclipse Public License.
Comment 24 Matitiahu Allouche CLA 2010-02-01 14:30:09 EST
Comment on attachment 157821 [details]
Code and javadoc for a new proposed plug-in to handle Bidi Complex Expressions

-> This code was developed by myself.

-> I work for IBM and my employer agrees to contribute this code to Eclipse.

-> The code is contributed under the Eclipse Public License.
Comment 25 Matitiahu Allouche CLA 2010-02-01 14:38:33 EST
Sorry for the double posting (#23 and #24).

The plug-in that I submitted is named "org.eclipse.equinox.bidi".
I have used this name because it seems to have been agreed upon in comment #18. However, I think that "org.eclipse.equinox.bidi.complexp" would be more appropriate, since it can be expected that additional Bidi-related contributions will be submitted, so that the corresponding names should be "org.eclipse.equinox.bidi.xxxx", and the present one should become "org.eclipse.equinox.bidi.complexp" as stated above.
Comment 26 Thomas Watson CLA 2010-02-01 15:00:53 EST
(In reply to comment #25)
> "org.eclipse.equinox.bidi.xxxx", and the present one should become
> "org.eclipse.equinox.bidi.complexp" as stated above.

I wonder if we would want separate bundles for each though or would just have a single bidi bundle.  I agree that the package names where the API lives should keep this in mind and not lump all the API into a single org.eclipse.equinox.bidi package if we think other API packages could come in the future.  This way if we really did want separate bundles in the future it will make it much easier to split the API apart.
Comment 27 Oleg Besedin CLA 2010-02-02 17:05:23 EST
Thanks Matitiahu!

I've released code into the Equinox CVS repository under:

org.eclipse.equinox/components/bundles/org.eclipse.equinox.bidi

I've made some simple changes:

- changed the package name to org.eclipse.equinox.bidi.complexp, as suggested by Tom
- added Equinox code formatting and re-formatted the code
- fixed manifest file and build.properties, added about.html, plugin.properties
- I did not put the generated documentation in the repository as it will be generated as a part of Eclipse build, once the bundle is included in the builds

ToDo:
- implementations of IComplExpProcessor should be internal and contributed via the extenion point?
- ComplExpUtil is a mix of APIs and internal parts. It should be changed to be two classes.
- IComplExpProcessor has 26 methods that implementors need to provide. Yet the provided implementations override only two methods only over the "basic" implementation: #indexOfSpecial() and #processSpecial(). May be, make the "basic" implementation a default and only ask implementors to provide those two methods?

For now I marked the package as "internal" and set version to 0.9 to make clear that APIs are going to change.
Comment 28 Oleg Besedin CLA 2010-02-03 15:05:48 EST
Created attachment 158092 [details]
Adding extensibility

I changed code so that implementations of IComplExpProcessor are internal and supplied via extension point. Added schema, changed IDs to be strings, created interface describing supplied expression types, added error logging and processing of the registry events.

ToDo:

- IComplExpProcessor has 26 methods that implementers need to provide. 
Yet the provided implementations override only two methods only over the "basic" implementation: #indexOfSpecial() and #processSpecial(). Should we change the interface to only have those two methods?

- API definitions: It feels like there should be two outside facades: one for consumers, one for people providing text processor implementations.

- Thread safety and multiplicity of text processors.
It does not seem that instances of the processors can be safely shared. This is a major issue as it would force us to create a new instance of a text processor on every call.

Patch applied to CVS Head.
Comment 29 Oleg Besedin CLA 2010-02-03 15:11:10 EST
I also created a bundle to contain JUnits tests, the bundle name is 

  "org.eclipse.equinox.bidi.tests"; 

it is co-located with the "org.eclipse.equinox.bidi".
Comment 30 Matitiahu Allouche CLA 2010-02-04 11:44:31 EST
(In reply to comment #28)
> - Thread safety and multiplicity of text processors.
> It does not seem that instances of the processors can be safely shared. This > is a major issue as it would force us to create a new instance of a text 
> processor on every call.

I agree that a single instance of a text processor cannot be used simultaneously by multiple threads.  However, I do not see why that means a new instance for every single call.
Let us take for instance a text processor for file paths. Typically, the only processing needed for processing a path occurrence (after creating the instance) is a call to leanToFullText(source_string). The same instance can be reused for the next call from the same location in the consumer, or from another location in the same thread.
Only an application needing to do more sophisticated processing will have to call several different methods for the same string and will need a separate instance for this processing. Even then, the same instance can be reused for successive strings.
Comment 31 Oleg Besedin CLA 2010-02-04 13:54:09 EST
(In reply to comment #30)
> (In reply to comment #28)
> > - Thread safety and multiplicity of text processors.
> I agree that a single instance of a text processor cannot be used
> simultaneously by multiple threads.  However, I do not see why that means a new
> instance for every single call.
> Let us take for instance a text processor for file paths. Typically, the only
> processing needed for processing a path occurrence (after creating the
> instance) is a call to leanToFullText(source_string). The same instance can be
> reused for the next call from the same location in the consumer, or from
> another location in the same thread.
> Only an application needing to do more sophisticated processing will have to
> call several different methods for the same string and will need a separate
> instance for this processing. Even then, the same instance can be reused for
> successive strings.

Let's say a plugin A has a view that shows a Java file; it works on the main thread. At the same time we have plugin B that does some lengthy processing on a background thread, say, indexing Java files. It uses "Java" processor to store the result. Sometime, while plugin B does indexing, a user opens a view from the plugin A which too needs a "Java" processor. 

There is no synchronization, so plugin A will happily re-enter the same #leanToFullText() method currently being used by the plugin B. Thenit will pick up class variables set by the half-processed string the plugin B is currently working on, or set the same variables to the initial state, breaking processing done for the plugin B.

The ways to avoid this:
(a) instead of using class fields to store the state, pass the state information in the arguments of method calls;
(b) create a separate instance of the class every time as there is no way to say if the client will be doing its processing on the same thread as all other clients;
(c) add Java synchronization to prevent another thread entering processing until the current thread is finished with the class.

The (b) obviously carries a memory and CPU overhead; the (c) might be an option if nothing else is possible, but half the time we go that route we get deadlocks; (a) would be my choice unless it turns out to be impossible.

===

(The funny thing is that because of how Java shared memory model works, the multi-threading access to the processors might not be a problem on *many* systems *most* of the time. In effect, each thread would create a local copy of modified variables and refer to it the processing. That would work well for us as we actually don't want the variables to be shared, rather for every thread to have its own copy.

The "gotcha" is that such modified variables will be merged back into the "main" memory at some unspecified time, which, in the absence of synchronization, is essentially left up to the JVM implementation.

However, that is a very different discussion and not something that we can (or should) rely on in a simple case like this.)
Comment 32 Matitiahu Allouche CLA 2010-02-05 05:24:00 EST
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #28)
> > > - Thread safety and multiplicity of text processors.
> Let's say a plugin A has a view that shows a Java file; it works on the main
> thread. At the same time we have plugin B that does some lengthy processing on
> a background thread, say, indexing Java files. It uses "Java" processor to
> store the result. Sometime, while plugin B does indexing, a user opens a view
> from the plugin A which too needs a "Java" processor. 
> There is no synchronization, so plugin A will happily re-enter the same
> #leanToFullText() method currently being used by the plugin B. Thenit will pick
> up class variables set by the half-processed string the plugin B is currently
> working on, or set the same variables to the initial state, breaking processing
> done for the plugin B.

Sorry but I don't follow the reasoning above. If each thread uses a different instance of the "Java" processor, it gets its own set of class variables (except for static variables which are effectively used only for constants). The only problem I can see is if a thread creates an instance of the processor, then does half the processing for one string, then starts the processing for another string using the same instance, and then goes back to processing the first string. There is something fundamental that I must be missing.
Comment 33 Oleg Besedin CLA 2010-02-08 10:05:00 EST
(In reply to comment #32)
> Sorry but I don't follow the reasoning above. If each thread uses a different
> instance of the "Java" processor, it gets its own set of class variables...

Creating a new processor for each string will create a measurable overhead. It will be at least one call per every text entry on a screen = at least one processor created and disposed of for every text entry displayed on the screen.

On another hand, if we don't create a new processor for every call, then the processor has to be able to support more than one thread calling it at the same time.
Comment 34 Oleg Besedin CLA 2010-02-08 16:06:18 EST
Created attachment 158531 [details]
Refactoring out the environment

(In reply to comment #28)
> - API definitions: It feels like there should be two outside facades: one for
> consumers, one for people providing text processor implementations.

Upon spending a bit more time on this, there appears to be three types of actors:
- consumers who want to process/de-process BiDi text of a certain types
- providers of text processors for certain types of text
- environment that specifies GUI directionality and need for BiDi processing

As a first step in the attached patch I extract environment factors into the new classes and interfaces in the package "org.eclipse.equinox.bidi.internal.environment". The code is arranged so that it will be easy to switch to multiple environments should the actually need it. For now, the environment-related code is marked as internal.

I also removed a number of overrides for setting directionality, keeping two methods: the simplest one (#setDirection()) and the most specific one (#setDirection(arabic, arabicGUIMirrored, hebrew, hebrewGUIMirrored)).
Comment 35 Matitiahu Allouche CLA 2010-02-19 09:40:47 EST
Why are the type constants isolated in a separate interface (IBiDiProcessor)?
Could not they be in IComplExpProcessor?
I expect that any program using these constants will do that to instantiate a processor, thus will refer IComplExpProcessor anyway, thus the 2 interfaces could be unified.
Comment 36 Matitiahu Allouche CLA 2010-02-19 09:53:23 EST
In ExtensibilityTest.java, in method isTypePresent, constant "regex" should be parameter "type".
Comment 37 Oleg Besedin CLA 2010-02-19 14:47:28 EST
(In reply to comment #35)
> Why are the type constants isolated in a separate interface (IBiDiProcessor)?
> Could not they be in IComplExpProcessor?
> I expect that any program using these constants will do that to instantiate a
> processor, thus will refer IComplExpProcessor anyway, thus the 2 interfaces
> could be unified.

Sure, if it helps, we can do this. I thought that having those constants in a separate place will make it easier to extract APIs from IComplExpProcessor, but, if it does not help, feel free to merge it back into IComplExpProcessor.

(In reply to comment #36)
> In ExtensibilityTest.java, in method isTypePresent, constant "regex" should be
> parameter "type".

Yes, good catch, thank you. I fixed that in CVS Head.
Comment 38 Matitiahu Allouche CLA 2010-02-21 11:17:51 EST
(In reply to comment #34)
> Created an attachment (id=158531) [details]
> Refactoring out the environment
> As a first step in the attached patch I extract environment factors into the
> new classes and interfaces in the package
> "org.eclipse.equinox.bidi.internal.environment". The code is arranged so that
> it will be easy to switch to multiple environments should the actually need it.
> For now, the environment-related code is marked as internal.
> I also removed a number of overrides for setting directionality, keeping two
> methods: the simplest one (#setDirection()) and the most specific one
> (#setDirection(arabic, arabicGUIMirrored, hebrew, hebrewGUIMirrored)).

This patch is quite pervasive. I am in the process of converting my test programs to JUnits, and after applying the patch I see multiple failures in my programs.
I would prefer to ignore this patch for now and get to the point where all my tests run successfully. This would give us a base for reference. After that, we could go back to the patch and verify that after the appropriate source adaptations, the tests still run successfully.
Warning: to make the tests run successfully, I may have to make some changes (probably not a lot) in files within packages org.eclipse.equinox.bidi.complexp, bidi.internal.complexp and bidi.internal.complexp.consumable. This may entail some needed adaptations to the patch if and when we get to it.

What do you think?
Comment 39 Matitiahu Allouche CLA 2010-02-21 11:32:04 EST
(In reply to comment #27)
> - IComplExpProcessor has 26 methods that implementors need to provide. Yet the
> provided implementations override only two methods only over the "basic"
> implementation: #indexOfSpecial() and #processSpecial(). May be, make the
> "basic" implementation a default and only ask implementors to provide those two
> methods?

All the 26 public methods are meant for end users of the plug-ins. It is expected that all new processors will be extensions of ComplExpBasic and will *not* have to implement the 26 methods. The existing processors do not override any of the 26 methods and do what they need to do by only implementing #indexOfSpecial() and #processSpecial().

I would not mind making mandatory that all processors be extensions of ComplExpBasic, but I don't know how to do that except by giving up the IComplExpProcessor interface and making the factory return ComplExpBasic objects.

What do you think?
Comment 40 Matitiahu Allouche CLA 2010-02-21 11:56:03 EST
(In reply to comment #31)
> > > - Thread safety and multiplicity of text processors.

> The ways to avoid this (interaction of multiple threads on the same class variables):
> (a) instead of using class fields to store the state, pass the state
> information in the arguments of method calls;
> (b) create a separate instance of the class every time as there is no way to
> say if the client will be doing its processing on the same thread as all other
> clients;
> (c) add Java synchronization to prevent another thread entering processing
> until the current thread is finished with the class.
> The (b) obviously carries a memory and CPU overhead; the (c) might be an option
> if nothing else is possible, but half the time we go that route we get
> deadlocks; (a) would be my choice unless it turns out to be impossible.

I think that I understand the implications of each option. I too prefer (a), but I would not want the user to have to manage the "state information" (== what is currently stored in class variables).
I suggest the following:
1) There will be a new internal class named, for instance, ComplExpData, which will have the same fields as currently stored as class variables.
2) There will be a new class oriented to the users needs, with the same APIs as in IComplExpProcessor. Let us name it ComplExpProcessor.
The members of this class will be an instance of ComplExpData and a reference to a processor.
3) The implementing classes (what is currently internal) will implement APIs  similar to ComplExpProcessor with the addition for all methods of a reference to a ComplExpData instance.
4) The constructor for ComplExpProcessor will receive as argument a processor type. It will then initialize a ComplExpData instance and call StringProcessor.getProcessor() to retrieve the appropriate processor.
5) The user will only know of ComplExpProcessor. He will create as many ComplExpProcessor instances as needed. These instances will reuse the same processor as long as they need the same type of processor. Only ComplExpData instances will be created each time.

If this proposal is accepted, I want to postpone its implementation until I get all my tests running successfully with the current version.
Comment 41 Oleg Besedin CLA 2010-02-22 09:25:37 EST
(In reply to comment #38)
> This patch is quite pervasive. I am in the process of converting my test
> programs to JUnits, and after applying the patch I see multiple failures in my
> programs.

That seems to be the best next step. 

The tests will also show how the contents is expected to be consumed, in turn, helping to shape up APIs.
Comment 42 Oleg Besedin CLA 2010-02-22 09:39:20 EST
(In reply to comment #39)
> All the 26 public methods are meant for end users of the plug-ins. It is
> expected that all new processors will be extensions of ComplExpBasic and will
> *not* have to implement the 26 methods. The existing processors do not override
> any of the 26 methods and do what they need to do by only implementing
> #indexOfSpecial() and #processSpecial().
> 
> I would not mind making mandatory that all processors be extensions of
> ComplExpBasic, but I don't know how to do that except by giving up the
> IComplExpProcessor interface and making the factory return ComplExpBasic
> objects.

The possible approach would be to change from inheritance to aggregation: instead of having processor providers inherit from ComplExpBasic with 26 methods, make them implement the two "special" methods and make the internally created and managed instances of ComplExpBasic consult those pieces.

The difference if that ComplExpBasic exposes all the processing done to transform BiDi strings. If it is presented as API, it will be impossible to modify the BiDi processing in future, or even "automatically" switch processor providers to a new version. 

With such approach, if there is a need in future to add another method, we can do it by either creating IMyInterface2, or, if it is represented as an abstract class by simply adding a no-op implementation in the base class.
Comment 43 Thomas Watson CLA 2010-02-22 09:44:44 EST
I noticed that the Bundle-RequiredExecutionEnvironment header listed J2SE-1.4 and Foundation 1.1.  I updated to project org.eclipse.equinox.bidi to only list OSGi/Minimum-1.2.  OSGi/Minimum-1.2 is identical to Foundation 1.1 except OSGi/Minimum-1.2 does not include the javax.microedition packages.
Comment 44 John Arthorne CLA 2010-02-22 16:59:10 EST
Regarding the discussion between an abstract base class vs. interface: One approach is to actually have both. The interface would be marked @noimplement so people providing implementations would have to extend the base class. That also allows exposing details to the subclasses via the base class, while keeping the interface much simpler as the client-facing API. Keeping the client and provider APIs separate like this is quite powerful, as it keeps the client API clean and simple but exposes more complexity only to provider classes. You would then be able to add new methods to the provider API in the future (with default implementations in the base class) without breaking any API.
Comment 45 Matitiahu Allouche CLA 2010-02-23 19:07:19 EST
Created attachment 160015 [details]
Minor additions/changes to the API and corresponding code changes

This patch does the following:
1) in IComplExpProcessor:
   + add get/setOperators methods
   + change name of parameter in assumeOrientation() from "desiredOrientation" to "componentOrientation", because this method describes a reality rather than specifies a requirement.

2) In ComplExpBasic:
   + change the type of the "type" variable from int to String, to reflect a previous change.
   + remove method setType which is not used currently.
   + implement the changes from IComplExpProcessor.
   + fixe a bug in assumeOrientation().

3) In ComplExpDoNothing: implement changes from IComplExpProcessor.

I hope that this innocuous patch may soon be integrated in the CVS Head.
Comment 46 Matitiahu Allouche CLA 2010-02-23 19:11:43 EST
Created attachment 160017 [details]
JUnits for testing the bidi.complexp plugin

This bunch of JUnits provides an almost complete coverage of the org.eclipse.equinox.bidi.complexp plugin.
Comment 47 Oleg Besedin CLA 2010-02-24 10:04:29 EST
Thanks Matitiahu! I've applied patches "Minor additions/changes to the API and corresponding code changes" and "JUnits for testing the bidi.complexp plugin". 

Two minor things - not sure if you've seen those:
- The ComplExpUtilTest#testComplExpUtil() fails for me
- ComplExpBasic#indexOfSpecial(int whichSpecial, String leanText, int fromIndex)
	there is a compiler warning that "leanText" is hiding a field
Comment 48 Oleg Besedin CLA 2010-02-24 10:53:41 EST
Created attachment 160081 [details]
Adjusting JUnits 1

I've changed ComplExpMathTest to look a bit more like a typical JUnit. What I did:

 - removed all System.out.println() statements
	There are literally 10's of thousands test run during an Eclipse build. Usually tests ask not to print to the console as it becomes an impossible mess. In case tests pass that is the only information needed; if test fails, we see an exception stack that is enough to track the failure point. So, usually there is no need to print extra information in the console. (If it is desirable for the debug purposes, there is a tracing mechanism. See, for instance, the DataArea class in the org.eclipse.equinox.common bundle, search for "getDebugOptions()")

 - changed checks to simply use assertEquals() and removed special reporting of results
	Usually tests made to fail fast - if one test fails, rest of the tests in the class are skipped. I guess the logic is that tests go from simple to more complicated and if a simple test fails, there is no point to spend time running more complicated tests. Or, from another point, there is no difference to a product if one test fails or ten.
 
 - removed "static" from method signatures; renamed methods so that methods to be invoked by JUnit framework start with "test*", but utility methods do not. 

 - created base class ComplExpTestBase into which moved methods #toPseudo() and #toUT16() from the Tools class.

  - used #setUp() override to create processor to be tested
	Not really important, just makes code a bit shorter.

This patch "Adjusting JUnits 1" is applied to CVS Head.

=====

Mati, could you adjust the rest of the tests?

ComplExpUtilTest
ExtensionsTest
FullToLeanTest
MethodsTest

The Tools class likely won't be needed once all tests are adjusted.
Comment 49 Matitiahu Allouche CLA 2010-02-24 12:25:40 EST
(In reply to comment #48)
> Usually tests ask not to print to the console as it becomes an impossible mess.

Is it acceptable to print a few lines just in the failing cases?
This is because the exception stack is enough to track the failure point, but not to identify the specific error. This reduced printing, which normally will result in 0 lines written, is likely to save some debugging time.
Comment 50 Oleg Besedin CLA 2010-02-24 13:10:51 EST
(In reply to comment #49)
> (In reply to comment #48)
> > Usually tests ask not to print to the console as it becomes an impossible mess.
> 
> Is it acceptable to print a few lines just in the failing cases?
> This is because the exception stack is enough to track the failure point, but
> not to identify the specific error. This reduced printing, which normally will
> result in 0 lines written, is likely to save some debugging time.

The assert..() methods have a form that accept messages, like:

  assertEquals(String message, String expected, String actual)

would that work?
Comment 51 Matitiahu Allouche CLA 2010-03-02 17:01:40 EST
Created attachment 160698 [details]
Fix compiler warnings and some line formatting

Here is a patch which should not be controversial.  It fixes all compiler warnings (mostly because of not externalized literals).
Comment 52 Oleg Besedin CLA 2010-03-03 10:05:39 EST
(In reply to comment #51)
> Created an attachment (id=160698) [details]
> Fix compiler warnings and some line formatting
> 
> Here is a patch which should not be controversial. 

Hmm... Did you intend to replace tabs with spaces in the indentations for the following files:

ComplExpSql
ComplExpRegex
ComplExpJava
ComplExpSingle
ComplExpDelimsEsc
ComplExpDelims
ComplExpBasic

?

http://www.eclipse.org/equinox/documents/coding.php

Normally JDT settings on the project take care of fixing that, but they can be confused. If you open a file and press Shift+Ctrl+F it will reformat the code according to the project settings.
Comment 53 Matitiahu Allouche CLA 2010-03-03 11:31:18 EST
(In reply to comment #52)
> Hmm... Did you intend to replace tabs with spaces in the indentations for the
> following files:
> ComplExpSql
> ComplExpRegex
> ComplExpJava
> ComplExpSingle
> ComplExpDelimsEsc
> ComplExpDelims
> ComplExpBasic
> ?
> Normally JDT settings on the project take care of fixing that, but they can be
> confused. If you open a file and press Shift+Ctrl+F it will reformat the code
> according to the project settings.

I have done nothing special to replace tabs with spaces, and I have not changed the project JDT settings. However, I understand that the formatting is done by default when saving a file in the Java editor. Some of the files I have edited with an external editor, and done a Refresh in the Package Explorer view. I thought it would cause a reformatting, but maybe it does not.
Comment 54 Matitiahu Allouche CLA 2010-03-03 11:42:14 EST
(In reply to comment #47)
> - The ComplExpUtilTest#testComplExpUtil() fails for me

This test, or any other, does not fail for me. Can you please check if it still fail for you and send me the corresponding info?
BTW, this is a case where it would be good to have some console lines to describe the failing result.
Comment 55 Oleg Besedin CLA 2010-03-03 16:37:05 EST
Created attachment 160858 [details]
 ComplExpUtilTest#testComplExpUtil() failure stack

(In reply to comment #54)
> (In reply to comment #47)
> > - The ComplExpUtilTest#testComplExpUtil() fails for me
> 
> This test, or any other, does not fail for me. Can you please check if it still
> fail for you and send me the corresponding info?

Sure, attached.
Comment 56 Lina Kemmel CLA 2010-03-03 17:13:04 EST
I think that Mati is running on a Bidi platform, while Oleg is not. Complex expressions processor is sensitive to the locale, which causes these discrepancies. Probably the expected test results should accommodate the locale characteristics?
Comment 57 Matitiahu Allouche CLA 2010-03-16 19:15:53 EDT
(In reply to comment #48)
> Created an attachment (id=160081) [details]
> Adjusting JUnits 1
> I've changed ComplExpMathTest to look a bit more like a typical JUnit. What I
> did:
>  - removed all System.out.println() statements
>  - changed checks to simply use assertEquals()
>  - removed "static" from method signatures; renamed methods so that methods to
> be invoked by JUnit framework start with "test*", but utility methods do not. 
>  - created base class ComplExpTestBase
>   - used #setUp() override to create processor to be tested
> Mati, could you adjust the rest of the tests?
> The Tools class likely won't be needed once all tests are adjusted.

I have adjusted all the tests. I will attach 2 patches:
- one for the tests
- one for the main source: while working on the tests, I discovered I needed changes in ComplExpUtil.java to allow changing the locale more than once, and in ComplExpBasic.java to handle the case of empty source strings.
Comment 58 Matitiahu Allouche CLA 2010-03-16 19:22:44 EDT
Created attachment 162228 [details]
Minor bug fixes for ComplExpBasic and ComplExpUtil

1) Add a few lines to ComplExpBasic.java to handle more properly cases when empty strings are passed as arguments to leanToFullText() and fullToLeanText() methods.

2) Change to ComplExpUtil to allow setting the locale more than once during a run.
Comment 59 Matitiahu Allouche CLA 2010-03-16 19:28:22 EDT
Created attachment 162229 [details]
Adjust JUnits according to guidelines in comment #48

All JUnits have been changed according to Oleg's guidelines in comment #48.
Parts of Tools.java have been moved into CompExpTestBase.java, the rest is not used any more, so this file has been removed.
Comment 60 Oleg Besedin CLA 2010-03-17 14:12:15 EDT
Thanks Matitiahu! Both patches applied. The only thing I did was to format files (Shift+Ctrl+F) to use tabs instead of spaces.

Now all JUnits pass on my machine. All tests are green, just in time for St. Patrick's Day.

What's the next thing? Is it time to look into APIs?
 - ComplExpUtil
 - IComplExpProcessor
 - and others

I'd suggest to consider who are the external entities that need to interact with this code. It seems that we have 3 sides:
-> the system - determines desired language, need to go RTL/LTR, which scripts are supported, which scripts are mirrored
	we can have one default instance of the "system" and add an optional ability to pass in custom-made "system".
-> consumers - people how just want their text processed. They should have simple #process(text[, type]) / #deprocess(text) methods  
-> people who write specific processors. At present it seems that the intention is that those people will extend ComplExpBasic. However, exposing that as API will definitely show too much and will make future changes next to impossible. I'd suggest using aggregation rather then inheritance.
Comment 61 Matitiahu Allouche CLA 2010-03-24 20:23:48 EDT
Warning:  long and arduous comment!

Here is a proposal for evolving the API of the complex expressions plug-in. This proposal got much of its inspiration from advice given in this bug.
Comments are more than welcome, and all the names suggested for classes and interfaces are candidates for improvement.

Consumers will use instances of a class named ComplExpProcessor. The main constructor for this class (I am not sure if there will be need for other constructors) will be:
   public ComplExpProcessor(String type);
where "type" is one of the strings currently listed in IBiDiProcessor.

A ComplExpProcessor instance will contain the following elements:
- a reference to a BaseProcessor (or extension thereof ) instance (see below).  This reference will be initialized using StringProcessor.getProcessor() in the constructor.
- a reference to an Environment instance which is created in the ComplExpProcessor constructor.  It will be used to store information about mirroring and component direction.
- a reference to a Features instance which is created in the ComplExpProcessor constructor.  It will be used to store information specific to the type of processor, such as operators or delimiters, supported scripts, expression direction.  It will be initialized in the BaseProcessor init() method (see below).
- a reference to a WorkData instance which is created in theconstructor. It will be used to store all the variables which are currently non-final members of ComplExpBasic, except those accounted for in Environment and Features. This is meant to take care of thread safety.

The API for ComplExpProcessor will include only a consumer-oriented subset of the current IComplExpProcessor API, namely:
   + leanToFullText()
   + fullToLeanText()
   + getFinalState()
   + leanBidiCharOffsets()
   + fullBidiCharOffsets()
   + leanToFullPos()
   + fullToLeanPos()
It will also include getters for the Environment, Features and WorkData instances. I am not sure if we will need also setters.

The API for Environment and Features will be essentially getters and setters.

The API for BaseProcessor will be similar to the consumer-oriented API of ComplExpProcessor, but with an added parameter for each method, a reference to the current ComplExpProcessor instance. This will allow the BaseProcessor implementation to access the data in the Environemt, Features and WorkData instances.
It will also include an init() method which will be called immediately after StringProcessor.getProcessor().
All the different type of processors will be extensions of BaseProcessor.

I am still hesitating about how BaseProcessor should access fields in WorkData. Using getters and setters would be very unwieldly. Direct access would need to make all members of WorkData public.
I would like to locate WorkData in an internal package. Will it then be allowed to have a WordData instance within a ComplExpProcessor object, while ComplExpProcessor is not internal?

I would appreciate if somebody can point me to info about what are the restrictions on classes within internal packages and how the compiler is made aware of such restrictions (for my own edification).
Comment 62 Oleg Besedin CLA 2010-03-29 16:23:26 EDT
(In reply to comment #61)
> I would appreciate if somebody can point me to info about what are the
> restrictions on classes within internal packages and how the compiler is made
> aware of such restrictions (for my own edification).

Are you asking how we differentiate between APIs and non-APIs?

This has some links:

http://wiki.eclipse.org/API_Central

In short, APIs are packages that are exported in the manifest.mf file ("Export-Package" directive) with no restrictions, while non-APIs have "x-internal" or "x-friends" qualifiers on them:

Export-Package: org.eclipse.equinox.bidi.complexp,
 org.eclipse.equinox.bidi.internal.complexp;x-internal:=true,
 org.eclipse.equinox.bidi.internal.complexp.consumable;x-internal:=true

The non-API packages usually have ".internal." in the package name, but at this time this is a secondary factor; the OSGi manifest is used to determine what's an API.

(Also some non-Java elements, such as extension points, schemas, and package re-exports are considered APIs.)

The restrictions are enforced by the PDE component of Eclipse SDK at development time and are not really present at the compiler level. This is more like a verbal contract: we support APIs, but if somebody used non-API classes, there are absolutely no guarantees that those internal classes are going to be there next release. The PDE will flag improper uses as errors, but it is possible to set them to be treated as warnings or ignored altogether. (This is often done for JUnit test bundles as they often have to reach into internals.)

Did it answer your question? This is really an large and surprisingly interesting topic, there are plenty of reading accessible from the API central link above.
Comment 63 Oleg Besedin CLA 2010-03-29 16:44:35 EDT
(In reply to comment #61)
> Here is a proposal for evolving the API of the complex expressions plug-in.

To me, this looks like a move in the right direction. The addition of "Features" looks very interesting as we can use it to consolidate the typical BiDi processors.

That said, it seems that we should be able to simplify things a bit more for the most common use cases. It feels like there are still too many special BiDi things are exposed (like "leanBidiCharOffsets()", "leanToFullPos()").

Let's consider them:

A) Consumers

A1) Most common use case

Say, a developer wants to process a string. At this time he uses:

	String full = TextProcessor.process(String str, String delimiter);

To me, this seems like the most common case. If that's the case, I'd like to preserve this by having an API like:
	String full = ComplExpProcessor.process(String lean, String type);
	String lean = ComplExpProcessor.deprocess(String full);

(note that this version uses "type" of the complex expression rather then the list of delimiters).

For performance reasons, as this path will be called a lot, it would be nice not to have to create new objects for each string we process / deprocess. Hence, using the static methods.

A2) More complicated cases would be:
 - when text is processed in multiple calls;
 - when a call is made with a different "environment" setting

Those probably could be served by creating a separate class instance that is not expected to be shared with other callers. The consumer's code then would be:

	ComplExpProcessor processor = new ComplExpProcessor(String type 
		[, IBiDiEnvironment environment]);
	String full1 = processor.process(String lean1);
	String full2 = processor.process(String lean2);
	processor.reset();
	String lean = processor.deprocess(String full);

B) Processor providers

B1) Simple case (coma, e-mail, file, math, property, system, underscore, URL, SPath)

If I understand correctly, you propose that typical processors are created by describing their features? Then a Feature structure would be something like:
	Processor name: XPath
	Operators: " /[]<>=!:@.|()+-*"
	[Delimiters: "''\"\""]
	[IsSingle: false]
	[DefaultDirection: LTR]

That seems like an excellent idea. Most of the pre-made and, I am guessing, most of the custom-made processors then could be described using such Features structures with no coding involved from the processor providers. 

B2) More complicated case (Java, RegEx, SQL)

It seems that processors that do not follow the pattern above only override #indexOfSpecial() and #processSpecial(). (Plus overrides seem to follow the same pattern, at least in the  #indexOfSpecial().)

What I was hoping to achieve for this use case is to have processor providers only contribute classes that implement a simple interface, something like
	interface IBiDiProcessor {
		equivalent_of_indexOfSpecial();
		equivalent_of_processSpecial();
	}
without having to dive into the BaseProcessor implementation.

Does this make sense? The driving factors here are:
- the simple case should be trivial to use, without the need to learn much about BiDi
- added processing for simple cases should have little performance impact. Ideally, as close to 0 as possible for non-BiDi locales
- exposing as little as possible as APIs. We can always add new APIs; we almost never can remove an API.

As long as we are improving on those factors, we are on the right track.
Comment 64 Matitiahu Allouche CLA 2010-04-10 17:39:48 EDT
Created attachment 164476 [details]
changes to ComplExpBasic

This is a modest change to ComplExpBasic.java which reduces the number of class variables, in preparation for the big changes to come.
Comment 65 Oleg Besedin CLA 2010-04-12 11:13:03 EDT
Thanks Matitiahu, the patch "changes to ComplExpBasic" applied to CVS Head.
Comment 66 Thomas Watson CLA 2010-05-24 14:57:31 EDT
Work to continue in the next release.
Comment 67 Matitiahu Allouche CLA 2011-01-04 13:42:38 EST
Created attachment 186038 [details]
Patch generating new version of complex expression package

After a long period of silence, here is a new version of this Complex Expression package. This patch constitutes a real makeover of the previous version.
Its design follows the spirit of comment #61 and comment #63.
The API for processors is now reduced to 4 methods, 3 of which don't need overriding for the simple cases, and the 4th can be reduced to a 1 line return statement.
Thread safety should be satisfied since processors now have no instance variables.
I have tried to make clear the distinction between classes used by "consumers" (to follow Oleg's terminology) and those used by creators of new processors.
Improved Javadoc should make things easier for reviewers.
Comment 68 Matitiahu Allouche CLA 2011-01-04 16:47:32 EST
Created attachment 186054 [details]
Updated JUnits corresponding to patch_bidi_20110104

This patch brings the JUnits in sync with the new version of the Complex Expressions package updated by patch #186038.
Code coverage is high (better than 95%).
Comment 69 Matitiahu Allouche CLA 2011-01-06 04:10:54 EST
Something I forgot to mention in my comment #67 : in file CxpEnv.java, there are several "TBD" reminders (originally issued by Oleg) advising to replace some JDK method calls by OSGi service or bundle properties.

1) For my edification, can somebody explain why those latter are better?

2) Can somebody point me to appropriate OSGi and bundle API? This will save me time.
Comment 70 Matitiahu Allouche CLA 2011-01-11 08:45:28 EST
Created attachment 186488 [details]
New version of Complex Expression package + StringRecord class

This patch supersedes patch 186038 (see comment #67). It adds the StringRecord class which is an implementation for bug #300890, please see there.
Comment 71 Matitiahu Allouche CLA 2011-01-11 08:53:22 EST
Created attachment 186490 [details]
minor changes to JUnits needed for demo-ing new class StringRecord

This patch supersedes patch #186054 (see comment #68), brings minor changes needed for the StringRecord demo.
Comment 72 Oleg Besedin CLA 2011-01-18 14:58:02 EST
(In reply to comment #69)
> Something I forgot to mention in my comment #67 : in file CxpEnv.java, there
> are several "TBD" reminders (originally issued by Oleg) advising to replace
> some JDK method calls by OSGi service or bundle properties.
> 
> 1) For my edification, can somebody explain why those latter are better?
> 
> 2) Can somebody point me to appropriate OSGi and bundle API? This will save me
> time.

For "// TBD use bundle properties": instead of 
   System.getProperty("os.name")
use 
   BundleContext.getProperty("abc")
(In this code BundleContext can be obtained from BidiActivator#bundleContext.)
This allows values overriden by the OSGi framework to be picked up by your code.

For "// TBD use OSGi service": there is an OSGi service LocaleProvider that returns current locale for the session. Again, the possibility is that it can be overridden by some part of the application. To see the code that uses it, check, for instance, the RegistryStrategyOSGI class from the bundle org.eclipse.equinox.registry.
Comment 73 Oleg Besedin CLA 2011-01-18 15:02:27 EST
> For "// TBD use bundle properties": instead of 
>    System.getProperty("os.name")
> use 
>    BundleContext.getProperty("abc")

Should be:
System.getProperty(...) -> BundleContext.getProperty(...)

Off-hand I don't know what's the difference between "osgi.os" and "os.name", Equinox code seems to use "osgi.os" and "osgi.nl" keys.
Comment 74 Oleg Besedin CLA 2011-01-18 16:15:16 EST
(In reply to comment #70)
> Created attachment 186488 [details]
> New version of Complex Expression package + StringRecord class

That's a lot of changes. Seems like every single class has been renamed and reformatted?

Actually, the following did not change:
  IComplExpProcessor
  ComplExpBasic
  ComplExpDoNothing
and the ComplExpBasic produces a compile error.

Are those 3 classes supposed to be removed?

=> Formatting, tabs-vs-spaces etc.

I probably was not clear on this in the comment 52: there are specific format settings used by different projects, including Equinox. They are there so that code within a project has the same style. Unless there is a very good reason, those settings should be used. See

http://www.eclipse.org/equinox/documents/coding.php

and the links to "Equinox JDT UI preferences" and "Equinox JDT Core preferences" on that page.

=> Naming

I usually try to stay away from "debates" on how things should be named, but the following names take away readability:

classes CxpEnv, CxpDoer
methods init2(), deprocess()
variables ceh, whichSpecial, B, L, R, AL, AN, EN, proc, cef, 
etc.

Please do a pass through the code to make sure you use descriptive names.

+ To me, prefix "Cxp" (Complex eXPression?) does not convey any meaning. I'd rather have "Bidi" in whichever capitalization form. (Java seems to use "Bidi" for java.text.Bidi so probably that's the form to use.)

+ Packages: let's have a package "org.eclipse.equinox.bidi" (whithout ".complexp") for generally useful APIs and "org.eclipse.equinox.bidi.custom" for APIs used to enhance default functionality. 

+ IProcessorTypes should be IExpressionTypes - those names are for people using this code.
Comment 75 Matitiahu Allouche CLA 2011-01-19 05:06:48 EST
(In reply to comment #74)
> (In reply to comment #70)
> > Created attachment 186488 [details] [details]
> > New version of Complex Expression package + StringRecord class
> 
> That's a lot of changes. Seems like every single class has been renamed and
> reformatted?
> 
> Actually, the following did not change:
>   IComplExpProcessor
>   ComplExpBasic
>   ComplExpDoNothing
> and the ComplExpBasic produces a compile error.
> 
> Are those 3 classes supposed to be removed?

IcomplExpProcessor should be removed (renamed ICxpProcessor).
ComplExpBasic should be removed (replaced by CxpHelper for the API and CxpDoer for the implementation).
ComplExpDoNothing is removed (replaced by the argument-less constructor of CxpHelper).

> 
> => Formatting, tabs-vs-spaces etc.
> 
> I probably was not clear on this in the comment 52: there are specific format
> settings used by different projects, including Equinox. They are there so that
> code within a project has the same style. Unless there is a very good reason,
> those settings should be used. See
> 
> http://www.eclipse.org/equinox/documents/coding.php
> 
> and the links to "Equinox JDT UI preferences" and "Equinox JDT Core
> preferences" on that page.

You were perfectly clear in comment 52, and I have the proper preference files in my .settings directory. But I am sometimes using an external editor which leaves spaces instead of tabs and forget to re-format afterwards. Mea culpa!

> 
> => Naming
> 
> I usually try to stay away from "debates" on how things should be named, but
> the following names take away readability:

I have already stated that all the names I assign are candidates for improvement. I understand the need for meaningful names, but there is a balance to find between meaningfulness and long-windedness. I will try below to justify some of my choices, but will comply with any directive.

> 
> classes CxpEnv, CxpDoer

CxpEnv: "env" is a quite common abbreviation for "environment" in programming. C has getenv and putenv functions. Unix has an env shell command.

CxpDoer: the idea is that this class is actually implementing the API. I have thought about Factor, Agent, but Doer seems the simplest expression of the idea. Any suggestions?

> methods init2(), deprocess()

init2: this method is meant to re-initialize things when the environment changes. What about "reInit"? "reInitOnEnvChange"?

deprocess: the names "process" and "deprocess" must be kept for compatibiliy with names in the current TextProcessor.

> variables ceh, whichSpecial, B, L, R, AL, AN, EN, proc, cef,  

It is a common usage in Java to name instances of classes with long names with the lower case letters corresponding to the upper case letters of the class names. For instance, "isr" for an instance of InputStreamReader.
So ceh stands for an instance of Complex Expression Helper and cef for an instance of Complex Expression Features.
But I have no problem changing them to "helper" and "features" respectively.
"proc" will be changed to "processor".

For "whichSpecial", is "caseNumber" better? Any suggestion?

For B, L, R, AL, AN, EN, these are acronyms defined in the Unicode Bidirectional Algorithm ( http://www.unicode.org/reports/tr9/ ), which should be known to people reviewing this part of the code.
Changing "B" to "UnicodeBidiClassBlock" is way too verbose IMHO. Any suggestion? Maybe "UBA_B"?
 
> etc.
> 
> Please do a pass through the code to make sure you use descriptive names.
> 
> + To me, prefix "Cxp" (Complex eXPression?) does not convey any meaning. I'd
> rather have "Bidi" in whichever capitalization form. (Java seems to use "Bidi"
> for java.text.Bidi so probably that's the form to use.)

I changed from prefix "ComplExp" to "Cxp" after getting tired from writing umpteen times the lengthy "ComplExp". I can revert to "ComplExp" if you insist.
"Bidi" would be conveniently terse, but it is too generic. Eclipse may have to accomodate bidi support for more than complex expressions, so the prefix should be specific to complex expressions. Suggestions?

> 
> + Packages: let's have a package "org.eclipse.equinox.bidi" (whithout
> ".complexp") for generally useful APIs and "org.eclipse.equinox.bidi.custom"
> for APIs used to enhance default functionality. 

I am not sure how the current classes would map to those packages.

> 
> + IProcessorTypes should be IExpressionTypes - those names are for people using
> this code.

I will change "IProcessorTypes" to "IExpressionTypes".
Comment 76 Oleg Besedin CLA 2011-01-19 12:28:23 EST
(In reply to comment #75)
> > Are those 3 classes supposed to be removed?
> IcomplExpProcessor should be removed (renamed ICxpProcessor).
> ComplExpBasic should be removed (replaced by CxpHelper for the API and CxpDoer
> for the implementation).
> ComplExpDoNothing is removed (replaced by the argument-less constructor of
> CxpHelper).

Could you make sure that those changes appear in the updated patch?

> ... I have the proper preference files
> in my .settings directory. But I am sometimes using an external editor which
> leaves spaces instead of tabs and forget to re-format afterwards. Mea culpa!

The patch attached in the comment 70 changes formatter profile to a custom one:

formatter_profile=_Matitiahu

(file org.eclipse.equinox.bidi\.settings\org.eclipse.jdt.ui.prefs) with one of the changes being spaces used instead of tabs.

> CxpEnv: "env" is a quite common abbreviation for "environment" in programming.

From my view point: I can only guess what "CxpEnv" means because I know the history of this enhancement request. I am afraid that for a new developer that name won't mean much.

> C has getenv and putenv functions. Unix has an env shell command.

Yes, and then there is this "vi" editor that proves that menus are unnecessary luxury :-).

> 
> CxpDoer: the idea is that this class is actually implementing the API. I have
> thought about Factor, Agent, but Doer seems the simplest expression of the
> idea. Any suggestions?

The usual no-thought-required approach is that if the API is named "abc" then the internal implementation is named "abcImpl". That is not a requirement, as long as there is a name that actually describes the purpose, it is fine. ("Doer" does not describe the purpose.)

> 
> > methods init2(), deprocess()
> 
> init2: this method is meant to re-initialize things when the environment
> changes. What about "reInit"? "reInitOnEnvChange"?

How about "setEnvironment(BiDiEnvironment)"?

> 
> deprocess: the names "process" and "deprocess" must be kept for compatibiliy
> with names in the current TextProcessor.

Good point, I agree.

> 
> > variables ceh, whichSpecial, B, L, R, AL, AN, EN, proc, cef,  
> 
> It is a common usage in Java to name instances of classes with long names with
> the lower case letters corresponding to the upper case letters of the class
> names. For instance, "isr" for an instance of InputStreamReader.
> So ceh stands for an instance of Complex Expression Helper and cef for an
> instance of Complex Expression Features.
> But I have no problem changing them to "helper" and "features" respectively.
> "proc" will be changed to "processor".

Yes, please do. That would help make lines like this easier to understand:

        cef = proc.init(ceh, this.env);

> For B, L, R, AL, AN, EN, these are acronyms defined in the Unicode
> Bidirectional Algorithm ( http://www.unicode.org/reports/tr9/ ), which should
> be known to people reviewing this part of the code.
> Changing "B" to "UnicodeBidiClassBlock" is way too verbose IMHO. Any
> suggestion? Maybe "UBA_B"?

I'd call "B" a "DIR_SEPARATOR", but that's up to you. Trading "B" for "UBA_B" somehow doesn't seem to help.

> > Please do a pass through the code to make sure you use descriptive names.

> I changed from prefix "ComplExp" to "Cxp" after getting tired from writing
> umpteen times the lengthy "ComplExp". I can revert to "ComplExp" if you insist.
> "Bidi" would be conveniently terse, but it is too generic. Eclipse may have to
> accomodate bidi support for more than complex expressions, so the prefix should
> be specific to complex expressions. Suggestions?

"Complex expression" has no meaning to general public. Is surface integral a complex expression? How about tax deductible depreciation calculations? 

Suggestion: "Bidi". 

> > + Packages: let's have a package "org.eclipse.equinox.bidi" (whithout
> > ".complexp") for generally useful APIs and "org.eclipse.equinox.bidi.custom"
> > for APIs used to enhance default functionality. 
> 
> I am not sure how the current classes would map to those packages.

The APIs serving as a replacement for the TextProcessor and APIs necessary to work with predefined expression types would go into "org.eclipse.equinox.bidi"; the APIs for framework / advanced customization / additional expression types would go into "org.eclipse.equinox.bidi.custom".

This way you can avoid dumping all "advanced" APIs on developers who won't need them.
Comment 77 Matitiahu Allouche CLA 2011-01-20 05:24:38 EST
(In reply to comment #76)
> (In reply to comment #75)
> > > Are those 3 classes supposed to be removed?
> > IcomplExpProcessor should be removed (renamed ICxpProcessor).
> > ComplExpBasic should be removed (replaced by CxpHelper for the API and CxpDoer
> > for the implementation).
> > ComplExpDoNothing is removed (replaced by the argument-less constructor of
> > CxpHelper).
> 
> Could you make sure that those changes appear in the updated patch?

I create a patch for the whole project. I don't know what special step I should take to signal that those classes have been removed (they are removed from my directories).

> 
> > ... I have the proper preference files
> > in my .settings directory. But I am sometimes using an external editor which
> > leaves spaces instead of tabs and forget to re-format afterwards. Mea culpa!
> 
> The patch attached in the comment 70 changes formatter profile to a custom one:
> 
> formatter_profile=_Matitiahu
> 
> (file org.eclipse.equinox.bidi\.settings\org.eclipse.jdt.ui.prefs) with one of
> the changes being spaces used instead of tabs.
> 

Oooouch! I don't remember what preference I meant to change, and don't understand how the space/tabs confusion got there. Anyway, I have restored the prefs files from the equinox site, and I will re-save all the java files, so that the next patch should be better.

> > CxpEnv: "env" is a quite common abbreviation for "environment" in programming.
> 
> From my view point: I can only guess what "CxpEnv" means because I know the
> history of this enhancement request. I am afraid that for a new developer that
> name won't mean much.

OK, "env" will be changed to "environment".

> > 
> > CxpDoer: the idea is that this class is actually implementing the API. I have
> > thought about Factor, Agent, but Doer seems the simplest expression of the
> > idea. Any suggestions?
> 
> The usual no-thought-required approach is that if the API is named "abc" then
> the internal implementation is named "abcImpl". That is not a requirement, as
> long as there is a name that actually describes the purpose, it is fine.
> ("Doer" does not describe the purpose.)
> 

OK, "CxpDoer" will be changed to "XyzImpl" where "Xyz" is whatever prefix is 
chosen instead of "Cxp". 

> > 
> > > methods init2(), deprocess()
> > 
> > init2: this method is meant to re-initialize things when the environment
> > changes. What about "reInit"? "reInitOnEnvChange"?
> 
> How about "setEnvironment(BiDiEnvironment)"?
> 

OK for setEnvironment, although it somewhat clashes with the directive in http://www.eclipse.org/equinox/documents/coding.php which says: "get/set prefixes should be reserved for real accessors. If the method does real work, it's not an accessor."

> 
> > For B, L, R, AL, AN, EN, these are acronyms defined in the Unicode
> > Bidirectional Algorithm ( http://www.unicode.org/reports/tr9/ ), which should
> > be known to people reviewing this part of the code.
> > Changing "B" to "UnicodeBidiClassBlock" is way too verbose IMHO. Any
> > suggestion? Maybe "UBA_B"?
> 
> I'd call "B" a "DIR_SEPARATOR", but that's up to you. Trading "B" for "UBA_B"
> somehow doesn't seem to help.

Here is what I propose to do. I will let these as they are, but add comments to explain what they are adjacent to where they are used.

> 
> > > Please do a pass through the code to make sure you use descriptive names.
> 

I will.

> > I changed from prefix "ComplExp" to "Cxp" after getting tired from writing
> > umpteen times the lengthy "ComplExp". I can revert to "ComplExp" if you insist.
> > "Bidi" would be conveniently terse, but it is too generic. Eclipse may have to
> > accomodate bidi support for more than complex expressions, so the prefix should
> > be specific to complex expressions. Suggestions?
> 
> "Complex expression" has no meaning to general public. Is surface integral a
> complex expression? How about tax deductible depreciation calculations? 
> 
> Suggestion: "Bidi". 
> 

Sorry to disagree, but I think that "Bidi" is misleading. Both in the JDK's java.text.Bidi and in ICU4J, "Bidi" is the class which transforms logically ordered text into visually ordered text. This is quite different from what we want to accomplish here.
On the other hand, I would agree to "BidiXyz" where "Xyz" is something not too long which expresses what kind of bidi processing we want to do here. The problem is that I don't find an Xyz conveniently short that would be understandable to an innocent passer-by. Help!
Comment 78 Lina Kemmel CLA 2011-01-20 08:21:09 EST
(In reply to comment #77)
> Both in the JDK's java.text.Bidi and in ICU4J, "Bidi" is the class which 
> transforms logically ordered text into visually ordered text. This is quite 
> different from what we want to accomplish here.
> On the other hand, I would agree to "BidiXyz" where "Xyz" is something not too
> long which expresses what kind of bidi processing we want to do here. The
> problem is that I don't find an Xyz conveniently short that would be
> understandable to an innocent passer-by. Help!

Maybe "BidiComplex"? :-)

The same name is used in Dojo for the class handles complex expressions:
http://dojotoolkit.org/api/1.5/dojox.string.BidiComplex
Comment 79 Matitiahu Allouche CLA 2011-02-06 12:50:03 EST
Created attachment 188406 [details]
Complete replacement for bidi complex expression support

This zip file contains complete project directories for the bidi complex expression packages and the corresponding tests. 
I have tried to address all comments to the previous version. 
Package and class names have changed, thus it may be more convenient to use this complete replacement than to deal with patch files, However, I will submit patch files in separate attachments.
Comment 80 Matitiahu Allouche CLA 2011-02-06 12:53:46 EST
Created attachment 188407 [details]
new version for bidi complex expression support

This patch corresponds to the eclipse.org.bidi part of attachment 188406 [details].
Comment 81 Matitiahu Allouche CLA 2011-02-06 12:56:55 EST
Created attachment 188408 [details]
new version for bidi complex expression tests

This patch corresponds to the org.eclipse.equinox.bidi.tests part of attachment 188406 [details].
Comment 82 Matitiahu Allouche CLA 2011-02-06 12:59:08 EST
(In reply to comment #80)

The text of comment 80 should have been:

This patch corresponds to the org.eclipse.equinox.bidi part of attachment 188406 [details].
Comment 83 Oleg Besedin CLA 2011-02-09 16:28:21 EST
Created attachment 188627 [details]
Slightly updated "new version for bidi complex expression" patches
Comment 84 Oleg Besedin CLA 2011-02-09 16:31:19 EST
(In reply to comment #80)
> Created attachment 188407 [details]
> new version for bidi complex expression support

Nicely done, I've applied patches to CVS Head with some minor adjustments:

- removed ComplExpBasic, ComplExpDoNothing. (See comment 77)
- removed overridden formatter settings in .settings\org.eclipse.jdt.core.prefs, .settings\org.eclipse.jdt.ui.prefs
- removed hardcoded reference to swt.jar from the tests's classpath
- removed IComplExpProcessor and updated reference in the extension schema to point to IBidiComplexProcessor
- updated copyright dates
- removed package.html files from internal packages. They are only necessary on API packages, and those two files did not have much contents anyway.


What's next? I did not look much into implementations far, just glossed over the APIs. Two things stand out:

1. The BidiComplexHelper being non-static. 

I guess it was made this way to work around thread-safety issue pointed in the comment 28. This is not very efficient because a new object would have to be created and then garbage collected for every Bidi call. I was hoping that the class BidiComplexImpl can be reworked instead to remove the need for class variable, rather pass them as method arguments.

2. The amount of APIs. There is a very good progress comparing to the last year version, but still we have 9 API classes with total of 44 public methods, 11 fields, and 23 constants. This seems way too much for the task. By the way, we generally try not to expose fields as APIs, rather provide getter and setter methods. (Otherwise it becomes impossible to change them later.) Also, some classes like BidiComplexEnvironment, BidiComplexFeatures would fit better ito the "custom" package, rather then the "org.eclipse.equinox.bidi" package.
Comment 85 Matitiahu Allouche CLA 2011-03-02 10:36:55 EST
(In reply to comment #84)
> By the way, we
> generally try not to expose fields as APIs, rather provide getter and setter
> methods. (Otherwise it becomes impossible to change them later.)

This comment must refer to the fields in classes BidiComplexFeatures and BidiComplexEnvironment.
I would rather not complicate the code by adding getter and setter calls.
In fact, we don't need setters since all the fields of these 2 classes are final. They are set in the class constructor and cannot be modified, so the likeliness that we need to modify their definition in the future is very low.
Do you insist on introducing the getter methods?
Comment 86 Matitiahu Allouche CLA 2011-03-02 10:50:53 EST
(In reply to comment #84)
> Also, some
> classes like BidiComplexEnvironment, BidiComplexFeatures would fit better into
> the "custom" package, rather than the "org.eclipse.equinox.bidi" package.

I agree that BidiComplexFeatures should go into the "custom" package, since it is essentially a complement to IBidiComplexProcessor allowing having some static fields associated with a processor instance. This class is only needed by developers of processors or users needing to alterate the behavior of a processor.

The situation of BidiComplexEnvironment is different: it is meant for use by end users of the complex expressions support, to define the environment in which they run, if different from default. I think that this class should stay in the "bidi" package.
Do you agree?
Comment 87 Matitiahu Allouche CLA 2011-03-03 10:20:44 EST
(In reply to comment #84)
> 1. The BidiComplexHelper being non-static. 
> 
> I guess it was made this way to work around thread-safety issue pointed in the
> comment 28. This is not very efficient because a new object would have to be
> created and then garbage collected for every Bidi call. I was hoping that the
> class BidiComplexImpl can be reworked instead to remove the need for class
> variable, rather pass them as method arguments.

If we want to avoid any class variables, this means that each unit of work must be done in one API call, and this API call must have all the necessary parameters, such as Features and Environment and return all the necessary results.

I have identified 7 current methods which provide a final result to the user, rather than setting parameters in preparation for later asking for a final result.
These methods are (with the expected augmented argument list):
1) int[] fullBidiCharOffsets(String text, IBidiComplexProcessor processor,
                             int initState, int[] finalState,
                             BidiComplexFeatures features, 
                             BidiComplexEnvironment environment)
2) int fullToLeanPos(String text, int position, IBidiComplexProcessor processor,
                     int initState, int[] finalState,
                     BidiComplexFeatures features, 
                     BidiComplexEnvironment environment)
To this one we should add fullToLeanMap to avoid repeating the same processing if the user needs more than one position:
   int fullToLeanMap(String text, IBidiComplexProcessor processor,
                     int initState, int[] finalState,
                     BidiComplexFeatures features, 
                     BidiComplexEnvironment environment)
3) String fullToLeanText(String text, IBidiComplexProcessor processor,
                         int initState, int[] finalState,
                         BidiComplexFeatures features, 
                         BidiComplexEnvironment environment)
4) int getCurDirection(String text, IBidiComplexProcessor processor,
                       BidiComplexFeatures features, 
                       BidiComplexEnvironment environment)
5) int[] leanBidiCharOffsets(String text, IBidiComplexProcessor processor,
                             int initState, int[] finalState,
                             BidiComplexFeatures features, 
                             BidiComplexEnvironment environment)
6) int leanToFullPos(String text, int position, IBidiComplexProcessor processor,
                     int initState, int[] finalState,
                     BidiComplexFeatures features, 
                     BidiComplexEnvironment environment)
To this one we should add leanToFullMap to avoid repeating the same processing if the user needs more than one position:
   int leanToFullMap(String text, IBidiComplexProcessor processor,
                     int initState, int[] finalState,
                     BidiComplexFeatures features, 
                     BidiComplexEnvironment environment)
7) String leanToFullText(String text, IBidiComplexProcessor processor,
                         int initState, int[] finalState,
                         BidiComplexFeatures features, 
                         BidiComplexEnvironment environment)

Should I go ahead with this change?

If yes, then BidiComplexHelper will only have static methods, and its name should be changed to BidiComplexEngine or something similar.
  -
Comment 88 Oleg Besedin CLA 2011-03-03 10:34:42 EST
(In reply to comment #85)
> (In reply to comment #84)
> > By the way, we
> > generally try not to expose fields as APIs, rather provide getter and setter
> > methods. (Otherwise it becomes impossible to change them later.)
> 
> This comment must refer to the fields in classes BidiComplexFeatures and
> BidiComplexEnvironment.
> I would rather not complicate the code by adding getter and setter calls.
> In fact, we don't need setters ...

Good, so no need to add setters. 

> Do you insist on introducing the getter methods?

I think it will be better to have getter methods rather then exposing class fields as APIs. 

Consider, for instance, if it turns out that a field can be lazily cached foe performance reasons. If the field is exposed as API, such future optimization can not be done.
Comment 89 Oleg Besedin CLA 2011-03-03 10:38:17 EST
(In reply to comment #86)
> (In reply to comment #84)
> The situation of BidiComplexEnvironment is different: it is meant for use by
> end users of the complex expressions support, to define the environment in
> which they run, if different from default. I think that this class should stay
> in the "bidi" package.
> Do you agree?

Yes, that seems like a good idea.
Comment 90 Oleg Besedin CLA 2011-03-03 10:47:04 EST
(In reply to comment #87)
> (In reply to comment #84)
> If we want to avoid any class variables, this means that each unit of work must
> be done in one API call, and this API call must have all the necessary
> parameters, such as Features and Environment and return all the necessary
> results.

I'd hope to have two versions:

- a call that uses "default" features and environment settings
(The features and environment would have a "default" instance to be used by those calls.)
- a call that specifies its features and environment settings

This way most people won't have to worry about obtaining or constructing features and environment.

> 
> I have identified 7 current methods which provide a final result to the user,
> rather than setting parameters in preparation for later asking for a final
> result.

Could you elaborate on how a typical end user (let's call him a "plugin developer") going to use those?

I was hoping that we can limit "plugin developer" APIs to two transformations: "lean text" to "full text" and the reverse.

> 1) int[] fullBidiCharOffsets(...)
> 2) int fullToLeanPos(...)
>    int fullToLeanMap(...)
> 3) String fullToLeanText(...)
> 4) int getCurDirection(...)
> 5) int[] leanBidiCharOffsets(...)
> 6) int leanToFullPos(...)
>    int leanToFullMap(...)
> 7) String leanToFullText(...)

As a "plugin developer" all I need is: here is the "lean" text of type ABC, please make it a "full" text I can pass to a display. 

Sometimes, I might need a reverse method: here is the "full" text with Bidi control chars, please make it into a "lean" text.
Comment 91 Matitiahu Allouche CLA 2011-03-03 11:29:05 EST
(In reply to comment #84)
> 2. The amount of APIs. 
> There is a very good progress comparing to the last year
> version, but still we have 9 API classes with total of 44 public methods, 11
> fields, and 23 constants. This seems way too much for the task.

If we implement the changes in comment #87, then we probably can remove the
get/set APIs for BidiComplexFeatures and BidiComplexEnvironment, which is a
gain of 4 APIs.
Maybe we could also remove the public attribute of all fields in
BidiComplexFeatures and BidiComplexEnvironment, which would gain another 9
APIs. Note that I personally don't like such a removal, and I am not sure that
it won't come to a dead end while implementing.

Also the following APIs can be banished to somewhere in package "bidi.custom":
  - getDirProp
  - setDirProp
  - insertMark
  - processOperator
  - setFinalState

This would leave us in the "bidi" package with:
- 9 methods to do real work
- 12 expression types
- 1 constructor for BidiComplexEnvironment
- 1 constructor and 5 methods for BidiComplexStringRecord
- 2 methods to access a processor instance
- 6 methods in BidiComplexUtil

This still seems a lot, but I don't see how to reduce this further.
Comment 92 Matitiahu Allouche CLA 2011-03-03 11:36:17 EST
(In reply to comment #90)
> (In reply to comment #87)
> > (In reply to comment #84)
> > If we want to avoid any class variables, this means that each unit of work must
> > be done in one API call, and this API call must have all the necessary
> > parameters, such as Features and Environment and return all the necessary
> > results.
> 
> I'd hope to have two versions:
> 
> - a call that uses "default" features and environment settings
> (The features and environment would have a "default" instance to be used by
> those calls.)
> - a call that specifies its features and environment settings
> 
> This way most people won't have to worry about obtaining or constructing
> features and environment.

What I have in mind is that the following parameters can be specified as null, which will fall back to default settings:
  - finalState
  - features
  - environment
Comment 93 Matitiahu Allouche CLA 2011-03-03 11:45:18 EST
(In reply to comment #90)
> > I have identified 7 current methods which provide a final result to the user,
> > rather than setting parameters in preparation for later asking for a final
> > result.
> 
> Could you elaborate on how a typical end user (let's call him a "plugin
> developer") going to use those?
> 
> I was hoping that we can limit "plugin developer" APIs to two transformations:
> "lean text" to "full text" and the reverse.
> 
> . . .
> 
> As a "plugin developer" all I need is: here is the "lean" text of type ABC,
> please make it a "full" text I can pass to a display. 
> 
> Sometimes, I might need a reverse method: here is the "full" text with Bidi
> control chars, please make it into a "lean" text.

I know of existing use in real-life products which utilize the offsets rather than the result strings. If I am not mistaken, this is for applications which manage the user interface at the keyboard level and want to make the existence of the added controls transparent to the end user. It is not enough to have the final string, the application also needs to know where exactly are the added controls. To this end, the offset APIs methods and the pos/map methods are useful.
getCurDirection is also needed for cases when the direction depends on the actual content of the source string. This direction may affect details of presentation like alignment.
Comment 94 Oleg Besedin CLA 2011-03-16 10:11:47 EDT
To me, what would help here is to revisit the intention of this enhancement.

As I understand it, the primary goal is to have a TextProcessor replacement aware of string types that require special Bidi processing, such as file names and math expressions. 

That should dictate the "simple" set of APIs. 

After that is achieved, if you'd like to make this into a a more general Bidi engine (comment 93), that would be great - as long as it does not detract from the primary purpose. The ease-of-use is most definitely a consideration, and, therefore I suggest separating additional functionality into different API packages.
Comment 95 Matitiahu Allouche CLA 2011-04-26 12:47:45 EDT
Created attachment 194078 [details]
new version for bidi complex expression support

This is a new version of the complex expression support. New in this version:
- significant decrease of the number of public APIs
- methods for processor developers are exiled into the "bidi.custom" package
- all public methods for processor users are static
- no creation of objects under the covers when calling the API to process strings
- processing one string needs one call to one method
Comment 96 Matitiahu Allouche CLA 2011-04-26 12:51:57 EDT
Created attachment 194079 [details]
new version for bidi complex expression tests

This is an update for the tests synchronized with the new version of the support updated by patch 194078.
It also adds a test program for BidiComplexStringRecord.
Comment 97 Oleg Besedin CLA 2011-04-27 10:54:04 EDT
"New version" patches applied to CVS Head, thanks Mati!
Comment 98 Thomas Watson CLA 2011-04-27 14:33:15 EDT
I assume this should not be marked as 3.7?
Comment 99 Oleg Besedin CLA 2011-04-28 16:49:41 EDT
On the latest code:

- There are both IBidiComplexProcessor and BidiComplexProcessor. The interface is not needed; the BidiComplexProcessor can be an abstract class with getFeatures() being an abstract method.

- BidiComplexFeatures - it seems that they are tied with processors. Processors need to have features and there does not seem to be much that can be done with features outside of processors. Can we move the data from features into BidiComplexProcessor and eliminate features as a separate entity?

- Arabic and Hebrew - there is code to provide behavior specific to Arabic and Hebrew scripts. The only place in the current code with the difference is in the e-mail processing: in mirrored environment dirArabic = RTL and dirHebrew = LTR. For Eclipse text processing, are there important distinctions between the two scripts? How about other LTR scripts? 

- BidiComplexFeatures#ignoreArabic; #ignoreHebrew - are those really needed? They don't seem to be used at the moment except for tests. An example that illustrates how they can be used would help here.

- BidiComplexUtil#process(): The "new BidiComplexProcessor()" is inconsistent as this class really is abstract - getFeatures() needs to be implemented. Should it use BidiComplexDelims or its derivative?

- The methods on BidiComplexEngine that take "Object processor" as an argument. That is confusing; I am not sure which will be best, for now I'd suggest all "simple" API methods to take String; if String = null then equivalent of TextProcessor is implied. (Same "Object" issue applies as it propagates into implementation in BidiComplexImpl.)

- BidiComplexImpl#leanToFullCommon() returns either int[] or String. This seems like two methods with String one using the int[].

- BidiComplexImpl#leanToFullCommon()
It seems that call to 
   int orient = getCurOrient(environment, text, dirProps);
uses dirProps which is not initialized at that point.

- Terminology: BidiComplexFeatures#operators => those would be easier to understand if they were called "separators" or "delimiters".
Comment 100 Matitiahu Allouche CLA 2011-05-01 12:05:21 EDT
(In reply to comment #99)
> On the latest code:
> 
> - There are both IBidiComplexProcessor and BidiComplexProcessor. The interface
> is not needed; the BidiComplexProcessor can be an abstract class with
> getFeatures() being an abstract method.
> 
Yes, this is possible, and it was my initial design, when the corresponding component was named ComplExpBasic, and all processors were extensions thereof. I even suggested something similar in comment #39.
However, from comments #28, #42 and #44 I got the impression that it is better to isolate in an interface those APIs that processors are assumed to implement and to leave in a separate class all the methods used by processors which should normally not be overridden.

Personally, I prefer the way of things as they are now, i.e. having a minimal interface and a support class, but changing it as suggested in comment #99 is no problem.
Please reconsider the whole picture and give advice.
Comment 101 Matitiahu Allouche CLA 2011-05-01 12:49:55 EDT
(In reply to comment #99)
> On the latest code:
> 
> - BidiComplexFeatures - it seems that they are tied with processors. Processors
> need to have features and there does not seem to be much that can be done with
> features outside of processors. Can we move the data from features into
> BidiComplexProcessor and eliminate features as a separate entity?
> 
We can do that. If we do, users needing to use a processor with a modified value for some of its features will have to subclass this processor rather than to use it with a private instance of BidiComplexFeatures. (There is a use case of modifying the delimiters in BidiComplexUtil.process).
I see one advantage and some inconvenients in this proposal.
Pro: in the user APIs, the BidiComplexFeatures argument becomes unneeded.
Cons:
  - The user needs to deal with the whole processor instead of only the features.
  - It will take more lines to subclass the processor than to create a BidiComplexFeatures instance.
  - All the getters of BidiComplexFeatures will be transferred to BidiComplexProcessor. Generally, it is better to divide and conquer.
  - When subclassing with all fields being static, all decisions have to be made at compile time. With a separate BidiComplexFeatures instance, it is easier to defer some choices until run time, for instance choices based on the environment (.e.g. locale).

Please let me know if you maintain your advice.
Comment 102 Matitiahu Allouche CLA 2011-05-01 16:52:52 EDT
(In reply to comment #99)
> On the latest code:
> 
> - Arabic and Hebrew - there is code to provide behavior specific to Arabic and
> Hebrew scripts. The only place in the current code with the difference is in
> the e-mail processing: in mirrored environment dirArabic = RTL and dirHebrew =
> LTR. For Eclipse text processing, are there important distinctions between the
> two scripts? How about other LTR scripts? 
> 
The way of presenting complex expressions should be standardized by the national bureau of each country using RTL scripts.
For Israel, a draft standard exists which is in the process of getting approval before publishing to the public at large.
For the Arabic countries, there is no standard that I am aware of. We know that behavior different from the Hebrew standard is expected by Arabic users for display of email addresses and probably also URLs (or rather IRIs), possibly regular expressions and XPath.
The API must be flexible enough to accomodate different preferences according to the country, or at least the language.
Comment 103 Matitiahu Allouche CLA 2011-05-01 17:01:48 EDT
(In reply to comment #99)
> On the latest code:
> 
> - BidiComplexFeatures#ignoreArabic; #ignoreHebrew - are those really needed?
> They don't seem to be used at the moment except for tests. An example that
> illustrates how they can be used would help here.
> 
As I wrote in comment #102, the way of presenting complex expressions for Arabic users is not standardized. In feedback that I received from some Arabic experts, they expressed the opinion that some types of complex expression (e.g. IRIs) must be displayed by the pure Unicode Bidi Algorithm, without any added directional control characters. This is the reason for the ignoreArabic flag.
I know of no current use case for the ignoreHebrew flag, but the inverse situation could also happen, thus the ignoreHebrew flag. It is also there for symmetry.
Comment 104 Matitiahu Allouche CLA 2011-05-01 17:11:47 EDT
(In reply to comment #99)
> On the latest code:
> 
> - BidiComplexUtil#process(): The "new BidiComplexProcessor()" is inconsistent
> as this class really is abstract - getFeatures() needs to be implemented.
> Should it use BidiComplexDelims or its derivative?
> 
No, this class is not abstract: it implements a processor which splits the source text according to delimiters. Admittedly, the user must supply his own BidiComplexFeatures argument, since BidiComplexProcessor.getFeatures() throws an exception.
In BidiComplexUtil.process(), the caller wants to specify his own delimiters and does supply a BidiComplexFeatures argument.
Comment 105 Matitiahu Allouche CLA 2011-05-01 17:31:28 EDT
(In reply to comment #99)
> On the latest code:
> 
> - The methods on BidiComplexEngine that take "Object processor" as an argument.
> That is confusing; I am not sure which will be best, for now I'd suggest all
> "simple" API methods to take String; if String = null then equivalent of
> TextProcessor is implied. (Same "Object" issue applies as it propagates into
> implementation in BidiComplexImpl.)
> 
The current ability to supply a processor reference as first argument to BidiComplexEngine methods has 2 purposes:
1. Performance: if an application needs to use repeatedly the same processor, it can get a reference to the processor by giving its type string and from now on use the processor reference when invoking the user API. This avoids having to look up the processor in the registry each time a method is invoked,
2. It allows using processors which have not been registered, thus providing maximal flexibility.

A way to avoid having an Object processor argument could be to add to BidiComplexEngine a method getProcessor() taking a String argument and returning a (I)BidiComplexProcessor reference. Then the other methods in BidiComplexEngine will have as their first argument a (I)BidiComplexProcessor.
This would add a getProcessor() call for users which always use the String processor type and can currrently use this string directly as first argument.
Comment 106 Matitiahu Allouche CLA 2011-05-01 17:55:20 EDT
(In reply to comment #99)
> On the latest code:
> 
> - BidiComplexImpl#leanToFullCommon() returns either int[] or String. This seems
> like two methods with String one using the int[].
> 
The same leanToFull logic may return its result either as an array of offsets or as a ready-for-presentation string. In both cases, the logic is the same and most of the code is common. I did not want to duplicate the common code, thus the way this method returns its results.

I can replace leanToFullCommon() by 3 methods corresponding to the 3 current values of the "option" argument, and these 3 methods will each invoke a method which does most of the common processing. There will be some code duplication but not too much.
Comment 107 Matitiahu Allouche CLA 2011-05-01 18:06:51 EDT
(In reply to comment #99)
> On the latest code:
> 
> - BidiComplexImpl#leanToFullCommon()
> It seems that call to 
>    int orient = getCurOrient(environment, text, dirProps);
> uses dirProps which is not initialized at that point.
> 
As far as I know, creating a new byte array for dirProps (in the line preceding the call to getCurOrient() mentioned above) initializes all array elements to 0.
getCurOrient() checks for entries equal to 0 and may update some of them.
Comment 108 Matitiahu Allouche CLA 2011-05-01 18:11:41 EDT
(In reply to comment #99)
> On the latest code:
> 
> - Terminology: BidiComplexFeatures#operators => those would be easier to
> understand if they were called "separators" or "delimiters".

No problem. "Delimiters" is used with another meaning in BidiComplexDelims, so I will probably replace "operators" by "separators".
Comment 109 Matitiahu Allouche CLA 2011-05-09 11:05:39 EDT
Created attachment 195082 [details]
New version of Complex Expression package

This version implements comment #106 and comment #108 .
Comment 110 Matitiahu Allouche CLA 2011-05-09 11:10:58 EDT
Created attachment 195084 [details]
minor changes to test programs following implementation of comment #108

This patch implements comment #108 which changes terminology from "operators" to "separators".
Comment 111 Matitiahu Allouche CLA 2011-05-09 11:13:42 EDT
Any reaction to comments 101 to 105?
Comment 112 Matitiahu Allouche CLA 2011-05-09 11:16:23 EDT
Correction: any reaction to comments 100 to 105?
Comment 113 Oleg Besedin CLA 2011-05-17 15:57:13 EDT
Patches "New version of Complex Expression package" and "minor changes to test programs following implementation of comment #108" applied to CVS Head. Thanks Mati!
Comment 114 John Arthorne CLA 2011-05-17 16:28:39 EDT
Oleg, can you put iplog+ on any patches from Mati that you applied to head?
Comment 115 Oleg Besedin CLA 2011-05-17 16:42:48 EDT
(In reply to comment #114)
> Oleg, can you put iplog+ on any patches from Mati that you applied to head?

Done.
Comment 116 Matitiahu Allouche CLA 2011-05-25 12:41:11 EDT
Created attachment 196575 [details]
new version of package for structured text support

This patch implements change of terminology from "complex expression" to "structured text".
All class names have their prefix changed from "BidiComplex" to "SText".
There is no other change (e.g. change in program logic) in this patch.
Comment 117 Matitiahu Allouche CLA 2011-05-25 12:45:25 EDT
Created attachment 196576 [details]
tests for new version of package for structured text support

This attachment synchronizes the tests with the version of the structured text support package updated by attachment #196575 [details].
Comment 118 Oleg Besedin CLA 2011-05-27 10:08:36 EDT
Patched from 05-25 applied to CVS Head. Thanks Mati!
Comment 119 Oleg Besedin CLA 2011-06-10 09:48:38 EDT
(In reply to comment #100)
> (In reply to comment #99)
> > On the latest code:
> > 
> > - There are both IBidiComplexProcessor and BidiComplexProcessor. The interface
> > is not needed; the BidiComplexProcessor can be an abstract class with
> > getFeatures() being an abstract method.
> > 
> Yes, this is possible, and it was my initial design, when the corresponding
> component was named ComplExpBasic, and all processors were extensions thereof.
> I even suggested something similar in comment #39.
> However, from comments #28, #42 and #44 I got the impression that it is better
> to isolate in an interface those APIs that processors are assumed to implement
> and to leave in a separate class all the methods used by processors which
> should normally not be overridden.
> 
> Personally, I prefer the way of things as they are now, i.e. having a minimal
> interface and a support class, but changing it as suggested in comment #99 is
> no problem.
> Please reconsider the whole picture and give advice.

The ISTextProcessor (formerly IBidiComplexProcessor) and STextProcessor (formerly BidiComplexProcessor) describe the same API. So, you have two descriptions of the same methods. The are obvious drawbacks to this and no advantages.

A rule of thumb is that objects that can be extended by consumers are exposed as classes and objects that can't are exposed as interfaces. (The underlying reason is the ease of making changes while maintinng API compatibility.)

As in your case the object (text processor implementation) is expected to be subclassed by the consumer, we should leave STextProcessor and remove ISTextProcessor.
Comment 120 Oleg Besedin CLA 2011-06-10 09:52:57 EDT
(In reply to comment #101)
> (In reply to comment #99)
> > On the latest code:
> > 
> > - BidiComplexFeatures - it seems that they are tied with processors. Processors
> > need to have features and there does not seem to be much that can be done with
> > features outside of processors. Can we move the data from features into
> > BidiComplexProcessor and eliminate features as a separate entity?
> > 
> We can do that. If we do, users needing to use a processor with a modified
> value for some of its features will have to subclass this processor rather than
> to use it with a private instance of BidiComplexFeatures. (There is a use case
> of modifying the delimiters in BidiComplexUtil.process).
> I see one advantage and some inconvenients in this proposal.
> Pro: in the user APIs, the BidiComplexFeatures argument becomes unneeded.
> Cons:
>   - The user needs to deal with the whole processor instead of only the
> features.

The typical consumer won't be creating their own processors, but rather use predefined processors included with Eclipse.

>   - It will take more lines to subclass the processor than to create a
> BidiComplexFeatures instance.

For consumers who will create their own processors relevant data currently maintained by the BidiComplexFeatures can be passed in via class constructors.

>   - All the getters of BidiComplexFeatures will be transferred to
> BidiComplexProcessor. Generally, it is better to divide and conquer.
>   - When subclassing with all fields being static, all decisions have to be
> made at compile time. 

If fields are truly static, shouldn't they go in the "environment" portion?
Comment 121 Oleg Besedin CLA 2011-06-10 09:57:10 EDT
(In reply to comment #102)
> (In reply to comment #99)
> > On the latest code:
> > 
> > - Arabic and Hebrew - there is code to provide behavior specific to Arabic and
> > Hebrew scripts. The only place in the current code with the difference is in
> > the e-mail processing: in mirrored environment dirArabic = RTL and dirHebrew =
> > LTR. For Eclipse text processing, are there important distinctions between the
> > two scripts? How about other LTR scripts? 
> > 
> The way of presenting complex expressions should be standardized by the
> national bureau of each country using RTL scripts.
> For Israel, a draft standard exists which is in the process of getting approval
> before publishing to the public at large.
> For the Arabic countries, there is no standard that I am aware of. We know that
> behavior different from the Hebrew standard is expected by Arabic users for
> display of email addresses and probably also URLs (or rather IRIs), possibly
> regular expressions and XPath.
> The API must be flexible enough to accomodate different preferences according
> to the country, or at least the language.

I see, thank you for the explanation.
Comment 122 Oleg Besedin CLA 2011-06-10 10:00:11 EDT
(In reply to comment #104)
> (In reply to comment #99)
> > On the latest code:
> > 
> > - BidiComplexUtil#process(): The "new BidiComplexProcessor()" is inconsistent
> > as this class really is abstract - getFeatures() needs to be implemented.
> > Should it use BidiComplexDelims or its derivative?
> > 
> No, this class is not abstract: it implements a processor which splits the
> source text according to delimiters. Admittedly, the user must supply his own
> BidiComplexFeatures argument, since BidiComplexProcessor.getFeatures() throws
> an exception.

Well, that makes it an abstract class :-).
Comment 123 Oleg Besedin CLA 2011-06-10 10:03:43 EDT
(In reply to comment #105)
> (In reply to comment #99)
> > On the latest code:
> > 
> > - The methods on BidiComplexEngine that take "Object processor" as an argument.
> > That is confusing; I am not sure which will be best, for now I'd suggest all
> > "simple" API methods to take String; if String = null then equivalent of
> > TextProcessor is implied. (Same "Object" issue applies as it propagates into
> > implementation in BidiComplexImpl.)
> > 
> The current ability to supply a processor reference as first argument to
> BidiComplexEngine methods has 2 purposes:
> 1. Performance: if an application needs to use repeatedly the same processor,
> it can get a reference to the processor by giving its type string and from now
> on use the processor reference when invoking the user API. This avoids having
> to look up the processor in the registry each time a method is invoked,
> 2. It allows using processors which have not been registered, thus providing
> maximal flexibility.
> 
> A way to avoid having an Object processor argument could be to add to
> BidiComplexEngine a method getProcessor() taking a String argument and
> returning a (I)BidiComplexProcessor reference. Then the other methods in
> BidiComplexEngine will have as their first argument a (I)BidiComplexProcessor.
> This would add a getProcessor() call for users which always use the String
> processor type and can currrently use this string directly as first argument.

We can also create processor instances for the know types, such as those described in ISTextTypes and make them available as "final public static" somewhere. 

At any rate, it should not be an "Object". That is both confusing and error-prone.
Comment 124 Matitiahu Allouche CLA 2011-07-04 15:46:52 EDT
Created attachment 199068 [details]
new version of package for structured text support

This new version implements the following comments:
  - comment #119: removal of ISTextProcessor
  - comment #120: removal of STextFeatures and transfer of features into STextProcessor
  - comment #121: language-specific handling is internal to processor code
  - comment #122: STextUtil now implements getSeparators
  - comment #123: no more Object arguments to STextEngine methods; only processor instances are used.

In addition, the API of STextStringRecord has been modified as follows:
  - removal of triplets
  - hopefully more intuitive API
  - re-use of STextStringRecord instances to minimize creation and destruction of objects
Comment 125 Matitiahu Allouche CLA 2011-07-04 15:52:42 EDT
(In reply to comment #124)
Another change implemented by attachment 199068 [details]: the list of String literals for processor types in ISTextTypes has been removed.
Corresponding pre-defined processor instances for the built-in types of processors are now present in STextEngine.
Comment 126 Matitiahu Allouche CLA 2011-07-04 15:56:52 EDT
Created attachment 199070 [details]
tests for new version of package for structured text support

This is an update for the tests synchronized with the new version of the
support updated by patch 199068.
Comment 127 Matitiahu Allouche CLA 2011-07-05 08:50:38 EDT
Created attachment 199113 [details]
Fix for problem in STextStringRecord

This patch complements patch 199068 by fixing some problem in STextStringRecord.
Comment 128 Oleg Besedin CLA 2011-07-13 10:52:15 EDT
Thank you! Patches from comments 124 - 127 applied and pushed into Git "master" branch.

Mati, note that Equinox switched to use Git as a source control system; the "live" repository is located at git://git.eclipse.org/gitroot/equinox/rt.equinox.bundles.git .
Comment 129 Oleg Besedin CLA 2011-07-18 14:23:33 EDT
Created attachment 199855 [details]
Patch - minor API changes and initial Javadoc cleanup

Changes:

- cleaned up Javadoc to remove some references to internal implementaion and make it less verbose. The current Javadoc is by no means final and will need more attention
- changed implementation and tests to use environment, rather then get/set Locale directly - this exposes some little issues with dynamic environment
- combined several methods in STextEnvironment into one method #isProcessingNeeded(), removed static methods from this class
- added a constructors to the base processor that takes separator as an argument and removed STextUtil#MyProcessor
- renamed STextUtil#processTyped() into process()

This looks like a large code change, but in reality it is several minor tweaks with initial Javadoc cleanup.

The change is pushed into the Git master branch.
Comment 130 Oleg Besedin CLA 2011-07-18 15:05:25 EDT
Combined all functionality related to obtaining processors into STextProcessorFactory (from STextStringProcessor, STextEngine).

Commit ID 2883815e408713e8fb8200d84294a2b8789943e7.
Comment 131 Matitiahu Allouche CLA 2011-07-20 10:41:05 EDT
Created attachment 200008 [details]
Minor comment fix in a test program

This patch is a test to see if I can produce a valid patch using EGit.
It is *not* in git patch format. I hope that this is the proper choice.

Oleg, please verify if this patch is valid.
Comment 132 Oleg Besedin CLA 2011-07-20 10:48:45 EDT
(In reply to comment #131)
> Oleg, please verify if this patch is valid.

Yes, that works.
Comment 133 Matitiahu Allouche CLA 2011-07-21 05:06:29 EDT
(In reply to comment #129)

> - cleaned up Javadoc to remove some references to internal implementaion and
> make it less verbose. The current Javadoc is by no means final and will need
> more attention

1. Verbosity is a matter of taste and our appreciation may differ. 

2. However, you have removed user-oriented information about what APIs do, such as which bidi formatting characters are added when using which option. Without having this info available in the javadoc, the user has no recourse but looking into the source code.

3. In particular, method isProcessingNeeded in class STextEnvironment will not be used unless the javadoc explains how the result is determined.

4. In many places, references of the form {@link STextClass#someMethod someMethod} have been replaced by {@link STextClass#someMethod}. This causes the method to be listed with its full argument list, which IMHO blurs the message.
With the first form, the reader may follow the link if he wants to see the details of this API.
I don't understand the rationale for these changes.
Comment 134 Matitiahu Allouche CLA 2011-07-21 10:21:19 EDT
(In reply to comment #129)

> - changed implementation and tests to use environment, rather then get/set
> Locale directly - this exposes some little issues with dynamic environment

The use of get/setLocale in the tests was not arbitrary. Users may use the default locale for other reasons than handling structured text. The implementation of SText should be able to accomodate this situation, and the tests were written to check that it does.

Besides, I see a problem with the new code in STextEnvironment.
STextEnvironment.DEFAULT is a static instance. Its language member will be set when the class is loaded, according to the default locale at this moment.
Different threads using different locales may refer to the DEFAULT environment either explicitly or by calling STextEngine methods with a null environment argument. They are entitled to assume that the DEFAULT environment behaves according to their own default locale (which can be different from the default locale at load time), and this will not happen.
A similar problem will surface if the threads query the isProcessingNeeded method in STextEnvironment.DEFAULT, since the processingNeeded variable is a cached value and is computed once and for all.

The previous implementation did not have this problem, since it accessed the default locale each time an API was called which relied on that locale.

This is a real problem, since it is to be expected, and even recommended, that users use the DEFAULT environment, for two reasons:
- they should not have to manage both the locale and the STextEnvironment unless they need to specify non-default mirroring and orientation values.
- they should not have to create STextEnvironment instances if all they need is the DEFAULT environment.
Comment 135 Matitiahu Allouche CLA 2011-07-21 10:24:47 EDT
(In reply to comment #129)

> - renamed STextUtil#processTyped() into process()

Since STextUtil exists for providing compatibility with the existing TextProcessor API, I am not sure it is a good idea to change this API.
On the other hand, this is a minor change and the class name has changed anyway, so it cannot be too bad.
Comment 136 Matitiahu Allouche CLA 2011-07-21 10:29:12 EDT
Since after all the changes the only class remaining in org.eclipse.equinox.bidi.custom is STextProcessor, is it conceivable to transfer this class into org.eclipse.equinox.bidi and get rid of the .custom package?
Comment 137 Oleg Besedin CLA 2011-07-21 11:24:50 EDT
(In reply to comment #133)
> 2. However, you have removed user-oriented information about what APIs do, such
> as which bidi formatting characters are added when using which option. Without
> having this info available in the javadoc, the user has no recourse but looking
> into the source code.

This information is an implementation detail. All that consumer should be interested in is that the string will be processed so that it will be properly displayed. 

Describing implementation details somewhere might be interesting, but they should not be forced on users (meaning, it should be a separate help page - an article describing inner workings would be a very good thing) and they should say somewhere "this is not API, this is implementation detail. This is how we do things now, and we might change that later."

Providing that many details will both limit how you can change the code in future and will distract consumers with details they they don't need to know. 

Obligatory car example: do you expect that a car driver will be interested in knowing color wiring in the car ignition system? 

For the specific Javadoc in question, I moved it onto internal class STextImpl#leanToFullText().

That applies to points (1) and (3) in your comment.

===

In contrast, some API arguments such as
 byte[] dirProps, 
 int[] offsets, 
 int caseNumber
need to have a much better descriptions, or, probably, be turned into separate classes. 
For instance, for the dirProps:
 * @param  dirProps is a parameter received by <code>processSpecial</code>
 *         uniquely to be used as argument for calls to methods which
 *         need it.

In reality this is an array keeping directionality information for each character in the text with "0" meaning "unknown", and the rest being subset of static values from the Character class shifted by 2. In addition, it has an extra element at the end that keeps current orientation.

This is the information that a person implementing their own text processor would need to know.

> 4. In many places, references of the form {@link STextClass#someMethod
> someMethod} have been replaced by {@link STextClass#someMethod}. This causes
> the method to be listed with its full argument list, which IMHO blurs the
> message.
> With the first form, the reader may follow the link if he wants to see the
> details of this API.
> I don't understand the rationale for these changes.

That is how it is done in all other Javadoc. For a random example, see

http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjface%2Ftext%2FITextHoverExtension2.html

or Java help itself:

http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Character.html#MAX_RADIX
Comment 138 Oleg Besedin CLA 2011-07-21 11:54:45 EDT
(In reply to comment #134)
> The use of get/setLocale in the tests was not arbitrary. Users may use the
> default locale for other reasons than handling structured text.
...
> Different threads using different locales ...

There is a contradiction here. To be precise, the code used to do:

  Locale.setDefault(new Locale("..."));

That is the static value and won't work well with multiple threads using different locales. (Which is a real use case that we have today in RAP.)

By contrast, SText's environment can be easily set to different things on a per thread or per call basis, if needed.

Hopefully that examplains why I changed Locale#get/setDefault() code to use SText environment. 

> Besides, I see a problem with the new code in STextEnvironment.
> STextEnvironment.DEFAULT is a static instance. Its language member will be set
> when the class is loaded, according to the default locale at this moment.

Yes, I agree, if you expect default language or orientation to change during the run, that would be a problem.

At present, changing default locale forces Eclipse to restart. In recent releases we added a mode where Eclipse supports multiple locales (on RCP level, not on IDE level), however, even then it still has a default locale that does not change while the instance is running.

To me, this approach works:
- you have a default language obtained from OS and runtime options which is not changing during the runtime (default SText environment);
- there is a way to request a different language (create a new instance of SText environment). 

*Note that multi-language support is done only on the extension registry level; it is not something presently tested or supported on IDE level.
Comment 139 Matitiahu Allouche CLA 2011-07-22 07:18:07 EDT
(In reply to comment #137)
> (In reply to comment #133)
> > 2. However, you have removed user-oriented information about what APIs do, such
> > as which bidi formatting characters are added when using which option. Without
> > having this info available in the javadoc, the user has no recourse but looking
> > into the source code.
> 
> This information is an implementation detail. All that consumer should be
> interested in is that the string will be processed so that it will be properly
> displayed. 
. . .
> Obligatory car example: do you expect that a car driver will be interested in
> knowing color wiring in the car ignition system? 
> 
This is not the proper metaphore IMHO. A car driver is not interested in the color ignition because it does not make any practical difference for him. But he should be interested e.g. in the Miles Per Gallon performance of the vehicle, because it influences decisions about budget, whether and when to use the car, etc...
Similarly, a bidi-aware user should be interested in what the API will do to the text, specially when this text may interact with neighboring text.
Comment 140 Matitiahu Allouche CLA 2011-07-22 07:35:01 EDT
(In reply to comment #137)
> (In reply to comment #133)
 

> In contrast, some API arguments such as
>  byte[] dirProps, 
>  int[] offsets, 
>  int caseNumber
> need to have a much better descriptions, or, probably, be turned into separate
> classes. 
> For instance, for the dirProps:
>  * @param  dirProps is a parameter received by <code>processSpecial</code>
>  *         uniquely to be used as argument for calls to methods which
>  *         need it.
> 
> In reality this is an array keeping directionality information for each
> character in the text with "0" meaning "unknown", and the rest being subset of
> static values from the Character class shifted by 2. In addition, it has an
> extra element at the end that keeps current orientation.
> 
> This is the information that a person implementing their own text processor
> would need to know.
> 
Absolutely not! The person implementing a text processor has methods getDirProp and setDirProp to get and set the character info, and he should not access directly the dirProps, offsets or caseNumber parameters, which are and should stay opaque for him.
The only reason they appear as parameters in the processor API is that we have no other means to exchange data between the code in STextImpl and the methods invoked by the processor.
Comment 141 Oleg Besedin CLA 2011-07-26 14:37:20 EDT
Created attachment 200389 [details]
A possible change for direction cache

(In reply to comment #140)
> Absolutely not! The person implementing a text processor has methods getDirProp
> and setDirProp to get and set the character info, and he should not access
> directly the dirProps, offsets or caseNumber parameters, which are and should
> stay opaque for him.

Yes, I agree, and that's a good argument for turning them into classes. Please have a look at this patch - it changes dirProps into a class. 

(Please ignore STextProcessorMultipass, that's a side thought that I am not yet sure about.)
Comment 142 Matitiahu Allouche CLA 2011-07-31 08:36:31 EDT
Created attachment 200629 [details]
Clean up after adding STextDirections

Clean up after adding STextDirections.

STextDirections has been renamed STextCharTypes.
STextProcessor.getSeparators and .getSpecialsCount have had superfluous
parameters removed.
Changed name of baseOrientation in STextCharTypes to orientation.
Several more minor changes.
Comment 143 Matitiahu Allouche CLA 2011-07-31 08:38:36 EDT
Created attachment 200630 [details]
Clean up test programs after adding STextCharTypes

This attachment is the counterpart of attachment 200629 [details] for test programs.
Comment 144 Matitiahu Allouche CLA 2011-07-31 09:01:41 EDT
Created attachment 200631 [details]
Add getDirection variant without STextCharTypes parameter for a few processor types

Add getDirection variant without STextCharTypes parameter for a few
processor types.

This should have been part of attachment 200629 [details].
Comment 145 Matitiahu Allouche CLA 2011-08-02 11:55:06 EDT
Created attachment 200742 [details]
Change STextEngine to STextHandler which can be instantiated

Change STextEngine to STextHandler which can be instantiated.

There are 3 new classes:
- STextHandler (was STextEngine)
- STextProcessorData (data structure which is passed to called methods)
- STextOffsets (API for dealing with the offsets array)

With this change,
- the "state" is handled automatically and does not need to be
  manipulated by the user/consumer.
- the parameter list in STextProcessor methods is streamlined.
- the handling of offsets is isolated in class STextOffsets.
Comment 146 Matitiahu Allouche CLA 2011-08-02 11:57:18 EDT
Created attachment 200743 [details]
Update test programs after changes in attachment 200742 [details]

This patch updates the test programs after the main code has been changed by attachment 200742 [details].
Comment 147 Matitiahu Allouche CLA 2011-08-03 08:11:51 EDT
Created attachment 200803 [details]
Remove 2 fields in STextProcessorData, fix STextCharTypes

Remove 2 fields in STextProcessorData, fix STextCharTypes

In STextProcessorData, remove fields orientation and prefixLength.

In STextCharTypes
- rename getOrientation to resolveOrientation (to avoid confusion with
  method of STextEnvironment)
- modify logic in resolveOrientation
- rename dirProp to charType and dirProps to charTypes in the whole
  package.
Comment 148 Matitiahu Allouche CLA 2011-08-03 08:13:52 EDT
Created attachment 200805 [details]
Update test programs after fixes to STextCharTypes

Update test programs after fixes to STextCharTypes in attachment 200803 [details]
Comment 149 Oleg Besedin CLA 2011-08-03 15:39:14 EDT
Applied patches 200629, 200630, 200631. Thank you!
Comment 150 Matitiahu Allouche CLA 2011-08-04 07:51:47 EDT
Created attachment 200905 [details]
Add STextOffsets, some code fixes, rename dirProp to charType

Class STextOffsets is added to manage the array of offsets.
Some code fixes are transferred from the version with STextHandler.
dirProp, dirProps are renamed everywhere charType, charTypes.
Comment 151 Matitiahu Allouche CLA 2011-08-04 07:53:25 EDT
Created attachment 200907 [details]
update test programs after adding STextOffsets

Update test programs following attachment 200905 [details].
Comment 152 Oleg Besedin CLA 2011-08-04 10:43:18 EDT
Applied patches 200905 and 200907. I made couple tiny changes to STextOffsets: added copyright header, added visibility qualifiers to its fields.

Thank you!
Comment 153 Matitiahu Allouche CLA 2011-08-13 20:08:31 EDT
Created attachment 201464 [details]
Cleanup, Rename and Merge Impl

Cleanup, big renaming operation, merge implementation files
Comment 154 Matitiahu Allouche CLA 2011-08-13 20:10:04 EDT
Created attachment 201465 [details]
Tests after renaming and merging
Comment 155 Matitiahu Allouche CLA 2011-08-13 23:58:50 EDT
Created attachment 201467 [details]
Cleanup, Rename, Merge Impl

This patch replaces patch 201464 (only touches the bidi packages, not the tests packages).
Comment 156 Matitiahu Allouche CLA 2011-08-14 00:11:36 EDT
Created attachment 201468 [details]
Tests after Cleanup, Rename, Merge Impl

This patch replaces patch 201465
Comment 157 Matitiahu Allouche CLA 2011-08-14 00:35:29 EDT
Created attachment 201469 [details]
CHange state to Object
Comment 158 Matitiahu Allouche CLA 2011-08-14 00:36:42 EDT
Created attachment 201470 [details]
Tests after changing state to Object
Comment 159 Oleg Besedin CLA 2011-08-15 11:41:33 EDT
Thank you! I applied patches "Cleanup, Rename, Merge Impl", "Tests after Cleanup, Rename, Merge Impl", "CHange state to Object", "Tests after changing state to Object".
Comment 160 Oleg Besedin CLA 2011-08-15 16:43:24 EDT
I took a liberty to change how state is exposed in APIs:
 - started passing the "expert" into STextTypeHandler#processSpecial() and using its set/getState() methods
 - removed STextState
 - rolled methods from ISTextExpertStateful into ISTextExpert and removed ISTextExpertStateful

I very slightly changed STextImpl (removed most "static" qualifiers and cleaned up state handling), but tried to keep my changes to the minimum to ease comparison.

Git log reference:
http://git.eclipse.org/c/equinox/rt.equinox.bundles.git/commit/?id=08145c3ce3c347e8e90f47329ac7db9775ff164f
Comment 161 Matitiahu Allouche CLA 2011-08-17 17:30:16 EDT
Created attachment 201676 [details]
Pass ISTextExpert instance as argument to STextTypeHandler methods, and other various changes

- Add checks to STextProcessor#insertMarks() on "direction" argument

- STextProcessor#process(String str, STextTypeHandler handler), STextProcessor#deprocess(String str, STextTypeHandler handler) -> make it STextProcessor#processType(String str, String stringType)

- ISTextExpert#getCurDirection() renamed to getTextDirection()

- ISTextExpert#resetState() - rename to clearState()

- STextEnvironment#ORIENT_CONTEXTUAL_LTR , ORIENT_CONTEXTUAL_RTL -> this shoudl be defined as "ORIENT_CONTEXTUAL | ORIENT_LTR ", "ORIENT_CONTEXTUAL = 1 << 1",
"ORIENT_UNKNOWN = 1 << 2", "ORIENT_IGNORE = 1 << 3"

- STextOffsets#ensureRoom() has been integrated into insertOffset().

- STextOffsets#getArray() renamed to getOffsets()

- STextOffsets#resetCount() renamed to clear()

- STextTypeHandler: instead of passing to all methods "STextEnvironment 
environment", pass the "expert" as an argument and add "ISTextExpert#getEnvironment() to allow greated flexibility.
A similar change has been made in STextCharTypes.

- The "internal" methods in STextImpl have been removed and the processing code moved to the body of the public methods.
Most member fields of STextImpl have been made final.
Added API for getEnvironment (used by Type handlers and getTypeHandler (used by STextCharTypes.
Comment 162 Matitiahu Allouche CLA 2011-08-17 17:33:43 EDT
Created attachment 201677 [details]
Tests after passing ISTextExpert instance as argument to STextTypeHandler methods, and other various changes
Comment 163 Oleg Besedin CLA 2011-08-18 09:52:53 EDT
Thank you! Patches 201676 and 201677 applied to Git master.

http://git.eclipse.org/c/equinox/rt.equinox.bundles.git/commit/?id=443d39df762926eff2c600a9cfd0f1e620dd7cb8
Comment 164 Matitiahu Allouche CLA 2011-08-19 08:19:02 EDT
Created attachment 201785 [details]
Move insertMarks() from STextProcessor to STextImpl, and other various changes

- Move insertMarks() from STextProcessor to STextImpl
- Remove STextDirection
- Add STextProcessor#getDefaultSeparators() and use it in STextExpertFactory
- Remove STextTypeHandlerFactory#getAllHandlerIDs()
- restore TestHandler1,2,3 and TestHandlerMyCommaxx as internal classes and remove the separate files
- make corresponding changes in plugin.xml
- Rename STextExpertFactory#getPrivateExpert() methos to getStatefulExpert()
Comment 165 Matitiahu Allouche CLA 2011-08-19 08:21:09 EDT
Created attachment 201787 [details]
Tests after moving insertMarks to STextImpl
Comment 166 Oleg Besedin CLA 2011-08-19 09:28:38 EDT
Patches 201785 and 201787 applied to Git master. Thank you!
Comment 167 Matitiahu Allouche CLA 2011-08-25 15:39:01 EDT
Created attachment 202172 [details]
Mainly javadoc update

Also a few minor changes:
- rename STextExpertFactory#getExpert(STextTypeHandler handler, STextEnvironment environment) to #getStatefulExpert(...).
- add some parameter checking in STextExpertFactory#getExpert and getStatefulExpert.
- remove unneeded emvironment argument in charTypes.resolveOrientation().
Comment 168 Matitiahu Allouche CLA 2011-08-25 15:41:10 EDT
Created attachment 202173 [details]
tests after changing mainly javadoc
Comment 169 Oleg Besedin CLA 2011-08-26 10:43:07 EDT
Applied patches 202172 and 202173. Thanks Mati!
Comment 170 Matitiahu Allouche CLA 2011-10-23 08:44:52 EDT
Created attachment 205767 [details]
change bidi support in bug 299031 to use STextProcessor instead of (old) TextProcessor

This patch shows that replacing the existing bidi support using TextProcessor by STextProcessor from the org.eclipse.equinox.bidi package is just a matter of changing the class name and the corresponding import statement.
Comment 171 Matitiahu Allouche CLA 2011-10-23 08:50:05 EDT
Created attachment 205768 [details]
fix bug 234451

This patch fixes the bidi defect in bug #234451. It shows how the classes in the org.eclipse.equinox.bidi packages can be used to fix structured text related problems.
Comment 172 Matitiahu Allouche CLA 2011-11-08 08:08:05 EST
Created attachment 206588 [details]
Javadoc updates, no changes in executing code
Comment 173 Oleg Besedin CLA 2011-11-08 15:11:36 EST
(In reply to comment #172)
> Created attachment 206588 [details]
> Javadoc updates, no changes in executing code

Applied patch to the Git master, thank you!

http://git.eclipse.org/c/equinox/rt.equinox.bundles.git/commit/?id=3450a64873b0b4c157c9a24d02255881447ab25c
Comment 174 Matitiahu Allouche CLA 2012-02-07 10:48:51 EST
Created attachment 210657 [details]
article describing what is structured text and how it is handled in Eclipse

This is the second version of this article after a first review by Oleg. The main difference is the addition of examples.
Comment 175 Oleg Besedin CLA 2012-02-15 11:23:12 EST
(In reply to comment #174)
> Created attachment 210657 [details]
> article describing what is structured text and how it is handled in Eclipse
> 
> This is the second version of this article after a first review by Oleg. The
> main difference is the addition of examples.

Thanks Mati! I restructured the article to better fit into Eclipse documentation style and placed it in the Eclipse help under


Platform Plug-in Developer Guide > Programmer's Guide > Runtime overview
	Structured text
		Terminology and design
		Supported text types

The Git pointer is:

http://git.eclipse.org/c/platform/eclipse.platform.common.git/commit/?id=2cf53b05f82a8aa6d78412453a1c3560e80b75e2