Re: [cross-project-issues-dev] com.ibm.icu requirement

Guys,

Consider the following block of code:
    import com.ibm.icu.lang.UCharacter;

    public class WhitespaceCheck
    {
      public static void main(String[] args)
      {
        // Report every defined code point on which the JDK and ICU4J disagree.
        for (int codePoint = 0; codePoint <= Character.MAX_CODE_POINT; ++codePoint)
        {
          if (Character.isDefined(codePoint) &&
              Character.isWhitespace(codePoint) != UCharacter.isWhitespace(codePoint))
          {
            System.err.println("Character and UCharacter disagree on codePoint=" + codePoint);
            System.err.println("   Character.isWhitespace(" + codePoint + ") == " + Character.isWhitespace(codePoint));
            System.err.println("   UCharacter.isWhitespace(" + codePoint + ") == " + UCharacter.isWhitespace(codePoint));
          }
        }
      }
    }

It produces the following trace:
Character and UCharacter disagree on codePoint=8199
   Character.isWhitespace(8199) == false
   UCharacter.isWhitespace(8199) == true
Character and UCharacter disagree on codePoint=8203
   Character.isWhitespace(8203) == true
   UCharacter.isWhitespace(8203) == false
It's a bit disconcerting that they disagree. What characters are these? Why is there disagreement on these two specifically? Is one of them more correct, and why? What are the implications of getting these characters "wrong" in a particular application, if there even is a right or wrong?
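For reference, code point 8199 is U+2007 FIGURE SPACE and 8203 is U+200B ZERO WIDTH SPACE. The sketch below (the class name is made up for illustration) prints each one's hex form, Unicode general category and Character.isWhitespace() result on whatever JDK runs it. As I understand it, U+200B was reclassified from a space separator (Zs) to a format character (Cf) in Unicode 4.0.1, so a result of true for 8203 suggests a JDK built on older Unicode data than the ICU4J in use; treat that as a hypothesis to check, not a confirmed diagnosis.

```java
// Sketch: identify the two code points from the trace and show how the *running* JDK
// classifies them. Results depend on the Unicode version of the JDK this runs on.
public class TraceCodePoints {
    // Short Unicode general-category names for the categories relevant here.
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            case Character.FORMAT:              return "Cf";
            case Character.CONTROL:             return "Cc";
            default:                            return "other";
        }
    }

    public static void main(String[] args) {
        for (int cp : new int[] {8199, 8203}) {
            System.out.printf("%d = U+%04X, category %s, Character.isWhitespace = %b%n",
                    cp, cp, categoryName(cp), Character.isWhitespace(cp));
        }
    }
}
```

On a recent JDK both lines are expected to report isWhitespace = false, with 8203 listed in category Cf.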

Here's how they are documented:

UCharacter.isWhitespace

Determines if the specified code point is a white space character. A code point is considered to be a whitespace character if and only if it satisfies one of the following criteria:

  • It is a Unicode space separator (category "Zs"), but is not a no-break space (\u00A0 or \u202F or \uFEFF).
  • It is a Unicode line separator (category "Zl").
  • It is a Unicode paragraph separator (category "Zp").
  • It is \u0009, HORIZONTAL TABULATION.
  • It is \u000A, LINE FEED.
  • It is \u000B, VERTICAL TABULATION.
  • It is \u000C, FORM FEED.
  • It is \u000D, CARRIAGE RETURN.
  • It is \u001C, FILE SEPARATOR.
  • It is \u001D, GROUP SEPARATOR.
  • It is \u001E, RECORD SEPARATOR.
  • It is \u001F, UNIT SEPARATOR.
This API tries to synch to the semantics of the Java API, java.lang.Character.isWhitespace().
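The ICU criteria above translate directly into a predicate. This is a hypothetical transcription of the quoted documentation (the class name is invented), not ICU4J's actual implementation, and it takes general categories from java.lang.Character.getType(), i.e. from the JDK's Unicode data rather than ICU4J's.

```java
// Hypothetical transcription of the UCharacter.isWhitespace() criteria quoted above.
// Categories come from java.lang.Character.getType(), NOT from ICU4J's data tables.
public class IcuDocumentedWhitespace {
    static boolean isWhitespace(int cp) {
        int type = Character.getType(cp);
        if (type == Character.SPACE_SEPARATOR) {
            // Zs, minus the no-break spaces the ICU doc lists: U+00A0, U+202F, U+FEFF
            return cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;
        }
        if (type == Character.LINE_SEPARATOR || type == Character.PARAGRAPH_SEPARATOR) {
            return true; // Zl and Zp
        }
        // HT, LF, VT, FF, CR (U+0009..U+000D) and FS, GS, RS, US (U+001C..U+001F)
        return (cp >= 0x0009 && cp <= 0x000D) || (cp >= 0x001C && cp <= 0x001F);
    }

    public static void main(String[] args) {
        // U+2007 FIGURE SPACE is Zs and is not on the ICU doc's exclusion list.
        System.out.println(isWhitespace(0x2007)); // true
    }
}
```

Note that the exclusion list names U+FEFF where the Java documentation names U+2007. Under current Unicode data U+FEFF is a format character (Cf), not a space separator, so that exclusion never fires, while U+2007 passes the Zs test. That is consistent with the 8199 line of the trace.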

Character.isWhitespace

Determines if the specified character (Unicode code point) is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:

  • It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
  • It is '\u0009', HORIZONTAL TABULATION.
  • It is '\u000A', LINE FEED.
  • It is '\u000B', VERTICAL TABULATION.
  • It is '\u000C', FORM FEED.
  • It is '\u000D', CARRIAGE RETURN.
  • It is '\u001C', FILE SEPARATOR.
  • It is '\u001D', GROUP SEPARATOR.
  • It is '\u001E', RECORD SEPARATOR.
  • It is '\u001F', UNIT SEPARATOR.
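The Java criteria can be transcribed the same way and cross-checked against the real method. This is a minimal sketch assuming the Javadoc above is complete; the class and method names are hypothetical.

```java
// Hypothetical transcription of the Character.isWhitespace(int) criteria quoted above,
// with a cross-check against the real JDK method over every code point.
public class JavaDocumentedWhitespace {
    static boolean isWhitespace(int cp) {
        int type = Character.getType(cp);
        if (type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR) {
            // Exclude the non-breaking spaces the Java doc lists: U+00A0, U+2007, U+202F
            return cp != 0x00A0 && cp != 0x2007 && cp != 0x202F;
        }
        // HT, LF, VT, FF, CR (U+0009..U+000D) and FS, GS, RS, US (U+001C..U+001F)
        return (cp >= 0x0009 && cp <= 0x000D) || (cp >= 0x001C && cp <= 0x001F);
    }

    // Counts code points where this transcription and Character.isWhitespace() differ.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; ++cp) {
            if (isWhitespace(cp) != Character.isWhitespace(cp)) {
                ++mismatches;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Since both sides read the same JDK Unicode data, a faithful transcription should print a mismatch count of 0 on any JDK.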

My favorite comment is "This API tries to synch to the semantics of the Java API, java.lang.Character.isWhitespace()." Maybe it could try harder! :-P

Regards,
Ed



Thomas Hallgren wrote:
Hi Igor,
You can safely fall back to using Character.isWhitespace(). As far as I know, there is in fact no difference between UCharacter and Character in that particular method. They both fall back on Unicode and a special set of Java rules.
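The suggested fallback amounts to a one-line swap. A minimal sketch, assuming the UI code only needs a per-code-point whitespace test; WhitespaceUtil and containsWhitespace are hypothetical names, not Subversive's actual code.

```java
// Hypothetical helper: java.lang.Character.isWhitespace(int) used in place of
// com.ibm.icu.lang.UCharacter.isWhitespace(int), so no com.ibm.icu dependency is needed.
public final class WhitespaceUtil {
    private WhitespaceUtil() {}

    // Drop-in for the UCharacter call; semantics follow the running JDK's Unicode data.
    public static boolean isWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    // Example use: true if the text contains at least one whitespace code point.
    // Iterates by code point so supplementary characters are not split into surrogates.
    public static boolean containsWhitespace(CharSequence text) {
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            if (isWhitespace(cp)) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("svn commit")); // true
        System.out.println(containsWhitespace("trunk"));      // false
    }
}
```

Because the swap follows the JDK's Unicode data, its results can differ from UCharacter's on exactly the code points in the trace above (U+2007 and U+200B), so it is worth checking whether those matter to the caller.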

Also, to my knowledge, the NLS support provided by the Subversion libraries that you are built on top of is rudimentary. I very much doubt that it goes beyond what's provided by the standard Java platform, which would make the use of ICU4J completely redundant for all your bundles. Does Subversion support a Hebrew calendar?

Regards,
Thomas Hallgren


Igor V. Burilo wrote:
Hello All,
 
We reviewed Subversive's UI plugin (org.eclipse.team.svn.ui) and found that it also uses the UCharacter class, which is the only class that makes it impossible to use com.ibm.icu.base. We use only the UCharacter.isWhitespace method. Is there any workaround to replace it so that we don't depend on the com.ibm.icu plugin? Perhaps we can use java.lang.Character instead? Is it acceptable for Subversive to have a direct dependency on the com.ibm.icu plugin?
 
 

Best regards,
Burilo Igor

 


From: cross-project-issues-dev-bounces@xxxxxxxxxxx [mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On Behalf Of Thomas Hallgren
Sent: Tuesday, January 20, 2009 10:31 PM
To: Cross project issues
Subject: Re: [cross-project-issues-dev] com.ibm.icu requirement

John Arthorne wrote:

I agree you should be able to avoid using ICU4J in headless applications, if you don't need to build locale-specific representations of dates, times, etc. We don't use ICU4J at all in non-UI bundles in Equinox, and the Platform core bundles also don't use it for the most part. Maybe the answer for you is to avoid the problematic classes mentioned on http://wiki.eclipse.org/ICU4J altogether, which would remove any need for ICU4J in Buckminster. Is the problem for you that you have dependencies that in turn pull in the dependency on ICU4J?
Yes, that's the problem. One example is the Subversive adapter. In response to the new requirement, Subversive now has a direct bundle requirement on the com.ibm.icu bundle. They also have some code that uses the UCharacter class, which makes it impossible to use the com.ibm.icu.base bundle even if we'd like to. I have already submitted a patch to the Subversive project that rectifies the problem, and hopefully they will accept it.

<grumpyMode>
I file that under yet another time-consuming effort made to satisfy new requirements that nobody was asking for.
</grumpyMode>

Regards,
Thomas Hallgren


_______________________________________________ cross-project-issues-dev mailing list cross-project-issues-dev@xxxxxxxxxxx https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev

