Community
Participate
Working Groups
Text search for 'ß' gives results for 's' NOTES: DM (9/28/01 5:03:03 PM) StringMatcher uses toUppercase when "ignore case" is selected. This leads to errors because String.toUppercase() is not the same as calling toUppercase on each character and then concatenating. In this particular case 'ß' is converted to 'SS' and because the length of the pattern is 1 all occurrences of 's' are reported. Because the uppercase conversion is only needed at one place we can simply convert the characters there - no need to convert the patterns itself. Fixed > 0.202 NE (10/2/01 10:25:18 AM) I'm fixing this up in the Navigator's StringMatcher, but couldn't find a case where 'ß' is converted to 'SS'. If I just do "ß".toUpperCase(), it comes back as "ß". I also tried a few combinations with other characters. Could you give me a case where it converts to 'SS'? NE (10/02/01 10:27:44 AM) Never mind. "uß".toUpperCase() -> "USS" NE (10/2/01 10:35:03 AM) If I'm searching for a real word containing 'ß', and I don't care about case, what's the correct behaviour? E.g. if I'm searching for 'Strueßel', should it match "strueßel" and "STRUESSEL" but not "STRUEßEL", or should it match "strueßel" and "STRUEßEL" but not "STRUESSEL"? Please forgive the silly example. NE (10/2/01 12:04:13 PM) DM says: The correct uppercase word for "strueßel" would be "STRUESSEL". A search engine however should not be that smart. I would expect to find "strueßel" and "STRUEßEL" but not "STRUESSEL". Kai-Uwe which is native German confirmed that. NE (10/2/01 12:09:04 PM) This is good, because that's what String.compareToIgnoreCase does. It says: * Two characters <code>c1</code> and <code>c2</code> are considered * the same, ignoring case if at least one of the following is true: * <ul><li>The two characters are the same (as compared by the * <code>==</code> operator). * <li>Applying the method {@link java.lang.Character#toUppercase(char)} * to each character produces the same result. * <li>Applying the method {@link java.lang.Character#toLowercase(char) * to each character produces the same result.</ul> Likewise for regionMatches(...) where ignoreCase==true. StringMatcher should be changed to use compareToIgnoreCase.
moved to fixed