Bug 4251 - Text search for 'ß' gives results for 's' (1GKQ0XA)
Status: RESOLVED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: UI
Version: 2.0
Hardware: All Windows NT
Importance: P3 normal
Target Milestone: ---
Assignee: Erich Gamma
Reported: 2001-10-10 23:08 EDT by Dani Megert
Modified: 2001-10-12 06:54 EDT
Description Dani Megert 2001-10-10 23:08:36 EDT
Text search for 'ß' gives results for 's'


NOTES:
DM (9/28/01 5:03:03 PM)
	StringMatcher uses toUpperCase() when "ignore case" is selected.
	This leads to errors because String.toUpperCase() is not the same as
	calling Character.toUpperCase() on each character and then concatenating
	the results. In this particular case 'ß' is converted to "SS", and because
	the length of the pattern is 1, all occurrences of 's' are reported.
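
	A minimal sketch of the difference described above, assuming a current JDK
	(the class name UpperCaseDemo and the helper perCharUpper are made up for
	illustration and are not part of StringMatcher). It contrasts upper-casing
	the whole string, which may change its length, with upper-casing one
	character at a time, which cannot:

```java
import java.util.Locale;

public class UpperCaseDemo {

    /** Upper-cases each char on its own, so the result keeps the input length. */
    static String perCharUpper(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toUpperCase(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pattern = "\u00DF"; // the single character 'ß'

        // Whole-string conversion applies special casing and may grow the string.
        String whole = pattern.toUpperCase(Locale.ROOT);
        System.out.println("String.toUpperCase length:    " + whole.length());

        // Per-character conversion is always one char in, one char out.
        String perChar = perCharUpper(pattern);
        System.out.println("Character.toUpperCase length: " + perChar.length());
    }
}
```

	Comparing only the first pattern-length characters of the converted string
	is what turns a search for 'ß' into a search for 's'.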

	Because the uppercase conversion is only needed in one place, we
	can simply convert the characters there. There is no need to convert
	the pattern itself.

	Fixed > 0.202

NE (10/2/01 10:25:18 AM)
	I'm fixing this up in the Navigator's StringMatcher, but couldn't find a
	case where 'ß' is converted to 'SS'.
	If I just do "ß".toUpperCase(), it comes back as "ß".
	I also tried a few combinations with other characters.
	Could you give me a case where it converts to 'SS'?

NE (10/02/01 10:27:44 AM)
	Never mind.
	"uß".toUpperCase() -> "USS"

NE (10/2/01 10:35:03 AM)
	If I'm searching for a real word containing 'ß', and I don't care about case,
	what's the correct behaviour?
	E.g. if I'm searching for 'Strueßel', should it match "strueßel" and "STRUESSEL" but not "STRUEßEL",
	or should it match "strueßel" and "STRUEßEL" but not "STRUESSEL"?
	Please forgive the silly example.

NE (10/2/01 12:04:13 PM)  DM says:
	The correct uppercase word for "strueßel" would be "STRUESSEL".
	A search engine, however, should not be that smart.
	I would expect to find "strueßel" and "STRUEßEL" but not "STRUESSEL".
	Kai-Uwe, who is a native German speaker, confirmed that.

NE (10/2/01 12:09:04 PM)
	This is good, because that's what String.compareToIgnoreCase does.
	It says:
     * Two characters <code>c1</code> and <code>c2</code> are considered
     * the same, ignoring case if at least one of the following is true:
     * <ul><li>The two characters are the same (as compared by the 
     * <code>==</code> operator).
     * <li>Applying the method {@link java.lang.Character#toUpperCase(char)} 
     * to each character produces the same result.
     * <li>Applying the method {@link java.lang.Character#toLowerCase(char)} 
     * to each character produces the same result.</ul>

	Likewise for regionMatches(...) where ignoreCase==true.

	StringMatcher should be changed to use compareToIgnoreCase.
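
	A rough sketch of the per-character comparison suggested above, using
	String.regionMatches with ignoreCase set to true. The class IgnoreCaseMatch
	and the method containsIgnoreCase are hypothetical names for illustration;
	this is not the actual StringMatcher code, which also has to handle
	wildcards.

```java
public class IgnoreCaseMatch {

    /**
     * Returns true if pattern occurs anywhere in text, comparing character by
     * character and ignoring case. No whole-string case conversion is done,
     * so neither string can change length during the comparison.
     */
    static boolean containsIgnoreCase(String text, String pattern) {
        int last = text.length() - pattern.length();
        for (int start = 0; start <= last; start++) {
            if (text.regionMatches(true, start, pattern, 0, pattern.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String word = "strue\u00DFel"; // "strueßel"
        System.out.println(containsIgnoreCase("STRUE\u00DFEL", word));
        System.out.println(containsIgnoreCase("STRUESSEL", word));
        System.out.println(containsIgnoreCase("sss", "\u00DF"));
    }
}
```

	With this approach a search for "strueßel" finds "strueßel" and "STRUEßEL"
	but not "STRUESSEL", and a search for 'ß' no longer reports plain 's'.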
Comment 1 Claude Knaus 2001-10-12 06:54:07 EDT
moved to fixed