157007 – Unicode pages cleanup

Summary: Unicode pages cleanup

Status:	NEW

Alias:	None

Product:	WTP Source Editing
Classification:	WebTools
Component:	wst.html
Version:	unspecified
Hardware:	PC Windows XP

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	David Williams
QA Contact:	Nick Sandonato

URL:
Whiteboard:
Keywords:	helpwanted

Depends on:
Blocks:

Reported:	2006-09-12 06:56 EDT by Siarhei Berdachuk
Modified:	2011-09-21 13:19 EDT
CC List:	1 user

See Also:

Attachments
Unicode pages cleanup samples (7.80 KB, application/zip) 2006-09-13 09:02 EDT, Siarhei Berdachuk	no flags


Description Siarhei Berdachuk

2006-09-12 06:56:53 EDT

When a web page has Unicode content and declares UTF-8, something like:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...

then it would be useful to convert numeric character references like:
title="Eclipse RCP. &#1060;&#1072;&#1081;&#1083;&#1086;&#1074;&#1099;&#1081; &#1084;&#1077;&#1085;&#1077;&#1076;&#1078;&#1077;&#1088;" />

into the Unicode characters they represent:
"Eclipse RCP. Файловый менеджер"

This reduces page size and makes the code more readable.
You can try the HTML editor in Oracle JDeveloper; it already has this feature.
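A minimal Python sketch of the requested cleanup, under these assumptions: the page is handled as a plain string, only pages whose `<meta>` tag declares `charset=utf-8` are converted, and references to markup-significant characters (`&`, `<`, `>`, quotes) stay escaped, as do characters Python's `str.isprintable()` rejects (control characters, and separators such as U+00A0 other than the ASCII space). The name `decode_char_refs` and that keep-escaped policy are hypothetical, not how JDeveloper or WTP actually do it. Because it works on raw text with regular expressions, it does not tell attribute values apart from comments or script content, and named entities such as `&nbsp;` are left alone.

```python
import re

# Hypothetical policy: characters whose references stay escaped because
# decoding them could change how the markup is parsed.
KEEP_ESCAPED = {"&", "<", ">", '"', "'"}

# Decimal (&#1060;) or hexadecimal (&#x424;) numeric character references.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));")

# A <meta> tag declaring UTF-8, e.g. content="text/html; charset=utf-8".
UTF8_META = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?utf-8", re.IGNORECASE)


def decode_char_refs(page: str) -> str:
    """Replace numeric character references with the characters they encode.

    Pages without a UTF-8 declaration are returned unchanged, since their
    encoding may not be able to represent the decoded characters.
    """
    if not UTF8_META.search(page):
        return page

    def replace(match):
        decimal, hexadecimal = match.groups()
        code = int(decimal) if decimal is not None else int(hexadecimal, 16)
        # Surrogates and values beyond U+10FFFF are not valid characters.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        # Keep markup characters escaped, and also anything str.isprintable()
        # rejects (controls, U+00A0, ...) so it stays visible in the source.
        if char in KEEP_ESCAPED or not char.isprintable():
            return match.group(0)
        return char

    return CHAR_REF.sub(replace, page)
```

Applied to the title above, the sixteen references decode to `Eclipse RCP. Файловый менеджер`, while references such as `&#38;`, `&#60;` and `&#160;` are left as they are.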

Comment 1 Siarhei Berdachuk

2006-09-12 07:00:33 EDT

Sorry, Bugzilla converted the Unicode symbols to numeric codes :(

Comment 2 Amy Wu

2006-09-12 16:43:45 EDT

Perhaps you could attach an example.

Comment 3 Siarhei Berdachuk

2006-09-13 09:02:27 EDT

Created attachment 50023
Unicode pages cleanup samples

The sample contains two copies of the same Unicode HTML page.
After cleanup in Oracle JDeveloper, the page size was reduced by almost half and the code became readable.
The original page was generated from a DocBook XML file.
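The size reduction follows from simple byte arithmetic: a decimal reference such as `&#1060;` is 7 ASCII bytes, while the Cyrillic letter it stands for takes 2 bytes in UTF-8. The short Python sketch below measures the title from the description both ways, using the standard library's `html.unescape` to decode the references.

```python
import html

# The title text from the description, as generated with numeric character references.
escaped = ("Eclipse RCP. &#1060;&#1072;&#1081;&#1083;&#1086;&#1074;&#1099;&#1081; "
           "&#1084;&#1077;&#1085;&#1077;&#1076;&#1078;&#1077;&#1088;")

# html.unescape turns each reference into the character it stands for.
decoded = html.unescape(escaped)

escaped_bytes = len(escaped.encode("utf-8"))  # references are pure ASCII: 7 bytes per letter
decoded_bytes = len(decoded.encode("utf-8"))  # Cyrillic letters take 2 bytes each in UTF-8

print(decoded)
print(f"{escaped_bytes} bytes -> {decoded_bytes} bytes")
```

It prints the decoded title and `126 bytes -> 46 bytes`. A whole page shrinks less than this single string because its tags and ASCII text stay the same size, which is consistent with the roughly 50% reduction reported for the sample.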

Comment 4 David Williams

2006-09-23 00:30:26 EDT

I'll leave this as a feature request, but I have to ask:
how do you convert from the DocBook XML file? Shouldn't that be the point
at which the characters are encoded as UTF-8?

Comment 5 Siarhei Berdachuk

2006-09-23 13:52:56 EDT

I converted the DocBook XML files with an external program (XMLmind XML Editor).
I know it is possible to write my own XSL files for the transformation, but I don't have much time to learn XSL. It may also be possible to generate the documents directly in UTF-8, but I don't know how to do that right now.

Comment 6 Nitin Dahyabhai

2007-09-13 04:47:39 EDT

This is a great idea, but we don't have anyone to work on it right now. There is still a chance it could fit into 3.0, and we would accept high-quality patches regardless.