Bug 157007 - Unicode pages cleanup
Summary: Unicode pages cleanup
Status: NEW
Alias: None
Product: WTP Source Editing
Classification: WebTools
Component: wst.html
Version: unspecified
Hardware: PC Windows XP
Importance: P3 enhancement
Target Milestone: ---
Assignee: David Williams CLA
QA Contact: Nick Sandonato CLA
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2006-09-12 06:56 EDT by Siarhei Berdachuk CLA
Modified: 2011-09-21 13:19 EDT

See Also:


Attachments
Unicode pages cleanup samples (7.80 KB, application/zip)
2006-09-13 09:02 EDT, Siarhei Berdachuk CLA

Description Siarhei Berdachuk CLA 2006-09-12 06:56:53 EDT
When a web page has Unicode content and declares a Unicode encoding, for example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...

then it would be useful to convert numeric character references such as:
title="Eclipse RCP. &#1060;&#1072;&#1081;&#1083;&#1086;&#1074;&#1099;&#1081; &#1084;&#1077;&#1085;&#1077;&#1076;&#1078;&#1077;&#1088;" />

to the corresponding literal Unicode characters, such as:
 "Eclipse RCP. &#1060;&#1072;&#1081;&#1083;&#1086;&#1074;&#1099;&#1081; &#1084;&#1077;&#1085;&#1077;&#1076;&#1078;&#1077;&#1088;"

This reduces the page size and makes the source more readable.
The HTML editor in Oracle JDeveloper already has this feature, so you can try it there.
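
The requested cleanup can be sketched roughly as follows. This is a minimal illustration in Python, not the WTP or JDeveloper implementation; the function name `decode_numeric_refs` and its `charset` parameter are hypothetical. It assumes that references to markup-significant characters (`<`, `>`, `&`, quotes), to non-printable characters, and to characters the target charset cannot encode should stay escaped.

```python
import re

# Decimal (&#1060;) or hexadecimal (&#x424;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped to keep the markup well-formed.
_KEEP_ESCAPED = set("<>&\"'")


def decode_numeric_refs(markup, charset="utf-8"):
    """Replace numeric character references with literal characters.

    A reference is left untouched when its character is markup-significant,
    is not printable (control characters, non-breaking space, ...), or
    cannot be encoded in `charset`.
    """

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Out-of-range values and surrogates are not valid characters.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        if char in _KEEP_ESCAPED or not char.isprintable():
            return match.group(0)
        try:
            char.encode(charset)
        except UnicodeEncodeError:
            return match.group(0)
        return char

    return _NUMERIC_REF.sub(replace, markup)


# The attribute from the description above.
before = ('title="Eclipse RCP. &#1060;&#1072;&#1081;&#1083;&#1086;&#1074;'
          '&#1099;&#1081; &#1084;&#1077;&#1085;&#1077;&#1076;&#1078;&#1077;'
          '&#1088;" />')
after = decode_numeric_refs(before)

print(after)  # title="Eclipse RCP. Файловый менеджер" />
print(len(before.encode("utf-8")), "->", len(after.encode("utf-8")), "bytes")
```

A real editor action would also need to skip regions where a reference is not interpreted as one, such as script blocks and CDATA sections; the sketch ignores that.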
Comment 1 Siarhei Berdachuk CLA 2006-09-12 07:00:33 EDT
Sorry, but Bugzilla converted the Unicode characters in the description back to numeric codes :(
Comment 2 Amy Wu CLA 2006-09-12 16:43:45 EDT
Perhaps you could attach an example.
Comment 3 Siarhei Berdachuk CLA 2006-09-13 09:02:27 EDT
Created attachment 50023
Unicode pages cleanup samples

The sample contains two versions of the same Unicode HTML page.
After cleanup in Oracle JDeveloper, the page is almost half its original size and the code is readable.
The original page was generated from a DocBook XML file.
Comment 4 David Williams CLA 2006-09-23 00:30:26 EDT
I'll leave this as a feature request, but I have to ask ...
how do you convert from the DocBook XML file? Shouldn't that be the point
at which the characters are encoded as UTF-8?

Comment 5 Siarhei Berdachuk CLA 2006-09-23 13:52:56 EDT
I converted the DocBook XML files with an external program (XMLmind XML Editor).
I know that it is possible to write my own XSL files for the transformation, but I don't have much time to learn XSL. It may also be possible to generate the documents directly in UTF-8, but I don't know how to do that right now.
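
For context, numeric references like these typically appear when a serializer writes non-ASCII text in an encoding that cannot represent it. In XSLT, the output encoding is controlled by the `encoding` attribute of `xsl:output`. How the converter was configured here is not stated in this report, so the following is only an illustration of the general effect, using Python's standard `xml.etree.ElementTree` as a stand-in for the real toolchain; the element and its attribute value are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical element standing in for a node of the generated XHTML page.
link = ET.Element("a", {"title": "Файловый менеджер"})

# An ASCII-only output encoding forces every non-ASCII character
# to be written as a numeric character reference.
ascii_bytes = ET.tostring(link, encoding="us-ascii")

# A UTF-8 output encoding writes the characters themselves.
utf8_bytes = ET.tostring(link, encoding="utf-8")

print(ascii_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8"))
print(len(ascii_bytes), "bytes vs", len(utf8_bytes), "bytes")
```

If the generator can be switched to UTF-8 output in a similar way, the cleanup step is not needed for newly generated pages, although it remains useful for pages that already exist.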
Comment 6 Nitin Dahyabhai CLA 2007-09-13 04:47:39 EDT
This is a great idea, but we don't have people to work on it just now. There is still a chance it would fit in for 3.0, but we would accept high-quality patches regardless.