Re: [stellation-res] Stellation status

On Mon, 2002-11-25 at 07:52, Rodolfo M.Raya wrote:
> On Sun, 2002-11-24 at 17:18, Mark C. Chu-Carroll wrote:
> > On Sat, 2002-11-23 at 06:55, Rodolfo M.Raya wrote:
> > > Hi all,
> > > 
> > > After a long absence I'm in contact again.
> > > 
> > > I've been reading all the messages since I left my house about 3
> > > months ago and this is the first opportunity I have to participate
> > > again.
> > 
> > Welcome back!
> 
> Thanks
> 
> > 
> > > I remember a couple of messages about international users of
> > > Stellation. I may change the map quite a lot these days as I'm
> > > currently working in Thailand and plan to use it here, in Singapore
> > > and Beijing as soon as there is a stable release that you can
> > > recommend. I18n is not a problem now, but in the near future Stellation
> > > will be used in Korea too, and then it will be a major issue.
> > 
> > Jonathan looked at doing i18n, and to do it full-bore all through the
> > code was incredibly complicated. So we decided, at the time, to put it
> > off until there was someone who really needed it.
> 
> I know it is complicated. It involves a lot of extra work even in simple
> projects.
> 
> > My first question, before we get into any more detail, is how far
> > does the internationalization need to go? There's a bunch of
> > options:
> > 
> > (1) Be able to deal with international data. That is, make sure that
> >   all artifacts, comments, labels, names, etc., all work even in
> >   non-Latin character sets. Since Java strings are Unicode internally,
> >   as long as the databases can handle UTF-8 characters correctly, we're
> >   already there. But this requires testing to be sure in all the DBs - do
> >   you have any good UTF-8 non-Latin data that you can give us for testing?
> > 
> > (2) Everything in (1) plus have all system messages be in translatable
> >   resources. Java and Eclipse both provide a lot of support for this,
> >   but it's still difficult. 
> > 
> > (3) Everything in (2) plus have all command names and options also be
> >   in translatable resources.
> > 
> > My opinion is that right now, (3) is probably not doable. Jonathan's
> > experiments with pulling all string literals out into translatable
> > resources showed what was involved. I think coping with it is just too
> > much complexity for us right now. 
> 
> I needed (1) working yesterday, and I will need (2) in the near future.
> 
> My plan is to use Stellation as a text repository with version control 
> for storing XML documents in many languages. Correct handling of Asian
> languages is a must. I'm trying to standardize on UTF-8 encoding, but
> anything can show up.

This is potentially a *very* serious problem, which is going to be
extremely hard for us to address. We're using Java IO for text, and the
way we use it assumes that everything is encoded in UTF-8. If you push
non-UTF-8 text containing 8-bit character data through it, the bytes get
misinterpreted as malformed UTF-8, which can cause data loss and/or
corruption.
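
To make the failure mode concrete, here is a minimal sketch. It is not
Stellation code; the class and method names are made up for illustration,
and it uses the charset API of current Java rather than what we have
today. It decodes bytes as UTF-8 and encodes them again, which is roughly
what a UTF-8-only text path does, and checks whether the bytes survive.
Latin-1 is used as the non-UTF-8 input only because it is always
available; I would expect legacy Asian encodings to fail the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8RoundTrip {
    // Decode as UTF-8, then re-encode as UTF-8. Malformed input is
    // silently replaced with U+FFFD during decoding.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // True only if the bytes come back unchanged.
    static boolean survives(byte[] input) {
        return Arrays.equals(input, roundTrip(input));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(survives(utf8));   // true
        System.out.println(survives(latin1)); // false: 0xE9 became U+FFFD
    }
}
```

The original 0xE9 byte cannot be recovered from the result, which is the
data loss I'm worried about.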

The only workaround with the current code is to use binary artifacts,
which use binary IO, and so avoid the Java UTF-8 assumptions. But binary
artifact storage is currently very inefficient, and doesn't support
merges. 

Since I don't know much about the character encodings in use, I'm not
sure of exactly what will work. *If* you can easily identify 
line breaks, *and* you can configure the database so that it will accept
and correctly store strings in your character encodings, then you can
probably create a variant of the current TextArtifact/TextArtifactAgent
classes that will use binary IO, and work with byte[], instead of
String. 
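
A rough sketch of the byte-level line handling such a variant would need
is below. This is not the actual TextArtifact code; `ByteLines`, `split`
and `join` are hypothetical names. It assumes that the byte 0x0A only
ever means a line break in your encodings. That holds for UTF-8, but not
for UTF-16, and it would need checking for each of the other encodings
you expect to see.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ByteLines {
    // Split raw bytes at LF (0x0A). Each line keeps its terminator, so
    // joining the lines reproduces the input byte for byte.
    static List<byte[]> split(byte[] data) {
        List<byte[]> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                lines.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
            }
        }
        if (start < data.length) {
            // Final line without a trailing line break.
            lines.add(Arrays.copyOfRange(data, start, data.length));
        }
        return lines;
    }

    // Concatenate lines back into a single byte array.
    static byte[] join(List<byte[]> lines) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] line : lines) {
            out.write(line, 0, line.length);
        }
        return out.toByteArray();
    }
}
```

Because no byte is ever decoded into a String, whatever encoding the
bytes are in passes through untouched, and line-based diff and merge
could in principle operate on the byte[] lines.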

> > (2) we could do if it was really necessary, but we'd need you to really
> > dig in and help.
> 
> I will help with it.

Good. Make sure to grab a very recent copy of Eclipse. The latest
builds have better refactoring support, and I've heard that one of
the refactorings is an improved way of pulling inline String constants
out.
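
For reference, the end result of that kind of refactoring looks roughly
like the sketch below: the message text lives in a resource keyed by
name, and the code only refers to the key. Everything here is
hypothetical (the `Messages` class, the `file.saved` key, and the inline
strings standing in for per-locale .properties files); a real setup
would normally load the files with `ResourceBundle.getBundle` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class Messages {
    // Hypothetical stand-ins for per-locale .properties files.
    static final String ENGLISH = "file.saved=File saved: {0}\n";
    static final String SPANISH = "file.saved=Archivo guardado: {0}\n";

    // Parse properties-format text into a resource bundle.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Look up the pattern by key and substitute the arguments.
    static String format(ResourceBundle bundle, String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }

    public static void main(String[] args) {
        System.out.println(format(load(ENGLISH), "file.saved", "spec.xml"));
        System.out.println(format(load(SPANISH), "file.saved", "spec.xml"));
    }
}
```

The hard part is not this mechanism but finding every inline string in
the code and deciding which ones are user-visible.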

	-Mark

-- 
Mark Craig Chu-Carroll,  IBM T.J. Watson Research Center  
*** The Stellation project: Advanced SCM for Collaboration
***		http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx  ------- Personal Email: markcc@xxxxxxxxxxx



