178897 – Join needs to support object reference joins

Summary: Join needs to support object reference joins

Status:	VERIFIED FIXED

Alias:	None

Product:	z_Archived
Classification:	Eclipse Foundation
Component:	BIRT
Version:	2.2.0
Hardware:	PC Windows XP

Importance:	P3 enhancement
Target Milestone:	2.5.1 RC2
Assignee:	Wenjie Tu
QA Contact:	Tianli Zhang

URL:
Whiteboard:	Autoed
Keywords:	helpwanted

Depends on:	176396 285108
Blocks:

Reported:	2007-03-22 17:13 EDT by Mike Boyersmith
Modified:	2009-09-08 02:34 EDT
CC List:	9 users

See Also:

Attachments
Multi-project patch to add support for object column and parameter types (56.85 KB, patch) 2009-06-10 10:14 EDT, Andreas Mayer

Description Mike Boyersmith

2007-03-22 17:13:53 EDT

In a discussion with the DTP folks in Bugzilla entry

https://bugs.eclipse.org/bugs/show_bug.cgi?id=176396

It was pointed out that the data set 'Join' code is owned by BIRT, so the BIRT team would also need to implement support for object references in the join operation.

So our RFE is for the ability to take two tables, both of which have columns containing live object references, and join them on those object references. This would be a very fast operation compared to string comparisons, and useful for some of our use cases.
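A minimal sketch of what a join on live object references (rather than on their string forms) could look like: a hash join keyed by reference identity. The `Row` record and `identityJoin` method are hypothetical illustrations, not BIRT or DTP API. `java.util.IdentityHashMap` compares keys with `==`, so two distinct objects that merely happen to be equal do not match.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class IdentityJoinSketch {
    // Hypothetical row: a live object reference used as the join key, plus a display value.
    record Row(Object ref, String label) {}

    // Hash join on reference identity: the right side is indexed in an IdentityHashMap,
    // so keys match only when they are the same object (==); equals() and toString() are never called.
    static List<String> identityJoin(List<Row> left, List<Row> right) {
        Map<Object, List<Row>> index = new IdentityHashMap<>();
        for (Row r : right) {
            index.computeIfAbsent(r.ref(), k -> new ArrayList<>()).add(r);
        }
        List<String> joined = new ArrayList<>();
        for (Row l : left) {
            for (Row r : index.getOrDefault(l.ref(), List.of())) {
                joined.add(l.label() + " -> " + r.label());
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Object alice = new Object();
        Object bob = new Object();
        List<Row> writers = List.of(new Row(alice, "Alice"), new Row(bob, "Bob"));
        List<Row> books = List.of(
                new Row(alice, "Book A"),
                new Row(alice, "Book B"),
                new Row(new Object(), "Orphan"));
        System.out.println(identityJoin(writers, books)); // [Alice -> Book A, Alice -> Book B]
    }
}
```

Building the index costs one pass over the right-hand rows, and each probe is a single identity-hash lookup, with no string conversion on either side.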

Comment 1 Lin Zhu

2007-03-27 09:15:53 EDT

To implement this feature, we first need ODA to support an object reference data type.

Comment 2 Lin Zhu

2007-05-16 10:14:00 EDT

After discussion, we are deferring this bug until after 2.2.0.

Comment 3 Andreas Mayer

2009-06-10 10:14:38 EDT

Created attachment 138796 [details]
Multi-project patch to add support for object column and parameter types

Hello,

I need object column and parameter types, which I want to use in an ODA driver for EMF models. DTP 1.7 added the necessary methods to its API, but BIRT still lacks the corresponding code to handle such types. The attached multi-project patch adds it. However, there are still open issues:

* The biggest issue is ParameterHint, which is used to evaluate parameter values. Unfortunately, it converts all values to strings. That's not a big deal for simple types such as numbers, but there is no way to parse an arbitrary object from its string representation. This is a showstopper for object parameters, but object columns are working fine.

* There are a lot of places that deal with conversion of data types (data type <-> column data type <-> parameter type) and the corresponding data values. I am not sure if I've found all of them. 

* Most of the conversion code also knows about DataType.UNKNOWN_TYPE and DataType.ANY_TYPE. Since objects are some kind of "any type", some of that code could now be obsolete. But I am not sure about that. 


Please let me know if there is any way I can further contribute to resolving this bug.

Andreas

Comment 4 Lin Zhu

2009-06-11 03:55:51 EDT

Hi There,

Thank you for your contribution.

However, in BIRT 2.3 we removed support for the ANY data type (all ANY-related code was either removed or deprecated), so currently you cannot define "ANY"-typed columns for a joined data set. As a result, BIRT data sets can only contain data of BIRT built-in types. Since we cannot define general "Object" data in a data set, we have no way to support object reference joins in BIRT.

We will need further investigation to see whether this feature does need to be implemented by BIRT; if so, we may need to re-introduce the ANY data type.

Thanks.
Lin

Comment 5 Wenfeng Li

2009-06-11 18:11:27 EDT

(In reply to comment #4)

Shall we consider adding a Java object reference type, instead of the ANY data type, for data sets/result sets? Object references could be handled more efficiently if we can fix their size (such as 4 bytes?).

Could the column metadata include the class ID, so that the object reference can be cast if needed?

When two data sets are joined on the object reference column, do we just need to compare the reference addresses, or do we intend to call a custom compare function, passing the object references from the two tables?

Comment 6 Linda Chan

2009-06-11 18:59:25 EDT

It might not be necessary to add a new BIRT API data type for the Java object reference type.  An ODA JavaObject type can be treated similarly as the BIRT DataType.BINARY_TYPE, which is mapped to a "String" ROM data type in the Designer.

In order to compare the object value from an ODA result set column (defined as an ODA JavaObject type), e.g. for join and/or sorting, its class must implement the java.lang.Comparable interface. And of course, all values from the same result column must be homogeneous.
Similarly, in order to be serialized properly in a report document, the object reference must implement java.io.Serializable. If it does not, Object#toString will be used to persist it, etc. IOW, it is the responsibility of an ODA driver to implement the required interfaces for its object references if they are to be consumed properly as an object type. The ODA framework itself does not require an ODA driver implementation to provide a wrapper for such Java objects.

Comment 7 Andreas Mayer

2009-06-15 08:41:46 EDT

(In reply to comment #4)

> However, as in BIRT 2.3 we've removed the ANY data type support (all
> the ANY data type related code are either removed or deprecated), by
> now you actually cannot define "ANY" type columns for joined data
> set. So that in BIRT all the data set can only contains BIRT
> built-in types data. 

After reading your comment, I recognized that I've mainly re-implemented the ANY data type -- after all there is not much difference between ANY and an (arbitrary) Java object reference.

Why has the ANY data type been removed? Does this also mean that the recent ODA API enhancements (support of Java object data type in result set columns and parameters) will not be supported by BIRT?

> As we cannot define a general "Object" data in data set, we can by
> no means support object reference join in BIRT.

I have a (somewhat) different use case for object references. As I indicated in my first comment (#3), I want to use EMF models as input for reports and implemented an appropriate ODA driver (also see [1, 2]). The driver's data sets take object references (a single reference or a collection of references) as input, so that you can have, for example, a data set of all books of a particular author. Currently I am using URIs to represent the references, since BIRT restricts me to basic types and the URIs are already provided by EMF. Real object references would simplify the driver code a lot. 

In some cases even the report design would be simplified. For example, let's assume we have a library model with writers and books. If you could call into a writer (an object, not a dumb string), you could simply add a computed column such as

  bookrow["author"].name

or call the getter ad-hoc in the dynamic expression. This would save you an additional binding in the report definition. 

> We will need further investigation to see if this feature do need to
> be implemented by BIRT -- if this is the case we may need to
> re-introduce the ANY data type.

It certainly is not essential. But I hope to have shown that it opens some possibilities. 

[1] http://dev.eclipse.org/newslists/news.eclipse.dtp/msg01410.html

[2] http://dev.eclipse.org/newslists/news.eclipse.dtp/msg01441.html

Comment 8 Linda Chan

2009-06-18 18:10:51 EDT

(In reply to comment #7)

Andreas,
RE: your use case to use object reference type for a data set's input parameter,

Do you mean that the Object value in a data set row will be passed by script to be the input value of another data set's input parameter?
Having a new BIRT data type for Java Object would indeed support such case.
However, if such type of data set parameter is linked to a report parameter, its input value cannot be easily collected through the UI. To keep it simple, it may be best to initially disallow mapping a JavaObject typed data set parameter to a report parameter.  Will that work for your use cases?

Separately, the "Any" data type is deprecated due to the ambiguity on what its "correct" or actual data type is, and how to best handle its values.
It is better to introduce a new BIRT API data type for Java Object. Its implementation would in some cases take a code path similar to "Any". The advantage of an explicit Object data type is that it will be defined with an explicit set of rules on how it gets consumed by BIRT, so there is no need to infer the "correct" data type.
 
We (BIRT committers) are looking at the best solution for this, and plan to propose a spec for the feature.  We will however need code contribution for its implementation from the community like yourself.

Comment 9 Andreas Mayer

2009-06-19 08:52:02 EDT

(In reply to comment #8)

> Do you mean that the Object value in a data set row will be passed by script to
> be the input value of another data set's input parameter?

Yes, that's right. The object is then used as starting point for the data set, for example, a set of all books of the given writer. 

> Having a new BIRT data type for Java Object would indeed support such case.
> However, if such type of data set parameter is linked to a report parameter,
> its input value cannot be easily collected through the UI. To keep it simple,
> it may be best to initially disallow mapping a JavaObject typed data set
> parameter to a report parameter.  Will that work for your use cases?

Yes, that will work for me. 

> It is better to introduce a new BIRT API data type for Java Object.  Its
> implementation would in some cases take similar code path as "Any". The
> advantage of having an explicit Object data type, is that it will be defined
> with an explicit set of rules on how it would get consumed by BIRT. And thus no
> need to infer the "correct" data type.  

Are you thinking of something like IObject, which would have to be implemented by the objects? I would have to wrap the EObjects from the EMF models, but that should be no problem.

At least in my use case the objects won't be consumed by BIRT. The objects will only be passed between data sets and used to navigate the EMF model. However, there should be a suitable textual representation for the data set preview. 

It may also be useful if objects could be passed to report items, for example, a report item that generates a graphical representation of the object.

> We (BIRT committers) are looking at the best solution for this, and plan to
> propose a spec for the feature. 

Is there a rough time frame for this?

> We will however need code contribution for its
> implementation from the community like yourself.

That's a deal. :-)
-- 
Andreas

Comment 10 Linda Chan

2009-07-11 01:37:16 EDT

(In reply to comment #9)

>> something like IObject, which has to be implemented by the objects?

Instead of introducing a new interface for this, we will simply use existing JDK interfaces, i.e. java.lang.Comparable and java.io.Serializable, whose implementations are optional (as described in comment #6).
IOW, it is the responsibility of an ODA driver to implement the appropriate interface(s) for its object references if they are to be consumed properly as an object type.

To summarize, the proposed feature is:

Add a new BIRT API data type for "Java Object". This new BIRT API data type will be mapped (indirectly) to/from an ODA JavaObject data type found in a data set column or data set parameter.
However, this new API data type will not be allowed on a BIRT report parameter. (It may be a future enhancement if there are valid use cases.) This limitation avoids the complexity involved in collecting a proper Object input value through the report parameter requester UI.

For report output formatting and computation purpose, a JavaObject-typed data item will be handled like a String, using the value returned by Object#toString.

In performing join, sort, and/or filtering operations on a JavaObject-typed data item, the object reference class's implementation of java.lang.Comparable, if any, will be used. Otherwise, it will be handled like a String by default, using the value returned by Object#toString.
For example, if a comparison by an instance hashcode is sufficient, it is up to the object ref implementation to optimize its implementation of the Comparable interface as appropriate.

Similarly, to serialize the value of a JavaObject-typed data item in a report document, the Object reference class implementation of java.io.Serializable, if implemented, will be used.  Otherwise, it will be handled like a String by default, using the value returned by Object#toString.

Comment 11 Andreas Mayer

2009-07-13 09:59:52 EDT

(In reply to comment #10)

> Instead of introducing new interface for this, we will simply use existing JDK
> interfaces, i.e. java.lang.Comparable and java.io.Serializable, whose
> implementation are optional (as described in comment #6).

This implies that I don't have to implement any of these interfaces if my objects are never compared or persisted in a report document, doesn't it?

> To summarize, the proposed feature is:
> 
> Add a new BIRT API data type for "Java Object". This new BIRT API data type
> will be mapped (indirectly) to/from an ODA JavaObject data type found in a data
> set column or data set parameter.

What's the difference between this solution and the deprecated any-type? Again you will handle all the defined types (integer, real, string, date, ...) and treat anything else as JavaObject. 

> However, this new API data type will not be allowed on a a BIRT report
> parameter.  (It may be a future enhancement if there are valid use cases.) 

I have (currently) no need for this, but what's so special about report parameters? You face the same issue with default values of parameters or the value list for the "in"-operator, for example. You simply could use the JavaScript-syntax to obtain or create a particular JavaObject instance or provide for a conversion from a string representation to the particular JavaObject type.

Will you be able to have JavaObject parameters for report items?

> For report output formatting and computation purpose, a JavaObject-typed data
> item will be handled like a String, using the value returned by
> Object#toString.

It's okay to use the string representation for output formatting. But what do you mean by computation purpose? In my book, the point of this feature is to have custom objects in your JavaScript expressions that you can call into. So these values must not be converted to strings until the expressions have been evaluated.

> In performing join, sort, and/or filtering operations on a JavaObject-typed
> data item, the Object reference class implementation of java.lang.Comparable,
> if implemented,  will be used.  Otherwise, it will be handled like a String by
> default, using the value returned by Object#toString.
> For example, if a comparison by an instance hashcode is sufficient, it is up to
> the object ref implementation to optimize its implementation of the Comparable
> interface as appropriate.

Either a comparison by equality or by identity should do for a join. I can think of many cases where the objects have no order (required for Comparable) but can still be compared by identity or equality.

Regarding filters, there should be no special treatment for JavaObject. The value of the filter's JavaScript expression has to be converted to the type expected by the filter operator anyway: Comparable for relational operators, boolean for "Is True" or "Is False", String for "LIKE", etc.

> Similarly, to serialize the value of a JavaObject-typed data item in a report
> document, the Object reference class implementation of java.io.Serializable, if
> implemented, will be used.  Otherwise, it will be handled like a String by
> default, using the value returned by Object#toString.
 
In which cases are values persisted in the report document, that is, when do the values have to be serializable?

Comment 12 Linda Chan

2009-07-22 19:17:28 EDT

(In reply to comment #11)

>> This implies, that I don't have to implement any of these interfaces, if my objects are never compared or persisted in a report document, doesn't it?

Yes.

>> What's the difference between this solution and the deprecated any-type?

The "Any" data type expects BIRT to somehow figure out what the "correct" or actual data type is, and how best to handle its values. E.g. if an Any-typed parameter turns out to have a Date value, it is supposed to be handled as a Date-typed parameter. Whereas a data item defined with this new Java Object data type will be handled per the rules described earlier (in comment #10), even if its value is actually one of the other BIRT supported data types.

>> what's so special about report parameters (that the new API data type is not allowed)?
A report parameter triggers the UI to prompt for input value when running a report in the report designer.  Keeping this UI implementation simple is the main factor here.

>> what do you mean by computation purpose? 
E.g. a data set column is referenced in a computed JS expression, such as row["myObjectColumn"] + "_suffix"

>> the objects have no order (required for Comparable) but still can be compared by identity or equality.
It would be up to the referenced object class's implementation of Comparable to compare by identity or equality and return 0 accordingly. For the non-equal cases, it can use its hash code to determine the ordering.
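A sketch of the kind of Comparable implementation described above, assuming the driver wraps each referenced object in a hypothetical `ObjectRef` class: identical targets compare as 0, and other pairs are ordered by identity hash code. Because distinct objects can share an identity hash code, the sketch adds a per-wrapper sequence number as a tie-breaker and keeps one canonical wrapper per target, so the ordering stays total and consistent with `equals`.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical wrapper an ODA driver could return in a JavaObject-typed column.
public final class ObjectRef implements Comparable<ObjectRef> {
    // One canonical wrapper per referenced object, so wrapper identity == target identity.
    // This cache holds strong references; a real driver would need to bound or clear it.
    private static final Map<Object, ObjectRef> CANONICAL = new IdentityHashMap<>();
    private static long nextSeq = 0;

    private final Object target;
    private final long seq; // unique per wrapper; breaks identity-hash-code ties

    private ObjectRef(Object target, long seq) {
        this.target = target;
        this.seq = seq;
    }

    public static synchronized ObjectRef of(Object target) {
        return CANONICAL.computeIfAbsent(target, t -> new ObjectRef(t, nextSeq++));
    }

    public Object target() {
        return target;
    }

    @Override
    public int compareTo(ObjectRef other) {
        if (this == other) {
            return 0; // same wrapper, hence the same referenced object
        }
        int byHash = Integer.compare(System.identityHashCode(target), System.identityHashCode(other.target));
        return byHash != 0 ? byHash : Long.compare(seq, other.seq);
    }

    @Override
    public String toString() {
        return "ObjectRef[" + target + "]";
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(ObjectRef.of(a).compareTo(ObjectRef.of(a)));      // 0
        System.out.println(ObjectRef.of(a).compareTo(ObjectRef.of(b)) != 0); // true
    }
}
```

`equals` and `hashCode` are inherited from `Object` (identity), which matches `compareTo` returning 0 only for the same canonical wrapper.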

>> In which cases are values persisted in the report document
When a report design is run to generate report output, the output may optionally be persisted in a report document, which stores the data set column values.  If you do not have such use cases, no need then to have the values serializable.

Comment 13 Andreas Mayer

2009-07-23 08:27:54 EDT

(In reply to comment #12)

>>> For report output formatting and computation purpose, a JavaObject-typed data
>>> item will be handled like a String, using the value returned by
>>> Object#toString.

>> what do you mean by computation purpose? 

> E.g. a data set column is referenced in a computation JS expression, such as
> row["myObjectColumn"] + "_suffix"

If row["myObjectColumn"] returns an Object, JavaScript will already convert it to a string in order to concatenate "_suffix". In addition, I would still be able to write

  row["myObjectColumn"].name + ", " + row["myObjectColumn"].surname

which won't work with an implicit conversion to string. In my opinion, you sacrifice flexibility and get nothing in return by such conversions.
 
> >> the objects have no order (required for Comparable) but still can be compared by identity or equality.
> It would be up to the referenced object class implementation of Comparable to
> compare by identity or equality and returns 0 accordingly.  For the non-equal
> cases, it can use its hashcode to determine the ordering.

A join should use Object.equals() instead of Comparable.compareTo().
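The choice matters because `java.lang.Comparable` only recommends, and does not require, that `compareTo` be consistent with `equals`. `java.math.BigDecimal` is a standard counterexample; the hypothetical helpers below contrast the two join-key tests.

```java
import java.math.BigDecimal;

public class JoinKeyMatch {
    // Key match based on ordering (Comparable.compareTo() == 0).
    static <T extends Comparable<? super T>> boolean matchByCompareTo(T a, T b) {
        return a.compareTo(b) == 0;
    }

    // Key match based on equality (Object.equals()).
    static boolean matchByEquals(Object a, Object b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(matchByCompareTo(a, b)); // true: compareTo ignores scale
        System.out.println(matchByEquals(a, b));    // false: equals also compares scale
    }
}
```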

Comment 14 Wenjie Tu

2009-08-19 04:11:11 EDT

(In reply to comment #10)
> (In reply to comment #9)

> Similarly, to serialize the value of a JavaObject-typed data item in a report
> document, the Object reference class implementation of java.io.Serializable, if
> implemented, will be used.  Otherwise, it will be handled like a String by
> default, using the value returned by Object#toString.
> 
Besides report document generation, BIRT's internal disk cache and cube generation also need to save objects. If an unserializable object were automatically converted to a string during disk I/O, there could be unpredictable or undesirable effects. For example, suppose an object class implements Comparable but not Serializable, and the user defines a sort on it in the report. The actual sort result would then depend on whether an I/O operation is involved.
What's more, it's hard for ordinary users to understand in which cases I/O operations are triggered.
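A small illustration of this concern, assuming a hypothetical column value `Item` that implements `Comparable` by numeric ID but not `Serializable`, and assuming a fallback that replaces it with its `toString()` value after disk I/O. The same sort yields a different order before and after the round trip, because the strings compare lexicographically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDependsOnIo {
    // Hypothetical column value: Comparable by numeric id, deliberately not Serializable.
    record Item(int id) implements Comparable<Item> {
        @Override
        public int compareTo(Item other) {
            return Integer.compare(id, other.id);
        }

        @Override
        public String toString() {
            return "Item " + id;
        }
    }

    // Sort on the live objects, using Item.compareTo.
    static List<String> sortLive(List<Item> values) {
        List<Item> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        List<String> out = new ArrayList<>();
        for (Item i : copy) {
            out.add(i.toString());
        }
        return out;
    }

    // Sort after a hypothetical toString() fallback on disk I/O: only the strings survive.
    static List<String> sortAfterToStringFallback(List<Item> values) {
        List<String> out = new ArrayList<>();
        for (Item i : values) {
            out.add(i.toString());
        }
        out.sort(Comparator.naturalOrder());
        return out;
    }

    public static void main(String[] args) {
        List<Item> values = List.of(new Item(10), new Item(9), new Item(100));
        System.out.println(sortLive(values));                  // [Item 9, Item 10, Item 100]
        System.out.println(sortAfterToStringFallback(values)); // [Item 10, Item 100, Item 9]
    }
}
```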

So, in the current implementation, an exception is thrown to alert the user if BIRT tries to save unserializable objects.
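A minimal sketch of that behavior, assuming values are written with standard Java serialization. `ObjectColumnWriter.writeValue` is a hypothetical helper; the exception type BIRT actually throws is not stated in this bug, so `java.io.NotSerializableException` is used here only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ObjectColumnWriter {
    // Serializes one JavaObject-typed value; a non-serializable value is rejected
    // with an exception rather than silently persisted via toString().
    static byte[] writeValue(Object value) throws IOException {
        if (value != null && !(value instanceof Serializable)) {
            throw new NotSerializableException(value.getClass().getName());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            // writeObject also throws NotSerializableException if a nested field is not serializable.
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeValue("a string").length > 0); // true
        try {
            writeValue(new Object());
        } catch (NotSerializableException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: java.lang.Object
        }
    }
}
```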

Comment 15 Wenjie Tu

2009-08-19 04:18:50 EDT

(In reply to comment #13)
> (In reply to comment #12)

> A join should use Object.equals() instead of Comparable.compareTo().
> 
BIRT always uses Comparable.compareTo() to compare objects. If the object is not a Comparable instance, object.toString().compareTo() is called instead, as Linda said in comment #10.
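The comparison rule from comments #10 and #15 can be sketched as a `java.util.Comparator`. The class name is hypothetical, it assumes homogeneous, non-null column values (comment #6), and it is not BIRT's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class ObjectValueComparator implements Comparator<Object> {
    @Override
    @SuppressWarnings("unchecked")
    public int compare(Object a, Object b) {
        if (a instanceof Comparable) {
            // Values in one column are assumed homogeneous, so b is expected
            // to be of a type that a's compareTo accepts.
            return ((Comparable<Object>) a).compareTo(b);
        }
        // Fallback: compare the string representations.
        return a.toString().compareTo(b.toString());
    }

    public static void main(String[] args) {
        List<Object> column = new ArrayList<>(List.of(10, 9, 100));
        column.sort(new ObjectValueComparator());
        System.out.println(column); // [9, 10, 100]
    }
}
```

Null values could be handled by wrapping it, e.g. `Comparator.nullsFirst(new ObjectValueComparator())`; where nulls should sort is an assumption, not something this bug specifies.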

Comment 16 Wenjie Tu

2009-08-19 04:22:15 EDT

Fixed with https://bugs.eclipse.org/bugs/show_bug.cgi?id=285108

Comment 17 Tianli Zhang

2009-08-27 03:22:11 EDT

Verified this feature in bug #285108.