Re: [science-iwg] Data Structures - part deux

Tracy,

Also, if you are the "self-appointed project manager" and will be contributing to the work, then you should be listed as a committer on the proposal.

Jay

On Wed, Jan 27, 2016 at 9:56 AM, Jay Jay Billings <jayjaybillings@xxxxxxxxx> wrote:
Tracy,

Thanks for getting this going. First, let me say that if we want January to be just about 'numpy for Java,' that is completely OK with me. We should just make that clear in the scope. In that case, we would be looking more at ICE and EAVP using January instead of the data structures from ICE and EAVP being moved into January.

I just shared a description of our data structures with Matt on the other thread. I have expanded it and share it below.

Jay

-----

Here's the code:


The goal of this package is to create general purpose data classes, structures and pattern realizations that can be mapped to a wide range of scientific problems while also maintaining metadata about that information. They are also all bound with JAXB so that they can be persisted to XML. Their design is verbose so that developers can almost immediately know how to pack their data into the classes. 

They are, in a sense, the exact opposite of IDataSet because they are designed to store "higher-level" quantities meant for direct consumption by users (as opposed to reduction into a plot, etc.). We store all raw, n-dimensional data in files and link to those files through our ResourceComponent.

Our long-term goals with this are to switch this to an EMF model, optimize the way metadata is stored, use IDataSet to back structures like MatrixComponent and ResourceComponent (ILazyDataSet in this case), and allow developers to create their own Component implementations simply through annotations.

Consider, for example, a battery. If the state of that battery would be represented on disk by five quantities - say a string, two integers and two floats - and each of those quantities has associated metadata such as descriptions, ids, names, etc., then we could map them as follows:

Battery --> 1 instance DataComponent
Quantities 1-5 --> 5 instances of Entry
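
To make the mapping above concrete, here is a minimal sketch in plain Java. The class names (SketchEntry, SketchDataComponent, BatterySketch), their fields, and the sample battery values are all hypothetical stand-ins invented for illustration. They are not the actual ICE classes or their API, and the JAXB bindings are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an Entry: one quantity plus its metadata.
final class SketchEntry {
    final int id;
    final String name;
    final String description;
    final Object value; // a String, Integer or Float in this sketch

    SketchEntry(int id, String name, String description, Object value) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.value = value;
    }
}

// Hypothetical stand-in for a DataComponent: a named group of entries.
final class SketchDataComponent {
    final String name;
    private final List<SketchEntry> entries = new ArrayList<>();

    SketchDataComponent(String name) {
        this.name = name;
    }

    // Returns this so that calls can be chained.
    SketchDataComponent add(SketchEntry entry) {
        entries.add(entry);
        return this;
    }

    List<SketchEntry> entries() {
        return Collections.unmodifiableList(entries);
    }
}

final class BatterySketch {
    // One component for the battery, five entries for its quantities:
    // a string, two integers and two floats (all values are made up).
    static SketchDataComponent build() {
        return new SketchDataComponent("Battery")
            .add(new SketchEntry(1, "chemistry", "Cell chemistry", "Li-ion"))
            .add(new SketchEntry(2, "cellCount", "Number of cells", 6))
            .add(new SketchEntry(3, "cycleCount", "Charge cycles so far", 120))
            .add(new SketchEntry(4, "voltage", "Nominal voltage in volts", 3.7f))
            .add(new SketchEntry(5, "capacity", "Capacity in amp-hours", 2.5f));
    }

    public static void main(String[] args) {
        for (SketchEntry e : build().entries()) {
            System.out.println(e.id + " " + e.name + " = " + e.value + " (" + e.description + ")");
        }
    }
}
```

The point of the sketch is only that each quantity carries its own metadata, so a developer can see where each piece of data belongs without consulting a separate schema.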

Let's consider another example: a 3D geometry. In this case, the developer would use a GeometryComponent and the associated CSG tree (which is moving to EAVP) to create a 3D geometry constructed from shapes and boolean operations on those shapes. Alternatively, they could construct that geometry purely from a mesh using a MeshComponent and Edges, Vertices, etc.

Other classes, such as ListComponent, offer generic solutions to storing whatever data structure a user can come up with so long as they provide JAXB bindings on that class so that it can be written to disk.

After that, any collection of Components, etc. is stored in a root class called Form that is processed by the workflow engine and the UI. All of this creates a single gigantic tree structure that can be walked in O(N) time by smartly implementing the IComponentVisitor interface.
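
As a rough illustration of that kind of tree walk, the sketch below uses the classic visitor pattern with double dispatch. Everything in it (SketchVisitor, SketchNode, LeafNode, GroupNode, NameCollector, VisitorSketch, and the node names) is hypothetical and made up for this example. The real IComponentVisitor has its own methods, which are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface: one visit method per node type.
interface SketchVisitor {
    void visit(LeafNode leaf);
    void visit(GroupNode group);
}

// Hypothetical node interface: each node dispatches to the matching visit method.
interface SketchNode {
    void accept(SketchVisitor visitor);
}

// A node with no children, e.g. a single data component.
final class LeafNode implements SketchNode {
    final String name;

    LeafNode(String name) {
        this.name = name;
    }

    @Override
    public void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

// A node that holds other nodes, e.g. the root form or a composite.
final class GroupNode implements SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    GroupNode(String name) {
        this.name = name;
    }

    GroupNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    @Override
    public void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

// Records every node name in pre-order. Each node is visited exactly once,
// so the walk is linear in the number of nodes.
final class NameCollector implements SketchVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    public void visit(LeafNode leaf) {
        names.add(leaf.name);
    }

    @Override
    public void visit(GroupNode group) {
        names.add(group.name);
        for (SketchNode child : group.children) {
            child.accept(this);
        }
    }
}

final class VisitorSketch {
    static GroupNode buildForm() {
        return new GroupNode("Form")
            .add(new LeafNode("Battery"))
            .add(new GroupNode("Geometry")
                .add(new LeafNode("Sphere"))
                .add(new LeafNode("Cube")));
    }

    public static void main(String[] args) {
        NameCollector collector = new NameCollector();
        buildForm().accept(collector);
        System.out.println(collector.names);
    }
}
```

Because the node type is resolved through accept, the visitor needs no instanceof checks, and adding a new operation over the tree means writing a new visitor rather than touching the node classes.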


On Wed, Jan 27, 2016 at 8:34 AM, Tracy Miranda <tracy@xxxxxxxxxxxxxxxx> wrote:
Hi all, 

Following on from feedback on the January project proposal, this is a thread for clarifying the scope and what the project should encompass.

As a sort-of self-appointed product manager I'm looking at it from the user perspective trying to answer these questions:
- What is it all about?
- What problems does it solve?
- Who really gives a damn?

For the initial proposal, touted as a 'numpy for Java', I have good answers for all those questions (mainly from the proposal itself, and work on python integration with Java).

When it comes to expanding the scope, I'm guilty of getting excited about integrating all the tools and not necessarily understanding what the structures are or are good for and how they all fit together. 

I am certainly aware of specific use-cases beyond the current nd-array implementation, especially for the Triquetrum project, but it's pretty limited. 

So maybe it is best to start with both the ICE and EAVP data structures first - do some good knowledge transfer from Jay on the types of structures we are talking about and the use cases for these...

Tracy 



_______________________________________________
science-iwg mailing list
science-iwg@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/science-iwg



--
Jay Jay Billings
Oak Ridge National Laboratory
Twitter Handle: @jayjaybillings


