Re: [science-iwg] Data Structures - part deux

+1

On Wed, Jan 27, 2016 at 10:03 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
There seems to be enough commonality between this proposal and the ICE work to warrant it being in the same project. It is very easy to structure a project so that components are separate, and allow people to obtain only the functionality they are interested in. This would also not preclude using the “NumPy for Java” tag, which is a good way of enhancing interest in the project.

You really want to avoid having a project with a small group of developers who have too much “ownership”. Projects like this tend to alienate other contributors, and have a lifespan that depends on the initial developers' ability to continue contributing. It is much more advantageous to have a broader community at the beginning, because you will be able to leverage the enthusiasm and expertise of a larger group in order to create an ecosystem around the project.

Notwithstanding this, clarifying the goals and objectives of the project is essential. Having a clearly articulated scope will help both contributors and users understand how they can get value out of it.

My 2 cents worth.

Greg

On Jan 27, 2016, at 9:56 AM, Jay Jay Billings <jayjaybillings@xxxxxxxxx> wrote:

Tracy,

Thanks for getting this going. First, let me say that if we want January to be just about 'numpy for Java,' that is completely OK with me. We should just make that clear in the scope. In that case, we would be looking more at ICE and EAVP using January instead of the data structures from ICE and EAVP being moved into January.

I just shared a description of our data structures with Matt on the other thread. I have expanded it and share it below.

Jay

-----

Here's the code:


The goal of this package is to create general-purpose data classes, structures and pattern realizations that can be mapped to a wide range of scientific problems while also maintaining metadata about that information. They are also all bound with JAXB so that they can be persisted to XML. Their design is verbose so that developers can almost immediately know how to pack their data into the classes.

They are, in a sense, the exact opposite of IDataSet because they are designed to store "higher-level" quantities meant for direct consumption by users (as opposed to reduction into a plot, etc.). We store all raw, n-dimensional data in files and link to those files through our ResourceComponent.

Our long-term goals with this are to switch this to an EMF model, optimize the way metadata is stored, use IDataSet to back structures like MatrixComponent and ResourceComponent (ILazyDataSet in this case), and allow developers to create their own Component implementations simply through annotations.

Consider, for example, a battery. If the state of that battery would be represented on disk by five quantities - say a string, two integers and two floats - and each of those quantities has associated metadata such as descriptions, ids, names, etc., then we could map them as follows:

Battery --> 1 instance of DataComponent
Quantities 1-5 --> 5 instances of Entry
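
The mapping above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: SketchEntry and SketchDataComponent are hypothetical stand-ins, not the actual ICE classes, and the field names and sample values are invented for the example. The real classes also carry JAXB bindings, which are left out here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an Entry: one quantity plus its metadata.
final class SketchEntry {
    final int id;
    final String name;
    final String description;
    final Object value;

    SketchEntry(int id, String name, String description, Object value) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.value = value;
    }
}

// Hypothetical stand-in for a DataComponent: a named group of entries.
final class SketchDataComponent {
    final String name;
    private final List<SketchEntry> entries = new ArrayList<>();

    SketchDataComponent(String name) {
        this.name = name;
    }

    // Returns this so that entries can be added in a chain.
    SketchDataComponent add(SketchEntry entry) {
        entries.add(entry);
        return this;
    }

    // Read-only view of the entries in insertion order.
    List<SketchEntry> entries() {
        return Collections.unmodifiableList(entries);
    }
}

final class BatterySketch {
    // One component for the battery, five entries for its quantities:
    // a string, two integers and two floats. All values are made up.
    static SketchDataComponent build() {
        return new SketchDataComponent("Battery")
            .add(new SketchEntry(1, "chemistry", "Cell chemistry", "Li-ion"))
            .add(new SketchEntry(2, "cellCount", "Number of cells", 12))
            .add(new SketchEntry(3, "cycleCount", "Charge cycles so far", 240))
            .add(new SketchEntry(4, "voltage", "Terminal voltage in volts", 3.7f))
            .add(new SketchEntry(5, "capacity", "Capacity in amp-hours", 2.5f));
    }
}
```

Calling BatterySketch.build() yields one component holding five entries, each with its own id, name and description.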

Let's consider another example: a 3D geometry. In this case, the developer would use a GeometryComponent and the associated CSG tree (which is moving to EAVP) to create a 3D geometry constructed from shapes and boolean operations on those shapes. Alternatively, they could construct that geometry purely from a mesh using a MeshComponent and Edges, Vertices, etc.
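
To make the CSG idea concrete, here is a minimal sketch of a tree built from shapes and boolean operations. It assumes nothing about the real GeometryComponent API: CsgNode, SphereShape, BoxShape, BooleanOp and CsgOperation are all hypothetical names, and point containment is just one convenient way to show how the tree is evaluated.

```java
// Hypothetical CSG sketch; none of these names come from ICE or EAVP.
interface CsgNode {
    // True if the point (x, y, z) lies inside the solid this node describes.
    boolean contains(double x, double y, double z);
}

// Leaf shape: a sphere given by its center and radius.
final class SphereShape implements CsgNode {
    private final double cx, cy, cz, radius;

    SphereShape(double cx, double cy, double cz, double radius) {
        this.cx = cx;
        this.cy = cy;
        this.cz = cz;
        this.radius = radius;
    }

    @Override
    public boolean contains(double x, double y, double z) {
        double dx = x - cx, dy = y - cy, dz = z - cz;
        return dx * dx + dy * dy + dz * dz <= radius * radius;
    }
}

// Leaf shape: an axis-aligned box given by two opposite corners.
final class BoxShape implements CsgNode {
    private final double minX, minY, minZ, maxX, maxY, maxZ;

    BoxShape(double minX, double minY, double minZ,
             double maxX, double maxY, double maxZ) {
        this.minX = minX;
        this.minY = minY;
        this.minZ = minZ;
        this.maxX = maxX;
        this.maxY = maxY;
        this.maxZ = maxZ;
    }

    @Override
    public boolean contains(double x, double y, double z) {
        return x >= minX && x <= maxX
            && y >= minY && y <= maxY
            && z >= minZ && z <= maxZ;
    }
}

enum BooleanOp { UNION, INTERSECTION, DIFFERENCE }

// Interior node: combines two subtrees with a boolean operation.
final class CsgOperation implements CsgNode {
    private final BooleanOp op;
    private final CsgNode left;
    private final CsgNode right;

    CsgOperation(BooleanOp op, CsgNode left, CsgNode right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean contains(double x, double y, double z) {
        boolean inLeft = left.contains(x, y, z);
        boolean inRight = right.contains(x, y, z);
        switch (op) {
            case UNION:
                return inLeft || inRight;
            case INTERSECTION:
                return inLeft && inRight;
            case DIFFERENCE:
                return inLeft && !inRight;
            default:
                throw new IllegalStateException("Unknown operation: " + op);
        }
    }
}

final class CsgSketch {
    // A 2 x 2 x 2 box with a spherical hole of radius 0.5 at its center.
    static CsgNode boxWithHole() {
        return new CsgOperation(BooleanOp.DIFFERENCE,
            new BoxShape(0, 0, 0, 2, 2, 2),
            new SphereShape(1, 1, 1, 0.5));
    }
}
```

Because operations are themselves nodes, they nest to any depth, which is what makes the structure a tree rather than a flat list of shapes.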

Other classes, such as ListComponent, offer generic solutions for storing whatever data structure a user can come up with, so long as they provide JAXB bindings on that class so that it can be written to disk.

After that, any collection of Components is stored in a root class called Form, which is processed by the workflow engine and the UI. All of this creates a single gigantic tree structure that can be walked in O(N) time by smartly implementing the IComponentVisitor interface.
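
The tree walk described above can be sketched with a plain visitor. This is an assumption-laden illustration, not the real IComponentVisitor: SketchVisitor, SketchNode, SketchLeaf, SketchGroup and CountingVisitor are hypothetical names, and the actual interface likely has one visit method per Component type. The point is only that each node is dispatched to exactly once, so the walk is linear in the number of nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor with one method per node type.
interface SketchVisitor {
    void visit(SketchLeaf leaf);
    void visit(SketchGroup group);
}

// Every node in the tree accepts a visitor (double dispatch).
interface SketchNode {
    void accept(SketchVisitor visitor);
}

// A node with no children, e.g. a single quantity.
final class SketchLeaf implements SketchNode {
    final String name;

    SketchLeaf(String name) {
        this.name = name;
    }

    @Override
    public void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

// A node that holds children, e.g. a form or a component.
final class SketchGroup implements SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchGroup(String name, SketchNode... nodes) {
        this.name = name;
        for (SketchNode node : nodes) {
            children.add(node);
        }
    }

    @Override
    public void accept(SketchVisitor visitor) {
        visitor.visit(this);
    }
}

// Counts nodes; each node is visited exactly once, so the walk is O(N).
final class CountingVisitor implements SketchVisitor {
    int leaves = 0;
    int groups = 0;

    @Override
    public void visit(SketchLeaf leaf) {
        leaves++;
    }

    @Override
    public void visit(SketchGroup group) {
        groups++;
        for (SketchNode child : group.children) {
            child.accept(this);
        }
    }
}

final class FormWalkSketch {
    // Builds a small tree rooted at a form-like group and walks it.
    static CountingVisitor walk() {
        SketchNode form = new SketchGroup("Form",
            new SketchGroup("Battery",
                new SketchLeaf("voltage"),
                new SketchLeaf("capacity")),
            new SketchLeaf("notes"));
        CountingVisitor visitor = new CountingVisitor();
        form.accept(visitor);
        return visitor;
    }
}
```

The visitor avoids instanceof checks and casts: the node type selects the visit overload, and the recursion over children lives in one place.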


On Wed, Jan 27, 2016 at 8:34 AM, Tracy Miranda <tracy@xxxxxxxxxxxxxxxx> wrote:
Hi all, 

Following on from feedback for the January project proposal, this is a thread for clarifying the scope and what the project should encompass.

As a sort-of self-appointed product manager, I'm looking at it from the user perspective, trying to answer these questions:
- What is it all about?
- What problems does it solve?
- Who really gives a damn?

For the initial proposal, touted as a 'numpy for Java', I have good answers for all those questions (mainly from the proposal itself, and work on python integration with Java).

When it comes to expanding the scope, I'm guilty of getting excited about integrating all the tools and not necessarily understanding what the structures are or are good for and how they all fit together. 

I am certainly aware of specific use-cases beyond the current nd-array implementation, especially for the Triquetrum project, but it's pretty limited. 

So maybe best to start with both the ICE and EAVP data structures first - do some good knowledge transfer from Jay on the types of structures we are talking about and the use cases for these...

Tracy 



_______________________________________________
science-iwg mailing list
science-iwg@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/science-iwg



--
Jay Jay Billings
Oak Ridge National Laboratory
Twitter Handle: @jayjaybillings