Monday, May 7, 2012

The Future of NoSQL with Java EE


I've been following the recent NoSQL momentum for some time now, and it seems this buzzword is also drawing some attention in the enterprise Java world. Namely, EclipseLink 2.4 started supporting MongoDB and Oracle NoSQL. With EclipseLink being the JPA reference implementation, you might wonder what this means for Java EE 7. A short side note here: even though I am part of the JSR-342 EG, this isn't meant to be an official statement. In the following I simply try to summarize my own personal experiences and feelings towards NoSQL support in future Java EE versions. A big thank you goes out to Emmanuel Bernard for providing early feedback! Happy to discuss what follows:

What is NoSQL?
NoSQL is a classification of database systems that do not conform to the relational database or SQL standard. Most often they are categorized according to the way they store the data and fall under categories such as key-value stores, BigTable implementations, document store databases, and graph databases. In general the term isn't well enough defined to reduce it to a single supporting JSR or technology. So the only way to find suitable integration technologies is to dig through every single category.

Key/Value Stores
Key/Value stores allow data storage in a schema-less way. The data can be stored as a datatype of a programming language or as an object. Because of this, there is no need for a fixed data model. This is obviously comparable to parts of JSR 338 (Java Persistence 2.1) and JSR 347 (Data Grids for the Java Platform) and also to what is done with JSR 107 (JCACHE - Java Temporary Caching API).
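
To make the schema-less idea a bit more concrete, here is a minimal sketch in plain Java. The class SimpleKeyValueStore and its methods are made up for illustration and are not the API of any real NoSQL product; the point is only that values are stored as arbitrary objects under a key, with no fixed data model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory key/value store; not a real NoSQL client API. */
class SimpleKeyValueStore {
    // Values are plain Objects: the store imposes no schema on them.
    private final Map<String, Object> entries = new HashMap<>();

    /** Stores any kind of value under the given key, replacing an older one. */
    void put(String key, Object value) {
        entries.put(key, value);
    }

    /** Returns the stored value, or an empty Optional if the key is unknown. */
    Optional<Object> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    /** Removes the entry and reports whether something was actually removed. */
    boolean remove(String key) {
        return entries.remove(key) != null;
    }
}
```

A String, a List and a custom object can all live next to each other in the same store, which is exactly what a fixed relational schema would not allow.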


with native JPA2
The JPA L2 cache is also primarily aimed at caching. The JPA Cache API is good for basic cache operations, while the L2 cache shares the state of an entity, which is accessed with the help of the entity manager factory, across various persistence contexts. The Level 2 cache sits underneath the persistence context and is largely transparent to the application. When the Level 2 cache is enabled, the persistence provider looks for entities in the persistence context first. If it does not find them there, it looks in the Level 2 cache next instead of sending a query to the database. The obvious drawback is that, as of today, this only works with NoSQL as some kind of "cache" and not as a replacement for the RDBMS data store. Given the scope of this spec it would be a good fit, but I strongly believe that JPA is designed to be an abstraction over RDBMSs and nothing else. If there has to be some kind of support for non-relational databases, we might end up with a more high-level abstraction layer with tons of different persistence modes and features (maybe something like Spring Data). Generally, mapping at the object level has many advantages, including the ability to think in objects and let the underlying engine drive the de-normalization if needed. So reducing JPA to its caching features is probably the wrong decision.
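
The lookup order described above (persistence context first, then the Level 2 cache, then the database) can be modeled in a few lines of plain Java. This is only a sketch under simplifying assumptions: TwoLevelLookupSketch is a hypothetical class, entities are represented as Strings, and the "database" is just a function. A real persistence provider is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of the lookup order; not the JPA API itself. */
class TwoLevelLookupSketch {
    // First level: private to one persistence context.
    private final Map<Long, String> persistenceContext = new HashMap<>();
    // Second level: shared by all contexts that were handed the same map.
    private final Map<Long, String> sharedCache;
    // Stand-in for the query that would otherwise go to the database.
    private final Function<Long, String> database;
    private int databaseQueries = 0;

    TwoLevelLookupSketch(Map<Long, String> sharedCache, Function<Long, String> database) {
        this.sharedCache = sharedCache;
        this.database = database;
    }

    String find(long id) {
        // 1. Look in the persistence context.
        String entity = persistenceContext.get(id);
        if (entity != null) {
            return entity;
        }
        // 2. Look in the shared (Level 2) cache.
        entity = sharedCache.get(id);
        if (entity == null) {
            // 3. Only now ask the database and remember the result.
            entity = database.apply(id);
            databaseQueries++;
            if (entity != null) {
                sharedCache.put(id, entity);
            }
        }
        if (entity != null) {
            persistenceContext.put(id, entity);
        }
        return entity;
    }

    /** Number of times the database stand-in was actually called. */
    int databaseQueries() {
        return databaseQueries;
    }
}
```

Two instances that share the same sharedCache map behave like two persistence contexts of one entity manager factory: the second one finds an entity loaded by the first without touching the database.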


with JCache
JCache has a CacheManager that holds and controls a collection of Caches, and every single Cache has its entries. The basic API can be thought of as map-like with additional features (compare Greg's blog). With JCache being designed as a "cache", using it as a standardized interface to NoSQL data stores doesn't look like a good fit at first glance. But given the nature of the use cases for unstructured Key/Value-based data in enterprise Java, this might be the right kind of integration. And the NoSQL concept also allows for the "key-value cache in RAM" category, which is an exact fit for both JCache and DataGrids.
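
The structure described here, a manager that holds named caches which in turn hold map-like entries, could look roughly like the following plain Java sketch. SketchCacheManager and SketchCache are invented names and deliberately not the javax.cache types, whose exact API was still being worked on in JSR 107 at the time of writing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical map-like cache; a stand-in for the idea, not javax.cache.Cache. */
class SketchCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    V get(K key) {
        return entries.get(key);
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    boolean remove(K key) {
        return entries.remove(key) != null;
    }
}

/** Hypothetical manager that holds and controls a collection of named caches. */
class SketchCacheManager {
    private final Map<String, SketchCache<?, ?>> caches = new ConcurrentHashMap<>();

    // The unchecked cast is acceptable for a sketch: the caller decides the
    // key and value types for a given cache name and has to stay consistent.
    @SuppressWarnings("unchecked")
    <K, V> SketchCache<K, V> getOrCreateCache(String name) {
        return (SketchCache<K, V>) caches.computeIfAbsent(name, n -> new SketchCache<K, V>());
    }
}
```

Asking the manager twice for the same name returns the same cache instance, so entries put through one reference are visible through the other.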

with DataGrids
This JSR proposes an API for interacting with in-memory and disk-based distributed data grids. The API aims to allow users to perform operations on the data grid (PUT, GET, REMOVE) in an asynchronous and non-blocking manner, returning java.util.concurrent.Future objects rather than the actual return values. The progress here is not really visible at the moment (at least to me). So there aren't any examples or concepts for the integration of a NoSQL Key/Value store available so far. Besides this, the same reservations as for the JCache API apply.
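
Since there is nothing official to show, here is a rough guess at what "PUT, GET, REMOVE returning Futures" could mean in practice. AsyncGridSketch is a hypothetical class backed by a local map and a single worker thread; it is not the JSR 347 API and says nothing about distribution.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical asynchronous grid facade; not the JSR 347 API. */
class AsyncGridSketch<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    // One daemon worker thread keeps operations in submission order
    // and does not keep the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread worker = new Thread(r, "grid-sketch");
        worker.setDaemon(true);
        return worker;
    });

    /** Returns immediately; the Future later yields the previous value or null. */
    Future<V> put(K key, V value) {
        return executor.submit((Callable<V>) () -> data.put(key, value));
    }

    /** Returns immediately; the Future later yields the current value or null. */
    Future<V> get(K key) {
        return executor.submit((Callable<V>) () -> data.get(key));
    }

    /** Returns immediately; the Future later yields the removed value or null. */
    Future<V> remove(K key) {
        return executor.submit((Callable<V>) () -> data.remove(key));
    }

    void shutdown() {
        executor.shutdown();
    }

    /** Convenience helper: blocks for the result and hides checked exceptions. */
    static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The caller gets a Future back right away and only blocks when it actually needs the value, for example via AsyncGridSketch.await(grid.get("key")).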

with EclipseLink
EclipseLink's NoSQL support is based on the EIS support offered since EclipseLink 1.0, which allowed persisting objects to legacy and non-relational databases. EclipseLink's EIS and NoSQL support uses the Java Connector Architecture (JCA) to access the data source, similar to how EclipseLink's relational support uses JDBC. The NoSQL support can be extended to other NoSQL databases by creating an EclipseLink EISPlatform class and a JCA adapter. At the moment it supports MongoDB (Document Oriented) and Oracle NoSQL (BigData). It's interesting to see that Oracle doesn't address the Key/Value DBs first. This might be because of the possible confusion with the cache features (e.g. Coherence).

Column based DBs
Reads and writes are done using columns rather than rows. The best known examples are Google's BigTable and the likes of HBase and Cassandra that were inspired by BigTable. The BigTable paper says that BigTable is a sparse, distributed, persistent, multidimensional sorted map. GAE, for example, works only with BigTable. It offers a variety of APIs: from a "native" low-level API to "native" high-level ones (JDO and JPA). With the older DataNucleus version used by Google there seem to be a lot of limitations which could be removed (see comments) but are still in place.
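
The "multidimensional sorted map" from the BigTable paper is indexed by a row key, a column key and a timestamp. A purely local sketch of that idea with nested TreeMaps might look like this. SparseSortedTable is a hypothetical name, and the sketch only illustrates the "sparse" and "sorted" parts, not "distributed" or "persistent".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical local model of (row, column, timestamp) -> value. */
class SparseSortedTable {
    // row key -> column key -> timestamp -> value, each level kept sorted.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell version; rows and columns only exist once written (sparse). */
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell is empty. */
    String latest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        return versions.lastEntry().getValue();
    }

    /** Row keys in sorted order, which is what makes range scans cheap. */
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Note that two rows do not need to share any columns at all, and that a cell can hold several versions distinguished by timestamp.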

Document-oriented DBs
The document-oriented DBs are most obviously best addressed by JSR 170 (Content Repository for Java) and JSR 283 (Content Repository for Java Technology API Version 2.0). With Jackrabbit as a reference implementation, that's a strong sign for it :) Support for other NoSQL document stores is nonexistent as of today. Even Apache's CouchDB doesn't provide a JSR 170/283 compliant way of accessing the documents. The only drawback is that neither JSR is sexy or bleeding edge. But for me this would be the right bucket to put support for document-oriented DBs. Flip side of the coin? The content repository API isn't exactly a natural model for an application. Does an app really want to deal with nodes and attributes in Java? The notion of a domain model works nicely for many apps, and if there is no chance to use it, you would probably be better off going native and using the MongoDB driver directly.

Graph oriented DBs
Databases of this kind are intended for data whose relations are well represented in a graph style (elements interconnected with an undetermined number of relations between them). Aiming primarily at any kind of network topology, the recently rejected JSR 357 (Social Media API) would have been a good place to put support, at least from a use-case point of view. If those graph-oriented DBs are considered as a data store, there are a couple of options. If Java EE persistence is steering in the direction of a more general data abstraction layer, JSR 338 or its successors would be the right place to put support. If you know a little bit about how Coherence works internally and what had to be done to put JPA on top of it, you could also consider JSR 347 a good fit, with all the drawbacks already mentioned. Another alternative would be to have a separate JSR for it. The most prominent representative of this category is Neo4j, which itself has an easy API available so that you can simply include everything you need directly in your project. There is additional stuff to consider if you need to control the Neo4j instance via the application server.

Conclusion
To sum it up: we already have a lot in place for the so-called "NoSQL" DBs. And the groundwork for integrating this into new Java EE standards is promising. Control of embedded NoSQL instances should be done via JSR 322 (Java EE Connector Architecture), with this being the only allowed place to spawn threads and open files directly from a filesystem. I'm not a big supporter of having a more general data abstraction JSR for the platform, comparable to what Spring is doing with Spring Data. To me the concepts of the different NoSQL categories are too different to allow for a one-size-fits-all approach. The main pain point of NoSQL, besides the lack of a standard API, is that users are forced to denormalize and maintain the de-normalization by hand.

What I would like to see are some smaller changes, both to the products to make them more Java EE ready and to the way the integration into the specs is done. It might be a good idea to simply define the different persistence types, identify the JSRs which could be influenced by them, and "noSQL" those accordingly.


For users willing to use a domain model (i.e. a higher level of abstraction compared to the raw NoSQL API), JPA might be the best vehicle for that at the moment. Feedback from both EclipseLink and Hibernate OGM users is needed to evaluate what is working and what is not. From a political point of view it might also make sense to pursue JSR 347, especially since the major players are already present there.

The really hard part is querying.  Should there be standardized query APIs for each family? With Java EE? Or would that better be placed within the NoSQL space? Would love to read your feedback on this!

Thursday, May 3, 2012

New Article in German iX Magazin: WebLogic 12c


Another article hit the road a few days back. This time it is a short review in the German iX Magazine 5/2012 about Oracle's WebLogic 12c.

Rising Star
With more than 200 new features as well as the long missing Java EE 6 compliance WebLogic 12c is designed to impress the customers. The small "c" stands for the ever-present cloud which also is powered by the application server now.

This is a German article and you can either grab the latest issue online or buy it at your favorite kiosk.

You can find some other articles of mine by searching this blog for posts labeled "article".

Monday, April 30, 2012

JavaOne 2012 Analysis - Submitted Proposals and Speaker Distribution.


Beginning some time last year I started to have a closer look at conferences and their speakers. My main interest was to find out who was speaking how often. One conference was missing in this analysis because I really was not sure what can be published without breaking the confidentiality of the information. Being a member of the program committee for the second time this year and seeing all those wonderful sessions forced me to take another look at it and finally today I have at least some percentages to show to you. A big thank you goes out to Oracle's Sharat Chander for giving the permission to do that!
Based on the complete data for what has been submitted to JavaOne 2012 in San Francisco I will let you have a look at types, distribution and speakers. Every number given here is a percentage and the numbers behind them are still confidential. And again: This is an analysis of the complete submitted data. This doesn't tell you anything about what is going to be selected! The voting is still ongoing and the different program committees are hard at work reviewing every single proposal.

Submission Types
First of all let's look at the general distribution of submitted types independently of any track. Speakers could select any of five different types for their submission: the classic session, a BoF (Birds of a Feather), a tutorial, a HoL (Hands-on Lab) and, for the first time this year, a community keynote.
It is not a big surprise that most of the submissions are sessions (70.14%). The second most proposed content type is BoFs, followed by tutorials, HoLs and some community keynote proposals. Even if this sounds very concrete, there is still some movement in here. Some BoFs might become sessions and the other way around.

(Chart: Submission Types)

Submissions per Track
The next most interesting figure is the general distribution of submissions per track. There are seven tracks to choose from, starting with the Core Java Platform and finishing with Java on Card and Devices. It is good to see a fairly even distribution of proposals across the tracks. The Development Tools and Techniques track leads (24.15%), while Java ME, Java Card Embedded and Devices (8.21%) and Emerging Languages (5.86%) are at the bottom end. Very few proposals are moved from track to track during the voting process, but it happens. I don't expect the final distribution to differ heavily from the one shown below.
(Chart: Distribution per Track)

Internal vs. External Submissions
The number one question, discussed a lot in the past, is the number of sessions given by Oracle employees. Even though I would love to make an educated guess here, all I can show you is the distribution with regard to the proposals. I have looked at the first speaker of every session and assigned it an internal or external flag (yes, that took some time ;)). More than two thirds (71%) of the submissions come from external (aka non-Oracle) speakers. Even though I have seen some combined proposals, this is a clear sign that JavaOne is a community-driven conference.
(Chart: External vs. Internal Speakers)

But where exactly is Oracle jumping in? Are there differences in submissions per track if we look at the internal speakers? Internal proposals have a stronger focus on Embedded Java, the Core Platform and JavaFX compared with the external submissions.

(Chart: Submission Distribution by internal speakers)


(Chart: Submission Distribution by external speakers)

What do we learn from all that? JavaOne is going to be a great, community-driven conference with a lot of awesome sessions to come! If you haven't done so yet, take a look and register for it! The final program is going to be announced in a few weeks, and there is still plenty of time to find a flight and a hotel nearby.


Friday, April 27, 2012

Thank you! I'm a JBoss Community Leader 2012!


Huge news yesterday: the voting for the JBoss Community Recognition Award has been closed and the final winners have been announced! And I won the documentation category! A big thank you goes out to everybody who voted for me. It's a real pleasure working with the JBoss Arquillian team, and I have to thank Aslak, Dan, Vineet and Lincoln for supporting me since my early tries with Arquillian (beginning in late 2010).

There are a couple of reasons I am proud to receive this award. First and foremost I enjoy being part of the broad Java EE ecosystem. Seeing the JBoss guys doing their work with joy and passion is a big proof that standards and the JCP in general are anything but boring. Java EE and the supporting technologies deliver a pool of knowledge and creativity to make life easier for every developer.

The second most important reason is that it shows how you can contribute successfully to open source, or more generally to the technology you admire, without contributing code. There are a couple of reasons which prevent me from contributing code to OSS. But as in one of my favorite slogans, "Power is nothing without control": code is nothing without documentation. You don't have to be a rock star programmer or a genius to push things. Providing guidance, tutorials, articles ... all that kind of stuff is at least equally important when great software comes to the masses. (Some product documentation still shows this day by day.) The fact that this blog and even some magazine articles are written in English forces me to also thank my former English teachers for transforming a D student into someone able to express his thoughts reasonably.

You probably know me as an Oracle/GlassFish/WebLogic advocate, so the third and last reason for being proud is that it is possible for a guy like me to receive a JBoss award. If not that, what else could be better proof of a big and working technology family? Even if the kids pick on each other from time to time, work and play are fun for everybody.

My congratulations go out to the other winners:
Esteban Aliverti - New Features Drools (@ilesteban)
George Gastaldi - Bug Fixes Seam (@gegastald)
Bartosz Majsak - Issue / JIRA Arquillian (@majson)
Hantsy Bai - Wiki Arquillian

Hope to see you soon at JUDCon 2012 Boston on June 25th and 26th, where I will receive the award! Thank you!

Tuesday, April 17, 2012

German Article about Arquillian.


A short reminder for my German readers: the Heise.de Developer Channel published an updated version of my Arquillian article from last year. It now covers the new features from 1.0.0-Final and also introduces you to using the Drone Extension with Selenium to do some very basic UI tests.

The good news is that this is available for free. The bad news: it's only available in German. But don't forget to visit the new arquillian.org for the latest guides and how-tos to get you started in many, many other languages.

If you still need more information check out the complete Reference Guide for the Arquillian project. Or the FAQ.