Simplifying NoSQL database integration with Jakarta NoSQL

Maximillian Arruda on May 10, 2024

jakarta-ee , community , developer-experience , data-sources ,

Post available in languages:

In the data-driven age of AI and cloud computing, choosing the right data storage strategy directly impacts the success of a solution. We currently rely on two major data storage approaches: relational and NoSQL databases. When it comes to working with relational databases, the Java Persistence API (JPA) specification is a robust and widely adopted API. In contrast, the crowded landscape of NoSQL databases opens a large set of integration options for developers, who often find themselves learning at the mercy of vendor-specific solutions instead of relying on standards.

Most importantly, how we define data storage and code design either facilitates or hinders the project’s maintainability, performance, scalability, and ability to change databases later on.

Fortunately, Java developers who embrace NoSQL solutions no longer need to worry about many of these concerns. The Jakarta NoSQL specification offers a well-defined, standards-based implementation with a low learning curve and familiar coding APIs, thanks to its similarities with the widely-adopted JPA specification.

Join us in this comprehensive article, as we explore Jakarta NoSQL and how this specification can simplify developers' lives, no matter which NoSQL database their applications use!

Table of Contents

Understanding NoSQL databases
Key differences: SQL vs NoSQL
Exploring NoSQL database types
Challenges of NoSQL integration
Introducing Jakarta NoSQL and Jakarta Data
Next steps: continuing the journey
Special Thanks
References and Further Reading

Understanding NoSQL databases

The market has different views on the meaning of the name "NoSQL"; while some believe it is an acronym for "Not Only SQL", others understand it as "Non-SQL" or "Not relational". Regardless of these different interpretations, there’s an undeniable aspect of this solution: NoSQL databases are fundamentally different in terms of data storage and management when compared to SQL.

Key differences: SQL vs NoSQL

So what are the differences between these two? In the following table, we can see the key differences between SQL and NoSQL:

Description	SQL Databases	NoSQL Databases
Data Storage	Use tabular relations to store and retrieve data	Provide mechanisms for retrieving and storing unstructured and semi-structured data (non-relational)
Schema	Predefined schemas	No predefined schemas
Scalability	Vertical scalability (scale up)	Horizontal scalability (scale out)
Data Distribution	Data distributed across fewer servers	Data distributed across many servers
Suitability	Better suited for smaller, structured datasets	Well-suited for handling large and growing datasets
Industry Adoption	Traditionally used in many industries	Gaining popularity in industries like finance and streaming

Description

SQL Databases

NoSQL Databases

Data Storage

Use tabular relations to store and retrieve data

Provide mechanisms for retrieving and storing unstructured and semi-structured data (non-relational)

Schema

Predefined schemas

No predefined schemas

Scalability

Vertical scalability (scale up)

Horizontal scalability (scale out)

Data Distribution

Data distributed across fewer servers

Data distributed across many servers

Suitability

Better suited for smaller, structured datasets

Well-suited for handling large and growing datasets

Industry Adoption

Traditionally used in many industries

Gaining popularity in industries like finance and streaming

Now that we’ve explored the fundamental differences between SQL and NoSQL databases, let’s focus more on NoSQL databases. The next section looks at the different types of NoSQL databases, each designed to solve specific problems with unique data models and architectures.

Exploring NoSQL database types

In the NoSQL space, each database type solves different needs. The following overview examines the most popular non-relational storage approaches: key-value, document, column, and graph databases .

Key-value databases

Key-value databases are the simplest of the NoSQL database types. Data is stored as a collection of key-value pairs. It’s ideal for getting information when you know the specific key, like finding a contact in your phone by their name. The structure resembles the Java java.util.Map API, where values are mapped to keys.

Key-value databases are often used for caching, session management, and message queues.

Examples of vendors currently offering key-value solutions include Amazon DynamoDB, Redis, Hazelcast, and Memcached. The last three are open source technologies.

Document databases

Document databases are designed to store, retrieve, and manage documents with minimally defined structures, such as XML and JSON formats. A document without a predefined structure is a data model that can be composed of multiple fields to hold different data types.

Since it’s possible to design models that hold documents inside other documents, Domain-Driven Design (DDD) practitioners can leverage document aggregates to manage their derived entities in a hierarchical structure.

Document databases are well-suited for managing unstructured data, such as user profiles, product catalogs, or content management systems.

Key vendors providing document database solutions include MongoDB, Couchbase, Elastic, and Oracle NoSQL Database.

Column Databases

Column databases, also known as column-oriented or wide-column databases, store data as columns instead of rows, in contrast to traditional relational databases. This approach is a differentiator across other types, as it’s an efficient way to handle large amounts of data and run performant complex queries.

This type is designed and optimized for storing large amounts of structured, semi-structured, and unstructured data with a flexible schema, and supports high levels of concurrency and scalability.

Wide-column databases are often used for analytics, content management, and data warehousing.

Examples of column databases include Apache HBase, Apache Cassandra, Scylla, Azure Cosmos DB, and many others. The first two mentioned here are open source technologies.

Graph databases

The graph NoSQL database is optimized for storing and querying data with complex relationships. In this approach, data is managed as a graph, where entities can be represented as nodes and edges, resulting in performant management of complex relationship and hierarchies.

The graph data resembles the graph of objects in the object-oriented programming (OOP) paradigm. Graph NoSQL database solutions are a good fit for scenarios that require fast querying of highly interconnected data, such as social networks, recommendation engines, and fraud detection systems.

Graph databases available today are Neo4J, Arango DB, OrientDB, and JanusGraph, among others. The last one mentioned is an open source technology.

Challenges of NoSQL integration

Modern solutions frequently require the capabilities and benefits of different types of NoSQL databases, therefore, we should be able to work with multiple NoSQL solutions, coming from different vendors. Having said that, we can expect to face challenges such as:

A high cognitive load when choosing a NoSQL database solutions
A significant learning curve derived from usage of different database APIs
Extra time invested on changing and maintaining the existing codebase
Potential increase in complexity when onboarding new developers to the team

Furthermore, in the cloud era of consumption-based pricing models, we must consider ways to deliver solutions with efficient resource consumption for costs reduction. This is how potential discussions arise, opening the possibility to consider changes of the underlying database used by existing applications.

In addition to these challenges, a solution based on NoSQL must be flexible and designed with a concise isolation between business logic and the underlying persistence layer, given the high probability of changes being made to both.

As of February 2024, the DB-Engines Ranking, an initiative that aims to list and rank DBMS by popularity, there are over 180 non-relational/NoSQL databases available in the market.

To solve these challenges, we can refer to a not-so-distant past, when we faced a similar challenge with relational databases and Java integration. The Java Database connectivity (JDBC) specification standardized the way Java integrates with relational databases. To better align it with the OOP paradigm, the ORM pattern was popularized and the Jakarta Persistence specification was created to facilitate working with different relational databases engines and vendors.

So, based on this background context about solutions with Jakarta Persistence, wouldn’t it be interesting to have a similar solution and APIs to work with NoSQL?

Say hello to Jakarta NoSQL and Jakarta Data!

Introducing Jakarta NoSQL and Jakarta Data

These are specifications proposing a simpler NoSQL integration and abstract aspects of database vendors, allowing for developers to manipulate data with intuitive and developer-friendly APIs.

Jakarta NoSQL

Jakarta NoSQL is a Jakarta EE specification designed to easily integrate Java and NoSQL databases. It uses common annotations and specific APIs for key-value, column, and document databases.

Jakarta Data

The Jakarta Data specification, part of Jakarta EE, proposes a unified API for simplified data access across different types of databases, from relational to NoSQL databases.

Jakarta Data achieves its goal by introducing concepts like repositories and custom query methods, improving the developer experience when using data retrieval and manipulation APIs.

Jakarta Data is planned for inclusion in Jakarta EE 11

Eclipse JNoSQL: A reference implementation

A Jakarta EE specification can’t solve the problem by itself - it becomes consumable through an implementation.

Each Jakarta EE specification has at least one implementation. The existence of an implementation proves the proposed specification is achievable and can be developed by interested third-parties. This is when companies and communities actively start providing their own implementations, empowering the Jakarta EE developers and ecosystem with a diverse and powerful toolset.

Example of reference implementations (RI) are Hibernate for Jakarta Persistence 3.1 specification; Jersey for Jakarta RESTFul Web Services 3.1 specification; Glassfish for Jakarta Servlet 6.0 specification; Weld for Jakarta Context And Dependency Injection (CDI) 4.0 specification; and so on…

Eclipse JNoSQL is a compatible implementation of the Jakarta NoSQL and Jakarta Data specifications, a framework that streamlines the integration of Java applications with relational and NoSQL databases. Powered by the Jakarta Contexts and Dependency Injection (CDI) specification, it is compatible with Jakarta EE and MicroProfile compatible solutions.

Eclipse JNoSQL covers four NoSQL database types: key-value, column, document and graph.

Currently, the Jakarta NoSQL doesn’t define an API for Graph database types but Eclipse JNoSQL provides a Graph template to explore the specific behavior of this database type by using Apache TinkerPop as a communication layer.

As of March 2024, Eclipse JNoSQL supports about 30 NoSQL databases.

Why Eclipse JNoSQL?

In the following code samples, note the APIs similarities and differences across different vendors. All samples demonstrate a commonly used functionality for document databases: a document creation and the addition of a property to that document:

var document = new Document();
document.append(name, value);

var baseDocument = new BaseDocument();
baseDocument.addAttribute(name, value);

var jsonObject = JsonObject.create();
jsonObject.put(name, value);

var document = new ODocument("collection");
document.field(name, value);

With Eclipse JNoSQL, developers can use a common API to integrate with different database types, free of vendor lock-in and with a low cognitive load during learning phases. For example, using the Document API, it’s possible to switch between MongoDB and ArangoDB as needed, based on convention over configuration (CoC).

var entity = CommunicationEntity.of("collection");
entity.add(name, value);

Check out some of the Jakarta NoSQL annotations:

import jakarta.nosql.Entity;
import jakarta.nosql.Id;
import jakarta.nosql.Column;

@Entity
public class Book {

    @Id
    private String isbn;

    @Column
    private String title;

    @Column
    private String author;

    @Convert(YearConverter.class)
    @Column
    private Year year;

}

When using Java 17+, Eclipse JNoSQL allows the usage of Java Records as entities:

import jakarta.nosql.Entity;
import jakarta.nosql.Id;
import jakarta.nosql.Column;

@Entity
public record Book(@Id String isbn,
                   @Column("title") String title,
                   @Column("author") String author,
                   @Convert(YearConverter.class) @Column("year") Year year,
                   @Column("edition") int edition) {

}

Last but not least, Eclipse JNoSQL, as a Jakarta Data implementation, allows us to create repositories, enabling domain-driven development (DDD) capabilities through the usage of the repository pattern. This approach helps developers bring the code closer to the business (domain-centric) instead of working with database semantics.

import jakarta.data.page.Page;
import jakarta.data.page.PageRequest;
import jakarta.data.repository.Delete;
import jakarta.data.repository.Repository;
import jakarta.data.repository.DataRepository;
import jakarta.data.repository.Query;
import jakarta.data.repository.Param;
import jakarta.data.repository.Save;

@Repository
public interface Garage extends DataRepository<Car,String>{

    @Save
    Car park(Car car);

    @Delete
    Car unPark(Car car);

    @Query("select * from Car where driver.name = @name")
    Set<Car> findByDriver(@Param("name") String name);

    Page<Car> findByColor(Color color, PageRequest pageRequest);

}

What to expect from Eclipse JNoSQL

Beyond being a Jakarta NoSQL and Jakarta Data implementation, Eclipse JNoSQL supports the following goals:

Increase productivity performing common NoSQL operations
Use of convention over configuration
Rich object mapping integrated with contexts and dependency injection (CDI)
Java-based query and fluent API
Persistent lifecycle events
Low-level mapping using standard NoSQL APIs
Specific template API to each NoSQL category
Annotation-oriented using JPA-like naming when it makes sense
Extensible to explore the particular behavior of a NoSQL database
Explore the popularity of Apache TinkerPop in Graph API

Next steps: continuing the journey

Congratulations on getting this far!

After getting an overview of Jakarta NoSQL, Jakarta Data, and the Eclipse JNoSQL, I invite you to explore a hands-on approach to these tools, managing and querying data from NoSQL databases and switching between NoSQL databases as needed.

This blog post is the 1st part of a set of blog posts, keep you eyes out for these follow-ups coming soon:

Jakarta NoSQL in Action: JNopo Game;
Jakarta NoSQL in Action: Switching NoSQL Databases with Ease

To see some sample projects, take a look at the official Eclipse JNoSQL samples repositories:

To learn more about Eclipse JNoSQL, take a look at these official repositories:

If you’re an expert on some NoSQL database that Eclipse JNoSQL doesn’t support, feel free to open an issue or a PR in these project repositories.

Except for previously mentioned NoSQL solutions like MongoDB and Couchbase, all the technology used in this blog post is open-source. So, what do you think about contributing to these projects? If you don’t know how to get started, take a look at this Coffee.withJava("Contribute to JNoSQL") Youtube Series, or if you prefer, feel free to contact me! Contributing to these projects is not just code, you could help a lot by promoting and speaking about them wherever you go! Contributing to open-source is a great way to boost your career, and improve your skills to become an effective developer and relevant in the market! Think about that!

Special Thanks

I’m bursting with gratitude and would love to give a big shout-out to my incredible Java community friends for their unwavering support throughout my journey. A special round of applause for:

Otavio Santana, you’re not just a mentor but a guiding star in my open-source journey. Your mentorship have opened doors for me to become an active open-source contributor and a proud Eclipse Foundation committer. Thank you for being such a monumental part of my journey. Also, thanks for your insightful reviews of the codes featured in this blog post.
Karina Varela, your keen eye for detail and your generosity in sharing your knowledge have enriched this content beyond measure. Your thoughtful reviews have made this content not just better, but truly curated and relevant. I’m so grateful for your contribution.
Gabriel Silva Andrade, your support and encouragement have been a constant source of inspiration for me. Your partnership at the SouJava JUG and Java community initiatives around this subject have been invaluable in shaping this content. Thank you for your unwavering support.
Fabio Franco, you were the catalyst for this wonderful opportunity, connecting me with the fantastic OpenLiberty team and offering your support throughout the publishing process of this blog post. Your belief in me and your encouragement have been invaluable. Thank you for making this possible.
And to the OpenLiberty team, thank you for opening your doors and allowing me the privilege to share and post this content that I’ve thoroughly enjoyed working on. Thanks for this opportunity.

To each of you, your support means a lot to me, and I’m deeply thankful.

References and Further Reading

See all blog posts