Hibernate is not so evil

You just don’t know it well enough

Łukasz Pięta
Better Programming

--

Photo from my presentation about Hibernate at Warsaw IT Days

A few years ago, when I was a Junior Backend Developer, I was really frustrated with Hibernate. Every time I made any change to the persistence layer designed by my more experienced colleagues, I ran into serious problems with JPA. I thought that maybe Hibernate was some kind of old, crappy tool nobody wanted to use anymore — especially when I saw the alternatives: lightweight, small ORM frameworks like Exposed.

However, after many years of using JPA, I started to see the bigger picture. After many unequal fights with Hibernate, I got to know it better. I started to understand why it does what it does. Today I know that there is an object-relational impedance mismatch, which cannot be avoided when using an ORM, and it’s totally irrational to be upset about it. I wrote about it in my last article, “Kotlin and JPA — a good match or a recipe for failure?”, in which I focus mainly on the cooperation between Kotlin and Hibernate, but you can also read about ORMs in general and about different perspectives on the persistence issue. So if you don’t know what exactly the object-relational impedance mismatch is, feel free to read it.

Definitions

Let’s start by defining stuff. People tend to use the terms JPA, Hibernate, and Spring Data JPA interchangeably, as if they were just different names for the same thing. But they are not!

JPA (Java Persistence API) is an interface specification that describes how to manage relational data in Java applications. So it’s basically just a set of rules to follow. In the specification document, you can read e.g. what the constraints for an entity class are, i.e. which rules an entity class has to satisfy to be managed properly by any JPA provider.

So if there is an interface (JPA), there must be an implementation, right? This is basically what Hibernate is — an implementation of JPA. The most popular one. The most powerful one. And probably the most hated one.

Both Hibernate and JPA expose methods for managing relational data in Java applications, but usually, we don’t operate on that level of abstraction. During the business logic implementation, we shouldn’t need to use EntityManager or HibernateSession directly, because they are just implementation details and the actual domain doesn’t need to know how exactly we’re going to persist the data. So we need to put some abstraction on top of Hibernate/JPA. That’s why the Repository design pattern — introduced at the tactical level of Domain Driven Design — is so commonly used and that’s also probably the reason why Spring Data JPA is so popular.

It introduces another layer of abstraction, which makes JPA convenient and easy to use. Moreover, you don’t need to pick Hibernate when using Spring Data JPA — you might want to use another provider, e.g. EclipseLink.

Using Spring Data JPA lets us avoid a lot of boilerplate code, because most of the common usages of JPA are already provided by interfaces such as org.springframework.data.jpa.repository.JpaRepository, which is part of Spring Data JPA.

Hibernate logo, source: Wikipedia

However, hiding the details doesn’t mean we don’t need to know about them. Usually, people use Hibernate with a very simple goal: “I need to save these changes in the database.” Spring Data JPA is so easy to use (even for complete beginners) that in order to create a simple CRUD application, you don’t need to understand Hibernate at all. You just call repository.save(...) or repository.delete(...).

That’s why it is usually the first choice when picking a persistence provider — it’s easy to start with. However, it can be really hard to extend your project later if you don’t know JPA and Hibernate well. When your application gets popular, performance issues related to the JPA layer might occur. And there is no universal cure for that.

When it comes to software architecture, many concepts are strongly connected to each other, and it’s very hard to discuss any of them separately. Many architectural decisions can directly impact your persistence layer. For example, poorly separated modules/aggregates (in the DDD sense of aggregate described by Martin Fowler) can cause too many entities to be connected to each other within a single aggregate, or can give rise to problems with data consistency. Let’s be honest: it’s never just a “Hibernate issue”. If you knew your tool better, you’d use it the way it was designed to be used, wouldn’t you?

Hibernate is a very powerful tool that offers plenty of features, and not all of them can be covered in just one article. So let me focus on some of the most crucial pitfalls, personally selected by me. I will go through:

  • pitfalls regarding the JPA entity lifecycle,
  • lazy loading trade-offs.

Let’s start with the must-know basics of JPA — the JPA entity lifecycle.

JPA Entity Lifecycle

When you invoke a constructor of an entity class, you create an entity object in the New (Transient) state. The newly created object isn’t assigned to any row in a database table yet. In order to save (persist) it, you need to call the persist method. After that, the entity object changes its state to Managed, which means that it is now managed by the Persistence Context (i.e., the Hibernate Session in Hibernate’s implementation of JPA). When the entity object is in the Managed state, any change you make to it will be detected and propagated to the database upon invocation of the flush method. Usually, you don’t invoke it explicitly (at least you shouldn’t need to, unless you’re in some very specific situation). The moment at which the flush method is invoked can be configured with the FlushModeType property. By default, it is set to FlushModeType.AUTO, which means that flush will be invoked in two situations:

  • before the transaction commit,
  • before query execution using a database table, for which the current Persistence Context contains any pending changes.

JPA Entity Lifecycle Graph
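These transitions can be illustrated with a tiny, pure-Kotlin state machine. To be clear: this is just a conceptual sketch of the lifecycle graph above — Hibernate tracks entity state internally, and none of these names correspond to a real Hibernate API:

```kotlin
// Conceptual sketch of the JPA entity lifecycle — an illustration, not Hibernate code.
// The state names mirror the specification; the transition rules mirror the graph above.
enum class EntityState { NEW, MANAGED, REMOVED, DETACHED }

class LifecycleSimulator {
    var state = EntityState.NEW
        private set

    fun persist() { require(state == EntityState.NEW); state = EntityState.MANAGED }
    fun remove()  { require(state == EntityState.MANAGED); state = EntityState.REMOVED }
    fun detach()  { require(state == EntityState.MANAGED); state = EntityState.DETACHED }
    fun merge()   { require(state == EntityState.DETACHED); state = EntityState.MANAGED }
}

fun main() {
    val entity = LifecycleSimulator()
    entity.persist()          // New (Transient) -> Managed
    entity.detach()           // Managed -> Detached (e.g. the session closed)
    entity.merge()            // Detached -> Managed again
    println(entity.state)     // MANAGED
}
```

The `require` calls encode which transitions are legal — e.g., you cannot merge an entity that was never detached.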

If you decide to delete an entity object, you call the remove method, and the entity object changes its state to Removed. Similarly to the Managed state, the database will be synchronized upon invocation of the flush method.

The Managed entity becomes Detached when the Persistence Context (Hibernate Session) is closed. Usually, you don’t close the session yourself, because this is already handled by Spring Data JPA. You can, of course, detach an entity manually using the Entity Manager / Hibernate Session, but if you take advantage of Spring Data JPA, you operate at a higher level of abstraction — the Spring Data Repository. Thus, you don’t use the Entity Manager / Hibernate Session explicitly.

Detached means that an entity object is no longer managed by the Persistence Context (Hibernate Session), because the context was closed. Any change made to a detached entity won’t be synchronized with the database unless the entity is merged into an open Persistence Context (Hibernate Session). A Detached entity can become Managed again if you call the merge method on it. As a result of the merge call, Hibernate will try to find the entity in the current Persistence Context (Hibernate Session). If it’s there, Hibernate will copy the data from the detached object onto the managed one. But if Hibernate doesn’t find the entity in the current Persistence Context (Hibernate Session), it will fetch it directly from the database. It’s important to notice here that in some cases, calling the repository::save method will result in an additional SELECT statement — which is not obvious at all, especially for beginners.
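To make that hidden SELECT concrete, here is a deliberately simplified, hypothetical model of what merge does — not Hibernate’s actual code. The persistence context is modeled as a map; merging an entity that isn’t in the context costs one extra SELECT:

```kotlin
// Hypothetical, simplified model of merge — not Hibernate internals.
// If the entity is already managed, its state is overwritten; if not, it must
// first be fetched from the database, which is the "surprise" SELECT.
class FakePersistenceContext {
    private val managed = mutableMapOf<Long, String>()
    var selectCount = 0
        private set

    fun merge(id: Long, detachedState: String): String {
        if (id !in managed) {
            selectCount++               // not in the context -> SELECT from the database
            managed[id] = "db-row-$id"  // pretend we fetched the current row
        }
        managed[id] = detachedState     // copy detached state onto the managed instance
        return managed.getValue(id)
    }
}

fun main() {
    val ctx = FakePersistenceContext()
    ctx.merge(1L, "edited title")   // entity unknown to the context -> hidden SELECT
    println(ctx.selectCount)        // 1
    ctx.merge(1L, "edited again")   // already managed -> no extra SELECT
    println(ctx.selectCount)        // 1
}
```

The second merge is free precisely because the first one pulled the entity into the context.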

Okay, we already have some fundamental knowledge of JPA and Hibernate. But it’s still just theoretical knowledge. Let’s take a look at the code example.

For readability’s sake, I avoided unnecessary code in the article, so if you’d like to check out the whole source code, you can do it on GitHub here.

Don’t let Hibernate do the magic

Take a look at the first example. Let’s discuss a common problem: the TransientPropertyValueException thrown when saving a JPA entity.

@Entity
class Post(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long? = null,
    var title: String,
    var content: String,

    @ManyToOne
    val author: Author,
)

@Entity
class Author(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long? = null,
    var firstName: String,
    var lastName: String,
)

@Repository
interface PostRepository : JpaRepository<Post, Long>

The example above consists of two simple entities, Post and Author, where the former is in a @ManyToOne relationship with the latter (meaning an Author can have many Posts), and one repository interface — PostRepository — which will be used to save posts with their authors.

Now, let’s take a look at the following test (notice that I’m using BehaviorSpec from Kotest):

@SpringBootTest
class Test(
    private val postRepository: PostRepository
) : BehaviorSpec({

    Given("Author") {
        val author = Author(firstName = "Jan", lastName = "Nowak")

        When("Author creates a post") {
            val post = Post(title = "First Post", content = "Just hanging around", author = author)
            val postId = postRepository.save(post).id

            Then("Post is created") {
                postRepository.findByIdOrNull(postId) shouldNotBe null
            }
        }
    }
})

When you run the test, it fails with the message:

org.hibernate.TransientPropertyValueException: object references an unsaved transient instance - save the transient instance before flushing : com.pientaa.hibernatedemo.transientPropertyValueException.Post.author -> com.pientaa.hibernatedemo.transientPropertyValueException.Author

So when we take a look at the JPA Entity Lifecycle Graph again, it might already be clear what is going on here.

JPA Entity Lifecycle Graph

The Author object is not following the Post object through the entity state transitions. Post gets persisted, so it changes its state from Transient to Managed, but the Author object is still in the Transient state. When the Hibernate Session closes, Post is flushed, but it still references the transient Author object — and that’s why we get the exception.
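Conceptually, the check that fails here can be sketched in a few lines of plain Kotlin (a hypothetical model, not Hibernate internals): at flush time, every entity referenced by the flushed entity must itself be known to the persistence context, otherwise the flush is rejected.

```kotlin
// Hypothetical model of the flush-time referential check — not Hibernate code.
// Entities are represented by plain names for the sake of the illustration.
class FakeUnitOfWork {
    private val managed = mutableSetOf<String>()

    fun persist(entity: String) { managed += entity }

    // Flushing fails if the entity references anything still transient (never persisted).
    fun flush(entity: String, references: List<String>) {
        val transient = references.filterNot { it in managed }
        check(transient.isEmpty()) {
            "object references an unsaved transient instance: $transient"
        }
    }
}

fun main() {
    val uow = FakeUnitOfWork()
    uow.persist("Post")

    // Author was never persisted -> flush fails, just like in the test above
    val result = runCatching { uow.flush("Post", references = listOf("Author")) }
    println(result.isFailure)   // true

    uow.persist("Author")       // persist the Author first...
    uow.flush("Post", references = listOf("Author"))
    println("flush succeeded")  // ...and the flush goes through
}
```

Persisting the Author before the Post is exactly the fix the rest of this section argues for.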

There are two possible solutions to this problem; however, only one of them is presented in Baeldung’s tutorial on the subject. They encourage using the cascade parameter of the relationship annotation, which in my opinion shouldn’t be the default strategy, because the problem we have is not caused merely by wrong usage of Hibernate and its annotations. The actual problem is in the architecture/software design. Why do we need to persist Author and Post at the same time? Does it really need to be the same transaction? Shouldn’t Author and Post be different aggregates? In a real-life example, the Author should already exist before the Post is written, shouldn’t they? Otherwise — who is writing the Post?

In many cases, Hibernate issues can be caused by wrong software design, and using Hibernate features to minimize the consequences of the mistake doesn’t solve the actual problem at all — we just hide it until it escalates, e.g. when new business requirements come to the implementation stage.

If we just googled TransientPropertyValueException and followed the instructions from Baeldung, we would probably have made the wrong decision. Baeldung says:

To cover all scenarios, we need a solution to cascade our save/update/delete operations for entity relationships that depend on the existence of another entity. We can achieve that by using a proper CascadeType in the entity associations.

But what if this particular entity should have been persisted before we try to associate it with another one? Maybe we’re trying to save too many changes at once, and the operation should be divided into smaller, more domain-like requests. To accomplish that, we can create another repository interface — AuthorRepository — to manage the Author’s persistence.

@Repository
interface AuthorRepository : JpaRepository<Author, Long>

Providing another repository interface results in decoupled Author and Post classes, which in this case seems reasonable. Notice that even though Author and Post are managed separately, we can still reference the Author entity in Post using the same relationship annotation as before. Someone might say that we did not decouple these classes completely, because of the reference we left. And it’s true — it could be decoupled further. But a discussion of coupling and decoupling strategies between entities is a good candidate for a separate article, so for this article’s purposes, let’s keep the relationship. Let’s keep Hibernate busy handling our data access layer and discover more Hibernate features and their pitfalls!

Lazy loading — an overused feature?

Usually, lazy loading is used to improve performance, but I’ve seen many people using it as a default strategy, even though it is not required at all. Some of them even consider it a well-established best practice in JPA:

Lazy loading of associations between entities is a well established best practice in JPA. Its main goal is to retrieve only the requested entities from the database and load the related entities only if needed. That is a great approach if you only need the requested entities.

But right after this glorification of lazy loading, we can read in the same article that lazy loading:

[…] creates additional work and can be the cause of performance problems if you also need some of the related entities.

The problem is (as usual) not a binary decision like “should I use it or not”, because — “it depends”.

In an ideal example of a well-designed application with perfectly distilled aggregates — small, independent, with changes from one aggregate affecting another through eventual consistency — there is no need for lazy loading, because every aggregate is small enough to be loaded entirely into application memory. However, no application is perfect, is it? Quite often we programmers need to deal with legacy code and bad architectural decisions (made mostly because of deadlines), and it’s much more likely that you’ll end up in a project with multiple entities related to each other by various types of relationships than in some well-designed, perfectly distilled piece of software. So it’s totally understandable that the first thing everybody mentions about JPA optimization is lazy loading.

Let’s go back to our code example. To enable lazy loading on the Author property in the Post entity, we need to add an additional parameter:

@ManyToOne(fetch = FetchType.LAZY)

because the default FetchType for a @ManyToOne relationship is FetchType.EAGER.

So now, if we fetch a Post from the database using PostRepository, the generated SQL statement should not include any join on the Author table. That’s what we expect from lazy loading, right? But if we refer (accidentally or not) to some fields that haven’t been loaded yet and are meant to be loaded lazily, we should be able to fetch them “on demand”, shouldn’t we? Let’s take a look at the following test:

@SpringBootTest
@ActiveProfiles("test")
class PostAuthorLazyLoadingFailingTest(
    private val postRepository: PostRepository,
    private val authorRepository: AuthorRepository,
) : BehaviorSpec({

    Given("Author - Jan Nowak") {
        val janNowak = Author(firstName = "Jan", lastName = "Nowak").let { authorRepository.save(it) }

        When("Jan Nowak creates a Post") {
            val postId = Post(title = "First Post", content = "Just hanging around", author = janNowak)
                .let { postRepository.save(it) }.id

            Then("Fetching lazy loaded property should throw LazyInitializationException") {
                shouldThrow<LazyInitializationException> {
                    postRepository.findByIdOrNull(postId)!!.author.firstName shouldBe "Jan"
                }
            }
        }
    }
})

In the test above, I save the Author, I save the Post, and after that I fetch the Post from the database. Then I try to assert the author’s first name, but because of lazy loading, I’m not allowed to do that. The assertion fails with a LazyInitializationException:

could not initialize proxy [com.pientaa.hibernatedemo.lazyLoading.Author#1] - no Session
org.hibernate.LazyInitializationException: could not initialize proxy [com.pientaa.hibernatedemo.lazyLoading.Author#1] - no Session

The problem is that Hibernate couldn’t initialize the proxy (used for lazy loading) because “there is no Session”. We need to be really careful when using lazy loading in Hibernate, because every time we fetch any lazily loaded data, we need to have a Hibernate Session open. As I already mentioned at the beginning of the article, we shouldn’t need to use the EntityManager or Hibernate Session directly, because they are just implementation details. But it turns out that if we’d like to use lazy loading, we need to be able to somehow open the session. The question is: what is the right place to do this? Let’s try to answer that question considering the onion architecture.

Onion architecture

Domain layer? I don’t think so. Domain code should be perfectly clean and framework-agnostic, so there is no place for Hibernate in this layer, especially in a project with a rich domain model, where DDD makes sense.

Presentation layer? Maybe not a good idea. At some point, it will cause performance issues.

Application layer? Well, it’s better than the presentation layer (performance-wise), but the Hibernate Session doesn’t seem to belong anywhere other than the data access / infrastructure layer — especially because we already use an abstraction (JPA), and Hibernate is just an implementation detail (a JPA provider). Therefore, it shouldn’t be exposed in any part of the application other than the aforementioned infrastructure layer.

So it looks like we already know that session boundaries could be set in the application layer, but HibernateSession itself doesn’t fit there. Hmm, I wish there was some kind of an abstraction on top of HibernateSession that could be used in the application layer…

Exactly! Transactions! It does make sense, doesn’t it? Even from the business perspective, there is such a thing as transactional consistency — some business processes need to be synchronized and guaranteed to run as an atomic transaction, but their boundary is bigger than just one aggregate. Thus, a transaction looks like a perfect abstraction on top of the Hibernate Session. So all we need is @Transactional, which marks a method to be run in the scope of a transaction and therefore opens the Hibernate Session for us.
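The session/transaction interplay can be simulated in plain Kotlin. The `transactional` helper below is a hypothetical stand-in for what @Transactional does — opening a session around a block of code — and the proxy mimics Hibernate’s lazy-loading behavior, not its implementation:

```kotlin
// Pure-Kotlin simulation of why a lazy proxy needs an open session.
// `Session`, `LazyAuthorProxy`, and `transactional` are all illustrative names,
// not Hibernate or Spring APIs.
class Session { var open = true }

class LazyAuthorProxy(private val session: Session, private val loader: () -> String) {
    val firstName: String
        get() {
            // Mirrors LazyInitializationException: no open session, no lazy fetch
            check(session.open) { "could not initialize proxy - no Session" }
            return loader()
        }
}

// Stand-in for @Transactional: open the session around the block, close it after
fun <T> transactional(session: Session, block: () -> T): T {
    session.open = true
    try { return block() } finally { session.open = false }
}

fun main() {
    val session = Session().apply { open = false }            // session already closed
    val author = LazyAuthorProxy(session) { "Jan" }

    val outside = runCatching { author.firstName }            // fails: no open session
    println(outside.isFailure)                                // true

    val inside = transactional(session) { author.firstName }  // works inside the "transaction"
    println(inside)                                           // Jan
}
```

Accessing the lazy property outside the transactional block fails, inside it succeeds — exactly the difference @Transactional makes in the real test above.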

Using lazy loading leads to higher complexity, especially when you need to specify transactions to make it work correctly. Don’t use it as a default strategy. Try to consider all alternatives, especially re-designing your aggregates’ boundaries.

However, LazyInitializationException is not the only pitfall of lazy loading. Actually, the most popular one is the n + 1 problem, which can be really painful when it comes to application performance.

Lazy loading — n + 1 problem

Lazy loading is about cutting the primary SQL query down to the “minimum” — not loading the lazily loaded data until it’s needed. So we don’t JOIN on a table that backs a lazily loaded association. However, when the lazily loaded data is needed, additional SQL statements will be executed anyway. This causes the n + 1 problem, because the data access layer needs to execute “n” additional SQL statements to get the same data that could have been retrieved by the primary query.

Let’s take a look at the code example again, which I have extended with an additional @OneToMany relationship. I won’t go through the details of the best practices regarding this relationship type in Hibernate, because there is already an excellent article about it, which I totally recommend if you’re not familiar with @OneToMany relationship pitfalls.

Let’s create a new entity class PostComment.

@Entity
class PostComment(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long? = null,
    var content: String,

    @ManyToOne
    val author: Author,

    @ManyToOne
    val post: Post
)

And let’s extend the current Post class implementation with an additional comments field and methods for adding and removing comments on the post.

@Entity
class Post(
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    var id: Long? = null,
    var title: String,
    var content: String,

    @OneToMany(mappedBy = "post", cascade = [CascadeType.ALL], orphanRemoval = true)
    val comments: MutableSet<PostComment> = mutableSetOf(),

    @ManyToOne
    val author: Author,
) {
    fun addComment(content: String, author: Author) {
        comments.add(
            PostComment(content = content, author = author, post = this)
        )
    }

    fun removeComment(commentId: Long) {
        comments.removeIf { it.id == commentId }
    }
}

Notice that I left the author association eagerly loaded, because the relationship we’re going to focus on while analyzing the n + 1 problem is @OneToMany. That association is lazily loaded by default in Hibernate.

The n + 1 problem in our case means n + 1 SELECT statements generated to fetch n Posts with their PostComments from the database. Let’s say we have a query (simplified for readability’s sake):

select * from post where id < 6

which returns 5 Posts. Fetching those Posts’ comments results in 5 additional SELECT statements:

select * from post_comment where post_id = 1
select * from post_comment where post_id = 2
select * from post_comment where post_id = 3
select * from post_comment where post_id = 4
select * from post_comment where post_id = 5

This problem wouldn’t exist if we didn’t use lazy loading, because instead of n additional SELECT statements, we would have a JOIN on the post_comment table. Of course, if n is a small number, it doesn’t hurt us, but the problem escalates as the number grows. Notice that if we made all the associations lazily loaded, there would be n additional SQL statements generated for each association. As you can see, the problem can escalate very quickly if we overuse lazy loading. And again — if this number is huge, maybe the aggregate is too big, and the problem lies in the design, not in the persistence layer itself.
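The escalation is simple arithmetic. A quick sketch (the function name is mine, not a JPA concept):

```kotlin
// Back-of-the-envelope count of SELECT statements with lazy loading:
// one primary query, plus one extra SELECT per entity per lazy association touched.
// A JOIN-based fetch would be a single statement regardless of n.
fun lazyLoadingQueryCount(posts: Int, lazyAssociationsPerPost: Int): Int =
    1 + posts * lazyAssociationsPerPost

fun main() {
    println(lazyLoadingQueryCount(posts = 5, lazyAssociationsPerPost = 1))  // 6 — the classic "n + 1"
    println(lazyLoadingQueryCount(posts = 5, lazyAssociationsPerPost = 3))  // 16 — escalates quickly
}
```

Five posts with one lazy association already cost six round trips; add two more lazy associations and you’re at sixteen.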

However, sometimes we need to decide whether a JOIN or n additional SELECTs is better for us. Usually, if we’re sure that we’ll need an association to be loaded entirely in the current request, it’s better to fetch it eagerly, using a JOIN. But the same entity might be used in another request in which this association is not needed at all. Some kind of hybrid would be perfect, wouldn’t it? And here we’ve got plenty of options — for example JPA entity graphs or DTO projections — and the easiest one (in my opinion), which is a simple JPQL query using join fetch explicitly:

@Query("select distinct p from Post p left join fetch p.comments where p.id = :postId")
fun getPostWithComments(@Param("postId") postId: Long): Post

Taking advantage of the aforementioned JPA options to eagerly fetch a custom set of an entity’s associations, you can use FetchType.LAZY as the default FetchType and create projections or dedicated queries for very specific use cases. But it’s still easier and safer not to use FetchType.LAZY at all, so don’t do it unless it’s necessary.

Summary

As you can see, Hibernate is not as easy as it seems, but for very basic usage you don’t need to know everything about its tremendous number of features. For any problem you might have, there is always more than one solution. However, in some very specific cases there is an optimal one, which is highly recommended.

Many problems with the data access layer can be just a result of a wrong application design, so don’t be that hard on Hibernate. It’s not as evil as everybody describes it.

Remember to use the proper abstractions in your domain code. In complex projects with a rich domain model, you shouldn’t use JPA entities directly in your domain code.

Thanks for reading.

If you’re looking for a comprehensive source of knowledge regarding JPA/Hibernate, I strongly recommend the book “High-Performance Java Persistence” by Vlad Mihalcea. Or maybe you feel like there are some JPA/Hibernate-related topics you’d like me to write about — just let me know!

I’d like to thank Adrian Glapiński and Paweł Dereziński for being very helpful in reviewing my article. I appreciate it a lot!

--

Kotlin passionate, conference speaker, always taking the side of the oppressed (why does everyone hate Hibernate so much?).