By Christian Beikov on 17 November 2018
In the last post I have shown examples of subquery mappings and how they allow to correlate scalar values. This time I will show how to correlate multiple values as subviews with some common use cases.
Sometimes, correlating just a simple scalar value with a correlated subquery is just not enough and multiple values are required. Imagine you program the data access for a blog post and want to show the title and the text of a specific post but also the title and the first 100 characters of a previous blog post. How would you access the previous blog post based on the current blog post? You could of course simply do a separate query, or you take a look at what correlated mappings have to offer and enjoy the expressiveness!
The source code for the following examples can be found on GitHub, so you can play around with it.
Entity model
I am again going to use the post and comment data model, but will adapt it a little bit for making a point.
@Entity public class User { @Id @GeneratedValue private Integer id; private String userName; private String email; @Temporal(TemporalType.DATE) private Date registrationDate; // Other attributes } @Entity public class Post { @Id @GeneratedValue private Integer id; @Temporal(TemporalType.DATE) private Date publishDate; private String title; private String text; @ManyToOne(fetch = LAZY, optional = false) private User poster; @ElementCollection @OrderColumn private List<Comment> comments = new ArrayList<>(); } @Embeddable public class Comment { private String comment; @ManyToOne(fetch = LAZY, optional = false) private User commenter; }
I just added title
and publishDate
to the Post
entity which will later be used.
Entity View model
Similar to the @MappingSubquery
annotation, the @MappingCorrelated
annotation allows to define a custom provider. Contrary to the subquery mapping, a correlation can use different fetch strategies that offer different performance characteristics
-
JOIN
- Joins the correlated entity in the main query -
SELECT
- Executes a dedicated query for every distinct correlated value of the main query. Supports batching via@BatchFetch
-
SUBSELECT
- Executes a single query embedding the main query as correlation source
Let’s start off by defining the entity view for the previous post. We only want the title and the first 100 characters, so the view might look like the following.
@EntityView(Post.class) public interface SimplePostView { @IdMapping Integer getId(); String getTitle(); @Mapping("SUBSTRING(text, 1, 100)") String getText(); }
Since we want the same structure for the current post we will extend this interface, but are going to override the mapping for the text, as we want all of it, not just the first 100 characters. The correlation will be based on the id of the current post.
@EntityView(Post.class) public interface NormalPostView extends SimplePostView { @Mapping("text") (1) String getText(); @MappingCorrelated( correlator = PreviousPostCorrelationProvider.class, (2) correlationBasis = "id", (3) fetch = FetchStrategy.JOIN (4) ) SimplePostView getPreviousPost(); class PreviousPostCorrelationProvider implements CorrelationProvider { @Override public void applyCorrelation(CorrelationBuilder correlationBuilder, String correlationExpression) { String correlationAlias = correlationBuilder.getCorrelationAlias(); JoinOnBuilder<CorrelationQueryBuilder> onBuilder = correlationBuilder.correlate(Post.class); onBuilder.on(correlationAlias + ".id") (5) .in() .from(Post.class, "subPost") .select("subPost.id") .where("subPost.id").notEqExpression(correlationExpression) .where("subPost.poster").eqExpression("VIEW_ROOT(poster)") .where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)") .orderByDesc("subPost.publishDate") .setMaxResults(1) .end() .end(); } } }
1 | Override the mapping as we want the current post to display the whole text |
2 | Define which correlation provider to use for the correlation mapping |
3 | The attribute of the current post on which the correlation is based on |
4 | Use the JOIN strategy to correlate as part of the main query |
5 | The alias for the correlated entity type is provided by getCorrelationAlias() to avoid alias conflicts |
The idea of the correlation provider is to find the latest published Post
with matching poster of the current post that isn’t equal to the current post.
Let’s figure this out step by step before delving too much into the details of what is happening behind the scenes.
We start off by correlating a Post
to the current post which will become our previous post after applying some restrictions.
JoinOnBuilder<CorrelationQueryBuilder> onBuilder = correlationBuilder.correlate(Post.class);
The alias for the correlated entity can’t be defined as that would open up the possibility for name clashes. Instead, an alias is generated which can be retrieved via CorrelationBuilder.getCorrelationAlias()
. Whenever we want to refer to an attribute of the correlated entity, the path expression should be absolute i.e. properly prefixed with the alias. Otherwise the path expression might refer to an attribute of the query root, which is something we usually don’t want.
Next, we are going to restrict the correlated posts to at most a single post based on a subquery.
onBuilder.on(correlationAlias + ".id") .in() // ... .setMaxResults(1) .end() .end();
The use of setMaxResults(1)
guarantees our at most one post requirement, but we still need to formulate that we want the most recent.
.from(Post.class, "subPost") .select("subPost.id") .where("subPost.id").notEqExpression(correlationExpression) .where("subPost.poster").eqExpression("VIEW_ROOT(poster)") .where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)") .orderByDesc("subPost.publishDate") .setMaxResults(1)
We select the id
of the most recent post that
-
has an
id
unequal to the current post’s idwhere("subPost.id").notEqExpression(correlationExpression)
-
has a poster matching the one of the current post
where("subPost.poster").eqExpression("VIEW_ROOT(poster)")
-
was published before the current post
where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)")
Note that using a subquery in the ON
clause didn’t work before version 1.2.0-Alpha4 so please update if you are using an older version. Also note the use of the VIEW_ROOT
function which is the only way to refer to context before version 1.3.0. Starting with version 1.3.0, the EMBEDDING_VIEW
function is introduced, which will finally allow to model the exact semantics one usually desires. Using the VIEW_ROOT
function, you essentially require a specific type of view root, independent of the location where the view is used within the view hierarchy. The use of the EMBEDDING_VIEW
function only requires that the direct parent has a specific type.
Behind the scenes
When querying with this view for a specific post id we will get a query like the following
SELECT thePost.id, thePost.title, thePost.text, correlated_previousPost.id, correlated_previousPost.title, SUBSTRING(correlated_previousPost.text, 1, 100) FROM Post thePost LEFT JOIN Post correlated_previousPost ON correlated_previousPost.id IN ( SELECT subPost.id FROM Post subPost WHERE subPost.id <> thePost.id AND subPost.poster = thePost.poster AND subPost.publishDate < thePost.publishDate ORDER BY subPost.publishDate DESC LIMIT 1 ) WHERE thePost.id = :param_1
Note that this query isn’t strictly JPQL, as the LIMIT clause suggests. This is a representation of the query in, what we call, JPQL.next, the query language used by Blaze-Persistence which is a super-set of JPQL that adds support for various more modern DBMS features. The LIMIT clause translates to the DBMS specific way of doing result limitation.
If we had used the fetch strategy SELECT
we would have seen the following two queries instead.
SELECT thePost.id, thePost.title, thePost.text, thePost.id FROM Post thePost WHERE thePost.id = :param_1
This is the query to retrieve the current post and also the correlation basis thePost.id. Based on the number of distinct values M, the following query would be executed N times where N = ceiling(M / K)
and K is the batch size.
SELECT correlationBasis.value, correlated_previousPost.id, correlated_previousPost.title, SUBSTRING(correlated_previousPost.text, 1, 100) FROM Integer(K VALUES) correlationBasis INNER JOIN Post thePost ON thePost.id = correlationBasis.value INNER JOIN Post correlated_previousPost ON correlated_previousPost.id IN ( SELECT subPost.id FROM Post subPost WHERE subPost.id <> correlationBasis.value AND subPost.poster = thePost.poster AND subPost.publishDate < thePost.publishDate ORDER BY subPost.publishDate DESC LIMIT 1 )
The number of batched correlation basis values is represented by the VALUES clause Integer(K VALUES)
where K by default is 1. The inner join of the Post
entity based on the correlation basis, which will join the current post, is only done, because other attributes than the id of the current post are accessed in the query. If only the correlation basis was used, there wouldn’t be an inner join.
The batch size is very important here. It shouldn’t be too big, but generally it should be able to handle 90% of the cases by a single query. If the number of distinct correlation basis values is mostly below 10 and only rarely 30, a batch size of 10 might be a good fit. Most cases would be handled by invoking a single query, and the rare case will cause 3 queries to be executed.
Note that the 1.2 series had issues with the support of VIEW_ROOT
along with the SELECT
strategy which are going to be fixed in 1.3.0.
Another fetch strategy that can be used is the SUBSELECT
strategy which will embed the main query into a single correlation query. Overall, the following two queries will be executed. The query for the current post…
SELECT thePost.id, thePost.title, thePost.text, thePost.id FROM Post thePost WHERE thePost.id = :param_1
and a single query for the correlated previous post
SELECT correlationBasis.value, correlated_previousPost.id, correlated_previousPost.title, SUBSTRING(correlated_previousPost.text, 1, 100) FROM Post thePost INNER JOIN Post correlated_previousPost ON correlated_previousPost.id IN ( SELECT subPost.id FROM Post subPost WHERE subPost.id <> thePost.id AND subPost.poster = thePost.poster AND subPost.publishDate < thePost.publishDate ORDER BY subPost.publishDate DESC LIMIT 1 ) WHERE thePost.id = :param_1
The secondary query for the SUBSELECT
strategy is quite different from the secondary query of the SELECT
strategy as it will not use the correlation basis values, but embed the query used to generate the correlation basis values into this secondary query.
Simple correlation
Correlations that can be represented by just a simple predicate can be done with @MappingCorrelatedSimple
. To showcase it’s power, let’s consider an example. Assume we want to correlate all posts based on the poster.
@EntityView(Post.class) public interface NormalPostView2 extends SimplePostView { @MappingCorrelatedSimple( correlated = Post.class, correlationBasis = "poster.id", correlationExpression = "poster.id = correlationKey", fetch = FetchStrategy.JOIN ) Set<SimplePostView> getPostersPosts(); }
The query looks very similar to the first one, except for the subquery part.
SELECT thePost.id, thePost.title, thePost.text, correlated_previousPost.id, correlated_previousPost.title, SUBSTRING(correlated_previousPost.text, 1, 100) FROM Post thePost LEFT JOIN Post correlated_previousPost ON correlated_previousPost.poster.id = thePost.poster.id WHERE thePost.id = :param_1
Conclusion
We have seen how correlations can be used to connect entities based on ad-hoc conditions. Oftentimes such mappings are necessary or desired because of modularization constraints or to avoid bi-directional associations.
Next time I’ll show you how you can use attribute filters and sorters to filter or sort based on the mappings defined in the entity views.
Stay tuned!