Correlating unrelated entity types in Entity Views

By Christian Beikov on 17 November 2018

In the last post I have shown examples of subquery mappings and how they allow to correlate scalar values. This time I will show how to correlate multiple values as subviews with some common use cases.

Sometimes, correlating just a simple scalar value with a correlated subquery is just not enough and multiple values are required. Imagine you program the data access for a blog post and want to show the title and the text of a specific post but also the title and the first 100 characters of a previous blog post. How would you access the previous blog post based on the current blog post? You could of course simply do a separate query, or you take a look at what correlated mappings have to offer and enjoy the expressiveness!

The source code for the following examples can be found on GitHub, so you can play around with it.

Entity model

I am again going to use the post and comment data model, but will adapt it a little bit for making a point.

@Entity
public class User {
    @Id
    @GeneratedValue
    private Integer id;
    private String userName;
    private String email;
    @Temporal(TemporalType.DATE)
    private Date registrationDate;
    // Other attributes
}

@Entity
public class Post {
    @Id
    @GeneratedValue
    private Integer id;
    @Temporal(TemporalType.DATE)
    private Date publishDate;
    private String title;
    private String text;
	@ManyToOne(fetch = LAZY, optional = false)
    private User poster;
    @ElementCollection
	@OrderColumn
    private List<Comment> comments = new ArrayList<>();
}

@Embeddable
public class Comment {
	private String comment;
	@ManyToOne(fetch = LAZY, optional = false)
	private User commenter;
}

I just added title and publishDate to the Post entity which will later be used.

Entity View model

Similar to the @MappingSubquery annotation, the @MappingCorrelated annotation allows to define a custom provider. Contrary to the subquery mapping, a correlation can use different fetch strategies that offer different performance characteristics

JOIN - Joins the correlated entity in the main query
SELECT - Executes a dedicated query for every distinct correlated value of the main query. Supports batching via @BatchFetch
SUBSELECT - Executes a single query embedding the main query as correlation source

Let’s start off by defining the entity view for the previous post. We only want the title and the first 100 characters, so the view might look like the following.

@EntityView(Post.class)
public interface SimplePostView {
    @IdMapping
    Integer getId();
    String getTitle();
    @Mapping("SUBSTRING(text, 1, 100)")
    String getText();
}

Since we want the same structure for the current post we will extend this interface, but are going to override the mapping for the text, as we want all of it, not just the first 100 characters. The correlation will be based on the id of the current post.

@EntityView(Post.class)
public interface NormalPostView extends SimplePostView {
    @Mapping("text") (1)
    String getText();
    @MappingCorrelated(
        correlator = PreviousPostCorrelationProvider.class, (2)
        correlationBasis = "id", (3)
        fetch = FetchStrategy.JOIN (4)
    )
    SimplePostView getPreviousPost();

    class PreviousPostCorrelationProvider implements CorrelationProvider {

        @Override
        public void applyCorrelation(CorrelationBuilder correlationBuilder, String correlationExpression) {
            String correlationAlias = correlationBuilder.getCorrelationAlias();
            JoinOnBuilder<CorrelationQueryBuilder> onBuilder = correlationBuilder.correlate(Post.class);
            onBuilder.on(correlationAlias + ".id") (5)
                .in()
                    .from(Post.class, "subPost")
                    .select("subPost.id")
                    .where("subPost.id").notEqExpression(correlationExpression)
                    .where("subPost.poster").eqExpression("VIEW_ROOT(poster)")
                    .where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)")
                    .orderByDesc("subPost.publishDate")
                    .setMaxResults(1)
                .end()
            .end();
        }
    }
}

1	Override the mapping as we want the current post to display the whole text
2	Define which correlation provider to use for the correlation mapping
3	The attribute of the current post on which the correlation is based on
4	Use the `JOIN` strategy to correlate as part of the main query
5	The alias for the correlated entity type is provided by `getCorrelationAlias()` to avoid alias conflicts

The idea of the correlation provider is to find the latest published Post with matching poster of the current post that isn’t equal to the current post.

Let’s figure this out step by step before delving too much into the details of what is happening behind the scenes.

We start off by correlating a Post to the current post which will become our previous post after applying some restrictions.

JoinOnBuilder<CorrelationQueryBuilder> onBuilder = correlationBuilder.correlate(Post.class);

The alias for the correlated entity can’t be defined as that would open up the possibility for name clashes. Instead, an alias is generated which can be retrieved via CorrelationBuilder.getCorrelationAlias(). Whenever we want to refer to an attribute of the correlated entity, the path expression should be absolute i.e. properly prefixed with the alias. Otherwise the path expression might refer to an attribute of the query root, which is something we usually don’t want.

Next, we are going to restrict the correlated posts to at most a single post based on a subquery.

onBuilder.on(correlationAlias + ".id")
    .in()
        // ...
        .setMaxResults(1)
    .end()
.end();

The use of setMaxResults(1) guarantees our at most one post requirement, but we still need to formulate that we want the most recent.

.from(Post.class, "subPost")
.select("subPost.id")
.where("subPost.id").notEqExpression(correlationExpression)
.where("subPost.poster").eqExpression("VIEW_ROOT(poster)")
.where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)")
.orderByDesc("subPost.publishDate")
.setMaxResults(1)

We select the id of the most recent post that

has an id unequal to the current post’s id where("subPost.id").notEqExpression(correlationExpression)
has a poster matching the one of the current post where("subPost.poster").eqExpression("VIEW_ROOT(poster)")
was published before the current post where("subPost.publishDate").leExpression("VIEW_ROOT(publishDate)")

Note that using a subquery in the ON clause didn’t work before version 1.2.0-Alpha4 so please update if you are using an older version. Also note the use of the VIEW_ROOT function which is the only way to refer to context before version 1.3.0. Starting with version 1.3.0, the EMBEDDING_VIEW function is introduced, which will finally allow to model the exact semantics one usually desires. Using the VIEW_ROOT function, you essentially require a specific type of view root, independent of the location where the view is used within the view hierarchy. The use of the EMBEDDING_VIEW function only requires that the direct parent has a specific type.

Behind the scenes

When querying with this view for a specific post id we will get a query like the following

SELECT
    thePost.id,
    thePost.title,
    thePost.text,
    correlated_previousPost.id,
    correlated_previousPost.title,
    SUBSTRING(correlated_previousPost.text, 1, 100)
FROM
    Post thePost
LEFT JOIN
    Post correlated_previousPost
ON
    correlated_previousPost.id IN (
        SELECT
            subPost.id
        FROM
            Post subPost
        WHERE
            subPost.id <> thePost.id
        AND
            subPost.poster = thePost.poster
        AND
            subPost.publishDate < thePost.publishDate
        ORDER BY
            subPost.publishDate DESC
        LIMIT 1
    )
WHERE
    thePost.id = :param_1

Note that this query isn’t strictly JPQL, as the LIMIT clause suggests. This is a representation of the query in, what we call, JPQL.next, the query language used by Blaze-Persistence which is a super-set of JPQL that adds support for various more modern DBMS features. The LIMIT clause translates to the DBMS specific way of doing result limitation.

If we had used the fetch strategy SELECT we would have seen the following two queries instead.

SELECT
    thePost.id,
    thePost.title,
    thePost.text,
    thePost.id
FROM
    Post thePost
WHERE
    thePost.id = :param_1

This is the query to retrieve the current post and also the correlation basis thePost.id. Based on the number of distinct values M, the following query would be executed N times where N = ceiling(M / K) and K is the batch size.

SELECT
    correlationBasis.value,
    correlated_previousPost.id,
    correlated_previousPost.title,
    SUBSTRING(correlated_previousPost.text, 1, 100)
FROM
    Integer(K VALUES) correlationBasis
INNER JOIN
    Post thePost
ON
    thePost.id = correlationBasis.value
INNER JOIN
    Post correlated_previousPost
ON
    correlated_previousPost.id IN (
        SELECT
            subPost.id
        FROM
            Post subPost
        WHERE
            subPost.id <> correlationBasis.value
        AND
            subPost.poster = thePost.poster
        AND
            subPost.publishDate < thePost.publishDate
        ORDER BY
            subPost.publishDate DESC
        LIMIT 1
    )

The number of batched correlation basis values is represented by the VALUES clause Integer(K VALUES) where K by default is 1. The inner join of the Post entity based on the correlation basis, which will join the current post, is only done, because other attributes than the id of the current post are accessed in the query. If only the correlation basis was used, there wouldn’t be an inner join.

The batch size is very important here. It shouldn’t be too big, but generally it should be able to handle 90% of the cases by a single query. If the number of distinct correlation basis values is mostly below 10 and only rarely 30, a batch size of 10 might be a good fit. Most cases would be handled by invoking a single query, and the rare case will cause 3 queries to be executed.

Note that the 1.2 series had issues with the support of VIEW_ROOT along with the SELECT strategy which are going to be fixed in 1.3.0.

Another fetch strategy that can be used is the SUBSELECT strategy which will embed the main query into a single correlation query. Overall, the following two queries will be executed. The query for the current post…

SELECT
    thePost.id,
    thePost.title,
    thePost.text,
    thePost.id
FROM
    Post thePost
WHERE
    thePost.id = :param_1

and a single query for the correlated previous post

SELECT
    correlationBasis.value,
    correlated_previousPost.id,
    correlated_previousPost.title,
    SUBSTRING(correlated_previousPost.text, 1, 100)
FROM
    Post thePost
INNER JOIN
    Post correlated_previousPost
ON
    correlated_previousPost.id IN (
        SELECT
            subPost.id
        FROM
            Post subPost
        WHERE
            subPost.id <> thePost.id
        AND
            subPost.poster = thePost.poster
        AND
            subPost.publishDate < thePost.publishDate
        ORDER BY
            subPost.publishDate DESC
        LIMIT 1
    )
WHERE
    thePost.id = :param_1

The secondary query for the SUBSELECT strategy is quite different from the secondary query of the SELECT strategy as it will not use the correlation basis values, but embed the query used to generate the correlation basis values into this secondary query.

Simple correlation

Correlations that can be represented by just a simple predicate can be done with @MappingCorrelatedSimple. To showcase it’s power, let’s consider an example. Assume we want to correlate all posts based on the poster.

@EntityView(Post.class)
public interface NormalPostView2 extends SimplePostView {
    @MappingCorrelatedSimple(
        correlated = Post.class,
        correlationBasis = "poster.id",
        correlationExpression = "poster.id = correlationKey",
        fetch = FetchStrategy.JOIN
    )
    Set<SimplePostView> getPostersPosts();
}

The query looks very similar to the first one, except for the subquery part.

SELECT
    thePost.id,
    thePost.title,
    thePost.text,
    correlated_previousPost.id,
    correlated_previousPost.title,
    SUBSTRING(correlated_previousPost.text, 1, 100)
FROM
    Post thePost
LEFT JOIN
    Post correlated_previousPost
ON
    correlated_previousPost.poster.id = thePost.poster.id
WHERE
    thePost.id = :param_1

Conclusion

We have seen how correlations can be used to connect entities based on ad-hoc conditions. Oftentimes such mappings are necessary or desired because of modularization constraints or to avoid bi-directional associations.

Next time I’ll show you how you can use attribute filters and sorters to filter or sort based on the mappings defined in the entity views.

Stay tuned!