Separation of concerns with Entity Views for projections

By Christian Beikov on 26 April 2020

After a long Twitter discussion about the benefits of the separation of logic and projection concerns for database queries I thought it might be good to write a post about it. In software engineering, we usually agree that separation of concerns is a good thing as it allows to think about concerns in isolation which makes reasoning about behavior easier. Why not apply the separation of concerns principle to database queries as well?

When you fulfill a business use case that involves a database you usually have to think about the following things:

Formulation of the business use case as query to serve as data source
Definition of the structure of the result i.e. the DTO definition
Enrich the query for the business use case to fit the desired result structure
Map the query result to the desired result structure

As soon as you alter the business use case query to fit the result structure needs, you are mixing things i.e. breaking the principle of separation of concerns. Reusing such an altered query for other use cases with the same business but different representation requirements might not be possible anymore or might not be as efficient as it could be.

Example model

Let’s consider a simple model to serve as a basis for examples to illustrate my point.

@Entity
public class Order {
    @Id
    @GeneratedValue
    private Long id;
    private Instant orderDate;
    @Enumerated
    private OrderStatus status;
    @ElementCollection
    @OrderColumn(name = "position")
    private List<OrderPosition> orderPositions;
    @ManyToOne
    private Customer customer;
    private BigDecimal price;
}
public enum OrderStatus {
    NEW,
    PROGRESS,
    DELIVERED,
    CANCELLED;
}
@Embeddable
public class OrderPosition {
    @ManyToOne
    private Product product;
    private Integer amount;
    private BigDecimal price;
}
@Entity
public class Product {
    @Id
    @GeneratedValue
    private Long id;
    private String name;
    private BigDecimal price;
}
@Entity
public class Customer {
    @Id
    @GeneratedValue
    private Long id;
    private String customerNumber;
    private String firstName;
    private String lastName;
}

In SQL DDL this would be something like the following:

CREATE TABLE order_head(
    id SERIAL NOT NULL,
    order_date TIMESTAMP NOT NULL,
    status INT NOT NULL,
    customer_id BIGINT NOT NULL REFERENCES customer,
    price DECIMAL(16,3) NOT NULL,
    PRIMARY KEY(id)
)
CREATE TABLE order_position(
    order_id BIGINT NOT NULL REFERENCES order_head,
    position INT NOT NULL,
    product_id BIGINT NOT NULL REFERENCES product,
    amount INT NOT NULL,
    price DECIMAL(16,3) NOT NULL,
    PRIMARY KEY(order_id, position)
)
CREATE TABLE product(
    id SERIAL NOT NULL,
    name TEXT NOT NULL,
    price DECIMAL(16,3) NOT NULL,
    PRIMARY KEY(id)
)
CREATE TABLE customer(
    id SERIAL NOT NULL,
    customer_number TEXT NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    PRIMARY KEY(id)
)

Business use case example

As a customer, I want to see my past orders.

A pretty simple business use case that can be implemented with a simple query

SELECT oh.id, oh.order_date, oh.status, oh.price
FROM order_head oh
WHERE oh.customer_id = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED

The query is pure business logic i.e. it implements the data source for the business use case, but the use case does not specify what data to visualize. Let’s assume the customer wants to see the status, order date, final amount and the names of the first 3 products in that overview. The desired result structure could look like the following:

public class OrderOverview {
    private Long id;
    private Instant orderDate;
    private OrderStatus status;
    private List<OrderPositionOverview> orderPositions;
    private BigDecimal price;
}
public class OrderPositionOverview {
    private String productName;
}

There are many ways to fetch the described data but usually the business use case query is adapted to meet the needs.

SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name
FROM order_head oh
JOIN LATERAL (
    SELECT *
    FROM order_positon pos
    WHERE pos.order_id = oh.id
    ORDER BY pos.position
    LIMIT 3
)               pos ON 1 = 1
JOIN product    p   ON pos.product_id = p.id
WHERE oh.customer_id = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
ORDER BY
    oh.order_date DESC,
    oh.id DESC,
    pos.position

Customers might have dozens of orders and to avoid loading and transferring all data all the time, we add a simple pagination mechanism.

SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name
FROM (
    SELECT *
    FROM order_head oh_inner
    WHERE oh_inner.id IN (
        SELECT oh.id
        FROM order_head oh
        WHERE oh.customer_id = ?
          AND oh.order_date < CURRENT_TIMESTAMP
          AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
        ORDER BY
            oh.order_date DESC,
            oh.id DESC
        LIMIT ?
        OFFSET ?
    )
) oh
JOIN LATERAL (
    SELECT *
    FROM order_positon pos
    WHERE pos.order_id = oh.id
    ORDER BY pos.position
    LIMIT 3
)               pos ON 1 = 1
JOIN product    p   ON pos.product_id = p.id
ORDER BY
    oh.order_date DESC,
    oh.id DESC,
    pos.position

With all the projection related parts now being part of the query, can you still see the essential part of the business logic that this query implements? Even if you can, you have to admit, it gets harder. Real world queries are often much more complex involving more joins and filter predicates.

As a customer, I want to export my past orders with all positions to a PDF file.

This business use case requires the same data source i.e. past orders of a customer, but requires a different projection i.e. all positions. Implementing this requirement could be done by writing just another query without any reuse:

SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name, pos.position, p.name
FROM order_head oh
JOIN customer       c   ON c.id = oh.customer_id
JOIN order_positon  pos ON pos.order_id = oh.id
JOIN product        p   ON pos.product_id = p.id
WHERE oh.customer_id = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
ORDER BY
    oh.order_date DESC,
    oh.id DESC, pos.position

When you continue with writing just another query for every new business use case that uses the same data source you might get into trouble when the business requirement changes.

If you add a new status value e.g. LOST, you will have to adapt every query, but how do you know which queries to adapt when the business logic is hidden in the queries which are adapted for their representation?
Maybe you will forget to add the new status to a query and get a bug report that some customer can’t see a specific order in a UI. How do you know if the behavior is on purpose i.e. business logic or a problem of your projection?

The example were kept simple on purpose to illustrate the point I am trying to make, that decoupling of business logic from projections is important. Real world use cases are probably much more complicated and can benefit even more from the separation of concerns.

Separation of concerns

Separating the business logic and projection concerns can be done in various ways. Since everything is a relation in SQL, the natural way to do composition is through a relation. Let’s define business logic as a relation that provides at least an identifier for the data source. In case of the past orders, the relation would look like the following:

SELECT oh.*
FROM order_head oh
WHERE oh.customer_id = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED

On top of that, we can define our projection logic yet always keep the business logic as separate relation.

The pagination query for customer orders could be adapted to the following

WITH past_orders AS (
    SELECT oh.*
    FROM order_head oh
    WHERE oh.customer_id = ?
      AND oh.order_date < CURRENT_TIMESTAMP
      AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
)
SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name
FROM (
    SELECT *
    FROM order_head oh_inner
    WHERE oh_inner.id IN (
        SELECT oh.id
        FROM past_orders oh
        ORDER BY
            oh.order_date DESC,
            oh.id DESC
        LIMIT ?
        OFFSET ?
    )
) oh
JOIN LATERAL (
    SELECT *
    FROM order_positon pos
    WHERE pos.order_id = oh.id
    ORDER BY pos.position
    LIMIT 3
)               pos ON 1 = 1
JOIN product    p   ON pos.product_id = p.id
ORDER BY
    oh.order_date DESC,
    oh.id DESC,
    pos.position

Finding the business logic part of a query is much easier now. Let’s do the same for the reporting query:

WITH past_orders AS (
    SELECT oh.*
    FROM order_head oh
    WHERE oh.customer_id = ?
      AND oh.order_date < CURRENT_TIMESTAMP
      AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
)
SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name, pos.position, p.name
FROM past_orders oh
JOIN customer       c   ON c.id = oh.customer_id
JOIN order_positon  pos ON pos.order_id = oh.id
JOIN product        p   ON pos.product_id = p.id
ORDER BY
    oh.order_date DESC,
    oh.id DESC, pos.position

It’s again very easy to find the business logic.

The benefits?

It’s easier to determine the reason why objects are visible/not visible
Adaptions to the business logic can be done at a single place
Abstractions are usually easier to understand
Decoupling allows to evolve parts independently

Possible downsides

Query planner might not like the query
Overhead of separation of concerns

The query planner part might become apparent when the business logic requires joins that also need to be done for the projection. Here a simple example to illustrate this:

WITH past_orders AS (
    SELECT oh.*
    FROM order_head oh
    JOIN customer   c   ON c.id = oh.customer_id
    WHERE c.customer_number = ?
      AND oh.order_date < CURRENT_TIMESTAMP
      AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
)
SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name
FROM past_orders oh
JOIN customer       c   ON c.id = oh.customer_id

A query planner is probably not able to infer that the join to the customer table in the CTE could be reused for the projection part of the query. The optimal query for this would look like the following:

SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name
FROM order_head oh
JOIN customer   c   ON c.id = oh.customer_id
WHERE c.customer_number = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED

In order to get to this query, the business logic and the projection logic need to be mixed together somehow i.e. the projection should be applied onto the business logic query.

In simple terms, a projection could be represented as a list of joins and select expressions that can be merged with a base query based on a projection root. The projection in the example requires a projection root ROOT that must refer to a order_head. It defines the joins [JOIN(customer c, ROOT.customer_id = c.id)] and the select item expressions [ROOT.id, ROOT.order_date, ROOT.status, ROOT.price, c.first_name, c.last_name].

Applying a projection onto a base query can be achieved by replacing ROOT with the alias for which to apply the projection e.g. oh and replacing projection joins with compatible base query joins if possible. A join J1 is compatible with a join J2 if it uses the same predicate and is cardinality preserving i.e. if J1 is a LEFT JOIN and J2 is an INNER JOIN, then it is compatible if the join source is compatible.

This way, the optimal query can be derived from applying the projection onto the business logic base query. The transformation obviously also works for aggregate projections.

Tool support for separation of concerns

Blaze-Persistence Entity-Views is a library that works on top of JPA which implements the described query transformations in an easy to use manner to be able to benefit from the separation of projection and business logic concerns.

The initially mentioned definition of the desired result structure is almost the definition of an entity view. In fact, entity views are structured according to the desired result structure and annotated to map the data based on an entity definition.

@EntityView(Order.class)
public interface OrderData {
    Long getId();
    Instant getOrderDate();
    OrderStatus getStatus();
    BigDecimal getPrice();
    @Mapping("customer.firstName")
    String getCustomerFirstName();
    @Mapping("customer.lastName")
    String getCustomerLastName();
}

An entity view is a projection of an entity i.e. you map entity attributes to entity view attributes. The default mapping for an entity view attribute is the attribute name i.e. the price entity view attribute as denoted by the abstract method getPrice() will map to the entity attribute price unless a mapping is given via one of the various mapping annotations like e.g. @Mapping.

The customerFirstName attribute maps to the expression customer.firstName. The entity view as a whole represents the previously mentioned projection in terms of a JPA model projection. It defines the joins [JOIN(ROOT.customer c)] and select item expressions [ROOT.id, ROOT.order_date, ROOT.status, ROOT.price, c.firstName, c.lastName].

The entity view is applied to a base query that represents the business logic like this:

CriteriaBuilder<Order> cb = criteriaBuilderFactory.create(entityManager, Order.class);
cb.from(Order.class, "o")
    .where("o.customer.customerNumber").eq("12345")
    .where("orderDate").ltExpression("CURRENT_TIMESTAMP")
    .where("status").inLiterals(OrderStatus.DELIVERED, OrderStatus.CANCELLED);

EntityViewSetting<OrderData, CriteriaBuilder<OrderData>> setting = EntityViewSetting.create(OrderData.class);
// Apply the entity view for the projection root "o"
List<OrderData> list = entityViewManager.applySetting(setting, cb, "o").getResultList();

The resulting JPQL query that is executed

SELECT o.id, o.orderDate, o.status, o.price, c.firstName, c.lastName
FROM Order o
JOIN o.customer c
WHERE c.customerNumber = ?
  AND o.orderDate < CURRENT_TIMESTAMP
  AND o.status IN (OrderStatus.DELIVERED, OrderStatus.CANCELLED)

results in the expected SQL query

SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name
FROM order_head oh
JOIN customer   c   ON c.id = oh.customer_id
WHERE c.customer_number = ?
  AND oh.order_date < CURRENT_TIMESTAMP
  AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED

As you have probably already guessed, entity views are not just a means to define the projection! The query result is a list of entity view objects so you don’t have to map the scalar results back to an object structure manually.

Conclusion

Mixing projection and business logic might be easy at the beginning but can become hard to maintain over time. There are different ways to implement the separation and it certainly has maintainability benefits. To avoid possible performance problems it is important to implement an approach that eliminates unnecessary joins.

With Blaze-Persistence Entity Views this can be achieved quite easily while still working with a JPA model and if desired, also with JPA Criteria API for the query definition.

I would love to hear your thoughts about this topic in the comments section!