By Christian Beikov on 26 April 2020
After a long Twitter discussion about the benefits of the separation of logic and projection concerns for database queries I thought it might be good to write a post about it. In software engineering, we usually agree that separation of concerns is a good thing as it allows to think about concerns in isolation which makes reasoning about behavior easier. Why not apply the separation of concerns principle to database queries as well?
When you fulfill a business use case that involves a database you usually have to think about the following things:
-
Formulation of the business use case as query to serve as data source
-
Definition of the structure of the result i.e. the DTO definition
-
Enrich the query for the business use case to fit the desired result structure
-
Map the query result to the desired result structure
As soon as you alter the business use case query to fit the result structure needs, you are mixing things i.e. breaking the principle of separation of concerns. Reusing such an altered query for other use cases with the same business but different representation requirements might not be possible anymore or might not be as efficient as it could be.
Example model
Let’s consider a simple model to serve as a basis for examples to illustrate my point.
@Entity public class Order { @Id @GeneratedValue private Long id; private Instant orderDate; @Enumerated private OrderStatus status; @ElementCollection @OrderColumn(name = "position") private List<OrderPosition> orderPositions; @ManyToOne private Customer customer; private BigDecimal price; } public enum OrderStatus { NEW, PROGRESS, DELIVERED, CANCELLED; } @Embeddable public class OrderPosition { @ManyToOne private Product product; private Integer amount; private BigDecimal price; } @Entity public class Product { @Id @GeneratedValue private Long id; private String name; private BigDecimal price; } @Entity public class Customer { @Id @GeneratedValue private Long id; private String customerNumber; private String firstName; private String lastName; }
In SQL DDL this would be something like the following:
CREATE TABLE order_head( id SERIAL NOT NULL, order_date TIMESTAMP NOT NULL, status INT NOT NULL, customer_id BIGINT NOT NULL REFERENCES customer, price DECIMAL(16,3) NOT NULL, PRIMARY KEY(id) ) CREATE TABLE order_position( order_id BIGINT NOT NULL REFERENCES order_head, position INT NOT NULL, product_id BIGINT NOT NULL REFERENCES product, amount INT NOT NULL, price DECIMAL(16,3) NOT NULL, PRIMARY KEY(order_id, position) ) CREATE TABLE product( id SERIAL NOT NULL, name TEXT NOT NULL, price DECIMAL(16,3) NOT NULL, PRIMARY KEY(id) ) CREATE TABLE customer( id SERIAL NOT NULL, customer_number TEXT NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, PRIMARY KEY(id) )
Business use case example
As a customer, I want to see my past orders.
A pretty simple business use case that can be implemented with a simple query
SELECT oh.id, oh.order_date, oh.status, oh.price FROM order_head oh WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
The query is pure business logic i.e. it implements the data source for the business use case, but the use case does not specify what data to visualize. Let’s assume the customer wants to see the status, order date, final amount and the names of the first 3 products in that overview. The desired result structure could look like the following:
public class OrderOverview { private Long id; private Instant orderDate; private OrderStatus status; private List<OrderPositionOverview> orderPositions; private BigDecimal price; } public class OrderPositionOverview { private String productName; }
There are many ways to fetch the described data but usually the business use case query is adapted to meet the needs.
SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name FROM order_head oh JOIN LATERAL ( SELECT * FROM order_positon pos WHERE pos.order_id = oh.id ORDER BY pos.position LIMIT 3 ) pos ON 1 = 1 JOIN product p ON pos.product_id = p.id WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ORDER BY oh.order_date DESC, oh.id DESC, pos.position
Customers might have dozens of orders and to avoid loading and transferring all data all the time, we add a simple pagination mechanism.
SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name FROM ( SELECT * FROM order_head oh_inner WHERE oh_inner.id IN ( SELECT oh.id FROM order_head oh WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ORDER BY oh.order_date DESC, oh.id DESC LIMIT ? OFFSET ? ) ) oh JOIN LATERAL ( SELECT * FROM order_positon pos WHERE pos.order_id = oh.id ORDER BY pos.position LIMIT 3 ) pos ON 1 = 1 JOIN product p ON pos.product_id = p.id ORDER BY oh.order_date DESC, oh.id DESC, pos.position
With all the projection related parts now being part of the query, can you still see the essential part of the business logic that this query implements? Even if you can, you have to admit, it gets harder. Real world queries are often much more complex involving more joins and filter predicates.
As a customer, I want to export my past orders with all positions to a PDF file.
This business use case requires the same data source i.e. past orders of a customer, but requires a different projection i.e. all positions. Implementing this requirement could be done by writing just another query without any reuse:
SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name, pos.position, p.name FROM order_head oh JOIN customer c ON c.id = oh.customer_id JOIN order_positon pos ON pos.order_id = oh.id JOIN product p ON pos.product_id = p.id WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ORDER BY oh.order_date DESC, oh.id DESC, pos.position
When you continue with writing just another query for every new business use case that uses the same data source you might get into trouble when the business requirement changes.
-
If you add a new status value e.g.
LOST
, you will have to adapt every query, but how do you know which queries to adapt when the business logic is hidden in the queries which are adapted for their representation? -
Maybe you will forget to add the new status to a query and get a bug report that some customer can’t see a specific order in a UI. How do you know if the behavior is on purpose i.e. business logic or a problem of your projection?
The example were kept simple on purpose to illustrate the point I am trying to make, that decoupling of business logic from projections is important. Real world use cases are probably much more complicated and can benefit even more from the separation of concerns.
Separation of concerns
Separating the business logic and projection concerns can be done in various ways. Since everything is a relation in SQL, the natural way to do composition is through a relation. Let’s define business logic as a relation that provides at least an identifier for the data source. In case of the past orders, the relation would look like the following:
SELECT oh.* FROM order_head oh WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
On top of that, we can define our projection logic yet always keep the business logic as separate relation.
The pagination query for customer orders could be adapted to the following
WITH past_orders AS ( SELECT oh.* FROM order_head oh WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ) SELECT oh.id, oh.order_date, oh.status, oh.price, pos.position, p.name FROM ( SELECT * FROM order_head oh_inner WHERE oh_inner.id IN ( SELECT oh.id FROM past_orders oh ORDER BY oh.order_date DESC, oh.id DESC LIMIT ? OFFSET ? ) ) oh JOIN LATERAL ( SELECT * FROM order_positon pos WHERE pos.order_id = oh.id ORDER BY pos.position LIMIT 3 ) pos ON 1 = 1 JOIN product p ON pos.product_id = p.id ORDER BY oh.order_date DESC, oh.id DESC, pos.position
Finding the business logic part of a query is much easier now. Let’s do the same for the reporting query:
WITH past_orders AS ( SELECT oh.* FROM order_head oh WHERE oh.customer_id = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ) SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name, pos.position, p.name FROM past_orders oh JOIN customer c ON c.id = oh.customer_id JOIN order_positon pos ON pos.order_id = oh.id JOIN product p ON pos.product_id = p.id ORDER BY oh.order_date DESC, oh.id DESC, pos.position
It’s again very easy to find the business logic.
The benefits?
-
It’s easier to determine the reason why objects are visible/not visible
-
Adaptions to the business logic can be done at a single place
-
Abstractions are usually easier to understand
-
Decoupling allows to evolve parts independently
Possible downsides
-
Query planner might not like the query
-
Overhead of separation of concerns
The query planner part might become apparent when the business logic requires joins that also need to be done for the projection. Here a simple example to illustrate this:
WITH past_orders AS ( SELECT oh.* FROM order_head oh JOIN customer c ON c.id = oh.customer_id WHERE c.customer_number = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED ) SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name FROM past_orders oh JOIN customer c ON c.id = oh.customer_id
A query planner is probably not able to infer that the join to the customer table in the CTE could be reused for the projection part of the query. The optimal query for this would look like the following:
SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name FROM order_head oh JOIN customer c ON c.id = oh.customer_id WHERE c.customer_number = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
In order to get to this query, the business logic and the projection logic need to be mixed together somehow i.e. the projection should be applied onto the business logic query.
In simple terms, a projection could be represented as a list of joins and select expressions that can be merged with a base query based on a projection root. The projection in the example requires a projection root ROOT
that must refer to a order_head
. It defines the joins [JOIN(customer c, ROOT.customer_id = c.id)]
and the select item expressions [ROOT.id, ROOT.order_date, ROOT.status, ROOT.price, c.first_name, c.last_name]
.
Applying a projection onto a base query can be achieved by replacing ROOT
with the alias for which to apply the projection e.g. oh
and replacing projection joins with compatible base query joins if possible. A join J1 is compatible with a join J2 if it uses the same predicate and is cardinality preserving i.e. if J1 is a LEFT JOIN
and J2 is an INNER JOIN
, then it is compatible if the join source is compatible.
This way, the optimal query can be derived from applying the projection onto the business logic base query. The transformation obviously also works for aggregate projections.
Tool support for separation of concerns
Blaze-Persistence Entity-Views is a library that works on top of JPA which implements the described query transformations in an easy to use manner to be able to benefit from the separation of projection and business logic concerns.
The initially mentioned definition of the desired result structure is almost the definition of an entity view. In fact, entity views are structured according to the desired result structure and annotated to map the data based on an entity definition.
@EntityView(Order.class) public interface OrderData { Long getId(); Instant getOrderDate(); OrderStatus getStatus(); BigDecimal getPrice(); @Mapping("customer.firstName") String getCustomerFirstName(); @Mapping("customer.lastName") String getCustomerLastName(); }
An entity view is a projection of an entity i.e. you map entity attributes to entity view attributes. The default mapping for an entity view attribute is the attribute name i.e. the price
entity view attribute as denoted by the abstract method getPrice()
will map to the entity attribute price
unless a mapping is given via one of the various mapping annotations like e.g. @Mapping
.
The customerFirstName
attribute maps to the expression customer.firstName
. The entity view as a whole represents the previously mentioned projection in terms of a JPA model projection. It defines the joins [JOIN(ROOT.customer c)]
and select item expressions [ROOT.id, ROOT.order_date, ROOT.status, ROOT.price, c.firstName, c.lastName]
.
The entity view is applied to a base query that represents the business logic like this:
CriteriaBuilder<Order> cb = criteriaBuilderFactory.create(entityManager, Order.class); cb.from(Order.class, "o") .where("o.customer.customerNumber").eq("12345") .where("orderDate").ltExpression("CURRENT_TIMESTAMP") .where("status").inLiterals(OrderStatus.DELIVERED, OrderStatus.CANCELLED); EntityViewSetting<OrderData, CriteriaBuilder<OrderData>> setting = EntityViewSetting.create(OrderData.class); // Apply the entity view for the projection root "o" List<OrderData> list = entityViewManager.applySetting(setting, cb, "o").getResultList();
The resulting JPQL query that is executed
SELECT o.id, o.orderDate, o.status, o.price, c.firstName, c.lastName FROM Order o JOIN o.customer c WHERE c.customerNumber = ? AND o.orderDate < CURRENT_TIMESTAMP AND o.status IN (OrderStatus.DELIVERED, OrderStatus.CANCELLED)
results in the expected SQL query
SELECT oh.id, oh.order_date, oh.status, oh.price, c.first_name, c.last_name FROM order_head oh JOIN customer c ON c.id = oh.customer_id WHERE c.customer_number = ? AND oh.order_date < CURRENT_TIMESTAMP AND oh.status IN (2, 3) -- 2=DELIVERED, 3=CANCELLED
As you have probably already guessed, entity views are not just a means to define the projection! The query result is a list of entity view objects so you don’t have to map the scalar results back to an object structure manually.
Conclusion
Mixing projection and business logic might be easy at the beginning but can become hard to maintain over time. There are different ways to implement the separation and it certainly has maintainability benefits. To avoid possible performance problems it is important to implement an approach that eliminates unnecessary joins.
With Blaze-Persistence Entity Views this can be achieved quite easily while still working with a JPA model and if desired, also with JPA Criteria API for the query definition.
I would love to hear your thoughts about this topic in the comments section!