Window functions and Common Table Expressions coming to CakePHP

CakePHP 4.1.0 is going to be released soon. Thanks to othercorey and ndm2 4.1.0 adds powerful new features to the ORM – Window functions and Common Table Expressions.

Window functions

Window functions enable you to generate aggregations for ‘windows’ of your data. You define what the ‘window’ is by telling the database how to partition your data into frames around the current row, and perform aggregations across the rows in each frame.

Some example problems you could solve with window functions are:

- Getting the percentage of total views for each article an author has written.
- Getting a running total for timeseries, or financial data.
- Computing values based on data in adjacent rows. For example computing time between train stations, or free time in a person’s calendar.

A minimal example of window functions using the schema from the CMS tutorial would be:

Show Plain Text
  1. $query = $this->Articles->find();
  2.     $query->select([
  3.         'Articles.title',
  4.         'Articles.user_id',
  5.         'Articles.views',
  6.         'running_total_views' => $query
  7.             ->func()
  8.             ->sum('Articles.views')
  9.             ->partition('Articles.user_id')
  10.             ->order('Articles.views ASC')
  11.             ->rows(0, null),
  12.     ])
  13.     ->orderDesc('Articles.user_id')
  14.     ->orderDesc('Articles.views')
  15.     ->disableHydration();
  16.  
  17.     // Each row will have a `running_total_views` key.
  18.     debug($query->toArray());
  19.  
  20.     // The output of the above would be:
  21.     [
  22.         [
  23.             'title' => 'First Article',
  24.             'user_id' => 1,
  25.             'views' => 1,
  26.             'running_total_views' => 1,
  27.         ],
  28.         [
  29.             'title' => 'Latest Article',
  30.             'user_id' => 1,
  31.             'views' => 4,
  32.             'running_total_views' => 5,
  33.         ],
  34.     ]

You could get the average views using a window function like this:

Show Plain Text
  1. $query = $this->Articles->find();
  2.     $query->select([
  3.         'Articles.title',
  4.         'Articles.user_id',
  5.         'Articles.views',
  6.         // get average views
  7.         'average_views' => $query
  8.             ->func()
  9.             ->avg('views')
  10.             ->partition('Articles.user_id'),
  11.     ])
  12.     ->orderDesc('Articles.user_id')
  13.     ->orderDesc('Articles.views')
  14.     ->disableHydration();
  15.  
  16.     // The output of the above would be:
  17.     [
  18.         [
  19.             'title' => 'First Article',
  20.             'user_id' => 1,
  21.             'views' => 1,
  22.             'average_views' => 2.5,
  23.         ],
  24.         [
  25.             'title' => 'Latest Article',
  26.             'user_id' => 1,
  27.             'views' => 4,
  28.             'running_total_views' => 2.5,
  29.         ],
  30.     ]

Window functions can be a bit opaque to work with at times, but we can use a GROUP_CONCAT to dump out the data each window is looking at:

Show Plain Text
  1. $query = $this->Articles->find();
  2.     $query->select([
  3.         'Articles.title',
  4.         'Articles.user_id',
  5.         'Articles.views',
  6.         // Look at windows
  7.         'debug' => $query
  8.             ->func()
  9.             ->aggregate(
  10.                 'group_concat',
  11.                 ['Articles.views' => 'literal', ','],
  12.                 [],
  13.                 'string'
  14.             )
  15.             ->partition('Articles.user_id')
  16.             ->rows(0, null),
  17.     ])
  18.  
  19.     // The output of the above would be:
  20.     [
  21.         [
  22.             'title' => 'First Article',
  23.             'user_id' => 1,
  24.             'views' => 1,
  25.             'debug' => '1',
  26.         ],
  27.         [
  28.             'title' => 'Latest Article',
  29.             'user_id' => 1,
  30.             'views' => 4,
  31.             'debug' => '1,4',
  32.         ],
  33.     ]

Common Table Expressions

Common table expressions which are often abbreviated to ‘CTE’ enable you to create intermediate result sets and then use those tables to build queries on. They serve a similar purpose to database views or derived tables. However, they differ from derived tables and views a few ways. First you don’t have to maintain schema for a common table expression. Second, you can reference the results of a common table expression multiple times without incuring performance penalties, unlike derived table joins.

As an example, we want to fetch a list of tags, and the number of articles each one has.

Show Plain Text
  1. $query = $this->Articles->find();
  2.     $query->with(function (CommonTableExpression $cte, Query $query) {
  3.         $q = $query->cleanCopy();
  4.         $q->repository($this->Articles->Tags->junction());
  5.  
  6.         $q->select([
  7.             'article_id' => 'article_id',
  8.             'tag_count' => $q->func()->count('*'),
  9.         ])
  10.         ->from('articles_tags')
  11.         ->group(['article_id']);
  12.  
  13.         return $cte
  14.             ->name('tag_counts')
  15.             ->query($q);
  16.     });
  17.  
  18.     $query->select([
  19.         'title',
  20.         'tag_count' => 'tag_counts.tag_count'
  21.     ])
  22.     ->join([
  23.         'tag_counts' => [
  24.             'table' => 'tag_counts',
  25.             'type' => 'LEFT',
  26.             'conditions' => 'tag_counts.article_id = Articles.id'
  27.         ],
  28.     ])
  29.     ->disableHydration();
  30.  
  31.     // The above would output
  32.     [
  33.         [
  34.             'title' => 'First Article',
  35.             'tag_count' => 4,
  36.         ]
  37.     ]

I have found common table expressions most useful in reporting queries when I need to break up a big complex query into easier to understand blocks, or merge the results of multiple aggregate queries together. The ORM will support common table expressions in SELECT, INSERT, UPDATE and DELETE operations. Window functions, and common table expressions work with Postgres, SQLServer, SQLite and MySQL 8.0+. I’m really happy to see the CakePHP ORM continuing to evolve and support advanced SQL features. While my examples are simple they illustrate how straightforward the new APIs introduced in 4.1.0 are to use, and I hope they help you build even better applications.

Comments

Excelent! Thank you!

Guishito on 10/4/20

It is possible to use the query ‘with recursive’? I am trying to get the data from a bill of materials with sub assemblies.

Guishito on 10/22/21

Guishito: Yes, you can use the @recursive()@ method on the CommonTableExpression object to make a @WITH RECURSIVE@ style statement.

mark story on 11/23/21

Have your say: