Using CakePHP and a Horizontally Sharded Database
One approach to scaling out a database for a multi-tenant application is to horizontally shard or partition the data by customer. This often takes the form of having multiple identical copies of an application’s schema in each shard. For example customer A’s data would be in shard 1, while customer C’s data would be in shard 2. Often, in addition to the sharded databases, there is a lookup database that knows which shard contains a given customer’s data. By sharding a database in this manner you can horizontally scale an application by creating horizontal instances of your application. It also helps mitigate any given customer’s ability to impact another customer. Sharding creates additional complexity in both the code and operations wise. Before you go down the path of a sharded database weigh the benefits and drawbacks to your application’s architecture.
If you happen to manage a sharded database, using CakePHP has historically been somewhat painful, but the improvements we’ve made in 3.0 greatly simplify working with sharded databases. For this example, I’ll have an application with 3 databases. It will be a very simple Rolodex application where each tenant or customer’s is never mixed. This situation fits a sharded database well, as all the data you will need can live in the shard. The first database we’ll need is a lookup database, and the other 2 will be our shards. We’ll define connections for each of these databases in our config/app.php
:
- // in config/app.php
- 'Datasources' => [
- 'lookup' => [
- 'className' => 'Cake\Database\Connection',
- 'driver' => 'Cake\Database\Driver\Mysql',
- 'host' => 'localhost',
- 'username' => 'root',
- 'password' => '',
- 'database' => 'lookup',
- 'encoding' => 'utf8',
- 'timezone' => 'UTC',
- 'cacheMetadata' => true,
- ],
- 'shard1' => [
- 'className' => 'Cake\Database\Connection',
- 'driver' => 'Cake\Database\Driver\Mysql',
- 'host' => 'localhost',
- 'username' => 'root',
- 'password' => '',
- 'database' => 'shard1',
- 'encoding' => 'utf8',
- 'timezone' => 'UTC',
- 'cacheMetadata' => true,
- ],
- 'shard2' => [
- 'className' => 'Cake\Database\Connection',
- 'driver' => 'Cake\Database\Driver\Mysql',
- 'host' => 'localhost',
- 'username' => 'root',
- 'password' => '',
- 'database' => 'shard2',
- 'encoding' => 'utf8',
- 'timezone' => 'UTC',
- 'cacheMetadata' => true,
- ],
- ],
In our example application, we’ll use a sub-domain to identify tenants. So our application’s URLs look like https://account.example.com
. In our lookup database we have a table that looks like:
- CREATE TABLE account_mappings (
- id INT(11) NOT NULL PRIMARY KEY AUTO_INCREMENT,
- account_id INT(11) NOT NULL,
- name VARCHAR(255) NOT NULL,
- shard_id INT(11) NOT NULL,
- created DATETIME,
- modified DATETIME,
- UNIQUE INDEX account_idx (account_id),
- UNIQUE INDEX name_idx (name)
- );
The above table will allow us to locate each account and know which shard any given account is in. In each our our shards we have a clients table that contains the clients for all of the accounts on each shard:
- CREATE TABLE clients (
- account_id INT(11) NOT NULL,
- client_id INT(11) NOT NULL AUTO_INCREMENT,
- name VARCHAR(255),
- email VARCHAR(255),
- created DATETIME,
- modified DATETIME,
- PRIMARY KEY (account_id, client_id),
- KEY client_idx (client_id)
- );
Defining Models
Now that we have some sample schema, we’ll want to connect CakePHP to our sharded database. The first thing we’ll want to do in each request is locate the shard an account is on so all the ‘sharded’ models can work. First we’ll define our AccountMappingsTable.
- // in src/Model/Table/AccountMappingsTable.php
- namespace App\Model\Table;
- use Cake\ORM\Table;
- class AccountMappingsTable extends Table
- {
- {
- $this->table('account_mappings');
- $this->primaryKey(['id']);
- }
- public static function defaultConnectionName()
- {
- return 'lookup';
- }
- }
While our ClientsTable will look like:
- namespace App\Model\Table;
- use Cake\ORM\Table;
- class ClientsTable extends Table
- {
- {
- $this->table('clients');
- $this->primaryKey(['account_id', 'client_id']);
- }
- }
Accessing the Correct Shard
We’ll want to locate accounts early in the request. To do this we’ll examine the request in a dispatcher filter and create the active shard alias:
- // in src/Routing/Filter/ShardFilter.php
- namespace App\Routing\Filter;
- use Cake\Event\Event;
- use Cake\Datasource\ConnectionManager;
- use Cake\ORM\TableRegistry;
- use Cake\Routing\DispatcherFilter;
- class ShardFilter extends DispatcherFilter
- {
- public function beforeDispatch(Event $event)
- {
- $request = $event->data['request'];
- // Find the account map data for the current request. Missing accounts will raise errors.
- $accountMap = TableRegistry::get('AccountMappings');
- $mapping = $accountMap->findByName($request->subdomains()[0])->firstOrFail();
- // Alias the active account's shard to our 'default' connection.
- ConnectionManager::alias('shard' . $mapping->shard_id, 'default');
- // Store the active account in the session
- Configure::write('Account.active', $mapping->account_id);
- }
- }
- // In config/bootstrap.php
- DispatcherFactory::add('Shard');
At this point, our application can now connect to each of the shards in a way that is transparent to the rest of the application. We haven’t solved the problem of ensuring we only ever access the currently active account’s data, but that is a topic for another day. You should be able to use bake
to create a skeleton for the clients table and create some clients. Before you can update or view clients, you will have to update the controller methods so that the Clients->get()
calls include both columns in the composite key.
Hopefully if you’re thinking of building or already work with a sharded multi-tenant database this gives you some ideas on how you could adapt CakePHP to work well with a bit of extra work. The code I’ve included in this post is by no means battle tested and is intended to act as inspiration not canon.
useful article.thanks
one question though, how w’d you distribute accounts across shards at app level. like when saving new account
cholthi on 3/13/15
Sounds very nice! And just on the right moment while working on a project :) Thanks!
Bob on 3/16/15
thank,
a1d_line_in on 3/23/15
This was really really useful.
I would like to change between shards from my CLI shells.
Currently, I am doing the trick from every method inside the shell.
Like this: https://gist.github.com/Tzaoh/3f5d0f3eceae7aec9cd6
I would like to change the alias in the initialize method, (because I will have to repeat that two lines of code each method I will create in the sell) but I am not able to get the shells parameters. I think they are unreachable from there.
Is there any way to accomplish that?
Thanks
Tzaoh on 3/30/15
cholthi: There are a few ways you can do that. In the past I’ve just put new accounts into the newest shard. Overtime though, you’ll probably need to re-balance shards as accounts become inactive. Alternatively you can randomly assign accounts in to the various shards round robin style. I’ve also found it useful to have way to track the ‘next account id’ in the lookup database.
mark story on 4/3/15
I want everything to be automatic. I don’t want to save the database config in config file. I want the visitor to register and becomes my client and everything would be done automatically.
I am currently doing the multi database by creating database connection from session of client. Which is becoming painful for me now and difficult to extend.
Can you please help me in?
Anupal on 12/3/15
Anupal: I wouldn’t recommend trying to dynamically create the connection credentials. Generally additional databases need to be made manually or through automated tooling. This tooling could also be used to update your configuration files.
mark story on 12/4/15
Hi
Thank you for your helpful article
I have one questiom
How can i configure so that each tenant will have separate tmp folder to store cache , model , session data.
Dev on 2/23/16
DispatcherFilter is deprecated in cakephp3.3
Pavan on 11/5/16
Pavan: Yes. Post 3.3 You could do the same thing using a Middleware Layer.
mark story on 11/25/16