Bitfields are a trap

I recently came across this video on Youtube and while the author makes some convincing arguments for using bitfields. I’d like to share my experiences having worked with them in a few applications. A long time ago, I also thought that bitfields were neat. However, in the intervening years, my opinion has shifted. While the prospect of simplifying your schema and being able to pack multiple values into a single column through seems attractive; the complexity cost isn’t worth it.

What are bitfields?

Bitfields are generally stored in integer columns. Inside the integer you use each bit of the field to store status/flag values. For example, within a 32bit integer you can store 31 values. You interact with the bits using bitwise operators to ‘mask’ each bit. For example you could store multiple statuses for an article using 4 bits:

published needs_review comments_allowed promoted
1 (0001) 2 (0010) 4 (0100) 8 (1000)

When you filter or update records you use bitwise operators to filter and mutate values. If you’d like to learn more about bitfields either my previous article or the youtube video I linked are great resources.

Harder to evolve over time

One of the attractive qualities of bitfields is that they simplify schema management as you don’t need to change your schema as often. Instead of having to add and remove columns, you can use new bits as you add flags. Eventually, you’ll stop using bits, or need to reuse a bit. Finding and clearing unused bits requires a backfill operation across every record in a table to clear out stale flags. If you don’t clean up unused flags, you’re faced with increased cognitive load from junk data. Either way, your quick win turns into long term complexity.

More complex to operate than boolean flags

Working with bitfields requires using the bit mask operators |, &, ^, and ~. While all of these operators are well documented and consistently used in most programming languages, they are more complex to read, understand, and operate over the alternatives. As a quick example, which bits are set in 384 that are not set in 192? Even if you can figure that out, what do those bits mean in any given context? I found that answering these questions required memorizing lookup tables, or having to cross reference the bit values each time I needed to interact with a bitfield. Let’s look at some code and SQL to see the additional complexity in action.

Show Plain Text
  1. $entity->status = Post::PUBLISHED | Post::COMMENTS_ALLOWED;

While this example is reasonably simple, it is more complex than manipulating boolean attributes:

Show Plain Text
  1. $entity->published = true;
  2. $entity->comments_allowed = true;

Turning on fields doesn’t seem to rough, but turning fields off is harder:

Show Plain Text
  1. $entity->status = $entity->status & ~Post::COMMENTS_ALLOWED;
  2.  
  3. // vs
  4. $entity->comments_allowed = false;

The additional complexity required for bitfields presents itself more in query logic. Lets filter subscriptions where one of two flags is enabled. Compare using bitfields:

Show Plain Text
  1. $query = <<<SQL
  2. SELECT *
  3. FROM subscriptions
  4. WHERE
  5. status_flags = (status_flags | %s)
  6. OR status_flags = (status_flags | %s)
  7. SQL;
  8. $params = [Subscription::IS_INVOICED, Subsciption::EXTERNAL_BILLING];

And now with using boolean attributes:

Show Plain Text
  1. $query = <<<SQL
  2. SELECT *
  3. FROM subscriptions
  4. WHERE
  5. is_invoiced = true OR external_billing =true
  6. SQL;

Queries where you want to filter out records that have flags disabled are also more complex:

Show Plain Text
  1. $query = <<<SQL
  2. SELECT *
  3. FROM subscriptions
  4. WHERE
  5. (status_flags & %s = 0) AND (status_flags & %s = 0)
  6. SQL;
  7. $params = [Subscription::CLOSED, Subsciption::PAST_DUE];

Compared with:

Show Plain Text
  1. $query = <<<SQL
  2. SELECT *
  3. FROM subscriptions
  4. WHERE
  5. closed = false and past_due = false
  6. SQL;

While we’ve only looked at basic SQL queries, the complexity in manipulating bits often spills out into application logic as your ORM might not provide high-level abstractions for these operations.

Difficult to query efficiently

As you get more flags in your bitfield, you’re going to want to use them for filtering results. This will work great for a while, however you cannot index individual bits in the field. The best you can do is index the flags field, which won’t always improve query performance. You’re quite limited in the options you have for making bitfields efficient. You’ll always need to read the full index in order to match any row. With boolean flags you can index your ‘hot’ column tuples. You can make as many indexes as you have storage for and make all your queries fast.

Training is harder

Finally, most folks that join your team in the future will be familiar with boolean values. The same cannot be said of bitfield and bit masking. You’ll need to teach each and every employee the ropes about how bitfields work, what the landmines are and how to use the operators in the DSL your framework or application exposes work costing your team time.

While 15 years ago, I thought bitfields were pretty neat, and clearly people still do. I think bitfields are an attractive trap and hope that you avoid them. Having worked with bitfields for many years, they are a trap. Instead, I recommend using discrete columns for boolean flags and string enums for enums. While they consume a more in storage space, they make working with data in adhoc query tools, application debuggers and tests much easier as you don’t have to memorize as much. Discrete columns are also easier to maintain, evolve and optimize as necessary.

Comments

Well, I sure wouldn’t want to use bitmaskes without a proper CakePHP behavior.
But having one actually makes it quite easy to work with them, providing a convenient way of having multi-select on a few bools for some rapid development use cases.

Sure, your arguments about performance can be valid with larger row count, but then you can still extract them eventually.

See https://www.dereuromark.de/2012/02/26/bitmasked-using-bitmasks-in-cakephp/ for details on the behavior.

As for enums, I would always recommend int backed enums here, as their advantages far outweigh any string ones.

mark on 1/4/24

Have your say: