Tips and Tricks for Clean RDB Schemas

I'm online!



λ



Scala



JS

Mongo
Riak
Redis
Couchbase
CouchDB
ElasticSearch

Titan
Cassandra
Kafka
Hbase
Neo4J
Datomic

I've played with lots of NoSQL dbs over the years



Postgres

But most of my work is done with postgres nowadays

Why bother with SQL?

Too complex, not flexible enough

relational algebra

Very well defined mathematical foundations. Clear, rigorous design

relational thinking

Very natural way to think



Data consistency

Strong guarantees regarding the validity and the consistency of stored data

Exploratory queries

Very easy to mix and match data. Data can be used in unforeseen ways due to good composability

All of this works only if you have a properly designed schema

Reliance on the data model

Pay attention to your schema

using a RDBMS without paying attention to the db schema is foolish. If you want an unstructured object dump, use document / graph stores

Write your schema yourself

Your apps and your DB are 2 separate bricks, with an untyped boundary. Acknowledge this, don't try to hide it.

Don't let a program design your schema

The DB schema is central to your architecture. Handle it yourself

Avoid ORMs

ORMs may be useful in certain, very specific cases, often related to operational concerns, not design-related. When it's design related, maybe a document store would be better.

DB-agnosticism

If you're not using db-specific features, you may not need a rdbms at all

Use postgreSQL

If you want to use open source stuff

Avoid ORMs

But please use data mapping libraries

Anorm

  SQL("""select student.student_id,
           student.name,
           student.birth_date""")
    .as(parser.*)

Anorm

val parser = get[UUID]("student_id") ~
             str("name") ~
             get[DateTime]("birth_date")

Anorm

  SQL(selectSQL[Student])
    .as(parser[Student].*)

Slick

val q2 = for {
  c <- coffees if c.price < 9.0
  s <- suppliers if s.id === c.supID
} yield (c.name, s.name)

JOOQ

Result<Record3<String, String, String>> result =
create.select(
        BOOK.TITLE,
        AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
      .from(BOOK)
      .join(AUTHOR)
      .on(BOOK.AUTHOR_ID.equal(AUTHOR.ID))
      .where(BOOK.PUBLISHED_IN.equal(1948))
      .fetch();

DRY =!> ORM

You can stay dry without an ORM.

Stay in control of the fetching depth

With lazy loading, you can have complex queries automatically performed without joins. The query depend on the code in your templates. Is your frontend dev knowledgable in DBs ?

Model your data with flat records

Models representing the DB fields

Make joins explicit

Composite objects representing joins

Now let's design great schemas

Entities

Roughly corresponds to objects

Relations

Relations between entities

Normal Forms

1NF

The key

Homogeneous fields. Some fields are key fields

2NF

The whole key

All the non-key fields depend on all the key fields. If a field depends only on a part of the key field then there is a need for a 2nd entity

3NF

Nothing but the key

A non-key field cannot depend on another non-key field

Boyce Codd NF

So help me Codd

A bit stronger than 3FN: forbids redundency caused by functional dependencies

That's all well and good

Let's get practical

Design with Querying in mind

Composite entities can be good

Split data with Querying in mind

Consistent naming

Singular table names

Uniform naming for PKs & FKs

<table_name>_id

Uniform naming for PKs & FKs

select <fields> from
  table_1
  inner join table_2
    on table_1.table_1_id =
       table_2.table_2_id

Uniform naming for PKs & FKs

select <fields> from
  table_1
  inner join table_2
    using (table_1_id)

Uniform naming for PKs & FKs

select <fields> from
  table_1
  natural join table_2

Primary Keys

Use them

(I shouldn't even have to say that)

Use UUIDs

(or random IDs)

Prevent entity enumeration

Prevent growth rate disclosure

Avoid linking the wrong table

Default to not null

Don't fear the join

Avoid deletions

Avoid deletions

created_at
    timestamp with time zone not null,
deleted_at
    timestamp with time zone

Don't let invalid states be representable

Use rich types

Use precise types

No more integers used as flags

4,294,967,295

Use enums

create type status
as enum('pending', 'validated');

Use enums even when a boolean could do

create type status
as enum('pending', 'validated');

Do that in your code too

Boolean Blindness

Use precise types

How do you store IP addresses?

varchar(15), right?

Use precise types

inet (IP address)
timestamp with time zone
point (2D point)
tstzrange (time range)
interval (duration)

Create your own

    create type my type
    as (
        field1 int,
        field2 text
    );

Use constraints when types are not enough

Rich types => powerful constraints

create table reservation(
    reservation_id uuid primary key,
    dates tstzrange not null,
    exclude using gist (dates with &&)
);

Use arrays

    select '{1,2,3}'::int[]

You can store K/V data

Hstore

SELECT 'a=>1,a=>2'::hstore;
  hstore
----------
 "a"=>"1"

You can store JSON data

Json / Jsonb

SELECT
('{ "a": "b"}'::jsonb)->>'a' as field;
  field
----------
 "b"

jsonb

Structure your results

array_agg

select
  article.*,
  array_agg(comment.*) as comments
from
  article
  left join comment using(article_id)

C2C Json

select
  row_to_json(article.*) as article,
  row_to_json(author.*) as author
from
  article
  inner join author using(author_id)

C2C Json

select
  row_to_json(article.*) as article,
  array_to_json(
    array_agg(
      row_to_json(comment.*))) as comments
from article
inner join author using(author_id)

Common Table Expressions

with sub_request as (
    select <fields> from table1
)

select <fields> from sub_request;

Shameless plug

jDbT

https://github.com/divarvel/jdbt

jDbT

status:
  - Test
  - Prod

jDbT

member:
  name: text
  email: text
  status: status | 'Test'
  __unique: [ name, email ]

jDbT

post:
  member_id:
  +?title: text
  ?content: text

jDbT

tag:
    +name: text
    __check: name <> 'forbidden'

jDbT

post_tag:
    post_id:
    tag_id:
    __pk: [ tag_id, post_id ]

jDbT

jDbT

http://jdbt-api.cleverapps.io

So

Acknowledge the app / DB boundary

The boundary is there, trying to hide it will only result in strange shit happening at runtime

Design your schema

It's central to your overall design

… with querying in mind

Your database is only there to be queried, don't overthink stuff that's not needed.

(avoid ORMs)

I don't say it out loud bc I'm always afraid of being thrown out with tar and feathers. Depends on what is an ORM to you

Use precise types

PostgreSQL provides wonderful types and is extensible.

Denormalize responsibly

you can have nested instances, as long as you always access the child instance through its parent, that's ok

Thanks

geecon15roxx