Optimize database to reduce resource confliction and deadlock by splitting table into many tables

Are you usually witness this error:

ERROR:  deadlock detected
DETAIL:  Process 24430 waits for ShareLock on transaction 3165848692; blocked by process 30917.
Process 30917 waits for ShareLock on transaction 3165848693; blocked by process 24430.

When working with database (I'm now working with PostgreSQL), no need to mention fact that the deadlock error is one of the most problem which takes long headache time to be solved correctly. By searching and studying on the internet, I witness that many people use the most simple way is re-execute that query after a randomize delay time when receive deadlock error. This solution is realy nice because it very simple but not enough to build a powerful system, however, I would like my system can use resource effeciently. How should we do that? Yep, I hope my post can help you something to think about your best solution.

First of all, I would like to show a not good database design of a stack, we use a table as a stack

stack
  1.         CREATE TABLE stack
  2.         (
  3.                 id bigserial NOT NULL,
  4.                 pop timestamp without time zone, -- The popping time
  5.                 push timestamp without time zone DEFAULT now(),
  6.                 done boolean,
  7.                 CONSTRAINT stack_pk PRIMARY KEY (id)
  8.         )
Hide/show line number
In design, there are three process will operate with table stack
  • The pusher: will insert new item into stack that will lock the table because of CONSTAINT
  • The popper: will select the matched item and then update column pop to NOW()
  • The reporter: will update the column done of a list of item that marked by done

We need column done and pop because if this item is not processed successfully in 30 minutes, the popper will re-select this item to process again, the SQL of the popper look like:

  1.         SELECT *
  2.         FROM stack
  3.         WHERE pop < NOW() - INTERVAL '30 minutes'
  4.                 AND NOT done
  5.         ORDER BY ...
Hide/show line number

At first glance, each of three our processes will lock another because it need to lock the table or rows (if lock to row level) for updating and may be deadlock error will be raised. Therefor, I try to find out another design like this:

stack
  1.         CREATE TABLE stack
  2.         (
  3.                 id bigserial NOT NULL,
  4.                 push timestamp without time zone DEFAULT now(),
  5.                 CONSTRAINT stack_pk PRIMARY KEY (id)
  6.         )
Hide/show line number
 
stack_popping
  1.         CREATE TABLE stack_popping
  2.         (
  3.                 id bigserial NOT NULL,
  4.                 pop timestamp without time zone, -- The popping time
  5.                 done boolean,
  6.                 CONSTRAINT stack_popping_pk PRIMARY KEY (id)
  7.         )
Hide/show line number
 
stack_done
  1.         CREATE TABLE stack_done
  2.         (
  3.                 id bigserial NOT NULL,
  4.                 done boolean,
  5.                 CONSTRAINT stack_popping_pk PRIMARY KEY (id)
  6.         )
  7.        
Hide/show line number
 
Now, what's differents:
  • The pusher: will insert new items into stack table and do lock on it only.
  • The popper: will select matched items from stack_popping table and update pop value of table stack_popping to NOW().
  • The reporter: will update column done in table stack_done to TRUE that mean these stack items has already processed.

Great, the pusher only lock table stack, the popper only lock stack_popping and stack_done is locked only by reporter. No any conflict here, right? So no matter what my SQL is, no deadlock can be raised. Though, we need to do some extras, when the popper want to pop a list of items from stack_popping, it need to do the syncronization task before, the syncronization task will try to select the new inserted item from table stack into it and update column done from stack_done. Don't worry about this task because it can be design to get only just new inserted and update only changed item, more over, this task still only lock table stack_popping only and only do SELECT statement on two others.

I'm not sure this is the best solution but may be it is better enough to improve system performance. Our system work with large data everyday and is designed by this method. Hope that help.