Recursively grant user access to the database in PostgreSQL

Posted by: Arthur Tue, 21 Dec 2010 15:55:12 GMT

While PostgreSQL supports GRANT ALL PRIVILEGES ON DATABASE database_name TO user_name, it won't recursively grant access to all the tables of that database. While it is possible to iterate through all the tables with PL/pgSQL or a shell script, a plain "grant everything" can be achieved with a single line in the console:

pg_dump --schema-only database_name |
egrep '(GRANT.*postgres;|search_path)' |
sed 's/postgres;/user_name;/' |
psql database_name

This basically copies all the privileges the postgres user has on objects within the specified database_name to user_name, without granting access to objects in any other database. The search_path part is required if the database contains any custom schemas. Of course, additional attention should be paid to databases containing objects whose names end in "postgres" or contain "search_path".
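Worth noting: starting with PostgreSQL 9.0, per-schema bulk grants cover the same need directly, without the pg_dump trick (run once per schema):

```
-- PostgreSQL 9.0+: grant on every existing table/sequence in a schema at once
GRANT ALL ON ALL TABLES IN SCHEMA public TO user_name;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO user_name;
```

These only affect objects that already exist; tables created later still need their own grants.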


Calling super without parameters in Ruby

Posted by: Arthur Tue, 12 Oct 2010 03:49:06 GMT

Calling bare super from a child method invokes the parent's method with all the arguments received by the child. This can be problematic when you need to invoke a parent method that takes no arguments:

class Parent
  def initialize
    @a = 1
  end
end

class Child < Parent
  def initialize(b)
    super
    @b = b
  end
end

c = Child.new 2

Executing this code will result in an error:

ArgumentError: wrong number of arguments (1 for 0)

The error is caused by the attempt to call Parent#initialize with the argument originally passed to the Child object. To avoid forwarding all the child's arguments to the parent, the *[] notation can be used:

...

class Child < Parent
  def initialize(b)
    super *[] # no args are passed
    @b = b
  end
end


Ruby Syntax Ambiguity: Hash or block?

Posted by: Arthur Fri, 01 Oct 2010 02:32:48 GMT

Ruby uses the same syntax, curly braces, for both Hash literals and blocks:

a = {}

b = lambda {}

While in most cases the two uses are distinguishable, sometimes they can play a trick on a Ruby developer:

def braces_trick(arg = nil, &block)
  puts "arg = #{arg.inspect}"
  puts "block = #{block.inspect}"
end

braces_trick {}

# Guess what?
ruby-1.8.7-p174 > braces_trick {}
arg = nil
block = #<Proc:0x0000000000000000@(irb):10>
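To actually pass an empty Hash rather than a block, wrap the argument in parentheses. A self-contained sketch (the method here returns what it received, so the difference is visible):

```ruby
def braces_trick(arg = nil, &block)
  [arg, block]  # return both so we can inspect what was parsed
end

# Braces after the bare method name parse as a block:
arg, block = braces_trick {}
# arg is nil, block is a Proc

# Parentheses force the braces to parse as an empty Hash literal:
arg, block = braces_trick({})
# arg is {}, block is nil
```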


Symlinking classes in Ruby

Posted by: Arthur Thu, 22 Jul 2010 14:03:45 GMT

ln -s for classes made easy:

class Something
end

module LinkSomething
  def self.extended(base)
    base.const_set 'LinkedSomething', ::Something
  end
end

class Folder
  extend LinkSomething

  class InheritSymlink < LinkedSomething; end

  puts LinkedSomething.inspect  # => Something
  puts InheritSymlink.inspect   # => Folder::InheritSymlink
end

Is there any use for this? Yes: occasionally it can be used to split inheritable functionality into another class and thus separate concerns, although in most cases simple inheritance or module reopening is a much better solution.
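For comparison, when the hook machinery isn't needed, a plain constant assignment inside the class body produces the same "symlink" (a minimal sketch):

```ruby
class Something
end

class Folder
  LinkedSomething = Something  # constant assignment aliases the class directly

  class InheritSymlink < LinkedSomething; end
end

puts Folder::LinkedSomething.inspect  # => Something
puts Folder::InheritSymlink.inspect   # => Folder::InheritSymlink
```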


PostgreSQL concurrency

Posted by: Arthur Thu, 18 Feb 2010 04:53:37 GMT

Multi-threaded databases cannot guarantee the order of query execution. While this is obvious for separate queries running over separate connections, it is less apparent when you are dealing with functions and transactions. The issue is inherent to all databases that execute queries concurrently; I'll cover PostgreSQL here, as it's the main database engine I use.

A problem

Consider a classical example: transferring funds from one user account to another, where negative balances are prohibited. For this example, let's assume two accounts are present and we have a PostgreSQL function to handle the transfer:

CREATE LANGUAGE plpgsql;

CREATE TABLE balance(
  user_id SERIAL PRIMARY KEY,
  money INTEGER NOT NULL
);

INSERT INTO balance(user_id, money) VALUES (1, 500);
INSERT INTO balance(user_id, money) VALUES (2, 200);

CREATE OR REPLACE FUNCTION transfer_money(from_user_id INTEGER,
                                            to_user_id INTEGER,
                                                amount INTEGER)
RETURNS BOOLEAN AS $$
BEGIN
  IF (SELECT money >= amount FROM balance WHERE user_id = from_user_id) THEN
    UPDATE balance SET money = (money - amount) WHERE user_id = from_user_id;
    UPDATE balance SET money = (money + amount) WHERE user_id = to_user_id;
    RETURN true;
  ELSE
    RETURN false;
  END IF;
END
$$ LANGUAGE plpgsql;

Functions in PostgreSQL are executed within a single transaction, but this doesn't in any way safeguard us against a concurrent call to the same function modifying a user's balance between the IF and the first UPDATE. We can simulate this behavior by creating a specially slowed-down version of the function:

CREATE OR REPLACE FUNCTION transfer_money_slow(from_user_id INTEGER,
                                                 to_user_id INTEGER,
                                                     amount INTEGER)
RETURNS BOOLEAN AS $$
BEGIN
  IF (SELECT money >= amount FROM balance WHERE user_id = from_user_id) THEN
    PERFORM pg_sleep(2); -- sleep for 2 seconds
    UPDATE balance SET money = (money - amount) WHERE user_id = from_user_id;
    UPDATE balance SET money = (money + amount) WHERE user_id = to_user_id;
    RETURN true;
  ELSE
    RETURN false;
  END IF;
END
$$ LANGUAGE plpgsql;

Then we execute the slow function first, and while it sleeps, run the fast one:

$ psql -c 'SELECT transfer_money_slow(1, 2, 100);'

# Then in the second shell:
$ psql -c 'SELECT transfer_money(1, 2, 450);'

# And that's what we get:
$ psql -c 'SELECT * FROM balance'
 user_id | money
---------+-------
       1 |   -50
       2 |   750
(2 rows)

Oops, we've just lost $50!

A transaction doesn't ensure that data cannot be changed between the queries inside it. Rather, it provides the ability to roll them all back at once.

Solutions

Check constraints

We have a clearly defined requirement: the balance must not go below 0 under any circumstances. So why not enforce that as a constraint? Here is how the initial table could have been defined from the start:

CREATE TABLE balance(
  user_id SERIAL PRIMARY KEY,
  money INTEGER NOT NULL,
  CHECK(money >= 0)
);

Then if we try to simulate the same behavior with concurrent function calls, we'll get an error message saying:

ERROR:  new row for relation "balance" violates check constraint "balance_money_check"

So it prevents balances from being overdrawn. But check constraints are not a universal cure: for example, what if you need to block any concurrent query that could access the same data?

Locking

The simplest (and most popular) example where locking is needed to prevent concurrent access to objects is a queue. Consider a list of jobs which need to be done, with the requirement that no job may be executed twice:

CREATE TABLE jobs(
  job_id SERIAL PRIMARY KEY,
  finished BOOLEAN NOT NULL DEFAULT false,
  description TEXT
);

INSERT INTO jobs (description) VALUES ('Go for a walk');
INSERT INTO jobs (description) VALUES ('Drink a cup of tea');

Another requirement is that jobs aren't deleted from the queue; rather, they are marked as "finished". The first idea which comes to mind is adding an "in_process" column to track which jobs are currently being executed. But that has two problems:

  1. Two concurrent workers can read "false" from "in_process" and simultaneously set it to "true".
  2. If the only worker executing the job crashes, we don't know about this unless we somehow periodically ping the worker itself.

The first issue could be solved with a check constraint ensuring that "in_process" cannot be set to "true" when it is already "true". However, PostgreSQL check constraints don't allow comparing old and new values, and using a trigger for such a check would be prone to the very race condition it tries to solve.

So locking comes to the rescue. There are three types of explicit locking in PostgreSQL: table-level locks, row-level locks, and advisory locks.

Table-level and row-level locks are the standard way to deal with locking in SQL databases. However, they are prone to deadlocks, which become a real problem under heavy load. While there are simple cases where deadlocks can be prevented, for complex applications this kind of locking is often inappropriate.

And here come advisory locks, a PostgreSQL-native solution to these troubles. Their main advantage is that acquisition is strictly serialized, much like sequence access: even concurrent transactions cannot simultaneously hold a lock on the same value. An advisory lock is identified either by two 32-bit integers (an object's class id and the object's own id) or by a single 64-bit value. Whenever a lock is taken, a corresponding row appears in the pg_locks system view:

=# SELECT pg_try_advisory_lock(1, 1);
 pg_try_advisory_lock
----------------------
 t
(1 row)

=# SELECT * FROM pg_locks WHERE locktype='advisory';
 locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |     mode      | granted
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+---------------+---------
 advisory |    94115 |          |      |       |            |               |       1 |     1 |        2 | 1/193              | 5031 | ExclusiveLock | t
(1 row)

This makes it possible to query the database for currently held locks. I've settled on a scheme of using the table's OID as the lock's classid and the row's primary key value as the objid. Locks with a zero classid are reserved for global application locks which don't relate to any specific table: this is useful when you need to synchronize applications running on different servers and memcached is unsuitable for some specific reason.

With all that in mind, here's how a proper funds transfer function should look. Since RETURN exits the function immediately, the lock has to be released before each RETURN:

CREATE OR REPLACE FUNCTION transfer_money(from_user_id INTEGER,
                                            to_user_id INTEGER,
                                                amount INTEGER)
RETURNS BOOLEAN AS $$
BEGIN
  PERFORM pg_advisory_lock((SELECT oid::integer FROM pg_class WHERE relname = 'balance'), from_user_id);
  IF (SELECT money >= amount FROM balance WHERE user_id = from_user_id) THEN
    UPDATE balance SET money = (money - amount) WHERE user_id = from_user_id;
    UPDATE balance SET money = (money + amount) WHERE user_id = to_user_id;
    PERFORM pg_advisory_unlock((SELECT oid::integer FROM pg_class WHERE relname = 'balance'), from_user_id);
    RETURN true;
  END IF;
  PERFORM pg_advisory_unlock((SELECT oid::integer FROM pg_class WHERE relname = 'balance'), from_user_id);
  RETURN false;
END
$$ LANGUAGE plpgsql;
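Note that session-level advisory locks survive errors: if either UPDATE raises, the lock stays held until the connection closes. When that matters, the unlock belongs in an exception handler; a sketch of the pattern inside the function body:

```
-- Sketch: release the advisory lock even if an UPDATE raises
BEGIN
  UPDATE balance SET money = (money - amount) WHERE user_id = from_user_id;
  UPDATE balance SET money = (money + amount) WHERE user_id = to_user_id;
EXCEPTION WHEN OTHERS THEN
  PERFORM pg_advisory_unlock((SELECT oid::integer FROM pg_class WHERE relname = 'balance'), from_user_id);
  RAISE;
END;
```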

As for the queue, the case is a little more complicated: a worker needs to select the first unlocked task rather than a specific one. It is NOT a good idea to use a query like SELECT j.* FROM jobs j, pg_class pgc WHERE pgc.relname = 'jobs' AND pg_try_advisory_lock(pgc.oid::integer, j.job_id), because you can never guarantee which plan the planner will choose to optimize the query. In most cases such a query will walk the jobs' rows one by one and check whether each is locked. This will effectively:

  • Walk across all the jobs' rows
  • Lock them all from the same connection

Thus the whole queue ends up locked against all concurrent connections. To check locks safely instead, a join with the pg_locks view can be introduced:

   SELECT j.* FROM jobs j
LEFT JOIN pg_locks pgl
       ON pgl.classid = tableoid('jobs')
      AND pgl.objid = j.job_id
      AND pgl.pid <> pg_backend_pid()
    WHERE pgl.objid IS NULL

This join eliminates all the jobs which are already locked and returns the jobs which were ready for processing at the moment the query was executed. It skips any locks held by the current connection by checking the backend PID. The query also uses a tableoid function for convenience, which can be defined as follows:

CREATE OR REPLACE FUNCTION tableoid(tablename TEXT) RETURNS INTEGER AS $$
BEGIN
  RETURN (SELECT oid FROM pg_class WHERE relname = tablename);
END
$$
LANGUAGE plpgsql IMMUTABLE SECURITY DEFINER;

This function is marked IMMUTABLE for performance reasons, on the assumption that it is highly unlikely the OID of any table will change while the application is running.

Returning to the job query: note that it doesn't do any locking by itself. The query is just a getter for non-locked jobs, which should then be locked and processed:

CREATE OR REPLACE FUNCTION get_job() RETURNS BIGINT AS $$
DECLARE
  busy_job_id integer;
  job record;
BEGIN
  -- If the same job was already locked, it makes sense to return it if the worker is single-jobbed.
  -- Otherwise, this check could be dropped.
  busy_job_id := (SELECT objid FROM pg_locks
                   WHERE pid = pg_backend_pid() AND classid = tableoid('jobs') LIMIT 1);
  IF busy_job_id IS NOT NULL THEN
    RETURN busy_job_id;
  END IF;

  -- Iterate over the pending jobs list, ordered by id.
  -- A LIMIT could be added if a lot of pending jobs are expected compared to the number of workers.
  FOR job IN SELECT DISTINCT j.* FROM jobs j
LEFT JOIN pg_locks pgl
       ON pgl.classid = tableoid('jobs')
      AND pgl.objid = j.job_id
      AND pgl.pid <> pg_backend_pid()
    WHERE pgl.objid IS NULL
 ORDER BY j.job_id LOOP
    -- Trying to lock the job:
    IF pg_try_advisory_lock(tableoid('jobs'), job.job_id) THEN
      -- Need to recheck the unlocked state: race conditions are still possible.
      -- For example, job could be requested, locked, processed, finished and unlocked
      -- while this function iterates over "then-unlocked" jobs list.
      IF (SELECT NOT finished FROM jobs WHERE job_id = job.job_id) THEN
        RETURN job.job_id;
      END IF;
      PERFORM pg_advisory_unlock(tableoid('jobs'), job.job_id);
    END IF;
  END LOOP;
  RETURN NULL;
END
$$
LANGUAGE plpgsql SECURITY DEFINER;

After processing a job, the worker should unlock it with SELECT pg_advisory_unlock(tableoid('jobs'), job_id) if it is expected to process more than one job during its lifetime. PostgreSQL automatically releases all advisory locks obtained by a connection when that connection terminates, so crashed applications won't leave zombie locks behind.

The described solution provides a reliable way to overcome these database concurrency issues. For more detail, consult the PostgreSQL manual on advisory locks.


Recursive lambda in Ruby

Posted by: Arthur Tue, 16 Feb 2010 05:17:35 GMT

If you've ever thought about writing a recursive lambda, here's how to achieve this:

# This example calculates n-th number in Fibonacci sequence
fib = lambda { |n|
  n < 2 ? n : fib.call(n-1) + fib.call(n-2)
}

This can also be used to flatten a tree, given an array of child nodes:

child_nodes = [...] # array of any objects which correctly respond to the method "parent"
flattened_tree = []

(flatten_tree = lambda { |children|
  flattened_tree |= children
  parents = children.map(&:parent).compact
  flatten_tree.call(parents) unless parents.empty?
}).call(child_nodes)

Although it's a solution of little practical use in production code: its very presence suggests the problem is being solved in a more complex and inefficient way than it could be.
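One more idiomatic route for the Fibonacci case is a Hash with a default proc, which gives anonymous recursion and memoization at the same time:

```ruby
# Hash.new with a block: the block computes (and caches) missing entries recursively
fib = Hash.new { |memo, n| memo[n] = n < 2 ? n : memo[n - 1] + memo[n - 2] }

fib[20]  # => 6765
```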

Speed comparison of recursive method call and recursive lambda call:

                  user     system      total        real
method call   0.170000   0.060000   0.230000 (  0.224818)
lambda call   0.390000   0.160000   0.550000 (  0.559160)

The code used for benchmark:

require 'benchmark'

def fib(n)
  n < 2 ? n : fib(n-1) + fib(n-2)
end

fibl = lambda { |n|
  n < 2 ? n : fibl.call(n-1) + fibl.call(n-2)
}

Benchmark.bmbm do |b|
  b.report("method call") {10.times {fib(20)}}
  b.report("lambda call") {10.times {fibl.call(20)}}
end
