Sunday, September 11, 2005

Today I was fortunate enough to participate FOSSSL (http://www.fosssl.org/) a free & open source software conference. I joined a few others for two tutorials, one in mysql (Introducing MySQL 5.0) and the other in postgres (PostgreSQL Internals Through Pictures) . Both the presenters were renonwned personalities in software world (let alone open source) and they were,
1. MySQL - David Axmark, Co Founder of MySQL
2. Postgres - Bruce Momjian, Lead contributer to PostgreSQL

The postgres session was my favourite out of the two and it prompted me to put up this entry so I could prompt a few more people to read a bit more on postgres internals and in turn DBMS fundemantals.

Bruce with his catchy presentation style and very humble explanations was soon very much appreciated by all of us and explained how postgres internally handles SQL queries (http://candle.pha.pa.us/main/writings/pgsql/internalpics.pdf).
This was actually much more generic and I suppose one could get a general understanding of how a DBMS handles SQL queries fires against it.

The basic steps of any decent DBMS of picking up a SQL query and executing is,
1. Parsing and Tokenizing (Postgres uses flex & bison..surprise J ) the query
2. Forming a easy to work with data structure from the tokens
3. Optimizing (This was the bulk of the discussion and was very interesting)
4. Execution

Out of the many small discussion threads we had I’d like to mention one here which was about preparing sql statements. This basically means that you inform to the DB engine about the plan of execution (which is normally derived after doing an optimization step) with the query it self. So the whole overhead of optimizing is vanished since the plan of the query is saved in the DB for you. This can look like a very clever thing to do but could prove to be your downfall pretty easily as shown by Bruce in the session.

The basic idea is that if your query is gonna execute 100 and 1000 times it’s wise to prepare it. Also it should be a pretty complex query where the actual time taken for an optimization (in case of no prepared plan) is comparable with the execution time. Even here if your same query which runs 1000 times a day runs with different sets of parameter values then you could get screwed again.
Eg : SELECT name from T1 where age=?
Assuming that the age column actually contains the age of human employees of your company ;-), there could be lots of values between say…25-40 (Assume 95%). So if you prepared the plan with a value in this range your optimizer will most probably go for a sequential scan. Now suppose your CEO comes in and issues the same SQL with the age 100 to find out who he should fire next. And since the prepared plan is tuned for the 95% of values this instance of the query will perform badly and the CEO decides to fire the poor old DB folkJ.

(A good thread I came across –
http://www.mail-archive.com/pgsql-general@postgresql.org/msg65562.html )

All in old what I understood is that the most difficult part of this query processing is actually the optimizing part. Bruce is actually pretty confident that postgres has the best optimizing engine even compared to commercial products (SECRETS: mySQL optimizer is a single C file). He also pointed out one reason why this is so. In Oracle you have “Oracle Hints” which actually provides users guidance on tuning the DB to a specific kind of query. This is like loosing a very good opportunity for the DB vendor to improve his DB engine. On the other hand Postgres does not have this capability and thus the user is more or less pushed towards informing postgres on a low performing query. This give the postgres community to find yet another weapon in the never ending optimization war. This actually sounds very sensible given the turn over time of patched up version of open source postgres (Oracle can’t do that). Also the fact that a user is not actually desperate for a answer within minutes of finding out that his query is slow. She could actually wait for a couple of days for an answer while her system continues to operate correctly although slowlyJ.

(Check out Bruce Momjian web side at http://candle.pha.pa.us/ for more cool stuff. His writings section has a lot more resources on postgres)
Post a Comment