Monday, October 20, 2014

Before you code

A new project starts and everyone is excited. The development team is keen to dig right in and start coding. But wait - there are a few things that need to be done at the start, to save you some head-banging later in the project.


Here are a few that I've come across. I'd love to hear your suggestions.


  1. Servers and Developer machines
    1. Establish templates for your servers - You should be able to churn out a server with the relevant software/hardware configuration in minutes at any point during the life of your project.
    2. Make sure your developer machines have enough firepower - This may be an ideal time to get some funding for those upgrades that developers never got.
    3. Development Server - In some instances, teams decide to have a common development server as an integration testing area. They may even end up working on a common database in the initial phases of the project. Although this can seem productive at the start, the sooner you get off this approach the better. This set-up delays automation activities, gives a false sense of velocity and can easily result in artificial delays in areas like migration.


  2. Environments
    1. A typical project can have separate environments for development, testing and production. Each environment may consist of a combination of applications, configurations, databases etc...
    2. The team needs to decide how many environments are required for development and testing, and how they wish to manage the differences between each. Commissioning a new environment should, again, take only minutes.
    3. Another concern is how you connect your upstream and downstream applications to each of these environments. In most enterprise projects, data flows have to be established both upstream and downstream from external systems into your dev/test environments.


  3. Version Control
    1. Distributed or centralized? Distributed version control is commonplace now and there are great and varied options. You can even rely on a third-party provider like GitHub instead of setting up your own. Or maybe you want to go with a tool like TFS in light of its overall application lifecycle support.
    2. Establish your version control workflow
      1. Do you go for named branches or separate repositories?
      2. Is rebasing allowed?
      3. Tagging
      4. Logging the commits
    3. How do you version control your database work? Traditionally, db developers are not very excited about using version control software. As a team you need to make sure that this is not the case and that the database is treated as a first-class citizen in the version control world.


  4. Knowledge Management
    1. The team should decide on a mechanism to share and build knowledge about the product/project.
    2. A preferred method is to have a project wiki. You can start dumping your initial thoughts and then easily keep updating it.
    3. If the project involves a lot of documents, a document management system might also be considered.
    4. The Wiki can be used to drive consistency across the team, be it standardization of terminology or standardization of technical entities.
      1. Common project specific terms, abbreviations and their meanings
      2. Technical standards and compliance details (data types, naming standards)
      3. Reference implementations and code samples
    5. It can also host project management information such as contact details, communication escalation paths, holiday plans, and high-level project plans and schedule information.


  5. Project Management methodology and delivery expectations
    1. The team should decide what their delivery model is. For example, is it going to be 2-week iterations, or a continuous deployment of features into an integration area as and when things roll out of developers' hands?
    2. What sort of tool or method will be used to track the progress of the team?
    3. Seemingly simple things like consensus on when a task can be marked 100% complete can go a long way in understanding the status of the project at any given moment.

I guess you do understand that none of the above should stop you from actually getting things done at the start. It's not imperative that you have all of it in place before you open up your IDE. However, in my experience the team should make it a point to get most of these concerns out of the way within the first couple of iterations.

Tuesday, August 19, 2014

Mercurial Queues to manage 'Spikes'



We all do some kind of R & D on code. Sometimes it’s purely for learning purposes whilst sometimes it’s to try something new on an existing code base. I call the second exercise a ‘Spike’.
A spike is quite a fluid activity. Depending on the complexity of the piece, it can take weeks if not months to bring a spike close to the state of a fully fledged feature. The goal of this post is to identify a neat way to incorporate version control and keep a proper history of the work done while you are spiking.
Unlike when you are fixing a bug or developing a well-understood feature, a spike can sometimes be a walk in the dark - until you see the light. You will of course set yourself incremental targets, but some of these targets (or ideas) might turn out to be inefficient, not robust or downright wrong. So once you decide that what you have done for the past 2-3 days is wrong, how would you start over? Would you revert all the changes? Or do you painstakingly try to identify the changes that are still useful and get rid of the rest? This may sound trivial, but a healthy spike can touch various parts of your solution, including many source files. Believe me, it's not a very fulfilling activity.
But with version control you wouldn't get this problem, right? You'd of course commit whatever atomic changes you make as part of each small target, and then it's a matter of rolling back the selected changesets. Although this sounds better than the previous approach, it still has problems. Changesets in version control are immutable; they become part of the history. Whilst this is desirable in most instances, when you are spiking it might not be. Spikes can have a lot of intermediate steps which are incomplete, misdirected or even wrong. Even if these changesets are of value to you, once you push them upstream they might be confusing or irrelevant to other users of the repository. These changesets can end up polluting your version control history unnecessarily.
[Note: You don't necessarily have to be familiar with Mercurial to follow the rest of the article; any experience with a modern version control system will suffice]
With Mercurial Queues (MQ), you can get the best of both worlds. MQ is an extension to the popular distributed version control system Mercurial (Hg). It extends Hg's functionality to manage a collection of patches together. MQ is said to have been initially inspired by a tool called 'Quilt', used by the open source community in the pre-Git days to manage patches. Although the use of a similar tool in an open source project is quite useful, the focus of the discussion here is on managing spike branches. MQ can be enabled by putting the following lines in an hgrc file (.hg/hgrc in any of your repositories, or your user-level ~/.hgrc).

[extensions]
hgext.mq =

MQ helps you build a stack of patches. Each patch on its own is like a bucket, continuously accepting changes. Compared to a changeset - which crystallizes as soon as you create it - a patch in the context of MQ can be refreshed again and again without permanently committing it to the repository. The ability to record temporary checkpoints into a patch is the most useful feature for me. Once you gain some confidence, you save the patch and create a new one.

MQ also possesses commands to alter the state of your patch stack. You can pop or push patches, or even change the order of patches in the stack. This can greatly help to clean up the repository. In addition, MQ enables you to roll up several patches into one, which is a great precursor to converting patches into regular Hg changesets.

Let’s go through a typical workflow.

1. I need to refactor a large part of my code base involving the presentation, business logic and data access layers. This will touch at least 4-5 projects.
2. I get a clone of the repository into my working directory.
Basic Usage of MQ

1. Start a new patch
>hg qnew -m "Try a sample change" -u "user" patch1a.patch
This will create an empty patch

2. Make the necessary changes in the project(s). (Assume new files are added in the process)
>hg add
Schedules any new files to be tracked
>hg qrefresh
qrefresh will update the current patch with the latest changes in the working directory

3. Now keep making changes and keep running hg qrefresh as often as possible.

4. Remember to start a new patch as soon as you think your immediate goal is achieved. This is quite important as the granularity of your patches will decide how flexible your patch stack is for tinkering later in the process.

5. Once you are ready, start a new patch (essentially closing the existing patch)
>hg qnew -m "Change few more entities" -u "user" patch1b.patch
This will start a new patch, patch1b.patch. The previous patch is now in the stack.

6. Check the patch stack
>hg qseries
patch1a.patch
patch1b.patch
>hg qapplied
patch1a.patch
>hg qunapplied
patch1b.patch
7. Go back to step #2


Patch Stack Management - Change Order

MQ also has some simple stack manipulation commands that can be used to pop or push patches in and out of the stack.
1. Assume the following state in your patch queue
>hg qapplied
patch1a.patch
patch1b.patch
patch1c.patch

2. Now imagine that you realise there's a bug in the core of your code base. Ideally, this fix should have been done before any of these patches. This is how you would go about it:
>hg qpop -a
This pops all patches out of the queue. (They are still intact, don't worry)

3. >hg qnew core.patch (Other command params are omitted for brevity)

4. Now do the code changes you want to make in core code.

5. >hg qrefresh
Updates the patch

6. >hg qpush -a
core.patch is already applied (qnew applies a new patch as it creates it), so this pushes all the remaining patches back on top of it

7. >hg qapplied
core.patch
patch1a.patch
patch1b.patch
patch1c.patch


Patch Stack Management - Combine Patches
Assume that you have spent time on the spike up to a level where you are quite confident that the existing patches will end up in a feature branch. Maybe some patches on their own don't deserve to be Hg changesets.

1. >hg qapplied
core.patch
patch1a.patch
patch1b.patch
patch1c.patch
2. >hg qpop -a
3. >hg qpush
This pushes core.patch back into the queue
4. >hg qapplied
core.patch
5. Now we need to combine patch1a, patch1b and patch1c into a single patch - patch1
6. >hg qnew patch1.patch
7. >hg qapplied
core.patch
patch1.patch
8. >hg qfold patch1a.patch patch1b.patch patch1c.patch
9. >hg qapplied
core.patch
patch1.patch


Converting Patches to Changesets
Once you are comfortable with your changes - i.e. the patch stack - you can easily convert the patches to changesets to be pushed to other team members or to a separate branch.

1. >hg qfinish -a
Moves all the applied patches into the repository history by converting them to changesets. This releases the patches from MQ's control.
2. >hg qseries
Shows nothing
3. As an alternative to finishing all of your patches, you can finish just the patch(es) you want (starting from the bottom of the stack up to the provided revision number)
>hg qfinish revision_number

I find Mercurial Queues to be extremely useful in the context of ongoing substantial changes to an existing code base. It provides me all the goodness of a version control system without having to sacrifice the consistency of my repository with unnecessary fingerprints.

Monday, May 26, 2014

Who said history is boring?

Wars in the real world are not 'funny' business, whether you are in the middle of one or watching from the sidelines. However, in computer science we tend to make the most of the wars and flames and turn them into these extremely humorous and self-deprecating affairs. We've seen many, including 'Browser Wars', 'Mobile Platform Wars' and 'Operating System Wars', but none can topple the ongoing war between programming languages.

I came across a gem of a blog recently on exactly this theme and thought I'd shamelessly copy/paste it to easily share the pieces that made me cry with laughter the most. Soon I realized that the pieces that made me laugh the most are the ones about languages that I have dealt with in the past in one way or another. So technically this piece in its entirety should be able to make any programmer laugh. A must read indeed.

The original post, named 'A Brief, Incomplete, and Mostly Wrong History of Programming Languages', is here

Here are the ones that were more personal for me, in chronological order. I'd love to hear from my friends on which ones made them rofl ;)

1972 - Dennis Ritchie invents a powerful gun that shoots both forward and backward simultaneously. Not satisfied with the number of deaths and permanent maimings from that invention he invents C and Unix.

1983 - Bjarne Stroustrup bolts everything he's ever heard of onto C to create C++. The resulting language is so complex that programs must be sent to the future to be compiled by the Skynet artificial intelligence. Build times suffer. Skynet's motives for performing the service remain unclear but spokespeople from the future say "there is nothing to be concerned about, baby," in an Austrian accented monotones. There is some speculation that Skynet is nothing more than a pretentious buffer overrun.

1987 - Larry Wall falls asleep and hits Larry Wall's forehead on the keyboard. Upon waking Larry Wall decides that the string of characters on Larry Wall's monitor isn't random but an example program in a programming language that God wants His prophet, Larry Wall, to design. Perl is born.

1995 - At a neighborhood Italian restaurant Rasmus Lerdorf realizes that his plate of spaghetti is an excellent model for understanding the World Wide Web and that web applications should mimic their medium. On the back of his napkin he designs Programmable Hyperlinked Pasta (PHP). PHP documentation remains on that napkin to this day.

1995 - Brendan Eich reads up on every mistake ever made in designing a programming language, invents a few more, and creates LiveScript. Later, in an effort to cash in on the popularity of Java the language is renamed JavaScript. Later still, in an effort to cash in on the popularity of skin diseases the language is renamed ECMAScript.

1996 - James Gosling invents Java. Java is a relatively verbose, garbage collected, class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Sun loudly heralds Java's novelty.

2001 - Anders Hejlsberg invents C#. C# is a relatively verbose, garbage collected, class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Microsoft loudly heralds C#'s novelty.

2003 - A drunken Martin Odersky sees a Reese's Peanut Butter Cup ad featuring somebody's peanut butter getting on somebody else's chocolate and has an idea. He creates Scala, a language that unifies constructs from both object oriented and functional languages. This pisses off both groups and each promptly declares jihad.

Monday, May 19, 2014

ODP dot Net - Usage and Samples - Part II

In Part I of this blog post series we discussed how to handle arrays of simple types with ODP.Net. In this post we'll discuss how to extend this to complex types by using Oracle's User Defined Types (UDTs).

UDTs allow a PL/SQL developer to expose complex db types to the outside world, above and beyond the PL/SQL layer. The reason they are needed is that ODP.Net does not yet support PL/SQL tables. So to represent a list of a complex type, UDTs have to be used.

There are a few additional requirements on the .Net developer to get this working:

  • Mapping classes have to be written for the UDT type and the array-of-UDT type
  • Factory classes have to be written for the creation of both of the above types
  • The parameter needs to be created with the additional property 'UdtTypeName' set (see the sketch below)
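
To make those three requirements concrete, here is a minimal sketch of how the pieces typically fit together with the unmanaged Oracle.DataAccess driver. The object type MYSCHEMA.T_CUSTOMER, its collection type MYSCHEMA.T_CUSTOMER_TAB, the NAME attribute and the parameter name are all hypothetical stand-ins for your own UDT definitions:

using System;
using Oracle.DataAccess.Client;
using Oracle.DataAccess.Types;

// 1. Mapping class for the UDT itself (hypothetical MYSCHEMA.T_CUSTOMER)
public class Customer : IOracleCustomType
{
    [OracleObjectMapping("NAME")]
    public string Name { get; set; }

    public void FromCustomObject(OracleConnection con, IntPtr pUdt)
    {
        // Copy the .Net property values into the Oracle UDT attributes
        OracleUdt.SetValue(con, pUdt, "NAME", Name);
    }

    public void ToCustomObject(OracleConnection con, IntPtr pUdt)
    {
        // Copy the Oracle UDT attribute values into the .Net object
        Name = (string)OracleUdt.GetValue(con, pUdt, "NAME");
    }
}

// 2. Factory classes for the UDT and for the array of UDTs
[OracleCustomTypeMapping("MYSCHEMA.T_CUSTOMER")]
public class CustomerFactory : IOracleCustomTypeFactory
{
    public IOracleCustomType CreateObject() { return new Customer(); }
}

[OracleCustomTypeMapping("MYSCHEMA.T_CUSTOMER_TAB")]
public class CustomerTableFactory : IOracleArrayTypeFactory
{
    public Array CreateArray(int numElems) { return new Customer[numElems]; }
    public Array CreateStatusArray(int numElems) { return new OracleUdtStatus[numElems]; }
}

// 3. Creating the parameter with the UdtTypeName property set
public static class CustomerParameters
{
    public static OracleParameter MakeCustomerListParameter(Customer[] customers)
    {
        return new OracleParameter
        {
            ParameterName = "p_customers",          // hypothetical name
            OracleDbType = OracleDbType.Array,      // an array of UDTs
            Direction = System.Data.ParameterDirection.Input,
            UdtTypeName = "MYSCHEMA.T_CUSTOMER_TAB",
            Value = customers
        };
    }
}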


The mapping classes and the factory classes use base classes called TypeTemplate/TypeFactoryTemplate and TableTemplate/TableFactoryTemplate respectively. These base classes abstract away the actual mapping between the two paradigms from the rest of your data access/domain code base. (http://developergeeks.com/article/3/user-defined-type-support-in-oracle-odp-net-11g)

In addition, a couple of custom attribute classes are also shown. These can be used to enforce/migrate basic rules like the nullability of a property to the .Net type.

This set of classes can be put in a common class library like xxx.Common.Core to be used by multiple projects.


In Part III of this blog post series we will look at a cool new feature of Oracle 11g - loosely typed cursors - and how we can use it with Odp.Net.

Thursday, May 08, 2014

Odp dot Net - Usage and Samples - Part I

I've been working in an Oracle-.Net set-up for a while, and the development experience is totally different to a SQL Server-.Net integration. The learning curve is steep, the tool set is not good and there are a lot of caveats. Hell, the installation of the driver itself can be a nightmare. We started with the Microsoft Oracle driver, which is obsolete now, and then moved to Odp.Net (Odp.Net is Oracle's implementation of an ADO.Net data provider for the Oracle Database), which is the focus of this blog post. In our application landscape, the Oracle constructs are primarily PL/SQL APIs (stored procs).

One thing that has improved over time is the driver itself, i.e. Odp.Net. For one, there's some documentation. But looking back, one thing that was missing was a set of code samples backed up with explanations to get you started. The official documentation is typical Oracle documentation (i.e. lots of content, but it is either outdated or hard to understand for a novice, mostly because it uses quite out-of-date .Net idioms); however, there's a sample project that's not too bad.

Hopefully this blog series will cover some of the most common usage patterns of Odp.Net and get someone rolling faster. (I'm not covering the absolute basics of working with scalar/primitive types, as they can be found quite easily via Google/SO.)

Part 1 - Working with lists (Arrays & User Defined Objects)
Part 2 - Cursors (Both strongly typed & loosely typed)
Part 3 - Xml 

Each data type will be discussed in the form of an Oracle parameter, because essentially how you create and pass the parameter to Oracle dictates whether you get the integration working or not. It's a lean way to explain the crux of the problem. In general, whether the parameter is input or output doesn't matter unless it's explicitly mentioned or is plain obvious (as in the case of setting the value only on input parameters but not on output).

  • Scalar Arrays (Numbers and Varchars) 


At the Oracle end, the parameter has to be declared as an associative array of NUMBER or VARCHAR2 respectively, as follows (a sketch of the corresponding .Net binding follows the declarations):

TYPE t_id_tab IS TABLE OF NUMBER(20) INDEX BY PLS_INTEGER;
 t_id_tab should be the type of the parameter

TYPE t_string_tab IS TABLE OF VARCHAR2(32767) INDEX BY PLS_INTEGER;
 t_string_tab should be the type of the parameter
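
At the .Net end, the array is bound as a PL/SQL associative array. Here is a hedged sketch using the unmanaged Oracle.DataAccess driver; the package procedure my_pkg.process_ids and the parameter name p_ids are hypothetical stand-ins for your own PL/SQL API:

using System.Data;
using Oracle.DataAccess.Client;

public static class AssociativeArrayExample
{
    public static void CallWithIdArray(OracleConnection con, long[] ids)
    {
        using (OracleCommand cmd = con.CreateCommand())
        {
            cmd.CommandText = "my_pkg.process_ids";   // hypothetical proc
            cmd.CommandType = CommandType.StoredProcedure;

            OracleParameter param = new OracleParameter();
            param.ParameterName = "p_ids";
            param.OracleDbType = OracleDbType.Int64;  // maps to NUMBER(20)
            param.Direction = ParameterDirection.Input;
            // The key setting: bind as a PL/SQL associative array
            param.CollectionType = OracleCollectionType.PLSQLAssociativeArray;
            param.Value = ids;
            param.Size = ids.Length;
            // For VARCHAR2 arrays (especially output ones), ArrayBindSize
            // must also be set so the driver knows each element's max length.
            cmd.Parameters.Add(param);

            cmd.ExecuteNonQuery();
        }
    }
}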


  • Array of complex types / Composite Arrays

This is not possible to do using the above techniques as of now. For example, a list of a PL/SQL type is not consumable by a .Net application using ODP.Net. The solution is to use User Defined Types, or UDTs. UDTs are basically objects created with a global scope in the Oracle instance. The PL/SQL developers were not particularly fans of this approach. I will discuss it in detail in the next post.

Thursday, April 03, 2014

Two tier cache strategy


Application-level caching is still used when you want to achieve high availability and fast response times. There are various techniques for achieving this, and below is the technique used by an application that I currently work with.
It employs a two-level cache where the first level (the Persistent Cache - not to be confused with a cache that resides in a database; both of these caches are in memory) is a long-term cache (with a 7-day expiry) serving end-user requests, while the second level (the Transient Cache) is a short-term cache (with a 30-second expiry) that feeds the long-term cache asynchronously.

Any object list/db result which needs to be cached is cached in both the persistent and transient caches. The fact that the end user is fully isolated not only from the DB query but also from the cache refresh mechanism works beautifully to make the application very responsive, irrespective of the inherent inefficiencies of an integrated enterprise environment.

The probability of stale data is quite low, due to the high refresh rate of the transient cache as well as the nature of the application usage, which mostly consists of reading vast lists of data (cached) and then operating on one of them (not cached). A sketch of the idea follows.
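
The actual implementation sits on top of the cache manager mentioned below, but as a rough illustration of the two-tier idea - with .Net's built-in MemoryCache standing in for the real cache manager, and all names purely illustrative - it might look something like this:

using System;
using System.Runtime.Caching;
using System.Threading.Tasks;

public class TwoTierCache
{
    private readonly MemoryCache _persistent = new MemoryCache("persistent");
    private readonly MemoryCache _transient = new MemoryCache("transient");

    // End-user reads are always served from the long-term (7-day) cache.
    public T Get<T>(string key, Func<T> dbQuery) where T : class
    {
        var value = (T)_persistent.Get(key);
        if (value != null)
        {
            // If the 30-second entry has expired, refresh in the background
            // so the end user never waits on the database.
            if (_transient.Get(key) == null)
                Task.Run(() => Refresh(key, dbQuery));
            return value;
        }
        // First request for this key: load synchronously once.
        return Refresh(key, dbQuery);
    }

    private T Refresh<T>(string key, Func<T> dbQuery) where T : class
    {
        T fresh = dbQuery();
        _persistent.Set(key, fresh, DateTimeOffset.Now.AddDays(7));
        _transient.Set(key, fresh, DateTimeOffset.Now.AddSeconds(30));
        return fresh;
    }
}

The background Task.Run refresh is what keeps the end user isolated from both the DB query and the refresh itself, mirroring the behaviour described above.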

The choice of cache manager - the Microsoft Application Block - may be questioned. However, given the legacy nature of the application and the performance of the cache over the past few years, it's an interesting question whether it should be changed. In any case, I'm very interested in figuring out alternatives, given that our application stack is ASP.Net MVC / Oracle.

Tuesday, December 31, 2013

New Year Resolutions

Another year has ended and it's time to look back and appreciate what you have done. Chances are, our attention will mostly be spent on the things that we managed to slip up on - especially the dreaded new year resolutions that you made at the start of 2013 (if you still remember those, of course).

A new year resolution can be defined as an agreement that you make with your current self to enforce a positive effect on your future self. In essence it's a contract between the current you and the future you. We have all done this in the past and will continue to do so - lose weight, quit smoking, give up pornography, start exercising, learn swimming, work towards a promotion; the list goes on. But time and time again we fail horribly. Can we rethink our strategy behind new year resolutions in order to improve the success rate? Or are new year resolutions just a big fat farce?

In order to analyse this problem better, let's turn our attention to the world of economics, where a similar idea can be found. It's known as a 'commitment device'. In the context of new year resolutions, let's shrink the scope of the commitment device to a personal level. In that sense, a commitment device is an effective way to force yourself into doing something that you want to do but aren't able to get yourself to do, or in most cases aren't persistent enough at. As with everything to do with economics, the critical factor here is the notion of 'incentive' - or in this context, 'penalty'. A commitment device should always be associated with a penalty.

There are several ways to establish a commitment device.
  • Establish a (financial) penalty upon your (future) self in case you falter. The penalty should be sizeable enough to motivate you to stay on course.
  • Make the commitment public, thus putting your reputation on the line. With the advent of social media this is quite easy to implement now.
  • Create a large obstacle to temptations, increasing the cost of succumbing to one. The penalty here could be time, effort, money etc...

Any one of these, or a combination of them, could work in a given situation. (There are also online solutions to help you do this - http://www.stickk.com/)

However, there are several factors which influence which method has a higher probability of success.

One such factor is the nature of the action itself - i.e. whether it's refraining from an act or persisting with one. Quitting or ramping down on something - be it smoking, drinking or excessive junk food - is a preventive act. The obstacle-oriented method or the penalty-driven method could work well here. If the commitment is about starting a new action - exercising, say - putting your reputation on the line could be more effective.

In addition, the personality and the social set-up of a particular person can also make an impact. For example, a person who sits relatively high in the social pecking order (be it family or work) should opt for the reputation method, where the cost of failure would be high.

So can we expect the success rate of new year resolutions to improve by making sure that they comply with this framework? Maybe... or maybe not. My personal opinion is that it will only improve the results for a certain class of personalities, albeit a low percentage.

So the more interesting question is: why doesn't it work? The analysis seems logical and rational enough. The catch is that although most classical schools of economic thinking assume humans behave rationally, we are awfully bad at it.

The trouble is that, deep down, people really don't want to commit. No matter how cleverly the current self ties its future self to a commitment device, the future self will find a loophole (excuse) to breach it. Sometimes the problem with the commitment device is the non-linear pressure associated with it. For example, the benefit of losing a few kilos (and possibly gaining them back) might not outweigh the stress someone goes through during the committed period. This is especially true of the reputation method. At the same time, we know very well that nothing will get done without a commitment.

"So what's the sweet spot?" That's up to each individual to find out. At least by giving it a shot you could find out what doesn't work, so that when 2015 arrives, you hopefully get better at getting something done out of your future self.

Happy New Year Everyone!

This post was inspired by this Freakonomics podcast.