Tuesday, August 19, 2014

Mercurial Queues to manage 'Spikes'



We all do some kind of R & D on code. Sometimes it’s purely for learning purposes whilst sometimes it’s to try something new on an existing code base. I call the second exercise a ‘Spike’.
A spike is quite a fluid activity. Depending on the complexity of the piece, it can take weeks if not months to bring a spike close to a state of fully fledged feature. The goal of this post is to identify a neat way to incorporate version control to keep a proper history of the work done while you are spiking.
Unlike when you are fixing a bug or developing a well understood feature, a spike can sometimes be a walk in the dark - until you see the light. You will of course set your self incrementing targets but some of these targets (or ideas) might turn out to inefficient, not robust or downright wrong. So once you decide that what you have done for the past 2-3 days is wrong, how would you start over? Would you revert all the changes? Or do you painstakingly try to identify changes that are still useful and get rid of the rest? This may sound trivial, but a healthy spike can touch various part of your solution including many source files. Believe me it’s not a  very fulfilling activity.
But with version control you wouldn’t get this problem, right? You’d of course commit what ever atomic changes you do as part of the small target and then it’s a matter of rolling back the selected changesets. Although this sounds better than the previous approach, this still has problems. Changesets in version control is immutable. It will be part of the history. Whilst this is desirable in most instances, when you are spiking this might not be so. Because spikes can have lot of intermediate steps which are either not complete, or misdirected or even wrong. Even if these changesets may be of value to you, once you push this upstream it might be confusing or irrelevant to other users of the repository. These changesets can end up polluting your version control history unnecessarily.
[Note: You don’t necessarily have to be familiar with mercurial to follow the rest of the article, any experience with a modern version control system would suffice]
With mercurial queues (MQ), you can get the best of both worlds. MQ is an extension which helps manage a collection of patches together. MQ is said to be initially inspired by a tool called ‘Quilt’ used by open source community in the pre-git days to manage patches. Although the use of a similar tool in an open source project is quite useful, the focus of discussion here is on managing spike branches. MQ can be enabled by putting the following line in .hgrc file in any of your repositories.


[extensions]
hgext.mq =


MQ helps you build a stack of patches. Each patch on its own is like a bucket coniously accepting changes to it. Compared to a changeset - which crystalises as soon as you create it- a patch in the context of MQ can be refreshed again and again without permanently saving or committing it to the repository. The fact that you can record your temp check points in to a patch is the most useful feature for me. Once you get some confidence then you save the patch and create a new one.


MQ also posses commands to alter the state of your patch stack. You can pop or push patches or even change the order of patches in the stack. This can greatly help to clean up the repository. In addition MQ enables you to roll up several patches into one. This is a great pre-cursor to converting patches into a regular Hg changesets.


Let’s go through a typical workflow.


1. I need to refactor a large part of my code base involving presentation, business logic and data access layers. This will touch at least 4-5 projects.
2. I get a clone of a repository to my work directory
Basic Usage of MQ

1. Start a queue
>hg qnew -m “Try a sample change” -u “user” patch1a.patch
This will create an empty patch

2. Do necessary changes in the project(s). (Assume new files are added in the process)
>hg add
Adds new files to the tip
>hg qrefresh
qrefresh will update the current patch with the latest changes on the tip

3. Now keep doing changes and keep doing hg qrefresh as often as possible.

4. Remember to start a new patch as soon as you think your immediate goal is achieved. This is quite important as the granularity of your patches will decide how flexible your patch stack is for tinkering later in the process.

5. Once you are ready, start a new patch (essentially closing the existing patch)
>hg qnew -m “Change few more entities” -u “user” patch1b.patch
This will start a new patch patch1b.patch. Previous patch is now in the stack.

6. Check the patch stack
>hg qseries
patch1a.patch
patch1b.patch
>hg qpplied
patch1a.patch
>hg qunapplied
patch1b.patch
7. Go back to step #2


Patch Stack Management - Change Order MQ also has some simple stack manipulation commands that can be used to pop or push patches in and out of the stack.
1. Assume the following state in your patch queue
>hg qapplied
patch1a.patch
patch1b.patch
patch1c.patch

2. Now imagine that you realise there’s a bug in the core of your code base. Ideally, this should have been done before any of these patches. This is  how you would go about it;
>hg qpop -a
This pops all patches out of the quese. (They are still in-tact, don’t worry)

3. >hg qnew core.patch (Other command params are omitted for brevity)

4. Now do the code changes you want to make in core code.

5. >hg qrefresh
Updates the patch

6. >hg qpush core.patch

7. >hg qpush -a
Push all the rest of the patches on top of core.patch

8. >hg qapplied
core.patch
patch1a.patch
patch1b.patch
patch1c.patch


Patch Stack Management - Combine Patches
Assume that you have spent time on the spike up to a level that you are quite confident that the existing patches will end up in a feature branch. May be some patched on its own don’t deserve to be an hg changeset.


1. >hg qapplied
core.patch
patch1a.patch
patch1b.patch
patch1c.patch
2. >hg qpop -a
3. >hg qpush
This pushe core.patch back in to the queue
4. >hg qpplied
core.patch
5. Now we need to combine patch1a, patch1b and patch1c in to a single patch - patch1
6. >hg qnew patch1.patch
7. >hg qapplied
core.patch
patch1.patch
8. >hg qfold patch1a.patch patch1b.patch patch1c.patch
9. >hg qapplied
core.patch
patch1.patch


Converting Patches to Changesets
Once you are comfortable with your changes - i.e patch stack, you can easily convert the patches to changesets to be pushed to other team members or to a separate branch.


1. >hg qfinish
Moves all the applied patches in to the repository history by converting them to changesets. This will release the patches from mq history
2.>hg qseries
Shows nothing
3. Alternatve to applying all of your patches, you can apply just the patch(es) you want (starting from the bottom of the stack up to the provided revision number)
>hg qfinish revision_number


I find Mercurial Queues to be extremely useful in the context of ongoing substantial changes to an existing code base. It provides me all the goodness of a version control system without having to sacrifice the consistency of my repository with unnecessary finger prints.

Monday, May 26, 2014

Who said history is boring?

Wars in real world is not 'funny' business whether you are in the middle of one or watching from the sidelines. However in computer science, we tend to make the most of the wars and flames and turn them in to these extremely humorous and self-deprecating affairs. We've seen many including 'Browser Wars', 'Mobile Platform Wars', 'Operating System Wars', but none can topple the ongoing war between programming languages

I came across a gem of a blog recently on exactly the same and thought I'd shamelessly copy/paste it to easily share the pieces that made me cry with laughter the most. Soon I realized that the pieces that made me laugh most are the ones about languages that I have dealt with in the past in one way or the other. So technically this piece in it's whole should be able to make any programmer laugh. A must read indeed. 

The original link named 'A Brief, Incomplete, and Mostly Wrong History of Programming Languages' is here

Here are the ones that was more personal for me in chronological order. I'd love to hear from my friends on which ones made them rofl ;)

1972 - Dennis Ritchie invents a powerful gun that shoots both forward and backward simultaneously. Not satisfied with the number of deaths and permanent maimings from that invention he invents C and Unix.

1983 - Bjarne Stroustrup bolts everything he's ever heard of onto C to create C++. The resulting language is so complex that programs must be sent to the future to be compiled by the Skynet artificial intelligence. Build times suffer. Skynet's motives for performing the service remain unclear but spokespeople from the future say "there is nothing to be concerned about, baby," in an Austrian accented monotones. There is some speculation that Skynet is nothing more than a pretentious buffer overrun.

1987 - Larry Wall falls asleep and hits Larry Wall's forehead on the keyboard. Upon waking Larry Wall decides that the string of characters on Larry Wall's monitor isn't random but an example program in a programming language that God wants His prophet, Larry Wall, to design. Perl is born.

1995 - At a neighborhood Italian restaurant Rasmus Lerdorf realizes that his plate of spaghetti is an excellent model for understanding the World Wide Web and that web applications should mimic their medium. On the back of his napkin he designs Programmable Hyperlinked Pasta (PHP). PHP documentation remains on that napkin to this day.

1995 - Brendan Eich reads up on every mistake ever made in designing a programming language, invents a few more, and creates LiveScript. Later, in an effort to cash in on the popularity of Java the language is renamed JavaScript. Later still, in an effort to cash in on the popularity of skin diseases the language is renamed ECMAScript.

1996 - James Gosling invents Java. Java is a relatively verbose, garbage collected, class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Sun loudly heralds Java's novelty.

2001 - Anders Hejlsberg invents C#. C# is a relatively verbose, garbage collected, class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Microsoft loudly heralds C#'s novelty.

2003 - A drunken Martin Odersky sees a Reese's Peanut Butter Cup ad featuring somebody's peanut butter getting on somebody else's chocolate and has an idea. He creates Scala, a language that unifies constructs from both object oriented and functional languages. This pisses off both groups and each promptly declares jihad.

Monday, May 19, 2014

ODP dot Net - Usage and Samples - Part II

In Part 1 of this blog post series we discussed how we to handle Arrays of simple types with ODP.Net. In this post we''ll discuss how to extend this to complex types by using Oracles' User Defined Types (UDT).

UDTs allows a PL/SQL developer to expose complex db types to outside world above and beyond the PL/SQL layer. The reason for this is because ODP.Net does not support PL/SQL tables as of yet. So to represent a list of a complex object UDTs has to be used. 

There are few additional requirements from the .Net developer to get this working;

  • Mapping classes have to be written for the UDT type and the Array of UDT type
  • Factory classes have to be written (for creation of) for both the above types.
  • The parameter needs to be created with additional property 'UdtTypeName' set. 


The mapping classes and the factory classes used 2 base classes called TypeTemplate, TypeFactoryTemplate and TableTemplate, TableFactoryTemplate classes respectively. These 2 classes abstract away the actual mapping business between the 2 paradigms from the rest of your data access/domain code base. (http://developergeeks.com/article/3/user-defined-type-support-in-oracle-odp-net-11g) 

In addition couple of custom attribute classes are also shown. These can be used to enforce/migrate basic rules like Nullabilitiy of a property to the .Net type. 

These set of classes can be put in a common class library like xxx.Common.Core to be used by multiple projects. 


In the part III of this blog post we will look at a cool new feature of Oracle 11g - Loosely typed cursors and how we can use it with Odp.Net. 

Thursday, May 08, 2014

Odp dot Net - Usage and Samples - Part I

I've been working in Oracle - .Net set up for a while and the development experience is totally different to a SQL Server - .Net integration. The learning curve is high, tool set is not good and there are lot of caveats. Hell, the installation of the driver it self can be a nightmare. We have started with Microsoft oracle driver that is obsolete now, then moved to Odp.net (Odp.Net is an implementation of ADO.Net data provider for the Oracle Database by Oracle) which is the focus of this blog post. In our application landscape, Oracle constructs are primarily PL/SQL apis (stored procs).

One thing that has improved over time is the driver, i.e Odp.net. For one, there's some documentation. But looking back one thing that was missing was a set of code samples backed up with explanations to get you started. The official documentation is typical Oracle documentation (i.e lots of content but either they are outdated or hard to understand for a novice, mostly because they use quite out of date .Net idioms), however there's a sample project that's not too bad.

Hopefully this blog will cover some of the most common usage patterns of Odp.Net and get some one rolling faster. (I'm not covering the absolute start of working with scalar/primitive types as it can be found quite easily via Google/SO)

Part 1 - Working with lists (Arrays & User Defined Objects)
Part 2 - Cursors (Both strongly typed & loosely typed)
Part 3 - Xml 

Each data type will be discussed in the form of an Oracle Parameter. This is because, essentially how you create and pass the parameter to Oracle dictates whether you get the integration working or not. It's a lean way to explain the crux of the problem. In general whether the parameter is input or output doesn't matter unless it's explicitly mentioned or is plain obvious (As in the case of setting the value only in input parameters but not in output)

  • Scalar Arrays (Numbers and Varchars) 


At the Oracle End the parameter has to be declared as an Associative Array of Number and Varchar respectively as follows;

TYPE t_id_tab IS TABLE OF NUMBER(20) INDEX BY PLS_INTEGER;
 t_id_tab should be the type of the parameter

TYPE t_string_tab IS TABLE OF VARCHAR2(32767) INDEX BY PLS_INTEGER;
 t_string_tab should be the type of the parameter


  • Array of complex type / Composite Arrays

This is not possible to do using above techniques as of now. 
For an example, a list of PL/SQL type is not consumable by a .Net application using ODP.Net.  The solution is to use User Defined Objects or UDTs.  UDTs are basically objects created with a global scope in Oracle instance. The PL/SQLs developers were not particularly fans of this approach. I will discuss this approach in detail in the next blog.

Thursday, April 03, 2014

Two tier cache strategy


Application level caching is still used when you want to achieve high availability and response time. There are various techniques of achieving this and below is the technique used by an application that I work with currently.
(Click to view large image)
It employs a 2 level cache where the first level (Persistent Cache - Not to be confused with cache which resides in a Database, both of these caches are in memory) is a long term cache (with 7 day expiry) serving end user requests while the second level (Transient Cache) is a short term cache (with 30 seconds expiry) which feeds the long term cache asynchronously.

Any object list/db result which needs to be cached will be cached in both the persistent and transient caches. The fact that the end user is fully isolated not only from the DB query, but also from the cache refresh mechanisms works beautifully to make the application very responsive irrespective of the inherit inefficiencies of an integrated Enterprise Environment.

The probability of stale data is quite low due to the high refresh rate of the transient cache as well as the nature of the application usage, which mostly comprises of reading vast lists of data (cached) and then operating on one of them (not cached)

The choice of cache manager is Microsoft Application Block which may be questioned. However given the legacy nature of the application and the performance of the cache over the past few years, it's an interesting question whether it should be changed. In any case I'm very interested to figure out alternatives given our application stack being ASP.Net MVC / Oracle.

Tuesday, December 31, 2013

New Year Resolutions

Another year has completed and it’s time to look back and appreciate what you have done. Chances are, our attention will be mostly spent on things that we manage to slip up. Especially the dreaded new year resolutions that you made at the start of 2013 (i.e. If you still remember those of course)

New year resolution can be defined as an agreement that you make with your current self to enforce a positive effect on your future self. In essence it's a contract between current you and future you. We all have done this in the past and will continue to do so - lose weight, quit smoking, give up pornography, start exercising, learn swimming, work towards a promotion and the list goes on. But time and time again we fail horribly. Can we rethink our strategy behind new year resolutions in order to improve the success rate? Or are new year resolutions just a big fat farce?

In order to analyse this problem better, let’s put our attention to the world of Economics where a similar idea can be found. It’s known as a ‘Commitment Device’. In the context of the new year resolutions, let’s shrink the scope of the commitment device to a personal level. In that sense a commitment device is an effective way to force yourself into doing something that you want to do but aren't able to get yourself to do it or in most cases not persistent enough at it. As with everything to do with Economics the critical factor here is the notion of ‘incentive’ or in this context ‘penalty’. A commitment device should always be associated with a penalty.

There are several ways to establish a commitment device.
  • Establishing a (financial) penalty upon (future) you in case you falter. The penalty should be sizeable enough that it motivates you to stay on course
  • Make the commitment public thus putting your reputation on the line. With the advent of social media this is quite easy to implement now.
  • Create a large obstacle to temptations to increase the cost of succumbing to one. So the penalty here could be time, effort or money etc...

Either one or a collection of them could work in a given situation. (There are also online solutions helping you to do this - http://www.stickk.com/)

However there are several factors which influence which method has a higher probability of success.

One such factor is the nature of the action itself - i.e. whether it’s a preventing from an act or persisting with an act. Quitting or ramping down on something - be it smoking, drinking or excessive junk food is a preventive act. The obstacle oriented method or penalty driven method could work well here. If the commitment is about starting a new action - exercising - putting your reputation on the line could be more effective.

In addition, the personality and the social set up of a particular person could also make an impact. For an example a person who sits slightly high in the social pecking order (be it family or work) should opt for the reputation method where the cost of failure could be high.

So can we expect the success of the new year resolutions to improve by making sure that they comply with this framework? May be...or maybe not. My personal opinion is that it will only improve the results of a certain class of personalities albeit a low percentage.

So the more interesting question is why doesn't it work? The analysis seems logical and rational enough. Although most classical schools of economic thinking assumes humans to behave rationally, we are awfully bad at it.

The trouble is that by nature people deep down really don’t want to commit. No matter how clever the current self ties its future self to a commitment device, the future self will find a loophole (excuse) to breach it. Sometimes the problem with the commitment device is the non-linear pressure associated with it. As an example the benefit of losing a few kilos (and possibly gaining it back) might not outweigh the stress someone goes through during the committed period. This is especially true in the reputation method. At the same time we very well know that nothing will get done without a commitment.

“So what’s the sweet spot”? That’s up to each individual to find out. At least by giving it a shot you could find out what doesn't work. So that when 2015 arrives, you hopefully get better at getting something done from your future self.

Happy New Year Everyone!

The post was inspired by this freaknomics pod cast.

Friday, November 15, 2013

Technical Disobedience

If you Google for “Technical Disobedience” you’ll come across content featuring Ernesto Oraza, a Cuban industrial designer.  For years, Ernesto studied and collected various improvised machines - mostly simple day to day gadgets that we generally take for granted - made by Cubans out of sheer necessity. The list spread from TV antennas to workplace machines and included activities like repairing a machine to keep using it well past its intended lifetime or using left-over parts of other ‘dead’ machines to build a new machine out of scratch.


After the fall of Soviet Union Cuba went in to a state of complete closed economy. USA had already left Cuba along with it most of its investment, material resource sources and engineering intellect. Without Soviet Union to help them out, the Cuban government was not able to provide for even basic needs.


Things that we take for granted, like a Motorcycle, a TV antenna or an electric fan were not to be found. Once the existing lot of items ran out of their life time, Cubans had to invent and invent fast to make use of the remainings. Quite correctly Ernesto sees the situation not as a form of imprisonment or constraint rather as a form of liberation - freedom from the technical boundaries imposed by the objects. As an example a casing for an electric fan is a form of boundary imposed by the product designer on the consumer. A consumer is not meant to violate this restriction. However by opening up the casings of the fan and using its internal parts for originally un-intending purposes the Cubans achieved a sort of technological freedom that the present day consumer can not even fathom.


We are in times when products are intentionally designed to not last long and even before it’s intended short lifespan, they are made obsolete by either a competing product or most probably by the ‘next generation’ item of the same product line. (This philosophy actually extends to areas like food, where attributes like ‘Best Before Date’ pushes consumers to throw food away even when they are fit for human purpose) So when a group of people - in this case a country goes ahead and reuses the product even after it had its natural death is quite extraordinary. It must be noted that this observation is not based on several one off incidents - The Cuban government themselves formally identified the phenomena and fuelled the process by providing handbooks or guidelines for people encouraging them to invent more. After all “Worker, build your machine!” - is claim to be said by non other than Ernesto “Che” Guevara’s

How does this realisation relate to our profession of building Software? Are we trapped by the technical boundaries imposed by the language, tool, framework or the formal education we have received?


I’d like to think of Software engineering as a way of making a living out of repeatedly breaking boundaries, be it process, technological or even people behaviour. However with increasing trend of productization of our field, we tend to be losing that incentive to go beyond the boundaries or building stuff from scratch. Lack of popularity and understanding of what constitutes software engineering in broader society may well be another factor. After all most involved in software development is helping some big (or small) corporation grow their wealth as opposed to trying to solve a burning problem of the society that they live in.


On the other hand for most users software is seen as a black box, doing what its supposed to do and heavily blocked off. Although its true that more and more people are using software in their day to day life, not many of them have a proper understanding of how they should be used, let alone taking them beyond their original purpose.


One challenge the Cubans have faced during their product augmentation endeavour must be the scalability - scalability of ideas. Suppose couple of guys have come up with an ingenious way to resurrect a machine, how can this be scaled across the country. With the information technology infrastructure expanding every day, scalability should be the least of the problems if you want to break free. Sadly, this is not the case in most instances. Inherent issues with the way software is designed (client - server or binary delivery) and artificial constraints imposed by delivery mechanisms by vendors (Apple AppStore and its devices) make it quite hard to share an ‘improved’ software product even if you are somehow able to do it.


Fundamentally its even questionable whether software can be taken apart like the machines were in revolutionary Cuba. Cubans were able to pull it off even when the original product design has no intention of being improved by its users. Suppose one of our design requirements is to facilitate end user tinkering, can we even design it in such a way? How can this idea can coexist in a world where most innovations are based on more and more centralized software (SOA) and data (BigData).


I suppose this idea is only applicable to a certain class of Software. Example is when the intended use is purely personal (or among a group of people). However this class is ever increasing with the popularization of the mobile devices.


May be being ‘disobedient’ is not only cool, but can be useful or even revolutionary.
Inspiration :