Pawel Pabich's blog: 2010

Saturday, 11 December 2010

YOW 2010 - loose thoughts

It doesn’t happen often that nearly every single talk at a conference is great and on top of that half of them are actually funny. That’s YOW 2010 for you summarized in one sentence :).

Justin Sheehy explained how to quickly narrow down the choice of database technologies that might be useful in a particular case. His method is based on a simple matrix of operational requirements (local, single server, distributed, etc.) by data model (relational, column families, key/value, etc.). Once this is done and only a few solutions are left on the table, more thorough and time-consuming research can be conducted to choose the right one. Every single NoSQL solution is different and a generic SQL/NoSQL split doesn’t really make sense. It’s all about trade-offs. It’s amazing how often people need to be reminded of this simple fact.

Eric Evans’s talk focused on the idea of bounded contexts. In other words, a single enterprise model is an anti-pattern and one of the fallacies of software engineering. Eric also mentioned a few disadvantages of doing big design up front (AKA let’s build a great framework that less skilled devs can use) and of postponing the initial release for a long time. Nothing really new but it was well delivered.

Gregor Hohpe talked about the trade-off decisions that Google had to make to be able to reach its current scale. He covered the whole spectrum of optimizations, from organizing data access at the disk level to minimize heat generation to skipping parts of MapReduce executions that run longer than expected so that results are delivered in a timely manner. When I asked Gregor if Google uses regular Pub/Sub or transactions, he said that if there is a technology out there then Google has built something on top of it :). Just use the right tool for the job.

The second day started with Erik Meijer explaining coSQL (AKA NoSQL). It was a funny presentation about what NoSQL really is and how it relates to SQL. They complement each other, even in a mathematical sense, hence the “co” part of coSQL. Additionally, “co” is more positive than “no” and this makes Erik happy :).

Jim Webber talked passionately about how much he ~~hates~~ dislikes ESBs and how rarely ESB is the right tool for the job. His presentation was extremely funny but still full of useful information. The main point was that a custom built system can be cheaper (but not cheap) and less risky to deploy than an out of the box ESB which often requires a substantial up-front cost.

Dave Farley took us to the world of <1ms latency and speed of 100k per second. According to Dave this is achievable on commodity servers. The main enablers seem to be a lack of synchronization, keeping as few threads per core as possible, keeping all the data in memory and keeping methods very short. One CPU can execute 1 billion instructions a second. That’s a lot, and as long as we don’t waste it, today’s hardware should be more than enough for the needs of most consumers. The main message was that we underestimate what we can get from today’s hardware. I suppose this is only partially true because nowadays we rarely deploy apps on real hardware. In most cases all we see is a VM that shares the host with a gazillion other VMs. This might be the main reason why the perception of current hardware capabilities is skewed.

After the conference there were 2 days of workshops. I spent the first day with Ian Robinson and Jim Webber learning about REST. What I believed constituted a full-blown RESTful service was actually a very basic RESTful service that scores only 1 out of 3 points in the Richardson Maturity Model. Each of the levels has its place but obviously the higher you get the more you take advantage of the Web, and that’s the whole purpose of using REST. REST is CRUDish as it mostly relies on GET, POST, PUT and DELETE. My initial thought was that this is very limiting but then it turned out that it doesn’t have to be. The same applies to the lack of transactions. This can be worked around with a proper structure of resources, meaningful response codes and proper use of HTTP idioms. Another important thing to keep in mind is that the domain model shouldn’t be exposed directly. What you want to expose instead are resources that represent client-server interactions (use cases). In most cases O(resources) > O(domain classes), in Jim Webber’s notation :). The Web is inherently based on polling (request/response), thus REST is not suitable for apps which require low latency. In this case you might want to use Pub/Sub.

The next day I attended a workshop with Corey Haines. This was a true hands-on workshop. I spent at least half a day doing code retreat exercises, code katas and coding dojos. Going back to the very basics was surprisingly refreshing. I spent two 45-minute sessions constantly refactoring maybe 15 lines of code until most of the if statements were gone and the code read properly. You wouldn’t do this at work but the whole point of the exercise was to actually go over the line and try to come up with the best possible code without feeling the time pressure.

Last but not least, the attendees were fantastic and every coffee/lunch break was full of valuable conversations.

I had an amazing time and YOW 2010 is the best conference I’ve ever been to.

Tags: YOW, REST, NoSQL, Loose Thoughts, Performance

Monday, 15 November 2010

Microsoft Azure on-premises in 2011

It looks like Microsoft is filling an obvious gap and its customers will be able to deploy Azure on their own machines in 2011. This should significantly speed up the adoption of Microsoft’s Cloud offering as it introduces an additional checkpoint halfway through the migration and lowers the risk of the whole process. If on top of that other Cloud providers deploy Azure to their own data centers then the risk will be even smaller because “vendor lock-in” stops being such a big problem. These are all good changes and I’m really looking forward to seeing how they affect the global Cloud Computing market.

BTW I recently watched an interesting presentation by Chris Read from ThoughtWorks where he focuses on the Cloud from the Operations perspective. One of the takeaways that very often is not obvious to people is that there is no need to fire the infrastructure guys. You simply give them different, more creative tasks :)

Monday, 6 September 2010

Google CDN is not immune to being down

Just a reminder to myself that it’s good to have a fallback procedure when Google is down…not that it happens often :)
This happened to my blog a few days ago:

Thursday, 26 August 2010

RubyMine is a real gem

Today I had to fix a piece of custom code inside of Redmine. I have very little experience with Ruby on Rails but I was able to get the app up and running with a debugger attached within 15 minutes.

I downloaded the latest version of RubyMine 2.5 EAP, installed it, pointed it to the folder with the app, selected production configuration and hit Debug. RubyMine analysed my Ruby setup and popped up a window with a notification that I’m missing some gems and the IDE can download and install them for me. I hit Ok and 5 minutes later I was debugging the app. Ruby on Rails experience on Windows is far from being perfect but RubyMine is simply awesome.

Thursday, 22 July 2010

JavaScript unit testing with YUI Test and Jack mocking framework

I strongly believe in unit testing and recently I spent a bit of time trying to apply this technique to JavaScript code.
The first problem that I had to solve was which framework to use. From what I’ve read it looks like JSSpec, QUnit and YUI Test get most of the attention nowadays. YUI Test is the most mature of them and offers by far the most functionality out of the box. On the other hand it is the most complex one to set up, but still the whole process takes only a few copy/paste clicks. At the end of the day I decided to go with YUI Test because I wanted to check if I really need its rich capabilities.
In the C# world we use mocking frameworks to make unit testing easy. In the JavaScript world mocking frameworks are not strictly needed because JavaScript is a dynamic language and every method/object can be overwritten at any time at runtime. Still, mocking might take a bit of effort because you have to keep the original method somewhere around to put it back where it was at the end of a test. Otherwise you end up with state that is shared between tests, which is a bad thing. Jack is a mocking framework that helps solve this problem. It’s not perfect but it is good enough for what I wanted to do.
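To make that bookkeeping concrete, here is a minimal hand-rolled sketch of the save-and-restore dance without any framework. The `withStub` helper and the `clock` object are names made up for this illustration; they are not part of Jack or YUI Test, and Jack's real API looks different.

```javascript
// Hypothetical helper (not Jack's API): temporarily replaces obj[name]
// with a stub, runs the test body and always restores the original.
function withStub(obj, name, stub, testBody) {
    var original = obj[name];      // keep the original method around
    obj[name] = stub;              // overwrite it at runtime
    try {
        return testBody();
    } finally {
        obj[name] = original;      // put it back even if the test throws
    }
}

// Hypothetical object under test.
var clock = {
    now: function () { return new Date().getFullYear(); },
    isFuture: function (year) { return year > this.now(); }
};

var originalNow = clock.now;
var resultWithStub = withStub(clock, "now", function () { return 2010; }, function () {
    return clock.isFuture(2011);   // evaluated against the stubbed now()
});
var restored = clock.now === originalNow;   // the stub did not leak out of the test
```

The `finally` block is the important part: without it a failing test would leave the stub in place and quietly change the behaviour of every test that runs afterwards.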
Enough introduction, let’s start with the story that I’ve implemented. The link to the complete source code is located at the bottom of this post.
There is a simple form and we have to write client side validation logic for it. The rules are as follows:

The user can either select one or more predefined reasons or provide a custom reason. The user cannot do both.
If the form validation succeeds then the user gets a popup with a “Correct” message.
If the form validation fails then the user gets a popup with a “Wrong” message.

This is what the form looks like:

and this is the HTML behind it:
To be able to run the unit tests we have to have an HTML page that simply loads all the required JavaScript code and executes it.

As you can see the HTML page is very simple. It loads a few files that belong to the YUI framework, then it loads the code under test from Form.js and the actual unit tests from UnitTests.js.
Below is the content of the UnitTests.js file.
and the end result in a web browser:

The only thing that requires explanation here is the difference between Validation and Submission tests. The validate method is a standalone method that does not have any dependencies hence its unit testing is very simple and boils down to passing different sets of input parameters and asserting the correct results.
The unit testing of the submitForm method, on the other hand, is not that simple because the method relies on the getPredefinedReasons and getCustomReason methods that grab data from the DOM and on the validate method that ensures that the user-provided data is valid. We are not interested in the way those methods work while unit testing the submitForm method. They actually get in the way. What we need to do is to mock them and focus on making sure the submitForm method shows the correct messages to the user.
The mocking framework takes care of that. All we have to do is create an anonymous method that encapsulates all our mocking logic. The mocking framework will make sure that once the test is done the global state gets rolled back to where it was before the test was executed. The way Jack is designed reminds me of the using statement and IDisposable in C#.
As you can see the jQuery-based code is encapsulated in getXXX methods, which makes them easy to mock. Some people don’t mock jQuery and instead try to recreate enough DOM elements on the test page to satisfy the tests. I don’t like this approach because changes to either the HTML or the jQuery code might force us to change the unit tests, which makes them brittle. It is like using LINQ to SQL (L2S) in a unit test. It’s not a unit test, it is an integration test. Another approach I’ve seen is to mock jQuery methods one by one. This is a slightly better approach but changes to jQuery queries can still break the tests. It is like trying to mock a sequence of LINQ extension methods. It’s way easier to mock the whole method that simply encapsulates the query.
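The structure described above can be sketched roughly as follows. This is not the code from the download, only a hypothetical illustration. The method names `submitForm`, `validate`, `getPredefinedReasons` and `getCustomReason` come from the text; the `form` object, the `showMessage` wrapper around the popup, the stubbed return values and the decision to treat an empty form as invalid are assumptions made for this sketch. The stubs are swapped in by hand rather than through Jack.

```javascript
// Hypothetical sketch, not the code from JSUnitTesting.zip.
var form = {
    // In a real page these two would wrap the jQuery queries against the DOM.
    getPredefinedReasons: function () { throw new Error("needs a DOM"); },
    getCustomReason: function () { throw new Error("needs a DOM"); },

    // Valid when predefined reasons or a custom reason are given, but not both.
    // Assumption: providing neither is treated as invalid.
    validate: function (predefinedReasons, customReason) {
        var hasPredefined = predefinedReasons.length > 0;
        var hasCustom = customReason !== "";
        return hasPredefined !== hasCustom;
    },

    // Assumed wrapper around the popup shown to the user.
    showMessage: function (text) { throw new Error("needs a browser"); },

    submitForm: function () {
        var valid = this.validate(this.getPredefinedReasons(), this.getCustomReason());
        this.showMessage(valid ? "Correct" : "Wrong");
    }
};

// Test helper for submitForm: every dependency, including validate, is
// replaced so that only the choice of message is exercised.
function runSubmitFormTest(validationResult) {
    var originals = {
        getPredefinedReasons: form.getPredefinedReasons,
        getCustomReason: form.getCustomReason,
        validate: form.validate,
        showMessage: form.showMessage
    };
    var shown = [];
    form.getPredefinedReasons = function () { return ["any"]; };
    form.getCustomReason = function () { return ""; };
    form.validate = function () { return validationResult; };
    form.showMessage = function (text) { shown.push(text); };
    try {
        form.submitForm();
        return shown;
    } finally {
        // Restore the originals so no state leaks into the next test.
        for (var name in originals) { form[name] = originals[name]; }
    }
}
```

Because the DOM queries live behind two methods, the test never touches HTML or jQuery, so changing a selector cannot break it.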
If this were C# code then the getXXX methods would be defined on some kind of repository and the validate method would belong to a validation component. Both of them would be injected into the main logic that handles the form submission. If this were an ASP.NET MVC app, that would be a controller. It was not my intention to structure the code in this way but that’s what I ended up with by writing the tests first.
You might wonder why I haven’t shown the actual code yet. Well, I did it on purpose. The unit tests should be enough to understand what the client side code does and what its desired behaviour is. If this is still not clear then it means that either the code is not structured properly or that the names are not descriptive enough.
And that would be it. The last thing to do is to show the actual code:

JSUnitTesting.zip

Monday, 28 June 2010

Blog upgraded from Subtext 2.1 to Subtext 2.5

No problems so far and I really like the new dashboard. Good job guys!

Sunday, 20 June 2010

Generic retry logic in PowerShell

I recently spent a bit of time writing PowerShell scripts that deploy a system that I’m working on both locally and to QA. Basically, you get the latest code from TFS, run a single PowerShell script and press F5 :). Why? Have a look at The Joel Test.
Anyway, while writing those scripts I needed to implement basic retry logic in multiple places. It turned out that PowerShell supports closures and that you can pass any part of the script to a function as an argument. Having all of that at my disposal made my task very easy:

function Execute-Command($Command, $CommandName) {
    # Runs the $Command script block and retries it when it throws.
    # Log-Debug and Log-Error are assumed to be defined elsewhere in the scripts.
    # Only terminating errors reach the catch block.
    $currentRetry = 0;
    $success = $false;
    do {
        try 
        { 
            & $Command;
            $success = $true;
            Log-Debug "Successfully executed [$CommandName] command. Number of retries: $currentRetry";
        } 
        catch [System.Exception] 
        {
            # Double quotes so that $CommandName is expanded in the message.
            $message = "Exception occurred while trying to execute [$CommandName] command: " + $_.Exception.ToString();
            Log-Error $message;
            if ($currentRetry -gt 5) {
                $message = "Can not execute [$CommandName] command. The error: " + $_.Exception.ToString();
                throw $message;
            } else {
                Log-Debug "Sleeping before retry number $($currentRetry + 1) of [$CommandName] command";
                Start-Sleep -s 1;
            }
            $currentRetry = $currentRetry + 1;
        }
    } while (!$success);
}

And this is how you can use it:

$command = { Get-ChildItem $Folder -Recurse | Remove-Item -Recurse -Force};  
$commandName = "Delete content of [$Folder]";  
Execute-Command -Command $command -CommandName $commandName;

CloudCamp is coming again to Sydney!

This time you need to take half a day off to attend it but I believe it’s well worth your time and I really enjoyed the previous camp.

Wednesday, 19 May 2010

Possible NoSQL(MongoDB) training in Australia

Over to Simon for more details. It might be interesting if 10gen decides to dive deep into details.

Friday, 7 May 2010

Unit testing - it's about the feedback cycle

When I start introducing unit testing to someone who is not familiar with it, one of the first complaints I hear is that it takes more time to develop code and unit tests as opposed to just code. This is not true; in most cases it actually takes less time, and as a side effect you end up with a set of unit tests that will make your life easier in the future.
Let’s assume that every piece of code gets tested by its creator before it gets handed over to the QA guys. Pretty reasonable assumption, isn't it? If there are no unit tests, the only way to test the functionality is to run the whole application or some kind of integration tests. This takes seconds if not minutes for each test case. You can run a well-written unit test within milliseconds. You can run 100 unit tests within a couple of seconds (even with MSTest, as long as you use VS2010 RTM).
If a test takes minutes to execute it’s easy to lose focus and switch to something else for a while. We all know how expensive context switching is. Unit tests give you instant feedback, which helps you stay focused and more productive. Less time for a single test case means that you can test more cases, which in turn leads to fewer bugs. Sure, once the code is unit tested you need to actually run it from within the application, but this is more to make sure that all the bits and pieces are correctly configured than to do extensive testing.
Now calculate the cost, in terms of time, of a bug found by your QA team. In such a case the following needs to happen:

a tester has to create a bug report
someone has to triage it
a developer needs to:
- get familiar with the problem
- recreate the problem
- fix it
- test it (without unit tests)
- promote to the source control system
- make sure the CI build is green
a tester needs to test the fix

In the best organized company I’ve seen, the total time of such an exercise would be around 2h assuming that the whole process finishes within a day or two since the initial checkin.
It’s all about the length of the feedback cycle. The shorter it is the better. If you find a bug with a unit test you lose a couple of minutes; if you let it through to the QA environment or, even worse, to Production you lose hours or days (think about all the hours you spent with windbg :)).
If unit tests help you lower the number of bugs by 1 for a given feature then you end up with enough time to cover the task of writing them. Interesting, isn’t it?
P.S.
There are many more advantages of unit testing that can even further reduce the time needed to write a piece of code. One of the most efficient ones is TDD which lets you drive the design of the code with unit tests. Crazy? No, it actually works very well. You can read more about it here.

Saturday, 13 March 2010

Cloud computing enables and forces us to do proper engineering

Very often testing is a second class citizen in the IT world. It’s not like software is not tested at all but it’s far from being perfect:

the testing environment is nothing like production environment
performance testing is nonexistent

This leads to problems that you can observe only in production. The process of fixing those kinds of issues tends to take a lot of time because most often developers have very limited access to production and the set of tools they can use for debugging is limited. A permanent testing environment that matches production is very expensive and that’s why businesses take the risk and deploy applications to production without proper testing. Very often they are unlucky and the price they pay is much higher than the price of proper testing. It’s like a mortgage: you get a lot of money quickly but later on you have to pay the interest. Nothing is for free.

With Cloud computing this is no longer such a big problem. If your production deployment requires 50 servers then you can provision a testing environment which looks exactly like production within minutes. What is more, once you are done with testing you can simply get rid of the whole environment. But this sounds like a lot of effort, doesn’t it? Well, that’s true only if the whole process is manual. If it’s automated it’s not a problem at all. You can write your own scripts or use services like RightScale that will help you with this. The point is that the use of Cloud computing forces you to automate your software development processes, which is good. The same applies to performance testing. You can set up a testing lab only for the duration of a test. You can read here how MySpace leveraged Cloud computing to make sure it can handle 1 million concurrent users.

I’m sure everybody has heard at least once that scaling applications in the Cloud is easy. As you can expect this is not entirely true. It might be true in the marketing world though :). If you simply move your application from your own data centre to a Cloud there is a good chance that it will be much slower and less reliable. Why? Most Cloud providers offer you a few predefined server configurations that you can choose from. What is more, most of them are virtual servers. This means that you don’t have any control over the hardware the application will run on. If the Cloud provider can’t match your existing setup then there is a good chance the application will be slower. Even if you manage to get enough CPUs and RAM you might still suffer from slow disk IO and the fact that the machines are less reliable than you would expect. You can read more about that here. The bottom line is that you can’t expect the application to simply run unchanged in the Cloud. One of the ways of aligning the application with the Cloud is making sure that it can run on multiple servers at the same time. This basically prevents you from building monolithic systems.

Security is another topic that tends to get very little attention. The reason is that there is an implicit assumption that the application will always run locally, thus nobody from outside will have access to it. Obviously this is a fallacy and a huge security hole. Nobody can see it (or everybody can hide it) because it’s implicit. Without addressing this problem you can’t really move your application to the Cloud, which forces you to take care of it. Cloud computing makes a lot of things very explicit, which is a very good thing. There are way too many secret handshakes and implicit assumptions that we take advantage of to build applications nowadays. Cloud Application Architectures deals with security in the Cloud quite extensively. It’s a good book that is a bit outdated and a bit too focused on the Amazon Cloud but still worth reading.

Disaster recovery is very similar to load testing. Everybody knows that it’s needed and everybody has a plan how to do it but the plan never gets executed because it takes way too much time and resources. Again, Cloud computing makes it cheaper and easier to do. What is more you get more options. You can start with a plan that deals with failures of a single server and extend it, if it’s required, to procedures that can deal with data centres or even whole countries being offline.

As you can see you can gain a lot from Cloud computing but it doesn’t come for free and more than likely you will have to redesign your applications and rethink your processes to make sure you can take full advantage of what Cloud computing has to offer.

Wednesday, 10 February 2010

Training with Udi Dahan - it's all about business

I know it’s been a while since I wrote my last post but in my defence :) I went to SEA for over 5 weeks and on purpose disconnected from the whole IT-related online world. It was great :).
Anyway, back to the topic. A couple of weeks ago I went for a week-long training course (Advanced Distributed System Design with SOA & DDD) with Udi Dahan. It was, in a positive way, a mind-blowing exercise. I already knew about messaging and how it helps to fight different types of coupling, but only listening to Udi made me understand those concepts in depth.
From what I observed, the topic that caused the most confusion among the attendees was the Command-Query Responsibility Segregation pattern. I have to admit that I’m still wrestling a bit with this topic myself but I’ve learnt the hard way that using the same channel/model/approach for both queries and commands simply doesn’t scale and will bite you sooner or later.
But today I want to talk about something else that Udi mentioned a few times. Namely, one of the most common mistakes that IT people make is trying to solve business problems with technology. Yes, you read it correctly :). Let me give you an example that should explain what I mean. I suppose everybody has seen at least one implementation of a WCF smart proxy. One of the reasons people write it is to transparently handle exceptions and make sure that developers who write business logic don’t have to deal with them. One of the implementations I’ve seen catches a few WCF exceptions and then retries the failed call. The implementation assumes that the call didn’t succeed on the server side. Obviously that’s a wrong assumption and this code can cause a lot of damage. Would you send a customer 2 or more laptops (depending on the number of retries) when he/she ordered just one? In this case a developer tries very hard to solve a business problem (what happens if a call to a shipping service fails) with a technology (a WCF smart proxy). Maybe the code shouldn’t try the same shipping service again but move on and call a different one, or maybe it should notify the system administrator that there is a problem that needs to be handled manually. This question needs to be answered by the business and then, based on the business input, a technology-based (or not) solution can be implemented.
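The danger of a blind retry can be shown with a small JavaScript sketch. Everything in it is hypothetical: `shippingService`, `naiveRetry` and the simulated timeout merely stand in for a WCF smart proxy and a real shipping service. The service does ship the order, but the response gets lost on the way back, so the proxy retries a call that had already succeeded.

```javascript
// Hypothetical shipping service: the order is processed on the server,
// but the first two responses are lost on the way back to the caller.
var shippingService = {
    shipped: [],
    lostResponses: 2,
    ship: function (item) {
        this.shipped.push(item);              // server-side work succeeds
        if (this.lostResponses > 0) {
            this.lostResponses -= 1;
            throw new Error("Timeout waiting for response");
        }
        return "OK";
    }
};

// Hypothetical "smart proxy": retries on any error, assuming that
// an exception means nothing happened on the server side.
function naiveRetry(action, maxAttempts) {
    var lastError;
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return action();
        } catch (e) {
            lastError = e;                    // swallow the error and try again
        }
    }
    throw lastError;
}

var reply = naiveRetry(function () { return shippingService.ship("laptop"); }, 5);
// reply is "OK", yet shippingService.shipped now holds three laptops.
```

From the caller's point of view the call simply succeeded after a few hiccups. Nothing in the proxy can tell whether retrying, switching to another shipping service or escalating to a human is the right reaction; that is exactly the decision that belongs to the business.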
Someone could still argue that it’s a technology problem because the communication channel is not reliable enough. Well, the fact that a company uses web services (and not phone calls) to place shipment requests was a business decision. On one hand you can ship goods to the customers faster this way, but on the other hand you have to deal with new types of problems. Unfortunately, in real life nothing comes for free.
Again, what’s the role of IT? I believe our role is to implement solutions provided by the business. This doesn’t mean though that there is only a one-way relationship between business and IT and that IT always does what it’s told to do. As IT we can and should give feedback to the business whenever we see anything that might have an impact on it, but at the end of the day it’s up to the business to make the final call.