Over to Simon for more details. It would be interesting if 10Gen decided to go into more depth.
Wednesday, 19 May 2010
Friday, 7 May 2010
Unit testing - it's about the feedback cycle
Let’s assume that every piece of code gets tested by its creator before it gets handed over to the QA guys. Pretty reasonable assumption, isn't it? If there are no unit tests, the only way to test the functionality is to run the whole application or some kind of integration test. This takes seconds, if not minutes, for each test case. A well-written unit test runs in milliseconds. You can run 100 unit tests in a couple of seconds (even with mstest, as long as you use VS2010 RTM).
If a test takes minutes to execute, it’s easy to lose focus and switch to something else for a while. We all know how expensive context switching is. Unit tests give you instant feedback, which helps you stay focused and more productive. Less time per test case means that you can test more cases, which in turn leads to fewer bugs. Sure, once the code is unit tested you still need to run it from within the application, but this is more about making sure that all the bits and pieces are correctly configured than about extensive testing.
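To make the feedback loop concrete, here is a minimal sketch of a unit test. It uses Python's built-in `unittest` module purely for illustration (the post itself is about .NET and mstest), and `apply_discount` is a hypothetical piece of business logic. Because the test touches no database, network or UI, the whole suite runs in milliseconds.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical business rule: apply a percentage discount to a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTests(unittest.TestCase):
    # Each test checks one case in isolation: no database, network or UI involved.
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)
```

Saved as a module, it can be run with `python -m unittest <module_name>`.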
Now calculate the cost, in terms of time, of a bug found by your QA team. In that case the following needs to happen:
- a tester has to create a bug report
- someone has to triage it
- a developer needs to:
  - get familiar with the problem
  - recreate the problem
  - fix it
  - test it (without unit tests)
  - commit it to the source control system
  - make sure the CI build is green
- a tester needs to test the fix
It’s all about the length of the feedback cycle. The shorter it is, the better. If you find a bug with a unit test you lose a couple of minutes; if you let it through to the QA environment or, even worse, to production, you lose hours or days (think about all the hours you have spent with windbg :)).
If unit tests help you reduce the number of bugs in a given feature by just one, the time you save is enough to cover the time spent writing them. Interesting, isn’t it?
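The arithmetic behind that claim can be sketched in Python as below. Every duration here is a made-up placeholder, not a measurement; the point is the shape of the calculation, so plug in your own team's numbers. The step names mirror the list above.

```python
from __future__ import annotations

# All durations are hypothetical, in minutes, purely to illustrate the arithmetic.
QA_BUG_COST = {
    "tester writes bug report": 15,
    "triage": 10,
    "developer gets familiar with the problem": 20,
    "developer recreates the problem": 30,
    "developer fixes it": 30,
    "developer tests it (without unit tests)": 30,
    "developer commits it": 5,
    "developer checks the CI build": 15,
    "tester verifies the fix": 20,
}

UNIT_TEST_BUG_COST = 5  # hypothetical: a failing unit test caught on the developer's machine


def cost_of_qa_bug(steps: dict[str, int]) -> int:
    """Total time spent when a bug slips through to QA."""
    return sum(steps.values())


def bugs_to_break_even(time_to_write_tests: float, qa_cost: float, unit_cost: float) -> float:
    """How many bugs the tests must stop from reaching QA to pay for themselves."""
    saving_per_bug = qa_cost - unit_cost
    return time_to_write_tests / saving_per_bug


qa_cost = cost_of_qa_bug(QA_BUG_COST)  # 175 minutes with the placeholder figures
print(bugs_to_break_even(120, qa_cost, UNIT_TEST_BUG_COST))  # about 0.71 bugs
```

With these placeholder figures a bug that reaches QA costs 175 minutes versus 5 when a unit test catches it, so two hours spent writing tests pays for itself after preventing less than one such bug.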
P.S.
There are many more advantages of unit testing that can reduce the time needed to write a piece of code even further. One of the most effective is TDD, which lets you drive the design of the code with unit tests. Crazy? No, it actually works very well. You can read more about it here.
Saturday, 13 March 2010
Cloud computing enables and forces us to do proper engineering
Testing is very often a second-class citizen in the IT world. It’s not that software isn’t tested at all, but the testing is far from perfect:
- the testing environment is nothing like the production environment
- performance testing is nonexistent
This leads to problems that you can observe only in production. Fixing these kinds of issues tends to take a lot of time, because developers usually have very limited access to production and the set of tools they can use for debugging there is limited. A permanent testing environment that matches production is very expensive, and that’s why businesses take the risk and deploy applications to production without proper testing. Very often they are unlucky, and the price they pay is much higher than the price of proper testing would have been. It’s like a mortgage: you get a lot of money quickly, but later on you have to pay the interest. Nothing comes for free.
With Cloud computing this is no longer such a big problem. If your production deployment requires 50 servers, you can provision a testing environment that looks exactly like production within minutes. What is more, once you are done with testing you can simply get rid of the whole environment. But this sounds like a lot of effort, doesn’t it? Well, that’s true only if the whole process is manual. If it’s automated, it’s not a problem at all. You can write your own scripts or use services like RightScale that will help you with this. The point is that using Cloud computing forces you to automate your software development processes, which is a good thing.
The same applies to performance testing. You can set up a testing lab just for the duration of a test. You can read here how MySpace leveraged Cloud computing to make sure it could handle 1 million concurrent users.
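The provision, test and tear-down cycle described above can be sketched in Python as follows. `FakeCloudProvider` is a hypothetical in-memory stand-in for a real provider API or a service like RightScale (neither real API is shown here), and the 40/10 split of the 50 servers is an assumption. The key design choice is the `finally` block: the environment is destroyed even when the tests fail, so you only pay for it while it is in use.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FakeCloudProvider:
    """Hypothetical stand-in for a real provider API; it only tracks server ids."""
    running: set[str] = field(default_factory=set)
    _next_id: int = 0

    def launch(self, role: str) -> str:
        self._next_id += 1
        server_id = f"{role}-{self._next_id}"
        self.running.add(server_id)
        return server_id

    def terminate(self, server_id: str) -> None:
        self.running.discard(server_id)


@contextmanager
def ephemeral_environment(provider: FakeCloudProvider, layout: dict[str, int]) -> Iterator[list[str]]:
    """Provision a copy of the production layout, yield it, and always tear it down."""
    servers: list[str] = []
    try:
        for role, count in layout.items():
            for _ in range(count):
                servers.append(provider.launch(role))
        yield servers
    finally:
        # Runs even if provisioning or the tests fail, so nothing is left running.
        for server_id in servers:
            provider.terminate(server_id)


PRODUCTION_LAYOUT = {"web": 40, "db": 10}  # hypothetical 50-server production setup

provider = FakeCloudProvider()
with ephemeral_environment(provider, PRODUCTION_LAYOUT) as servers:
    print(len(servers))  # 50
    # run the integration or performance test suite against `servers` here
print(len(provider.running))  # 0
```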
I’m sure everybody has heard at least once that scaling applications in the Cloud is easy. As you might expect, this is not entirely true. It might be true in the marketing world though :). If you simply move your application from your own data centre to a Cloud, there is a good chance that it will be much slower and less reliable. Why? Most Cloud providers offer a few predefined server configurations to choose from. What is more, most of them are virtual servers. This means that you don’t have any control over the hardware the application will run on. If the Cloud provider can’t match your existing setup, there is a good chance the application will be slower. Even if you manage to get enough CPUs and RAM, you might still suffer from slow disk IO and from machines that are less reliable than you would expect. You can read more about that here. The bottom line is that you can’t expect the application to simply run unchanged in the Cloud. One way of aligning the application with the Cloud is making sure that it can run on multiple servers at the same time. This basically prevents you from building monolithic systems.
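One common way to let an application run on several servers at once is to keep the servers stateless and push state into a store they all share. Below is a minimal Python sketch under that assumption. `SharedStore` and `WebServer` are hypothetical names, the round-robin balancer is deliberately naive, and a real shared store would also have to deal with concurrent updates, which this sketch ignores.

```python
from __future__ import annotations

import itertools


class SharedStore:
    """Stand-in for an external store (database, distributed cache) shared by all servers."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def get(self, key: str) -> list[str]:
        return list(self._data.get(key, []))

    def put(self, key: str, value: list[str]) -> None:
        self._data[key] = list(value)


class WebServer:
    """Keeps no per-user state in memory, so any instance can serve any request."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, user: str, item: str) -> list[str]:
        cart = self.store.get(f"cart:{user}")
        cart.append(item)
        self.store.put(f"cart:{user}", cart)
        return cart


store = SharedStore()
servers = [WebServer(f"web-{i}", store) for i in range(3)]
balancer = itertools.cycle(servers)  # naive round-robin load balancer

for item in ["book", "laptop", "mouse"]:
    cart = next(balancer).add_to_cart("alice", item)

# Three different servers handled the requests, yet the cart is complete.
print(cart)  # ['book', 'laptop', 'mouse']
```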
Security is another topic that tends to get very little attention. The reason is the implicit assumption that the application will always run locally, so nobody from outside will have access to it. Obviously this is a fallacy and a huge security hole. Nobody can see it (or everybody can hide it) because it’s implicit. Without addressing this problem you can’t really move your application to the Cloud, which forces you to take care of it. Cloud computing makes a lot of things very explicit, which is a very good thing. There are way too many secret handshakes and implicit assumptions that we rely on to build applications nowadays. Cloud Application Architectures deals with security in the Cloud quite extensively. It’s a good book; it is a bit outdated and a bit too focused on the Amazon Cloud, but still worth reading.
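The post does not prescribe a mechanism, but as one illustration of making trust explicit, here is a Python sketch that authenticates every request with an HMAC signature instead of assuming callers are "local". It uses the standard `hmac` and `hashlib` modules; the shared key and the function names are hypothetical, and a real deployment would also need transport encryption and replay protection.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would come from configuration or a secrets store.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(body: bytes, key: bytes = SECRET_KEY) -> str:
    """Signature a legitimate client attaches to each request."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def handle_request(body: bytes, signature: str, key: bytes = SECRET_KEY) -> str:
    """Authenticate every request explicitly instead of trusting where it came from."""
    expected = sign(body, key)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("request signature is invalid")
    return "processed"
```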
Disaster recovery is very similar to load testing. Everybody knows that it’s needed and everybody has a plan for how to do it, but the plan never gets executed because it takes way too much time and too many resources. Again, Cloud computing makes it cheaper and easier to do. What is more, you get more options. You can start with a plan that deals with the failure of a single server and extend it, if required, to procedures that can deal with data centres or even whole countries being offline.
As you can see, you can gain a lot from Cloud computing, but it doesn’t come for free, and more than likely you will have to redesign your applications and rethink your processes to take full advantage of what Cloud computing has to offer.
Wednesday, 10 February 2010
Training with Udi Dahan - it's all about business
Anyway, back to the topic. A couple of weeks ago I went to a week-long training course (Advanced Distributed System Design with SOA & DDD) with Udi Dahan. It was, in a positive way, a mind-blowing exercise. I already knew about messaging and how it helps to fight different types of coupling, but it was only listening to Udi that made me understand those concepts in depth.
From what I observed, the topic that caused the most confusion among the attendees was the Command-Query Responsibility Segregation (CQRS) pattern. I have to admit that I’m still wrestling a bit with this topic myself, but I’ve learnt the hard way that using the same channel/model/approach for both queries and commands simply doesn’t scale and will bite you sooner or later.
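For readers wrestling with CQRS as well, here is a deliberately minimal, in-process Python sketch of the idea: commands go through a write side that enforces the rules, while queries hit a separate read model shaped for the questions the UI asks. All names are hypothetical, and the read model is updated synchronously for simplicity; in a message-based system it would typically be updated asynchronously and could live in a different store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Command: expresses an intent to change state; handling it returns nothing."""
    order_id: str
    customer: str
    amount: float


class CustomerTotalsView:
    """Read side: a denormalized model shaped for the queries the UI needs."""

    def __init__(self) -> None:
        self._totals: dict[str, float] = {}

    def on_order_placed(self, cmd: PlaceOrder) -> None:
        self._totals[cmd.customer] = self._totals.get(cmd.customer, 0.0) + cmd.amount

    def total_for(self, customer: str) -> float:
        return self._totals.get(customer, 0.0)


class OrderCommandHandler:
    """Write side: validates commands, updates the write model, then notifies the read side."""

    def __init__(self, read_model: CustomerTotalsView) -> None:
        self._orders: dict[str, PlaceOrder] = {}
        self._read_model = read_model

    def handle(self, cmd: PlaceOrder) -> None:
        if cmd.amount <= 0:
            raise ValueError("amount must be positive")
        if cmd.order_id in self._orders:
            raise ValueError("duplicate order")
        self._orders[cmd.order_id] = cmd
        self._read_model.on_order_placed(cmd)  # synchronous here only for simplicity


view = CustomerTotalsView()
commands = OrderCommandHandler(view)
commands.handle(PlaceOrder("o-1", "alice", 30.0))
commands.handle(PlaceOrder("o-2", "alice", 12.5))
print(view.total_for("alice"))  # 42.5
```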
But today I want to talk about something else that Udi mentioned a few times: one of the most common mistakes IT people make is trying to solve business problems with technology. Yes, you read that correctly :). Let me give you an example that should explain what I mean. I suppose everybody has seen at least one implementation of a WCF smart proxy. One of the reasons people write one is to handle exceptions transparently, so that developers who write business logic don’t have to deal with them. One of the implementations I’ve seen catches a few WCF exceptions and then retries the failed call. The implementation assumes that the call didn’t succeed on the server side. Obviously that assumption doesn’t always hold: the server may well have processed the request and only the response got lost on the way back, so this code can cause a lot of damage. Would you send a customer two or more laptops (depending on the number of retries) when he/she ordered just one? In this case a developer tries very hard to solve a business problem (what happens if a call to a shipping service fails) with technology (a WCF smart proxy). Maybe the code shouldn’t try the same shipping service again but move on and call a different one, or maybe it should notify the system administrator that there is a problem that needs to be handled manually. This question needs to be answered by the business, and then, based on the business input, a technology-based (or not) solution can be implemented.
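The laptop scenario can be simulated in a few lines. In this Python sketch (all names are hypothetical; it is not a WCF API) the shipping service processes every request but loses the first reply. The blind retry ships twice, whereas the second version hands the failure to a policy supplied by the business, such as calling a different carrier or alerting an administrator.

```python
from __future__ import annotations

from collections.abc import Callable


class ReplyLost(Exception):
    """The request reached the server, but the client never got the response (e.g. a timeout)."""


class FlakyShippingService:
    """Hypothetical service that processes every request but loses the first reply."""

    def __init__(self) -> None:
        self.shipments: list[str] = []
        self._calls = 0

    def ship(self, order_id: str) -> None:
        self._calls += 1
        self.shipments.append(order_id)  # the laptop has already left the warehouse...
        if self._calls == 1:
            raise ReplyLost()  # ...but the client sees a failure


def ship_with_blind_retry(service: FlakyShippingService, order_id: str, retries: int = 3) -> None:
    """What the smart proxy does: assume a failure means nothing happened, and try again."""
    for attempt in range(retries):
        try:
            service.ship(order_id)
            return
        except ReplyLost:
            if attempt == retries - 1:
                raise


def ship_with_business_policy(
    service: FlakyShippingService,
    order_id: str,
    on_failure: Callable[[str], None],
) -> None:
    """Hand the failure to a policy the business chose (fallback carrier, manual handling, ...)."""
    try:
        service.ship(order_id)
    except ReplyLost:
        on_failure(order_id)


naive = FlakyShippingService()
ship_with_blind_retry(naive, "order-1")
print(naive.shipments)  # ['order-1', 'order-1']: two laptops for one order

escalated = []
careful = FlakyShippingService()
ship_with_business_policy(careful, "order-2", on_failure=escalated.append)
print(careful.shipments, escalated)  # ['order-2'] ['order-2']: shipped once, flagged for a human
```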
Someone could still argue that it’s a technology problem because the communication channel is not reliable enough. Well, the fact that the company uses web services (and not phone calls) to place shipment requests was a business decision. On the one hand, this way you can ship goods to customers faster, but on the other hand you have to deal with a new type of problem. Unfortunately, in real life nothing comes for free.
So, again, what’s the role of IT? I believe our role is to implement solutions provided by the business. This doesn’t mean, though, that the relationship between business and IT is one-way and that IT always does what it’s told. As IT we can and should give feedback to the business whenever we see anything that might have an impact on it, but at the end of the day it’s up to the business to make the final call.
Sunday, 25 October 2009
Release It! - some loose thoughts
There is one thing in the book that I disagree with, though. On page 199, Michael recommends using SoftReference when implementing a cache in Java. The closest counterpart of SoftReference in the .NET world is WeakReference. I think that is a very bad idea. The most important part of every caching solution is its expiration policy, which translates to a simple question: when does the data need to be refreshed? The GC operates at a very low level and doesn’t have enough information to make an informed decision. Let me give you an example. Let’s say we have two arrays of integers (System.Int32), both 1000 elements long. It takes 10 ms to fill the first one and 100 s to fill the second one, and they both need to be refreshed once an hour. From the GC’s perspective they are basically the same objects. It doesn’t matter which one gets collected, because in both cases the GC will reclaim 4000 bytes. This is not true from the application’s perspective. If the GC frequently reclaims the memory associated with the second array, the application will crawl. If not, it will be lightning fast. What if the GC implementation changes and, after an upgrade to the next version of the runtime, the performance of the app changes completely? I wouldn’t like to debug that problem. In other words, you can’t build a solution that needs to be predictable (the cache expiration policy) on top of a component (the GC) that is beyond your control.
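A minimal sketch of the alternative, in Python rather than Java or .NET: the application, not the GC, decides when data is refreshed. `ExpiringCache` and its parameters are hypothetical, and the one-hour time-to-live mirrors the example above. The clock is injectable so the policy can be tested deterministically.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable


class ExpiringCache:
    """Cache with an explicit, application-defined expiration policy (time-to-live).

    Unlike a cache built on SoftReference/WeakReference, an entry is never dropped
    because the garbage collector decided to reclaim it; it is reloaded only once
    its time-to-live has passed.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was loaded, value); strong references on purpose
        self._entries: dict[Hashable, tuple[float, object]] = {}

    def get(self, key: Hashable, load: Callable[[], object]) -> object:
        """Return the cached value, calling `load` only if the entry is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = load()
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock, so the hour can pass instantly.
fake_now = [0.0]
loads = []


def load_slow_array() -> list[int]:
    """Stands in for the array that takes 100 s to fill."""
    loads.append("slow")
    return [0] * 1000


cache = ExpiringCache(ttl_seconds=3600, clock=lambda: fake_now[0])
cache.get("slow", load_slow_array)  # first access: loaded
fake_now[0] = 1800.0
cache.get("slow", load_slow_array)  # 30 minutes later: still fresh, served from the cache
fake_now[0] = 3600.0
cache.get("slow", load_slow_array)  # an hour later: expired, reloaded
print(len(loads))  # 2
```

This sketch holds strong references and never evicts by size; a production cache would usually add an explicit size limit as well, which is again a policy the application controls.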
Sunday, 6 September 2009
Failing fast saves a lot of time
Have a look at this MSDN blog post to see a real life example of how crucial it is to fail fast.
Tuesday, 1 September 2009
CloudCamp - a bunch of loose thoughts
I know more or less what Cloud Computing is, but until recently I still struggled to figure out what it is good for. That’s why I decided to attend CloudCamp at Google’s Sydney office. CloudCamp is a Cloud Computing event focused on sharing thoughts in the form of open discussions. The presenters were there just to start conversations and finish them soon enough to have a few beers afterwards :). I participated in two sessions (Scaling Applications in the Cloud, Cloud Computing from business perspective) and each of them taught me something different.
The sexiest part of Cloud Computing is its promise of scalability. The funny thing is that most people will never need it. If I’m not mistaken, StackOverflow handles 1 million hits a day running on a beefy DB server and two medium-size web servers. Sure, it depends on what your app does, but in most cases thinking a lot about scaling issues before you actually face them is a waste of time. It’s better to spend that time making sure your app is successful. This of course doesn't justify a complete lack of design and messy code. It’s a matter of striking the right balance. If you know for a fact that your app needs to handle a huge load, then it makes sense to design for it upfront. But again, if there is one problem that startups dream of, it’s the load problem :).
One of the presenters mentioned that hosting a regular WordPress-based site in the Cloud is 7 times more expensive than regular dedicated hosting. The Cloud seems to be good for apps whose resource utilization is either low or variable. If it’s low, then by hosting the app yourself you pay for resources that you don’t take advantage of. If your utilization is high, it might not make sense to move to the Cloud, because Cloud Computing providers charge you for the resources (you will use the same amount either way) and additionally for the promise of more resources when you need them. If you have to handle spikes, Cloud Computing might be the way to go, as you don’t want to buy a bunch of servers that you use only a couple of weeks a year. In other words, the key to a successful migration to the Cloud is knowing your application’s capacity.
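The utilization argument lends itself to back-of-the-envelope arithmetic. In the Python sketch below every price and load figure is hypothetical (they are not real provider prices, nor the WordPress figure from the talk): a cloud server-hour is assumed to cost about 1.8 times the dedicated equivalent, the spiky app needs 2 servers most of the year and 10 for two weeks, and the busy app needs 10 servers all year.

```python
HOURS_PER_YEAR = 24 * 365  # 8760

# Hypothetical prices, chosen only to illustrate the trade-off.
DEDICATED_MONTHLY = 100.0  # per server per month
CLOUD_HOURLY = 0.25        # per server-hour (about 1.8x the dedicated price if run 24x7)


def dedicated_cost(peak_servers: int, monthly_price_per_server: float, months: int = 12) -> float:
    """Dedicated hosting: you pay for enough servers to handle the peak, all year round."""
    return peak_servers * monthly_price_per_server * months


def cloud_cost(server_hours: float, hourly_price: float) -> float:
    """Cloud hosting: you pay only for the server-hours you actually use, at a higher unit price."""
    return server_hours * hourly_price


def spiky_cloud_hours(baseline: int, peak: int, spike_days: int) -> int:
    """Server-hours for an app that runs `baseline` servers all year plus extra ones during a spike."""
    return baseline * HOURS_PER_YEAR + (peak - baseline) * spike_days * 24


print(dedicated_cost(10, DEDICATED_MONTHLY))                   # 12000.0
print(cloud_cost(spiky_cloud_hours(2, 10, 14), CLOUD_HOURLY))  # 5052.0 (spiky app)
print(cloud_cost(10 * HOURS_PER_YEAR, CLOUD_HOURLY))           # 21900.0 (busy app)
```

Under these assumptions the spiky app is cheaper in the Cloud, while the steadily busy one is cheaper on dedicated hardware, which is why knowing your capacity matters.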
A few people mentioned that they use the Cloud as their stress testing environment. This actually makes a lot of sense, because you can quickly provision a lot of boxes, do your testing and then discard them for a very reasonable price. In general, if you need to perform some kind of activity that is either temporary or repetitive, you might want to consider doing it off your premises.
Another presenter said that Cloud Computing prices should drop in the near future because more and more people are using it and it might become a commodity. Someone compared Amazon’s price list to the price lists of mobile network operators. At the beginning the price list is simple, to attract customers. The more people use the service, the more complicated the price list gets, until the point when a regular user is totally lost and the service provider can make more money off him/her than before. It is an interesting theory and definitely true with regard to at least some mobile network operators. I still can’t figure out how a 50 AUD cap converts into 150 AUD worth of value :).
An employee of a big hardware vendor mentioned that some big Cloud Computing providers are working on a standard that will let people move their apps from one provider to another. I suppose they figured out that being locked in to a particular provider is not what business is looking for.
From a business perspective, IT infrastructure is a cost. If the cost can be lowered by moving it to the Cloud, that’s great. It’s not that business is mad about Cloud Computing. If a bunch of monkeys were good enough and cheaper than on-premises IT infrastructure, business would go for it. IT is a tool. So far it’s the best one, and let’s try to keep it that way :).