Saturday, 13 March 2010

Cloud computing enables and forces us to do proper engineering

Very often testing is a second-class citizen in the IT world. It’s not that software isn’t tested at all, but the testing is far from perfect:

  • the testing environment is nothing like the production environment
  • performance testing is nonexistent

This leads to problems that you can observe only in production. Fixing such issues tends to take a lot of time because developers usually have very limited access to production and the set of tools they can use for debugging there is limited. A permanent testing environment that matches production is very expensive, which is why businesses take the risk and deploy applications to production without proper testing. Very often they are unlucky and the price they pay is much higher than the price of proper testing would have been. It’s like a mortgage: you get a lot of money quickly, but later on you have to pay the interest. Nothing is free. With Cloud computing this is no longer such a big problem. If your production deployment requires 50 servers, you can provision a testing environment that looks exactly like production within minutes. What is more, once you are done with testing you can simply get rid of the whole environment. That sounds like a lot of effort, doesn’t it? Well, it’s true only if the whole process is manual. If it’s automated it’s not a problem at all. You can write your own scripts or use services like RightScale to help you with this. The point is that the use of Cloud computing forces you to automate your software development processes, which is a good thing. The same applies to performance testing: you can set up a testing lab only for the duration of a test. You can read here how MySpace leveraged Cloud computing to make sure it could handle 1 million concurrent users.
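
Here is a minimal sketch, in C#, of what such an automation script could look like. The ICloudProvider interface and its methods are made up for illustration; they stand in for whatever API your provider (or a service like RightScale) actually exposes.

    using System.Collections.Generic;

    // Hypothetical provisioning workflow: spin up a production-like test
    // environment, run your tests against it, then tear everything down.
    public interface ICloudProvider
    {
        // Returns an identifier for the newly provisioned server.
        string ProvisionServer(string imageId, string size);
        void TerminateServer(string serverId);
    }

    public class TestEnvironment
    {
        private readonly ICloudProvider provider;
        private readonly List<string> servers = new List<string>();

        public TestEnvironment(ICloudProvider provider)
        {
            this.provider = provider;
        }

        public void Provision(string imageId, string size, int count)
        {
            // Create as many servers as the production topology requires.
            for (int i = 0; i < count; i++)
            {
                servers.Add(provider.ProvisionServer(imageId, size));
            }
        }

        public void TearDown()
        {
            // Once testing is done the whole environment simply goes away.
            foreach (string serverId in servers)
            {
                provider.TerminateServer(serverId);
            }
            servers.Clear();
        }
    }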

I’m sure everybody has heard at least once that scaling applications in the Cloud is easy. As you can expect, this is not entirely true. It might be true in the marketing world though :). If you simply move your application from your own data centre to a Cloud, there is a good chance that it will be much slower and less reliable. Why? Most Cloud providers offer a few predefined server configurations to choose from, and most of them are virtual servers. This means that you don’t have any control over the hardware the application will run on. If the Cloud provider can’t match your existing setup, there is a good chance the application will be slower. Even if you manage to get enough CPUs and RAM, you might still suffer from slow disk IO and from the fact that the machines are less reliable than you would expect. You can read more about that here. The bottom line is that you can’t expect the application to simply run unchanged in the Cloud. One way of aligning the application with the Cloud is making sure that it can run on multiple servers at the same time, which effectively prevents you from building monolithic systems.
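
As a rough illustration of what “running on multiple servers” means in practice, here is a minimal C# sketch of keeping per-user state out of the web server’s memory. The ISharedSessionStore abstraction is hypothetical and stands in for whatever shared store (a database, a distributed cache) you would actually use.

    // Because the state lives in a shared store rather than in a static
    // field, any server in the farm can handle the next request.
    public interface ISharedSessionStore
    {
        void Save(string sessionId, string key, string value);
        string Load(string sessionId, string key);
    }

    public class ShoppingCartService
    {
        private readonly ISharedSessionStore store;

        public ShoppingCartService(ISharedSessionStore store)
        {
            this.store = store;
        }

        public void AddItem(string sessionId, string productId)
        {
            // No in-memory state: the call can be served by any instance.
            store.Save(sessionId, "lastProduct", productId);
        }
    }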

Security is another topic that tends to get very little attention. The reason is an implicit assumption that the application will always run locally, and thus nobody from outside will have access to it. Obviously this is a fallacy and a huge security hole. Nobody can see it (or everybody can hide it) because it’s implicit. Without addressing this problem you can’t really move your application to the Cloud, which forces you to take care of it. Cloud computing makes a lot of things very explicit, which is a very good thing. There are way too many secret handshakes and implicit assumptions that we take advantage of to build applications nowadays. Cloud Application Architectures deals with security in the Cloud quite extensively. It’s a good book that is a bit outdated and a bit too focused on the Amazon Cloud, but still worth reading.

Disaster recovery is very similar to load testing. Everybody knows that it’s needed and everybody has a plan for how to do it, but the plan never gets executed because it takes way too much time and too many resources. Again, Cloud computing makes it cheaper and easier to do. What is more, you get more options: you can start with a plan that deals with failures of a single server and extend it, if required, to procedures that can deal with data centres or even whole countries being offline.

As you can see, you can gain a lot from Cloud computing, but it doesn’t come for free: more than likely you will have to redesign your applications and rethink your processes to take full advantage of what it has to offer.

Wednesday, 10 February 2010

Training with Udi Dahan - it's all about business

I know it’s been a while since I wrote my last post, but in my defence :) I went to SEA for over 5 weeks and disconnected on purpose from the whole IT-related online world. It was great :).
Anyway, back to the topic. A couple of weeks ago I went for a week-long training (Advanced Distributed System Design with SOA & DDD) with Udi Dahan. It was, in a positive way, a mind-blowing exercise. I already knew about messaging and how it helps to fight different types of coupling, but only listening to Udi made me understand those concepts in depth.
From what I observed, the topic that caused the most confusion among the attendees was the Command-Query Responsibility Segregation (CQRS) pattern. I have to admit that I’m still wrestling a bit with this topic myself, but I’ve learnt the hard way that using the same channel/model/approach for both queries and commands simply doesn’t scale and will bite you sooner or later.
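For what it’s worth, here is a minimal C# sketch of the separation; the types below are illustrative, not taken from any particular framework. Commands go through one path that changes state, while queries go through a separate, read-only path that can be shaped, cached and scaled independently.

    using System;
    using System.Collections.Generic;

    // Commands change state and go through the domain model.
    public class PlaceOrderCommand
    {
        public Guid CustomerId { get; set; }
        public Guid ProductId { get; set; }
        public int Quantity { get; set; }
    }

    public interface ICommandHandler<TCommand>
    {
        void Handle(TCommand command);
    }

    // Queries never change state and read from a model optimised for display.
    public class OrderSummary
    {
        public Guid OrderId { get; set; }
        public DateTime PlacedOn { get; set; }
        public decimal Total { get; set; }
    }

    public interface IOrderQueries
    {
        IEnumerable<OrderSummary> GetRecentOrders(Guid customerId);
    }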
But today I want to talk about something else that Udi mentioned a few times. Namely, one of the most common mistakes that IT people make is trying to solve business problems with technology. Yes, you read that correctly :). Let me give you an example that should explain what I mean. I suppose everybody has seen at least one implementation of a WCF smart proxy. One of the reasons people write them is to transparently handle exceptions and make sure that developers who write business logic don’t have to deal with them. One of the implementations I’ve seen catches a few WCF exceptions and then retries the failed call. The implementation assumes that the call didn’t succeed on the server side. Obviously that’s a wrong assumption and this code can cause a lot of damage. Would you send a customer 2 or more laptops (depending on the number of retries) when he/she ordered just one? In this case a developer tries very hard to solve a business problem (what happens if a call to a shipping service fails) with a technology (a WCF smart proxy). Maybe the code shouldn’t try the same shipping service again but move on and call a different one, or maybe it should notify the system administrator that there is a problem that needs to be handled manually. This question needs to be answered by the business, and only then, based on the business input, can a technology-based (or not) solution be implemented.
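To make the example concrete, here is a simplified C# sketch of the kind of retrying proxy described above; the shipping service contract is made up. The retry loop silently assumes that a failed call never reached the server, which is exactly the business question it is not entitled to answer.

    using System;
    using System.ServiceModel;

    // Made-up service contract, used only for illustration.
    [ServiceContract]
    public interface IShippingService
    {
        [OperationContract]
        void ShipOrder(Guid orderId);
    }

    public class RetryingShippingProxy
    {
        private readonly IShippingService inner;

        public RetryingShippingProxy(IShippingService inner)
        {
            this.inner = inner;
        }

        public void ShipOrder(Guid orderId)
        {
            const int maxAttempts = 3;
            for (int attempt = 1; attempt <= maxAttempts; attempt++)
            {
                try
                {
                    inner.ShipOrder(orderId);
                    return;
                }
                catch (TimeoutException)
                {
                    // DANGER: a timeout does not prove the server didn't
                    // process the request. Retrying may ship the same order
                    // two or three times.
                    if (attempt == maxAttempts) throw;
                }
                catch (CommunicationException)
                {
                    // Same assumption, same risk: a business decision (what
                    // to do when shipping fails) disguised as plumbing.
                    if (attempt == maxAttempts) throw;
                }
            }
        }
    }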
Someone could still argue that it’s a technology problem because the communication channel is not reliable enough. Well, the fact that a company uses web services (and not phone calls) to place shipment requests was a business decision. On one hand, this way you can ship goods to customers faster; on the other hand, you have to deal with a new type of problem. Unfortunately, in real life nothing comes for free.
Again, what’s the role of IT? I believe our role is to implement solutions provided by the business. This doesn’t mean, though, that there is only a one-way relationship between business and IT and that IT always does what it’s told to do. As IT we can and should give feedback to the business whenever we see anything that might have an impact on it, but at the end of the day it’s up to the business to make the final call.

Sunday, 25 October 2009

Release it! - some loose thoughts

I’ve been trying to finish this book for quite some time. It was difficult :) because it is a book that you can read chapter by chapter without losing the plot, and hence it’s easy to abandon it every now and then. Anyway, I managed to finish it this week and I have to say that I haven’t really learnt anything new. This doesn’t mean that it was a waste of time. On the contrary, after reading it I’m more confident that what I’ve been doing is right and that I’m not some kind of weirdo who demands the impossible :). It’s definitely a must-read for developers who haven’t worked in a 24/7 environment where part of their job is to be on call for a week every month or two. When you can get a call at 3 am you design your software in a slightly different way :). I will dedicate a separate post to that topic.
There is one thing in the book that I disagree with though. On page 199, Michael recommends using SoftReference when implementing a cache in Java. The closest counterpart of SoftReference in the .NET world is WeakReference. I think that is a very bad idea. The most important part of every caching solution is its expiration policy, which boils down to a simple question: when does the data need to be refreshed? The GC operates at a very low level and it doesn’t have enough information to make an informed decision. Let me give you an example. Let’s say we have 2 arrays of integers (System.Int32), both of them 1000 elements long; it takes 10 ms to fill the first one and 100 seconds to fill the second one, and they both need to be refreshed once an hour. From the GC’s perspective they are basically the same objects: it doesn’t matter which one gets collected, as in both cases the GC will reclaim 4000 bytes. This is not true from the application’s perspective. If the GC frequently releases the memory associated with the second array, the application will crawl; if not, it will be lightning fast. And what if the GC implementation changes and, after an upgrade to the next version of the runtime, the performance of the app changes completely? I wouldn’t like to debug that problem. In other words, you can’t build a solution that needs to be predictable (a cache expiration policy) on top of a component (the GC) that is beyond your control.
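To illustrate the difference, here is a small C# sketch (the class names are mine) contrasting a WeakReference-backed cache entry, whose lifetime is decided by the GC, with an explicit time-based policy where the “refresh once an hour” rule is stated in code.

    using System;

    // GC-driven "expiration": the entry can disappear at any time, regardless
    // of how expensive it was to build or how fresh it still is.
    public class WeakCacheEntry<T> where T : class
    {
        private readonly WeakReference reference;
        private readonly Func<T> load;

        public WeakCacheEntry(Func<T> load)
        {
            this.load = load;
            this.reference = new WeakReference(load());
        }

        public T Value
        {
            get
            {
                var value = (T)reference.Target;
                if (value == null)
                {
                    // The GC collected it; we pay the (possibly 100-second)
                    // load cost again at a moment we don't control.
                    value = load();
                    reference.Target = value;
                }
                return value;
            }
        }
    }

    // Explicit policy: the data is rebuilt when *we* decide it is stale.
    public class TimedCacheEntry<T>
    {
        private readonly Func<T> load;
        private readonly TimeSpan maxAge;
        private T value;
        private DateTime loadedAt;

        public TimedCacheEntry(Func<T> load, TimeSpan maxAge)
        {
            this.load = load;
            this.maxAge = maxAge;
            this.value = load();
            this.loadedAt = DateTime.UtcNow;
        }

        public T Value
        {
            get
            {
                if (DateTime.UtcNow - loadedAt > maxAge)
                {
                    value = load();
                    loadedAt = DateTime.UtcNow;
                }
                return value;
            }
        }
    }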

Sunday, 6 September 2009

Failing fast saves a lot of time

Have a look at this MSDN blog post to see a real-life example of how crucial it is to fail fast.

Tuesday, 1 September 2009

CloudCamp - a bunch of loose thoughts

I know more or less what Cloud Computing is, but until recently I still struggled to figure out what it is good for. That’s why I decided to attend CloudCamp at Google’s Sydney office, a Cloud Computing event focused on sharing thoughts in the form of open discussions. The presenters were there just to start conversations and finish them soon enough to have a few beers afterwards :). I participated in two sessions (Scaling Applications in the Cloud, Cloud Computing from a business perspective) and each of them taught me something different.

The sexiest part of Cloud Computing is its promise of scalability. The funny thing is that most people will never need it. If I’m not mistaken, StackOverflow handles 1 million hits a day running on a beefy DB server and two medium-sized Web servers. Sure, it depends on what your app does, but in most cases thinking a lot about scaling issues before you actually face them is a waste of time. It’s better to spend that time on making sure your app is successful. This of course doesn’t justify a complete lack of design and messy code; it’s a matter of striking the right balance. If you know for a fact that your app needs to handle a huge load, then it makes sense to design for it upfront. But again, if there is one problem that startups dream of having, it’s the load problem :).

One of the presenters mentioned that hosting a regular WordPress-based site in the Cloud is 7 times more expensive than regular, dedicated hosting. The Cloud seems to be good for apps whose resource utilization is either low or variable. If it’s low, then hosting it yourself means paying for resources that you don’t take advantage of. If your utilization is high, it might not make sense to move to the Cloud, because Cloud Computing providers charge you for resources (you will use the same amount) and additionally for the promise of more resources when you need them. If you have to handle spikes, Cloud Computing might be the way to go, as you don’t want to buy a bunch of servers that you use only a couple of weeks a year. In other words, the key to a successful migration to the Cloud is knowing your application’s capacity.

A few people mentioned that they use the Cloud as their stress-testing environment. This actually makes a lot of sense because you can quickly provision a lot of boxes, do your testing and then discard them for a very reasonable price. In general, if you need to perform some kind of activity that is either temporary or repetitive, you might want to consider doing it off your premises.

Another presenter said that Cloud Computing prices should drop in the near future because more and more people are using it and it might become a commodity. Someone compared Amazon’s price list to the price lists of mobile network operators: at the beginning the price list is simple, to attract customers, but the more people use the service, the more complicated the price list gets, until a regular user is totally lost and the service provider can make more money off him/her than before. It is an interesting theory and definitely true with regard to at least some mobile network operators. I still can’t figure out why a 50 AUD cap converts into 150 AUD of value :).

An employee of a big hardware vendor mentioned that some big Cloud Computing providers are working on a standard that will let people move their apps from one provider to another. I suppose they figured out that being locked in to a particular provider is not what business is looking for.

From a business perspective, IT infrastructure is a cost. If the cost can be lowered by moving it to the Cloud, that’s great. It’s not like business is mad about Cloud Computing: if a bunch of monkeys were good enough and cheaper than on-premises IT infrastructure, they would go for it. IT is a tool. So far the best one, and let’s try to keep it that way :).

Wednesday, 26 August 2009

No coding exercise. I'm not interested.

Richard, one of my colleagues at Readify, wrote an interesting blog post about how he learnt the hard way that if you want to hire a software developer you need to check her/his skills by giving her/him a coding exercise to solve. If you don’t do this you are simply asking for trouble. Have a look at Richard’s blog post for more details.
When I was looking for a job in Australia I got in touch with a few companies, and some of them didn’t require me to write any code. Those companies were immediately off my list. Moving to the other hemisphere was risky enough that I didn’t want to deal with companies that didn’t pay attention to their recruitment process. What is more, if a company doesn’t bother to interview you properly, that might be a sign that they won’t treat you well.
Just my two cents.

Sunday, 16 August 2009

Subtext 1.9.5 -> 2.1.2

I’ve just upgraded my blog from Subtext 1.9.5 to Subtext 2.1.2. The upgrade was smooth and I didn’t have to make any manual changes. I’m really impressed. Keep up the good work, guys! I can’t wait for Subtext 3.0, which is meant to be based on ASP.NET MVC. Now I need to find a better skin for my blog, but taking into account my UI skills that might take some time ;).