
Tuesday, 1 January 2013

How to run Hadoop on Windows

One can spend only so much time surfing in 30C+ weather ;). So while my body was recovering from too much sunshine I decided to play with Hadoop to learn first-hand what it actually is.

The easiest way to start is to download a preconfigured VMware image from Cloudera. This is what I did and it worked, but it did not work well. The highest resolution I could set was 1024x768. I installed the VMware client tools but they did not seem to work with the Linux distribution picked by Cloudera. I managed to figure out how to use vim to edit text files, but a tiny window with a flaky UI (you can see what is happening inside Hadoop using a web browser) was more than I could handle. Then I thought about getting it working on Mac OS X, which is a very close cousin of Linux. The installation process is simple but the configuration process is not.

So I googled a bit more and came across Microsoft HDInsight, which is Microsoft's distribution of Hadoop that runs on Windows and Windows Azure. HDInsight worked great for me on Windows 8 and I was able to play with the three most commonly used query APIs: the native Hadoop Java-based map/reduce framework, Hive and Pig. I used word count as a problem to see what each of them is capable of. Below are links to sample implementations:
  • Java map/reduce framework – run c:\hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd to get into command line interface for Hadoop
  • Pig – run C:\Hadoop\pig-0.9.3-SNAPSHOT\bin\pig.cmd to get into Grunt which lets you use Pig
  • Hive – run C:\Hadoop\hive-0.9.0\bin\hive.cmd to get into Hive command line interface
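Whichever of the three APIs you pick, the underlying idea is the same map/reduce pattern. Here is a minimal Python simulation of the word-count job (not the actual Hadoop API, just an illustration of what the mapper and reducer each compute):

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line
    for word in line.lower().split():
        yield word, 1

def reduce_phase(pairs):
    # Reducer: sum the counts per word (Hadoop groups pairs by key
    # between the map and reduce phases; a dict does that job here)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(pairs))  # {'the': 2, 'quick': 1, 'brown': 1, ...}
```

In real Hadoop the mapper and reducer run in parallel across the cluster, and Pig and Hive generate equivalent map/reduce jobs from their higher-level query languages.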

As far as I know Microsoft is going to contribute their changes back to the Hadoop project, so at some stage we might get Hadoop running natively on Windows in the same way Node.js does.

Thursday, 18 October 2012

Quick overview of TFS Preview

I spent last week playing with shiny new Web technologies and I needed a place where I could store my todo list and keep my code. I could've used Git + Trello but I wanted to give TFS Preview a try, so I created a project there, converted my todo list into user stories and connected my repo with my Azure account for automated deployments. After a week of using TFS Preview this is what I've learnt:
  • It was very easy to configure automated deployments to Azure
  • The performance of the web portal ranged from acceptable to painfully slow
  • The UI is decent and it's easy to execute all the basic tasks like adding a user story, adding a bug or looking up a build
  • The burndown chart worked out of the box
  • The Scrum board is simple and does the job
  • Builds would take up to a couple of minutes to start even if the build queue was empty
  • Total build time was a bit unpredictable, ranging from 60s to 160s for the same code
  • Adding support for NUnit was more hassle than I anticipated
  • Storyboarding in PowerPoint is not integrated with TFS, so you can't push files directly from PowerPoint to TFS
  • There is no Wiki
  • The build log is as "useful" as it was in TFS 2010
All in all it is a decent start and I hope that Brian Harry's team will keep deploying often, so that every week welcomes us with something new.

Update

You actually can specify how many builds you want to keep. For some reason, when I clicked on the number of builds to preserve it didn't change from a label into a drop-down list.

Tuesday, 3 July 2012

Cloudcast show: the clouds are different down under

A few weeks ago I was a guest on the Uhuru podcast, where I talked to Micheal Surkan about my experience with deploying to the Cloud and using Cloud-based service providers. The discussion focused on the risks associated with using the Cloud and techniques to mitigate them.

You can listen to it here. It’s only 10 minutes :). Enjoy.

Monday, 15 November 2010

Microsoft Azure on-premises in 2011

It looks like Microsoft is filling an obvious gap: its customers will be able to deploy Azure on their own machines in 2011. This should significantly speed up the adoption of Microsoft's Cloud offering, as it introduces an additional checkpoint halfway through the migration and lowers the risk of the whole process. If, on top of that, other Cloud providers deploy Azure to their own data centres, the risk will be even smaller because "vendor lock-in" stops being such a big problem. These are all good changes and I'm really looking forward to seeing how they affect the global market of Cloud Computing.

BTW I recently watched an interesting presentation by Chris Read from ThoughtWorks, where he focuses on the Cloud from the Operations perspective. One takeaway that is very often not obvious to people: there is no need to fire the infrastructure guys. You simply give them different, more creative tasks :)


Sunday, 20 June 2010

CloudCamp is coming again to Sydney!

This time you need to take half a day off to attend, but I believe it's well worth your time; I really enjoyed the previous camp.

Saturday, 13 March 2010

Cloud computing enables and forces us to do proper engineering

Very often testing is a second-class citizen in the IT world. It's not that software is not tested at all, but the testing is far from perfect:

  • the testing environment is nothing like production environment
  • performance testing is nonexistent

This leads to problems that you can observe only in production. Fixing those kinds of issues tends to take a lot of time because developers usually have very limited access to production and the set of tools they can use for debugging is limited. A permanent testing environment that matches production is very expensive, and that's why businesses take the risk and deploy applications to production without proper testing. Very often they are unlucky and the price they pay is much higher than the price of proper testing. It's like a mortgage: you get a lot of money quickly but later on you have to pay the interest. Nothing is for free. With Cloud computing this is no longer such a big problem. If your production deployment requires 50 servers, you can provision a testing environment which looks exactly like production within minutes. What is more, once you are done with testing you can simply get rid of the whole environment. But this sounds like a lot of effort, doesn't it? Well, that's true only if the whole process is manual. If it's automated it's not a problem at all. You can write your own scripts or use services like RightScale that will help you with this. The point is that the use of Cloud computing forces you to automate your software development processes, which is good. The same applies to performance testing: you can set up a testing lab only for the duration of a test. You can read here how MySpace leveraged Cloud computing to make sure it could handle 1 million concurrent users.

I'm sure everybody has heard at least once that scaling applications in the Cloud is easy. As you can expect, this is not entirely true. It might be true in the marketing world though :). If you simply move your application from your own data centre to a Cloud there is a good chance that it will be much slower and less reliable. Why? Most Cloud providers offer a few predefined server configurations that you can choose from. What is more, most of them are virtual servers. This means that you don't have any control over the hardware the application will run on. If the Cloud provider can't match your existing setup then there is a good chance the application will be slower. Even if you manage to get enough CPUs and RAM you might still suffer from slow disk IO and from the fact that the machines are less reliable than you would expect. You can read more about that here. The bottom line is that you can't expect the application to simply run unchanged in the Cloud. One way of aligning the application with the Cloud is making sure that it can run on multiple servers at the same time. This basically prevents you from building monolithic systems.

Security is another topic that tends to get very little attention. The reason is an implicit assumption that the application will always run locally, so nobody from outside will have access to it. Obviously this is a fallacy and a huge security hole. Nobody can see it (or everybody can hide it) because it's implicit. Without addressing this problem you can't really move your application to the Cloud, which forces you to take care of it. Cloud computing makes a lot of things very explicit, which is a very good thing. There are way too many secret handshakes and implicit assumptions that we take advantage of to build applications nowadays. Cloud Application Architectures deals with security in the Cloud quite extensively. It's a good book that is a bit outdated and a bit too focused on the Amazon Cloud, but still worth reading.

Disaster recovery is very similar to load testing. Everybody knows that it's needed and everybody has a plan for how to do it, but the plan never gets executed because it takes way too much time and resources. Again, Cloud computing makes it cheaper and easier to do. What is more, you get more options: you can start with a plan that deals with failures of a single server and extend it, if required, to procedures that can deal with data centres or even whole countries being offline.

As you can see, you can gain a lot from Cloud computing, but it doesn't come for free: more than likely you will have to redesign your applications and rethink your processes to take full advantage of what Cloud computing has to offer.

Tuesday, 1 September 2009

CloudCamp - a bunch of loose thoughts

I know more or less what Cloud Computing is, but until recently I still struggled to figure out what it is good for. That's why I decided to attend CloudCamp at Google's Sydney office, a Cloud Computing event focused on sharing thoughts in the form of open discussions. The presenters were there just to start conversations and finish them soon enough to have a few beers afterwards :). I participated in two sessions (Scaling Applications in the Cloud, Cloud Computing from a business perspective) and each of them taught me something different.

The sexiest part of Cloud Computing is its promise of scalability. The funny thing is that most people will never need it. If I'm not mistaken, StackOverflow handles 1 million hits a day running on a beefy DB server and two medium-size Web servers. Sure, it depends on what your app does, but in most cases thinking a lot about scaling issues before you actually face them is a waste of time. It's better to spend that time on making sure your app is successful. This of course doesn't justify a complete lack of design and messy code; it's a matter of striking the right balance. If you know for a fact that your app needs to handle huge load then it makes sense to design for it upfront. But again, if there is one problem that startups dream of, it's the load problem :).

One of the presenters mentioned that hosting a regular WordPress-based site in the Cloud is 7 times more expensive than regular, dedicated hosting. The Cloud seems to be good for apps whose resource utilization is either low or variable. If it's low, hosting it yourself means paying for resources that you don't take advantage of. If your utilization is high it might not make sense to move to the Cloud, because Cloud Computing providers charge you for the resources (you will use the same amount) and additionally for the promise of more resources when you need them. If you have to handle spikes, Cloud Computing might be the way to go, as you don't want to buy a bunch of servers that you use only a couple of weeks a year. In other words, the key to a successful migration to the Cloud is to know your application's capacity.
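The spiky-load case is easy to put numbers on. The prices below are made up purely for illustration (real provider pricing varies), but they show why paying a per-server premium can still win when the peak is short:

```python
# Illustrative numbers only -- real prices vary by provider and year.
DEDICATED_SERVER_PER_WEEK = 100.0   # you pay whether busy or idle
CLOUD_SERVER_PER_WEEK = 150.0       # premium for on-demand elasticity

def yearly_cost_dedicated(peak_servers):
    # You must own enough capacity for the peak, all year round
    return peak_servers * DEDICATED_SERVER_PER_WEEK * 52

def yearly_cost_cloud(base_servers, peak_servers, peak_weeks):
    # Pay for the baseline all year, extra capacity only during spikes
    base = base_servers * CLOUD_SERVER_PER_WEEK * 52
    spike = (peak_servers - base_servers) * CLOUD_SERVER_PER_WEEK * peak_weeks
    return base + spike

# Spiky load: 2 servers most of the year, 20 for two weeks
print(yearly_cost_dedicated(20))    # 104000.0
print(yearly_cost_cloud(2, 20, 2))  # 15600 + 5400 = 21000.0
```

With flat, high utilization the same model flips: the 50% per-server premium is paid on every server all year, and dedicated hosting comes out cheaper, which matches the WordPress anecdote above.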

A few people mentioned that they use the Cloud as their stress testing environment. This actually makes a lot of sense because you can quickly provision a lot of boxes, do your testing and then discard them for a very reasonable price. In general, if you need to perform some kind of activity that is either temporary or repetitive, you might want to consider doing it off your premises.

Another presenter said that the Cloud Computing price should drop in the near future because more and more people are using it and it might become a commodity. Someone compared Amazon's price list to the price lists of mobile network operators. At the beginning the price list is simple, to attract customers. The more people use the service, the more complicated the price list gets, until the point where a regular user is totally lost and the service provider can make more money off him/her than before. It is an interesting theory and definitely true with regard to at least some mobile network operators. I still can't figure out why a 50 AUD cap converts into 150 AUD of value :).

An employee of a big hardware vendor mentioned that some big Cloud Computing providers are working on a standard that will let people move their apps from one provider to another. I suppose they figured out that being locked to a particular provider is not what business is looking for.

From a business perspective, IT infrastructure is a cost. If the cost can be lowered by moving it to the Cloud, that's great. It's not like business is mad about Cloud Computing; if a bunch of monkeys were good enough and cheaper than on-premises IT infrastructure, they would go for it. IT is a tool. So far the best one, and let's try to keep it that way :).