Wednesday, 16 May 2012

Entity Framework migrations don't support parallel development

EF migrations borrowed a lot of ideas from Ruby on Rails migrations, and this is a good thing. The bad thing is that EF treats the database as the source of truth. An unpleasant side effect of this approach is that only one developer can work on database changes at a time.
It seems that every time EF creates a DB migration, it stores some kind of hash of the current database schema, which is then checked before the migration is actually executed to make sure that the database is in the right state. This breaks down very quickly when more than one developer works on the model. Sample scenario:
  1. Dev1 adds a migration for a new property called Master and pushes the change.
  2. Dev2 adds a migration for the AnotherDev property and applies it to the database.
  3. Dev2 pulls Dev1’s changes.
  4. Dev2 tries to run the app and gets an exception.
  5. Dev2 needs to delete his migration and roll back the corresponding changes in the database.
  6. Dev2 applies Dev1’s migration.
  7. Dev2 adds the migration for the AnotherDev property again.
  8. Dev2 is ready to push his changes, unless someone else has introduced a new migration in the meantime. In that case Dev2 goes back to step 4.
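The post only guesses at the mechanism ("some kind of hash"), so the following is a toy Python model of the failure in step 4 and not EF's actual code. Each migration stores a snapshot of the model as its author saw it. At startup, the snapshot of the last migration (by ID) is compared with the model in the code. The migration IDs and the `Name` property are made up for illustration.
```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def model_hash(props: frozenset[str]) -> str:
    """Hash a toy 'model', represented here as a set of property names."""
    return hashlib.sha256(",".join(sorted(props)).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Migration:
    mig_id: str    # timestamp-style ID; the highest ID counts as the last migration
    adds: str      # the property this migration adds
    snapshot: str  # hash of the model as it looked *after* this migration


def create_migration(mig_id: str, local_model: frozenset[str], prop: str) -> Migration:
    # The snapshot can only describe the model the developer had locally.
    return Migration(mig_id, prop, model_hash(local_model | {prop}))


def check_model(migrations: list[Migration], current_model: frozenset[str]) -> bool:
    """True if the last migration's snapshot matches the model in the code."""
    last = max(migrations, key=lambda m: m.mig_id)
    return last.snapshot == model_hash(current_model)


base = frozenset({"Id", "Name"})
dev1 = create_migration("201205160900_Master", base, "Master")
dev2 = create_migration("201205161000_AnotherDev", base, "AnotherDev")

# Step 3: after the pull, the code model contains both new properties...
merged_model = base | {"Master", "AnotherDev"}
# ...but the last snapshot only knows about AnotherDev, so the check fails (step 4).
print(check_model([dev1, dev2], merged_model))  # False

# Steps 5-7: Dev2 recreates his migration on top of Dev1's model.
dev2_redo = create_migration("201205161100_AnotherDev", base | {"Master"}, "AnotherDev")
print(check_model([dev1, dev2_redo], merged_model))  # True
```
In this toy model, each snapshot is taken from one developer's local model. After a merge, no snapshot describes the combined model until one of the parallel migrations is regenerated.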
I’ve put together a simple app on GitHub that reflects this flow. Each commit is meant to represent a single step.
EF has a chance to become a decent ORM with even better tooling, but before this happens the EF team needs to understand that developers know what they are doing and prefer to have more rope rather than less at their disposal.
I’m happy for the product to be safe by default, but there should be a way of disabling those annoying limitations. Without this ability, Scott Hanselman's idea of LEGO-size building blocks is... well, just an idea when it comes to EF.

12 comments:

  1. I can't really defend EF - it has a lot of problems - but there are at least workarounds for the problem you describe.

    It's true that the hash of the migrations is DB-specific, and that's partly because they're automatic and assume the initial state of the DB when run; if you had parallel development on 2 separate DBs where the initial state of each differs, you would end up with potential conflicts.

    There are a few development approaches available to you to work around this sort of conflict:

    A) Devs work on their own feature branches in git, each with their own local DB. Their Migrations are their own and no one else's. When they merge, they don't merge migrations. With an EF Code First approach, all you need to merge is the Models anyway - the migrations are irrelevant. Once merged and built on a Staging server, you can dump Production to Staging for testing, generate Migrations for it, and verify they don't cause any harm before finally applying those Migrations to Production.

    B) Devs share a common development database and a common Git origin/master. In this scenario the fact that migrations and the DB are tied together is fine, because everyone's on the same DB. Staging and Production will move in lockstep with this Dev DB, so the Migrations will apply without having to be regenerated.

    What I like about scenario A is I as a dev can make a lot of model changes that turn out to be a waste of time, and throw away all the related Migrations that are just performing changes that will ultimately amount to nothing. The final Model that gets merged into Master is what's important, and my Migrations along the way are a bit of chaos I'm happy to set aside.

    I'm not saying the EF approach is perfect, just that you can do parallel development - just not in exactly the same way as someone coming from Rails would assume.

  2. Hi Chris,

    Thanks for taking the time to come up with possible workarounds. Unfortunately, neither of them appeals to me, as what I'm looking for is simplicity, predictability and a short feedback cycle.

    A) If I understand correctly, the real integration happens on a Staging server, which means problems are discovered late in the development cycle.
    B) Can you really imagine 10 devs sharing the same DB? What if I want to experiment and I drop a column or even a table? Does it mean that the rest of the team can't work?

    Ruby on Rails isn't perfect, but one thing the guys behind it understand well is KISS.

  3. Kaarel Nummert, 29 May 2012 19:10

    I've added a feature request for this to be fixed: data.uservoice.com/...

  4. Hi Kaarel,

    I think I did the same, but on a different site... hmm. Anyway, thanks for that.

  5. Hi pawel,

    I've worked long enough to have developed in environments where every dev had their own DB, and ones where every dev shared several DBs - usually dev, test, staging, and production. Both cases were pre-EF. There are scenarios where, for example, the size of the data necessary for effective development is too large to fit on a developer's machine, so a big dev db or cluster is used instead.

    Each dev having their own DB lends itself to fast iterations. I do question whether the method you refer to in Rails is the simplest approach. Development often involves experimentation and walking down a path that turns out to have been the wrong one, then going down another. As that proceeds, the migrations to and from can be complex and can require manually written up and down code to accomplish. In a parallel development scenario under a shared schema, this can require awareness of other feature development that may be otherwise irrelevant, meaning devs spend more time getting migrations straightened out than they need to - in some cases only to check in more migrations walking in a different direction.

    By deferring model change integration to an Integration phase, experimentation by devs is easier and some manual migrations can be avoided entirely.

    I understand wanting it to work like Ruby on Rails, especially when a lot of the wording and overall design is so similar, but it isn't necessarily a worse process.

    If you're looking for chinks in the armor, I find EF's slow speed and EF5's forced upgrade to ASP.Net 4.5 more troublesome.

  6. Chris,

    I'm not looking for chinks and I wrote the blog post because I strongly believe this is a massive problem.

    If work is split properly, then parallel development is rarely a problem. I've worked on projects with shared databases, and I've never wasted more time anywhere else. Imagine a team of 10 devs who can't work because someone dropped a stored procedure. It's an extremely frustrating experience.

    And again, I'm not saying that the current behavior should be removed but we need to have an option to disable it.

  7. We rarely create a migration without needing to change the generated migration. Scenario A doesn't account for that, nor for creating a simple schema-less data migration.

    As far as I know, this isn't an EF limitation; it's an up/down migration problem. I agree that you should be able to disable it and go with a forward-only migration path.
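
    For comparison, here is a minimal Python sketch of a forward-only runner. It is loosely in the spirit of Rails, which records applied migration versions in its schema_migrations table. This is a simplified illustration with hypothetical names, not an EF feature. The runner remembers which migration IDs have run and applies any missing ones in ID order, without comparing model snapshots.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Database:
    columns: set[str] = field(default_factory=set)
    history: list[str] = field(default_factory=list)  # IDs of applied migrations


def migrate(db: Database,
            migrations: list[tuple[str, Callable[[Database], None]]]) -> list[str]:
    """Apply, in ID order, every migration whose ID is not yet in the history.

    There is no 'down' step and no model snapshot check: a migration that was
    created in parallel simply runs if it has not run yet.
    """
    applied = []
    for mig_id, up in sorted(migrations, key=lambda m: m[0]):
        if mig_id in db.history:
            continue
        up(db)
        db.history.append(mig_id)
        applied.append(mig_id)
    return applied


# Hypothetical migrations from the scenario in the post.
master = ("201205160900_Master", lambda db: db.columns.add("Master"))
another_dev = ("201205161000_AnotherDev", lambda db: db.columns.add("AnotherDev"))

dev2_db = Database()
migrate(dev2_db, [another_dev])                   # Dev2 applies his own migration
pulled = migrate(dev2_db, [master, another_dev])  # after pulling Dev1's change
print(pulled)                   # ['201205160900_Master']
print(sorted(dev2_db.columns))  # ['AnotherDev', 'Master']
```

    The trade-off is that nothing here detects two parallel migrations that genuinely conflict, for example both renaming the same column. That is left to code review or to a failing migration.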

  8. A simple console command to reorder two parallel migrations (setting the start state of one to the final state of the other) would be a relatively easy fix (provided the two migrations are compatible, of course).
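
    The reordering suggested above can be sketched in a few lines of Python. This is a toy model, not an existing EF command. It assumes that a model is a set of property names and that two migrations are "compatible" when they touch disjoint properties. The later migration's source state is set to the earlier one's final state, and its target state is recomputed.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Migration:
    mig_id: str
    source: frozenset[str]  # model the migration was generated against
    target: frozenset[str]  # model after the migration runs

    def changes(self) -> tuple[frozenset[str], frozenset[str]]:
        """Return (added, removed) properties relative to the source model."""
        return self.target - self.source, self.source - self.target


def rebase(later: Migration, earlier: Migration) -> Migration:
    """Replay `later` on top of the final state of `earlier`.

    Raises ValueError when both migrations touch the same property, i.e. they
    are not compatible and a developer has to resolve the conflict by hand.
    """
    added, removed = later.changes()
    e_added, e_removed = earlier.changes()
    if (added | removed) & (e_added | e_removed):
        raise ValueError(f"{later.mig_id} conflicts with {earlier.mig_id}")
    new_target = (earlier.target | added) - removed
    return replace(later, source=earlier.target, target=new_target)


base = frozenset({"Id"})
master = Migration("201205160900_Master", base, base | {"Master"})
another_dev = Migration("201205161000_AnotherDev", base, base | {"AnotherDev"})

rebased = rebase(another_dev, master)
print(sorted(rebased.source))  # ['Id', 'Master']
print(sorted(rebased.target))  # ['AnotherDev', 'Id', 'Master']
```

    Real migrations contain arbitrary code, so a real compatibility check would need to be far more careful than comparing property names.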

  9. Yes, this seems to be a huge problem. I do not use automatic migrations because I don't want devs removing a property from an object and, oops, data loss in production. So I use forced (explicit) migrations: you change the model, you deliberately run Add-Migration and make sure the behavior is what you want. But when multiple developers are each creating migrations, their migrations don't work for each other. It's a serious pain.

  10. Maybe the EF team could approach this by decoupling the DB-stored EF data model from the individual explicit migrations, and just storing a "validated" data model in the DB after confirming, via a comparison with the physical schema, that the schema is compatible with the in-memory data model. Then, when the map changes, EF would recognize the incompatibility and require a migration to be run (automatic or explicit). If the resulting physical schema is a superset of the EF map, EF would store that new map and have an optimized startup until the next evolution of the EF map.
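
    The idea in the first paragraph can be sketched in Python as follows. For illustration only, it assumes that both the mapped model and the physical schema can be reduced to sets of column names. The function and exception names are hypothetical, not EF APIs.

```python
from __future__ import annotations

from typing import Optional


class SchemaMismatch(Exception):
    """The physical schema cannot satisfy the mapped model: a migration is needed."""


def startup_check(mapped: frozenset[str], physical: frozenset[str],
                  stored: Optional[frozenset[str]]) -> frozenset[str]:
    """Return the model to record in the DB as the 'validated' model.

    Fast path: the mapped model equals the stored validated model, so the
    physical schema is not inspected at all. Slow path: compare with the
    physical schema and accept it if it is a superset of the map.
    """
    if stored == mapped:
        return stored               # optimized startup until the map changes
    if physical >= mapped:          # every mapped column exists in the DB
        return mapped               # record this map as the new validated model
    missing = sorted(mapped - physical)
    raise SchemaMismatch(f"a migration must be run first; missing: {missing}")


physical = frozenset({"Id", "Master", "AnotherDev", "LegacyColumn"})
mapped = frozenset({"Id", "Master", "AnotherDev"})

validated = startup_check(mapped, physical, stored=None)  # slow path, superset OK
print(startup_check(mapped, physical, stored=validated) == mapped)  # True, fast path
```

    In this sketch the check depends only on whether the resulting schema covers the map, not on which migration ran last. Two parallel migrations would therefore not be rejected as long as, together, they produce the columns the model needs.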

    As I write this, I think this is roughly what happens when working entirely with automatic migrations. However, I'm working with explicit migrations, on the theory that I want developers to be aware of the changes they are effecting on the DB, and because I believe the process by which DBs evolve is more consistent across environments (developers, integ, test, stage, prod) with explicit migrations. So maybe explicit migrations would simply benefit from being handled differently. But then I imagine it might be harder to mix automatic and explicit migrations over time.

    So, I'm hoping the EF team gives this more thought and sees a clear path forward. I gave my votes to the EF feature request item referenced above (http://data.uservoice.com/forums/72025-entity-framework-feature-suggestions/suggestions/2886670-support-for-parallel-development).

  11. Just adding a note for people who may or may not read this... I have come up with another workaround, using some code pulled from Entity Framework. If you can handcraft your database context to reflect point-in-time changes, you can generate your own hash values and shove them into old migrations.

    The hash value in a migration's .resx file is just a hash of your current model. So all you need to do is create a new DatabaseContext for that 'migration' (your changes plus any other changes you merged in), use the EdmxWriter in EF to build the current model, then gzip it and convert it to a Base64 string.

    If you do all this, you can modify old migrations to 'sequence' them, instead of EF exploding and you having to re-create them every time...
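
    The encoding step described above looks roughly like the following Python sketch. As described, the value is strictly a compressed, Base64-encoded copy of the model rather than a hash. The EDMX string is a placeholder; in EF it would come from EdmxWriter. The output will not be byte-for-byte identical to what .NET produces, because gzip headers include a timestamp and compression settings may differ. Treat it as an illustration of the format only.

```python
import base64
import gzip


def encode_model(edmx_xml: str) -> str:
    """Gzip-compress the model's EDMX XML, then encode the bytes as Base64 text."""
    compressed = gzip.compress(edmx_xml.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_model(encoded: str) -> str:
    """Reverse encode_model: Base64-decode, then decompress back to the XML."""
    return gzip.decompress(base64.b64decode(encoded)).decode("utf-8")


# Placeholder XML standing in for the output of EF's EdmxWriter.
edmx = "<Edmx><!-- placeholder model with Master and AnotherDev --></Edmx>"
encoded = encode_model(edmx)
print(decode_model(encoded) == edmx)  # True
```

    Decoding an existing migration's value the same way is a quick way to check which model state that migration expects before editing it by hand.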

  12. Thanks for one more option. I wish we didn't have to use these workarounds.
