Friday 1 July 2016

Software Engineering : My History with Revision Control (Issues with Git)

I'm sure most of you can tell I'm one of those developers whom has been using Revision control systems for a long time... So long in fact, that I wrote a program for my Atari ST; whilst at college; which would span files across multiple floppy disks and use a basic form of LZH to compress them.

Later, when I graduated I worked for a company using a home-brew revision control system, imaginatively called "RCS".  Which basically zipped the whole folder up and posted it to their server, or unzipped it and passed it back to you, there was no way to merge changes between developers, it was a one task, one worker, one at a time system; almost as lacking in use as my floppy based solution from six years prior.

During my years at university, revision control was not a huge issue, it was NEVER mentioned, never even thought about.  Yet today we, sometimes happily, live in a world where software engineers need to use revision control.  Not only to ensure we keep our code safe, but to facilitate collaborative working, to control the ever growing spread of files and the expanding scope of most all projects beyond the control of a single person.

Now, I came to using professional grade revision control with subversion, in early 2004.  I think we were a very early adopter of Subversion in fact, and we spent a lot of time working with it.

If you've ever taken a look around my blog posts you will see Subversion is mentioned and tutorials exist for it all befitting for nearly twelve years working with it.  And unlike the comments made by Linus Torvalds I totally believe Subversion works, and works well.  It is not perfect, but I find it fits my ways of working pretty well.

Perhaps after twelve years my ways of working have evolved to adopt subversion and vice versa, but whatever the situation, I'm currently being forced  down the route of using git a lot more.

Now, I have no issues with git when working with them locally, ALL my issues are with using git remotely, firstly the person whom (in the office) elected to use git, started off working alone, he was creating a compiler project with C# and so he just had it all locally and used Visual Studio plug-ins to push to a local repo, all was fine.

I've used git with local repos without problem.

All the problems come with pulling and pushing, with remote, and controlling that access.  Git intrinsically fails to protect the access to the repo easily, relying instead on the underlying operating system.  Which is fine, when you have a controlled, and easy to manage user base as with a Linux server, however, with the minefield of integrating with Active Directories, domains and whatever on windows based infrastructure nothing but problem comes up.

The next problem I've had with Git has been the handling of non-mergable files.  We have lots of digital files, movies, sounds and plenty of graphics.  As such we've had to work around git, by having people work on files one at a time, and to cross reference which files they are responsible for.  With an art crew of five people, this means a flip chart or white board is constantly listed with the media files, and someones initials are next to it, just to help control access.

"Surely git should be able to lock these files", they constantly cry.  No, how can it, how can a distributed control system manage locks across five or more repos's which are not talking to one another, and if you did elect one to be considered the master, how do you then transmit out to the passive clients every time you lock or release a file?  You can't, the artists would each have to remember to pull, or shout to each other to pull now!  It simply doesn't work.

And as a way of working the white board is pretty poor, but it's all we have right now.

The next problem we had was the massive amount of disk space being used by the repos.  We boot our machines off of very small (128GB) drives, then use either NAS or SAN for our main storage.  This was fine, and efficient, and critically it was all well backed up on the infrastructure we use, and it worked for twelve years with subversion.  However, with Git our huge files are constantly being snapshotted, this growth in the size of the overall repo is replicating files over and over and over.

In short, despite someone else, and the world at large turning its back on Subversion, we here in my area are strongly drifting back to Subversion.

Trouble is, it feels as though we're swimming against the tide, despite all these slight deficiencies in Git, the over all organisation; and even external projects I'm working on; are pushing Git.  Torvalds himself calls people still working on Subversion "brain dead".  But has he thought about the short comings?  Or these case-studies we can give where subversion is a better fit for our working style?

Above all this wrangling internally has been my problem expressing our situation with Git to both the initiated and uninitiated.  When talking to those advocates of Git, all sorts of acronyms, actions and comments are made "use git this", "use git that".  The problem being, there are something like about 130+ commands in Git, that's a huge amount of things you can work with.  But, we can break down what we've done as "git init", "git checkout", "git add", "git commit", "git push", "git pull" and "git status" (as I've said merging utterly failed, so I'll gloss over that right now).

Given this huge scope of possible usage, and such a small exposure experience it's hard to put words against why things were not a good fit with Git, the initiated always seem to argue "you didn't give it a good crack of the whip".  But we don't work in an environment where we can try one thing and then another, it's an old working structure, which has evolved over time, people are used to it; and I'm nearly 40 yet I'm the youngest guy here!  Training those around me in new ways of working is very much an uphill struggle.  So, when introducing something as alien to their mindset as Git, it was always a loosing battle.

To express this to the uninitiated is even harder, they don't know what an RCS does, nor what we mean by centralised or distributed control, they just want to see our work kept safe, and our work to be released to the customer.  Gripes about Git and Subversion make no in roads with them, they're just unimpressed when you explain that these solutions are both open source and have no support.  The fact that they're free has been wildly ignored, yet I could - for the price of the support contract of another system here - easily buy and operate a whole new SAN just for our needs!

Lucky for me though, after struggling with this issue, I ran across Peter Lundgren's post on the same topic, of expressing what's wrong with Git.  He doesn't advocate Subversion, or anything, over Git, he just lists the problems he had with Git, and he crosses much of the same ground I have had to.

No comments:

Post a Comment