Sunday, January 18, 2015

Simple architecture is harder than complex

I want to write about the architecture of the latest commercial software project I developed (http://iseepro.ru/), and I want to do it in a storytelling manner. The reason I chose a narrative style is that it is not the architecture itself that is valuable (there is nothing special about it, as you are going to see), but the way we got to it: the decisions and considerations that led us to where we currently are. We were restricted both in time and in budget (we had neither), but nevertheless managed to come up with the best product available on the market (this is very subjective until proven by others, forgive me). This is a story of simplifying everything to the very extreme.


Before I start, please check out some statistics. The infamous Chaos Report published by the Standish Group in 1995 revealed horrible numbers: 84% of IT projects were considered failures. The survey involved managers of all levels from several thousand companies across all industries involved in IT activities. Well, 1995 is almost the Middle Ages; we can only feel sorry for the IT dinosaurs. Technologies have made tremendous progress since then, both in the complexity of the tasks solved and in ease of adoption. We'd expect drastically different numbers nowadays, right? IBM published results of a similar survey conducted in 2008. The numbers were not that promising either: 60% of projects failed in one way or another. Imagine: with .NET 3.5, Java 6, RoR 2.x, with all that goodness, we still failed more than every second project. It seems the problem is not the technology alone. The main reason projects fail is lack of focus (McKinsey, link). It is the lack of focus on the business problem that leads to over-complicated solutions in a desperate rush for superfluous qualities. This is why we break schedules and run over budget.


With this in mind we started developing our Cash Control system.
Cash Control systems are used to mitigate the risk of money loss at points of sale. They help fight internal and external fraud in retail.
The appliance consists of a CCTV system integrated with a set of analytical tools operating on the devices' event streams.
The system is relatively simple, and its functional requirements can be captured in three simple bullet points. The system should be able to:
  • mix and correlate the video stream with the device event stream, both in real time and after the fact;
  • match risky behaviour patterns in the device event flow in real time against customizable templates;
  • provide post-analysis tools and generate reports by cashdesk, cashier, goods and alerts found.


Non-functional requirements can be captured in a relatively short list as well. The system should be:
  • open for integration with devices of different types:
    • cashdesks
    • petrol stations, etc.
  • open for integration with different CCTV systems. Even though our default package includes our partner's system, a client may have already invested in another CCTV appliance. In that case we provide them with the analytics tool package, integrating it with the CCTV system of their choice;
  • cross-platform;
  • scalable, as our clients may be as small as one shop with several cashdesks or as huge as a network of hypermarkets.


We set our first goal: to launch the system in a big hypermarket in Moscow within half a year. Please bear in mind that none of us left their primary occupation.


Can you imagine how happy my inner architecture astronaut became when he met this kind of challenge? Immediately, without a single moment of thinking, I got a message from the stars with the overall system design. Nice, isn't it? So here is the ideal-world picture (or the lack-of-oxygen picture, depending on how you see it).





This solution is so fucking universal that it could install and administer itself, if only my inner architecture astronaut had stayed without oxygen a little longer. It can serve any client, no matter how big. How could it be otherwise when you have SPA, nginx, Thin, Cramp, node.js, NoSQL, MapReduce, AMQP, RabbitMQ?!


A couple of years ago I had the joy of attending QCon London. I spent two days in a row at my favourite real-life projects track. Should I mention that by the end of the second day I had already had enough of epic MySQL-to-Postgres (and vice versa) migrations? The last talk was given by the founders of Triposo, both ex-Googlers. And believe it or not, I was rewarded for my patience by these guys. Their attitude seemed so different from the others'. They didn't bother building super-reliable infrastructure, but used existing, available infrastructure instead: they kept their knowledge base backups in Dropbox; had static content stored and served by Amazon S3; put the dynamic settings that change the applications' behaviour into a Google Docs spreadsheet. Em.. Whoat?! Google Docs as a part of a system?! Believe it or not, these ex-Googlers didn't create a clusterized, parallelized, distributed, highly available, redundant infrastructure. They chose not to do it at all. Their capital expenses were 'huge': enough to allow themselves two workstations under a kitchen table. That's where all the magic happens - these two machines generate and publish travel guides.
So what's so special about these guys? They managed to build a successful business because they focused on their competitive advantage: an extensible knowledge base, an automatic guide builder and UX.


Our competitive advantage, the one to focus on, is the most convenient analytics toolset - one that solves our clients' real-life issues. Our major clients are retail chains. Our first playground is a hypermarket in Moscow. The facts are:
  1. Real-time monitoring performed by guards is not the main use case by any means. The main scenario is alert post-analysis performed by a security analyst.
  2. Load is not evenly distributed throughout the day and has a certain profile. It grows slowly from morning till evening and peaks at rush hours; it is almost flat around zero during nighttime. On days before holidays and weekends the load may increase by a factor of two or three.
  3. Retail infrastructure is mostly built on Windows.
  4. Our software is delivered on-premises.


Data-layer.
My inner architecture astronaut drew a NoSQL database in that design. In general I salute such a choice, as long as it at least shows that you have thought about your data and the best way to store it. But that is less and less often the case nowadays.
But why NoSQL? First of all, data volume. We examined a cashdesk communication protocol and estimated our hypermarket to produce around 600K events daily, or 200M annually. With an average message size of around 1 KB, that totals ~600 MB daily, or ~200 GB annually, for just one hypermarket. And several hypermarkets united in a single network may generate an order of magnitude more events. NoSQL works well with huge amounts of non-relational data, as such data is easily distributable by its nature. The second reason to opt for NoSQL is write performance. It is obvious that write operations will prevail over reads in an aggregating system, and NoSQL has good write performance, since writes are as easy to distribute as the data itself is to partition. And the last, but probably the most important, reason to choose NoSQL is the lack of schema. Initially we preferred not to restrict the message schema, since different protocols have different fields in their messages; there are few messages with an identical set of fields even within one protocol.
Considering that we were going to need elaborate search instruments, our best bet would have been a document-oriented database like MongoDB. To accommodate the huge dataset we would, of course, have had several instances holding several shards in a cluster.
However, as I've already mentioned, we were going to deploy our product on-premises, and we had to take unskilled personnel into account. They can handle routine tasks like a database backup without problems, yet supporting a database cluster is far beyond their capabilities. That would have been a huge risk to overall system reliability. Besides, a client might already have invested money in a storage solution of their own, and chances are very high it would be one or another RDBMS. You surely know that switching RDBMSs is far easier than changing a NoSQL db to a relational one, don't you? But I could have dropped all these clever excuses in favor of one simple reason: none of us had production experience with MongoDB. The end.



So, as you might have guessed by now, we chose to store our data in an RDBMS - MySQL. As for the data volume, we decided to postpone that problem until we faced it. After all, the project might not have succeeded at all.
OK, so we now use relational storage, but what happened to the data? Is it relational now? We introduced a canonical event model and started to translate every message arriving in any supported protocol into it. The canonical event model effectively defines the data schema: we know the meaning of the information we store and get rid of what we don't care about. We threw every event message into one big table in denormalized form. In parallel we filled the cashiers, cashdesks and goods dictionaries, duplicating data: an event itself already carries every goods or cashier attribute. The events table had no foreign keys to the dictionaries; after all, using a relational database didn't oblige us to normalize ad nauseam.
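To illustrate, here is roughly what such a canonical event could look like. This is a sketch in C# (the language of our Windows-side code); every field name here is my illustrative assumption, not the actual production model:

    using System;

    // A sketch of a canonical event, not the actual production model:
    // all field names are illustrative assumptions. Every supported device
    // protocol gets translated into this one shape, which effectively
    // fixes the schema.
    public class CanonicalEvent
    {
        public DateTime OccurredAt { get; set; }
        public string CashdeskId { get; set; }   // dictionary data is duplicated
        public string CashierName { get; set; }  // right on the event itself,
        public string GoodsCode { get; set; }    // so no foreign keys are needed
        public string GoodsName { get; set; }
        public decimal Amount { get; set; }
        public string EventType { get; set; }    // e.g. "sale", "void", "refund"
        public string RawPayload { get; set; }   // the original protocol message
    }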


After running the product in trial for several months, we noticed that our queries had gotten slower. Oh gosh, we had forgotten that our dataset grows indefinitely.
After analysing the real processes, we found out that events older than three months are basically rubbish: nobody ever goes that far into the past. We had to get rid of this garbage while guaranteeing constant system performance. I mentioned earlier that we have a certain load profile, with almost absolute zero during the night. We might have used that timespan to run cleanup tasks deleting hundreds of thousands of records. Sounds like a good idea, doesn't it? There is one problem with this approach: deletion is among the most expensive operations in a relational database. You certainly can run a query deleting hundreds of thousands of rows, but it is doubtful it would complete in a reasonable timeframe. There is an easier way to clean up a table - truncate it. But can we truncate parts of a table? Sure we can, by partitioning the table! Where is my cap badge?
We partitioned our events table by month and set up a periodic task to truncate the four-month-old partition. This trick killed two birds with one stone: we got almost constant query performance, as queries always operated on 1-3 partitions, with the majority reading from the latest one, and we solved the data volume problem.
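For illustration, the cleanup task could look something like this. The real system was Ruby, so this C# sketch (using the MySql.Data connector) only shows the shape of the idea; the table name and the monthly partition naming scheme are my assumptions:

    using System;
    using MySql.Data.MySqlClient;

    // Sketch of the periodic cleanup: truncate the partition that is four
    // months old. Assumes monthly partitions named like p201501 on an
    // `events` table - the real naming scheme is a guess.
    class PartitionCleanup
    {
        static void Main()
        {
            var partition = "p" + DateTime.UtcNow.AddMonths(-4).ToString("yyyyMM");

            using (var connection = new MySqlConnection(
                "server=localhost;database=cashcontrol;uid=app;pwd=secret"))
            {
                connection.Open();
                var command = connection.CreateCommand();
                // TRUNCATE PARTITION drops a partition's data almost instantly,
                // unlike DELETE, which would touch every row one by one.
                command.CommandText =
                    "ALTER TABLE events TRUNCATE PARTITION " + partition;
                command.ExecuteNonQuery();
            }
        }
    }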


Reports.
A short note about reports.
After we got rid of NoSQL, we lost the ability to easily run map-reduce tasks on our data. However, we could still aggregate data for reports, of course. There is a standard approach to aggregating data in the relational world: OLAP. Several solutions, free and paid, are available for analysing relational data in many dimensions. Yet after analysing the structure of our information, we came to the conclusion that our database already had an almost classical star ROLAP schema. By adding fact tables referencing the dictionaries, which happened to double as dimensions, we easily solved that not-so-easy task. We forked the activewarehouse gem and adjusted it to our needs. Initially we used its baked-in ETL DSL to extract and transform events into facts and dimensions, running these scripts periodically, but that proved to be very error-prone. Then, given that we already had the ROLAP schema in place, it suddenly dawned on me that we could append facts at the very moment we inserted events into the database. We introduced another subscriber responsible for just that task, and that was it. As a side benefit, our reports became up to date at any moment. Easy-peasy.
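Sketched in the same illustrative C# (type and field names are assumptions, reusing the CanonicalEvent sketch from above), the idea boils down to one subscriber doing two appends:

    using System;

    // Sketch of the "append facts as events arrive" idea. The same subscriber
    // that stores an event also appends a row to the star-schema fact table,
    // so reports are always current and no periodic ETL run is needed.
    public class FactAppendingSubscriber
    {
        private readonly IEventStore _events; // the denormalized events table
        private readonly IFactStore _facts;   // the ROLAP fact table

        public FactAppendingSubscriber(IEventStore events, IFactStore facts)
        {
            _events = events;
            _facts = facts;
        }

        public void Handle(CanonicalEvent e)
        {
            _events.Append(e);
            // Dictionaries (cashiers, cashdesks, goods) double as dimensions,
            // so building a fact is a straight projection of the event.
            _facts.Append(new SaleFact
            {
                OccurredAt = e.OccurredAt,
                CashdeskId = e.CashdeskId,
                CashierName = e.CashierName,
                GoodsCode = e.GoodsCode,
                Amount = e.Amount
            });
        }
    }

    public interface IEventStore { void Append(CanonicalEvent e); }
    public interface IFactStore { void Append(SaleFact f); }

    public class SaleFact
    {
        public DateTime OccurredAt { get; set; }
        public string CashdeskId { get; set; }
        public string CashierName { get; set; }
        public string GoodsCode { get; set; }
        public decimal Amount { get; set; }
    }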


Comet.
I won't elaborate on why we gave up an async web server and an async application framework in favor of Rails and Mongrel. I'll just mention that Rails is a really convenient and powerful tool for rapid application development, provided the application is as simple as ours.


Comet is a web communication pattern in which the browser keeps a long-lived connection to the server, and the server uses it to push data changes to the browser, achieving near real-time behaviour on the Web.
I once led a project where we had to communicate with millions of connected devices and propagate changes to the web layer in near real-time fashion. We found a powerful and scalable tool for that problem: the system was built on MongooseIM, an Erlang XMPP server. This marvelous piece of software allowed us to handle extensive XMPP traffic along with BOSH traffic. It also held transient state for the web layer. A prodigy.
Should I mention that when I hear the words 'comet', 'device' and 'real-time' together in one sentence, I first think of Mongoose? This case was no exception. However, considering that no hypermarket possessed a thousandth of the devices this server was capable of serving (alone or in a cluster), I decided to hold back this secret weapon and opt for the equally cool WebSocket and node.js technologies.
WebSocket is a low-overhead framed protocol standardized as part of the HTML5 effort. Its main advantage is that it is a standard rather than a proprietary technology built as a browser plugin, and it is supported by most modern web browsers. node.js is a very fast web server with an evented processing model built on top of libev. Unfortunately, we immediately found out that WebSocket is supported by IE only starting with version 10, whereas our clients had IE9 installed in the majority of cases. All in all, WebSocket was abandoned pretty fast, even before the first prototype was written.
With the devices' communication protocol at hand, I calculated the load that a huge hypermarket (huge hypermarkets here in Moscow have up to 50 cashdesks) would generate. I got an exaggerated figure of 30 events per second. It means that our comet server would have to distribute 30*Nusers events per second.
If you remember the facts I listed at the very beginning, there is only one case where monitoring is involved: security supervision. Even the biggest markets hardly have more than one security desk, so Nusers for monitoring equals 1. Moreover, considering that you hardly ever see all cashdesks working at the same time (the latest Ruble crisis is not a representative example) and that one can hardly perceive more than 7 objects simultaneously, the real load on our comet server for monitoring events works out to something like 30 * 2/3 * 1/6 ≈ 3 events per second.
Keeping in mind that the web server was the least loaded part of the system, it could easily serve these requests even with primitive polling.



After a short prototyping iteration we ended up with a solution incorporating an EventMachine thread inside the web application. EventMachine hosted an AMQP subscriber consuming certain cashdesks' events from transient queues and pushing them into memory buffers, which were popped by a regular Rails controller serving Ajax requests issued by the client. One small thing to note: we optimised the traffic a bit by using long polling. This is how we got rid of WebSockets and node.js.
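A minimal sketch of the buffering half of that scheme, in C# for illustration only (the real implementation was an EventMachine thread plus a Rails controller): the AMQP subscriber publishes into an in-memory buffer, and each long-poll request drains the buffer, waiting up to a timeout when it is empty.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    // Long-polling buffer sketch. The AMQP subscriber calls Publish() for
    // every cashdesk event; the HTTP handler calls Drain(), which returns
    // immediately when events are buffered or waits up to `timeout` otherwise.
    public class MonitoringBuffer
    {
        private readonly ConcurrentQueue<string> _buffer = new ConcurrentQueue<string>();
        private readonly SemaphoreSlim _available = new SemaphoreSlim(0);

        public void Publish(string evt)
        {
            _buffer.Enqueue(evt);
            _available.Release();
        }

        public List<string> Drain(TimeSpan timeout)
        {
            var batch = new List<string>();
            // Block until at least one event arrives or the long poll times out.
            if (_available.Wait(timeout))
            {
                string evt;
                while (_buffer.TryDequeue(out evt))
                    batch.Add(evt);
            }
            return batch; // an empty batch means the client just polls again
        }
    }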


.Net.
This is controversial: throughout the story I have been getting rid of system components wherever possible, and here I tell you that we brought a whole enterprise framework with us. What for?
Well, first of all, we didn't carry it with the solution: every modern Windows Server OS has the .NET framework preinstalled as part of the system. We used .NET to develop our Watchdog service. Deploying on-premises meant we had absolutely no chance to administer our installations remotely; the system had to be rock solid, which was easier to achieve with native tools. That is why I wrote it in C#. Watchdog, unlike the other services, was not required to be portable across platforms: its implementation is target-OS dependent by nature, and the fact that we used custom code to automatically maintain our processes on Windows didn't mean we were supposed to take the same approach on other platforms. Something like crontab would solve the problem on *nix systems.
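The essence of such a service fits in a few lines. A minimal sketch (the real Watchdog is not public; the process names and paths below are made up):

    using System;
    using System.Diagnostics;
    using System.Threading;

    // Minimal watchdog sketch: restart any monitored process that has died.
    // The process names and paths are illustrative, not the real ones.
    class Watchdog
    {
        static readonly string[,] Services =
        {
            { "event-collector", @"C:\cashcontrol\event-collector.exe" },
            { "web",             @"C:\cashcontrol\web.exe" }
        };

        static void Main()
        {
            while (true)
            {
                for (var i = 0; i < Services.GetLength(0); i++)
                {
                    // GetProcessesByName returns an empty array when no
                    // process with that name is running.
                    if (Process.GetProcessesByName(Services[i, 0]).Length == 0)
                    {
                        Console.WriteLine("{0} is down, restarting", Services[i, 0]);
                        Process.Start(Services[i, 1]);
                    }
                }
                Thread.Sleep(TimeSpan.FromSeconds(10));
            }
        }
    }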





ActiveX.
We were “lucky” to have ActiveX on board, as all our clients used Windows workstations and all the video systems we had integrated so far provided an ActiveX API for programmatic access to video. So we didn't write a universal facade over video systems providing streaming and event publishing features. Instead, we sacrificed cross-platform support for speed of development. This decision, together with many other small decisions we made, allowed us to get things done on time and on budget. After all, there were no guarantees that we would succeed, so we wanted to move fast with short sprints.


Hardware
I bet you won't guess the hardware we needed to host the solution.
It was one or two average-size servers, depending on the availability requirements our clients had. And there is not much you can do about availability anyway when you can't ignore regular power outages on site.


Conclusions
We managed to deploy our first release to production after 6 months of development, with 2 developers and one designer working part time. We shipped 90% of the features required to compete on the market.


The reason we managed to get things done is that we focused on business features rather than on technology, avoiding valueless experiments.





The end.

Sunday, November 9, 2014

Highload 2014

If you ever happened to hear about the biggest Russian IT conference, Highload++, then you might be interested in how it was. This is how it was for me this year: Highload 2014

I have deliberately chosen an external publishing platform for its beautiful storytelling format. Give it a try - Tilda.

Cheers.

Tuesday, October 28, 2014

Why REST is not JSON over HTTP

Because REST is distributed Hypermedia. THE END. 
Modern Web Architectures.

Enjoy.

Thursday, July 24, 2014

Teach your code to speak genteelly, nil is country speech

Imagine you ask a person something, but he just stares at you in response. Or even worse, you tell him to do what you want, and he seems to get your command but turns his back on you, and you can't tell whether he did what you asked or not. Did you imagine it?
Congratulations! You have just got nil/null, or whatever represents nothingness in your programming language, in response to a function call.

Just as you consider a person who doesn't react appropriately to your questions or requests an ass-head, a function returning nil is a moron-function.

I do know there are cases when you have nothing to return, but this is either a Special Case or an exceptional situation. Let's consider two examples, one for a query and one for a command. Yes, to be genteel your code has to comply with the CQS principle.

First, the query. To be honest, whether a Special Case or an exception is better depends very much on the context. Let's consider two query categories: queries for data (information, a data structure) and queries for behaviour. The first case is easy: you fill your data structure with whatever turns it into an empty state, if an empty state has a meaning, or you raise an exception if you can't fulfill the query. Easy peasy. The second case is a little more complicated. Let's say you write a polymorphic query function that is supposed to return objects with state and behaviour that may be used to do something. Since it is polymorphic, it changes behaviour when you work with descendant classes, i.e. it returns different objects with different behaviour, but not every descendant alters the default behaviour. Oh, and there is no default behaviour. The natural choice? Nil. But if I put it another way - to do nothing IS a default behaviour - it becomes obvious that your best bet is a Special Case with a no-op strategy rather than nil, which is completely nothing.
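A minimal C# sketch of that no-op Special Case (all types here are made up for illustration):

    // The polymorphic query never returns nil: descendants that don't alter
    // the default behaviour get a Special Case whose behaviour is "do nothing".
    // All types are made up for illustration.
    public interface IDiscountPolicy
    {
        decimal Apply(decimal price);
    }

    public class SeasonalDiscount : IDiscountPolicy
    {
        public decimal Apply(decimal price) { return price * 0.9m; }
    }

    // "To do nothing IS a default behaviour" - expressed as an object.
    public class NoDiscount : IDiscountPolicy
    {
        public decimal Apply(decimal price) { return price; }
    }

    public class DiscountPolicies
    {
        public IDiscountPolicy ForCustomer(string customerId)
        {
            if (customerId.StartsWith("VIP"))
                return new SeasonalDiscount();
            return new NoDiscount(); // never null; callers never null-check
        }
    }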

Now the command. I'd separate commands into two kinds: those you do not expect to return a result and those you do. The first kind is naturally implemented by procedures; some programming languages have a void return type built in. This is probably the only valid case of returning nothing. However, if your language of choice has no built-in way to mark a function as a procedure, then treat all commands as ones that return a result. In that case you'd probably be tempted to return nothing when you fail to accomplish the request, which is the worst way to do it. Use an exception instead to inform the caller that his request puts you into an exceptional situation where you can't process the command.
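And the command side, sketched the same way (again with made-up types): a command that cannot fulfill the request throws instead of silently returning nothing.

    using System;

    // A command either succeeds or throws; it never answers with null.
    // The type is made up for illustration.
    public class CashdeskGate
    {
        private bool _connected;

        public void Open()
        {
            if (!_connected)
                throw new InvalidOperationException(
                    "Cannot open the gate: the cashdesk is not connected.");
            // ... perform the command; a void procedure needs no return value
        }
    }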

All in all, it is very easy to be polite to those who use your code and to those who read it.

To be a little more convincing, I'll again reference one of the best programming books ever - Clean Code - and a very good discussion.

And as if that was not enough already I'd quote Sir Charles Antony Richard Hoare:
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. [...] More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

P.S. Should I mention that passing nil/null as a function argument is even worse, for very obvious reasons?
P.P.S. I'd like to try a programming language without nils. Without nothing. You got it.

Cheers.

Tuesday, May 20, 2014

TDD and Rails are sister pills

Recently DHH, a man not unknown in a certain community, burst out with a remarkable rant about TDD, which in his opinion is nothing but an artifact picked up at the garbage dump of history, and whose application reminds him of masturbation with elements of BDSM.

Much has been said in defence of TDD - or not so much in defence, really, as just for laughs.

But if we suppose that Zed Shaw is at least half right, then the whole crowd that now gapes in awe at the deity will rush to quit and start starting all over again.

Everyone has long understood that OOP is a cure almost worse than the disease, and taking it has to be accompanied by pills like TDD, BDD and DDD so that you soil yourself less often. But these pills have side effects of their own and need yet other pills, like AutoFixture (a little gift for you, .NET comrades) and DCI (that one is rather for us Ruby guys - tough luck, dinosaurs =P).

Then again, if you dilute the medicine, things are not so bad. Udi Dahan, for example, has plenty of original ideas, but they concern the development of systems rather than applications (I hope the difference is clear). And is OOP really that necessary for application development, let alone TDD?
Maybe Basecamp is not every application/system in the world after all, and one should be more open-minded, eh DHH? It feels strange to even ask that of a man who dumped on OOP in favour of developer productivity - after all, to dump competently you have to be open-minded.

By deliberately making productivity the cornerstone principle of Rails, DHH put it in the same category as TDD. They are essentially two pills with different effects, and they don't work together, because they interfere with each other.

Well, there is your explanation. Good luck.

Thursday, March 27, 2014

Rails vs OOP or is it Rails feat. OOP?

Please mind that this note is as opinionated as can be. I believe this is the only way one can speak about Rails, as RoR itself leaves no room for doubt.

I believe Rails was meant for developing small-to-medium web sites and was never meant to be a foundation for something bigger. Just read 37signals' Getting Real book and you will understand that Rails is backed by a company with a no-growth philosophy. And so is Rails: it will never be as big as Java or .NET are.

You can ignore this until you start developing slightly more elaborate software. And this is where you have to get serious about OOP. But Rails provides no aid here. Instead, it leads you into trouble, because it puts too much pressure on you with conventions and especially with poor naming. (I am not trying to charge DHH with that; after all, naming is the weakest skill in the IT community of all times.) Thousands of people fall into the trap of not understanding where they should rethink Rails conventions and deviate from them, which leads to heated discussions on whether Rails breaks OOP and how to keep producing quality, maintainable and testable code with Rails (the fat models/skinny controllers attempt, writing true unit tests with Rails, is ActiveRecord an anti-pattern, etc.). Well, the answer is: get to know some OOP theory. These are problems the OOP community has been solving for decades now, and there is nothing really new about them.

Oh wait, there is something new! Rails deserves special gratitude for whetting the community's appetite for the little-known DCI design pattern.

The Rails community experiences lots of agitation and controversy, and it couldn't be any other way. The reason is that RoR emphasizes developer productivity in the first place rather than being dogmatic about design. Understanding this may make us a little less dogmatic about how Rails is used.

More to read:
https://gettingreal.37signals.com/
http://railsoopbook.com/
http://objectsonrails.com/
http://www.leansoftwarearchitecture.com/
http://clean-ruby.com/
http://www.artima.com/articles/dci_vision.html
http://folk.uio.no/trygver/2008/commonsense.pdf

Good luck!

Sunday, August 4, 2013

TDD adventures Part 1. Productive TDD from real life.

What follows is a real-life demonstration of how effective one can be applying TDD to solve tasks under stress.
Imagine that you are limited on time, you are under stress, and you face a task you've never solved before. Did I mention that you don't know the problem domain and that your fate depends on the solution you will be judged by? How would you tackle this problem? What would you say if I told you to write a test first? Sounds crazy, eh? You have so many things to do, and you don't want to waste time on testing. Afterwards, maybe... First you have to write... What are you going to write first? Remember, you aren't familiar with the problem. Write a test.

Alright, now the problem statement is as follows:
"Compare two CIDR ranges. 
The significant part of IPv4 Address is prefix
255.255.255.255/24 notation means that only the first 24 bits of the IPv4 address make up the network mask. The bits which are not included in the network mask are not important.
There are several ways to write a CIDR range
These representations are equivalent:
255.255.255.255/24
255.255.255.0/24
255.255.255.100/24
Comparing CIDR Ranges
Ranges can be in subset, superset, disjoint or equal to each other. Write a code comparing them." 

- "Fuck, I never knew what this CIDR notation ment. I should have read about it before. Alright, let's google it. Whoooaaa! I'll read through this for half an hour. What do I do? I've already applied TDD in day to day job, I'll give it a try. First things first, I'll write down what I know about the problem. This is going to be my test list."
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • Only the Prefix of the IPv4 Address is Significant;
  • A CIDR Range Has Multiple Representations;
- "That will do for the moment. Which one of them do I pick first? I should pick the simplest to implement it as fast as possible. The first one is an ideal candidate for Starter Test."

    using NUnit.Framework;

    [TestFixture]
    public class CidrRangeTests
    {
        [Test]
        [TestCase("23.45.67.89/16", 23, 16)]
        public void when_instantiated_parses_string_into_address_prefix_and_suffix(string range, byte expectedMostSignificantByte, byte expectedSuffix)
        {
            //arrange 
            //act
            var sut = new CidrRange(range);

            //assert
            Assert.AreEqual(expectedMostSignificantByte, sut.Prefix[0]);
            Assert.AreEqual(expectedSuffix, sut.Suffix);
        }
    }

- "Now that the test is written I am going to write the CidrRange class itself. Forgive me Kent, but I am about to jump over one step - I won't Fake It, I consider this code to be an Obvious Implementation and 'll write it straight away. I had lot's of choclate and I can keep things in mind and I have very short time".

    using System;
    using System.Collections;
    using System.Linq;

    public class CidrRange
    {
        //auto-props for the sake of simplicity
        public byte Suffix { get; private set; }
        public byte[] Prefix { get; private set; }

        public CidrRange(string range)
        {
            //assert not null
            var parts = range.Split('/');
            //assert count 2
            var prefixParts = parts[0].Split('.');
            //assert count 4
            Prefix = prefixParts.Select(byte.Parse).ToArray();
            Suffix = byte.Parse(parts[1]);
        }
     }

- "Green bar, hooray. Time spent? 10 minutes. OK, whats next? The next simplest test to implement - the shortest way to the green bar, is the second one: Ranges with equal prefix and suffix are equal".


        [Test]
        [TestCase("23.45.67.89/16")]
        [TestCase("1.2.3.4/24")]
        [TestCase("172.84.26.128/16")]
        [TestCase("197.54.16.128/25")]
        public void when_prefix_and_suffix_are_equal_should_consider_other_to_be_equal(string range)
        {
            //arrange 
            var sut = new CidrRange(range);
            var other = new CidrRange(range);

            //act
            //assert
            Assert.AreEqual(sut, other);
        }

- "And again, I triangulate and replace fake with implementation in one step. Remeber, I had lot's of chocolate."


        protected bool Equals(CidrRange other)
        {
            return Suffix == other.Suffix && Prefix.SequenceEqual(other.Prefix);
        }

- "OK, running the tests gives me a green bar. Yeah baby. What do I have to complete?"
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • Only the Prefix of the IPv4 Address is Significant;
  • A CIDR Range Has Multiple Representations;
- "OK, now stop and think. The last one seems to broaden the equality equation for CIDR range. If I put it another way it may sound like the following".
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
- "Cool, I feel for the core of the solution. Now I'll think more about the former: Only the Prefix of the IPv4 Address is Significant, - I may rephrase it like the following:
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
- "OK, so now my test list looks like this":
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
- "Let's code!"

        [Test]
        [TestCase("197.54.16.128/25", "198.54.16.128/25")]
        public void when_significant_prefix_bits_differ_should_not_consider_other_to_be_equal(string leftRange, string rightRange)
        {
            //arrange 
            var sut = new CidrRange(leftRange);
            var other = new CidrRange(rightRange);

            //act
            //assert
            Assert.AreNotEqual(sut, other);
        }

- "Red bar. OK, obviously in order to implement desired behavior I can no longer compare bytes, I have to drill down to bits. What facilities does BCL provides for this? I remember BitConverter and also there is a BitArray class. A-ha! This is what I am going to use, 'cause I don't really want to ressurect my middle age address arithmetic skills. The algorythm is simple: Given the CIDR range 255.255.255.255/24 we know that only the first 24 bits of the IPv4 address make up the network mask, - it means I have to take a prefix, turn it into a bit array, take a suffix, turn it into another bit array starting from the most significant bit and perform a bitwise AND operation in order to get a network mask. After I got both masks, I have to perform an operation, that will tell me if there any difference between them. Well, luckily I had lot's of choclate, and it's obvious that I need to perform a bitwise XOR".


        protected bool Equals(CidrRange other)
        {
            return Suffix == other.Suffix && PrefixesEquals(other);
        }

        private bool PrefixesEquals(CidrRange other)
        {
            var leftMask = new BitArray(Prefix);
            var leftSuffix = new BitArray(32, false);
            for (int i = 31; i >= Suffix; i--)
            {
                leftSuffix[i] = true;
            }
            leftMask = leftSuffix.And(leftMask);

            var rightMask = new BitArray(other.Prefix);
            var rightSuffix = new BitArray(32, false);
            for (int i = 31; i >= other.Suffix; i--)
            {
                rightSuffix[i] = true;
            }
            rightMask = rightSuffix.And(rightMask);

            var result = rightMask.Xor(leftMask);
            return result.Cast<bool>().All(x => !x);
        }

- "Hmmm, red bar. Something is wrong here."
After five minutes in debugger it appeared that I was creating bit array for mask based on prefix with bytes in reversed order.


    public class CidrRange
    {
        //auto-props for the sake of simplicity
        public byte Suffix { get; private set; }
        public byte[] Prefix { get; private set; }

        public CidrRange(string range)
        {
            //assert not null
            var parts = range.Split('/');
            //assert count 2
            var prefixParts = parts[0].Split('.').Reverse();
            //assert count 4
            Prefix = prefixParts.Select(byte.Parse).ToArray();
            Suffix = byte.Parse(parts[1]);
        }
    }

- "Arghhh, red bar. I'm toast. Come on, keep it on, I've almost found the solution and I've got tests that will prove it. The broken test is simple - Starter test expects to see bytes in reversed order, I'll fix it in a twinkle".


        [Test]
        [TestCase("23.45.67.89/16", 23, 16)]
        public void when_instantiated_parses_string_into_address_prefix_and_suffix(string range, byte expectedMostSignificantByte, byte expectedSuffix)
        {
            //arrange 
            //act
            var sut = new CidrRange(range);

            //assert
            Assert.AreEqual(expectedMostSignificantByte, sut.Prefix[3]);
            Assert.AreEqual(expectedSuffix, sut.Suffix);
        }

- "Finally! Green bar. It took me some time to get here, if I wasn't low on time I'd better be going teeny weeny steps of Fake and Triangulate. OK, now the test list looks like the following".
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
- "Let's code."


        [Test]
        [TestCase("23.45.67.89/16", "23.45.68.00/16")]
        public void when_insignificant_prefix_bits_differ_should_consider_other_to_be_equal(string leftRange, string rightRange)
        {
            //arrange 
            var sut = new CidrRange(leftRange);
            var other = new CidrRange(rightRange);

            //act
            //assert
            Assert.AreEqual(sut, other);
        }

- "red bar, eh? I was supposed to get a green bar!"

After another five minutes in the debugger it turned out that I had filled the suffix mask with the wrong number of bits and had taken the least significant bits into account when comparing the network masks. The fix follows.


        private bool PrefixesEquals(CidrRange other)
        {
            var leftMask = new BitArray(Prefix);
            var leftSuffix = new BitArray(32, false);
            for (int i = 31; i >= 32 - Suffix; i--)
            {
                leftSuffix[i] = true;
            }
            leftMask = leftSuffix.And(leftMask);

            var rightMask = new BitArray(other.Prefix);
            var rightSuffix = new BitArray(32, false);
            for (int i = 31; i >= 32 - other.Suffix; i--)
            {
                rightSuffix[i] = true;
            }
            rightMask = rightSuffix.And(rightMask);

            var result = rightMask.Xor(leftMask);
            return result.Cast<bool>().Skip(32 - Math.Min(Suffix, other.Suffix)).All(x => !x);
        }

- "A wonderful, beautiful green bar! OK, what do I have?"
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
- "Well, this is not the end, but this was the most interesting part. Following is what is left".
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
  • CIDR range is a subset of other range if prefixes are equal and the suffix is greater;
  • CIDR range is a superset of other range if prefixes are equal and the suffix is less;
  • CIDR range is a disjoint of other range if prefixes differ;
- "I'll take them one by one."

        [Test]
        [TestCase("1.2.3.4/24", "1.2.3.4/16")]
        public void when_prefixes_are_equal_and_left_suffix_is_greater_should_consider_left_to_be_a_subset(string leftRange, string rightRange)
        {
            //arrange 
            var sut = new CidrRange(leftRange);
            var right = new CidrRange(rightRange);

            //act
            var actual = sut.CompareTo(right);
            var expected = sut.Prefix.SequenceEqual(right.Prefix) && sut.Suffix > right.Suffix
                ? RangeIntersectionResult.Subset
                : RangeIntersectionResult.Disjoint;

            //assert
            Assert.AreEqual(expected, actual);
        }

- "Obviously there are more then two results of ranges comparison. There are four of them as follows".


    public enum RangeIntersectionResult
    {
        Equals,
        Subset,
        Superset,
        Disjoint
    }

- "And to compare my range objects I will need another method."


        public RangeIntersectionResult CompareTo(CidrRange other)
        {
            if (Equals(other))
                return RangeIntersectionResult.Equals;

            if (PrefixesEquals(other) && Suffix > other.Suffix)
                return RangeIntersectionResult.Subset;

            return RangeIntersectionResult.Disjoint;
        }

- "Green bar! Hooray."
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
  • CIDR range is a subset of other range if prefixes are equal and the suffix is greater;
  • CIDR range is a superset of other range if prefixes are equal and the suffix is less;
  • CIDR range is a disjoint of other range if prefixes differ;
        [Test]
        [TestCase("172.84.26.128/16", "172.84.26.255/24")]
        public void when_prefixes_are_equal_and_left_suffix_is_less_should_consider_left_to_be_a_superset(string leftRange, string rightRange)
        {
            //arrange 
            var sut = new CidrRange(leftRange);
            var right = new CidrRange(rightRange);

            //act
            var actual = sut.CompareTo(right);
            var expected = sut.Prefix.Skip(1).SequenceEqual(right.Prefix.Skip(1)) && sut.Suffix < right.Suffix
                ? RangeIntersectionResult.Superset
                : RangeIntersectionResult.Disjoint;

            //assert
            Assert.AreEqual(expected, actual);
        }
        public RangeIntersectionResult CompareTo(CidrRange other)
        {
            if (Equals(other))
                return RangeIntersectionResult.Equals;

            if (PrefixesEquals(other) && Suffix > other.Suffix)
                return RangeIntersectionResult.Subset;
            if (PrefixesEquals(other) && Suffix < other.Suffix)
                return RangeIntersectionResult.Superset;

            return RangeIntersectionResult.Disjoint;
        }
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
  • CIDR range is a subset of other range if prefixes are equal and the suffix is greater;
  • CIDR range is a superset of other range if prefixes are equal and the suffix is less;
  • CIDR range is a disjoint of other range if prefixes differ;
        [Test]
        [TestCase("197.54.16.128/25", "197.54.16.127/25")]
        [TestCase("205.00.00.1/32", "205.00.00.00/32")]
        public void when_prefixes_are_different_should_consider_left_to_be_a_disjoint(string leftRange, string rightRange)
        {
            //arrange 
            var sut = new CidrRange(leftRange);
            var right = new CidrRange(rightRange);

            //act
            var actual = sut.CompareTo(right);
            var expected = !sut.Prefix.SequenceEqual(right.Prefix)//this is not exactly correct specification, but it's OK for the sake of simplicity
                ? RangeIntersectionResult.Disjoint
                : RangeIntersectionResult.Subset;

            //assert
            Assert.AreEqual(expected, actual);
        }
  • CIDR range has prefix and suffix parts;
  • Ranges with equal prefix and suffix are equal;
  • when CIDR ranges with exact suffixes differ in insignificant bits, they are considered equal;
  • when CIDR ranges with exact suffixes differ in significant bits, they aren't considered equal;
  • CIDR range is a subset of other range if prefixes are equal and the suffix is greater;
  • CIDR range is a superset of other range if prefixes are equal and the suffix is less;
  • CIDR range is a disjoint of other range if prefixes differ;
- "Time? Wow, I still got some time left. I've manged to complete in under two hours"

This is a real-life story disproving the myth that TDD is unproductive. Stay tuned as I continue my adventures in TDD.

Next time I will refactor this code in a quiet and peaceful manner.

P.S. You can find code here.