Data Transformations

2021-02-15

A lot of programming can be seen as data transformations.

To illustrate the point, consider this interview question:

Interview question tweet

Typical solutions look like this one by David Beazley.

David Beazley's answer

This question is slightly tricky because you have to manage two things simultaneously as you loop through the data: the current character, and the number of consecutive occurrences of it you have seen so far. There are also some decisions to make about how best to store the data.
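For reference, a typical loop-based solution might look something like the Python sketch below. This is not David Beazley's actual answer, and it assumes the question is the run-length encoding task that the name rle suggests later in this post. The function name rle_loop and the sample input are mine.

```python
def rle_loop(text):
    """Run-length encode text using one explicit loop."""
    runs = []
    current = None  # character of the run in progress
    count = 0       # how many times it has repeated so far
    for ch in text:
        if ch == current:
            count += 1
        else:
            if current is not None:
                runs.append((current, count))  # close the previous run
            current, count = ch, 1
    if current is not None:
        runs.append((current, count))  # flush the final run
    return runs


print(rle_loop("AAAABBBCCDAA"))
```

Note the two pieces of state (current and count) and the extra flush after the loop. That bookkeeping is the trickiness the paragraph above describes.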

If we break the problem down, we can actually perform the task in a different way that sidesteps the trickiness. Let’s treat this problem as, yes, a series of data transformations, each of which will obviously be much simpler in itself than the one larger solution ably provided by David Beazley.

To prove how much we gain by doing this, I will re-pose the original question as a series of questions, each challenging us to perform one component part of the problem. After each question, I will write my answer in my favourite programming language, coincidentally the one I created, FFS Script.

Fake interview tweet 1

Interview question 1 fake tweet

This “interview question” is so simple as to be hilarious. 8 minutes to turn a string into a list?

const strToList = split('');
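For readers who don't know FFS Script, a rough Python equivalent of this step is sketched below. The name str_to_list is mine, and Python's list() stands in for split('').

```python
def str_to_list(text):
    """Turn a string into a list of its single characters."""
    return list(text)


print(str_to_list("AABB"))
```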

Fake interview tweet 2

Interview question 2 fake tweet

This is more difficult: it is the meat of the problem.

const lastChar = do([last, or(''), first]);
const gatherOnce = (l, c) =>
  c == lastChar(l) ? append(last(l) + c, slice(0, -1, l))
  /* otherwise */  : append(c, l);
const gatherSame = reduce(gatherOnce, []);

Even so, we break the problem down into smaller, easier problems. Here we only have to write a single step (gatherOnce), which folds one character into the list of groups built so far, and we let reduce worry about the iteration.
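The same idea can be sketched in Python with functools.reduce. This is an approximation rather than a translation of the FFS Script above: the names gather_once and gather_same are mine, and list slicing stands in for append, last and slice.

```python
from functools import reduce


def gather_once(groups, ch):
    """Fold one character into the list of groups built so far."""
    if groups and groups[-1][0] == ch:
        # Same character as the last group: extend that group.
        return groups[:-1] + [groups[-1] + ch]
    # Otherwise start a new group.
    return groups + [ch]


def gather_same(chars):
    """Group consecutive equal characters; reduce handles the iteration."""
    return reduce(gather_once, chars, [])


print(gather_same(list("AABBBC")))
```

gather_once never loops and never mutates its input. It only answers the question "given the groups so far and one more character, what are the groups now?".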

Fake interview tweet 3

Interview question 3 fake tweet

Again, the question is laughably easy.

const countLengths = map(juxt([first, length]));
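In Python, this step could be sketched as a list comprehension. The name count_lengths is mine, and I use a tuple for each pair where juxt presumably builds a list.

```python
def count_lengths(groups):
    """Map each group such as 'BBB' to a (character, length) pair."""
    return [(group[0], len(group)) for group in groups]


print(count_lengths(["AA", "BBB", "C"]))
```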

Finally, to solve the interview question we just need to sequence our three data transformations in order. That’s also very easy to do.

const rle = do([
    strToList,
    gatherSame,
    countLengths,
]);
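To show the sequencing outside FFS Script, here is a self-contained Python sketch of the whole pipeline. The helper pipeline is a hypothetical stand-in for do, assuming do applies its functions left to right. The other names are mine as well.

```python
from functools import reduce


def pipeline(steps):
    """Return a function that feeds its argument through steps, left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def str_to_list(text):
    return list(text)


def gather_once(groups, ch):
    if groups and groups[-1][0] == ch:
        return groups[:-1] + [groups[-1] + ch]
    return groups + [ch]


def gather_same(chars):
    return reduce(gather_once, chars, [])


def count_lengths(groups):
    return [(group[0], len(group)) for group in groups]


# The three transformations, sequenced in order.
rle = pipeline([str_to_list, gather_same, count_lengths])

print(rle("AAAABBBCCDAA"))
```

Each stage takes the previous stage's output as its only input, which is what makes the final composition a plain list of names.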

Not only have we solved a tricky problem with ease, something else quite odd has happened.

Imagine that our interviewer then asks us what we would do to prove that this solution works and will continue to work in a larger piece of software. The answer is to write unit tests.

Unit testing part 1

Let’s return to my solution to the first re-posed fake question:

const strToList = split('');

The thing is that split is provided to us by the language. We trust that it works; no sensible person unit tests library code that comes with the language. All we are doing is passing it some data, an empty string. We can fire up an interactive prompt and check that it works, but that’s all we ever need to do. Nothing can change in the rest of the code that will break this: it’s not possible to re-define split, and it doesn’t rely on any other code in any way at all. This does not need a unit test, and it would be silly to write one. All my career I have believed that code needs to be tested, and yet this does not.

Unit testing part 2

The solution to the second re-posed fake question is more complex:

const lastChar = do([last, or(''), first]);
const gatherOnce = (l, c) =>
  c == lastChar(l) ? append(last(l) + c, slice(0, -1, l))
  /* otherwise */  : append(c, l);
const gatherSame = reduce(gatherOnce, []);

There are three things here that we could write tests for. The first of them, lastChar, uses four things, but each of them is provided by the language: do, last, or and first. This is the same situation as in the previous solution and no tests are appropriate. The second of them, gatherOnce, contains the meat of our solution and bears unit testing since it contains some actual code.

/* gatherOnce takes the list built so far and one character */
assertEqual('Zero data state', ["A"], gatherOnce([], "A"));
assertEqual('Same character', ["AA"], gatherOnce(["A"], "A"));
assertEqual('New character', ["AA", "B"], gatherOnce(["AA"], "B"));
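The same three cases could be checked in Python with plain assert statements, passing one character at a time. This is a sketch under my own naming (gather_once, test_gather_once), not the FFS Script test API.

```python
def gather_once(groups, ch):
    """Fold one character into the list of groups built so far."""
    if groups and groups[-1][0] == ch:
        return groups[:-1] + [groups[-1] + ch]
    return groups + [ch]


def test_gather_once():
    # Zero data state: the first character starts the first group.
    assert gather_once([], "A") == ["A"]
    # Same character: the last group grows.
    assert gather_once(["A"], "A") == ["AA"]
    # New character: a new group is appended.
    assert gather_once(["AA"], "B") == ["AA", "B"]


test_gather_once()
```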

The third part, gatherSame, uses code we just unit tested and reduce, which is, again, provided to us by the language. Again, we can call up an interactive prompt to make sure gatherOnce works with reduce, but that’s all we need to do. No unit tests are appropriate.

Unit testing part 3

Returning to the solution for the final re-posed interview question:

const countLengths = map(juxt([first, length]));

Again, we use four things (map, juxt, first and length), all of which are provided to us by the language. Once we have used an interactive prompt to run data through it, there’s nothing more to do. It doesn’t make sense to write persistent unit tests for this.

Unit testing part 4

The code to join these transformations together is this:

const rle = do([
    strToList,
    gatherSame,
    countLengths,
]);

We trust do as it is provided to us by the language. We know the three data transformations work and we know do works, therefore we can reason that rle works. Again, it would be silly to write a unit test. We may have a typo, an omission or an ordering problem but that can be established by calling it in an interactive session.

Possible objections

You could object that countLengths might change in the future and therefore rle might stop working. On this basis you should test rle to make sure that it is not broken by a change to something it uses. A possible scenario is that some other code finds countLengths useful and uses it. Later it becomes apparent that countLengths doesn’t quite meet the demands of the new code, so it is changed until it does, breaking rle.

I don’t think this is a reasonable objection in practice, because countLengths does essentially just one thing, and if that one thing is not what you want then you should write something that does do what you want and use that instead.

Data transformations and where you end up

To recap, by treating a problem as a series of data transformations, I managed to break down the problem into smaller and easier parts. Not just a bit easier, but so much easier that it is almost laughable to consider that each part is any kind of problem at all.

Breaking problems down is widely accepted as an essential principle of programming, so viewing problems as data transformations helps you to do the right thing.

Each part is so small as to be almost atomic, impossible to break or affect by code elsewhere. Some component parts are so small that they consist of nothing but things provided by the language. Of course, you should try things out to check they work, but these two reasons show why it doesn’t make sense to write unit tests for much of it.

All things being equal, you should surely prefer to follow widely accepted principles of breaking down your code into simpler parts. The typical answer to the original interview question often does not, because people think (incorrectly in my opinion) that the solution is not complex enough to require it. Similarly, you should surely prefer to write code that 1) does not require testing, 2) re-uses existing code and 3) is immune to being broken by changes elsewhere.

Maybe the amazing thing is that my solution is not the typical one.

#Programming #FP