
Sound advice.

re: Write Your Tests

I've never been successful with this. Sure, write (backfill) as many tests as you can.

But the legacy stuff I've adopted / resurrected has been a complete unknown.

My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

re: Proxy

I've recently started doing shadow testing, where the proxy is a T-split router, sending mirror traffic to both old and new. This can take the place of blackbox (comparison) testing.
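
A minimal sketch of that mirroring idea, assuming a plain HTTP service and two hypothetical upstreams (OLD_BASE / NEW_BASE); callers only ever get the old system's answer, and divergence is just logged:

    # Hypothetical mirror: hit both backends, serve the old response,
    # and log any divergence from the new one for later review.
    import logging
    import requests  # assumed available

    OLD_BASE = "http://old-service:8080"   # placeholder upstreams
    NEW_BASE = "http://new-service:8080"

    def mirror_get(path, params=None):
        old = requests.get(OLD_BASE + path, params=params, timeout=5)
        try:
            new = requests.get(NEW_BASE + path, params=params, timeout=5)
            if (old.status_code, old.text) != (new.status_code, new.text):
                logging.warning("divergence on %s: old=%s new=%s",
                                path, old.status_code, new.status_code)
        except Exception:
            logging.exception("new backend failed on %s", path)
        return old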

re: Build numbers

First step to any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.

Build numbers can really help with defect tracking and differential debugging. Every ticket gets fields for "found", "fixed", and "verified". Caveat: I don't know if my old school QA/test methods still apply in this new "agile" DevOps (aka "winging it") world.
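
The mechanics of build numbering can be as dumb as a counter bumped on every attempt; a rough sketch, where the file name and the CI wiring are just assumptions:

    # Hypothetical build-number bump: one monotonically increasing integer,
    # incremented on every build attempt, successful or not.
    from pathlib import Path

    COUNTER = Path("build_number.txt")  # assumed to live at the repo root

    def next_build_number():
        n = int(COUNTER.read_text()) + 1 if COUNTER.exists() else 1
        COUNTER.write_text(str(n))
        return n

    if __name__ == "__main__":
        print(next_build_number())  # stamp this into artifact names and logs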



> Semver is marketing, not engineering.

I agree with many of your points, but that casual dig at semver is unwarranted and reveals a misunderstanding of the motivation behind it [1]. Semver defines a contract between library authors and their clients, and is not meant for deployed applications of the kind being discussed here. Indeed, the semver spec [2] begins by stating:

> 1. Software using Semantic Versioning MUST declare a public API.

It has become fashionable to criticize semver at every turn. We as a community should be more mindful about off-the-cuff criticism in general, as this is exactly what perpetuates misconceptions over time.

[1]: https://news.ycombinator.com/item?id=13378637

[2]: http://semver.org/


Build numbers, internal accounting process, engineering.

Semver, outsider's view, marketing.

Two different things, conflating them causes heartache. Keep them separate.


> re: Write Your Tests, I've never been successful with this ... I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

I think you misread the author. He says "Before you make any changes at all write as many end-to-end and integration tests as you can." (emphasis mine)

> My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

That's an interesting strategy! Similar to the event logs OP proposes?


Sounds like approval testing: http://approvaltests.com/

You capture the initial output from the original code, then treat this canonical version as the expected result until something changes.
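
A rough sketch of that workflow (names are made up; real approval-testing libraries handle the diffing and reporting for you):

    # Hypothetical golden-file check: the first run records the output,
    # later runs fail if the output drifts from that canonical version.
    from pathlib import Path

    def verify(name, actual, golden_dir=Path("approved")):
        golden = golden_dir / f"{name}.approved.txt"
        if not golden.exists():
            golden_dir.mkdir(exist_ok=True)
            golden.write_text(actual)  # first run: capture the canon
            return
        assert actual == golden.read_text(), f"{name} differs from approved output"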


The thing about end-to-end and integration tests is that at some point, your test has to assert something about the code, which requires knowing what the correct output even is. E.g., let's say I've inherited a "micro"service; it has some endpoints. The documentation essentially states that "they take JSON" and "they return JSON" (well, okay, that's at least one test) — that's it!

The next three months are spent learning what anything in the giant input blob even means, and the same for the output blob, and realizing that a certain field in the output comes directly from the SQL of `SELECT … NULL as column_name …`, and now you're quietly wondering whether some downstream consumer is even using that.


Belated reply, sorry. Been chewing.

Methinks I've prioritized writing of tests, of any kind, based on perceived (or acknowledged) risks.

Hmmm, not really like event logs. More of a data processing view of the world. Input, processing, output. When/if possible, decouple the data (protocol and payloads) from the transport.

First example, my team inherited some PostScript processing software. So to start we greedily found all the test reference files we could, captured the output, called those the test suite. Capturing input and output requires lots of manual inspection upfront.

Second sorta example, whenever I inherit an HTTP based something (WSDL, SOAP, REST), I capture validated requests and generated responses.
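
For the HTTP case, the replay-and-diff half is only a few lines. This sketch assumes the captured pairs were saved as JSON files with hypothetical method/path/body/response fields:

    # Hypothetical replay harness: re-send captured requests and diff
    # the live responses against the responses captured earlier.
    import difflib
    import json
    from pathlib import Path
    import requests  # assumed available

    def replay(capture_dir, base_url):
        for case in Path(capture_dir).glob("*.json"):
            rec = json.loads(case.read_text())
            live = requests.request(rec["method"], base_url + rec["path"],
                                    json=rec.get("body"), timeout=10)
            if live.text != rec["response"]:
                diff = difflib.unified_diff(rec["response"].splitlines(),
                                            live.text.splitlines(), lineterm="")
                print(case.name + " diverged:\n" + "\n".join(diff))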


Pinning tests can be helpful for scary legacy code! http://rick.engineer/Pinning-tests/



For legacy code, comparison testing should probably be the preferred approach (it solves the oracle problem). A combinatoric tester of the QuickCheck variety can be invaluable here, and can be used from the unit test level all the way up to external service-level tests. Copy the (preferably small) sections of code that are the fix or functionality target, compare the old and copied paths with the combinatoric tester, modify the copied path, understand any differences, then remove the old code path (keeping the combinatoric test to assert any invariants or properties).
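
A sketch of that old-path/new-path comparison with a QuickCheck-style tool (Hypothesis here; old_impl and new_impl are stand-ins for whatever code you copied):

    # Hypothetical comparison test: generate inputs, assert the copied
    # path agrees with the original until you deliberately change it.
    from hypothesis import given, strategies as st

    def old_impl(xs):   # stand-in for the legacy code path
        return sorted(xs)

    def new_impl(xs):   # stand-in for the copied/modified path
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_new_path_matches_old(xs):
        assert new_impl(xs) == old_impl(xs)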

Some other important points:

- Instrumentation and logging: Also add an assert() function that throws or terminates in development and testing, but only logs in production. Sprinkle it around while you're working on the code base. If an assert fires, your assumptions were wrong, and now you know a bit more about what the code does. The asserts also double as documentation, and nothing says correct documentation like an assert that stays silent.
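
Something like this, assuming an environment flag distinguishes production from everything else (names are illustrative):

    # Hypothetical assert helper: hard failure in dev/test, a log line in prod.
    import logging
    import os

    IS_PROD = os.environ.get("APP_ENV") == "production"  # assumed convention

    def soft_assert(condition, message):
        if condition:
            return
        if IS_PROD:
            logging.error("assertion failed: %s", message)
        else:
            raise AssertionError(message)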

Fix bugs - Yes, and fix the bugs causing errors first. Make it a priority every morning to review the logs and fix the causes of error messages until the application runs quiet. Once it's established that the app doesn't generate errors unless something is actually wrong, it becomes very obvious when code starts being edited and mistakes start being made.

One thing at a time - And minimal fixes only. Before starting a fix, ask what the minimal change is that will accomplish the objective. Once in the midst of a code tragedy, many other things will call out to be fixed. Ignore them. Accomplish the minimal goal. Minimal changes are easy to validate for correctness; rabbit holes run deep, and depth is hard to validate.

Release - Also, almost the first thing to do on a poorly done project is to validate the build and release scripts (if they exist). Validate the generated build artifacts against a copy of the artifacts on the production machine. Use the Unix diff utility to compare both the set of files and their contents, or you will miss something small but important. For deployment, make sure you have a rollback scheme or a staged percentage rollout in place because, at some point, mistakes will be made. Release often: the smaller the deploy, the less change and the less that can go wrong.
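
For the artifact check, a rough Python equivalent of a recursive diff (paths are placeholders; the Unix diff output is still more informative than a bare yes/no):

    # Hypothetical artifact comparison: recursively compare freshly built
    # artifacts against a copy pulled from the production machine.
    import filecmp

    def artifacts_match(built_dir, prod_copy_dir):
        cmp = filecmp.dircmp(built_dir, prod_copy_dir)
        clean = not (cmp.left_only or cmp.right_only or cmp.diff_files)
        return clean and all(
            artifacts_match(built_dir + "/" + d, prod_copy_dir + "/" + d)
            for d in cmp.common_dirs)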


To help others: this strategy of blackbox/comparison testing is also often called "characterization" testing [1], in case you want to read more about it.

[1] https://en.wikipedia.org/wiki/Characterization_test


>My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output.

Same here - you have an oracle; it would be a waste not to use it. You can probably also think of some test cases that are unlikely to show up often in the live data, but I would contend that until you know the implementation thoroughly, you are more likely to find input that exercises significant corner cases in the live data than by analysis.


> My go-to strategy has been blackbox (comparison) testing. Capture as much input & output as I can. Then use automation to diff output. I wouldn't bother to write unit tests etc for code that is likely to be culled, replaced.

I think that is precisely what the article advocates - although the definition of what end-to-end and integration tests are varies wildly from place to place.

> First step to any project is to add build numbers. Semver is marketing, not engineering. Just enumerate every build attempt, successful or not. Then automate the builds, testing, deploys, etc.

A thousand times this. And get to a point where the build process is reproducible, with all dependencies checked in (or if you trust your package manager to keep things around...). You should be able to pull down any commit and build it.


That's absolutely true, I totally wrote that under the assumption that you at least have some kind of build process and that it actually works. I will add another section to the post.


> write your tests.

From my point of view, this is always key. The moment you have testable components is the moment you can begin to decompose the old system into parts. Once you begin decomposing, it's easier to first pick the low-hanging fruit to show that you're making progress, and then transition to the difficult parts.

PS: I've spent my whole career maintaining & refactoring other people's code. I've never had any problem taking on orphan systems or refactoring old ones, and I kind of enjoy it.

If you have any of those old & horrible legacy systems, send them my way :D


Interesting read for those who don't understand our fancy for legacy code: http://typicalprogrammer.com/the-joys-of-maintenance-program...


The article is a description of my career! Thanks for sharing.


+1 for the split testing + diff approach. We've successfully used this several times to replace old components with new implementations.



