Formatters eliminating long lines is a pet peeve of mine.
About once every other project, some portion of the source benefits from source code being arranged in a tabular format. Long lines which are juxtaposed help make dissimilar values stand out. The following table is not unlike code I have written:
I've often wished that formatters had some threshold for similarity between adjacent lines. If some X% of the characters on the line match the character right above, then it might be tabular and it could do something to maintain the tabular layout.
Bonus points for it's able to do something like diff the adjacent lines to detect table-like layouts and figure out if something nudged a field or two out of alignment and then insert spaces to fix the table layout.
I believe some formatters have an option where you can specify a "do not reformat" block (or override formatting settings) via specific comments. As an exception, I'm okay with that. Most code (but I'm thinking business applications, not kernel drivers) benefits from default code formatting rules though.
And sometimes, if the code doesn't look good after automatic formatting, the code itself needs to be fixed. I'm specifically thinking about e.g. long or nested ternary statements; as soon as the auto formatter spreads it over multiple lines, you should probably refactor it.
I'm used to things like `// clang-format off` and on pairs to bracket such blocks, and adding empty trailing `//` comments to prevent re-flowing, and I use them when I must.
This was more about lamenting the need for such things. Clang-format can already somewhat tabularize code by aligning equals signs in consecutive cases. I was just wishing it had an option to detect and align other kinds of code to make or keep it more table like. (Destroying table-like structuring being the main places I tend to disagree with its formatting.)
I get what you're saying, and used to think that way, but changed my mind because:
1) Horizontal scrolling sucks
2) Changing values easily requires manually realigning all the other rows, which is not productive developer time
3) When you make a change to one small value, git shows the whole line changing
And I ultimately concluded code files are not the place for aligned tabular data. If the data is small enough it belongs in a code file rather than a CSV you import then great, but bothering with alignment just isn't worth it. Just stick to the short-line equivalent. It's the easiest to edit and maintain, which is ultimately what matters most.
This comes up in testing a lot. I want testing data included in test source files to look tabular. I want it to be indented such that I can spot order of magnitude differences.
Those kind of tables improve readability right until someone hits a length constraint and had to either touch every line in order to fix the alignment, causing weird conflicts in VCS, or ignore the alignment and it's slow decay into a mess begins.
It's not an either/or though. Tables are readable and this looks very much like tabular data. Length constraints should not be fixed if you have code like this, and it won't be "a slow decay into a mess" if escaping the line length rules is limited to data tables like these.
so you're basically saying "look this is neat and I like it, but since we cannot prevent some future chap come along and make a mess of it, let's stop this nonsense, now, and throw our hands up in the air—thoughts and prayers is what I say!"?
At best I'd say it's ok to use it sparingly, in places where it really does make an improvement in readability. I've seen people use it just to align the right hand side of a list of assignments, even when there is no tabular nature to what they are assigning.
I like short lines in general, as having a bunch of short lines (which tend to be the norm in code) and suddenly a very long line is terrible for readability. But all has exemptions. It's also very dependent on the programming language.
People have already outlined all the reasons why the long line might be less than optimal, but I will note that really you are using formatting to do styling.
In a post-modern editor (by which I mean any modern editor that takes this kind of thing into consideration which I don't think any do yet) it should be possible for the editor to determine similarity between lines and achieve a tabular layout, perhaps also with styling for dissimilar values in cases where the table has a higher degree of similarity than the one above.
Perhaps also with collapsing of tables with some indicator that what is collapsed is not just a sub-tree but a table.
It is an obvious example where automatic formatter fails.
But are there more examples? May be it's not high price to pay. I'm using either second or third approach for my code and I never had much issues. Yes, first example is pretty, but it's not a huge deal for me.
Another issue with fixed line lengths is that it requires tab stops to have a defined width instead of everyone being able to choose their desired indentation level in their editor config.
This is good, and objectively better than letting the random unbounded length of the function name define and inflate and randomize the indentation. It also makes it easier to use long descriptive function names without fucking up the indentation.
To me, that reads fine, but it has lost the property elevation wanted, which was that it's easy to compare the values assigned to any particular parameter across multiple calls. In your version you can only read one call at a time.
So fix your setup? Why should others with wider screens leave space on their screen empty for your sake?
Especially 80 characters is a ridiculously low limit that encourages people to name their variables and functions some abbreviated shit like mbstowcs instead of something more descriptive.
My main machine is an ultrawide, but I usually have multiple files open, and text reads best top-down so I stack files side-by-side. If someone has like, a 240 character long line, that is annoying. My editor will soft wrap and indicate this in the fringe of course but it's still a little obnoxious.
80 is probably too low these days but it's nice for git commit header length at least.
What do you do about the "oh, I'm the only one who cares about [???]? should I just fucking kill myself then?" Many such cases.
>How about doing it because your colleagues, who you presumably like collaborating with to reach a goal, asks you to?
If a someone wants me to do a certain thing in a certain way, they simply have to state it in terms of:
- some benefit they want to achieve
- some drawback they want to avoid
- as little as an acknowledged unexamined preference like "hey I personally feel more comfortable with approach X, how bout we try that instead"
I'm happy to learn from their perspective, and gladly go out of my way to accomodate them. Sometimes even against my better judgment, but hell, I still prefer to err on the side of being considerate. Just like you say, I like to work with people in terms of a shared goal, and just like you do, in every scenario I prefer to assume that's what's going on.
If, however, someone insists on certain approaches while never going deeper in their explanations than arbitrary non-falsifiable qualifiers such as "best practice", "modern", "clean", etc., then I know they haven't actually examined those choices that they now insist others should comply with. They're just parroting whatever version they imagine of industry-wide consensus describes their accidental comfort zone. And then boy do they hate my "make your setup assume less! it's the only way to be sure!". But no, I ain't reifying their meme instead of what I've seen work with my own two.
> If, however, someone insists on certain approaches while never going deeper in their explanations than arbitrary non-falsifiable qualifiers such as "best practice", "modern", "clean"
You're moving the goalposts of this discussion. The guy I was responding to said "fix your setup" to another person saying "Your table wrapped for me. The short line equivalent looks best on my screen." That's a stated preference based on a benefit he'd like to achieve.
We are not discussing "best practice" type arguments here.
"Best practice" type arguments are the universal excuse for remaining inconsiderate of the fact that different people interact with code differently, but fair enough I guess
Yes, I don't think we should discourage people from using Python or German just because you don't want to learn those particular languages either.
Working together with others should not mean having to limit everyone to the lowest common denominator, especially when there are better options for helping those with limitations that don't impact everyone else.
So haul your wide monitor around with your laptop, you mean? No.
Just use descriptive variable names, and break your lines up logically and consistently. They are not mutually exclusive, and your code will be much easier for you and other people to read and edit and maintain, and git diffs will be much more succinct and precise.
I softwrap so I don't care about line length myself but I read code on a phone a lot so people who hardwrap at larger columns are a little more annoying
I am at the opposite end. Having any line length constraints whatsoever seems like a massive waste of time every time I've seen it. Let the lines be as long as I need them, and accept that your colleagues will not be idiots. A guideline for newer colleagues is great, but auto-formatters messing with line lengths is a source of significant annoyance.
> auto-formatters messing with line lengths is a source of significant annoyance.
Unless they have been a thing since the start of a project; existing code should never be affected by formatters, that's unnecessary churn. If a formatter is introduced later on in a project (or a formatting rule changed), it should be applied to all code in one go and no new code accepted if it hasn't passed through the formatter.
I think nobody should have to think about code formatting, and no diff should contain "just" formatting changes unless there's also an updated formatting rule in there. But also, you should be able to escape the automatic formatting if there is a specific use case for it, like the data table mentioned earlier.
Define high? I think 120 is pretty reasonable. Maybe even as high as 140.
Log statements however I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7 line monsters. cargo fmt is especially bad about this. It’s so bad.
Ugh. 80 is the worst. For C++ it’s entirely unreasonable. I definitely can not reconcile “linters make code easier to read” and “80 width is good”. Those are mutually exclusive imho.
What I actually want from a linter is “120, unless the trailing bits aren’t interesting in which case 140+ is fine”. The ideal rule isn’t hard and fast! It’s not pure science. There’s an art to it.
"Since forever" as in, "since the start of electronic computing"; we started printing the programs out on paper almost immediately. The 132 columns comes from the IBM's ancient line printers (circa 1957); most of other manufacturers followed the suit, and even the glass ttys routinely had 132-column mode (for VT100 you had to buy a RAM extension, for later models it was just there, I believe). My point is, most of the people did understand, back even in the sixties, that 80-columns wide screen is tiny, especially for reading the source code.
Printers aside the VT220 terminal from DEC had a 132 column mode. Probably it was aping a standard printer column count. Most of the time we used the 80 column mode as it was far more readable on what was quite a small screen.
Not only a small screen by modern standards, but the hardware lacked the needed resolution. The marketing brochure claims a 10x10 dot matrix. That will be for the 80 column mode. That works out to respectable 800 pixel horizontally, barely sufficient 6x10 pixel in 132 column mode. There was even a double-high, double-width mode for easier reading ;-)
Interesting here perhaps is that even back then it was recognized, that for different situations, different display modes were of advantage.
> There was even a double-high, double-width mode for easier reading
I'd forgotten that; now that waa a fugly font. I don't think anyone ever used it (aside from the "Setup" banner on the settings screen)
I think the low pixel count was rather mitigated by the persistence of phospher though - there's reproductions of the fonts that had to take this into account; see the stuff about font stretching here: https://vt100.net/dec/vt220/glyphs
Caveat, my personal experience is mainly limited to JS/TS, Java, and associated languages. 120 is fine for most use cases; I've only seen 80 work in Go, but that one also has unwritten rules that prefer reducing indentation as much as possible; "line-of-sight programming", no object-oriented programming (which gives almost everything a layer of indentation already), but also it has no ternary statements, no try/catch blocks, etc. It's a very left-aligned language, which is great for not unnecessarily using up that 80 column "budget".
It’s tricky to find an objective optimum. Personally I’ve been happy with up to 100 chars per line (aim for 80 but some lines are just more readable without wrapping).
But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.
In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.
But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.
> the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line.
In my experience, with programming you rarely have lines of 140 printable characters. A lot of it is indentation. So it’s probably rarely a problem to find your way back on the next line.
I don’t think code is comparable. Reading code is far more stochastic than reading a novel.
For C/C++ headers I absolutely despise verbose doxygen bullshit commented a spreading relatively straightforward functions across 10 lines of comments and args.
I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word.
I like splitting long text as in log statements into appropriate source lines, just like you would a Markdown paragraph. As in:
logger.info(
"I like splitting long text as in log statements " +
"into ” + suitablelAdjective + " source lines, " +
"just like you would a Markdown paragraph. " +
"As in: " + quine);
I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator in the front instead of the back, thereby also causing non-uniform alinkemt of the text content.
This makes it really annoying to grep for log messages. I can't control what you do in your codebase but I will always argue against this the ones I work on.
I haven’t found this to be a problem in practice. You generally can’t grep for the complete message anyway due to inserted arguments. Picking a distinctive formulation from the log message virtually always does the trick. I do take care to not place line breaks in the middle of a semantic unit if possible.
Yes, I find the part of the message that doesn't have interpolated arguments in it. The problem is that the literal part of the string might be broken up across lines.
IMO, implicit string concatenation is a bug, not a feature.
I once made a stupid mistake of having a list of directories to delete:
directories_to_delete = (
"/some/dir"
"/some/other/dir"
)
for dir in directories_to_delete:
shutil.rmtree(dir)
Can you spot the error? I somehow forgot the comma in the list. That meant that rather than creating a tuple of directories, I created a single string. So when the `for` loop ran, it iterated on individual characters of the string. What was the first character? "/" of course.
I essentially did an `rm -rf /` because of the implicit concatenation.
I prefer storing plain text and displaying locally preferred syntax, to a degree.
With some expressions, like lookup tables or bit strings, hand wrapping and careful white space use is the difference between “understandable and intuitive” and “completely meaningless”. In JS world, `// prettier-ignore` above such an expression preserves it but ideally there’s a more universal way to express this.
And that's fine, as long as whatever ends up in version control is standardized. Locally you can tweak your settings to have / have not word wrapping, 2-8 space indentation, etc.
But that's the core of this article, too; since then it's normalized to store the plain text source code in git and share it, but it mentions a code and formatting agnostic storage format, where it's down to people's editors (and diff tools, etc) to render the code. It's not actually unusual, since things like images are also unreadable if you look at their source code, but tools like Github will render them in a human digestable format.
You still have to minimize the wrapping that happens, because wrapped lines of code tend to be continuous instead of being properly spaced so as to make its parts individually readable.
I’d agree with you except for the trend over the last 10 years or so to set limits back to the Stone Age. For a while there we seemed to be settling on somewhere around 150 characters and yet these days we’re back to the 80-100 range.