
I think the simplest explanation is that developers used it and did not like it.

The pro-XML narrative always sounded like what you wrote, as far back as I can remember: The XML people would tell you it was beautiful and perfect and better than everything as long as everyone would just do everything perfectly right at every step. Then you got into the real world and it was frustrating to deal with on every level. The realities of real-world development meant that the picture-perfect XML universe we were promised wasn't practical.

I don't understand your comparison to containerization. That feels like apples and oranges.



HTML was conceived as a language for marking up a document that was primarily text; XML took the tags and attributes from that and tried to turn it into a data serialization and exchange format. But it was never really well suited to that, and it's obvious from looking at XML-RPC or SOAP payloads that there were fundamental gaps in the ability of XML to encode type and structure information inline:

    <?xml version="1.0"?>
    <methodCall>
        <methodName>math.add</methodName>
        <params>
            <param>
                <value><int>5</int></value>
            </param>
            <param>
                <value><int>7</int></value>
            </param>
        </params>
    </methodCall>
Compared to this, JSON had string and number types built in:

    {
        "jsonrpc": "2.0",
        "method": "math.add",
        "params": [5, 7],
        "id": 1
    }
I don't think this is the only factor, but I think XML had a lot of this kind of cognitive overhead built in, and that gave it a lot of friction when stacked up against JSON and later YAML... and when it came to communicating with a SPA, it was hard to compete with JS being able to natively eval the payload responses.


To be fair, I cannot trust the shape of your jsonrpc example: I am not sure if id is truly an integer or if you sent me an integer by mistake, and the same goes for params, or even the payload of each param. This is why we ended up adopting OpenAPI for describing HTTP interactions, and IIRC JSON-RPC specifically can also be described with it. At least with an XML schema, no one would say it is ambiguous. You also don't need a heavier parser: the object is a tree, with no more checking for escaped strings, no more issues with hand-coded multiline strings, and no need to separate items with commas, since the end tag delimits the scope, and so on.
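For illustration, a minimal JSON Schema sketch that would pin down the shapes complained about here (the constraints are invented for this example; the real JSON-RPC 2.0 spec is looser, e.g. id may also be a string or null):

```json
{
    "type": "object",
    "properties": {
        "jsonrpc": { "const": "2.0" },
        "method":  { "type": "string" },
        "params":  { "type": "array", "items": { "type": "integer" } },
        "id":      { "type": "integer" }
    },
    "required": ["jsonrpc", "method", "id"]
}
```

With something like this attached to the endpoint description, "is id truly an integer?" stops being a matter of trust.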


> To be fair, I cannot trust the shape of your jsonrpc example: I am not sure if id is truly an integer or if you sent me an integer by mistake, and the same goes for params, or even the payload of each param

In practice, it doesn't matter.

If the JSON payload is in the wrong format the server rejects it with an error.

If the server sends an integer "by mistake" then the purists would argue that the client should come to a halt and throw up an error to the user. Meanwhile the JSON users would see an integer coming back for the id field and use it, delivering something that works with the server as it exists today. Like it or not, this is why JSON wins.

Schema-defined protocols are very useful in some circumstances, but in my experience the work of keeping them in sync everywhere and across developers is a lot of overhead for most simple tasks.

Putting the data into a simple JSON payload and sending it off gets the job done in most cases.
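To make "simple" concrete, a minimal Python sketch of the "just put it in JSON and send it" workflow, using the math.add example from upthread (no schema, no codegen; the transport is omitted):

```python
import json

# Build the request as a plain dict and serialize it in one call.
request = {"jsonrpc": "2.0", "method": "math.add", "params": [5, 7], "id": 1}
payload = json.dumps(request)

# On the receiving side, parsing is also one call, and the basic
# types (string, number, array) survive the round trip intact.
received = json.loads(payload)
result = sum(received["params"])
print(result)  # 12
```

The whole exchange is two function calls and native data structures on both ends, which is exactly the low-friction path being described.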


Yeah this is the issue. I spent tons of time writing code that would consume xml and turn it into something useful.

It’s a mediocre data storage language.


> developers used it and did not like it.

This makes sense.

However, there are two ways to address it:

1) Work towards a more advanced system that addresses the issues (for example, RDF/Turtle, which extends XML namespaces to define classes and properties, and represents graphs instead of being limited to trees the way XML and JSON are)

2) Throw it away and start from scratch. First, JSON. Then, JSON schema. Jq introduces a kind of "JSONPath". JSONL says hi to XML stream readers. JSONC because comments in config files are useful. And many more primitives that existed around XML were eventually reimplemented.

Note how the discussion around removing XSLT 1 support similarly has two ways forward: yank it out or support XSLT 3.

I lean towards Turtle replacing XML over JSON, and for XSLT 3 to replace XSLT 1 support in the browsers.
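As a sketch of the graph point in (1), the add call from upthread could be described in Turtle, with vocabulary shared via a namespace prefix (the ex: prefix and property names here are invented for illustration):

```turtle
@prefix ex: <http://example.org/rpc#> .

ex:call1 a ex:MethodCall ;
    ex:methodName "math.add" ;
    ex:param 5, 7 .
```

Because every node has an identifier, ex:call1 can be referenced from any other triple, which is what lets Turtle express arbitrary graphs rather than only trees.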


> And many more primitives that existed around XML were eventually reimplemented.

Don't miss that they were reimplemented properly.

Even XML schemas, the one thing you'd think they were great at, ended up seeing several different implementations beyond the original DTD-based schema definitions and beyond XSD.

Some XML things were absolute tire fires that should have been reimplemented even earlier, like XML-DSIG, SAML, SOAP, WS-everything.

It's not surprising devs ended up not liking it, there are actual issues trying to apply XML outside of its strengths. As with networking and the eventual conceit of "smart endpoints, dumb pipes" over ESBs, not all data formats are better off being "smart". Oftentimes the complexity of the business logic is better off in the application layer where you can use a real programming language.


> Even XML schemas, the one thing you'd think they were great at

Of course not! W3C SHACL shapes, on the other hand...

schema.org is also a move in the right direction


The simplest explanation is that attributes were a mistake. They add another layer to the structure and create confusion as to where data is best stored within it.

XML without attributes probably would have seen wide and ready adoption.


I see it as the opposite. Attributes weren’t used enough. The result was unnecessarily nested code.

“Keep things flat” is current good advice in terms of usability. That means favor attributes over children.


I agree. A sibling comment showed an XML example above containing params/param/value/int nodes, which with attributes could just be <param type="int">.
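For comparison, a hypothetical attribute-heavy encoding of that same payload (not valid XML-RPC, just an illustration of the flatter shape) might look like:

```xml
<methodCall name="math.add">
    <param type="int" value="5"/>
    <param type="int" value="7"/>
</methodCall>
```

Four levels of nesting per parameter collapse into one self-closing element.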

I do agree that attributes/data was always a huge contention point on where things should go and caused confusion and bikeshedding.

I also saw a bit of this in the React/JSX community with decisions like render props, HoC, etc where it took a bit to stabilize on best practices.


While I think a lot of XML was a bad idea, some of the issues are not intrinsically the fault of XML but of some really poor design decisions by the people making XML-based languages.

They tended to be design-by-committee messes that included every possible use case as an option.

Anyone who has ever had the misfortune of having to deal with SAML knows what I'm talking about. It's a billion-line-long specification, everyone only implements 10% of it, and it's full of hidden gotchas that will screw up your security if you get them wrong. (Even worse, the underlying XML-Signature spec is literally the worst way to do digital signatures possible. It's so bad you'd think someone was intentionally sabotaging it.)

In theory this isn't xml's fault, but somehow XML seems to attract really bad spec designers.


Part of the problem was it came in an era before we really understood programming, as a collective. We didn't even really know how to encapsulate objects properly, and you saw it in poor database schema designs, bizarre object inheritance patterns, poorly organised APIs, even the inconsistent method param orders in PHP. It was everywhere. Developers weren't good at laying out even POCOs.

And those bizarre designs went straight into XML, properties often in attributes, nodes that should have been attributes, over nesting, etc.

And we blamed XML for the mess where often it was just inexperience in software design as an industry that was the real cause. But XML had too much flexibility compared to the simplicity of the later JSON, meaning it helped cause the problem. JSON 'solved' the problem by being simpler.

But then the flip side was that it was too strict, and starting one in code was a tedious PITA where you had to specify a schema even though it didn't exist, or didn't even matter, most of the time.


Nah, we still have all those issues and more.

The hard truth is that XML lost to the javascript-native format (JSON). Any JavaScript-native format would have won, because "the web" effectively became the world of JavaScript. XML was not js-friendly enough: the parsing infrastructure was largely based on C/C++/Java, and then you'd get back objects with verbose interfaces (again, a c++/java thing) rather than the simple, nested dictionaries that less-skilled "JS-first" developers felt at ease with.


The thing is, JSON is even superior in C++.

It's a dumber format but that makes it a better lingua franca between all sorts of programming languages, not just Javascript, especially if you haven't locked in on a schema.

Once you have locked in on a schema and IDL-style tooling to autogenerate adapter classes/objects, then non-JSON interchange formats become viable (if not superior). But even in that world, I'd rather have something like gRPC over XML.


A dumber format works great for dumber protocols like RPC. When you're trying to represent something complex like a document, JSON is crap. Imagine the JSON equivalent of HTML. Then imagine editing it by hand.

"Data" lives on a spectrum of simple to complex; most is simple; JSON is great for that. But not for everything.


> When you're trying to represent something complex like a document, JSON is crap.

I agree, but let's be honest: how often does this actually come up for data interchange? The situations where you want a human-editable document that is also computer-legible are fairly few in number: prose in office documents, human-written documentation for code, config files.

For things where computer programs are driving the show (which is most of the time, today), you want interchange designed for programming languages, whether that's JSON, CSV, gRPC, or what have you.

This even applies to documents that are completely computer-generated: get the inputs into an appropriate data structure using easy interchange formats, and then render to the output document (HTML, XML, LaTeX, etc.).

> Data" lives on a spectrum of simple to complex; most is simple; JSON is great for that. But not for everything.

Fully agreed. But even where you do have documents, XML is not always the right thing either, which is why there are other markup formats in wide usage like TeX and Doxygen/Javadoc-style comments. XML seems best aligned to things like office document formats or other rich text where it makes sense to want to wrap some human text inside some computer-visible markers.


> how often does this actually come up for data interchange

Pretty much anywhere with text-heavy payloads? It's quite a lot, I think, but still dwarfed by the number of cases where system A needs to perform some sort of RPC-like communication with system B.

I just think that there is plenty of room in the world for both approaches.


That's the thing: XML should have become JavaScript-native so that we could write inline HTML more easily, like JSX from React allows us to do.


It did somewhat. It was called E4X.
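For the curious: E4X (ECMA-357) added XML literals and XML-aware operators directly to JavaScript. It shipped in Firefox's SpiderMonkey but was later removed, so this sketch no longer runs in any modern engine:

```
// E4X (historical, ECMA-357): XML is a first-class literal type
var order = <order id="1">
    <item>widget</item>
</order>;

// Child elements and attributes were addressable with dot syntax
order.item;   // the <item> element, "widget" as text
order.@id;    // the id attribute, "1"
```

In spirit this is very close to what JSX later did for HTML, just a decade earlier and aimed at XML data rather than UI trees.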


More like it almost did. Bummer. Thanks for the pointer. Sad to find out that the attempt was made but rejected. I wonder why.


Essentially, people didn't use it.

I think the reason is that it solved the "problem" of integrating XML into JavaScript, as if being good at XML was an end in itself. Once XML started being replaced by JSON in earnest, what was the pressing need for E4X? Doing stuff in SOAP or WS-* presumably, but that wasn't a common use case in browser JavaScript, though it might have come in handy if Node.js took off a bit earlier.

JSX took off because it helped frontend developers solve problems outside of JSX. Look at examples of its use and it's heavily intertwined with the rest of what you'd see on a webpage (HTML, JS and CSS), and you never get the sense that JSX is a "self-licking ice cream cone".


This is the abstract idealism I was talking about: Every pro-XML person I've talked to wants to discuss XML in the context of a hypothetical perfect world of programming that does not exist, not the world we inhabit.

The few staunch XML supporters I worked with always wanted to divert blame to something else, refusing to acknowledge that maybe XML was the wrong tool for the job or even contributing to the problems.


Regarding containerization, XML wouldn’t just be a noun, but a verb (like in XSLT). You would define your remote procedures in XML.

Imagine if instead of the current Dockerfile format, we used XML, which was dynamically generated from lists of packages, and filtered and updated according to RSS feeds describing CVEs and package updates.

I’m not saying this is anything other than strange fantasy. And not a particularly nice fantasy either.

XML failed because it forced devs to spend tons of unproductive time on it.



