You could, conceivably, have a "display_name" column, an "informal_name" column, a "formal_name" column, etc. depending on your needs. But for absolutely most people and purposes, a single "name" column will suffice, and will steer programmers away from making unfounded assumptions about people’s names.
But then you don't have my informal name. When I receive an informal update email, it would be clunky to have it begin Dear Max Barraclough.
You can't automatically deduce that the way to address me informally is Max, without making assumptions that will break outside the English speaking world. See Points 18, 24, and 30.
The W3C suggest that one way to handle names is to have a Full name field and a What should we call you? field.
You'll regret this decision as soon as you have to integrate with another system that uses separate first and last names. Or as soon as you want to build a sorted list of five thousand names or output a bibliography in a specific format or anything.
Technically the right approach to the problem of storing international names would be to store markup: a string with an associated structure and a discriminant:
You can then have precise automated ways to transform this into whatever derived form you need. Using a single name field is just avoiding the problem. Using multiple fields is a maintenance burden, because in most cases fields are not fully independent and you have to update them all to keep the data coherent.
Using markup does not really change anything, that is just a way of serializing a data structure. You can represent the same information in normalized form.
Of course, I don't mean storing literal XML, but I think it's important to support generic markup in databases. It's not that hard, after all; something like these five tables:
Markup (markupId, string)
Ns (nsId, uri)
QName (qNameId, nsId, localName)
Elt (markupId, eltId, qNameId,
start, length /* string indices */,
level, index /* or other way to represent hierarchy */ )
Attr (eltId, qNameId, value)
would be enough to store and fully index any XML out there.
I think markup is an important separate data type. We have records, we have scalars, now we have a hybrid type, markup: a string (or maybe a byte/bit stream) + records linked to indexes in this string/stream and maybe forming a hierarchy.
A database should store the actual data, not some serialized format. You don't store the comma tokens from a CSV file in the databases either - you store data which can be serialized to CSV. The same information could be serialized as CSV, XML or JSON depending on the context.
Storing a serialized format as an opaque string or blob might make sense in particular cases.
A) the proposal is crazy. You probably don’t need to know if someone has a patronymn or a kunya or whatever. Base what you store on what you need.
B) Storing this as normal data would be an intense pain in the ass to use, for no gain in performance or correctness. Use JSONB, this is what it’s for.
If you actually need to know such details then it should be stored in normalized form.
Say you have a visa database which must be compatible with multiple naming systems from different countries and where no information can be discarded for security or legal reasons. Then you store the data you need in normalized form. Why would you use weird hacks for essential information?
And if you don't need all that detail - then you just save the name as a string and discard the metadata.
I can't see it would ever make sense to store a name in JSON format inside a relational database. Either the information is important or it isn't.
A) In many cases you don't, but if you store data about authors in a bibliographic database then you probably do. And it's just a part of the package: for example, Donald Knuth went as far as to print people names in their native language in addition to English in his books' indices :)
Matters for what? A database is a tool to solve a problem, not a timeless repository of deep cultural meaning.
If I make a webapp that is only localized in English, display, first, last will let me solve my common problems: I need to say "My account (John Smith)" in the corner (display name), and I want to be able to write emails that say "John, we know you're wondering how company that sold you a pair of pants three years ago is feeling about COVID-19…" (first name) and list articles by author sorted by last name. Those are all common use cases for apps used by English speakers.
Now, if I were making an app to do voting in Myanmar, I would need to deal with there not being last names for many Burmese. If I was trying to track Arab speaking terrorists around the world, I'd want a long list of their aliases and kunya in multiple romanizations (I think the CIA used to prefer "UBL" for Osama bin Laden because they called him "Usama"). If I was making a library app, I might want to have the English romanized name plus the Unicode native language name. Tons of possible problem spaces with different solutions.
I just think tagging the different name parts for "Gabriel José de la Concordia García Márquez" is going to be overkill for most English language apps because when are you going to look someone up by maternal family name versus of just doing a full text search for "Garcia Marquez".
I'm not arguing about storing serialized format; instead I suggest to parse it and end up with something like that:
NAME ( 1, "Fyodor Mikhailovich Dostoyevsky" )
TAG ( FIRSTNAME, 1, 0, 5 )
TAG ( LASTNAME, 1, 20, 30 )
TAG ( PATRONYMIC, 1, 7, 18 )
I'm not saying it's efficient (UPD: speedy), but it's sufficient to get the required details and can be mechanically transformed into a more efficient form. UPD: And it's very flexible and easy to extend. I believe it can handle just about any kind of naming from "Falsehoods about names".
This one lacks ordering. It's not a problem for Russian names as the rules are fixed, but may be a problem for other ones.
And how would it scale? E.g. John Ronald Reuel Tolkien and George Herbert Walker Bush have two middle names, while full name of García Márquez is Gabriel José de la Concordia García Márquez. Markup handles this uniformly creating only as few metadata records as necessary, but a field-based model will need fields for all of this that will stay empty most of the time and likely some meta fields to set additional quirky flags.
Regardless of how complex naming schemes you need to support, you just make it more complex and harder to query by storing tag names and character indexes and whatnot.
Isn’t that what RDF is all about? Each field being a named property of a resource, the properties themselves being properties of other resources describing the structure of the first properties. And so on, going on as meta as necessary
There is no need for an arbitrary "markup" that you then restrict based on your use-case (in your case it's some sort international-aware storage of different data).
Databases are designed to store data, and storing markup like your example above is basically a misuse on some level. Your example above has a clear structure, and that structure can be encoded as raw data in database tables. So instead of having a generic set of tables (as per your other comment) to "store markup" data, you would rather come up with tables that represent your actual requirement and what that markup is meant to indicate.
Some database also have inheritance models which allow you to store the data above quite trivially. You basically have a "base" and then extend it with specific fields for each type (en, ru, es, etc).
So please, don't put markup in your database like that. It's a big no-no, and I'm sure your DBAs would tell you that very quickly. Else, just throw that crap into a document-store and make it someone else's problem as is unfortunately all too-common lately.
What you want is not markup, but tagged unions. It's a real shame SQL and many other languages don't support them.
Also, your example about Spanish names doesn't take into account that nowadays parents can choose the order of family names, and that gay marriage has been legal for 15 years. Which reflects a deeper truth: don't make too strong assumptions about reality, for it may change.
Tagged unions, if I understand what they are, can be done in SQL:
create table Foo(
fooId primary key autoincrement,
type integer /* tag/discriminant */ );
create table FooA(fooId primary key,
/* data specific to FooA */ );
create table FooB(fooId primary key,
/* data specific to FooB */ );
They're not quite markup though. Markup has a string/stream with associated records. The number of these records, their kind, and their placement varies, so it can handle ordering issues or multiple similar names or missing name (e.g. missing middle name), or ignore parts of name quite naturally.
That could be an implementation of tagged unions. Without language level support (pattern matching, exhaustiveness checking, etc.) you gain very little from this.
It's not a simulation, it's a data model for reducing the pain for what's really a hard problem - names we call people are not unique, they are colloquial identifiers (HHOS).
Hum... I'm not from the US and do not live there if that's what you are asking.
People usually have enough trouble separating the first name from whatever you call the rest of it and deciding on what part goes where, and you expect them to be able to tell what are the cultural roots of the name?
Keep in mind that not all names on databases are filled by their owners. Besides, I really doubt I would be able to enter my name into whatever you are designing without some amount of lying.
I spent 13 years maintaining a large-ish system that separated first- and last names for customers.
Separating makes searching/indexing on last name easier.
The alternative would be to put last names first by convention, which renders them unusable for anything else; or go insane trying to find a universal way of splitting and recombining for display.
Keeping them separate inevitably results in a convention to put company names in the last name field for corporations, and failed searches when the convention is broken.
It's one of those questions that I suspect will keep bugging me until I die.
Yeah, but if you only have one last name field, you can only index on people's first last name, not their second last name? How are you going to solve that?
> Separating makes searching/indexing on last name easier.
The point is that such a search or index is a mistake. It's simply not meaningful based on the real properties of names, it just happens to work 99% of the time if you're only dealing with a mostly homogenous culture.
> Keeping them separate inevitably results in a convention to put company names in the last name field for corporations
In most (IMO sane) designs, a company name would usually be a separate field (or better yet: companies would be represented in an entirely different table).
But in many cases the customer is either an individual or a corporation. Keeping them in separate tables turns simple tasks such as generating a list of customers into joins. Which brings us to normalization, too much of that kills performance.
I feel like if a join is killing performance, either it's a really weird join (e.g. joining on an OR) or you ain't indexing.
Regardless, you could always just assume everyone's a "corporation" (and for individual customers, that "corporation" would just be a reflection of the individual customer's data). That, or only deal with contacts within a "corporation" (and an individual customer would just be a contact with a null "corporation"). I've seen both approaches in practice, and while they have their tradeoffs, they work reasonably well.
"Corporation" is a pretty wonky term for this, anyway (not all non-individual customers are corporations); "organization" would be better. But I digress.
Also for international users browsing in English labeling the fields "given name" and "family name" are way more clearer than using first and last names.
Exactly what I was thinking - plus a field to state the order they should be displayed in seeing as Asian cultures usually state the Family Name first.
What do you mean "the legal surname"? I'm Spanish and my legal name is my full name with both family names. You can't just ignore one. Also, Spanish people have two family names, but they can be composite, so they may look as three or four family names.
However, most people are usually known by one family name, but which one is used depends on which one is more easily recognizable.
> This list is by no means exhaustive. If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.
Prince, famously, didn't have what we'd recognise as a name for years. All the "Unknown Soldier"s don't have names. "John/Jane Doe" doesn't technically have a name.
These examples seem somewhat contrived, but of course it comes down to the use case of the database. I'm sure Prince remained registered under his legal name on contracts, checks etc, even if he used the symbol in marketing. But a database over bodies in a morgue obviously need to be able to support unidentified bodies.
Interesting, I think you're right. According to Wikipedia, he only changed his stage name, not his legal name. The BBC didn't seem to make the distinction.
If I had a dime for address forms that have "state" as a mandatory field, while having a country dropdown that contains countries that do not have states, I'd be rich, RICH I TELL YOU!
No.
“[…] refer people to this post the next time they suggest a genius idea like a database table with a first_name and last_name column.”
— https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...
You could, conceivably, have a "display_name" column, an "informal_name" column, a "formal_name" column, etc. depending on your needs. But for absolutely most people and purposes, a single "name" column will suffice, and will steer programmers away from making unfounded assumptions about people’s names.