> *- Names should be stored as display_name, first_name, and last_name.* No. “[…...

MaxBarraclough · on April 8, 2020

But then you don't have my informal name. When I receive an informal update email, it would be clunky to have it begin Dear Max Barraclough.

You can't automatically deduce that the way to address me informally is Max, without making assumptions that will break outside the English speaking world. See Points 18, 24, and 30.

The W3C suggest that one way to handle names is to have a Full name field and a What should we call you? field.

https://www.w3.org/International/questions/qa-personal-names

marcosdumay · on April 8, 2020

If the database owner is not billing you, he probably doesn't need your formal name, so he can have just one column, and call it "name".

Otherwise, yes, he'll need the 2 columns.

Mikhail_Edoshin · on April 8, 2020

You'll regret this decision as soon as you have to integrate with another system that uses separate first and last names. Or as soon as you want to build a sorted list of five thousand names or output a bibliography in a specific format or anything.

Technically the right approach to the problem of storing international names would be to store markup: a string with an associated structure and a discriminant:

    <name lang="en">
      <first>Kenneth</first>
      <middle>Elton</middle>
      <last>Kesey</last>
    </name>
    <name lang="ru">
      <given>Fyodor</given>
      <patronymic>Mikhailovich</patronymic>
      <surname>Dostoyevsky</surname>
    </name>
    <name lang="es">
      <given>Gabriel</given>
      <family-paternal>García</family>
      <family-maternal>Márquez</family>
    </name>

You can then have precise automated ways to transform this into whatever derived form you need. Using a single name field is just avoiding the problem. Using multiple fields is a maintenance burden, because in most cases fields are not fully independent and you have to update them all to keep the data coherent.

goto11 · on April 8, 2020

Using markup does not really change anything, that is just a way of serializing a data structure. You can represent the same information in normalized form.

Mikhail_Edoshin · on April 8, 2020

Of course, I don't mean storing literal XML, but I think it's important to support generic markup in databases. It's not that hard, after all; something like these five tables:

    Markup (markupId, string)
    Ns (nsId, uri)
    QName (qNameId, nsId, localName)
    Elt (markupId, eltId, qNameId, 
      start, length /* string indices */, 
      level, index /* or other way to represent hierarchy */ )
    Attr (eltId, qNameId, value)

would be enough to store and fully index any XML out there.

I think markup is an important separate data type. We have records, we have scalars, now we have a hybrid type, markup: a string (or maybe a byte/bit stream) + records linked to indexes in this string/stream and maybe forming a hierarchy.

goto11 · on April 8, 2020

A database should store the actual data, not some serialized format. You don't store the comma tokens from a CSV file in the databases either - you store data which can be serialized to CSV. The same information could be serialized as CSV, XML or JSON depending on the context.

Storing a serialized format as an opaque string or blob might make sense in particular cases.

earthboundkid · on April 8, 2020

A) the proposal is crazy. You probably don’t need to know if someone has a patronymn or a kunya or whatever. Base what you store on what you need.

B) Storing this as normal data would be an intense pain in the ass to use, for no gain in performance or correctness. Use JSONB, this is what it’s for.

goto11 · on April 8, 2020

If you actually need to know such details then it should be stored in normalized form.

Say you have a visa database which must be compatible with multiple naming systems from different countries and where no information can be discarded for security or legal reasons. Then you store the data you need in normalized form. Why would you use weird hacks for essential information?

And if you don't need all that detail - then you just save the name as a string and discard the metadata.

I can't see it would ever make sense to store a name in JSON format inside a relational database. Either the information is important or it isn't.

Mikhail_Edoshin · on April 8, 2020

A) In many cases you don't, but if you store data about authors in a bibliographic database then you probably do. And it's just a part of the package: for example, Donald Knuth went as far as to print people names in their native language in addition to English in his books' indices :)

lonelappde · on April 8, 2020

This comment is culturally insensitive. You are implicitly assuming that your culture's way of expressing names is all that matters

earthboundkid · on April 8, 2020

Matters for what? A database is a tool to solve a problem, not a timeless repository of deep cultural meaning.

If I make a webapp that is only localized in English, display, first, last will let me solve my common problems: I need to say "My account (John Smith)" in the corner (display name), and I want to be able to write emails that say "John, we know you're wondering how company that sold you a pair of pants three years ago is feeling about COVID-19…" (first name) and list articles by author sorted by last name. Those are all common use cases for apps used by English speakers.

Now, if I were making an app to do voting in Myanmar, I would need to deal with there not being last names for many Burmese. If I was trying to track Arab speaking terrorists around the world, I'd want a long list of their aliases and kunya in multiple romanizations (I think the CIA used to prefer "UBL" for Osama bin Laden because they called him "Usama"). If I was making a library app, I might want to have the English romanized name plus the Unicode native language name. Tons of possible problem spaces with different solutions.

I just think tagging the different name parts for "Gabriel José de la Concordia García Márquez" is going to be overkill for most English language apps because when are you going to look someone up by maternal family name versus of just doing a full text search for "Garcia Marquez".

goto11 · on April 8, 2020

Missing the point. If you just have a single "name" field then you can accommodate any culture perfectly.

earthboundkid · on April 9, 2020

This fits the 75% use case, but misses things that do come up, like alphabetizing by last name and emails with casual forms of address.

Mikhail_Edoshin · on April 8, 2020

I'm not arguing about storing serialized format; instead I suggest to parse it and end up with something like that:

    NAME ( 1, "Fyodor Mikhailovich Dostoyevsky" )
    TAG ( FIRSTNAME, 1, 0, 5 ) 
    TAG ( LASTNAME, 1, 20, 30 ) 
    TAG ( PATRONYMIC, 1, 7, 18 )

I'm not saying it's efficient (UPD: speedy), but it's sufficient to get the required details and can be mechanically transformed into a more efficient form. UPD: And it's very flexible and easy to extend. I believe it can handle just about any kind of naming from "Falsehoods about names".

goto11 · on April 8, 2020

But why include all the cruft fist place? You could just save this information (in normalized form):

  FIRSTNAME: Fyodor
  LASTNAME: Mikhailovich
  PATRONYMIC: Dostoyevsky

Mikhail_Edoshin · on April 8, 2020

This one lacks ordering. It's not a problem for Russian names as the rules are fixed, but may be a problem for other ones.

And how would it scale? E.g. John Ronald Reuel Tolkien and George Herbert Walker Bush have two middle names, while full name of García Márquez is Gabriel José de la Concordia García Márquez. Markup handles this uniformly creating only as few metadata records as necessary, but a field-based model will need fields for all of this that will stay empty most of the time and likely some meta fields to set additional quirky flags.

goto11 · on April 8, 2020

Regardless of how complex naming schemes you need to support, you just make it more complex and harder to query by storing tag names and character indexes and whatnot.

zo1 · on April 8, 2020

Please don't do that. I hope you haven't implemented such a set of tables in any database.

Mikhail_Edoshin · on April 8, 2020

Not yet :) But I may just for the sake of it.

antonvs · on April 8, 2020

That's a good idea. You'll learn firsthand why it's a bad idea.

eecc · on April 8, 2020

Isn’t that what RDF is all about? Each field being a named property of a resource, the properties themselves being properties of other resources describing the structure of the first properties. And so on, going on as meta as necessary

zo1 · on April 8, 2020

There is no need for an arbitrary "markup" that you then restrict based on your use-case (in your case it's some sort international-aware storage of different data).

Databases are designed to store data, and storing markup like your example above is basically a misuse on some level. Your example above has a clear structure, and that structure can be encoded as raw data in database tables. So instead of having a generic set of tables (as per your other comment) to "store markup" data, you would rather come up with tables that represent your actual requirement and what that markup is meant to indicate.

Some database also have inheritance models which allow you to store the data above quite trivially. You basically have a "base" and then extend it with specific fields for each type (en, ru, es, etc).

So please, don't put markup in your database like that. It's a big no-no, and I'm sure your DBAs would tell you that very quickly. Else, just throw that crap into a document-store and make it someone else's problem as is unfortunately all too-common lately.

pezezin · on April 8, 2020

What you want is not markup, but tagged unions. It's a real shame SQL and many other languages don't support them.

Also, your example about Spanish names doesn't take into account that nowadays parents can choose the order of family names, and that gay marriage has been legal for 15 years. Which reflects a deeper truth: don't make too strong assumptions about reality, for it may change.

Mikhail_Edoshin · on April 8, 2020

Tagged unions, if I understand what they are, can be done in SQL:

    create table Foo(
      fooId primary key autoincrement, 
      type integer /* tag/discriminant */ );
    create table FooA(fooId primary key,
      /* data specific to FooA */ );
    create table FooB(fooId primary key,
      /* data specific to FooB */ );

They're not quite markup though. Markup has a string/stream with associated records. The number of these records, their kind, and their placement varies, so it can handle ordering issues or multiple similar names or missing name (e.g. missing middle name), or ignore parts of name quite naturally.

Quekid5 · on April 8, 2020

That could be an implementation of tagged unions. Without language level support (pattern matching, exhaustiveness checking, etc.) you gain very little from this.

naasking · on April 8, 2020

Yes, you can simulate them, but it's painful.

no-s · on April 8, 2020

It's not a simulation, it's a data model for reducing the pain for what's really a hard problem - names we call people are not unique, they are colloquial identifiers (HHOS).

naasking · on April 8, 2020

Joined tables on the same primary key are a simulation of tagged unions. Proper tagged unions would solve more concisely and with more clarity.

_mt3y · on April 9, 2020

I'd really like to see an RDBMS with built-in support for both inheritance and tagged unions. That would solve 90% of my schema woes.

marcosdumay · on April 8, 2020

Hum... Who exactly is entering that data onto the database? And are they ok with entering things with that level of detail?

lonelappde · on April 8, 2020

Your American is showing? It's just a standard name input form but internationalized.

marcosdumay · on April 8, 2020

Hum... I'm not from the US and do not live there if that's what you are asking.

People usually have enough trouble separating the first name from whatever you call the rest of it and deciding on what part goes where, and you expect them to be able to tell what are the cultural roots of the name?

Keep in mind that not all names on databases are filled by their owners. Besides, I really doubt I would be able to enter my name into whatever you are designing without some amount of lying.

codr7 · on April 8, 2020

I spent 13 years maintaining a large-ish system that separated first- and last names for customers.

Separating makes searching/indexing on last name easier.

The alternative would be to put last names first by convention, which renders them unusable for anything else; or go insane trying to find a universal way of splitting and recombining for display.

Keeping them separate inevitably results in a convention to put company names in the last name field for corporations, and failed searches when the convention is broken.

It's one of those questions that I suspect will keep bugging me until I die.

henrikschroder · on April 8, 2020

> Separating makes searching/indexing on last name easier.

What do you mean "last name"?

If you have two last names, then what?

If you have no last name, then what?

> Keeping them separate inevitably results in a convention to put company names in the last name field for corporations

If you only have a single field for name, this problem disappears.

codr7 · on April 8, 2020

I mean everything but the first name.

Yes, but then you have to deal with the indexing/search problem instead.

henrikschroder · on April 8, 2020

Yeah, but if you only have one last name field, you can only index on people's first last name, not their second last name? How are you going to solve that?

naasking · on April 8, 2020

> Separating makes searching/indexing on last name easier.

The point is that such a search or index is a mistake. It's simply not meaningful based on the real properties of names, it just happens to work 99% of the time if you're only dealing with a mostly homogenous culture.

codr7 · on April 12, 2020

Which is good enough in plenty of situations, compared to the "perfect" alternative that doesn't work at all.

yellowapple · on April 8, 2020

> Keeping them separate inevitably results in a convention to put company names in the last name field for corporations

In most (IMO sane) designs, a company name would usually be a separate field (or better yet: companies would be represented in an entirely different table).

codr7 · on April 8, 2020

But in many cases the customer is either an individual or a corporation. Keeping them in separate tables turns simple tasks such as generating a list of customers into joins. Which brings us to normalization, too much of that kills performance.

yellowapple · on April 12, 2020

I feel like if a join is killing performance, either it's a really weird join (e.g. joining on an OR) or you ain't indexing.

Regardless, you could always just assume everyone's a "corporation" (and for individual customers, that "corporation" would just be a reflection of the individual customer's data). That, or only deal with contacts within a "corporation" (and an individual customer would just be a contact with a null "corporation"). I've seen both approaches in practice, and while they have their tradeoffs, they work reasonably well.

"Corporation" is a pretty wonky term for this, anyway (not all non-individual customers are corporations); "organization" would be better. But I digress.

tasogare · on April 8, 2020

Also for international users browsing in English labeling the fields "given name" and "family name" are way more clearer than using first and last names.

Wildgoose · on April 8, 2020

Exactly what I was thinking - plus a field to state the order they should be displayed in seeing as Asian cultures usually state the Family Name first.

pezezin · on April 8, 2020

Shouldn't that be a localization/translation issue?

My experience living in Japan is that they will write family name first if using Japanese, but given name first if using an European language.

rumanator · on April 8, 2020

Also, in some cultures like in Spain the legal surname is typically the first family name, which might be the first of two or three family names.

pezezin · on April 8, 2020

What do you mean "the legal surname"? I'm Spanish and my legal name is my full name with both family names. You can't just ignore one. Also, Spanish people have two family names, but they can be composite, so they may look as three or four family names.

However, most people are usually known by one family name, but which one is used depends on which one is more easily recognizable.

rumanator · on April 8, 2020

> What do you mean "the legal surname"? I'm Spanish and my legal name is my full name with both family names.

I'm referring to the primer apellido. Even government forms require that people break down their surnames in first surname and all trailing surnames.

NOGDP · on April 8, 2020

From the article:

> This list is by no means exhaustive. If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.

...

> 40. People have names.

humanrebar · on April 8, 2020

Newborns have no known names until the birth certificate is filled out. There is still data about their medical care that needs stored in a database.

zimpenfish · on April 8, 2020

Prince, famously, didn't have what we'd recognise as a name for years. All the "Unknown Soldier"s don't have names. "John/Jane Doe" doesn't technically have a name.

goto11 · on April 8, 2020

These examples seem somewhat contrived, but of course it comes down to the use case of the database. I'm sure Prince remained registered under his legal name on contracts, checks etc, even if he used the symbol in marketing. But a database over bodies in a morgue obviously need to be able to support unidentified bodies.

MaxBarraclough · on April 8, 2020

Interesting, I think you're right. According to Wikipedia, he only changed his stage name, not his legal name. The BBC didn't seem to make the distinction.

https://en.wikipedia.org/wiki/Prince_(musician) , https://www.bbc.co.uk/news/magazine-36107590

jayd16 · on April 8, 2020

Some oauth systems will give you a user id or an email but not permission to see a name, for example.

ddrt · on April 8, 2020

How do you store addresses? There’s a use case for breaking things up.

henrikschroder · on April 8, 2020

If I had a dime for address forms that have "state" as a mandatory field, while having a country dropdown that contains countries that do not have states, I'd be rich, RICH I TELL YOU!