You can split it yourself. If you can't, replace Shorts with Ints in the implementation and it will just work, but I would be very happy to know your use case.
Just bumping the pointer size to cover relatively rare use cases is wasteful. It can be partially mitigated with more tags and tricks, but it would still be wasteful. A tiny chunking layer is easy to implement and I don't see any downsides to that.
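For example, something like this (purely illustrative; `serialize`/`deserialize` are stand-ins for whatever the library actually exposes, and a real chunking layer would bound the serialized byte size rather than the item count):

```kotlin
// Sketch of a "tiny chunking layer": split the data into chunks small enough
// that 16-bit pointers never overflow inside any single chunk, and serialize
// each chunk independently. MAX_CHUNK_ITEMS is an illustrative bound only.
const val MAX_CHUNK_ITEMS = 10_000

fun <T> serializeChunked(items: List<T>, serialize: (List<T>) -> ByteArray): List<ByteArray> =
    items.chunked(MAX_CHUNK_ITEMS).map(serialize)

fun <T> deserializeChunked(chunks: List<ByteArray>, deserialize: (ByteArray) -> List<T>): List<T> =
    chunks.flatMap(deserialize)
```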
Presumably 4 bytes dedicated to the keys would be dwarfed by any strings thrown into the dataset.
Regardless, other than complexity, would there be any reason not to support a dynamic key size? You could dedicate the first 2 bits of the key to its length: 1 byte would work if there are at most 64 keys, 2 bytes would give you 16k keys, and 3 bytes 4M. And if you wanted to, you could use a frequency table to order the pointers such that the more frequently used keys get the smaller values in the dictionary.
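A minimal sketch of what that 2-bit length prefix could look like (not this library's actual format; the bit layout and the `encodeKey`/`decodeKey` helpers are made up for illustration):

```kotlin
import java.io.ByteArrayOutputStream

// Top 2 bits of the first byte say how wide the key is, the rest is the key
// itself (big-endian):
//   tag 0b00 -> 1 byte total,  6 payload bits  (64 keys)
//   tag 0b01 -> 2 bytes total, 14 payload bits (~16k keys)
//   tag 0b10 -> 3 bytes total, 22 payload bits (~4M keys)
fun encodeKey(key: Int, out: ByteArrayOutputStream) {
    require(key in 0 until (1 shl 22)) { "key out of range" }
    when {
        key < (1 shl 6) -> out.write(key)
        key < (1 shl 14) -> {
            out.write(0b01_000000 or (key ushr 8))
            out.write(key and 0xFF)
        }
        else -> {
            out.write(0b10_000000 or (key ushr 16))
            out.write((key ushr 8) and 0xFF)
            out.write(key and 0xFF)
        }
    }
}

// Returns the decoded key and the number of bytes consumed.
fun decodeKey(bytes: ByteArray, offset: Int): Pair<Int, Int> {
    val first = bytes[offset].toInt() and 0xFF
    return when (first ushr 6) {
        0 -> Pair(first and 0x3F, 1)
        1 -> Pair(((first and 0x3F) shl 8) or (bytes[offset + 1].toInt() and 0xFF), 2)
        else -> Pair(((first and 0x3F) shl 16)
                or ((bytes[offset + 1].toInt() and 0xFF) shl 8)
                or (bytes[offset + 2].toInt() and 0xFF), 3)
    }
}
```

With frequency-ordered key assignment on top, the most common keys would all land in the 1-byte form.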
Most of the data the library was originally written for consists of small objects and arrays with high levels of duplication (think the state of the world in a videogame, with tons of slightly varying objects). Pointer sizes really matter.