
On 30/09/16 7:17 PM, Joachim Durchholz wrote:
> There is a single standard representation. [for strings in Java] I'm not even aware of a second one, and I've been programming Java for quite a while now. Unless you mean StringBuilder/StringBuffer (that would be three String types then).
> StringBuffer is just a synchronized version of StringBuilder. However, these classes are by no means "preferred" in practice: the vast majority of APIs demands and returns String objects.
The Java *compiler* prefers StringBuilder: when you write a string concatenation expression in Java the compiler creates a StringBuilder behind the scenes. I'm counting a class as "preferred" if the compiler *has* to know about it and generates code involving it without the programmer explicitly mentioning it.
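
To make the point about the compiler concrete, here is a minimal sketch. The class and method names (ConcatDemo, viaOperator, viaBuilder) are made up for illustration. The second method is roughly what javac through Java 8 emits for the first; from Java 9 onwards javac instead emits an invokedynamic call to StringConcatFactory, so treat this as a picture of the older translation rather than of every compiler.

```java
final class ConcatDemo {
    /** What the programmer writes: no StringBuilder in sight. */
    static String viaOperator(String name, int count) {
        return "name=" + name + ", count=" + count;
    }

    /**
     * Roughly what javac (through Java 8) generates for the method above.
     * Written out by hand here; the real output is bytecode, not source.
     */
    static String viaBuilder(String name, int count) {
        return new StringBuilder()
                .append("name=")
                .append(name)
                .append(", count=")
                .append(count)
                .toString();
    }

    public static void main(String[] args) {
        // Both calls build the same string.
        System.out.println(viaOperator("x", 3));
        System.out.println(viaBuilder("x", 3));
    }
}
```

The arguments are method parameters on purpose: if every operand were a compile-time constant, the compiler would fold the whole expression into one literal and no builder would be involved.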
> Even then, Java has its preferred string representation nailed down pretty strongly: a hidden array of 16-bit Unicode code points, referenced by a descriptor object (the actual String), immutable.
As already noted, that representation changed internally. And that change is actually relevant to this thread.

The representation that _used_ to be used was

    (char[] array, offset, length, hash)

Amongst other things, this meant that taking a substring cost O(1) time and O(1) space, because you just had to allocate and initialise a new "descriptor object" sharing the underlying array.

Since Java 1.7 the representation is

    (char[] array, hash)

Amongst other things, this means that taking a substring n characters long now costs O(n) time and O(n) space.

If you are working in a loop like

    while (there is more input) {
        read a chunk of input
        split it into substrings
        process some of the substrings
    }

the pre-Java-1.7 representation is perfect. If you *retain* some of the substrings, however, you retain the whole chunk. That was easy to fix by doing

    retain(new String(someSubstring))

instead of

    retain(someSubstring)

but you had to *know* to do it.

(Another solution would be to have a smarter garbage collector that knew about string sharing and could compact strings. I wrote such a collector for XPL many years ago. It's quite easy to do a stop-and-copy garbage collector that does that. But that's not the state of the art in Java garbage collection, and I'm not sure how well string compaction would fit into a more advanced collector.)

The Java 1.7-and-later representation is *safer*. Depending on your usage, it may either save a lot of memory or bloat your memory use.

The point is that there is no one-size-fits-all string representation; being given only one forces you to either write your own additional representation(s) or to use a representation which is not really suited to your particular purpose.
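
The trade-off between the two layouts can be modelled in a few lines. This is only a sketch: SharedSlice is a hypothetical class, not anything in the JDK, and it leaves out the cached hash field. Its substring() shares the backing array the way the old String did, and compact() makes the explicit copy that retain(new String(someSubstring)) used to stand for.

```java
import java.util.Arrays;

/** Hypothetical model of the old (array, offset, length) layout; hash omitted. */
final class SharedSlice {
    final char[] array;  // backing storage, possibly shared with other slices
    final int offset;    // index of this slice's first character in array
    final int length;    // number of characters in this slice

    SharedSlice(char[] array, int offset, int length) {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    static SharedSlice of(String s) {
        return new SharedSlice(s.toCharArray(), 0, s.length());
    }

    /** O(1) time and space: a new descriptor over the same backing array. */
    SharedSlice substring(int begin, int end) {
        if (begin < 0 || end > length || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedSlice(array, offset + begin, end - begin);
    }

    /** O(n) time and space: copies just this slice so the big array can be freed. */
    SharedSlice compact() {
        return new SharedSlice(
                Arrays.copyOfRange(array, offset, offset + length), 0, length);
    }

    @Override
    public String toString() {
        return new String(array, offset, length);
    }

    public static void main(String[] args) {
        SharedSlice chunk = SharedSlice.of("key=value; lots of other input follows");
        SharedSlice key = chunk.substring(0, 3);  // shares chunk's array
        SharedSlice kept = key.compact();         // owns a 3-char array
        System.out.println(key + " keeps " + key.array.length + " chars alive");
        System.out.println(kept + " keeps " + kept.array.length + " chars alive");
    }
}
```

Retaining key keeps the whole chunk reachable, while retaining kept does not. The post-1.7 layout in effect always does the compact() step inside substring(), which is where the O(n) cost comes from.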