
G'day all.
Quoting Marcin 'Qrczak' Kowalczyk
2. Use a persistent data structure with logarithmic cost of most operations: a balanced tree of text fragments, called a rope (Hans Boehm has made one for C). "Undo" can be made by simply keeping old versions.
Hard to implement the core data structure. If done right, the rest is easy, in particular "undo" handling is very robust. There are some overheads for all operations, but the cost of operations scales to extreme cases.
The second way is more interesting, but I don't know how to implement a rope in details.
It's quite easy, actually. Leaves of the tree are traditionally arrays of chars (with one big array representing the "original" file), though you could also have something that represents a lazily read file on disk. Internal nodes come in two varieties: - "Restrictions" have one child. A restriction represents a subsequence of the child node. - "Concatenations", which have two children. A concatenation represents two children pasted together. Using only these, you can implement any editing operation. A deletion, for example, is done by restricting the node twice and then pasting the two restrictions together. To keep the tree managable, you need to a) eagerly push restrictions as far down the tree as possible, and b) rebalance. Rather than using a traditional balanced binary tree, it's easier to keep track of the depth of the tree and restructure when it gets too deep. Note that ropes are optimised for _large_ edits. So if you're actually writing a text editor, treating the line currently being edited differently seems worth it. Cheers, Andrew Bromage