Previously, we summarized the interaction strategies of the browser selection model, implemented basic selection operations, and researched the implementation of custom selection. Consequently, we also need to design the representation of selections in the editor, also known as model selections. The scope of operations when applying changes in the editor is achieved based on the model selections. Here we will base the design of the model selections on the editor's state to express the structure.
Articles on implementing a rich text editor project from scratch:
The design of the selection model in the editor involves a wide range of topics since it serves as the fundamental scope for application changes or executing commands, involving extensive module interactions. Although fundamentally requiring other modules to adapt to the selection model representation, if the selection model is not clear enough, the adaptation work of other modules will become complex. Interaction naturally requires cooperation to achieve.
The selection model most directly depends on the editor's state model, and the design of the state model heavily relies on the design of data structures. Therefore, here we will start from the perspective of data structures and state models to investigate the current mainstream editor's selection model designs.
Quill
is a modern rich text editor, with good compatibility and powerful extensibility, it also provides some out-of-the-box features. The emergence of Quill
brought many new things to rich text editors, and it is currently a widely popular editor among open-source editors. Its ecosystem is very rich, and it has now released version 2.0
.
The data structure representation of Quill
is based on Delta
implementation, meaning data changes can also be based on Delta
. Representing basic rich text data structure using Delta
as shown below, we can observe that there are no nested data structures; all content and format representations are linear.
Since the data structure is represented flatly, the representation of selections does not require complex structures to implement. Intuitively, with a flat structure, there will be fewer special cases to handle, the state structure will be easier to maintain, and the editor architecture will be more lightweight. Usually, editors also construct Range
objects, so the selection structure in Quill
is as follows:
As the scope of application changes for editor modifications, it inevitably needs to address applying changes. When applying changes, traversing the state structure is naturally required to extract the nodes that need modification to actually apply changes. Therefore, naturally, the order of the flat structure itself is the rendering order, and there is no need for nested structures for recursive searches, which naturally leads to higher search efficiency.
In addition, Quill
has redesigned a model parchment
for the view layer, maintaining the view and state models. Although some class structures have been rewritten through inheritance in the repository, such as the ScrollBlot
class, it is difficult to determine many design intentions of the view modules while reading the source code, so there is currently no clear understanding of the interaction between DOM events and state models.
Slate
is a highly flexible, fully customizable rich text editor framework that provides a core model and API for developers to deeply customize the editor's behavior. It is more like an engine for rich text editors rather than out-of-the-box components. Slate
has undergone a major version refactor, and although it is currently in a beta state, many online services are already using it.
The data structure representation of Slate
is not a flat content representation. Based on its TypeScript type definitions, the Node
type is a tuple of three types: Editor | Element | Text
. Leaving out the discussion on Editor
itself, we can easily see the definitions of Element
and Text
types from the descriptions below.
The children
attribute can be considered as nodes of type Element
, while the text
attribute is nodes of type Text
. Nodes of type Element
can infinitely nest Element
type nodes as well as Text
type nodes. In this case, a flat selection expression cannot accommodate such a structure, so Slate
's selection expression is implemented using endpoints.
This selection expression should look familiar, Slate
's selection expression aligns entirely with browser selections, as its original design principle aligns with the DOM
. On a side note, Slate
's representation of data structures also aligns entirely with selections. For example, when representing an image, an empty Text
must be inserted as a zero-width character.
In addition to Void
nodes like images, for inline Void
nodes, the representation better demonstrates that the internally maintained data structure is entirely aligned with the DOM
. For instance, in the following Mention
structure, two zero-width characters are introduced on both sides in the DOM
to accommodate the cursor position, resulting in the internal data structure maintaining two Text
nodes.
Returning to the topic of the selection model, when Slate
applies data changes, it also relies on two endpoints for traversing and searching, especially during operations such as setNode
. Consequently, searching heavily depends on the rendering order. Since the document structure is two-dimensional, finding the start/end nodes by comparing path + offset
and recursively traversing from the first node to the last node suffices.
Nested data structures complicate batch changes overall, as maintaining the index relationships between Op
depends on the implementation of transform
. However, for individual Op
changes, clarity remains high, resembling the targeted handling of individual operation changes in a Delta
, while the single change under a JSON
structure is very clear.
The transform
part is also a core implementation aspect. Previously, low-code scenarios for state management based on OT-JSON
and Immer
were discussed. Typically, a single Op
change does not affect the Path
, whereas batch Op
changes need to consider the mutual relationships between them. For instance, in Slate
's remove-nodes
implementation, the influence relationships are maintained through ref
.
When it comes to the implementation approach of Slate
that requires multiple synchronous apply
, it is not suitable for immediately synchronizing editor changes, as the rendering cost is very high. Therefore, it is more appropriate to wait for all synchronization instructions to be completed and then apply changes to the view layer in batch, which is commonly known as synchronizing data changes and asynchronously rendering views.
Lexical
is a modern open-source rich text editor framework developed by Meta
, focusing on providing ultimate performance, accessibility, and reliability. As the successor to Draft.js
, it offers better scalability and flexibility. Especially, Lexical
also provides native support for iOS
, featuring an extensible text editor built on Swift/TextKit
with shared design principles and API with Lexical
.
The data structure of Lexical
is similar to Slate
, based on a tree-like nested structure to represent content. However, its overall design includes many preset contents and lacks the flexibility of Slate
as a versatile editor engine. This can be observed from the data structure representation below, which is already simplified and lacks attributes such as detail
.
Since it is a nested data structure, the selection representation in Lexical
is similar to the implementation in Slate
. However, the selection in Lexical
is more similar to ProseMirror
implementation, dividing the selection into types such as RangeSelection
, NodeSelection
, TableSelection
, and null
selection type.
The Table
and null
selection types are unique and will not be discussed for now. Regarding Range
and Node
, they can actually be represented in the same way. Node
is used for Void
types like images or dividers, while Quill
and Slate
integrate them directly into Range
representation using zero-width text characters as selection points.
Special representations naturally have special meanings. While Quill
and Slate
use zero-width characters to homogenize text selection representation, Lexical
does not have placeholder zero-width characters, making direct homogenization of text selection representation impossible. For Lexical
, the basic text RangeSelection
selection is as follows:
Here, although the data structure is similar to Slate
, the path
part has been replaced with key
. It's important to note that in the data structure shown earlier, there was no identification like key
. This key
value is a temporary state value and not maintained within the data structure. Although it may look like a number, it is actually expressed as a unique id
in the form of a string.
In fact, this kind of id
generation method exists in many places, including Slate
and our BlockKit
, and it just happens that we both use this method to generate key
values. Since our view layer is rendered through React
, it inevitably needs to maintain unique key
values, and in Lexical
, the key
value also maintains the role of state mapping.
So essentially, the key
value here maintains the relationship between id
and path
, and we should be able to directly obtain the rendering node state through the key
value. By using this method, it is actually equivalent to leveraging temporary path
to achieve comparison of any key
strings, and being able to prune towards the tail node along the corresponding nextSibling
, which reminds me of maintaining a partial order relationship.
Here, what is important is the state preservation when the key
value changes, because the content of the editor actually needs to be edited. However, achieving immutability, it is clear that mapping the key
based on the reference of the state object would cause the entire editor DOM
to be rebuilt. For example, adjusting the heading level could lead to a complete reconstruction of the line due to the change in the line's key
.
So, reusing key
values as much as possible becomes a matter to be studied. The key
at the line level of our editor is specially maintained, achieving immutability and reusing key
values. The leaf node's key
depends on the index
value, so by studying the implementation of Lexical
, we can also apply it to the maintenance of our key
values.
Through debugging in the playground
, it can be observed that even if we cannot know if it is an implementation of immutability, it is evident that the key
in Lexical
is maintained in a somewhat left-biased manner. Therefore, in our editor implementation, we can also leverage the same method to directly reuse based on the left value, and when splitting, if starting from 0
, reuse directly, otherwise create a new key
.
[123456(key1)][789(bold-key2)]
, removing the bold from 789
while keeping the key
value of the entire text as key1
.[123456789(key1)]]
, bolding the text 789
, keeping the key
value of the left side 123456
text as key1
, and assigning a new key
to 789
.[123456789(key1)]]
, bolding the text 123
, keeping the key
value of the left side 123
text as key1
, and assigning a new key
to 456789
.[123456789(key1)]]
, bolding the text 456
, keeping the key
value of the left side 123
text as key1
, and assigning new keys to 456
and 789
.Speaking of which, Lexical
is not entirely based on React
as the view layer; it merely supports the mounting of React
component nodes. The main DOM
nodes, such as bolding and heading formatting, are still implemented in the view layer, and it's easy to determine whether they are rendered using React
by checking if properties like __reactFiber$
exist in the console.
Feishu Document is a rich text editor developed based on the concept of Blocks
. As one of the core applications of Feishu, it provides powerful document editing features. Feishu Document deeply integrates components such as documents, tables, and mind maps, supports real-time collaboration among multiple users in the cloud, integrates deeply with the Feishu suite, and is highly adapted for mobile devices, making it an excellent commercial product.
Feishu Document is a commercial product and not an open-source editor, so the data structure can only be explored through the interface. By directly examining the SSR
data structure of Feishu, you can find the basic content as shown below, simply outputting DATA
on the console. Of course, since the text data is based on the EasySync
collaborative algorithm of EtherPad
, this part has also been discussed in articles on the Delta
data structure.
Of course, this data structure seems quite complex. We can directly get an overview of the entire data structure from the response example in the server-side API
of Feishu Open Platform. This part of the data structure seems to have undergone a data transformation, not a direct response. In fact, this structure can be considered similar to Slate
's tree structure, but each line is an independent instance.
Although Feishu documents appear to have a flat structure, the actual state values constructed are still tree-like structures. The id
is used to achieve a chain-like structure, enabling the construction of the entire tree structure. Therefore, for the selection structure of Feishu documents, it is naturally necessary to adapt to the data structure state model. For text selections in Feishu documents, the structure is as follows:
Since Feishu documents are designed with Blocks
block structure, maintaining selections for pure text is naturally not enough. Therefore, in block-level selection expression, Feishu's selection expression is as follows. Speaking of which, if Feishu documents perform block-level selection across levels, when the mouse is lifted, deeper-level block selections will be merged into selections at the same level.
For deep-level Blocks
-level text selections, similar to Editor.js
, it does not support selecting text across nodes; Feishu documents support selecting text across lines, but in the case of selecting text across levels, it will be promoted to block-level selections at the same level; Slate/Lexical
and other editors support selecting text across levels. Of course, they are not editors designed based on the Blocks
concept, mainly supporting nested nodes.
Editors designed based on the Blocks
concept are actually more like low-code design, equivalent to implementing a restricted low-code engine, more precisely, a no-code engine. The restriction here mainly refers to not being as flexible as a graphic drawing board with drag-and-drop operations, but rather an editor form based on certain rules, such as block components needing to occupy a single line, and structures not being able to nest arbitrarily.
In this nested data structure, there are naturally many feasible ways to express selections. Feishu documents' method of promoting selections at the same level should be more suitable because in this case, handling related data will be simpler, and it does not entirely disallow selecting text across nodes, nor does it require recursive searching to retrieve related nodes, striking a balance between data representation and performance.
After researching the selection model designs of current mainstream editors, we need to design our editor's selection model based on similar principles. First, design selection objects for the data structure, which are typically Range
objects present in editors, and then express the state model, considering this as the basis for designing the editor's state selection structure.
In the browser, the Range
object is the basic representation of selections. However, for editors, they usually implement their own Range
objects. In such cases, when implementing IOC DI
, we can independently allocate a separate name for the browser's Range
object.
So, as mentioned earlier, our editor's data structure design is based on the implementation transformation of Delta
. Therefore, the design of the Range
object can also be consistent with the selection design of the Quill
editor since the selection design directly depends on the data structure design. To distinguish our upcoming design solution, we name it RawRange
.
Since the RawRange
object actually represents a linear range, it only needs two number
s to express itself. However, in order to better express its semantics and related calling methods, such as isBefore
and isAfter
, we also have its internal representation of the Point
object.
Based on this, methods like intersects
and includes
can be implemented, which are quite important for various operations on selections. For example, the intersects
method can be used to determine the selection state of block-level nodes, because void
nodes themselves are non-textual content, and browsers do not have a selection state for them.
Since we have the design of RawRange
selection object, a corresponding design for the Range
object is naturally there. Here, the design of our Range
object is directly based on the implementation of the editor state. Therefore, one can consider that our RawRange
object is based on the data structure, while the Range
object is based on the editor state model.
Let's first look at the declaration of the Range
object, the implementation here is relatively more refined expression. In the Point
object, we maintain the line index and the inline offset, and in the Range
object, we maintain the starting point and the ending point of the selection. At this point, the interval in the Range
object always points from start
to end
, and the isBackward
is used to indicate whether the selection is in a backward state.
It is very important that the selection interval always points from start
to end
in the Range
object. This will be very useful in our subsequent synchronization of browser selection and editor selection states. Because the anchor
and focus
obtained by the browser's Selection
object are not always pointing from start
to end
, we need to obtain relevant nodes from the Selection
in order to maintain our Range
object.
From the existing data structure, the Delta
data structure does not contain any line-related data information, so we need to obtain line index and inline offset from the status maintained by the editor. Maintaining independent status changes itself is a complex matter, which we will need to look into later on. For now, let's take a look at the data maintained by various rendering node states.
Therefore, the Selection
object implemented in the browser is based on DOM
, so synchronizing the editor's selection model through DOM
nodes also needs to be dealt with. However, here, we first build the Range
object based on obtaining the selection from the status model. Since in the above implementation, we are based on the point Point
, naturally we can separate points to process intervals.
In this way, we have separated the states of LineState
and LeafState
, and we can directly retrieve the line index and inline offset of the Point
object from the relevant states. Please note that here we are obtaining the inline offset rather than the leaf node's offset. Similar to how Slate
calculates the offset based on Text
nodes, this is also a design choice for the selection model.
In other words, our current selection implementation is the L-O
implementation, which stands for Line and Offset index level implementation. This means that the Offset
here may span across multiple actual LeafState
nodes. Therefore, this Offset
will require additional iterations when implementing selection searches, providing a flexible implementation.
I have also considered implementing the L-I-O
selection model, which is similar to Slate
's data structure but simplified into 3 levels instead of allowing unlimited nesting like Slate
. The advantage of this approach is that the selection model is clearer since there is no need to loop through lines to find the selection. However, the downside is the increased complexity of the selection and the L-O
model represents a flexible and complex compromise implementation.
Lastly, we also need to implement methods for converting between RawRange
and Range
objects. Even within the editor, these conversions are frequently needed, for operations like line breaks, line deletions, etc., using Range
objects for convenience, and RawRange
objects when transforming selections based on Delta
.
Converting from RawRange
to Range
objects is akin to transforming a linear range into a two-dimensional range, thus necessitating a search of the line states for selections. Given that our LineState
object indices increase linearly, the most common approach for ordered sequence search is binary search.
The conversion from a Range
object to a RawRange
object is relatively simple, as we only need to convert the row index and the row offset to a linear offset. Since the offset information for rows is recorded on the row object at this point, we can directly retrieve the relevant values and add them together.
Previously, we implemented basic selection operations based on Range
objects and Selection
objects, illustrating specific application scenarios and examples. In contrast, here we summarize the investigation of modern rich text editor selection model designs, and based on the data model, we designed two selection models, RawRange
and Range
objects.
Next, we need to synchronize the editor selection model with the browser selection based on the representation of the editor's selection model. By using the selection model as the target range for editor operations, we can implement basic editor operations such as insertion, deletion, formatting, and many selection-related boundary operations.