Previously, we implemented basic browser selection operations based on the `Range` and `Selection` objects, and designed two selection models, `RawRange` and `Range`, based on the editor's data model. Here we need to associate the browser selection with the editor selection so that the operation range is known when applying changes. In essence, we need controlled selection synchronization based on the DOM.
The main goal at this stage is to synchronize the browser selection with the editor selection model, and thereby achieve controlled DOM selection synchronization. There is a lot to consider here. DOM nodes are complex, especially when plugin-based rendering is supported, so how to normalize them and how to deal with controlled rendering under `ContentEditable` are the key questions.
Let's first tackle the simplest case of selection synchronization: selecting text nodes. To illustrate text selection in the browser, we can retrieve the position of the text fragment `23` inside an element. Here `firstChild` is a `Text` node, whose `nodeType` is `Node.TEXT_NODE`, which makes it possible to compute a fragment of its text content.
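As a minimal sketch of the idea (not the article's original example), the snippet below slices a fragment out of a text node by start and end offsets. `TextNodeLike`, `sliceTextFragment`, and the sample content `123` are assumptions chosen so the code runs without a DOM. In a browser, the same offsets would be passed to `range.setStart` and `range.setEnd` on a `Range` created with `document.createRange()`.

```typescript
// Numeric value of Node.TEXT_NODE in the DOM specification.
const TEXT_NODE_TYPE = 3;

// Minimal structural stand-in for a DOM Text node (assumption for this sketch).
interface TextNodeLike {
  nodeType: number;
  textContent: string | null;
}

// Return the text covered by [start, end) inside a text node,
// or null when the node is not a text node.
function sliceTextFragment(node: TextNodeLike, start: number, end: number): string | null {
  if (node.nodeType !== TEXT_NODE_TYPE) return null;
  const text = node.textContent ?? "";
  // Clamp both offsets so that out-of-range values cannot throw or invert the interval.
  const from = Math.max(0, Math.min(start, text.length));
  const to = Math.max(from, Math.min(end, text.length));
  return text.slice(from, to);
}

// Hypothetical text node containing "123"; offsets 1..3 cover "23".
const fragment = sliceTextFragment({ nodeType: TEXT_NODE_TYPE, textContent: "123" }, 1, 3);
```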
In the editor selection model, we have defined `Range` and `RawRange` objects to represent the editor's selection state. The design of `RawRange` aligns with the selection design of the Quill editor. Since selection design typically follows the data structure design, a `RawPoint` object directly stores the starting offset value.
The `Range` selection is designed directly on top of the editor's state implementation. It uses `Point` objects to hold the line index and the offset within the line, while the `Range` object keeps the start and end points of the selection. The interval in a `Range` always runs from `start` to `end`, and `isBackward` marks whether the selection is reversed.
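The two models can be sketched roughly as follows. This is an illustration under assumptions rather than the editor's actual classes: the names `EditorPoint`, `EditorRange`, `createEditorRange`, and `toRawPoints` are hypothetical (prefixed to avoid clashing with the DOM's own `Range`), and the `start`/`len` fields of `RawRange` are assumed by analogy with Quill's `{ index, length }`.

```typescript
// Linear model: a single offset into the whole document.
interface RawPoint {
  offset: number;
}

// Linear selection: a start offset plus a length (field names are assumptions).
interface RawRange {
  start: number;
  len: number;
}

// State-based model: line index plus offset within that line.
interface EditorPoint {
  line: number;
  offset: number;
}

interface EditorRange {
  start: EditorPoint;
  end: EditorPoint;
  isBackward: boolean;
  isCollapsed: boolean;
}

// Negative when a is before b, zero when equal, positive when a is after b.
function comparePoints(a: EditorPoint, b: EditorPoint): number {
  return a.line !== b.line ? a.line - b.line : a.offset - b.offset;
}

// Build a range from anchor/focus so that start <= end always holds;
// the original direction is preserved in isBackward.
function createEditorRange(anchor: EditorPoint, focus: EditorPoint): EditorRange {
  const order = comparePoints(anchor, focus);
  const isBackward = order > 0;
  return {
    start: isBackward ? focus : anchor,
    end: isBackward ? anchor : focus,
    isBackward,
    isCollapsed: order === 0,
  };
}

// A raw range expressed as two raw points; subtracting them yields the length again.
function toRawPoints(raw: RawRange): { start: RawPoint; end: RawPoint } {
  return { start: { offset: raw.start }, end: { offset: raw.start + raw.len } };
}
```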
The primary objective of selection synchronization is to use `ContentEditable` for content input and the browser's own selection model for text selection effects, without maintaining a separate `input` element for typing or drawing a custom selection. The price of relying on more browser capabilities is that a substantial amount of logic is needed to keep the model synchronized in a controlled way.
Throughout this process we need bidirectional conversion. When the browser selection changes, we must obtain the latest DOM selection and convert it to the Model selection. Conversely, when the editor content changes or a selection is set programmatically, we need to convert the editor selection to a browser selection and apply it to the DOM nodes.
Our editor is essentially aiming for a structure similar to slate. We want the core logic to be separate from the view, so selection and rendering are implemented in the react package, while the related state management lives in the core. We can refer to the selection implementations in quill and slate and summarize them as follows:
- slate and quill mostly work with points, such as `Point`. In quill, the final step subtracts the two points to obtain a length, but everything before that step operates on points, because browser selections are themselves based on the `Anchor` and `Focus` points. Our implementation needs to inherit this concept.
- slate and quill both normalize the browser selection so that it lands on text nodes, and then calculate offsets. Since rich text predominantly revolves around text nodes, this normalization is crucial for correct offset calculation.
- quill has its own view layer and maintains nodes within `Blot`, so mapping the browser selection to the quill selection is relatively straightforward. slate, on the other hand, uses React for the view layer, which makes the mapping more complex. This is why slate renders many nodes carrying attributes such as `data-slate-leaf`, which slate uses for calculations, not just for selections.
- Recomputing a `Range` by traversing nodes every time is inefficient. Hence, a mapping is needed during rendering, where real DOM nodes are mapped to objects containing key, offset, length, etc. This is where `WeakMap` comes in handy, allowing direct retrieval of node information using the DOM node as a key.
First, we implement the logic of synchronizing the browser selection to the editor selection, which we refer to as the DOM selection and the Model selection. Since we call it the DOM selection, we have to derive the selection information from DOM nodes. Handling a selection on text nodes, the common case in a `ContentEditable` state, is straightforward: get a `StaticRange` object from the `Selection` and convert it to a Model selection based on the editor's built-in state.
For non-text node selections, especially with mixed media content like images or videos, handling selection positions is more complex. Likewise, collapsed and reversed selections must be marked properly when converting them from DOM selections to Model selections.
Moreover, we need to stay compatible with browser behaviors such as double-clicking to select a word or triple-clicking, where the selection can land on different nodes. Scenarios where modifier keys like `alt` are pressed together with movement keys, or where content is deleted, also require attention.
Starting from the `OnSelectionChange` event callback, we first obtain the `Selection` object and the static range object. Note that browsers such as Firefox support multiple selection segments. This needs to be handled, and we focus on the first segment.
Next, we need to determine whether the current selection is inside the editor container node. If it is not, we ignore it. Then we check whether the current selection is backward. This check is straightforward: the nodes and offsets provided by the `Selection` object and the `Range` object are consistent, so we only need to compare them for equality.
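The equality check can be sketched as below. `SelectionLike` and `StaticRangeLike` are hypothetical structural types that mirror the real `anchorNode`/`anchorOffset`/`focusNode`/`focusOffset` and `startContainer`/`startOffset`/`endContainer`/`endOffset` fields, so the function can be exercised without a browser. The function name `isBackwardSelection` is also an assumption.

```typescript
// Structural subsets of the browser's Selection and Range objects (assumptions for this sketch).
interface SelectionLike {
  anchorNode: object | null;
  anchorOffset: number;
  focusNode: object | null;
  focusOffset: number;
}

interface StaticRangeLike {
  startContainer: object;
  startOffset: number;
  endContainer: object;
  endOffset: number;
}

// A range always runs start -> end, while a selection runs anchor -> focus.
// If anchor/focus do not coincide with start/end, the user selected backward.
function isBackwardSelection(sel: SelectionLike, range: StaticRangeLike): boolean {
  const anchorAtStart =
    sel.anchorNode === range.startContainer && sel.anchorOffset === range.startOffset;
  const focusAtEnd =
    sel.focusNode === range.endContainer && sel.focusOffset === range.endOffset;
  // A collapsed selection satisfies both conditions and is therefore never backward.
  return !(anchorAtStart && focusAtEnd);
}
```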
Now we move on to the crucial part. The `Range` object is built on `Node` nodes. In other words, a `Range` resembles the mathematical notion of an interval whose endpoints are expressed relative to nodes. We therefore have to convert the selection node by node: normalize the selection nodes and transform them into model points. We take a collapsed selection as the example.
The `normalizeDOMPoint` method normalizes nodes, because DOM nodes come in many types and can be complex. We need to handle these cases, especially non-text node types. For pure text selections, we usually only need to map to the model selection through the state associated with the rendered node.
Non-text selection nodes need more complex handling. First, we must clarify how our editor designs selection nodes. For nodes such as images, we place a zero-width character text node for the cursor to sit on, which lets us process them uniformly. Line break nodes are likewise handled with a zero-width character, whereas Typora, for instance, uses a `<br>` node.
Clearly, if a non-text node is selected, we need to locate the zero-width character node marked inside it. We can only do this by searching iteratively until the target node is reached. In theory, for non-text nodes the browser's selection falls on the outermost `contenteditable=false` node, so a level-by-level search should suffice.
The `getEditableChildAndIndex` method iterates over the child nodes of `parent` to find a nearby editable node and its index. The method also takes a preferred search direction. When both the forward and backward searches fail, it can only return the last node and index it examined.
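A simplified sketch of such a directional search is shown below. It is not the editor's `getEditableChildAndIndex` itself: the function `findEditableChild`, the `ChildLike` shape, and its boolean `editable` flag (standing in for checks such as `contenteditable=false` or empty elements) are assumptions for illustration.

```typescript
type SearchDirection = "forward" | "backward";

// Stand-in for a DOM child node; `editable` abstracts the real editability checks.
interface ChildLike {
  editable: boolean;
}

// Search from `index` in the preferred direction first, then in the opposite one.
// When nothing editable is found, return the last child that was examined.
function findEditableChild<T extends ChildLike>(
  children: T[],
  index: number,
  direction: SearchDirection
): [T | undefined, number] {
  const step = direction === "forward" ? 1 : -1;
  let last = index;
  // Preferred direction, starting at the requested index.
  for (let i = index; i >= 0 && i < children.length; i += step) {
    const child = children[i];
    if (child && child.editable) return [child, i];
    last = i;
  }
  // Opposite direction, starting next to the requested index.
  for (let i = index - step; i >= 0 && i < children.length; i -= step) {
    const child = children[i];
    if (child && child.editable) return [child, i];
    last = i;
  }
  return [children[last], last];
}
```

Starting the second pass at `index - step` also covers the common DOM case where the offset equals the number of children, since the first loop is then skipped entirely.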
The `toModelPoint` method transforms the normalized `DOMPoint` into a `ModelPoint`. Starting from the text node, it looks up the `data-leaf` and `data-node` nodes that mark the rendered model. These nodes are essentially used for state mapping. Once we have the corresponding state, we can calculate the model selection.
As for the special cases mentioned above, let's first address the end-of-line `\n` node. When the current node is `data-zero-enter`, the point needs to be corrected to the end of the previous node. The correction fixes an offset discrepancy in the calculated selection: a cursor should have only one position here, but the `\n` node technically offers two, which introduces an extra offset. Hence the extra handling.
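The state lookup and the `\n` correction can be illustrated together. In this sketch, `LeafState`, `LEAF_STATE`, and `toModelPointSketch` are hypothetical names, and the state shape (line index, start offset within the line, zero-width enter flag) is an assumption. The real method resolves state through the `data-leaf` and `data-node` nodes, whereas here a `WeakMap` keyed by the node object stands in for that mapping.

```typescript
// Hypothetical per-leaf state recorded at render time.
interface LeafState {
  line: number; // index of the line that contains the leaf
  start: number; // offset of the leaf's first character within the line
  isZeroEnter: boolean; // true for the zero-width node that renders the trailing \n
}

// Node -> state mapping; WeakMap entries disappear when the node is garbage collected.
const LEAF_STATE = new WeakMap<object, LeafState>();

// Convert a normalized (leaf node, offset) pair into a model point.
function toModelPointSketch(
  leafNode: object,
  domOffset: number
): { line: number; offset: number } | null {
  const state = LEAF_STATE.get(leafNode);
  if (!state) return null;
  // The zero-width enter node exposes DOM offsets 0 and 1, but the model has a single
  // caret position there: the end of the previous leaf, which equals state.start.
  const offset = state.isZeroEnter ? state.start : state.start + domOffset;
  return { line: state.line, offset };
}
```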
There is another interesting point here. Our goal is to treat cursor positions in the editor uniformly as plain text. Editors like Typora, which use `<br>` for line breaks instead of zero-width characters, have only a single insertion position, with offset `0`. The discrepancy comes from inherent design differences between editors, which call for different DOM structures.
So far, we have implemented the logic to convert browser selection points on plain text to the editor's selection model. Many details and special-case handling are omitted here, especially in the `toModelPoint` method. Non-text nodes, such as images or videos, will be covered later when we implement `Void` nodes.
Synchronizing browser selection changes to the editor selection does not complete the job. Although we have calculated a model selection from the browser selection, the browser selection itself may not sit exactly where we need it. Since the editor accepts input, we must ensure that the selection/cursor is at a controlled position.
Knowing the position only in the model selection may suffice for read-only mode, but it falls short when typing in edit mode. There we need to write the model selection back to the intended DOM nodes to uphold the controlled principle. In addition, features such as cursor following during input and the inline toolbar rely on the ability to set the selection programmatically.
Thus, the flow becomes: browser selection change -> editor selection change -> browser selection set. Here we easily run into a loop: a browser change triggers setting the editor selection, which in turn sets the browser selection, which fires another change, and so on. To break the loop, we add a condition that skips setting the selection when nothing has changed.
The underlying cause is dragging with the mouse: the selection is continuously updated during the drag and conflicts with the selection we write back. We can therefore avoid actively setting the selection while the mouse button is pressed. And because releasing the button does not necessarily produce a selection change, we set the selection once more when the button is released.
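Both guards can be sketched with a small controller. The class `SelectionSync`, its method names, and the `applyToDOM` callback are hypothetical. The sketch only shows the equality check that breaks the loop and the mouse-down gate, not the editor's real selection module.

```typescript
interface ModelPointLike {
  line: number;
  offset: number;
}

interface ModelRangeLike {
  start: ModelPointLike;
  end: ModelPointLike;
  isBackward: boolean;
}

// Value equality, so a freshly converted but identical range counts as "no change".
function isSameRange(a: ModelRangeLike | null, b: ModelRangeLike | null): boolean {
  if (a === null || b === null) return a === b;
  return (
    a.start.line === b.start.line &&
    a.start.offset === b.start.offset &&
    a.end.line === b.end.line &&
    a.end.offset === b.end.offset &&
    a.isBackward === b.isBackward
  );
}

class SelectionSync {
  private current: ModelRangeLike | null = null;
  private mouseDown = false;
  private readonly applyToDOM: (range: ModelRangeLike) => void;

  constructor(applyToDOM: (range: ModelRangeLike) => void) {
    this.applyToDOM = applyToDOM;
  }

  onMouseDown(): void {
    this.mouseDown = true;
  }

  // Releasing the button may not fire a selection change, so write back once more.
  onMouseUp(): void {
    this.mouseDown = false;
    if (this.current) this.applyToDOM(this.current);
  }

  // Returns false when nothing changed, which is what stops the update loop.
  set(next: ModelRangeLike | null): boolean {
    if (isSameRange(this.current, next)) return false;
    this.current = next;
    // While dragging, the browser owns the selection; only the model is updated.
    if (next && !this.mouseDown) this.applyToDOM(next);
    return true;
  }
}
```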
Having discussed the synchronization from the browser selection to the editor selection, let's implement the `toDOMRange` method, which converts a `ModelRange` into a `DOMRange`. In practice this is less complex than `toModelRange`, because the model range has a simple format, unlike the DOM structure, and the corresponding DOM is controlled by the state module.
The `toDOMPoint` method is more involved. We fetch the current line state and leaf state from the editor's state module and then obtain the corresponding DOM through the state mapping. The mapping to DOM nodes is established in the react package, which is where the DOM-related implementation lives. This is one of the design rules we must adhere to.
By mapping states to nodes, we can access the corresponding DOM nodes. However, the retrieved nodes are not always reliable, so some fallback measures are required. The subsequent logic finds all leaf container DOM nodes under the `LineNode`, then walks them by text length to determine the node and offset that correspond to the model offset.
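The walk over leaf lengths can be sketched as follows. `LeafLike` and `locateLeafOffset` are hypothetical names, and the leaves are plain objects rather than DOM nodes. The choice to resolve a boundary offset to the end of the earlier leaf, and to clamp overflow to the end of the last leaf, is an assumption for this sketch.

```typescript
// Stand-in for a rendered leaf: only its text length matters for this calculation.
interface LeafLike {
  key: string;
  length: number;
}

// Find which leaf contains the given in-line offset and the offset inside that leaf.
function locateLeafOffset(
  leaves: LeafLike[],
  offset: number
): { index: number; offset: number } | null {
  let remaining = Math.max(0, offset);
  for (let i = 0; i < leaves.length; i++) {
    const leaf = leaves[i];
    if (!leaf) continue;
    // A boundary offset resolves to the end of the earlier leaf.
    if (remaining <= leaf.length) return { index: i, offset: remaining };
    remaining -= leaf.length;
  }
  // Fallback: the offset exceeds the line, so clamp to the end of the last leaf.
  const lastIndex = leaves.length - 1;
  const last = leaves[lastIndex];
  return last ? { index: lastIndex, offset: last.length } : null;
}
```

The returned leaf index and inner offset would then be mapped to the leaf's text node to build the DOM point.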
When setting the editor selection, we separate setting the model selection from setting the browser selection. The main reason for this design is that DOM selection changes can then be applied to the browser afterwards in batches. Also, when content is input, selection changes are handled uniformly at apply time, and the DOM selection is updated after the view layer has rendered asynchronously.
So far, we have converted the editor's selection model into concrete DOM nodes and offsets, so we can now set it on the browser using the browser API. The remaining logic depends on the selection scenarios and execution constraints mentioned at the beginning, plus a lot of special-case handling, which we will describe later.
Previously, we implemented basic selection operations based on the browser's selection API and designed a model selection representation based on the editor state, which defines the operation range when applying changes. Here, we have implemented bidirectional synchronization between the browser selection and the model selection to achieve controlled selection operations, a crucial foundational capability of the editor.
Next, on top of the editor selection module, we need to implement content input based on the browser's `BeforeInput` event and the `Composition`-related events. Editor input is a complex topic: it has to deal with the complicated default behavior of `ContentEditable` on the DOM structure, as well as compatibility with the various input scenarios of IMEs.