Previously, we implemented basic browser selection operations based on the Range and Selection objects, and designed two selection models, RawRange and Range, based on the editor's data model. Here we need to associate the browser selection with the editor selection so that we know the operation range when applying changes. In essence, this is controlled selection synchronization based on the DOM.
The main goal now is to synchronize the browser selection with the editor selection model, that is, to achieve controlled DOM selection synchronization. There is a lot to consider here. DOM nodes are complex, especially when rendering is plugin-based, so how to normalize them and how to keep rendering controlled under ContentEditable are the key questions.
Let's first tackle the simplest selection synchronization case: selection on text nodes. With the browser's Range API we can, for example, retrieve the position of the text fragment 23 inside an element. Here, firstChild is a Text node, that is, a node whose nodeType equals Node.TEXT_NODE, which is what lets us address a fragment of the text content by offset.
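As a minimal sketch of this idea, assume an element whose only child is a Text node holding the hypothetical sample 123456, so that the fragment 23 spans offsets 1 to 3. The `TextLike` interface and the `textFragment` helper are illustrative names rather than part of any API; they stand in for the DOM so the sketch also runs outside a browser.

```typescript
// Structural stand-in for a DOM Text node.
interface TextLike {
  nodeType: number;
  data: string;
}

const TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE

// Returns the text covered by [start, end) on a single text node.
function textFragment(node: TextLike, start: number, end: number): string {
  if (node.nodeType !== TEXT_NODE) {
    throw new Error("expected a text node");
  }
  return node.data.slice(start, end);
}

// Browser equivalent (not executed here), assuming `el` wraps the text:
//   const range = document.createRange();
//   range.setStart(el.firstChild, 1);
//   range.setEnd(el.firstChild, 3);
//   range.getBoundingClientRect(); // position of the fragment
const sample: TextLike = { nodeType: TEXT_NODE, data: "123456" };
const fragment = textFragment(sample, 1, 3);
```

In the browser, the same pair of offsets passed to `Range.setStart` and `Range.setEnd` yields a range whose bounding rectangle is the on-screen position of the fragment.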
In the editor selection model, we have defined Range and RawRange objects to represent the editor's selection state. The design of RawRange follows the selection design of the Quill editor. Since selection design typically follows from data structure design, a RawPoint object directly holds the starting offset value.
The Range object selection design is directly based on the editor's state implementation. It uses the Point object to maintain line index and inline offset, while the Range object preserves the starting and ending points of the selection. The interval in the Range object always points from start to end, with isBackward marking whether the selection is reversed.
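The two models can be sketched roughly as follows. This is an illustration under assumptions, not the editor's actual code: the field names `start`, `len`, `line`, and `offset`, and the helpers `fromAnchorFocus` and `toRawRange`, are hypothetical. The structured range is named `ModelRange` here only to avoid clashing with the browser's global `Range` type.

```typescript
// Flat model: a starting offset plus a length (Quill-style).
interface RawRange {
  start: number;
  len: number;
}

// Structured model: line index plus offset inside that line.
interface Point {
  line: number;
  offset: number;
}

interface ModelRange {
  start: Point;
  end: Point;
  isBackward: boolean;
  isCollapsed: boolean;
}

// True when point a is located before point b in document order.
function isBefore(a: Point, b: Point): boolean {
  return a.line < b.line || (a.line === b.line && a.offset < b.offset);
}

// Builds a range from anchor/focus so that start never comes after end.
function fromAnchorFocus(anchor: Point, focus: Point): ModelRange {
  const isBackward = isBefore(focus, anchor);
  const start = isBackward ? focus : anchor;
  const end = isBackward ? anchor : focus;
  const isCollapsed = start.line === end.line && start.offset === end.offset;
  return { start, end, isBackward, isCollapsed };
}

// Flattens a range, given the length of every line (trailing \n included).
function toRawRange(range: ModelRange, lineLengths: number[]): RawRange {
  const flat = (p: Point): number =>
    lineLengths.slice(0, p.line).reduce((sum, n) => sum + n, 0) + p.offset;
  const start = flat(range.start);
  return { start, len: flat(range.end) - start };
}

// A selection dragged from line 1 back up to line 0.
const dragged = fromAnchorFocus({ line: 1, offset: 2 }, { line: 0, offset: 1 });
const raw = toRawRange(dragged, [4, 5]);
```

The direction of the drag survives only in `isBackward`; `start` and `end` are always stored in document order, which keeps range arithmetic free of direction checks.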
The primary objective of selection synchronization here is to use ContentEditable for content input and the browser's own selection model for text selection effects, so that we do not have to maintain a separate input element for input handling or draw a custom selection layer. The trade-off is that the more we rely on browser capabilities, the more logic we need to keep the model in controlled sync.
Throughout this process, we need to accomplish bidirectional conversion. When the browser selection changes, we must obtain the latest DOM selection and convert it to the Model selection. Conversely, in scenarios like editor content changes or setting selections actively, we need to convert the editor selection to the browser selection and apply it to the DOM nodes.
Our editor is essentially aimed at achieving a structure similar to slate. We want the core logic to be separate from the view, so the implementation of selection and rendering needs to be done within the react package. The related state management is handled within the core logic. Here we can refer to the selection implementation in quill and slate, and summarize the following implementations:
slate and quill focus on handling points such as Point. In quill, the final step subtracts the two points to obtain a length, but everything before that step operates on points, because the browser selection is itself built on the Anchor and Focus points. Our implementation needs to inherit this concept.
slate and quill both normalize the browser selection so that it lands on text nodes before calculating offsets. Since rich text predominantly revolves around text nodes, this normalization is crucial for correct offset calculation.
quill has a custom view layer in which nodes are maintained within Blot, so mapping the browser selection to the quill selection is relatively straightforward. slate, on the other hand, uses React for the view layer, which makes the mapping more complex. That is why slate renders many nodes marked with attributes such as data-slate-leaf, which it uses for calculations beyond selection.
Traversing the document to compute the Range on every selection change is inefficient. Hence, a mapping is needed during rendering, where real DOM nodes are mapped to objects containing key, offset, length, etc. This is where WeakMap comes in handy, allowing direct retrieval of node information using the DOM node as a key.
First, we implement the logic of synchronizing the browser selection to the editor selection, which we refer to as the DOM selection and the Model selection. Since we call it the DOM selection, we must retrieve selection information from DOM nodes. Handling selection on text nodes, the common case under ContentEditable, is straightforward: we get a StaticRange object from the Selection and convert it to a Model selection based on the editor's built-in state.
For non-text node selections, especially in scenarios involving mixed media content like images or videos, handling selection positions can be more complex. Similarly, dealing with collapsed or reversed selections requires proper marking and consideration when converting them from DOM to Model selections.
Moreover, ensuring compatibility with various browser events like double-clicking to select words or triple-clicking to adjust selections on different nodes is crucial. Handling scenarios where modifier keys like alt are pressed along with movement keys or deleting content also requires attention.
Starting from the OnSelectionChange event callback, we need to extract the Selection object and the static range object. Note that browsers like Firefox allow a selection to contain multiple ranges; here we only handle the first one.
Next, we need to determine whether the current selection is inside the editor container node; if it is not, we ignore it. Then we need to check whether the selection is backward. This check is straightforward: the Selection object and the Range object report the same nodes and offsets, so we only need to compare them for equivalence.
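The steps above can be sketched as follows. The `NodeLike`, `StaticRangeLike`, and `SelectionLike` interfaces are simplified stand-ins for the DOM types, and the helper names are hypothetical; in a browser, `contains` would simply be `root.contains(node)`. The backward check shown here is one possible comparison, not necessarily the one the editor uses.

```typescript
// Simplified stand-ins for DOM Node, Range and Selection.
interface NodeLike {
  parent: NodeLike | null;
}
interface StaticRangeLike {
  startContainer: NodeLike;
  startOffset: number;
  endContainer: NodeLike;
  endOffset: number;
}
interface SelectionLike {
  rangeCount: number;
  anchorNode: NodeLike | null;
  anchorOffset: number;
  focusNode: NodeLike | null;
  focusOffset: number;
  getRangeAt(index: number): StaticRangeLike;
}

// Only the first range is used, even if the browser reports several.
function getStaticRange(sel: SelectionLike | null): StaticRangeLike | null {
  if (!sel || sel.rangeCount === 0) return null;
  const r = sel.getRangeAt(0);
  // Copy plain values so the snapshot does not follow later live changes.
  return {
    startContainer: r.startContainer,
    startOffset: r.startOffset,
    endContainer: r.endContainer,
    endOffset: r.endOffset,
  };
}

// Walks up the parent chain; true if node is root or one of its descendants.
function contains(root: NodeLike, node: NodeLike): boolean {
  for (let cur: NodeLike | null = node; cur; cur = cur.parent) {
    if (cur === root) return true;
  }
  return false;
}

// A range always runs start -> end, so the selection is backward when its
// focus sits on the range start and the range is not collapsed.
function isBackward(sel: SelectionLike, range: StaticRangeLike): boolean {
  const collapsed =
    range.startContainer === range.endContainer &&
    range.startOffset === range.endOffset;
  return (
    !collapsed &&
    sel.focusNode === range.startContainer &&
    sel.focusOffset === range.startOffset
  );
}
```

A handler would call `getStaticRange` first, bail out unless both containers pass `contains` against the editor root, and only then compute the direction.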
Now we move on to the crucial part. The browser's Range object is defined in terms of Node objects: much like a mathematical interval, it is described by a start and an end, each given as a node plus an offset. Therefore, we need to convert the selection node by node, normalizing the selected DOM nodes and then transforming them into model points. We will use a collapsed selection as the example.
The normalizeDOMPoint method standardizes the selected nodes, because DOM nodes come in many types and shapes, and non-text nodes in particular need dedicated handling. For a pure text selection, we usually only need to map the text node back to the state it was rendered from in order to obtain the model point.
Non-text selection nodes need a more involved approach. First, we must be clear about how our editor designs its selection nodes. For nodes such as images, we render a text node containing a zero-width character to hold the cursor, which lets us process them uniformly. Line breaks are likewise represented with zero-width characters, much as Typora uses a <br> node for the same purpose.
Clearly, if a non-text node is selected, we need to locate the internally marked zero-width character nodes. In such cases, we can only handle this iteratively until we reach the target node. In theory, for non-text nodes, the browser's selection falls on the outermost contenteditable=false node. Thus, considering hierarchical searching should suffice.
The getEditableChildAndIndex method is used to iterate through all child nodes to find a nearby editable node and index in the parent. Additionally, this method prioritizes the search direction; when both forward and backward searches are unsuccessful, it can only return the last searched node and index.
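A simplified sketch of this search is shown below. The `TreeNode` type is a hypothetical stand-in for DOM nodes, where `editable: false` models a contenteditable=false element or a node that cannot hold a cursor. The function keeps the name from the text, but the body is an illustration of the described behavior rather than the editor's actual implementation.

```typescript
interface TreeNode {
  editable: boolean;
  children: TreeNode[];
}

type Direction = "forward" | "backward";

// Scans the children of `parent` starting at `index`, first in the preferred
// direction and then in the opposite one. If no editable child exists, the
// last child examined is returned together with its index.
function getEditableChildAndIndex(
  parent: TreeNode,
  index: number,
  direction: Direction
): [TreeNode | undefined, number] {
  const kids = parent.children;
  const step = direction === "forward" ? 1 : -1;
  let last = index;
  for (const delta of [step, -step]) {
    for (let i = index; i >= 0 && i < kids.length; i += delta) {
      if (kids[i].editable) return [kids[i], i];
      last = i;
    }
  }
  return [kids[last], last];
}
```

A normalization routine would call this repeatedly, descending into the returned child until it reaches a text node that can carry the cursor.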
As for the toModelPoint method, it is responsible for transforming the standardized DOMPoint node into a ModelPoint. In this method, we need to retrieve the data-leaf and data-node nodes that mark the rendering model based on the text node. These nodes are essentially used for state mapping. Once we have the state representation, we can then calculate the model selection.
Among the special cases mentioned above, let's first address the end-of-line \n node. When the current node is data-zero-enter, the point needs to be adjusted to the end of the previous node. This corrects an offset discrepancy that arises when calculating the selection: there should be only one cursor position at the end of a line, but the \n node technically offers two positions, which would add an extra offset. Hence the extra handling.
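The adjustment can be sketched as follows, under the assumption that each rendered leaf can be described by its line index, its starting offset within the line, and a flag for the zero-width \n placeholder. `LeafInfo`, `ModelPoint`, and this reduced `toModelPoint` signature are hypothetical simplifications.

```typescript
// Hypothetical description of a rendered leaf, as recovered from its DOM node.
interface LeafInfo {
  line: number; // index of the line that owns the leaf
  start: number; // offset of the leaf's first character within the line
  zeroEnter: boolean; // true for the zero-width placeholder of the trailing \n
}

interface ModelPoint {
  line: number;
  offset: number;
}

// domOffset is the offset inside the leaf's text node reported by the browser.
function toModelPoint(leaf: LeafInfo, domOffset: number): ModelPoint {
  if (leaf.zeroEnter) {
    // The placeholder exposes DOM offsets 0 and 1, but the model has a single
    // caret position there: the end of the preceding content.
    return { line: leaf.line, offset: leaf.start };
  }
  return { line: leaf.line, offset: leaf.start + domOffset };
}

// Line "abc\n": one text leaf starting at 0, then the placeholder at 3.
const textLeaf: LeafInfo = { line: 0, start: 0, zeroEnter: false };
const enterLeaf: LeafInfo = { line: 0, start: 3, zeroEnter: true };
const beforeBreak = toModelPoint(enterLeaf, 0);
const afterBreak = toModelPoint(enterLeaf, 1);
```

Both DOM offsets on the placeholder collapse to model offset 3, the end of "abc", so the cursor never lands after the \n of its own line.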
In fact, there is another interesting issue here. Our goal is to treat cursor positions inside the editor uniformly as plain text. Editors like Typora, which use <br> for line breaks instead of zero-width characters, expose only a single offset of 0 on such a node. This discrepancy comes from the inherent design differences between editors, which in turn call for different DOM handling.
Thus far, we have implemented the logic to convert browser selection nodes in plain text to the editor's selection model. Of course, many details and special cases are omitted here, especially in the toModelPoint method. As for non-text nodes, such as images or video nodes, we will cover their treatment when implementing Void nodes later.
Synchronizing browser selection changes to the editor's selection does not complete the job. Even though we have calculated a model selection from the browser selection, the browser selection itself may not sit at the precise position we need. Since the editor handles content input, we must ensure that the selection or cursor is at the controlled position.
While perceiving the position solely in the model selection might suffice for read-only mode, it falls short when inputting content in editing mode. In this case, we need to synchronize the model selection position with the desired DOM nodes to uphold the controlled principle. Additionally, features like input cursor tracking and inline toolbar all rely on the capability to actively set the selection.
Thus, the flow transitions to browser selection change -> editor selection change -> browser selection setting. Here, we easily encounter a problem where the selection setting becomes a loop; browser change triggers editor selection setting, which then alters the browser selection, resulting in continued selection synchronization. To address this, we need to introduce a condition to prevent setting the selection when there is no change.
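The guard can be sketched like this. `SelectionStore`, `Sel`, and `sameSelection` are hypothetical names, and `applyToDOM` stands for whatever routine writes the selection back to the browser; the point is only that an unchanged selection ends the cycle.

```typescript
interface Pt {
  line: number;
  offset: number;
}
interface Sel {
  start: Pt;
  end: Pt;
  isBackward: boolean;
}

// Structural equality between two selections (null means "no selection").
function sameSelection(a: Sel | null, b: Sel | null): boolean {
  if (a === b) return true;
  if (!a || !b) return false;
  return (
    a.start.line === b.start.line &&
    a.start.offset === b.start.offset &&
    a.end.line === b.end.line &&
    a.end.offset === b.end.offset &&
    a.isBackward === b.isBackward
  );
}

class SelectionStore {
  private current: Sel | null = null;
  private readonly applyToDOM: (sel: Sel | null) => void;

  constructor(applyToDOM: (sel: Sel | null) => void) {
    this.applyToDOM = applyToDOM;
  }

  // Used for both browser-originated and programmatic changes.
  // Returns false when nothing changed, which is what breaks the loop.
  set(next: Sel | null): boolean {
    if (sameSelection(this.current, next)) return false;
    this.current = next;
    this.applyToDOM(next);
    return true;
  }
}
```

When the write to the DOM fires another selection change event, the resulting model selection equals the stored one, `set` returns false, and the cycle stops after a single round.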
Fundamentally, the problem comes from dragging the mouse to move the selection: the browser keeps resetting the selection during the drag, which conflicts with the selection we synchronize back. We can therefore skip actively setting the selection while the mouse button is pressed. Since releasing the mouse button does not necessarily trigger a selection change, we also need to set the selection again when the button is released.
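A minimal sketch of this mouse-state guard follows. The class and method names are hypothetical, and `write` stands for the routine that applies the model selection to the DOM.

```typescript
class DOMSelectionWriter {
  private isMouseDown = false;
  private readonly write: () => void;

  constructor(write: () => void) {
    this.write = write;
  }

  onMouseDown(): void {
    this.isMouseDown = true;
  }

  // Releasing the button may not produce a selection change event,
  // so the model selection is written back explicitly here.
  onMouseUp(): void {
    this.isMouseDown = false;
    this.update();
  }

  // Skips the write while the user is still dragging.
  update(): boolean {
    if (this.isMouseDown) return false;
    this.write();
    return true;
  }
}
```

In a real view layer, `onMouseDown` and `onMouseUp` would be wired to the corresponding DOM events, and `update` would be called from the selection change path.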
After discussing the synchronization logic between the browser selection and the editor selection, let's implement the toDOMRange method to convert ModelRange into DOMRange. In practice, the implementation here may not be as complex as toModelRange because our model range is in a simple format, unlike the complex DOM structure, and the actual corresponding DOM is controlled by the state module.
The toDOMPoint method is a rather complex implementation. We need to fetch the current line state and leaf state from the editor's state module and then obtain the corresponding DOM nodes through the state mapping. This mapping of DOM nodes is established in the react package, which is where all DOM-related implementation lives; that separation is one of the design rules we must adhere to.
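One way to hold such a mapping is a pair of WeakMap instances, sketched below. `LeafState`, `ElementLike`, and the helper names are hypothetical stand-ins; in the editor the keys would be real state objects and DOM elements registered by the view layer.

```typescript
// Hypothetical stand-ins for a leaf state and its rendered element.
interface LeafState {
  key: string;
  length: number;
}
interface ElementLike {
  tag: string;
}

// WeakMap keys do not keep their objects alive, so entries disappear
// once a state or element is no longer referenced elsewhere.
const STATE_TO_DOM = new WeakMap<LeafState, ElementLike>();
const DOM_TO_STATE = new WeakMap<ElementLike, LeafState>();

// Called by the view layer when a leaf is rendered.
function bindLeaf(state: LeafState, el: ElementLike): void {
  STATE_TO_DOM.set(state, el);
  DOM_TO_STATE.set(el, state);
}

// Model -> DOM lookup, used when setting the browser selection.
function getLeafElement(state: LeafState): ElementLike | null {
  return STATE_TO_DOM.get(state) ?? null;
}

// DOM -> model lookup, used when reading the browser selection.
function getLeafState(el: ElementLike): LeafState | null {
  return DOM_TO_STATE.get(el) ?? null;
}
```

Both lookups can return null, for example when a state has not been rendered yet, which is why callers need a fallback path.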
By mapping states to nodes, we can access the corresponding DOM nodes. However, the retrieved nodes may not always be reliable, so some fallback measures are required. The subsequent logic finds all leaf container DOM nodes under the LineNode, then accumulates the text length of each leaf to determine which node the offset falls in and the offset inside that node.
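The offset walk can be sketched as follows, with each leaf reduced to its text. `LeafText`, `LeafPosition`, and `locateInLeaves` are hypothetical names; resolving the leaf index to a real text node is left to the view layer.

```typescript
interface LeafText {
  text: string;
}

interface LeafPosition {
  leafIndex: number; // which leaf of the line holds the point
  offset: number; // offset inside that leaf's text
}

// Walks the leaves of one line, accumulating text lengths until the
// line offset falls inside (or at the end of) a leaf.
function locateInLeaves(leaves: LeafText[], lineOffset: number): LeafPosition | null {
  if (lineOffset < 0) return null;
  let start = 0;
  for (let i = 0; i < leaves.length; i++) {
    const end = start + leaves[i].text.length;
    // <= keeps an offset on a leaf boundary at the end of the earlier leaf.
    if (lineOffset <= end) {
      return { leafIndex: i, offset: lineOffset - start };
    }
    start = end;
  }
  return null; // beyond the rendered text: the caller needs a fallback
}

const lineLeaves: LeafText[] = [{ text: "ab" }, { text: "cde" }];
const inSecondLeaf = locateInLeaves(lineLeaves, 4);
```

Placing a boundary offset at the end of the earlier leaf is a design choice of this sketch; an editor may prefer the start of the next leaf depending on how marks should be inherited while typing.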
When setting the editor selection, we need to separate the logic for setting the model selection from the logic for setting the browser selection. The main reason for this design is that it lets us batch updates to the browser's DOM selection. In addition, when inputting content, we handle the selection change uniformly while applying the change, and then update the DOM selection after the view layer has rendered asynchronously.
So far, we have converted the editor's selection model to specific DOM nodes and offsets, so we can now apply it to the browser through the browser API. The logic that follows has to account for the selection scenarios and execution constraints mentioned at the beginning, plus a lot of special-case handling, which we will describe later.
Previously, we implemented basic selection operations based on the browser's selection API, and designed a model selection representation based on the editor state to define the operation range when applying changes in the editor. Here, we have implemented bidirectional synchronization between the browser selection and the model selection to achieve controlled selection operations, which is a crucial foundational capability of the editor.
Next, on top of the editor selection module, we need to implement content input based on the browser's BeforeInput event and the Composition-related events. Editor input is a complex topic: it requires dealing with the complicated default behavior of ContentEditable on the DOM structure, as well as compatibility with the various input scenarios of IME input methods.