As mentioned earlier, the data model is the foundation of the editor, and its design directly shapes how the selection module is represented. The selection module is itself a fundamental part of the editor: whenever a change is applied, the scope of that operation is expressed through the selection model. In other words, the selection tells the editor the range over which a change command should be executed.
The design of the data model directly influences how the editor's selection model is expressed. For example, the selection models of the Quill and Slate editors are closely tied to the data structures they maintain. However, whatever data model an editor adopts, its selection must ultimately be implemented on top of the browser's selection, so in this article we first implement the basic operations of the browser selection model.
The concept of selection sounds abstract, yet we deal with it all the time. For example, when we drag across part of the text with the mouse, that part is given a light blue background, which marks the range of the selection. We may call it selected text, the blue highlight, the selected range, and so on; all of these refer to the selection capability provided by the browser.
Besides the blue background of selected text, the blinking cursor is also a form of selection. The cursor's range is a single point, also known as a collapsed selection. The cursor is usually only shown in editable elements such as input boxes, text areas, and ContentEditable elements. In a non-editable element the cursor is invisible, but it still exists.
Browser selection operations mainly rely on the Range and Selection objects. The Range object represents a fragment of the document that may contain nodes and parts of text nodes, while the Selection object represents the text range selected by the user or the current position of the caret.
The Range object is similar to the mathematical concept of an interval: it represents a continuous range of content. Just as an interval can be described by two endpoints, a Range runs from startContainer to endContainer, so a selection must be continuous to be represented by one. The properties of a Range instance are as follows:
startContainer: Represents the starting node of the selection.
startOffset: Represents the starting offset of the selection.
endContainer: Represents the ending node of the selection.
endOffset: Represents the ending offset of the selection.
collapsed: Indicates whether the start and end positions of the Range are the same, i.e., the collapsed state.
commonAncestorContainer: Represents the common ancestor node of the selection, the node that fully contains startContainer and endContainer at the deepest level.
The Range object also has many methods, with the most commonly used in our editor being setting the start position of the selection setStart, setting the end position setEnd, and getting the rectangle position of the selection getBoundingClientRect. In the example below, we can obtain the position of the text fragment 23:
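As a minimal sketch of this idea (not the article's original example), the helper below builds a Range over part of an element's first text node and returns its rectangle. The names measureTextFragment, DocumentLike, RangeLike, NodeLike and RectLike are hypothetical; the small interfaces only stand in for the DOM types so that the snippet does not depend on DOM typings. The browser APIs actually used are createRange, setStart, setEnd and getBoundingClientRect.

```typescript
// Structural stand-ins for the DOM types (hypothetical names); real
// Document, Node and Range objects expose these members.
interface RectLike { left: number; top: number; width: number; height: number }
interface NodeLike { nodeType: number; firstChild: NodeLike | null }
interface RangeLike {
  setStart(node: NodeLike, offset: number): void;
  setEnd(node: NodeLike, offset: number): void;
  getBoundingClientRect(): RectLike;
}
interface DocumentLike { createRange(): RangeLike }

const TEXT_NODE_TYPE = 3; // numeric value of Node.TEXT_NODE

// Returns the rectangle of the characters [start, end) inside the first
// child of `element`, or null when that child is not a text node.
function measureTextFragment(
  doc: DocumentLike,
  element: NodeLike,
  start: number,
  end: number
): RectLike | null {
  const textNode = element.firstChild;
  if (!textNode || textNode.nodeType !== TEXT_NODE_TYPE) return null;
  const range = doc.createRange();
  range.setStart(textNode, start); // character offset inside the text node
  range.setEnd(textNode, end);
  return range.getBoundingClientRect();
}
```

In a browser, calling measureTextFragment(document, element, 1, 3) on an element whose only child is the text 123 would measure the fragment 23.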
Getting the rectangle of a text fragment is a very important use case: it lets us obtain the position of content that is not a complete DOM element, without having to measure whole HTML elements, which is very useful for effects such as word highlighting. Note also that the firstChild here is a Text node, that is, a node whose nodeType is Node.TEXT_NODE, so the offsets are counted in characters of that text.
Since a boundary can be set on a text node, it can naturally be set on non-text nodes as well. When calling setStart or setEnd, if the node is a Text, Comment, or CDATASection node, the offset is the number of characters from the start of that node. For other node types, the offset is the number of child nodes from the start of that node.
In the example below, we set the selection range on a node that is not text content. Here the outermost $1 node acts as the parent node and has 2 child nodes, so the offset can be set anywhere in the range 0 to 2; setting it to 3 throws an exception immediately. This is similar to setting an offset for a text selection, except that for a text selection the container must be the text node, while for a non-text selection the container is the parent node.
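The offset rule can be sketched as a small validity check. This is an illustration under stated assumptions rather than a browser API: maxBoundaryOffset, isValidBoundaryOffset and BoundaryNodeLike are hypothetical names, and the interface only mirrors the nodeType, nodeValue and childNodes members of a DOM Node.

```typescript
// Structural stand-in for a DOM Node (hypothetical name).
interface BoundaryNodeLike {
  nodeType: number;
  nodeValue: string | null;
  childNodes: { length: number };
}

// Numeric values of Node.TEXT_NODE, Node.CDATA_SECTION_NODE, Node.COMMENT_NODE.
const NODE_TYPE_TEXT = 3;
const NODE_TYPE_CDATA_SECTION = 4;
const NODE_TYPE_COMMENT = 8;

// Largest offset that can be used for a Range boundary inside `node`:
// the character count for text-like nodes, the child count otherwise.
function maxBoundaryOffset(node: BoundaryNodeLike): number {
  const isTextLike =
    node.nodeType === NODE_TYPE_TEXT ||
    node.nodeType === NODE_TYPE_CDATA_SECTION ||
    node.nodeType === NODE_TYPE_COMMENT;
  return isTextLike ? (node.nodeValue || "").length : node.childNodes.length;
}

// True when `offset` can be passed to setStart or setEnd without throwing.
function isValidBoundaryOffset(node: BoundaryNodeLike, offset: number): boolean {
  return offset >= 0 && offset <= maxBoundaryOffset(node);
}
```

For a parent with 2 child nodes, offsets 0 through 2 pass the check and 3 does not, matching the behavior described above.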
The main purpose of constructing a Range object is to obtain the position of related DOM elements for calculation. Another common requirement is to implement content highlighting. Usually, this requires us to actively calculate positions to create a virtual layer. In newer browsers, the ::highlight pseudo-element has been implemented, which allows us to achieve highlighting effects using the browser's native implementation. In the example below, the text fragment 23 will appear with a background color.
The Selection object represents the text range selected by the user or the current position of the cursor, which represents a text selection on the page that may span multiple elements. In reality, the Selection in the browser is composed of Range objects. The main properties of the Selection object are as follows:
anchorNode: represents the starting node of the selection.
anchorOffset: represents the starting offset of the selection.
focusNode: represents the ending node of the selection.
focusOffset: represents the ending offset of the selection.
isCollapsed: indicates whether the starting and ending positions of the selection are the same, i.e., collapsed state.
rangeCount: indicates the number of Range objects included in the selection.
Users may select text from left to right or from right to left. The anchor points to where the user begins the selection, while the focus points to where the user ends the selection. The concepts of anchor and focus should not be confused with the starting position startContainer and ending position endContainer of the selection. The Range object always points from the start node to the end node.
We can listen for changes in the selection by using the selectionchange event. In the event callback for selection changes, we can retrieve the current selection object state using window.getSelection(). The selection instance obtained with getSelection is a singleton object, meaning the reference is to the same instance, but its internal values change.
Although the W3C standard does not strictly require a singleton, major browsers such as Chrome, Firefox, and Safari implement it as a single instance. However, for compatibility and to handle the null case, when we actually need the selection we typically read its state in real time through getSelection. Additionally, the properties of the Selection object are not own enumerable properties, so copying it with the spread operator does not capture its values.
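Because the live object keeps changing and cannot be copied by spreading, one option is to read the fields explicitly into a plain object. The sketch below assumes that this is the desired behavior; snapshotSelection and SelectionSnapshot are hypothetical names, and the interface lists only the Selection properties described above.

```typescript
// Plain-object view of the Selection properties discussed in this section.
interface SelectionSnapshot {
  anchorNode: unknown;
  anchorOffset: number;
  focusNode: unknown;
  focusOffset: number;
  isCollapsed: boolean;
  rangeCount: number;
}

// Copies the current values field by field. Spreading the live object
// ({ ...selection }) would miss them, since they are not own properties.
function snapshotSelection(sel: SelectionSnapshot | null): SelectionSnapshot | null {
  if (!sel) return null; // getSelection() may return null
  return {
    anchorNode: sel.anchorNode,
    anchorOffset: sel.anchorOffset,
    focusNode: sel.focusNode,
    focusOffset: sel.focusOffset,
    isCollapsed: sel.isCollapsed,
    rangeCount: sel.rangeCount,
  };
}
```

In a browser this would be called as snapshotSelection(window.getSelection()) inside a selectionchange handler, so that later changes to the live object do not affect the stored state.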
An editor also needs to set the selection actively. This can be done with the addRange method of the Selection object, usually after calling removeAllRanges to clear any existing selection. Note, however, that this approach cannot express a backward selection, because a Range always runs from its start node to its end node.
Therefore, it is common practice to set selections with setBaseAndExtent. For a forward selection, pass the start position as the base and the end position as the extent. For a backward selection, pass the end position as the base and the start position as the extent, which produces a selection whose anchor comes after its focus. The node and offset parameters are essentially the same as those used for Range.
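A minimal sketch of this swap follows. setBaseAndExtent is the real Selection method; applySelection, BoundaryPoint and SelectionWriter are hypothetical names introduced for illustration, and the interface only describes the one method the helper needs.

```typescript
// A position inside the document: a node plus an offset (hypothetical type).
interface BoundaryPoint { node: object; offset: number }

// The single Selection method this helper relies on.
interface SelectionWriter {
  setBaseAndExtent(
    anchorNode: object,
    anchorOffset: number,
    focusNode: object,
    focusOffset: number
  ): void;
}

// Sets the selection between `start` and `end` (given in document order).
// For a backward selection the anchor is placed at `end` and the focus at `start`.
function applySelection(
  sel: SelectionWriter,
  start: BoundaryPoint,
  end: BoundaryPoint,
  backward: boolean = false
): void {
  if (backward) {
    sel.setBaseAndExtent(end.node, end.offset, start.node, start.offset);
  } else {
    sel.setBaseAndExtent(start.node, start.offset, end.node, end.offset);
  }
}
```

Keeping start and end in document order and passing the direction separately means callers never have to swap the points themselves.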
The APIs above cover setting the selection. When reading it, anchor and focus are available, but the backward state is not explicitly marked on the Selection object. We therefore use the getRangeAt method to retrieve the Range held by the selection and compare its start and end with the anchor and focus of the Selection.
The rangeCount property on the Selection object indicates the number of Range objects contained in the selection. Typically, only the first Range needs to be retrieved. The conditional check deserves special attention: if there is no current selection, that is, when rangeCount is 0, calling getRangeAt(0) directly throws an exception.
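The comparison and the rangeCount guard can be sketched together as follows. This is one possible way to derive the direction, assuming the anchor of a forward selection coincides with the start of its first Range; getFirstRange, isBackwardSelection, RangeEdges and SelectionReader are hypothetical names.

```typescript
// The Range members needed for the comparison.
interface RangeEdges { startContainer: unknown; startOffset: number }

// The Selection members needed for the comparison.
interface SelectionReader {
  rangeCount: number;
  isCollapsed: boolean;
  anchorNode: unknown;
  anchorOffset: number;
  getRangeAt(index: number): RangeEdges;
}

// Returns the first Range, or null when there is none. The rangeCount
// check avoids the exception getRangeAt(0) throws on an empty selection.
function getFirstRange(sel: SelectionReader | null): RangeEdges | null {
  if (!sel || sel.rangeCount === 0) return null;
  return sel.getRangeAt(0);
}

// A non-collapsed selection is backward when its anchor is not the
// start of the Range, i.e. the user dragged towards the document start.
function isBackwardSelection(sel: SelectionReader | null): boolean {
  const range = getFirstRange(sel);
  if (!sel || !range || sel.isCollapsed) return false;
  return (
    sel.anchorNode !== range.startContainer ||
    sel.anchorOffset !== range.startOffset
  );
}
```

A collapsed selection is treated as forward here, since a single point has no direction.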
Moreover, you may wonder why the rangeCount property exists when we usually only select a single continuous range. It is worth noting that in Firefox, multiple selections can be made by holding down the Ctrl key. For better browser compatibility, however, handling the first selection is usually sufficient.
Selections also allow an interesting control effect. Normally, a ContentEditable element is selected as a whole, with its internal text nodes appearing as selected. If some text content should not be selectable, the user-select property can be used for this purpose. Because selections are anchored to the edges of nodes, this makes a special discontinuous selection effect possible.
Additionally, the following example uses childNodes rather than children to access the child nodes. childNodes returns all child nodes, including text nodes and comments, whereas children returns only element nodes, so the offsets passed when setting a selection must be counted against childNodes.
Earlier, we introduced the ::highlight pseudo-element alongside Range objects. Here we also introduce the ::selection pseudo-element, which applies styles to the portion of text selected by the user. Setting styles such as background color and font color on ::selection controls the selection's appearance.
Here is an intriguing scenario: when developing browser extensions or user scripts, we may need to restore the original background color of the ::selection pseudo-element. Setting it to transparent does not work, because the selection's background color then disappears altogether. Instead, the default light blue background can be restored with the highlight keyword, and a fixed color such as #BEDAFF is also viable.
Although selections are not directly related to editable elements, our goal is to implement a rich text editor, so we need to handle the selection state within ContentEditable elements. But before that, let's first focus on basic selection operations for input elements.
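For input elements the selection is expressed as plain character offsets rather than nodes. The sketch below shows one way to read and set it; selectionStart, selectionEnd and setSelectionRange are the real input and textarea APIs, while getSelectedInputText, selectInputRange and TextInputLike are hypothetical names.

```typescript
// The input/textarea members used below.
interface TextInputLike {
  value: string;
  selectionStart: number | null;
  selectionEnd: number | null;
  setSelectionRange(
    start: number,
    end: number,
    direction?: "forward" | "backward" | "none"
  ): void;
}

// Returns the currently selected text of the input.
function getSelectedInputText(input: TextInputLike): string {
  const from = input.selectionStart === null ? 0 : input.selectionStart;
  const to = input.selectionEnd === null ? from : input.selectionEnd;
  return input.value.slice(from, to);
}

// Selects the characters between two offsets, clamped to the value length.
// The direction flag plays the role that anchor/focus play for documents.
function selectInputRange(
  input: TextInputLike,
  start: number,
  end: number,
  backward: boolean = false
): void {
  const from = Math.max(0, Math.min(start, end));
  const to = Math.min(input.value.length, Math.max(start, end));
  input.setSelectionRange(from, to, backward ? "backward" : "forward");
}
```

Unlike Range, setSelectionRange accepts the direction explicitly, so a backward selection needs no argument swapping.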
Next, let's look at the implementation in a ContentEditable element. Essentially, the selection operation doesn't vary whether it's an editable element or not. In editable elements, the cursor is explicitly visible, also known as the insertion point. In non-editable elements, although the cursor is invisible, selection events and movements are still real.
Initially, I wanted to build a pure Blocks editor. At the time I could not find a good editor implementation to use as a reference, mainly because similar editors are complex and hard to understand without accompanying articles. I was also more inclined towards the quill-delta data structure because of its excellent support for collaboration and its well-structured representation of diffs and ops.
Therefore, the initial idea was to implement nested Blocks using multiple Quill Editor instances. This approach comes with several challenges, since many default editor behaviors have to be disabled and reimplemented, for instance History management, Enter for line breaks, and selection transformations, so there are numerous points to take care of. Still, compared with building an editor from scratch and dealing with browser compatibility and input events ourselves, this management cost seemed acceptable.
One important aspect to consider is that a Blocks editor inherently needs a nested data structure to describe its content. It is possible to start with a flat Block structure in which each Block stores a string[] of block node references. If the editor design does not favor nesting but the content still refers to block structures, those blocks can instead be incorporated into the Editor instance, although this approach is closely tied to the rich text framework and hurts extensibility.
A Blocks editor manages reference relationships entirely through the outermost Block structure, with the references held in children. A block-referencing editor, on the other hand, has to manage reference relationships inside the editor itself, with the references held in the ops. The design and implementation of the data structure therefore depend heavily on the overall architecture of the editor. Viewing a block-referencing editor as a single-entry Blocks editor, in which the Editor instance handles every line, helps identify what the two designs have in common.
During the actual editor implementation, I noticed that browsers have clear selection strategies. In the example below, in the ContentEditable structure of State 1, it is impossible to select from Selection Line 1 to Selection Line 2. This default browser behavior makes it impractical to implement Blocks on this model.
In State 2, the structure allows normal selection operations and the model itself poses no problem. However, we then run into selection problems with Quill: its initialization mutates the <br/> into div/p elements, which leaves the browser cursor in the wrong position.
Without a way to intervene in Quill to correct the selection, and without any DOM markers to help with the correction, continuing down this path becomes difficult. Even handling the state change would require intruding into parchment's view implementation, which means more complex processing.
Therefore, in this situation we can probably only adopt the State 3 strategy: instead of implementing full Blocks, we use Quill as an embedded editor instance inside the structure. In this model the editor does not run into selection offset issues, and the nested structure can also use Quill's Embed Blot to extend nested Block structures through plugins.
In this scenario, a Blocks editor built on Quill can only be implemented through the Embed Blot pattern. However, that depends entirely on the view layer maintained by Quill, and handling boundary cases requires further work on the view layer through parchment, which is very cumbersome. This is part of the reason for implementing the editor from scratch.
Similar issues exist in editor.js as well. In its online DEMO, we find that we cannot select from pure text line 1 to line 2. Specifically, after selecting part of the text in line 1 and dragging to select part of the text in line 2, lines 1 and 2 will both be entirely selected. The Yoopta-Editor implemented based on slate has a similar problem.
This issue does not exist in the Feishu document editor. In the Feishu document, the key point is that the selection area must be within the same parent Block. The specific implementation is that when the mouse is pressed and dragged, it is in an entirely uncontrolled state. If it collides with another block, the internal selection style will be overridden by ::selection, and then the overall style of the block will apply the selected style with a class.
When releasing the mouse, the selection state will be immediately corrected to truly be within the same parent block. Being in the same block simplifies operations, whether applying changes or retrieving data from the selected segments. Iterating within the same parent block eliminates the need to recursively search based on the rendering block order, making data processing simpler.
Finally, I found that TextBus neither uses the common ContentEditable approach nor draws its own selection like CodeMirror or Monaco. Judging from the DOM nodes in its Playground, it maintains an invisible iframe containing a textarea that handles IME input.
This implementation is quite unusual, because the text selection disappears while content is being entered, which means the two compete for focus. Let's first look at a simple example of focus competition between an iframe and the text selection. When the iframe constantly grabs focus, we are unable to drag out a text selection. Note that we cannot call focus directly in the onblur event, as the browser blocks this; the call must be made asynchronously in a macro task.
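The deferred refocus can be sketched as below. This is an illustration of the macro-task pattern only, not the article's original example: keepFocus, FocusableLike and TaskScheduler are hypothetical names, and the scheduler is a parameter so that the default setTimeout-based macro task can be replaced.

```typescript
// The members of a focusable target (element or frame) used below.
interface FocusableLike {
  focus(): void;
  addEventListener(type: "blur", listener: () => void): void;
}

// Runs a task later; the default uses a macro task via setTimeout.
type TaskScheduler = (task: () => void) => void;
const macroTaskScheduler: TaskScheduler = (task) => {
  setTimeout(task, 0);
};

// Re-focuses `target` every time it loses focus. The focus call is
// deferred instead of being made synchronously inside the blur handler.
function keepFocus(
  target: FocusableLike,
  schedule: TaskScheduler = macroTaskScheduler
): void {
  target.addEventListener("blur", () => {
    schedule(() => target.focus());
  });
}
```

Passing the scheduler in keeps the timing decision in one place, which also makes the behavior easy to observe outside a browser.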
In fact, this is a pitfall I have run into before. Note that our focus call uses $1.focus directly. If we call win.focus instead, the text selection can be dragged. From this behavior we can see that the document selections of the parent and child frames are completely independent: if focus stays within the same frame, the two compete for it, whereas if they are in different frames, both work normally. This is the distinction between $1 and win.
Notice that the text selection is gray at this point; it can be styled with the ::selection pseudo-element. All the usual events still fire normally, such as the selectionchange event, and the selection can still be set manually. Placing a textarea directly inside the iframe behaves the same way, allowing normal content input without interrupting IME composition, and this works surprisingly well across browsers.
Apart from TextBus with its unique implementation, the CodeMirror, Monaco/VSCode, DingTalk Documents, and Youdao Cloud Note editors all draw their own selections. Drawing a custom selection involves two tasks: calculating the current cursor position and rendering the virtual selection layer. We discussed this earlier in our diff and virtual layer implementation; the common approach is a relatively simple three-line drawing. Drawing each wrapped row separately has only been observed in the search and replace feature of Feishu Document.
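One reading of the three-line drawing is a head rectangle on the first line, a full-width body block for the lines in between, and a tail rectangle on the last line. The sketch below computes these rectangles under that assumption; computeSelectionRects, CaretRect, BoxRect and LineBounds are hypothetical names, the start caret is assumed to precede the end caret, and all coordinates share one origin.

```typescript
// Position of a collapsed caret: its left edge, top edge and line height.
interface CaretRect { left: number; top: number; height: number }
// A rectangle of the virtual selection layer.
interface BoxRect { left: number; top: number; width: number; height: number }
// Horizontal extent of the text container.
interface LineBounds { left: number; right: number }

// Returns one rectangle for a single-line selection, otherwise a head,
// an optional body, and a tail rectangle.
function computeSelectionRects(
  start: CaretRect,
  end: CaretRect,
  bounds: LineBounds
): BoxRect[] {
  if (start.top === end.top) {
    // Both carets are on the same visual line.
    return [{ left: start.left, top: start.top, width: end.left - start.left, height: start.height }];
  }
  const rects: BoxRect[] = [];
  // Head: from the start caret to the right edge of the container.
  rects.push({ left: start.left, top: start.top, width: bounds.right - start.left, height: start.height });
  // Body: every full line between the first and the last line.
  const bodyTop = start.top + start.height;
  if (end.top > bodyTop) {
    rects.push({ left: bounds.left, top: bodyTop, width: bounds.right - bounds.left, height: end.top - bodyTop });
  }
  // Tail: from the left edge of the container to the end caret.
  rects.push({ left: bounds.left, top: end.top, width: end.left - bounds.left, height: end.height });
  return rects;
}
```

Each returned rectangle can then be rendered as an absolutely positioned element in the virtual layer.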
Hence, the complexity lies in calculating the cursor position. Our editor's selection can still follow the browser's model, so obtaining the positions of the anchor and focus is essentially sufficient. Browsers provide APIs for locating the cursor position with Range, which currently only VSCode uses, while CodeMirror and DingTalk Documents implement cursor position calculation themselves. CodeMirror uses binary search to repeatedly compare the cursor with character positions, and line wrapping during the search adds significant complexity.
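To illustrate the binary-search idea in its simplest form, the sketch below maps a horizontal coordinate to a character offset on a single unwrapped line. It is a simplified illustration, not CodeMirror's actual code: findOffsetAtX and caretLeft are hypothetical names, and caretLeft is assumed to return the x position of the caret at a given offset and to be non-decreasing along the line.

```typescript
// Finds the offset in [0, length] whose caret position is nearest to `x`.
// `caretLeft(offset)` must not decrease as the offset grows.
function findOffsetAtX(
  length: number,
  caretLeft: (offset: number) => number,
  x: number
): number {
  let low = 0;
  let high = length;
  // Binary search for the largest offset whose caret is at or before x.
  while (low < high) {
    const mid = (low + high) >> 1;
    if (caretLeft(mid + 1) <= x) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  // Snap to the next offset when x is past the middle of the character.
  if (low < length && x - caretLeft(low) > caretLeft(low + 1) - x) {
    return low + 1;
  }
  return low;
}
```

With wrapped lines the caret position is no longer monotonic in x alone, so a real implementation would first have to locate the visual row, which is where the extra complexity mentioned above comes from.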
Speaking of which, the package management of VSCode is quite interesting. VSCode is an open-source application from which the core monaco-editor-core package is extracted. That package is a dev dependency of monaco-editor and is bundled into monaco-editor at packaging time. monaco-editor repackages the core so that the editor can run in a browser web container, which is how the web version of VSCode is realized.
Here we can use the browser APIs to calculate the position of the cursor and selection. These APIs are reasonably compatible when used together, but compatibility is comparatively poor inside shadow DOM. All of this relies on the browser's DOM implementation. If we drew the selection with Canvas instead, the calculation would have to be based entirely on the text we draw, which is not overly complex: the difficulty of DOM rendering comes from obtaining text positions and sizes, whereas with Canvas this information is typically already recorded.
Here we have summarized the relevant APIs for browsers. We have implemented basic selection operations based on the Range object and Selection object, and provided specific application scenarios and examples. Additionally, we discussed the issues encountered with selection in the Blocks editor, and finally, we researched various editor implementations with custom selection drawing, providing a simple custom selection drawing example.
Next, we will start from the data model and design the representation of the editor's selection model. Then, building upon the browser's selection-related APIs, we will synchronize the editor's selection model with the browser's selection. By using the selection model as the target scope for editor operations, we will achieve fundamental editor functions such as insertion, deletion, formatting, and various boundary operations related to selections.