As mentioned earlier, the design of the data model is the foundation module of the editor, directly influencing the representation of the selection module. The design of the selection module is also a fundamental part of the editor. When the application of the editor changes, the expression of the operation scope requires implementation based on the selection model. In other words, the meaning represented by the selection is for the editor to perceive in what range the change commands should be executed.
Articles about implementing a rich text editor project from scratch:
The design of the data model directly influences the expression of the editor's selection model. For example, the Quill
and Slate
editor's model selection implementations in the following examples are closely related to the data structures they maintain. However, regardless of the editor's design data model, it needs to be implemented based on the browser's selection, so in this article, we will first implement the basic operations of the browser selection model.
In reality, the concept of selection is quite abstract, but we should be dealing with it frequently. For example, when dragging part of the text content with the mouse, that part will carry a light blue background, which represents the range of the selection. Similarly, we may refer to it as selected, dragged blue, selected range, and so on; this is the selection capability provided by the browser.
In addition to the blue background of the selected text, the blinking cursor is also a form of selection representation. The selection range of the cursor is a point, or can be referred to as a collapsed selection. The cursor representation usually only appears in editable elements, such as input boxes, text areas, ContentEditable
elements, etc. If in a non-editable element, the cursor is in an invisible state, but it still exists in reality.
The operations of the browser selection mainly rely on the Range
and Selection
objects. The Range
object represents a fragment of the document containing nodes and partial text nodes, while the Selection
object represents the text range selected by the user or the current position of the cursor symbol.
The Range
object is similar to the mathematical concept of an interval, meaning it represents a continuous content range. Mathematically, an interval can be represented by two points, similarly, the expression of the Range
object is from startContainer
to endContainer
, so the selection must be continuous to be properly represented. The properties of a Range
instance object are as follows:
startContainer
: Represents the starting node of the selection.startOffset
: Represents the starting offset of the selection.endContainer
: Represents the ending node of the selection.endOffset
: Represents the ending offset of the selection.collapsed
: Indicates whether the start and end positions of the Range
are the same, i.e., the collapsed state.commonAncestorContainer
: Represents the common ancestor node of the selection, the node that fully contains startContainer
and endContainer
at the deepest level.The Range
object also has many methods, with the most commonly used in our editor being setting the start position of the selection setStart
, setting the end position setEnd
, and getting the rectangle position of the selection getBoundingClientRect
. In the example below, we can obtain the position of the text fragment 23
:
Getting the rectangle position of a text fragment is a very important application, this way we can achieve getting the position of incomplete DOM
elements, without necessarily having to obtain rectangle positions through the HTML DOM
, which is very useful for implementing effects like word highlighting. Additionally, it's important to note that the firstChild
here is a Text
node, with the value being of type Node.TEXT_NODE
, to calculate the text fragment.
Since textual nodes can be set, naturally there are also states for non-textual nodes. When calling to set a selection, if the node types are Text
, Comment
, or CDATASection
, then the offset
refers to the character offset from the end node. For other node types, the offset
refers to the offset of the child nodes from the end node.
In the example below, what we are doing is setting the selection range to a node that is not a text content. In this case, the outermost $1
node acts as the parent node, with 2
child nodes. Therefore, the offset
can be set within the range of 0-2
. If it is set to 3
at this point, an exception will be thrown directly. Here, it is similar to setting the offset
for a text selection, but the difference is that for a text selection, it must be a text node, while for a non-text selection, it is the parent node.
The main purpose of constructing a Range
object is to obtain the position of related DOM
elements for calculation. Another common requirement is to implement content highlighting. Usually, this requires us to actively calculate positions to create a virtual layer. In newer browsers, the ::highlight
pseudo-element has been implemented, which allows us to achieve highlighting effects using the browser's native implementation. In the example below, the text fragment 23
will appear with a background color.
The Selection
object represents the text range selected by the user or the current position of the cursor, which represents a text selection on the page that may span multiple elements. In reality, the Selection
in the browser is composed of Range
objects. The main properties of the Selection
object are as follows:
anchorNode
: represents the starting node of the selection.anchorOffset
: represents the starting offset of the selection.focusNode
: represents the ending node of the selection.focusOffset
: represents the ending offset of the selection.isCollapsed
: indicates whether the starting and ending positions of the selection are the same, i.e., collapsed state.rangeCount
: indicates the number of Range
objects included in the selection.Users may select text from left to right or from right to left. The anchor
points to where the user begins the selection, while the focus
points to where the user ends the selection. The concepts of anchor
and focus
should not be confused with the starting position startContainer
and ending position endContainer
of the selection. The Range
object always points from the start
node to the end
node.
We can listen for changes in the selection by using the selectionchange
event. In the event callback for selection changes, we can retrieve the current selection object state using window.getSelection()
. The selection instance obtained with getSelection
is a singleton object, meaning the reference is to the same instance, but its internal values change.
Although the W3C
standard does not strictly require a singleton, major browsers such as Chrome
, Firefox
, and Safari
implement it as a single instance. However, for compatibility and to handle null
states, when we actually need to use the selection object, we typically get the selection state in real-time through getSelection
. Additionally, the properties of the Selection
object are not enumerable, and the spread operator is not valid.
In an editor, it's essential to handle selection operations. This can be achieved using the addRange
method of the Selection
object, typically preceded by using the removeAllRanges
method to clear any existing selection. It's important to note that this method cannot handle selections in a backward direction.
Therefore, when setting selections, it's common practice to use setBaseAndExtent
to achieve this. For forward selections, simply set the start
and end
directions of the nodes as base
and extent
. For backward selections, set the start
and end
directions of the nodes as extent
and base
, thus achieving the effect of a backward selection. The parameters for DOM
nodes are essentially the same as those for Range
.
The setting of selections can be achieved through the above-mentioned APIs. While it is possible to obtain selections using focus
and anchor
, the backward
state is not explicitly marked on the Selection
object. Therefore, it is necessary to use the getRangeAt
method to retrieve the built-in Range
object inside the selection to compare it with the original object's state.
The rangeCount
property on the Selection
object indicates the number of Range
objects contained within the selection. Typically, only the first selection needs to be retrieved. Special attention needs to be paid to the conditional check here because if there is no current selection, i.e., when rangeCount
is 0
, attempting to retrieve the built-in selection object directly will result in an exception.
Moreover, you may wonder why the rangeCount
property exists when we usually only select a single continuous range. It is worth noting that in Firefox
, multiple selections can be made by holding down the Ctrl
key. For better browser compatibility, however, handling the first selection is usually sufficient.
Selection handling also offers an interesting control effect. Normally, a ContentEditable
element should be selected as a whole, with its internal text nodes appearing as selected. If there is a need to prevent the selection of text content, the user-select
property can be used for this purpose. As selections rely on edges of nodes, a special disjointed selection effect can be achieved.
Additionally, in the following example, childNodes
is used instead of children
to access the status of child nodes. While childNodes
retrieves a collection of all child nodes, including text nodes, comments, etc., children
only retrieves a collection of element nodes. Therefore, extra care is needed when setting selections.
Lastly, we introduced the ::highlight
pseudo-element in relation to Range
objects. Here, we also introduce the ::selection
pseudo-element. ::selection
is used to apply styles to the portion of text selected by the user. By setting styles for ::selection
, such as background color and font color, control over the selection's appearance can be achieved.
Here's an intriguing scenario – when developing browser extensions or scripts, if there is a need to restore the original background color for the ::selection
pseudo-element, setting it to transparent
is not effective. This would result in the disappearance of the selection's background color. In fact, the default light blue background is retained by using the keyword highlight
, though using a color like #BEDAFF
is also viable.
Although selections are not directly related to editable elements, our goal is to implement a rich text editor, so we need to handle the selection state within ContentEditable
elements. But before that, let's first focus on basic selection operations for input
elements.
Next, let's look at the implementation in a ContentEditable
element. Essentially, the selection operation doesn't vary whether it's an editable element or not. In editable elements, the cursor is explicitly visible, also known as the insertion point. In non-editable elements, although the cursor is invisible, selection events and movements are still real.
Initially, I wanted to create a pure Blocks
editor. Currently, I haven't found a good editor implementation to reference, mainly because similar editors are complex, making it hard to understand without relevant articles. Back then, I was more inclined towards the quill-delta
data structure because of its excellent support for collaboration and well-structured representation of diff
and ops
.
Therefore, the initial idea was to implement nested Blocks
using multiple Quill Editor
instances. However, this approach comes with several challenges, requiring the disabling of many default editor behaviors and reimplementation. For instance, managing History
, Enter
for line breaks, selection transformations, etc., which indicates numerous points to focus on. Compared to building an editor from scratch, dealing with various browser compatibility events and handling input events, this management approach seemed acceptable.
One important aspect to consider here is that a Blocks
editor inherently requires a nested data structure to describe content. It's possible to design the editor with initially flat Block
structures, storing string[]
block node information for each Block
for references. If the editor design doesn't favor nested structures but content refers to block structures, then those blocks can be incorporated into the Editor
instance, although this approach is predominantly tied to rich text frameworks, impacting extensibility.
A Blocks
editor is entirely managed by the outermost Block
structure for reference relationships where references are within the children
. On the other hand, a block-referencing editor needs to handle reference relationships within the editor itself, with references within the ops
. Therefore, the design and implementation of the data structure heavily depend on the overall architecture of the editor. Viewing block-referencing editors as single-entry Blocks
editors, where Editor
instances handle all line expressions, helps identify commonalities despite differing designs.
During the actual editor implementation, I noticed clear selection strategies in browsers. In the example below, under State 1
of a ContentEditable
status, it's impossible to select from Selection Line 1
to Selection Line 2
. This default selection behavior by browsers renders it impractical to implement Blocks
based on this model.
In Stage 2
, the model state allows normal selection operations with no model-related issues. However, at this point, we encounter selection problems with Quill
due to its initialization causing a mutation from <br/>
to div/p
states, resulting in incorrect cursor positions in the browser.
Without the ability to intervene in Quill
to correct selections and lacking any DOM markers to assist in selection corrections, continuing this approach becomes challenging. Even handling state changes would require intrusion into parchment
's view implementation, necessitating more complex processing.
Therefore, in this state, we may only be able to adopt the form of the Stage 3
strategy, not implementing full Blocks
, but using Quill
as an embedded structure editor instance. In this model state, the editor will not encounter selection offset issues, and our nested structure can also leverage Quill
's Embed Blot
for plugin extension with nested Block
structures.
In this scenario, to achieve the target of the Blocks
editor using Quill
, it can only rely on the Embed Blot
pattern for implementation. However, this is entirely dependent on the view layer maintained by Quill
. Handling boundary cases would require further handling of the view layer with parchment
, making it very cumbersome, which is also a part of the reason for implementing the editor from scratch.
Similar issues exist in editor.js
as well. In its online DEMO
, we find that we cannot select from pure text line 1
to line 2
. Specifically, after selecting part of the text in line 1
and dragging to select part of the text in line 2
, lines 1
and 2
will both be entirely selected. The Yoopta-Editor
implemented based on slate
has a similar problem.
This issue does not exist in the Feishu document editor. In the Feishu document, the key point is that the selection area must be within the same parent Block
. The specific implementation is that when the mouse is pressed and dragged, it is in an entirely uncontrolled state. If it collides with another block, the internal selection style will be overridden by ::selection
, and then the overall style of the block will apply the selected style with a class
.
When releasing the mouse, the selection state will be immediately corrected to truly be within the same parent block. Being in the same block simplifies operations, whether applying changes or retrieving data from the selected segments. Iterating within the same parent block eliminates the need to recursively search based on the rendering block order, making data processing simpler.
In the end, it was discovered that TextBus
neither uses the common implementation approach of ContentEditable
, nor does it draw selections like CodeMirror
or Monaco
. From the DOM
nodes of the Playground
, it was found that an invisible iframe
was maintained to achieve this, containing a textarea
inside to handle IME
input.
This implementation is quite unique because when inputting content, the text selection disappears, meaning their focus competes with each other. Let's first look at a simple example, using the focus competition between the iframe
and text selection as an illustration. It can be observed that in a scenario where the iframe
constantly contends for focus, we are unable to drag the text selection. It's worth mentioning that we cannot directly call focus
in the onblur
event, as this action will be blocked by the browser, and must be triggered asynchronously in a macro task.
In fact, this issue is a pitfall I have encountered before. Please note that our focus call directly uses $1.focus
. If at this point we were to call win.focus
, then we would observe that the text selection can be dragged. From this behavior, we can see that the document selections in the master and slave frames are completely independent; if the focus is within the same frame, they will compete for focus, whereas if they are in different frames, they will work normally. This illustrates the distinction between $1
and win
.
One can notice that the text selection is gray at this point, and this can be styled using the ::selection
pseudo-element. Furthermore, all kinds of events can be triggered normally, such as the SelectionChange
event and manual selection settings. Of course, placing a textarea
directly within the iframe
would yield the same behavior, allowing for normal content input without interrupting IME
input methods, a magical performance that works well in various browsers.
Apart from the unique implementation of TextBus
, CodeMirror
, Monaco/VSCode
, DingTalk Documents, and Youdao Cloud Note editors all feature custom-drawn selections. Drawing custom selections requires consideration of two aspects: calculating the current cursor position and rendering the virtual selection layer. We have previously discussed this in our diff
and virtual layer implementation; the common approach involves a relatively simple three-line drawing implementation. The solitary row drawing in wrapped scenarios has been observed only in the search and replace feature in Feishu Document.
Hence, the complexity lies in calculating the position of the cursor. Our editor's selection can still adhere to the browser's model for implementation; primarily obtaining the positions of anchor
and focus
is sufficient. In browsers, there exist APIs
to realize cursor position selections using Range
, which currently only VSCode
utilizes, while CodeMirror
and DingTalk Documents have independently implemented cursor position calculation. CodeMirror
uses binary search to continually compare cursor and character positions, with line breaks in search adding a significant level of complexity.
Speaking of which, the package management in VSCode
is quite interesting. VSC
is an open-source application that extracts the core monaco-editor-core
package. This package is then included as a dev
dependency in monaco-editor
, which is bundled into monaco-editor
during packaging. monaco-editor
is a repackage of the core
to enable the editor to run within a browser web
container, thus realizing the web version of VSCode
.
Here we can utilize the browser-related API
to calculate the position of the cursor selection. The compatibility of related API
when used together is relatively good, but if used in shadow DOM
, the compatibility is comparatively poor. This is based on the browser's DOM
implementation. If we were to draw the selection area using Canvas
, we would need to calculate based entirely on the text being drawn. This part of the implementation is not overly complex. The complexity of DOM
rendering arises from the difficulty in obtaining text positions and sizes, while in Canvas
, this information is typically recorded.
Here we have summarized the relevant APIs for browsers. We have implemented basic selection operations based on the Range
object and Selection
object, and provided specific application scenarios and examples. Additionally, we discussed the issues encountered with selection in the Blocks
editor, and finally, we researched various editor implementations with custom selection drawing, providing a simple custom selection drawing example.
Next, we will start from the data model and design the representation of the editor's selection model. Then, building upon the browser's selection-related APIs, we will synchronize the editor's selection model with the browser's selection. By using the selection model as the target scope for editor operations, we will achieve fundamental editor functions such as insertion, deletion, formatting, and various boundary operations related to selections.