Previously, we optimized performance during content updates by minimizing the Op operation DOM changes, maintaining key values, and implementing incremental rendering in React. Next, we need to discuss component presets for editable nodes such as zero-width characters, Embed nodes, Void nodes, etc., which provide default behaviors for editor plugin extensions.
Rich text editors are called "rich" not only because they support text formatting but also because they can accommodate the insertion of images, videos, Mentions, and other content. From the outset, we've pointed out that creating a controlled editor calls for a meticulously designed DOM structure to enable accurate node searching and manipulation. Consequently, supporting these nodes still requires establishing a preset DOM structure.
However, nodes like images must be implemented by developers, making the DOM structure completely uncontrollable at the editor framework level. Therefore, we need to agree on nesting different HOC components for various types of components, which will manage the default DOM structure and behaviors. For example, the component below fixes the outer structure, allowing developers to utilize it directly.
Additionally, there is the issue of non-text node selections. Typically, selections occur on text nodes; that is, the nodes retrieved via the Selection object are of text type. However, in certain situations, there may be selections on non-text nodes. For example, when triple-clicking a text line, the entire line may be selected, and at this point, the anchor node will be the first text node, while the focus node will be the line node of the next row.
Cases like this require correction, adjusting the selection back to a text type node. It's especially important to note that the focus node here is the node of the next line, so the correction target is the text node at the start of the next line, offset: 0. Thus, this Model selection can correspond to the 0 offset position in the selection across two lines, which also needs to sync with the DOM selection.
The search logic implemented in the editor largely references the approach used by slate, aiming to find editable child nodes. It starts by calling getEditableChildAndIndex to search for first-level child nodes, serving as a preliminary check to determine the subsequent iteration direction. It continues to search in a DFS manner, endeavoring to find editable nodes at each level.
It's worth mentioning that after handling incremental changes in the previous article, the next challenge is ensuring that the preset DOM structure remains intact when inputting content, which also applies to the components defined in this article. This involves dirty DOM checks, selection updates, rendering Hooks, and more. However, these topics have already been thoroughly discussed in sections #8 and #9 concerning input methods, so we won't revisit them here.
As the name suggests, zero-width characters have no width, making it easy to deduce that these characters do not visually appear. Therefore, these characters can serve as invisible placeholders to achieve special effects. For example, they can be used for information hiding, creating watermarks, or sharing encrypted information. Certain novel sites use this method and font replacements to trace piracy.
Visually, zero-width characters are not displayed because they have zero width, but in an editor, they are quite significant. Simply put, we need these characters to position the cursor and create additional display effects. It is important to note that we are referring to editors implemented with ContentEditable; if it is a self-drawn selection editor, this part of the design may not be necessary.
This is closely related to the design of the editor. If we introduce block-level structures, meaning that the selection effect for these image nodes and similar elements is implemented independently rather than relying on the browser's text selection model, theoretically, zero-width characters are not needed. For instance, in Feishu documents, the image block structure does not have embedded zero-width characters, which can be contrasted against our implementation that incorporates zero-width character selection effects.
However, the reality is not so ideal. Taking Feishu documents as an example, even if block-level selections are implemented independently, the situation of mixed selections cannot be avoided. This means users can select both images and text simultaneously in the browser, which prevents the blocking of mixed selections. Hence, we still rely on zero-width characters as selection points. In Feishu documents, zero-width characters are placed both above and below block-level nodes.
Returning to the design of our editor, we only need to provide a common component for use by Void and Embed nodes. Of course, while implementing this, we need to mark the related types to determine the corresponding styles; for example, a zero-width character of Void type should not display the cursor, but a zero-width character of Embed type may need to maintain the cursor.
When implementing Void/Embed nodes, we typically need to incorporate a zero-width character in the Void node to handle selection mapping issues. Usually, we need to hide its display position to conceal the cursor. However, under specific conditions, there may be issues with IME input swallowing. In the example below, multiple characters cannot be entered properly after the ! when preceded by a Void node.
Handling this issue is relatively simple; we just need to place the zero-width character identifier before the EmbedNode, ensuring it does not affect the selection search, as noted in https://github.com/ianstormtaylor/slate/pull/5685. Moreover, Feishu Document’s implementation follows this pattern: the ZeroNode is always before the FakeNode.
As previously stated, the primary purpose of using zero-width characters is to position the cursor, and our current view rendering is fully equivalent to the data structure. This means that there is necessarily a zero-width character at the end of our lines, corresponding to the Leaf node at the end of the data structure, which serves several purposes:
LineState data. Each LeafState must render a DOM node, making the data model friendly, and ensuring that a text node always remains during empty lines without requiring special handling.Void node, it can prevent the cursor from being positioned at the end. To address this issue, we can conditionally render a zero-width character node, as slate does for the ending Mention node.Blocks. Additionally, it was learned that earlier versions of Etherpad also implemented zero-width characters for handling DOM and selection-related problems.Here, we have implemented numerous compatibility solutions to address this issue, such as the selection correction part mentioned in previous articles and the subsequent handling of Void selection transformations. If we actually did not render this node, we would not need to address these two related issues; however, this would require managing other cases to ensure the equivalence of DOM and Model.
Now we need to solve the selection problem of an empty line. If we directly use an empty node like <span></span> as a child node, it leads to the absence of actual text content. Consequently, the height would not be sustained, and selection could not focus on that area. Therefore, we still need the content of this empty node to be a zero-width character so that selection can indeed focus correctly.
In fact, if the last node is a <br /> node, a zero-width character would not be necessary to solve this issue. The selection node can be placed on this node without any 0/1 offset that needs handling. Quill manages empty lines in this manner, which is common for rich text editors.
However, for us, since the br node is not text and only has a 0 offset, it results in selection depending on default behavior, which fails to delete the current node, thus requiring special handling for non-text nodes. Additionally, we need to manage the callback for directional keys to control the cursor, and there is an issue noted that the br node may interrupt the IME, so we currently maintain the zero-width character implementation.
Another quite important application of the zero-width character at the end of the line is when our selection operation spans from the end of one line to part of the next line. The Model of the selection obtained via Selection will cross two lines. If we perform an operation like TAB indenting, it will affect operations across multiple lines. However, our light blue selection only appears to cover one line, making it seem like a BUG, mainly due to the inconsistency between the visual representation and actual operation.
In editors like Tencent Documents and Google Docs, which implement similar Canvas, this issue is resolved by an additional light blue selection rendering. However, if we were to implement it via DOM, it would not allow for direct content rendering. Hence, we utilize zero-width characters, meaning we add a zero-width character node at the end of the line. The key to this implementation lies here. When our selection is after the zero-width character, we proactively correct it to be before the zero-width character. This implementation works well in Chrome, but has no effect in FireFox.
The Void node refers to what we commonly call an empty node, which should be completely user-defined. Even if there is textual content inside, the editor treats it as a whole block. Therefore, in addition to the definitions in the schema, a HOC component also needs to be implemented for developers to use. The overall component is quite clear but requires handling of events such as selection and clicks.
As mentioned above, the internal structure of this component is entirely user-defined, and the editor engine itself does not know what content may exist here. Therefore, we need to implement a lookup logic when the selection changes. This implementation can be quite challenging because it requires an iterative approach from top to bottom. The slate library implements this in a hierarchical iterative manner, and we follow suit.
The getEditableChildAndIndex function below is the implementation for finding child nodes. It attempts to look forward and backward at the same level, prioritizing the direction provided in the direction argument to find editable nodes. This is just a lookup at the same level; if nothing is found, it then needs to continue looking through the child nodes of the parent, which is also implemented hierarchically rather than recursively.
It is also important to note that if a text node can be found directly, there is no need to continue deeper searches. In HTML, text nodes do not belong to DOMElement. Furthermore, since it has been established that the current node is a non-text node, our current offset value will only be 0 or the length offset of the child nodes.
Essentially, a rich text editor is an editor that supports mixed text and images. When implementing images through Void, it becomes clear that clicking on the image node does not trigger changes in the DOM selection. Since the DOM selection itself does not change, our Model selection will not reflect any alterations, leading to issues such as focus and selection errors.
In Slate, the implementation is such that when the OnClick event is triggered, it actively calls the ReactEditor.toSlateNode method to locate the DOM node corresponding to data-slate-node. It then uses ELEMENT_TO_NODE to find the corresponding Slate Node, and finally retrieves its associated Path node via ReactEditor.findPath. If both anchor points are Void, a range will be created, and ultimately, the latest DOM will be set.
Since our design utilizes Void nodes as higher-order components, we can directly leverage the onMouseDown event to set the selection. However, an issue arises with the selection at this point; the node state is \n, which effectively divides it into three positions. Our actual Void should only occupy the second position, which should also be considered the start of the line, as it needs to be utilized when navigating with the left and right arrow keys.
In the context of the editor, the selection state of nodes is a very common feature. For example, when clicking on an image node, it is typically necessary to add a selection state to that image. I have considered two implementation methods: using React Context and built-in event management. The Context approach involves maintaining the selection useState state at the top level, while the built-in event management listens to selection events within the editor to handle callbacks.
Slate is implemented using Context, where each ElementComponent node is wrapped with SelectedContext to manage the selection state. When the selection state changes, the render function is executed again. This approach is convenient as it only requires setting up Hooks to directly access the selection state in the rendered components; however, it necessitates passing the selection state from the outermost layer to the child components.
Here, we're managing editor events to handle the selection, since our plugin calls methods post-instantiation to complete the view render scheduling. Thus, we implement a class that extends EditorPlugin and a selection higher-order component (HOC), listening for changes in the editor's selection to trigger state updates in the HOC, with the selection state determined directly based on the position of the leaf relative to the current selection.
Next, we need to address the movement of the selection in Void nodes. When the selection is over a Void node, it shifts to the end of the zero-width character generated by pressing Enter, but due to our selection adjustment, it gets corrected back to the zero-width character of the Void node, preventing the cursor from moving. Hence, we need to actively control the selection's movement by binding keyboard events on the Void node for the up and down arrow keys.
Now, we address input method issues. If the cursor is currently in a Void node and any input key is pressed, the content of the node turns into an inline-block format. The issue here is that a BlockVoid node should occupy an entire line, but after input, the actual state looks like the following:
Thus, the simplest solution is to prevent the default behavior when pressing input keys while the cursor is in the Void node. If content is input, it won't trigger the actual text insert, which behaves consistently with Slate.
However, here we also need to address the situation with Chinese input, because the beforeinput event cannot actually prevent the behavior of the IME, and while our content cannot be input at this moment, the selection changes. This can also lead to issues with our toDOMRange method, where the selection might be reset to null, so we need to recalibrate it when transitioning the selection from DOM to Modal.
Additionally, a very interesting issue arises regarding the interaction between ContentEditable and IME. It was discovered in a slate issue that if the outermost node is editable, and then a child node is not editable, immediately followed by a text node inside a span, and the cursor is positioned in between these two, specifically at the end of the Void node.
When the IME is activated to input part of the content and the left arrow key is pressed to move the IME edit to the left to the end, it causes the entire editor to lose focus, resulting in the IME and any entered text disappearing. If we then reactivate the IME, the previously entered text will reappear. This phenomenon only exists in Chromium, while Firefox/Safari behave normally.
This issue was addressed in https://github.com/ianstormtaylor/slate/pull/5736, with the key point being that the outer span tag has a display:inline-block style, while the child div tag has the contenteditable=false attribute.
The Embed node component is more complex than the Void node component because the Embed node is an inline node, which means it is required to implement all the behaviors of text selection. However, it itself is not a character, as it must also be treated as a complete inline block and implement text selection behavior. Nonetheless, we still need to provide a HOC component here.
In reality, our Embed node should be an InlineVoid node, but because the component name is too long, we alias it to Embed. However, in the process of implementation, I encountered numerous challenges and can only lament that what is learned from paper does feel shallow, true understanding comes from practical experience.
Previously, we had been using a rich text engine to fulfill application layer functionalities, and while I had basically read through the Slate code and made some prs to address certain issues, upon trying to implement it directly, I found the challenges to be overwhelming. The current issue is that we are relying on the browser's own ContentEditable to render the cursor position, rather than using self-drawn selections. This means we have to depend on how the browser itself implements selections.
If we were to implement an InlineVoid node where only that node is inline, it would prevent the cursor from being placed in surrounding positions, which is inconsistent with how text content behaves. In the example below, the middle line cannot place the cursor at the end of the node on a click, although it is possible via double-clicking or using the arrow keys. However, this node is not located on a text node, which conflicts with our selection design.
For the solution to this problem, both Quill and Slate add zero-width characters  on either side of the Embed nodes to facilitate cursor placement. Of course, this is only necessary when there are no text nodes adjacent to the Embed nodes; if there are text nodes on either side, no special handling is required. If we follow Slate's design in this scenario, we end up with three potential positions to place the cursor within the selection: the Embed itself, and the left and right cursor CARET, which results in three locations for the zero-width characters.
Moreover, the data structure in Slate directly corresponds to the normalized data format. This means in the previous example, the structural content in Slate will look something like this. This also clarifies why Void nodes, like images, must have a children structure; as their zero-width characters must fully align with the data structure. Given that they are truly calculated as zero-width, the cursor points will also land on the zero-width character nodes, ensuring that the selection can only occur at the 0 offset of these zero-width nodes.
However, a critical issue arises with zero-width characters, which contain two cursor positions, namely 0|1, leading us to a total of 4 cursor positions |||. At this point, our Model has only two nodes, which are \n. Thus, even if we do not allow the cursor to be placed after \n, we can barely match up with three cursor positions. The most significant point here is that each zero-width character accounts for two offsets, which creates two scenarios for handling the same cursor position when using toDOMPoint.
Initially, when we designed toDOMPoint, we prioritized placing the cursor at the end of the previous node. Therefore, when we click at the end of the line, the cursor will end up right after the zero-enter node, giving it an offset of 2. However, due to our selection correction, this offset gets adjusted to 1.
At this point, our selection is corrected to the first data-zero-enter node with an offset value of 1. Ideally, we wanted the cursor positioned at the last data-zero-enter node, but instead, it is adjusted to the first node, which is not the intended selection location.
Now, assuming we did not adjust the offset from 2 to 1, our selection would instead be set at data-zero-embed node's offset -> 1, still not the desired cursor location. Thus, this situation is also an incorrect selection location, and we would still need further adjustments.
Transforming the structure here may present several issues, primarily related to the mutation of two nodes. If we adjust our approach to reduce the zero-width character nodes by eliminating the first zero-width character node, we would lose the ability to maintain three selection states. If we wish for the zero-embed's 0 offset to signify the left cursor and offset 1 to indicate the selection effect of the embedded content, it becomes difficult to achieve. The cursor itself cannot be in a state that appears on the left and disappears on the right, requiring further styling adjustments.
Based on the issues outlined above, the data-zero-embed node would be the left cursor we need to manage, while the right cursor remains as the data-zero-enter node. Handling the left cursor would necessitate applying a margin style to the content of the right Embed. Thus, under default conditions, when we correct the selection offset after the \n node to 1, the default selection position will appear as follows.
So, there are still issues here. When we click at the end of a line, the browser will move the selection to the left side of the node's cursor position, which is clearly not the desired effect. Because of this, we can’t focus at the very end of the line. Therefore, we need to add extra handling logic for toDOMPoint: instead of the default priority on offset, we switch to a priority on node. Specifically, when the node is a data-zero-embed and the offset is 1, we prioritize focusing on the following data-zero-enter node with an offset of 0.
Actually, this still leaves a problem: if the next node is a text node, the cursor cannot be placed properly. So we actually only need to check if nextLeaf exists to decide whether to shift the selection. At the same time, there is still an issue because the same node has two possible offset positions. Therefore, we first need to correct the position in toModelPoint: if the cursor is positioned after a data-zero-embed node, it should be corrected to the position before the node.
But then another issue arises: if the cursor is positioned on the left side of the node, pressing the right arrow key will not move the cursor properly, because the selection is corrected back to the original spot by the above logic. To fix this, we still need to control this behavior in the onKeyDown event — when the selection is on a data-zero-embed node and the right arrow key is pressed, we actively adjust the selection.
Although it seems the problems are resolved, the frequent normalize calls on the DOM bring a subtle issue: when a line consists only of an Embed node, mouse dragging to select it fails. In fact, such dragging even triggers our preset selection limit errors, and the console shows a very frequent log of selection changes.
This problem likely stems from a mix of controlled selections and uncontrolled dragging by the browser. Since we can’t fully control browser dragging, the best approach is to minimize controlled behavior during dragging. So here, we avoid actively setting the selection while the user is dragging, which prevents interfering with the drag selection behavior. We also must remember that during input, even if the mouse is down, the DOM selection needs to update, and it should be reconciled again when the mouse is released.
There are quite a few points to consider regarding the selection behavior of Embed nodes. Although the previous implementation basically worked, some issues emerged during practical use. For example, when selecting from left to right, if the mouse drag stops slightly to the left of an Embed node, the browser's selection does not cover the node. However, when the mouse is released, the calculation ends up selecting the entire node. This is a typical selection synchronization problem.
The cause is that our selection model is designed with zero-width nodes placed on the left side. Placing them on the right could lead to IME-related issues, as mentioned in issues slate#5685 and slate#5736. Hence, in our editor, these nodes are directly placed to the left of the embedded node.
In the previous implementation, while the overall selection behavior tended to the left side of the node, the selection on the right side of the Embed node was biased to the right. This is because the design causes the browser cursor to always appear on the left. We have retained this implementation for now. In contrast, slate adds an extra zero-width node for selection, effectively preventing this issue.
The major modification here adds support for Case 4. The main principle is that when the cursor lands on the zero-width character, regardless of whether the offset is 0 or 1, the cursor should be positioned before the node. Since an offset of 0 already corresponds to a left-side selection, no special handling is required in that case. The data-void marker indicates that the selection should be placed after the node, thereby aligning with browser behavior.
Additionally, Case 5 has been implemented. Because the Embed node does not implement user-select: none, during dragging, the offset can go out of bounds, causing selection offset issues. Furthermore, if the selection is not collapsed, we need to check whether we're at the End node to determine the boundary and select the entire Embed node accordingly. This requires passing the relevant environment state.
Moreover, when multiple Embed nodes are consecutive, triple-clicking in the browser results in only the first node being selected, while subsequent nodes remain unselected. The selection transformation yields the Embed div node. Therefore, we need to specifically handle triple-click selection behavior by proactively selecting the entire line ourselves, rather than relying on the browser's default selection behavior.
In Slate, the inline nodes place zero-width characters within empty nodes to signify the selection state of the node itself. When a line is occupied exclusively, zero-width characters are generated before and after to position the cursor. The nodes themselves do not contain zero-width characters to mark their own selection; naturally, there are no zero-width characters inside, while the external zero-width characters serve the purpose of cursor placement, aligning with their respective semantic designs.
While dragging a selection area, if the Embed node has not been crossed, the browser's selection will then be placed on nodes with contenteditable="false". From this, it becomes clear that when searching for nodes in Slate, the search should ideally be conducted inward, whereas in reality, we should be looking for sibling nodes. Thus, the implementation here shows some differences. Of course, if the selection point lands on a data-leaf node, searching inward is naturally not a problem.
When implementing the Embed node, the built-in zero-width characters are utilized as cursor placement positions using 0/1, where the offset 1 will be adjusted to the offset 0 of the subsequent node. Theoretically, if we only employ this offset scheme without correcting the ModelPoint, it is feasible.
However, in practice, we observe that if the two selection nodes are not contiguous, clicking the left mouse button causes the selection to move from node2 offset 0 to node1 offset 1. Conversely, if they are contiguous, the selection moves correctly to node1 offset len-1.
In this example, after pressing the Embed button and then clicking the left mouse button, the resulting selection's offset becomes 3, while using the Span button yields an offset of 2. Although placing the zero-width character immediately after the Embed node could solve this issue, it would prevent the cursor from being positioned before the Embed node.
At this point, another zero-width character would need to be added at the very front, complicating the interaction handling even further. Additionally, I have previously mentioned the issue of zero-width characters interrupting the input of Chinese IME in Slate. This also presents an interesting problem regarding selection mapping: when the cursor is located after a data-zero-embed node, it needs to be corrected to the position before the node.
Thus, pressing the right mouse button to select will remap the logic within this toModelPoint back to the original position, i.e., L => L remains unchanged, which does not trigger the Model Sel Change, while the DOM selection is forcibly corrected from offset 1 to 0.
So if we actively adjust the selection by right-clicking, it will first trigger Model Sel Change, followed by UpdateDOM, and then DOM Sel Change will correct the selection. At this moment, since the selection is no longer on the Embed zero-width character, it won't hit the correction logic, allowing the selection movement to proceed normally.
In Embed nodes, input methods can also introduce additional effects. Beyond the issues related to the input method mentioned earlier, there’s another situation that needs to be addressed. When text is only on a single line and contains only Embed nodes, entering content at the very beginning of the node can cause duplication. This occurs because the IME keeps the input content stuck on the zero-width character.
In such cases, after the IME input is completed, it is necessary to re-correct the text content on the zero-width character. Since we already carry the text correction of the line content, we only need to re-correct the text content on the zero-width character after the IME input is completed.
Here, we discussed the preset components for editing nodes, including zero-width characters, Embed nodes, Void nodes, etc. These are mainly preset components for plugin extensions in the editor, within which there are some default behaviors and certain preset DOM structures, handling numerous selection interactions and issues caused by IME input.
Next, we will discuss rendering content for non-editing nodes, such as placeholder nodes, read-only mode, plugin mode, external node mounting, etc. These node types are common external nodes in the design of the editor, such as placeholders and pop-ups, and do not require handling selection interactions or issues caused by IME input like editing nodes do.