Previously, we focused on the browser's default behavior on complex DOM structures and on the many IME input scenarios that need targeted handling for input-method and browser compatibility. Here, we turn to structural changes in text: line-level operations, text drag-and-drop, and other operations that change the text structure.
So far, our editor's input handling has centered on semi-controlled text input combined with dirty-DOM detection, and keeping input state in sync is complex and error-prone. Here, we extend input synchronization to Enter, deletion, and text drag-and-drop, rounding out the editor's input handling.
Specifically, pressing Enter or deleting a line break changes the DOM structure, while text deletion and drag-and-drop are driven by combinations of events such as BeforeInput. The editor treats these operations as an integral part of input. Because they are usually fully controllable, dirty DOM is less of a concern, but many details still need careful attention.
From the start, we discussed the uncontrolled behavior of ContentEditable, in particular how inconsistently the Enter key behaves across browsers. Earlier examples highlighted these differences:
In a contenteditable editor, pressing Enter inserts <div><br></div> in Chrome, <br> in Firefox (<60), and <p><br></p> in IE.
Pressing Enter at 123|123 results in 123<div>123</div> in Chrome, while Firefox formats it as <div>123</div><div>123</div>.
When that line break is deleted again (123|123 -> 123123), Chrome reverts to 123123, while Firefox produces <div>123123</div>.
These examples have come up repeatedly, each time in the context of uncontrolled browser behavior and its impact on state synchronization. In reality, the Enter key can be controlled by preventing the default behavior and then handling line splitting and format inheritance ourselves based on the current selection.
Generally speaking, there are two ways to prevent the default behavior: handle the BeforeInput event or the KeyDown event and cancel its default. The former exposes the input type directly (for example insertParagraph for a hard Enter versus insertLineBreak for a soft one), while the latter fires earlier, so the default can be stopped sooner.
Naturally, we stick with the BeforeInput event for Enter, which keeps things convenient. In principle the implementation is simple: insert an op containing \n into the data structure. Since our editor doesn't support soft line breaks by default, both kinds of line break are treated uniformly as hard breaks.
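The handling described above can be sketched as follows. This is a minimal sketch, not the editor's actual code. BeforeInputLike and EditorLike are assumed stand-ins for the DOM InputEvent and the editor, and the insert method is hypothetical. Both line-break input types collapse into a single hard "\n".

```typescript
// Minimal stand-ins for the DOM InputEvent and the editor (assumptions for this sketch).
interface BeforeInputLike {
  inputType: string;
  preventDefault(): void;
}

interface EditorLike {
  // Hypothetical: inserts text at the current model selection.
  insert(text: string): void;
}

// insertParagraph (hard Enter) and insertLineBreak (soft, e.g. Shift+Enter)
// both map to a single hard "\n" because soft breaks are not supported.
const LINE_BREAK_INPUT_TYPES = new Set(["insertParagraph", "insertLineBreak"]);

// Returns true when the event was a line break and has been handled.
function handleLineBreak(editor: EditorLike, event: BeforeInputLike): boolean {
  if (!LINE_BREAK_INPUT_TYPES.has(event.inputType)) return false;
  // Stop the browser from creating <div>, <br> or <p> on its own.
  event.preventDefault();
  editor.insert("\n");
  return true;
}
```

Other input types fall through, so a handler like this can sit in front of the rest of the BeforeInput logic.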
This doesn't look complicated, because structural updates are handled by our LineState and Mutate modules: Mutate manages keys and preserves immutability, and the React adapter simply renders lines from LineState, leaving the actual DOM rendering to React.
The algorithm inside the Mutate module is quite complex, so we won’t delve into it here. Simply put, it identifies the line structure corresponding to the current selection, splits it into two lines, and inherits the formatting attributes of the original line. Importantly, this isn’t handled case-by-case—instead, we implement a general mutation pattern based on the changes described by the Delta operations.
Inserting a line break looks that simple, but in reality it isn't. The complexity lies in inheriting line styles: in our Mutate design, line styling follows the delta data structure exactly, so the line style is carried only by the final EOL node.
This creates a somewhat counterintuitive issue: if we insert \n in the middle of a line, the original line style will actually belong to the next line, since the EOL node is always at the end. Thus, when inserting \n, the original EOL node naturally moves to the next line.
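The effect can be reproduced with a toy delta. This is a simplified sketch: Op, toLines and insertAt are illustrative names, not the editor's Mutate module. Inserting a plain "\n" in the middle of a quoted line leaves the quote on the lower line, because the quote lives on the original EOL.

```typescript
interface Op {
  insert: string;
  attributes?: Record<string, unknown>;
}

interface LineView {
  text: string;
  attrs: Record<string, unknown>;
}

// Groups ops into lines; each line takes its attributes from the "\n" (EOL) that ends it.
function toLines(ops: Op[]): LineView[] {
  const lines: LineView[] = [];
  let text = "";
  for (const op of ops) {
    for (const ch of op.insert) {
      if (ch === "\n") {
        lines.push({ text, attrs: { ...op.attributes } });
        text = "";
      } else {
        text += ch;
      }
    }
  }
  return lines;
}

// Inserts plain text (no attributes) at a character index, splitting the op it lands in.
function insertAt(ops: Op[], index: number, text: string): Op[] {
  const result: Op[] = [];
  let offset = 0;
  let done = false;
  for (const op of ops) {
    const end = offset + op.insert.length;
    if (!done && index >= offset && index < end) {
      const cut = index - offset;
      if (cut > 0) result.push({ ...op, insert: op.insert.slice(0, cut) });
      result.push({ insert: text });
      result.push({ ...op, insert: op.insert.slice(cut) });
      done = true;
    } else {
      result.push(op);
    }
    offset = end;
  }
  if (!done) result.push({ insert: text });
  return result;
}
```

With the document "abcd\n" carrying { quote: true } on its EOL, inserting "\n" at index 2 yields the lines "ab" (no attributes) and "cd" (quote).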
The root cause is that the \n comes after the content whose style it carries. If we instead added a SOL (Start Of Line) node at the beginning of each line to carry the styles, with \n serving only to separate lines, then Mutate Insert could easily keep the line style on the preceding line instead of moving it to the next.
However, this approach clearly disrupts the original data structure and leads to new problems in state management. It requires many additional cases to handle this non-rendering node correctly. Another alternative is adding a used flag to the Mutate Iterator object. When inserting a \n, it checks if the current cached LineState has already been reused.
If it hasn't, it simply reuses the key and attrs of that State. Then, when the next \n node is read, since reuse has already happened, it cannot reuse the state again—thus creating a completely new state.
The problem is that it becomes hard to determine what value the second \n should actually hold. This breaks the symmetry of our model: we can no longer reliably pass a well-defined new value to the second \n. Moreover, baking this behavior into Mutate Compose makes it impossible to opt out in the cases where the default behavior is actually wanted.
In fact, Quill faces a similar issue: I noticed that when it directly inserts \n, the styles also follow into the next line. This implies that Quill handles line style inheritance during the Enter event itself, which strikes me as a reasonable approach. In that case, we can manage it in a fully controlled manner.
Back to pressing Enter in the editor: if we keep the plain \n insertion described above, the line format inheritance problem becomes obvious. Take quote, the blockquote format, as an example. Just as a Markdown blockquote is marked by > at the start of a line, the original line should keep the blockquote format after a line break is inserted. With a plain \n insertion, however, the blockquote format ends up only on the newly created line.
Several scenarios need to be distinguished. If Enter is pressed at the start of a line, all current attributes should carry over to the next line; this is the default behavior. If Enter is pressed at the end of a line, all attributes on the new line should be cleared, although any incoming attributes must still be merged in. If Enter is pressed in the middle of a line, the current line's attributes should be copied to the newly inserted line, again merging any incoming attributes.
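The three cases can be condensed into a small decision function. This is a sketch under one stated assumption: "new line" is read as the line below the caret. The names caretPlace and resolveEnterAttrs are hypothetical. The function only computes the resulting line attributes; turning them into delta ops is left to the Mutate layer.

```typescript
type Attrs = Record<string, unknown>;
type CaretPlace = "start" | "middle" | "end";

// Hypothetical helper: where the caret sits within a line of `lineLength` characters.
function caretPlace(offset: number, lineLength: number): CaretPlace {
  if (offset === 0) return "start";
  return offset >= lineLength ? "end" : "middle";
}

// Line attributes for the two lines that exist after Enter:
// "upper" is the line ending at the caret, "lower" the line starting at it.
function resolveEnterAttrs(place: CaretPlace, lineAttrs: Attrs, incoming: Attrs = {}): { upper: Attrs; lower: Attrs } {
  switch (place) {
    case "start":
      // Default behavior: an empty line opens above and the original line keeps its format.
      return { upper: {}, lower: { ...lineAttrs } };
    case "end":
      // The fresh line below starts clean; only explicitly incoming attributes apply.
      return { upper: { ...lineAttrs }, lower: { ...incoming } };
    case "middle":
      // The format is copied to the new line, with incoming attributes merged on top.
      return { upper: { ...lineAttrs }, lower: { ...lineAttrs, ...incoming } };
  }
}
```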
Deletion is also an important action in text structure changes. Similarly, when deleting, attention must be paid to merging line structures and handling line format issues. Let’s start with a relatively straightforward case: deleting a segment of text. Since our selection carries Range information, deleting a text segment isn’t complicated — simply delete content according to the selection.
In addition, deletion has two modes: backward deletion and forward deletion. Therefore, we need to handle deleteContentBackward and deleteContentForward input types separately. In fact, their implementations are similar; the main difference is how the deletion position is calculated. However, it is still necessary to carefully handle cases when deleting at the start or end of a line.
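The only difference between the two input types is how the deletion position is calculated. A minimal sketch over flat model offsets illustrates this. ModelRange and deletionRange are illustrative names, and every step is one unit, so multi-unit characters such as emoji are not yet handled.

```typescript
interface ModelRange {
  start: number; // inclusive
  end: number; // exclusive
}

type DeleteInputType = "deleteContentBackward" | "deleteContentForward";

// Computes which model range a delete should remove, or null when there is nothing to delete.
function deletionRange(sel: ModelRange, inputType: DeleteInputType, docLength: number): ModelRange | null {
  // An expanded selection is deleted as-is, whichever direction was requested.
  if (sel.start !== sel.end) return { start: sel.start, end: sel.end };
  if (inputType === "deleteContentBackward") {
    // Backward: the unit before the caret; nothing to do at the very start.
    return sel.start > 0 ? { start: sel.start - 1, end: sel.start } : null;
  }
  // Forward: the unit after the caret; nothing to do at the very end.
  return sel.end < docLength ? { start: sel.end, end: sel.end + 1 } : null;
}
```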
When handling backward deletion, the main focus is deleting at the start of a line, that is, when the cursor is at the beginning of the current line and a line state node exists. Three scenarios are handled separately: the previous node is a block node, the current line has line attributes, and the current line has no line attributes. The goal is to remove the current line's structure in a way that matches what users intuitively expect, and then to move the cursor to an appropriate position.
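The three backward-deletion cases can be sketched as a decision function. LineInfo, the action names, and the priority order are assumptions: the checks follow the order in which the cases are listed, and the block case moves the caret onto the block, mirroring the forward-deletion behavior.

```typescript
interface LineInfo {
  attrs: Record<string, unknown>;
  isBlock: boolean; // e.g. an image or other block-level node occupying the whole line
}

type BackwardAction = "select-previous-block" | "clear-line-attrs" | "merge-with-previous";

// Decides what Backspace does when the caret sits at the start of `current`.
function backwardAtLineStart(prev: LineInfo | null, current: LineInfo): BackwardAction | null {
  // 1. Previous line is a block node: move the caret onto it instead of merging.
  if (prev?.isBlock) return "select-previous-block";
  // 2. Current line has a format (quote, list, ...): strip it and keep the text in place.
  if (Object.keys(current.attrs).length > 0) return "clear-line-attrs";
  // 3. Plain line: remove the line break and join it with the previous line.
  return prev ? "merge-with-previous" : null;
}
```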
When handling forward deletion, the focus is mainly on deleting at the end of the line, which is somewhat simpler. Complex cases are generally not handled here since these operations occur less frequently. If the cursor is on a block node, deleting simply triggers the block node's own deletion operation. If the cursor is at the end of the current line and the next line is a block node, deletion moves the cursor onto that block node.
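Forward deletion can be sketched the same way. CaretContext and the action names are hypothetical. Falling back to merging with a non-block next line is assumed as the plain-line default.

```typescript
type ForwardAction = "delete-block" | "select-next-block" | "merge-with-next" | "delete-char";

interface CaretContext {
  onBlock: boolean; // caret currently selects a block node
  atLineEnd: boolean; // caret sits at the end of a text line
  next: { isBlock: boolean } | null; // following line, or null at the end of the document
}

// Decides what the Delete key does for the given caret context.
function forwardDeletion(ctx: CaretContext): ForwardAction | null {
  if (ctx.onBlock) return "delete-block"; // the block handles its own deletion
  if (!ctx.atLineEnd) return "delete-char"; // ordinary in-line deletion
  if (!ctx.next) return null; // end of document: nothing to delete
  // Next line is a block node: move the caret onto it instead of deleting.
  return ctx.next.isBlock ? "select-next-block" : "merge-with-next";
}
```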
The trickiest part of deleting content actually lies in the view layer. When React drives view updates, uncontrolled behavior shows up again, this time as a mismatch between React's data and its rendered DOM. At root, it traces back to DOM changes made during IME input.
Specifically, when the selection spans multiple nodes, whether inline nodes or whole lines, starting IME composition makes the browser delete the content of those nodes and replace it with the composed text. Once the composition is committed, however, the editor crashes. The problem comes from the deletion and insertion being merged into a single step, and it surfaces as a NotFoundError from removeChild, reporting that the node to be removed is not a child of this node.
From the error message, it seems that React removes child nodes from their parent nodes, which is a perfectly reasonable behavior. For example, when implementing a list, if the data source deletes some nodes, React will automatically remove the corresponding DOM nodes. This means we don't have to manipulate the DOM directly; instead, changes can be made declaratively.
The issue here arises because these DOM nodes have actually been removed, so when React tries to remove these nodes again, it triggers an error. This exception causes the entire editor to crash, so we need to prevent this situation from happening. The first step is to avoid exceptions from removeChild. It’s difficult to directly avoid React’s behavior, so the only option is to intercept on the DOM node itself.
However, intercepting at the DOM level is not straightforward either. The removeChild method exists on the Node object, and if we override Node.prototype.removeChild directly, it would affect all DOM nodes in the whole page. Therefore, we can only try to apply this interception within the editor’s ref.
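The interception can be limited to a single node by patching that node's own removeChild rather than Node.prototype. The sketch below uses assumed structural types (ChildLike, ParentLike) so that it also runs without a browser. In the editor, the node would be the element received through a ref callback. guardRemoveChild is a hypothetical name.

```typescript
// Structural subset of DOM Node used here.
interface ChildLike {
  parentNode: unknown;
}
interface ParentLike {
  removeChild<T extends ChildLike>(child: T): T;
}

// Patches removeChild on one node only, leaving Node.prototype and the rest of the page untouched.
function guardRemoveChild(node: ParentLike): void {
  const original = node.removeChild.bind(node);
  node.removeChild = <T extends ChildLike>(child: T): T => {
    if (child.parentNode !== node) {
      // Already detached (e.g. by the IME): skip instead of throwing NotFoundError.
      return child;
    }
    return original(child);
  };
}
```

Because only an own property is added, other elements keep the native behavior.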
Since the editor contains a large number of DOM nodes, patching every one of them is impractical, so we also need to limit the scope of DOM mutations. In React, re-rendering can be controlled through the key prop: we refresh the keys of the affected nodes when IME input starts so that React does not reuse them. This confines the refresh to line nodes.
For the nodes React controls, we apply the patched logic to the DOM of block and line nodes to avoid the exception. We must also avoid re-running the ref callback: when the callback's identity changes, React calls the old ref with null and then the new ref with the node. A stable memoized function such as useMemoFn solves this.
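The key refresh can be sketched as an immutable update over the line list. LineNode, nextKey and refreshLineKeys are hypothetical names, and the counter-based key scheme is only one option. Lines outside the range keep their identity, so React can keep skipping them.

```typescript
interface LineNode {
  key: string;
  text: string;
}

// Hypothetical key generator; any scheme producing unique strings works.
let keyCounter = 0;
const nextKey = (): string => `line-${++keyCounter}`;

// Returns a new array in which lines startLine..endLine (inclusive) carry fresh keys,
// so React remounts them instead of reconciling DOM the IME may have altered.
function refreshLineKeys(lines: LineNode[], startLine: number, endLine: number): LineNode[] {
  return lines.map((line, index) =>
    index >= startLine && index <= endLine ? { ...line, key: nextKey() } : line,
  );
}
```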
Fundamentally, we cannot control DOM changes or stop the browser's default behavior during IME input, but we can act when composition starts. The idea is to split apart the deletion and insertion that would otherwise be merged when composition ends: at composition start, we first call deleteFragment to remove the selected content and its DOM, keeping the model and the DOM in sync.
But this introduces a new problem: the original delete method completely removes the content within the selection. This causes the DOM node where the selection resides to be deleted when IME is activated. As a result, the browser defaults the cursor position to the beginning of the current line. Although this doesn’t affect the final input content, it is visually noticeable during typing, which somewhat impacts user experience.
Another possible implementation to consider is deleting the selected content during composition input while preserving the DOM node that contains the cursor. However, this implementation is quite complicated. Ideally, if we could delete the selection and reset the cursor position before triggering the IME, this issue could be avoided — but currently, no API exists to achieve such behavior.
Later, while studying Slate's implementation, I found that it simply deletes the affected nodes when IME composition starts, which our editor could not do cleanly. After investigating, it turned out that we were blocking selection updates after each render, and this caused the behavior. Curiously, blocking selection updates also meant that every node after the current line's node failed to render.
Therefore, we now let selection updates through: during the Update Effect phase, we no longer block them based on the Composing state. This avoids the issue above. The behavior remains peculiar, though: React does hold the DOM state, and the change happens precisely during the selection update, so it is puzzling that updating the selection alone could prevent nodes from rendering properly.
Unicode can be viewed as a map that relates numeric code points to specific glyphs, allowing symbols to be referenced without directly embedding the characters themselves. The possible code point range spans from U+0000 to U+10FFFF, encompassing over 1.1 million possible symbols. For better organization, Unicode divides this range into 17 planes.
The first plane, U+0000 to U+FFFF, is called the Basic Multilingual Plane (BMP) and contains the most commonly used characters. Beyond the BMP, there are about one million code points ranging from U+010000 to U+10FFFF, which are part of supplementary planes, sometimes called the astral planes.
JavaScript strings are sequences of unsigned 16-bit code units, so a single unit cannot represent a code point above U+FFFF; such code points must be split into surrogate pairs. This is essentially JavaScript's UCS-2 heritage, in which every unit takes 2 bytes, and a character that needs 4 bytes is stored as two 2-byte units: the surrogate pair.
JavaScript therefore does not use a variable-length 1-to-4 byte encoding like UTF-8. Surrogate pairs are what extend its fixed 16-bit units to the full code point range, which is why UTF-16 encodes each code point in either 2 or 4 bytes. ECMAScript 6 added the \u{1F600} escape syntax, while ECMAScript 5 code has to write astral characters as explicit surrogate pairs such as \uD83D\uDE00. In both versions, strings are still stored as UTF-16 code units for backward compatibility.
ES6 also added APIs that make code-point handling easier, such as String.prototype.codePointAt, String.fromCodePoint, and string iteration by code point. Regular expressions gained the u flag, which treats a 4-byte character as a single character.
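The difference between code units and code points is easy to see with a single astral character. The sketch below uses only standard ES2015 APIs: length counts units, while iteration, codePointAt and the u flag work on code points.

```typescript
const smile = "😀"; // U+1F600, outside the BMP

// .length counts UTF-16 code units, so one astral character reports 2.
const unitLength = smile.length;
// Iteration (for...of, spread, Array.from) walks code points instead.
const codePointCount = [...smile].length;
// codePointAt reads the whole surrogate pair; charCodeAt only sees one half.
const fullCodePoint = smile.codePointAt(0);
const firstUnit = smile.charCodeAt(0);
// Without the u flag, "." matches one code unit and cannot match the whole pair.
const legacyMatch = /^.$/.test(smile);
const unicodeMatch = /^.$/u.test(smile);
```

Here unitLength is 2 and codePointCount is 1. legacyMatch is false and unicodeMatch is true.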
Additionally, the BMP leaves the surrogate range U+D800 to U+DFFF vacant: these code points are never assigned to characters, so they cannot collide with real BMP characters. This gap is what makes the mapping of supplementary-plane characters possible. The high surrogates (U+D800 to U+DBFF) and low surrogates (U+DC00 to U+DFFF) give exactly 2^10 × 2^10 = 1,048,576 combinations, which covers the supplementary range U+10000 to U+10FFFF exactly.
Beyond surrogate pairs, symbols such as emoji can also be combined: a grapheme that looks like a single character may consist of several code points joined with \u200d (the zero-width joiner, ZWJ). Such sequences are longer still, and even ES6 code-point iteration splits them into their parts.
Hence, before deleting text, we need to determine how long the text to be deleted is. There are various ways to do this, for example falling back to uncontrolled input for word-level deletion. Here, we compute the length of the trailing Unicode character sequence to decide how many units to delete.
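The 2^10 × 2^10 split maps directly onto the standard UTF-16 encoding formula. Subtract 0x10000 to get a 20-bit offset. The top 10 bits go into the high surrogate and the bottom 10 bits into the low surrogate. The function names below are illustrative.

```typescript
// Splits a supplementary-plane code point into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary-plane code point");
  }
  const offset = codePoint - 0x10000; // 20 bits: 0x00000..0xFFFFF
  const high = 0xd800 + (offset >> 10); // top 10 bits -> U+D800..U+DBFF
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> U+DC00..U+DFFF
  return [high, low];
}

// Reverses the mapping: recombines the two 10-bit halves and adds 0x10000 back.
function fromSurrogatePair(high: number, low: number): number {
  return ((high - 0xd800) << 10) + (low - 0xdc00) + 0x10000;
}
```

For U+1F600 the offset is 0xF600, which gives the pair 0xD83D, 0xDE00.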
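A simplified version of the trailing-length calculation is sketched below. It is not a full Unicode grapheme algorithm (UAX #29): it only keeps surrogate pairs, ZWJ sequences, the emoji presentation selector U+FE0F, and skin-tone modifiers together. lastGraphemeLength is a hypothetical name. Where available, Intl.Segmenter with granularity "grapheme" is a more complete alternative.

```typescript
const ZWJ = 0x200d; // zero-width joiner
const VS16 = 0xfe0f; // emoji presentation selector
const isSkinTone = (cp: number): boolean => cp >= 0x1f3fb && cp <= 0x1f3ff;

// Length, in UTF-16 units, of the last visible character of `text`, so that a backward
// delete removes the whole sequence instead of leaving invisible residue.
function lastGraphemeLength(text: string): number {
  const points = Array.from(text); // splits by code point, keeping surrogate pairs intact
  if (points.length === 0) return 0;
  let i = points.length - 1;
  let units = points[i].length;
  // Walk backwards while the current code point is joined to the one before it.
  while (i > 0) {
    const current = points[i].codePointAt(0)!;
    const previous = points[i - 1].codePointAt(0)!;
    const joins = current === ZWJ || previous === ZWJ || current === VS16 || isSkinTone(current);
    if (!joins) break;
    i -= 1;
    units += points[i].length;
  }
  return units;
}
```

For a family emoji built from three people and two ZWJs, this returns 8, the sequence's full UTF-16 length.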
Previously, we applied special handling when deleting Emoji because they consist of multiple characters. Deleting with a fixed length of 1 would leave behind invisible residual characters. Besides Emoji, when using the Alt + Del shortcut key, the default behavior deletes text at the word level, which can also involve multiple characters.
If you’re working with only ContentEditable, browsers handle word-level deletion automatically—including proper deletion of Emoji. Therefore, for uncontrolled input editors like Quill or Feishu Docs, manually handling this behavior isn’t usually necessary; the main focus rather lies in passively syncing state after DOM mutations.
However, our editor manages input entirely through the beforeInput event, which makes it fully controlled, so we have to handle deletion ourselves. Although the event's inputType distinguishes values such as deleteWordBackward and deleteWordForward, it does not say how many characters should be deleted.
Initially, I considered either switching back to uncontrolled input or actively segmenting words using the Intl.Segmenter API. However, after seeing the MDN demo, I realized the segmenter requires a language parameter, which we cannot reliably determine in an editor context.
So, I looked into implementations from open-source editors. Slate fully customizes this behavior by using getWordDistance to calculate word boundaries on its own. While that works fine for English, it falls short for Chinese word groups since it primarily splits at punctuation marks, effectively deleting by sentences rather than words.
Lexical's handling of word deletion, on the other hand, matches expectations much more closely. I initially assumed it used uncontrolled input as well, but its source shows that it also relies on the beforeInput event while closely matching native browser behavior. I suspected it used Segmenter under the hood and wanted to see how it dealt with the language problem. It turns out the language parameter can simply be omitted, in which case the runtime's default locale is used.
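Omitting the locale looks like the sketch below. It assumes an ES2022 runtime (for example Node 16+) and the "es2022" TypeScript lib for Intl.Segmenter. backwardWordLength is a hypothetical helper that approximates a word-wise backward delete, not Lexical's code. Segmentation of scripts without spaces, such as Chinese, depends on the runtime's ICU data.

```typescript
// Passing undefined as the locale falls back to the runtime's default locale.
const wordSegmenter = new Intl.Segmenter(undefined, { granularity: "word" });

// Number of UTF-16 units a word-wise backward delete would remove at the end of `text`:
// the last word-like segment plus any spaces or punctuation after it.
function backwardWordLength(text: string): number {
  let lastWordStart = -1;
  for (const segment of wordSegmenter.segment(text)) {
    if (segment.isWordLike) lastWordStart = segment.index;
  }
  return lastWordStart < 0 ? text.length : text.length - lastWordStart;
}
```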
However, a closer reading of Lexical's source revealed that it doesn’t directly use Segmenter for segmentation. Instead, it leverages the selection.modify API to pre-process selection changes. This API lets you synchronously update the selection's DOM reference, allowing you to immediately retrieve the future selection state and thereby compute the deletion range effectively.
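The selection.modify technique can be sketched as a probe that extends the selection by one word, reads the resulting focus, and restores the original selection. SelectionLike is an assumed structural subset of the DOM Selection (modify is non-standard but implemented by major browsers). N stands in for the DOM Node type. probeWordBoundary is a hypothetical name, not Lexical's function.

```typescript
type Alter = "move" | "extend";
type Direction = "forward" | "backward";
type Granularity = "character" | "word" | "line";

interface SelectionLike<N> {
  anchorNode: N | null;
  anchorOffset: number;
  focusNode: N | null;
  focusOffset: number;
  modify(alter: Alter, direction: Direction, granularity: Granularity): void;
  setBaseAndExtent(anchorNode: N, anchorOffset: number, focusNode: N, focusOffset: number): void;
}

interface DomPoint<N> {
  node: N;
  offset: number;
}

// Lets the browser compute where a word-wise extension lands, reads the new focus,
// then restores the original selection so nothing visible changes.
function probeWordBoundary<N>(sel: SelectionLike<N>, direction: Direction): DomPoint<N> | null {
  const { anchorNode, anchorOffset, focusNode, focusOffset } = sel;
  if (!anchorNode || !focusNode) return null;
  sel.modify("extend", direction, "word");
  const target = sel.focusNode ? { node: sel.focusNode, offset: sel.focusOffset } : null;
  sel.setBaseAndExtent(anchorNode, anchorOffset, focusNode, focusOffset);
  return target;
}
```

The range between the original caret and the returned point is what a word deletion would remove. It can then be converted into a model range.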
Lexical's source also discusses the beforeInput event and its getTargetRanges() method. This showed that my earlier assumption, that browsers don't provide a default delete length, was wrong: the deletion is described by a Range. However, the comments also note that this approach is unreliable and may not accurately reflect the selection after the operation in complex scenarios.
Using tools like Intl.Segmenter that segment text at the word level is prone to errors and requires tokenizing the entire operation (Op), which involves a lot of unnecessary computation. Different languages have vastly different tokenization rules—for example, English uses spaces as delimiters whereas Chinese does not—making automatic word boundary detection extremely challenging. This difficulty is especially pronounced when dealing with automatic line breaks and non-Roman scripts.
In summary, the selection.modify method leverages the browser engine's own built-in, highly optimized logic for selection calculation. Since the browser inherently understands its own tokenization process best, it makes sense to use it. Additionally, the beforeInput event exposes several other methods, and word-level deletions can also be implemented using getTargetRanges.
Since getTargetRanges was mentioned, it's natural to discuss how text drag-and-drop is implemented based on this. Within the beforeInput event, the inputType can be either deleteByDrag or insertFromDrop. Drag-and-drop involves two combined operations: deletion and insertion of text.
The getTargetRanges method returns an array of StaticRange objects, so we need to use the previously implemented toModelRange method to convert them into the editor's selection model. This means moving text only needs to focus on two Range objects representing the deletion and insertion locations.
Therefore, implementing drag-and-drop driven by the beforeInput event is fairly straightforward: it just combines the delete and insert logic. A temporary variable is needed to store the starting position because these are two separate events; the delete location must be saved and then used when the text is inserted.
The part that moves the content fragment is relatively simple. The key point is using the transformPosition function to handle position offsets. After deleting the content fragment, the insertion point shifts. Both changes use the same draft document as their base and are sequential operations, so it’s necessary to apply the effect of operation A when handling operation B.
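The two-event flow and the position rebasing can be sketched together. In this minimal sketch, a plain string stands in for the draft document. DragMover is a hypothetical controller, and this transformPosition is an illustrative offset-only version, not the editor's implementation.

```typescript
interface ModelRange {
  start: number; // inclusive
  end: number; // exclusive
}

// Maps an offset taken before deleting `deleted` to the equivalent offset afterwards.
function transformPosition(pos: number, deleted: ModelRange): number {
  if (pos <= deleted.start) return pos; // before the deletion: unaffected
  if (pos >= deleted.end) return pos - (deleted.end - deleted.start); // after it: shift left
  return deleted.start; // inside it: collapse to where the deletion happened
}

// Hypothetical controller for the deleteByDrag / insertFromDrop pair of beforeinput events.
class DragMover {
  private pending: ModelRange | null = null;

  // deleteByDrag: remember the source range until the drop arrives.
  onDeleteByDrag(source: ModelRange): void {
    this.pending = source;
  }

  // insertFromDrop: apply deletion A, then insertion B rebased onto A.
  onInsertFromDrop(draft: string, dropAt: number): string {
    const source = this.pending;
    this.pending = null;
    if (!source) return draft;
    const fragment = draft.slice(source.start, source.end);
    const afterDelete = draft.slice(0, source.start) + draft.slice(source.end);
    const target = transformPosition(dropAt, source);
    return afterDelete.slice(0, target) + fragment + afterDelete.slice(target);
  }
}
```

For example, dragging "hello " from "hello world" to offset 11 deletes it first. The drop point then shifts to 5, which produces "worldhello ".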
Besides handling drag-and-drop within the BeforeInput event, we can also manage drag logic with DragEvent. Slate takes this route and relies on the drag-related events: it prevents the default drag behavior and manages the drag operation manually. If you need to control dragging for elements such as images, this approach becomes necessary.
The DragEvent-based solution has three parts: saving the current selection into dataTransfer during dragstart, preventing the default behavior in dragover so the drop is allowed, and retrieving the dragged content and performing the move in drop. Slate's implementation follows this pattern.
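The three steps can be sketched with minimal stand-ins for DragEvent and DataTransfer. This is not Slate's code. The custom format string and the handler names are assumptions. Drops without a stored range are left to the browser so that external content still works.

```typescript
interface DataTransferLike {
  setData(format: string, data: string): void;
  getData(format: string): string;
}

interface DragEventLike {
  dataTransfer: DataTransferLike | null;
  preventDefault(): void;
}

interface ModelRange {
  start: number;
  end: number;
}

// Hypothetical custom format used to recognize drags that started in this editor.
const EDITOR_RANGE_FORMAT = "application/x-editor-range";

// 1. dragstart: stash the current selection in dataTransfer.
function onDragStart(event: DragEventLike, selection: ModelRange): void {
  event.dataTransfer?.setData(EDITOR_RANGE_FORMAT, JSON.stringify(selection));
}

// 2. dragover: cancelling the default marks the editor as a valid drop target.
function onDragOver(event: DragEventLike): void {
  event.preventDefault();
}

// 3. drop: read back the source range and return it with the drop point for the move logic.
function onDrop(event: DragEventLike, dropAt: number): { source: ModelRange; dropAt: number } | null {
  const raw = event.dataTransfer?.getData(EDITOR_RANGE_FORMAT);
  if (!raw) return null; // foreign drag: keep the browser's default handling
  event.preventDefault();
  const source = JSON.parse(raw) as ModelRange;
  return { source, dropAt };
}
```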
The findEventRange method in Slate's drop handling deserves special attention, because DragEvent doesn't provide a method like getTargetRanges, so the drop position can't be read directly from the event. As mentioned earlier, this relies on custom DOM-based selection logic under the hood, and browser APIs can still give us the selection at a specified point.
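Resolving a DOM position from the drop coordinates can be sketched with the two browser APIs for this purpose. document.caretPositionFromPoint is the standard one. document.caretRangeFromPoint is the WebKit/Blink variant and serves as a fallback. DocumentLike and domPointFromCoords are assumed names. The result would still need the editor's DOM-to-model conversion, such as toModelRange.

```typescript
interface CaretPositionLike<N> {
  offsetNode: N;
  offset: number;
}
interface RangeLike<N> {
  startContainer: N;
  startOffset: number;
}
// Structural subset of Document; N stands in for the DOM Node type.
interface DocumentLike<N> {
  caretPositionFromPoint?(x: number, y: number): CaretPositionLike<N> | null;
  caretRangeFromPoint?(x: number, y: number): RangeLike<N> | null;
}

// Returns the DOM point under (x, y), e.g. a DragEvent's clientX/clientY.
function domPointFromCoords<N>(doc: DocumentLike<N>, x: number, y: number): { node: N; offset: number } | null {
  if (doc.caretPositionFromPoint) {
    const pos = doc.caretPositionFromPoint(x, y);
    return pos ? { node: pos.offsetNode, offset: pos.offset } : null;
  }
  if (doc.caretRangeFromPoint) {
    const range = doc.caretRangeFromPoint(x, y);
    return range ? { node: range.startContainer, offset: range.startOffset } : null;
  }
  return null; // neither API available
}
```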
Previously, we specifically addressed input method and browser compatibility issues. Since input methods directly manipulate the DOM, synchronizing the editor model’s input state requires handling many complexities. Here, we focused on managing structural text changes, mainly implementing handling for text modifications such as inserting line breaks, deletions, and drag-and-drop operations. With this, the input synchronization part of the editor is complete.
Next, we need to implement the editor’s view layer synchronization, meaning adapters for view layers like React and Vue to interface with the core editor model. We have already developed the model layer (delta) and the controller (core); adding the view layer adapter (react) allows us to fully realize the editor’s MVC architecture.