Previously, based on the selection module, we implemented a semi-controlled input mode through a combination of browser events, which is one of the essential implementations for state synchronization. Here, we need to focus on dealing with the default behaviors of complex DOM
structures in browsers, as well as various input scenarios compatible with IME
input methods. Essentially, we need to handle input method and browser compatibility behaviors on a Case By Case
basis.
From the very beginning of the editor series, we mentioned the controllability of ContentEditable
and browser compatibility issues, particularly when combined with React
as the view layer, the state management and DOM
behavior become more uncontrollable. Let's review some common browser compatibility issues:
contenteditable
editor, pressing Enter directly will result in <div><br></div>
in Chrome
, <br>
in FireFox (<60)
, and <p><br></p>
in IE
.123|123
will format the content as 123<div>123</div>
in Chrome
, and as <div>123</div><div>123</div>
in FireFox
.123|123->123123
, the content will revert to 123123
in Chrome
, and become <div>123123</div>
in FireFox
.("formatBlock", false, "P")
command, Chrome
will wrap both lines in a single <p>
tag, while FireFox
will wrap each line in its own <p>
tag.Since our editor's input relies on browser-provided combination events, we naturally cannot avoid related issues. The design of the editor's view structure needs to be strictly controlled so that we can synchronize the view with the selection pattern based on certain rules. Following the overall MVC
architecture design, the current editor's view structure is designed as follows:
If the above structure is disrupted during ContentEditable
input, issues will arise with the editor's synchronization mode we designed. To solve similar problems, we need to implement a dirty DOM
detection; if destructive node structures appear, we need to attempt to repair the DOM
structure and even resort to triggering a re-render using React
to maintain a strict view structure.
However, performing DOM
checks and repairs every time there is input or selection changes may negatively impact the overall performance or input fluency of the editor. The scope of DOM
checking and repair also needs to be limited; otherwise, it may also affect performance. Thus, we need to categorize the browser's input modes here and implement different DOM
check and repair modes for different types of behavior.
Synchronizing DOM
structure with the Model
structure in uncontrolled React
components becomes complex, which is actually one of the reasons why some editors choose to draw their own selection to avoid uncontrolled problems. The main issues caused by uncontrolled behaviors can be easily demonstrated. Let's assume there are two nodes present, one of inline
type and the other of text
type:
At this point, our cursor is after inline
. Assuming the inline
rule defined in the schema
does not inherit the formatting of the previous node, if we input a content such as 1
, the text will become inline|1text
. This operation seems intuitive. However, when we invoke the IME to input Chinese content at this position, the text turns into incorrect content.
The difference here is easily noticeable. When inputting English or numbers, which do not require invoking IME controlled input mode, the character 1
will be added before the text
text node. However, when invoking the IME for uncontrolled input mode, the input content will not only appear before text
, but also after the inline
node, which clearly indicates a problem.
The root cause of this issue lies in the uncontrolled IME problem. When inputting English, our input is prevented from the default behavior in the beforeinput
event, thus not triggering the browser's default DOM changes. However, in the case of invoking the IME, the DOM changes cannot be prevented, making it uncontrolled input and leading to the problem.
Due to the browser's default behavior, the content of the inline
node will be inserted with the "Chinese" text by the input method, which is the browser's default handling of the input method. Once we complete inputting, the content will be placed before text
in the data structure of the Model
layer, controlled by the editor. This behavior aligns with inputting non-Chinese content and meets the expected performance.
Due to our immutable design, along with performance optimization strategies such as memo
and the execution of useMemo
, even with dirty DOM detection added to the final text node rendering, it is not sufficient because there will be no rerender
performed at this point. Consequently, React will reuse the current DOM node in place, causing inconsistency between the DOM changes of IME input and the Model layer.
If we remove React.memo
and useMemo
from the leaf
directly, the issue will naturally disappear, but it will cause a decline in the editor's performance. Therefore, we need to consider detecting dirty DOM situations as much as possible. In fact, when handling pure uncontrolled input cases in input
events or MutationObserver
, we also need to address dirty DOM issues.
When there is a change in line status, we can directly check all leaf
nodes of the current line, compare the text content, and correct any inconsistencies. Directly using querySelector
may not be elegant enough. We can use a WeakMap
to map leaf states to the DOM structure, enabling quick positioning of the required nodes.
After the state change in the line nodes, when handling side effects, checking dirty DOM nodes, and since our line state is also immutable, there is no need to worry about performance issues. This execution of checks is an O(N)
algorithm, and the scope of the check will be limited to the lines where a rerender
occurs. The specific method of checking nodes will naturally be consistent with the above onRef
.
Here is something to keep in mind, the check for the dirty nodes' status needs to be executed during the useLayoutEffect
timing, because we must ensure that the sequence of operations is to correct the DOM
first before updating the selection. If done the other way around, a problem might occur where the updated selection remains on the dirty node, and then correcting it later due to DOM
node changes would result in the selection being lost, manifesting as the selection being at the beginning of an inline
.
Furthermore, in this implementation, there is no need to check during the initial rendering since there won't be any dirty node situations at that time. Therefore, during the initial rendering, we can skip the check directly. By handling the dirty DOM
issue in this manner, we can also avoid other potential issues. For now, we won't handle the content of zero-width character texts temporarily. If a similar situation arises again, additional checks will be necessary.
Actually, thinking from a different perspective, the issue here might also be due to our selection strategy favoring the leftmost search as much as possible. Correcting it to the right node in such scenarios might solve the problem. However, in the case of empty lines, our trailing \n
node won't render, so this strategy currently cannot completely solve the issue. Additionally, this approach would make the editor's selection strategy more intricate.
It is also essential to pay attention to the timing of React
's Hooks
calls. In the example below, obtaining onRef
from the actual DOM
executes first, making it reasonable to perform the initial DOM
check at that point. Subsequent Child LayoutEffect
is similar to line-wise DOM
check, followed by updating the selection in the Parent LayoutEffect
after corrections, aligning with the scheduling approach.
Regarding the node wrapping issue, let's first discuss the design pattern. The current implementation of the rich text editor lacks block structures, making it complex to handle nested structures. While we wouldn't deal with nesting structures like tables, certain wrapper-level structures such as blockquote
need attention.
Similar structures exist like lists, which we can completely design ourselves. However, for structures like blockquote
, specific combinations are required. When we split text into bold
, italic
, or other inline
nodes, actual DOM
nodes get divided. If we nest an <a>
node at this point, hovering effects like underlines get split. Thus, if we can wrap them within the same <a>
tag, this issue can be avoided.
But a new challenge arises. Handling wrapping for a single key
during rendering isn't complex, but dealing with multiple keys needing wrapping becomes puzzling. For instance, in the example below, individually combining 34
with b
and then wrapping it in a
seems sensible, but wrapping 34
in a
before combining b
of 5
also makes sense. Moreover, it's unclear whether 67
can be combined as well due to their shared b
tag.
After some contemplation, I came up with a simple solution. For elements needing wrapping, if the merged list
keys and values are all the same, they can be considered as a single entity to combine. This simplifies the process, treating them as a combined value rather than separate values, which is sufficient in most scenarios.
Speaking of which, this kind of wrapper
structure is only needed in some particular scenarios. For instance, certain actions such as indentation pose challenges in distinguishing whether the intention is to indent a block quote or the text within it. This issue is prevalent in many open-source editors, especially those with flat data structure designs like the Quill
editor.
In the absence of block structures, controlling similar behaviors becomes tricky. Whole indentation, when combined with lists, is a reasonable behavior in large documents. This implementation will have to wait for our block structure editor to be developed. Of course, if the data structure itself supports nested patterns, as in the case of Slate
, it can be achieved.
Subsequently, in implementing the wrap node
for an a
tag, we encountered a similar issue to that of inline-code
dirty DOM
problem. Looking at the following DOM
structure, everything seems fine. However, when the cursor is placed after the word "超链接" and Chinese characters are input using an IME
, the input text "测试输入" seems to be placed directly under the immediate div
, parallel to the a
tag.
Speaking of which, I noticed that the implementation in the Feishu documents renders the leaf
as the a
tag, while the wrapping is done using a span
which includes additional styles for a hover
effect. Directly wrapping with a span
mitigates the aforementioned issue, and although the internal a
tag could still cause problems, it triggers the dirty DOM
check under the leaf
.
Therefore, building upon the previous dirty DOM
check, we were able to resolve this issue. Fundamentally, such behaviors are a result of how browsers naturally handle these situations, and the browser's interpretation might vary. At present, browsers perceive the structure of a
tags as being inline, similar to our inline-code
implementation. In theory, this shouldn't pose a problem, but we need to manage these uncontrolled situations ourselves.
In practice, even the Quill
editor faces similar issues related to dirty DOM
handling. On the other hand, Slate
doesn't encounter this problem. The workaround here involves placing additional  
nodes at both ends of the a
tag in order to avoid the issue. While this solution addresses the problem, it does introduce new nodes, and currently, shifting the cursor requires careful handling.
Moreover, in subsequent browser testing, the previously mentioned a
tag issue resurfaced. This time, it wasn't caused by wrapping nodes, making the problem more complex, mainly due to various browsers' compatibility issues. Similar to inline code blocks, fundamentally, these are still DOM
change issues arising from uncontrolled browser IME
. However, there are significant differences in browser behaviors. Below is a minimal DEMO
structure:
Chrome
, inputting content at the end of the a
tag will insert a text-type node at the same level as the a
tag, resulting in something like <a></a>"text"
.Firefox
, inputting content at the end of the a
tag will insert a span
type node inside the a
tag, similar to <a></a><span data-string>text</span>
.Safari
, the a
tag and span
tag positions will be swapped, and text content will be added at the same level as the a
tag, resembling <span><a></a>"text"</span>
.Therefore, our dirty DOM
check needs to be handled with more granularity. Simply comparing text content is clearly insufficient. We also need to check if the structure of text nodes is accurate. Initially, we only dealt with the situation under Chrome
, the simplest way is to allow only a single node under the leaf
node. If there are multiple nodes, it indicates a dirty DOM
.
However, we later found that Embed
nodes were also being removed during editing. This happened because we mistakenly considered a combined div
node as dirty DOM
, so a more granular approach is required here. By checking the node type, if it's a text node type, then removing it would avoid mistakenly deleting Embed
nodes.
While this seemed to solve the problem, further issues were discovered under Firefox
and Safari
. Let's first look at the situation in Firefox
, the node is not a text node type, hence it couldn't be removed in the dirty DOM
check. This still couldn't address the dirty DOM
issue in Firefox
, so we need to further handle nodes of different types.
In the case of Safari
, it is more complicated because it swaps the positions of a
tags and span
tags, which disrupts the DOM
structural integrity. In this scenario, we must refresh the DOM
structure, requiring a more complex handling where we introduce forceUpdate
and check for TextNode
nodes.
In fact, a similar approach is also adopted in Feishu documents; when an a
tag in Feishu documents triggers a dirty DOM
check upon calling the IME
input, Feishu documents will directly ReMount
all leaf
nodes in the current line based on behavior to avoid complex dirty DOM
checks. Here we implement a more refined leaf
handling to avoid unnecessary remounting.
It is important to note that we also need to handle zero-width character types. When an Embed
node is present at the beginning of a line without any preceding nodes, typing Chinese characters will cause the IME
input content to be stuck on the zero-width character of the Embed
node. This is similar to the aforementioned inline node and needs to be addressed as well.
At this point, our dirty DOM
check should be able to handle most cases. The overall pattern is that React
processes the DOM
structure after the layout calculation is complete and before the browser rendering. For text nodes and <a>
tags, we need to examine the relationship between the text and the state, and when strict DOM
structure violations occur, the component needs to be remounted directly.
Regarding additional text nodes, which are the compatibility issues emphasized in this chapter, we need to strictly control the DOM
structure under the leaf
node. If there is only a single text node, it conforms to the design structure. However, if there are multiple nodes, except for Void/Embed
nodes, it means the DOM
structure has been compromised, and we need to remove the excess nodes.
Since our editor uses immutable
to enhance rendering performance, when there is a need for consecutive format processing in text node changes, such as the styling implementation of inline-code
, there may be issues with components not re-rendering. Specifically, having multiple consecutive code
nodes where the last node has a length of 1
, deleting this last node would cause the previous node to not refresh its style.
The reason for this issue is that our className
is dynamically calculated when rendering the leaf
node, as shown in the logic below. If the previous node does not exist or is not an inline-code
, the inline-code-start
class attribute is added. Similarly, the inline-code-end
class attribute needs to be added to the last node.
This situation is similar to the problem of Dirty DOM
. Since the deleted node has a length of 1
, the LeafState
of the previous node remains unchanged, thus not triggering a re-render by React
. In this case, we need to correct it during line node rendering, which doesn't require synchronous execution like the aforementioned checks, as async execution through an effect would be sufficient.
Even though it seems like the issue has been resolved, there are still some issues in React
due to the fact that the current DOM
handling is uncontrolled. For instance, in the following example, when React
handles the style
attribute, it only updates the properties that have changed. Even if the overall object is new, if the specific value remains the same as in the previous render, React
won't reset that style attribute.
Therefore, in the didPaintLineState
method mentioned above, we mainly add class attributes using classList
. Even if LeafState
changes, React
won't reset the class attributes. Hence, we also need to remove unnecessary class attributes when there are changes in didPaintLineState
.
Previously, we implemented a semi-controlled input mode, which is currently the mainstream implementation in most rich text editors. Here, we focus on the issues caused by the default behavior of browser ContentEditable
input mode, and we correct these issues through dirty DOM
checking to maintain the strict DOM
structure of the editor.
Currently, our main focus is on the text input issue in the editor, i.e., how to write the keyboard input content into the editor data model. The next step is to focus on the controlled handling of structural changes in the input mode, such as enter, delete, drag and drop operations, which are also based on input-related events and often involve structural changes in the text, constituting supplements to the input mode.