Semi-Controlled Input Mode Based on Composition Events

Previously, we implemented bidirectional synchronization between the editor selection and the model selection to achieve controlled selection operations, a fundamental capability of the editor. Next, building on the selection module, we need to implement a semi-controlled input mode based on the browser's composition events, handle the complex default behaviors of the DOM structure, and stay compatible with the various input scenarios of IMEs.

Editor Input Mode

The Input module handles input in the editor, which is one of its core operations. We need to deal with input from the keyboard, the mouse, and input method editors (IMEs). Interaction with IMEs requires extensive compatibility handling for candidate words, predictive text, shortcuts, accents, and so on. Mobile IMEs are even more troublesome; draft.js, for example, documents the compatibility issues of mobile IMEs separately.

Like the selection module, the input module has to work on top of the browser's DOM and intervene in its default behavior. Once an IME is activated, additional modules must cooperate as well, which calls for complex compatibility handling. Input modes fall into three types: uncontrolled input, semi-controlled input, and controlled input, each with its own use cases and implementation methods.

Uncontrolled Input

The uncontrolled method relies entirely on the browser's default behavior to handle input operations without intervention or modification. Changes in the DOM structure need to be monitored and applied to the editor after collection. While this method maximizes the use of native browser capabilities like selections and cursors, its main drawback is the lack of control over input, inability to prevent default behaviors, and instability.

For example, a well-known issue is that ContentEditable cannot completely block IME input, so Chinese input cannot be fully controlled. In the following example, typing English letters and digits has no effect, yet Chinese characters can still be entered through the IME. This limitation is one reason why many editors opt for custom selection rendering and controlled input, as seen in applications like VSCode and DingTalk.

<div contenteditable id="$1"></div>
<script>
  const stop = (e) => {
    e.preventDefault();
    e.stopPropagation();
  };
  $1.addEventListener("beforeinput", stop);
  $1.addEventListener("input", stop);
  $1.addEventListener("keydown", stop);
  $1.addEventListener("keypress", stop);
  $1.addEventListener("keyup", stop);
  $1.addEventListener("compositionstart", stop);
  $1.addEventListener("compositionupdate", stop);
  $1.addEventListener("compositionend", stop);
</script>

When using the uncontrolled method for input, we utilize MutationObserver to identify the current input characters, then parse the DOM structure to extract the latest Text Model. Subsequently, a diff operation is performed against the original Text Model to determine the changes and generate ops, which can then be applied to the current Model for further processing.

Even within uncontrolled input, there are various implementation approaches. For instance, one can perform text diff based on behavior after triggering Input events to derive ops and combine attributes based on a schema. Alternatively, one can rely entirely on MutationObserver to capture fragment-level changes in nodes, followed by a diff operation. The famous Quill editor employs this approach.

Handling inputs in Quill is not overly complex, despite involving significant event communication and special case handling. The core logic remains relatively clear. However, a potential challenge is that the view layer encapsulated by parchment is not in the core package. Although some methods are inherited and overridden, debugging can still be cumbersome, especially as elements like Text are directly exported from Quill.

Overall, Quill's uncontrolled input takes one of two paths. For regular ASCII input, it compares the MutationRecord's oldValue directly with the latest text (newText) to obtain the changed ops. IME input such as Chinese, however, may produce multiple mutations, in which case Quill falls back to a full delta comparison to identify the changes.

// https://github.com/slab/quill/blob/07b68c9/packages/quill/src/core/editor.ts#L273
const oldDelta = this.delta;
if (
  mutations.length === 1 &&
  mutations[0].type === 'characterData' &&
  mutations[0].target.data.match(ASCII) 
) {
  const textBlot = this.scroll.find(mutations[0].target) as Blot;
  const index = textBlot.offset(this.scroll);
  const oldValue = mutations[0].oldValue.replace(CursorBlot.CONTENTS, '');
  const oldText = new Delta().insert(oldValue);
  const newText = new Delta().insert(textBlot.value());
  const diffDelta = new Delta()
    .retain(index)
    .concat(oldText.diff(newText, relativeSelectionInfo));
} else {
  this.delta = this.getDelta();
  if (!change || !isEqual(oldDelta.compose(change), this.delta)) {
    change = oldDelta.diff(this.delta, selectionInfo);
  }
}

The key question here is why textBlot can provide the latest value: whether the change is derived from the MutationRecord or from getDelta, the text is read through textBlot.value(). getDelta iterates over all Blots to collect their latest values, and this part is cached per line to avoid performance issues.

// https://github.com/slab/quill/blob/07b68c9/packages/quill/src/core/editor.ts#L162
this.scroll.lines().reduce((delta, line) => {
  return delta.concat(line.delta());
}, new Delta());

// https://github.com/slab/quill/blob/07b68c9/packages/quill/src/blots/block.ts#L183
function blockDelta(blot: BlockBlot, filter = true) {
  return blot
    .descendants(LeafBlot)
    .reduce((delta, leaf) => {
      if (leaf.length() === 0) {
        return delta;
      }
      return delta.insert(leaf.value(), bubbleFormats(leaf, {}, filter));
    }, new Delta())
    .insert('\n', bubbleFormats(blot));
}

TextBlot is implemented in parchment, which makes debugging harder. Here we only need to follow how plain text content is brought up to date. Blot has an update method that is triggered when the DOM changes. Note that the updated text is read from the DOM node through the static value method, not from the instance's own value.

// https://github.com/slab/parchment/blob/3d0b71c/src/blot/text.ts#L80
public update(mutations: MutationRecord[], _context: { [key: string]: any }): void {
  if (
    mutations.some((mutation) => {
      return (mutation.type === 'characterData' && mutation.target === this.domNode);
    })
  ) {
    this.text = this.statics.value(this.domNode);
  }
}
public static value(domNode: Text): string {
  return domNode.data;
}

The timing of updates also matters: the Blot content must be updated first so that the latest text is available, and only then is the scroll update scheduled to update the editor model. We focus on input changes here, but formatting can also change the DOM structure, and those MutationRecords are handled by the optimize method.

// https://github.com/slab/parchment/blob/3d0b71c/src/blot/scroll.ts#L205
// handleCompositionEnd - batchEnd - scrollUpdate - blotUpdate - editorUpdate
mutations
.map((mutation: MutationRecord) => {
  const blot = this.find(mutation.target, true);
  // ...
})
.forEach((blot: Blot | null) => {
  if (blot != null && blot !== this && mutationsMap.has(blot.domNode)) {
    blot.update(mutationsMap.get(blot.domNode) || [], context);
  }
});

The diff method also takes a cursor parameter, which is an interesting detail. Consider text changing from xxx to xx: there are many possible explanations. A character could have been deleted at any position, deleted just before the cursor, or two x could even have been deleted and one x inserted.

Therefore, to identify the changed ops precisely, the cursor position is passed into the diff method. The string can then be divided into three segments: an identical prefix, an identical suffix, and a middle part that holds the difference. Because text input is a high-frequency operation, this shortcut skips the full diff algorithm and improves performance.

// https://github.com/jhchen/fast-diff/blob/da83236/diff.js#L1039
var newBefore = newText.slice(0, newCursor);
var newAfter = newText.slice(newCursor);
var prefixLength = Math.min(oldCursor, newCursor);
var oldPrefix = oldBefore.slice(0, prefixLength);
var newPrefix = newBefore.slice(0, prefixLength);
var oldMiddle = oldBefore.slice(prefixLength);
var newMiddle = newBefore.slice(prefixLength);
return remove_empty_tuples([
  [DIFF_EQUAL, before],
  [DIFF_DELETE, oldMiddle],
  [DIFF_INSERT, newMiddle],
  [DIFF_EQUAL, after],
]);
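
The three-segment idea can be sketched in isolation as follows. This is a minimal illustration rather than fast-diff's actual code: the `DiffOp` type and the `diffAtCursor` name are hypothetical, and it only covers the case where the edit happened immediately before the caret, returning `null` so that a caller can fall back to a full diff.

```ts
type DiffOp = { retain: number } | { delete: number } | { insert: string };

// Try to explain oldText -> newText as a single edit right before the caret.
// oldCursor is the caret offset in oldText before the edit (an assumed input).
function diffAtCursor(oldText: string, newText: string, oldCursor: number): DiffOp[] | null {
  // The caret moves by exactly the length difference if the edit was before it.
  const newCursor = oldCursor + newText.length - oldText.length;
  if (newCursor < 0 || newCursor > newText.length) return null;

  const oldBefore = oldText.slice(0, oldCursor);
  const oldAfter = oldText.slice(oldCursor);
  const newBefore = newText.slice(0, newCursor);
  const newAfter = newText.slice(newCursor);
  // Suffix segment: everything after the caret must be untouched.
  if (oldAfter !== newAfter) return null;

  // Prefix segment: the shared part before the shorter caret position.
  const prefixLength = Math.min(oldCursor, newCursor);
  if (oldBefore.slice(0, prefixLength) !== newBefore.slice(0, prefixLength)) return null;

  // Middle segment: the only part that differs.
  const oldMiddle = oldBefore.slice(prefixLength);
  const newMiddle = newBefore.slice(prefixLength);

  const ops: DiffOp[] = [];
  if (prefixLength > 0) ops.push({ retain: prefixLength });
  if (oldMiddle.length > 0) ops.push({ delete: oldMiddle.length });
  if (newMiddle.length > 0) ops.push({ insert: newMiddle });
  return ops;
}
```

With the old caret after the second character, the change from `xxx` to `xx` resolves to "retain 1, delete 1" without running a general diff, while an edit that does not fit the pattern yields `null`.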

Semi-Controlled Input

The semi-controlled method handles English input, content deletion, and similar operations through BeforeInputEvent, handles IME input through CompositionEvent, and uses additional KeyDown and Input events to assist. This approach lets us intercept user input and construct the changes ourselves, which are then applied to the current content model.

Of course, for events like CompositionEvent, extra handling is needed because, as mentioned earlier, IME input cannot be fully controlled, hence semi-controlled is the mainstream implementation method. Due to browser compatibility, compatibility processing for BeforeInputEvent is usually needed, such as leveraging React's synthetic events or onKeyDown to achieve the necessary compatibility.

The Slate editor implements its input mode in a semi-controlled manner, relying mainly on the beforeinput event and the composition events to handle input and deletion. When Slate was first implemented, the beforeinput event was not widely supported; today it is available in most modern browsers, and the composition events have long been widely supported.

First, the controlled part. Control here specifically means preventing the default input behavior so that we can update the editor model ourselves based on the relevant events. In this scenario we mainly care about the insert-related inputType values; even so, there are many cases to handle, and Slate contains extensive compatibility logic for browser implementation quirks.

// https://github.com/ianstormtaylor/slate/blob/ef76eb4/packages/slate-react/src/components/editable.tsx#L550
As the code linked above shows, `inputType` covers many different operation types that need to be handled. Apart from input and deletion, there are also operation types like formatting and history. For now, however, we mainly focus on input and deletion. Here are some common `inputType` values that might need to be handled:

- `insertText`: Insert text, usually done through keyboard input.
- `insertReplacementText`: Replace the text of the current selection or word, for example, through spell checking or autocomplete.
- `insertLineBreak`: Insert a line break, usually done by pressing the Enter key.
- `insertParagraph`: Insert a paragraph separator, often found in a `ContentEditable` element by pressing Enter.
- `insertFromDrop`: Insert content by dragging and dropping.
- `insertFromPaste`: Insert content by pasting.
- `insertTranspose`: Swap the positions of two characters, commonly seen in macOS with `Ctrl+T`.
- `insertCompositionText`: Insert composition text from an input method editor (IME).
- `deleteWordBackward`: Delete a word backward, for example, with `Option+Backspace`.
- `deleteWordForward`: Delete a word forward, for example, with `Option+Delete`.
- `deleteSoftLineBackward`: Delete a line backward when using auto line break.
- `deleteSoftLineForward`: Delete a line forward when using auto line break.
- `deleteEntireSoftLine`: Delete the entire soft line where the cursor is.
- `deleteHardLineBackward`: Delete a line backward when using hard line breaks.
- `deleteHardLineForward`: Delete a line forward when using hard line breaks.
- `deleteByDrag`: Delete content by dragging.
- `deleteByCut`: Delete content by cutting.
- `deleteContent`: Delete the selected content without specifying a direction.
- `deleteContentForward`: Delete content forward, meaning the `Delete` key.
- `deleteContentBackward`: Delete content backward, meaning the `Backspace` key.
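
One possible way to triage these values before dispatching is a small routing function. The sketch below is an assumption about how an editor might group them, not code from Slate or Quill; `routeInputType` and the `InputRoute` labels are hypothetical names.

```ts
type InputRoute = "composition" | "paste" | "insert" | "delete" | "ignore";

// Group an InputEvent.inputType string into a coarse handling route.
function routeInputType(inputType: string): InputRoute {
  // IME text is left to the composition events.
  if (inputType === "insertCompositionText") return "composition";
  // Pasted content is left to the paste event.
  if (inputType === "insertFromPaste") return "paste";
  if (inputType.startsWith("insert")) return "insert";
  if (inputType.startsWith("delete")) return "delete";
  // Formatting and history types (formatBold, historyUndo, ...) are out of scope here.
  return "ignore";
}
```

A `beforeinput` handler could call `routeInputType(event.inputType)` first and only then decide which model operation to perform.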

In reality, it's challenging to address all these events, especially since soft line break-related content is not commonly used in browser-based editors, so we can treat it as a hard line break. Both Quill and Slate handle it as a hard line break, whereas TinyMCE and TipTap implement soft line breaks, where `Shift+Enter` inserts `<br>` instead of creating a new paragraph.

It's important to pay attention to the related information transmitted in the events. For instance, `deleteWord` involves deleting content at the word level, and this data range is obtained through `getTargetRanges` as an array of `StaticRange` to pass. Additionally, `insertCompositionText` and `insertFromPaste` can be handled in the `Composition` event and `Paste` event, respectively.

```js
// [StaticRange]
[{
  collapsed: false,
  endContainer: text,
  endOffset: 4,
  startContainer: text,
  startOffset: 2
}]
```

Next, the uncontrolled part of Slate. Because IME input cannot be fully controlled, compatibility issues inevitably arise. This handling is fairly complex in Slate, and behavior differs across browsers; Safari, for example, has an insertFromComposition input type, and the editor model needs to be corrected in that scenario.

Apart from the inability to prevent default behaviors, the uncontrolled aspect is also evident in modifications to the DOM structure. This part can be considered the most challenging to handle because once the IME is activated, it inevitably modifies the DOM. This means that this portion of the DOM is in an unknown state, and any unforeseen changes in the DOM content could disrupt the synchronization of the editor model, requiring additional compatibility efforts.

```js
// https://github.com/ianstormtaylor/slate/blob/ef76eb4/packages/slate-react/src/components/editable.tsx#L1299
// COMPAT: In Chrome, `beforeinput` events for compositions
// aren't correct and never fire the "insertFromComposition"
// type that we need. So instead, insert whenever a composition
// ends since it will already have been committed to the DOM.
if (
  !IS_WEBKIT &&
  !IS_FIREFOX_LEGACY &&
  !IS_IOS &&
  !IS_WECHATBROWSER &&
  !IS_UC_MOBILE &&
  event.data
) {
  Editor.insertText(editor, event.data)
}
```

Controlled Input

A fully controlled approach records the characters as any content is entered, removes the original content when the input ends, and then constructs a new Model. Full control usually requires a hidden input box, or even an iframe. Since a browser page can hold only one focus at a time, this method also requires implementing a custom selection.

There are many details to handle, such as drawing content during a CompositionEvent without triggering collaboration. The input experience also has to stay consistent with the browser. For example, the pinyin prompt shown while the IME is active is not merely a display: pressing the left and right keys changes the candidate selection, and this too has to be simulated in fully controlled mode.

Editors implemented in controlled mode can be grouped into three categories by how much they rely on browser APIs, from most to least, which also reflects different levels of implementation difficulty: those that rely on iframe focus magic together with Editable, those that do not rely on Editable but use the DOM to implement a custom selection, and those drawn entirely on Canvas.

Typical implementations exist for all three: TextBus relies on iframe magic, products like DingTalk Docs and Zoom Docs rely on custom selection, and products like Tencent Docs and Google Docs are drawn entirely on Canvas. Open-source editors with a controlled input mode are relatively rare, because the mode is complex to implement and requires a lot of compatibility handling.

Let's look at the three types in turn, starting with iframe magic, which inevitably involves browser focus. In a browser, selecting text places the focus on the element containing that text, and clicking on another input field moves the focus there, which can be observed through document.activeElement.

<div tabindex="-1">After selecting text, click on the input to observe focus shift</div>
<input />
<script>
  document.onselectionchange = () => {
    console.log("Focused Element", document.activeElement);
  };
</script>

There are certain specifications regarding what elements can gain focus, such as editable elements, the tabindex attribute, a tags, and so on, which we will not delve too deeply into. The problem here is that if we place a separate input to receive input instead of directly relying on Editable, there will be an issue with the browser's selection shifting, making it impossible to select text.

Therefore, when an additional input is used to receive input, we typically have to draw the selection highlight ourselves (the blue highlight seen when dragging over text). With iframes, however, browsers do not strictly maintain a single selection, and this is the magic we refer to, as in the rather unique implementation of TextBus mentioned earlier.

TextBus does not use common implementation solutions like ContentEditable or custom selection as seen in CodeMirror or Monaco. From examining the DOM nodes of the Playground, it maintains a hidden iframe for implementation. Within this iframe, there is a textarea used to process IME input.

Consider a simple example in which an iframe and the page's text selection compete for focus: while the iframe keeps grabbing focus, text cannot be selected by dragging. Note that we cannot call focus directly in the onblur handler, since the browser blocks this; it has to be triggered asynchronously in a macro task.

```html
<span>123123</span>
<iframe id="$1"></iframe>
<script>
  // ...
</script>
```

When focusing, we call `$1.focus` directly. If we were to call `win.focus` at this point, we would notice that text selection is draggable. This behavior reveals that text selections inside and outside of iframes are completely independent. If the focus is within the same frame, they compete for focus; if not, the selections behave normally, highlighting the difference between `$1` and `win`.

Additionally, notice that the text selection is gray at this point. This can be styled using the `::selection` pseudo-element, and all events can be triggered normally, such as the `SelectionChange` event and manually setting selections. Placing a `textarea` directly inside the `iframe` allows for normal text input without disrupting IME input.

```html
<span>123123</span>
<iframe id="$1"></iframe>
<script>
  const win = $1.contentWindow;
  const textarea = document.createElement("textarea");
  $1.contentDocument.body.appendChild(textarea);
  textarea.focus();
  textarea.addEventListener("blur", () => {
    setTimeout(() => textarea.focus(), 0);
  });
  win.addEventListener("blur", () => console.log("blur"));
  win.addEventListener("focus", () => console.log("focus"));
  win.focus();
</script>
```

The key point is that this "magic" behavior can be triggered in various desktop browsers. Mobile browsers may behave differently because their key input events are not standardized consistently, as noted in the draft.js documentation, which lists mobile devices under "Not Fully Supported".

As for fully implementing custom text selections in editors, I have not come across any open-source implementations yet. The complexity lies in simulating the entire browser's interaction behaviors. Browsers handle many intricate details of text selection interactions, such as extending selections when dragging even off text areas, and the threshold for selecting characters in the middle of dragged text.

I have not looked closely at rich-text editors in this respect, but code editors like CodeMirror and VSCode (Monaco) implement custom text selections, as do commercial online document products like DingTalk Docs, Zoom Docs, and Youdao Cloud Notes. Since the selection DOM typically does not respond to mouse events, it can be located directly in the DOM for debugging.

document.querySelectorAll(`[style*="pointer-events: none;"]`);
[...document.querySelectorAll("*")].filter(node => node.style.pointerEvents === "none");

For implementations like DingTalk Docs, which wrap the editor in a web component, locating the nodes takes a bit more effort. In addition, as mentioned previously, a custom text selection can use the caretRangeFromPoint and caretPositionFromPoint APIs to calculate selection positions; refer to the article on the core interaction strategies of the browser selection model.

Lastly, editors entirely drawn using Canvas present a different challenge, requiring manual drawing for both text and selections. Since browsers provide only basic APIs for Canvas, it acts as a blank canvas, necessitating self-implemented features and event handling, making it quite laborious.

The typical implementations are demonstrated in Google Docs and Tencent Docs, both commercial document editors fully based on Canvas drawing. Google Docs, as the pioneer of Canvas implementation, has articles comparing its old and new versions, particularly noting updates in editing interfaces and layout engines; a link can be found in the reference section at the end.

Interestingly, compared to editors relying on controlled DOM inputs, open-source implementations of rich-text editors based on Canvas drawing do exist, such as canvas-editor, a well-developed open-source rich-text editor utilizing Canvas drawing. However, unless there is a clear need for features like word processing, pagination, layout, and printing, relying solely on Canvas for implementation remains a costly endeavor.

The high implementation cost here mainly comes from two aspects. Firstly, the layout engine for editor's WYSIWYG is a self-implemented requirement. For instance, in `Word`, when we write text and it perfectly fills a line, adding a period would simply compress the existing text without line break. However, if we add one more character, it will result in a line break.

```html
<!-- word -->
Text text text text text text text text text text text text text text text text text text text text.
<!-- Browser -->
Text text text text text text text text text text text text text text text text text text text text text
.
```

As demonstrated above, the behavior in a browser is different. Thus, to surpass the browser's formatting limitations, one has to develop their own layout capabilities. Rendering the position of each character, deciding line breaks, and other layout strategies all need to be implemented manually, leading to a myriad of boundary conditions to consider. In my previous work with an editor based on Canvas, a significant amount of time was dedicated to the rich text drawing and layout part.

Secondly, there is the implementation of selection rendering. As we discussed earlier, the caretRangeFromPoint and caretPositionFromPoint APIs are used to calculate the selection position, provided by the browser's selection calculation capabilities. When working with text drawn on Canvas, devoid of the DOM, all calculations for such functionalities need to be done manually. However, details like character width are stored, making the task less complex.
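
As a rough illustration of why stored character widths help, mapping a pointer x-coordinate to a caret offset on a single line reduces to a walk over those widths. This is a sketch under simplifying assumptions (one line, left-to-right text, widths already measured, for example with the Canvas `measureText` API); `caretOffsetFromX` is a hypothetical helper, not part of any editor mentioned here.

```ts
// Map an x-coordinate (relative to the line's left edge) to a caret offset.
// charWidths[i] is the rendered width of the i-th character on the line.
function caretOffsetFromX(charWidths: number[], x: number): number {
  let left = 0;
  for (let i = 0; i < charWidths.length; i++) {
    const width = charWidths[i];
    // Snap to the nearer edge: the left half of a character places the
    // caret before it, the right half places the caret after it.
    if (x < left + width / 2) return i;
    left += width;
  }
  // Past the last character: caret goes to the end of the line.
  return charWidths.length;
}
```

The midpoint threshold mirrors how native selection behaves when a drag passes through the middle of a character; a real implementation would also need to resolve the line from the y-coordinate first.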

While Canvas can break free from the browser's formatting constraints, eliminating the performance issues stemming from DOM complexities, the inherent complexity of using Canvas itself poses a challenge. Moreover, abandoning DOM essentially means forsaking the current relevant ecosystem, including aspects like SEO and accessibility, which cannot be directly utilized. Hence, unless there is an absolute necessity, one should approach this transition with caution.

Coming back to the input part, given that direct interaction with IME is not possible in the browser, one is limited to handling events triggered by the browser, essentially relying on the input function for user input interaction, which corresponds with the self-implemented selection rendering mode using DOM. Implementing rich text using Canvas is akin to building a browser's layout engine, demanding considerable effort.

Additionally, concerning collaborative input, we mentioned earlier that word selection changes when pressing left or right keys, which should not affect other clients in a collaborative setting. Nevertheless, as the content length changes, collaboration cannot simply filter out the local attribute. Fully mimicking browser behavior for formatting and interacting with the DOM requires accounting for these details, without entirely decoupling from the existing rendering framework.

Hence, this collaborative input aspect needs extra handling. The straightforward approach involves temporarily suspending collaborative processing, merging the intended final states, and then collaboratively sharing the consolidated state. Another method is extending the Z on the AXY scheduling model to implement a local queue. As the queue content is locally applied, methods for moving op within the queue before and after are necessary. We can dive deeper into this temporary collaboration aspect later.
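
The first approach could look roughly like the sketch below: while a composition is active, local changes are held back, and only when it ends is the batch handed to the collaboration layer. `CompositionGate` and its `send` callback are hypothetical names, and composing the buffered changes into a single change is left to the caller since it depends on the op format.

```ts
class CompositionGate<T> {
  private composing = false;
  private pending: T[] = [];
  private readonly send: (ops: T[]) => void;

  constructor(send: (ops: T[]) => void) {
    this.send = send;
  }

  // Call on compositionstart: stop forwarding changes to collaborators.
  start(): void {
    this.composing = true;
  }

  // Record a local change; it is forwarded immediately unless composing.
  push(op: T): void {
    if (this.composing) {
      this.pending.push(op);
    } else {
      this.send([op]);
    }
  }

  // Call on compositionend: flush everything buffered as one batch.
  end(): void {
    this.composing = false;
    if (this.pending.length > 0) {
      const batch = this.pending;
      this.pending = [];
      this.send(batch);
    }
  }
}
```

Other clients then never observe the intermediate pinyin states, only the batch produced once the composition has been committed.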

Semi-controlled Input Implementation

Based on the aforementioned input mode overview, our focus now shifts to the implementation of semi-controlled input mode. This mode stands as the predominant approach for most rich text editors today. Generally, in the semi-controlled mode, while ensuring a streamlined user input experience, it offers a relatively good degree of control and flexibility. Drawing from our prior discussions on input design and abstraction, we can relatively straightforwardly design the entire process:

  • Mapping selections to our internally maintained Range Model, involving transforming selections mapped from DOMRange to the Model. This step requires substantial lookup and iteration, supplemented by using the WeakMap object discussed earlier to find the Model for position calculations.
  • Inputting through the keyboard, utilizing the browser's BeforeInputEvent and CompositionEvent to respectively handle input/deletion and IME input. Constructing Delta Change based on input to apply to the state structure and trigger ContentChange, thereby prompting a view layer update.
  • Subsequent to the view layer update, refreshing the model selections based on the browser's DOM and our maintained Model becomes imperative. This selection change interactivity necessitates simulating browser behavior, transferring the Model mapped selections to DOMRange selections and then applying them to the browser's Selection object, involving multiple boundary conditions.

At one point, I considered drawing selections and cursors myself. Controlling input through Editable is quite challenging, especially with IMEs, which can easily corrupt the current DOM structure, so dirty checks and forced refreshes of the DOM structure may be needed. After a brief investigation, however, self-drawing turned out to have plenty of challenges of its own, so I stayed with the widely used Editable.

Even so, Editable poses numerous challenges, with a myriad of details that are hard to cover comprehensively, such as detecting when the DOM has been damaged and needs a forced refresh. As more edge cases are handled, the code grows more complex, which can lead to performance issues, particularly with large documents.

Regarding performance, beyond the WeakMap optimization mentioned above, several areas merit further work. Because of the Delta data structure, we have to convert back and forth between Range and RawRange selections. Since each LineState records its start and size, the lines are ordered by start and can be located with a binary search. Furthermore, since we know precisely what content changes with each update, the original state objects can be reused through computation instead of being rebuilt on every update, which keeps updates immutable and reduces the maintenance burden.
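
The binary search mentioned above might look like the following sketch. It assumes each line exposes a `start` offset and a `size`, as described, and that the lines are contiguous and sorted by `start`; the `LineSpan` interface and `findLineIndex` are hypothetical stand-ins for the editor's actual `LineState` API.

```ts
interface LineSpan {
  start: number; // offset of the line's first character in the document
  size: number; // number of characters in the line, including its "\n"
}

// Return the index of the line containing `offset`, or -1 if out of range.
function findLineIndex(lines: LineSpan[], offset: number): number {
  let low = 0;
  let high = lines.length - 1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    const line = lines[mid];
    if (offset < line.start) {
      high = mid - 1;
    } else if (offset >= line.start + line.size) {
      low = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

Converting a raw offset into a line-relative point then costs O(log n) in the number of lines, since the remaining offset inside the line is simply `offset - lines[index].start`.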

One more thing, a flattened data structure would be more suitable for large documents. Flattening implies simplicity, like the current Delta, which is a flattened data structure. However, random access efficiency might be slightly slower. Perhaps when performance issues arise, it might be necessary to consider incorporating some data storage solutions such as PieceTable, although that seems a bit far off for now.

Controlled Input Mode

In this context, controlled input mode refers to the parts that do not require invoking the IME input method editor, typically indicating English input, numerical input, and so on. Building upon the above, our implementation here can become more straightforward. You simply need to prevent all default behaviors and then handle the original behaviors in a controlled manner. Taking text insertion insertText and deletion deleteContentBackward as examples, we can implement input and deletion of content.

// packages/core/src/input/index.ts
onBeforeInput(event: InputEvent) {
  event.preventDefault();
  const { inputType, data = "" } = event;
  const sel = this.editor.selection.get();
  switch (inputType) {
    case "deleteContent":
    case "deleteContentBackward": {
      this.editor.perform.deleteBackward(sel);
      break;
    }
    case "insertText": {
      data && this.editor.perform.insertText(sel, data);
      break;
    }
  }
}

The specific changes are encapsulated in the perform class. When inserting text, first get the leaf node at the current selection. If that node is a void node, the input should be ignored. Then take the pending marks for a collapsed selection, or the marks of the leaf (at its tail where applicable) for a non-collapsed selection, and finally construct the delta and apply it to the editor.

// packages/core/src/perform/index.ts
const raw = RawRange.fromRange(this.editor, sel);
const point = sel.start;
const leaf = this.editor.lookup.getLeafAtPoint(point);
// Cannot insert text when the current node is void
if (leaf && leaf.void) return void 0;
let attributes: AttributeMap | undefined = this.editor.lookup.marks;
if (!sel.isCollapsed) {
  // For non-collapsed selections, determine leaf marks based on the start position
  const isLeafTail = isLeafOffsetTail(leaf, point);
  attributes = this.editor.lookup.getLeafMarks(leaf, isLeafTail);
}
const delta = new Delta().retain(raw.start).delete(raw.len).insert(text, attributes);
this.editor.state.apply(delta, { range: raw });
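
To make the shape of that change concrete, here is a standalone sketch of the same retain/delete/insert sequence applied to a plain string. It deliberately avoids the editor's `Delta` and state classes; `buildReplaceOps` and `applyToText` are hypothetical helpers, and attributes are carried on the op but ignored when applying to text.

```ts
type Attrs = Record<string, string>;
type TextOp =
  | { retain: number }
  | { delete: number }
  | { insert: string; attributes?: Attrs };

// Replace `len` characters at `start` with `text`: retain, delete, insert.
function buildReplaceOps(start: number, len: number, text: string, attributes?: Attrs): TextOp[] {
  const ops: TextOp[] = [];
  if (start > 0) ops.push({ retain: start });
  if (len > 0) ops.push({ delete: len });
  if (text) ops.push(attributes ? { insert: text, attributes } : { insert: text });
  return ops;
}

// Apply the ops to a plain string to check the result.
function applyToText(source: string, ops: TextOp[]): string {
  let index = 0;
  let result = "";
  for (const op of ops) {
    if ("retain" in op) {
      result += source.slice(index, index + op.retain);
      index += op.retain;
    } else if ("delete" in op) {
      index += op.delete;
    } else {
      result += op.insert;
    }
  }
  // Anything after the last op is kept unchanged.
  return result + source.slice(index);
}
```

A collapsed selection is simply the `len === 0` case, so typing and replacing a selected range share one code path.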

Dealing with content deletion becomes more complex, as we need to consider the state of line breaks during deletion. The main issue is that our line attributes are placed on the line breaks, which can be counterintuitive. The way EtherPad manages line formatting by placing it at the beginning of lines leads to many rendering-related behaviors, as various line formats like lists, quotes, etc., are rendered at the start of lines.

The interaction strategy here can become quite intricate. For instance, if the previous line is a heading, the current line is a quote, and the cursor sits at the beginning of the current line, deleting backward removes the line break that carries the heading format, so the merged line keeps the quote format instead. This behavior matches the Quill editor and follows directly from the data structure. To get a more intuitive result, either the data structure or the content changes have to be adjusted.
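
The following sketch illustrates why the heading disappears. It assumes a simplified document in which every line break is its own op and carries the attributes of the line it terminates; `LineOp`, `deleteAt`, and `toLines` are hypothetical helpers, not part of the editor.

```typescript
interface LineOp {
  insert: string;
  attributes?: Record<string, string>;
}

interface Line {
  text: string;
  attributes: Record<string, string>;
}

// A heading line followed by a quote line; line formats live on the "\n" ops.
const docOps: LineOp[] = [
  { insert: "Title" },
  { insert: "\n", attributes: { heading: "h1" } },
  { insert: "Quote" },
  { insert: "\n", attributes: { quote: "true" } },
];

// Removes the single character at `index`, dropping an op that becomes empty.
function deleteAt(ops: LineOp[], index: number): LineOp[] {
  const result: LineOp[] = [];
  let pos = 0;
  for (const op of ops) {
    const end = pos + op.insert.length;
    if (index >= pos && index < end) {
      const rest = op.insert.slice(0, index - pos) + op.insert.slice(index - pos + 1);
      if (rest) result.push({ ...op, insert: rest });
    } else {
      result.push(op);
    }
    pos = end;
  }
  return result;
}

// Groups text ops into lines, reading each line's format from its "\n" op.
function toLines(ops: LineOp[]): Line[] {
  const lines: Line[] = [];
  let text = "";
  for (const op of ops) {
    if (op.insert === "\n") {
      lines.push({ text, attributes: op.attributes ?? {} });
      text = "";
    } else {
      text += op.insert;
    }
  }
  return lines;
}

// Backspace at the start of "Quote" deletes index 5, the heading's line break.
const mergedLines = toLines(deleteAt(docOps, 5));
console.log(mergedLines); // one line "TitleQuote" that carries only the quote format
```

Because the deleted character is the only place the heading format was stored, the merged line inherits whatever the surviving line break carries.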

Modifying the data structure directly would entail complex compatibility work: basic normalization would have to guarantee that there is no continuous text before \n tokens, block structures would need to be considered, and the presence of attributes at the start of lines would have to be evaluated. Therefore, we keep the document's data structure as it is wherever possible and solve the problem by adjusting the content changes instead.

// packages/core/src/perform/index.ts
// When the caret is at the start of the current line and the previous line is a block node, deleting moves the caret onto that node
if (prevLine && isBlockLine(prevLine)) {
  // Special handling when the current line is empty, delete the line first
  if (isEmptyLine(line)) {
    const delta = new Delta().retain(line.start).delete(1);
    this.editor.state.apply(delta, { autoCaret: false });
  }
  const firstLeaf = prevLine.getFirstLeaf();
  const range = firstLeaf && firstLeaf.toRange();
  range && this.editor.selection.set(range, true);
  return void 0;
}
const attrsLength = Object.keys(line.attributes).length;
// If the caret is at the start of the current line and the line has no attributes of its own, carry the previous line's attributes over to the current line's line break
if (prevLine && !attrsLength) {
  const prevAttrs = { ...prevLine.attributes };
  const delta = new Delta()
    .retain(line.start - 1)
    .delete(1)
    .retain(line.length - 1)
    .retain(1, prevAttrs);
  this.editor.state.apply(delta);
  return void 0;
}
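
The second branch can be traced on a small example. The sketch below uses a hypothetical character-level model (`Char`, `AttrOp`, `applyAttrOps`) rather than the editor's Delta and state classes, and only mirrors the shape of the four operations built above.

```typescript
type Attrs = Record<string, string>;

interface Char {
  ch: string;
  attrs: Attrs;
}

type AttrOp = { retain: number; attributes?: Attrs } | { delete: number };

// Applies retain/delete ops to a character list; retain may merge attributes in.
function applyAttrOps(doc: Char[], ops: AttrOp[]): Char[] {
  const out: Char[] = [];
  let index = 0;
  for (const op of ops) {
    if ("delete" in op) {
      index += op.delete;
      continue;
    }
    for (const c of doc.slice(index, index + op.retain)) {
      out.push(op.attributes ? { ch: c.ch, attrs: { ...c.attrs, ...op.attributes } } : c);
    }
    index += op.retain;
  }
  return out.concat(doc.slice(index));
}

const plain = (s: string): Char[] => s.split("").map((ch) => ({ ch, attrs: {} }));

// "Title\n" is a heading line; "Text\n" is a plain line starting at offset 6.
const beforeMerge: Char[] = plain("Title")
  .concat([{ ch: "\n", attrs: { heading: "h1" } }])
  .concat(plain("Text"))
  .concat([{ ch: "\n", attrs: {} }]);

const lineStart = 6;
const lineLength = 5; // "Text" plus its line break
const prevAttrs: Attrs = { ...beforeMerge[lineStart - 1].attrs };

const afterMerge = applyAttrOps(beforeMerge, [
  { retain: lineStart - 1 }, // keep "Title"
  { delete: 1 }, // drop the heading's line break
  { retain: lineLength - 1 }, // keep "Text"
  { retain: 1, attributes: prevAttrs }, // move the heading onto the remaining "\n"
]);
```

After the merge the text reads "TitleText" on a single line whose line break still carries the heading format, which is the more intuitive outcome described above.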

Basic content deletion is relatively straightforward, as it only requires deleting content with a length of 1. In practice, content such as Emoji often has a length greater than 1, and Alt+Delete deletes a whole word at a time; we will address both cases later.

// packages/core/src/perform/index.ts
const raw = RawRange.fromRange(this.editor, sel);
const start = raw.start - len;
const delta = new Delta().retain(start).delete(len);
this.editor.state.apply(delta, { range: raw });
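
As a rough illustration of where a value such as len could come from, the sketch below returns 2 when the caret sits right after a UTF-16 surrogate pair and 1 otherwise. The function name backwardDeleteLength is hypothetical, and the approach is deliberately limited: it does not handle Emoji built from several code points (ZWJ sequences, skin-tone modifiers, flags), which need grapheme segmentation.

```typescript
// Number of UTF-16 code units to remove when deleting backward from `offset`.
function backwardDeleteLength(text: string, offset: number): number {
  if (offset <= 0) return 0;
  const low = text.charCodeAt(offset - 1);
  // A low surrogate preceded by a high surrogate forms one code point of length 2.
  if (offset >= 2 && low >= 0xdc00 && low <= 0xdfff) {
    const high = text.charCodeAt(offset - 2);
    if (high >= 0xd800 && high <= 0xdbff) return 2;
  }
  return 1;
}

console.log(backwardDeleteLength("ab", 2)); // 1
console.log(backwardDeleteLength("a😀", 3)); // 2
```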

The input part is less straightforward than it first appears. For example, after mapping the selection to our self-maintained Range Model, suppose an input operation starts from two spans with the DOM structure <span>DOM1</span><span>DO|M2</span>, where | denotes the cursor position.

If we insert the character x between DO and M2 in the second span, whether through a programmatic apply or user input, the apply triggers a ContentChange that refreshes the DOM node of the second span, so the second span is no longer the original span but a new object.

Because of this DOM change, the browser's cursor can no longer locate the original DOM2 span, and the cursor ends up at <span>DOM1|</span><span>DOxM2</span>. We might expect the selection to adjust by itself during input, but in practice it does not, because the DOM nodes are no longer the same.

What is missing, therefore, is updating the DOM Range from our Range Model as soon as the DOM structure is finalized. This must be done in useLayoutEffect rather than useEffect, similar to componentDidUpdate in class components, so that the DOM Range is updated proactively.

// packages/react/src/model/block.tsx
/**
 * Reset the selection on every view update (the effect has no dependency array)
 */
useLayoutEffect(() => {
  const selection = editor.selection.get();
  if (
    !editor.state.get(EDITOR_STATE.COMPOSING) &&
    editor.state.get(EDITOR_STATE.FOCUS) &&
    selection
  ) {
    // Update the browser selection
    editor.logger.debug("UpdateDOMSelection");
    editor.selection.updateDOMSelection(true);
  }
});
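
Before the DOM Range can be restored, the model offset has to be translated back into a text node and an offset inside it. How updateDOMSelection does this internally is not shown here; the sketch below is only one plausible way to do the translation, with a hypothetical locateLeaf helper that works on the text of each leaf.

```typescript
interface LeafPoint {
  leaf: number; // index of the leaf (span) that contains the caret
  offset: number; // caret offset inside that leaf's text
}

// Maps a line-level offset to a leaf index and an offset within that leaf.
// At a boundary between two leaves, the end of the earlier leaf is preferred.
function locateLeaf(leaves: string[], offset: number): LeafPoint | null {
  if (offset < 0) return null;
  let rest = offset;
  for (let i = 0; i < leaves.length; i++) {
    const len = leaves[i].length;
    if (rest <= len) return { leaf: i, offset: rest };
    rest -= len;
  }
  return null; // the offset lies beyond the end of the line
}

// After typing "x" at DO|M2 the model caret is at offset 7 of "DOM1" + "DOxM2".
console.log(locateLeaf(["DOM1", "DOxM2"], 7)); // { leaf: 1, offset: 3 }
```

The resulting pair is what a call such as Range.setStart on the freshly rendered text node would need, which is why the lookup has to run after the new DOM nodes exist.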

Uncontrolled Input Mode

Here, uncontrolled input mode refers to the input that has to activate the IME, usually Chinese input, Japanese input, and so on. Because this part is uncontrolled, it is prone to problems: the IME modifies the DOM directly in the browser, and we cannot prevent this behavior. We can therefore only make corrections after the DOM has changed, which is what is commonly called dirty DOM checking.

For example, suppose the current DOM structure is <s>DOM1</s><b>DOM2</b>, and we type Chinese characters at the end of these two elements, which activates the IME. When we enter the word "try out" without any additional style being applied (similar to the inline-code case), the DOM structure changes to <s>DOM1</s><b>DOM2 try out</b><s>try out</s>.

The text inside the <b> tag is obviously wrong. Our data structure in Delta is correct at this point, because our schema does not add any style to the new text. The problem is the mismatch: from our point of view the state and delta behind the <b> tag have not changed, yet its DOM has been changed by the IME.

When the Model we maintain is mapped to React's Fiber, the Model has not changed, so React concludes from the VDOM diff that nothing changed and reuses the existing DOM structure. In reality, that DOM structure has already been disrupted by the IME input, which we cannot control.

With English input we prevent the default behavior, so the original DOM structure remains unchanged. With IME input, we need to run a dirty check and correct any inconsistencies so that the final content is accurate. The current approach handles the most basic Text component: in the ref callback, we check whether the current content matches op.insert; if not, we remove all nodes except the first one and restore the content of the first node to the original text.
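
The check described above can be sketched as follows. To keep the example runnable outside a browser, it works against two small structural interfaces (NodeLike and ParentLike) that only mirror the DOM properties being used; these interfaces and the repairTextNode name are assumptions made for illustration, and in the editor the equivalent logic would run inside the Text component's ref callback.

```typescript
// Minimal shapes mirroring the DOM members used below (nodeValue, childNodes, ...).
interface NodeLike {
  nodeValue: string | null;
}

interface ParentLike {
  readonly textContent: string | null;
  readonly childNodes: ArrayLike<NodeLike>;
  removeChild(child: NodeLike): unknown;
}

// Restores `expected` as the only text under `parent`; returns true if a fix was needed.
function repairTextNode(parent: ParentLike, expected: string): boolean {
  if (parent.textContent === expected) return false;
  // Snapshot first, because removing children changes the live list.
  const nodes = Array.from(parent.childNodes);
  if (nodes.length === 0) return false; // nothing to repair in this sketch
  for (let i = 1; i < nodes.length; i++) {
    parent.removeChild(nodes[i]);
  }
  nodes[0].nodeValue = expected;
  return true;
}
```

Keeping the first node rather than recreating it means the element React already holds a reference to stays in place, so the next render can continue to reuse it.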

For Chinese input, two aspects need attention. First, while the IME is active, we must avoid triggering editor-related events such as selection changes and input events. Second, we need to watch for the event that marks the end of IME input, which is the compositionend event. Once the IME input ends, we can insert the content and perform the dirty DOM check described above.
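
These two points amount to a small gate around the editor's event handlers. The sketch below is one hypothetical way to express it; the CompositionGate class and its accept method are illustrative names, not part of the editor, which tracks the composing flag in its own state.

```typescript
type EditorEvent =
  | "compositionstart"
  | "compositionend"
  | "beforeinput"
  | "selectionchange";

class CompositionGate {
  private composing = false;

  /** Returns true when the editor should process the event itself. */
  public accept(type: EditorEvent): boolean {
    if (type === "compositionstart") {
      // The IME takes over: only record the state, do not touch the model.
      this.composing = true;
      return false;
    }
    if (type === "compositionend") {
      // The IME is done: insert the committed text and run the dirty DOM check.
      this.composing = false;
      return true;
    }
    // Input and selection events are ignored while the IME is composing.
    return !this.composing;
  }
}

const gate = new CompositionGate();
console.log(gate.accept("beforeinput")); // true
console.log(gate.accept("compositionstart")); // false
console.log(gate.accept("selectionchange")); // false
console.log(gate.accept("compositionend")); // true
```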

The composition event sequence consists of three events: compositionstart, compositionupdate, and compositionend, which correspond to the IME being activated, updated, and finished. These events matter even outside an editor: for instance, if an Enter key handler does not check whether the IME is active, it may trigger unintended actions while the user is still composing.

<input id="$1" />
<script>
  $1.addEventListener("compositionstart", (e) => console.log("Composition started", e));
  $1.addEventListener("compositionupdate", (e) => console.log("Composition updated", e));
  $1.addEventListener("compositionend", (e) => console.log("Composition ended", e));
  $1.onkeydown = (e) => {
    if (e.key === "Enter") {
      if (e.isComposing) {
        console.log("Enter key pressed during composition");
        return;
      }
      console.log("Enter key pressed");
    }
  };
</script>

Summary

Previously, we implemented the selection module for the editor, achieving a controlled selection synchronization mode, which is one of the core state synchronization modes mentioned in the MVC layered architecture. Here, building upon the selection module, we used browser composition events to implement a half-controlled input mode. This is also an important part of state synchronization, and it is the mainstream input mode in most rich text editors.

Next, we will focus on handling the default behaviors of complex DOM structures in browsers, as well as compatibility with the various IME input scenarios. Essentially, we will address IME and browser compatibility behaviors case by case. For example, we need to handle the length of Emoji, IME operations on the DOM structure, more complex dirty DOM checks, and more.

References