Exploring the Search and Replace Algorithm for Rich Text

In our online document system, besides the basic rich text editing capability, the search and replace algorithm is equally important in its implementation. A pure text search and replace algorithm is relatively easy to implement. However, in the context of rich text where there are mixed images and text, as well as embedded modules like Mention, implementing this function becomes considerably more complex. This article will focus on implementing search and replace using index-based search and layer rendering with Quill, a rich text editor implementation.

Description

Let's first consider a simple text search and replace algorithm. For plain text, we can easily search for the index of the corresponding text using some methods. Whether replacing directly or by cutting and combining, this functionality can be easily achieved. Rich text can essentially be seen as text with formatting, so in this case, we can continue with the same approach as with plain text. However, due to the text being segmented into multiple fragments, we need to address this data layer issue.

As previously emphasized, the current implementation is based on Quill as the core solution. Since the search and replace function is closely related to data structure implementation, quill-delta provides a flat data structure which makes it easier for us to handle this issue. With nested data structures like the one used in Slate, the implementation can still follow the same approach. However, using it as a DEMO may not be as straightforward, and dealing with replacement in nested data structures can be more complex.

Firstly, we need to construct an instance of the Quill editor. In our implementation, we just need to reference the quill.js resource and specify the DOM node for instantiation. Here, we have configured style modules registration. The related implementation details can be found at https://github.com/WindRunnerMax/QuillBlocks/blob/master/examples/find-replace.html.

<div class="editor-container">
    <div id="$editor"></div>
</div>
<script src="https://cdn.quilljs.com/1.3.6/quill.js"></script>
<script>
const editor = new Quill($editor, {
  modules: {
    toolbar: [[{ header: [1, 2, false] }], ["bold", "italic", "underline"], ["code-block"]],
  },
  placeholder: "Enter Text...",
  theme: "snow",
});
</script>

Quill uses quill-delta to describe rich text content. quill-delta is a flat data structure that not only describes rich text content but also its changes. In this case, we mainly focus on describing the document content. Therefore, we predominantly use the insert form for description. As mentioned earlier, rich text is essentially plain text with attributes. Hence, we can easily use the following style descriptions for normal and bold text.

const delta = new Delta([
  { insert: "Hello " },
  { insert: "World", attributes: { bold: true } },
]);

Fundamentally, our search and replace solution is based on finding the target content on the basis of data description. Looking at the data structure above, it's evident that there may be target content to be found in the attributes as well. For instance, in implementing the Mention component, we can store user information in the attributes to display @user information, while the content inserted into insert serves as a placeholder. Here, we implement the Mention plugin for Quill.

const Embed = Quill.import("blots/embed");
class Mention extends Embed {
  static blotName = "mention";
  static className = "ql-mention";
  static tagName = "SPAN";
  static create(value) {
    const node = super.create(value);
    node.innerHTML = `@${value}`;
    node.setAttribute("data-value", value);
    return node;
  }
  static value(domNode) {
    return domNode.getAttribute("data-value");
  }
}
Quill.register(Mention);

At this point, our preset data structure content appears as follows. We may notice that the insert for Mention in the data structure is an object, and the actual content is not in attributes but in insert. This is because in Quill, the description of parchment in the blots/embed uses the blotName object stored by insert to complete the value. Therefore, we still target the insert object for searching in the example, which doesn't actually require additional search implementations.

editor.setContents([
  { "insert": "Find and Replace" },
  { "attributes": { "header": 1 }, "insert": "\n" },
  { "insert": "The basic ability of rich text find and replace includes basic " },
  { "attributes": { "bold": true }, "insert": "text search" },
  { "insert": ", as well as embedded structures like " },
  { "attributes": { "italic": true }, "insert": "Mention" },
  { "insert": " for the " },
  { "insert": { mention: "search capability" } },
  { "insert": ".\n" },
]);

Search Algorithm

Firstly, let's design the search algorithm for the content. Since we have already mentioned that rich text is essentially text with attributes, let's first consider implementing a search for plain text. Text search algorithm implementation fundamentally involves matching string text, and one of the common algorithms that comes to mind is the KMP algorithm. In practice, however, the search functionality implemented in various text editors mostly uses the Boyer-Moore algorithm as it is more efficient.

For now, let's not focus on implementing the most efficient algorithm. Our goal is to find the indices of target text within long text. We can directly use the indexOf method of the String object for this purpose. To find all occurrences of the target text, we need to utilize the second parameter position of this method to specify the starting position for the search. Using regular expressions for indexing can sometimes cause issues where the input string is treated as part of the regex syntax itself, for example, when searching for the string (text), it may be mistaken for searching just text without proper handling.

In our test examples below, we can check the performance by running them at https://perf.link/. Place them in the Test Cases section and see how they perform. Surprisingly, the performance of indexOf is even better, making it a more favorable choice. Therefore, we will directly use indexOf for our search algorithm. In browser testing, the indexOf search in a ten-thousand-word text stabilizes below 1ms, which is quite acceptable. Of course, performance may vary depending on the computer's processing power.

indexOf // 141,400 ops/s
RegExp // 126,030 ops/s

Now, let's proceed with establishing the indexing for Delta. Since rich text representation includes style information reflected in the data structure as text segments, our search needs to concatenate all text into a single long string and then use indexOf for finding. Additionally, we have observed the existence of insert in the data structure, where we should consider it as non-searchable text content to prevent it from being included as part of the target text during searches.

For structures like the Mention module involving Void/Embed, special handling is required. Unlike plain text content searching, where we directly utilize the insert property, we cannot directly include these descriptions in the insert field for such cases. Consequently, full restoration of these parts back to the original data structure becomes crucial for displaying the search results in the appropriate format. Therefore, we need to iterate over delta.ops again and handle the properties that require special treatment case by case.

```js const rects = []; let iterated = 0; delta.ops.forEach(op => { if (typeof op.insert === "string") { iterated = iterated + op.insert.length; return void 0; } iterated = iterated + 1; if (op.insert.mention) { const value = op.insert.mention; const mentionIndex = value.indexOf(text, 0); if (mentionIndex === -1) return void 0; // Drawing node positions ... } });

Once all the data is collected, we need to build a virtual layer to display the search results. Firstly, we need to process the search results of the initially processed insert plain text. Since we usually use input to enter the search text targets, we don't need to consider the situation of \n. Handling the layer here is relatively convenient. The specific implementation of the virtual layer has been described clearly in the previous diff algorithm article, here we only implement the relevant calls.

const rangeDOM = document.createElement("div");
rangeDOM.className = "ql-range virtual-layer";
$editor.appendChild(rangeDOM);
const COLOR = "rgba(0, 180, 42, 0.3)";
const renderVirtualLayer = (startRect, endRect) => {
  // ...
};
const buildRangeLayer = ranges => {
  rangeDOM.innerText = "";
  ranges.forEach(range => {
    const startRect = editor.getBounds(range.index, 0);
    const endRect = editor.getBounds(range.index + range.length, 0);
    rangeDOM.appendChild(renderVirtualLayer(startRect, endRect));
  });
};

Here, the focus is on handling Mention, as we cannot use editor.getBounds to obtain its related BoundingClientRect. Therefore, we need to calculate its position manually. We need to retrieve the node where mentionIndex is located, then build a Range object using createRange, and obtain the ClientRect based on this Range. This way, we will get the necessary startRect and endRect, but it is important to note that we need to convert them to relative positions of the editor instance.

const [leaf, offset] = editor.getLeaf(iterated);
if (
  leaf &&
  leaf.domNode &&
  leaf.domNode.childNodes[1] &&
  leaf.domNode.childNodes[1].firstChild
) {
  const textNode = leaf.domNode.childNodes[1].firstChild;
  const startRange = document.createRange();
  startRange.setStart(textNode, mentionIndex + 1);
  startRange.setEnd(textNode, mentionIndex + 1);
  const startRect = startRange.getBoundingClientRect();
  const endRange = document.createRange();
  endRange.setStart(textNode, mentionIndex + 1 + text.length);
  endRange.setEnd(textNode, mentionIndex + 1 + text.length);
  const endRect = endRange.getBoundingClientRect();
  rects.push({ startRect, endRect });
}
rects.forEach(it => {
  const editorRect = editor.container.getBoundingClientRect();
  const startRect = {
    bottom: it.startRect.bottom - editorRect.top,
    height: it.startRect.height,
    left: it.startRect.left - editorRect.left,
    right: it.startRect.right - editorRect.left,
    top: it.startRect.top - editorRect.top,
    width: it.startRect.width,
  };
  const endRect = {
    bottom: it.endRect.bottom - editorRect.top,
    height: it.endRect.height,
    left: it.endRect.left - editorRect.left,
    right: it.endRect.right - editorRect.left,
    top: it.endRect.top - editorRect.top,
    width: it.endRect.width,
  };
  const block = renderVirtualLayer(startRect, endRect);
  rangeDOM.appendChild(block);
});

Batch Replacements

Implementing batch replacements on the data structure of Delta will be relatively straightforward. As we mentioned earlier, Delta can not only describe documents using the insert method but also with retain, delete, and insert methods to depict document changes. Therefore, upon the constructed ranges, we need to build the change description for the target. Since the previously constructed Mention does not allow substantive replacement operations, we only need to focus on the original insert text content.

The key here is to implement the construction of batch changes. First, we need to define an instance of Delta. The variable preIndex is used to keep track of the index position after the last execution. Within the loop through our ranges, retain is used to move the pointer to the current original text content being processed, followed by delete to remove it, and then insert for the target text replacement. Note that at this point, the pointer is after the target text, so we must update preIndex to the current index position being processed. Finally, apply these changes to the editor.

const batchReplace = () => {
  const text = $input1.value;
  const replace = $input2.value;
  if (!text) return void 0;
  const changes = new Delta();
  let preIndex = 0;
  ranges.forEach(range => {
    changes.retain(range.index - preIndex);
    changes.delete(range.length);
    changes.insert(replace);
    preIndex = range.index + range.length;
  });
  editor.updateContents(changes);
  onFind();
};

Daily Challenge

https://github.com/WindrunnerMax/EveryDay

References

https://quilljs.com/docs/delta https://www.npmjs.com/package/parchment https://www.npmjs.com/package/quill-delta/v/4.2.2 https://developer.mozilla.org/zh-CN/docs/Web/API/Document/createRange https://www.ruanyifeng.com/blog/2013/05/boyer-moore_string_search_algorithm.html https://www.ruanyifeng.com/blog/2013/05/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm.html