In our online document system, besides the basic rich text editing capability, the search and replace algorithm is equally important in its implementation. A pure text search and replace algorithm is relatively easy to implement. However, in the context of rich text where there are mixed images and text, as well as embedded modules like Mention
, implementing this function becomes considerably more complex. This article will focus on implementing search and replace using index-based search and layer rendering with Quill
, a rich text editor implementation.
Let's first consider a simple text search and replace algorithm. For plain text, we can easily search for the index of the corresponding text using some methods. Whether replacing directly or by cutting and combining, this functionality can be easily achieved. Rich text can essentially be seen as text with formatting, so in this case, we can continue with the same approach as with plain text. However, due to the text being segmented into multiple fragments, we need to address this data layer issue.
As previously emphasized, the current implementation is based on Quill
as the core solution. Since the search and replace function is closely related to data structure implementation, quill-delta
provides a flat data structure which makes it easier for us to handle this issue. With nested data structures like the one used in Slate
, the implementation can still follow the same approach. However, using it as a DEMO
may not be as straightforward, and dealing with replacement in nested data structures can be more complex.
Firstly, we need to construct an instance of the Quill
editor. In our implementation, we just need to reference the quill.js
resource and specify the DOM node for instantiation. Here, we have configured style modules registration. The related implementation details can be found at https://github.com/WindRunnerMax/QuillBlocks/blob/master/examples/find-replace.html
.
Quill
uses quill-delta
to describe rich text content. quill-delta
is a flat data structure that not only describes rich text content but also its changes. In this case, we mainly focus on describing the document content. Therefore, we predominantly use the insert
form for description. As mentioned earlier, rich text is essentially plain text with attributes. Hence, we can easily use the following style descriptions for normal and bold text.
Fundamentally, our search and replace solution is based on finding the target content on the basis of data description. Looking at the data structure above, it's evident that there may be target content to be found in the attributes
as well. For instance, in implementing the Mention
component, we can store user
information in the attributes
to display @user
information, while the content inserted into insert
serves as a placeholder. Here, we implement the Mention
plugin for Quill
.
At this point, our preset data structure content appears as follows. We may notice that the insert
for Mention
in the data structure is an object, and the actual content is not in attributes
but in insert
. This is because in Quill
, the description of parchment
in the blots/embed
uses the blotName
object stored by insert
to complete the value
. Therefore, we still target the insert
object for searching in the example, which doesn't actually require additional search implementations.
Firstly, let's design the search algorithm for the content. Since we have already mentioned that rich text is essentially text with attributes, let's first consider implementing a search for plain text. Text search algorithm implementation fundamentally involves matching string text, and one of the common algorithms that comes to mind is the KMP
algorithm. In practice, however, the search functionality implemented in various text editors mostly uses the Boyer-Moore
algorithm as it is more efficient.
For now, let's not focus on implementing the most efficient algorithm. Our goal is to find the indices of target text within long text. We can directly use the indexOf
method of the String
object for this purpose. To find all occurrences of the target text, we need to utilize the second parameter position
of this method to specify the starting position for the search. Using regular expressions for indexing can sometimes cause issues where the input string is treated as part of the regex syntax itself, for example, when searching for the string (text)
, it may be mistaken for searching just text
without proper handling.
In our test examples below, we can check the performance by running them at https://perf.link/
. Place them in the Test Cases
section and see how they perform. Surprisingly, the performance of indexOf
is even better, making it a more favorable choice. Therefore, we will directly use indexOf
for our search algorithm. In browser testing, the indexOf
search in a ten-thousand-word text stabilizes below 1ms
, which is quite acceptable. Of course, performance may vary depending on the computer's processing power.
Now, let's proceed with establishing the indexing for Delta
. Since rich text representation includes style information reflected in the data structure as text segments, our search needs to concatenate all text into a single long string and then use indexOf
for finding. Additionally, we have observed the existence of insert
in the data structure, where we should consider it as non-searchable text content to prevent it from being included as part of the target text during searches.
For structures like the Mention
module involving Void/Embed
, special handling is required. Unlike plain text content searching, where we directly utilize the insert
property, we cannot directly include these descriptions in the insert
field for such cases. Consequently, full restoration of these parts back to the original data structure becomes crucial for displaying the search results in the appropriate format. Therefore, we need to iterate over delta.ops
again and handle the properties that require special treatment case by case.
Once all the data is collected, we need to build a virtual layer to display the search results. Firstly, we need to process the search results of the initially processed insert
plain text. Since we usually use input
to enter the search text targets, we don't need to consider the situation of \n
. Handling the layer here is relatively convenient. The specific implementation of the virtual layer has been described clearly in the previous diff
algorithm article, here we only implement the relevant calls.
Here, the focus is on handling Mention
, as we cannot use editor.getBounds
to obtain its related BoundingClientRect
. Therefore, we need to calculate its position manually. We need to retrieve the node where mentionIndex
is located, then build a Range
object using createRange
, and obtain the ClientRect
based on this Range
. This way, we will get the necessary startRect
and endRect
, but it is important to note that we need to convert them to relative positions of the editor instance.
Implementing batch replacements on the data structure of Delta
will be relatively straightforward. As we mentioned earlier, Delta
can not only describe documents using the insert
method but also with retain
, delete
, and insert
methods to depict document changes. Therefore, upon the constructed ranges
, we need to build the change description for the target. Since the previously constructed Mention
does not allow substantive replacement operations, we only need to focus on the original insert
text content.
The key here is to implement the construction of batch changes
. First, we need to define an instance of Delta
. The variable preIndex
is used to keep track of the index position after the last execution. Within the loop through our ranges
, retain
is used to move the pointer to the current original text content being processed, followed by delete
to remove it, and then insert
for the target text replacement. Note that at this point, the pointer is after the target text, so we must update preIndex
to the current index position being processed. Finally, apply these changes to the editor.