When implementing an online document system, it's essential to consider version control and auditing capabilities, as these are crucial aspects of the entire document management process. In this process, document differencing or diff
capability is typically required, allowing us to track document changes like the variances between a document draft and the live document, or the disparities between private versions A
and B
. This article is based on the Quill
rich text editor engine and delves into the implementation of document diff
algorithms.
Quill
is a modern rich text editor known for its excellent compatibility, robust extensibility, and some out-of-the-box features. Open-sourced in 2012
, Quill
has brought many innovations to the rich text editor scene, making it one of the most widely used editors in the open-source community. The ecosystem around Quill
is rich and diverse, with numerous examples available on platforms like GitHub
, including comprehensive collaboration implementations.
Firstly, let's consider a scenario - when describing a piece of plain text, simple input suffices. For instance, the underlying data structure of this article is plain text, and the content format is compiled by the compiler through lexical and syntax analysis, akin to serialization and deserialization. However, for a rich text editor like Quill
, frequent serialization and deserialization during editing are performance-intensive. Hence, the data structure needs to be easily readable and writable, such as a JSON
object. Describing rich text in JSON
can take various forms, but fundamentally, it involves attaching additional attributes to specific text segments. For instance, if A
is bold and B
is italic, bold
property is attached to A
, and italic
property to B
. This data structure can effectively represent rich text content.
For Quill
, the data structure is described by quill-delta
. This data structure's design is excellent, and quill-delta
also serves as an implementation of rich text Operational Transformation (OT
) collaboration algorithm. However, for the diff
capability we are focusing on here, it pertains more to the data structure level than collaboration. In quill-delta
, the data structure for a piece of text is illustrated as follows. Of course, quill-delta
expressions can be quite elaborate, enabling complete document content description through operations like retain
, insert
, and delete
. We will delve into these Op
when implementing the diff view feature later on.
Upon seeing this data structure, one might think it's just a standard JSON
. Can't we directly use JSON
diff since there are numerous algorithms available? This method theoretically works for purely insert
text content, but the granularity may lack precision, failing to pinpoint specific word modifications. Thus, constructing a delta
based on the diff
results as per the quill-delta
structure from A
to B
is not straightforward. For instance, in the following JSON
, the result of our diff
is deleting insert: 1
and adding "insert": "12", "attributes": { "bold": true }
. In reality, what we've done is added bold
style to 1
and appended 2
with bold
. To apply this diff
result to generate B
from A
, two actions are required: precise content modifications and transforming the diff
result into delta
. Hence, a better diff
algorithm design is needed to simplify the entire process.
Our goal here is to achieve a finer granularity in diff
, construct delta
directly, and apply it, meaning A.apply(diff(a, b)) = B
. Quill-delta
already has a pre-implemented diff
algorithm, but we have streamlined it for better comprehension, especially concerning non-insert
operations. It's crucial to note that the diff
we discuss here is for non-collaborative modes. For document editors that have implemented OT
, relevant version Op
can be extracted from the history for compose + invert
, and performing a full document diff
algorithm isn't always necessary.
Complete DEMO
can be directly opened in the console at https://codesandbox.io/p/devbox/z9l5sl
to view it. As mentioned earlier, after using JSON
for diff
, we still need to process the data in two additional steps, especially for dealing with granularity seems more challenging. So, why not change our perspective on this granularity issue? We are working on processing rich text, and rich text is text with attributes. Therefore, can we adopt a text diff
algorithm, then perform additional processing on the attribute values? This way, we can handle granularity very finely. In theory, this approach seems feasible, so we can continue exploring along this line of thinking.
First, let's consider the pure text diff
algorithm. So, let's briefly understand the diff
algorithm used by diff-match-patch
. This algorithm is generally considered the best general-purpose diff
algorithm, designed by Eugene W. Myers
here. Due to the capability of match
and patch
in diff-match-patch
, but the algorithm we actually need only requires the diff
capability, so we can just use fast-diff
. It removes matching, patching, and all extra difference options, retaining only the basic diff
capability, with the diff
result being a two-dimensional array [FLAG, CONTENT][]
.
Next, we need to construct the string. As mentioned above, the data format of quill-delta
has been mentioned, so it is simple to construct. We need to first create a Delta
object to carry the diff
result of our delta
.
The Iterator
here is the iterator we will use to iterate over block structures. We can imagine that since the diff
result consists of N
characters, and the insert
block in our Delta
also has N
characters, after diff
, we need to process substrings of these two strings. Therefore, we need to iterate over N
characters of the entire Delta
for processing, and this part of data processing methods is encapsulated in the Iterator
object. Let's take an overall look at the iterator's processing method.
In this case, the handling of the next
method is a bit more complex. In the next
method, our main goal is to extract the content of the insert
part. It's important to note that each time we call insert
, it won't span across op
s. In other words, each next
call can only extract up to the length of the insert
stored in the current op
. If the content exceeds the length of a single op
, the corresponding attributes will not be consistent, making direct merging impossible. In such cases, we need to consider scenarios where the diff
result is longer than the insert
, requiring compatible handling of the attributes, essentially breaking down the diff
result into blocks for processing.
Now, with the Iterator
, we can extract the 'insert' part of 'op' in a more granular manner. Next, let's return to handling the diff
. First, let’s look at the diff
of the attributes
. Fundamentally, assuming the current data structure is Record<string, string>
, we can directly compare two attributes
. The essence of diff
is transforming a
to become b
through certain calculations, where a + diff = b
. Therefore, iterating through all keys, if the values of the two attrs
differ, we can assign the value of b
to the target attrs
based on value comparison.
Because we have already broken it down quite finely upfront, the final step won't be too complicated. Our goal is to construct the a + diff = b
equation, focusing on the diff
part. Therefore, when constructing diff
, the target to consider is a
. We need to keep this purpose in mind throughout the process, or else it may be hard to understand the operations on delta
. In the overall process of diff
, there are three main parts to handle: iter1
, iter2
, and text diff
. Depending on the type of diff
produced, we need to process accordingly. The guiding principle is to take the shorter length as the block to process. In the diff.INSERT
section, we insert delta
from iter2
inserts. In the diff.DELETE
section, we apply the length of delete
from iter1
to delta
. In the diff.EQUAL
section, we extract op
from both iter1
and iter2
to handle the diff
of attributes
or op
replacement.
Here we can illustrate the effect of 'diff' with an example. To see the specific effect, you can open the console in https://codesandbox.io/p/devbox/z9l5sl
in the src/index.ts
file. It mainly demonstrates the three types of 'diff' - DELETE
, EQUAL
, INSERT
, and the resulting delta
, where in this case ops1 + result = ops2
.
Now that our document diff
algorithm is ready, it's time to get down to business and think about how to apply it to specific documents. Let's start with a simple approach. Imagine we have performed a diff
between documents A
and B
, resulting in a patch
. We can modify the diff
directly, structure it as desired, and apply it to A
to obtain a comparison view. Of course, we can also apply deleted content from A
and added content from B
to generate the view, which we'll discuss later. For now, let's focus on getting the comparison view directly from A
.
Essentially, a comparison view highlights deletions in red and additions in green. Since rich text comes with its own formatting, we can leverage this to implement highlighting directly using rich text capabilities.
The core algorithm based on this approach is quite simple. Here, we handle formatting modifications by changing DELETE
content to RETAIN
with red attributes
, adding green attributes
to INSERT
types, and then assembling this modified patch
onto A
's delta
. Applying the complete delta
to the new comparison view is all that's needed. For a complete DEMO
, refer to https://codepen.io/percipient24/pen/eEBOjG
.
In essence, the core functionality in the code is just a few lines. It's great to solve complex requirements with a simple approach. In less complicated scenarios, you can achieve a comparison within the same document area or use separate views to apply deletion and addition deltas
. However, as scenarios become more complex, such as needing to allow real-time editing and displaying diff
results in the view representing additions on the right side, directly applying diff-delta
to the document may pose some issues. Apart from potential performance concerns from continuously applying delta
to rich text, in collaborative environments, managing local Ops
and History
is necessary, or in non-collaborative settings, filtering relevant keys to prevent storing diff
results.
If the scenarios mentioned above just touch the advanced capabilities in basic functionalities, then in search scenarios, directly applying highlights to rich text content doesn't seem feasible. Imagine applying search results directly from the data layer to rich text for highlighting. This would involve all the issues mentioned earlier, and the performance cost due to frequent content changes is unacceptable. In slate
, the concept of decorate
exists, allowing the construction of Range
to consume attributes
without changing the document content, which aligns with our needs. Hence, we need a way to highlight content without modifying the rich text content, but it's not easy to achieve this capability at the editor-level rendering like slate
. Alternatively, we can take a different approach by simply adding a semi-transparent highlight overlay to relevant positions, making it much simpler. Here, we refer to it as a virtual layer.
On a theoretical level, implementing a virtual layer is rather simple, just adding an extra layer of DOM
, but there are many details to consider in this process. Firstly, let's address an issue: if we place a masking layer directly above a rich text editor, with a higher z-index
than the text editor's, clicking on the masking layer would cause the rich text editor to lose focus. While we can use event.preventDefault
to prevent the default behavior of focus shifting, other actions like click events would still pose similar problems. For instance, if a button's click action in the rich text editor is user-defined, covering the button with the masking layer would redirect the click event to the layer instead of triggering the button's action as expected. This scenario does not align with our intentions.
To tackle this, let's shift our perspective. Lowering the z-index
below that of the rich text editor can resolve the event issues. However, it triggers another problem; if the rich text content itself has a background color, adding a masking layer would conceal the layer's color beneath the original background color. Since our rich text capabilities are typically plugin-based, we can't control the user's background color plugins. These plugins must contain some level of transparency, and our masking layer needs to offer a universal capability. As a result, this solution also has its limitations.
The solution to this issue is actually quite simple. In CSS
, there's a property named pointer-events
, which, when set to none
, ensures an element never becomes the target of mouse events. This can address the problems caused by the earlier approach. By using this, we can achieve our basic virtual layer styles and event handling. Additionally, using this property may lead to an interesting observation: right-clicking on the masking layer won't allow you to directly inspect the node in the console; you must use the Elements
panel to select the DOM
node rather than the reverse.
After determining the method for drawing the masking layer, the next step is to confirm the position information for drawing shapes. Since the DOM
nodes created by our rich text editor do not have separate nodes for each character, but rather nodes with the same attributes
grouped together, a new issue arises. When dealing with a diff
result, it's highly likely that we won't have a complete node for a particular DOM
, but just a few characters. In such cases, we can't directly use Element.getBoundingClientRect
to get the position information. Instead, we must rely on document.createRange
to construct a range. It's worth noting that we are handling Text
nodes, and only Text
nodes and similar ones can set offsets. Both the start
and end
nodes in a Text
node range can be directly constructed without needing to match exactly. For instance, in Quill, editor.getBounds
offers location information retrieval. This method essentially uses editor.scroll
to obtain the actual DOM
and wraps it with document.createRange
, handling various edge cases.
Furthermore, we need to discuss another issue. With diff
, we cannot guarantee the length of the current result. As we've established previously, we are implementing diff
for plain text. Therefore, the diff
result may be lengthy, which can potentially lead to problems. While using editor.getBounds(index, length)
directly provides a rect
(rectangle) covering range, issues may arise when dealing with longer diff
results. Two key considerations arise in such cases: first, single-line content extending beyond the editor's visible area, causing line breaks; and second, content that spans multiple lines with \n
characters included in the diff
results.
Assuming the content above is the diff
result, we are not focusing on whether it is of type INSERT/DELETE/RETAIN
for now. Our current goal is to implement highlighting. In these two scenarios, if we directly highlight the rectangular area fetched through getBounds
, it is obvious that a large amount of non-text or blank spaces will be highlighted. In this case, our approach would be to take the maximum range for highlighting coverage. Actually, we can accept it if only blank spaces are covered, but consider a scenario where only some content is changed. For example, content is inserted in the entire line N, and at the beginning of line N+1, a letter is also inserted. Due to the width of line N affecting line N+1, the highlight ends up covering the entire line. This inaccurate highlighting in our diff
result is not acceptable, whether it involves line breaks or spanning across lines.
Next, we need to address these two issues. For the calculation of positions spanning multiple lines, we can adopt a relatively simple approach. We just need to know exactly where the line breaks occur and split the diff
result accordingly to change our granularity from document-level to line-level processing. However, Quill
doesn't directly provide operations at the line Range
level. Therefore, we need to maintain our own line-level index-length
. We can easily split the index-length
comprehensively by using delta insert
and calculate using editor.scroll.lines
. When there are changes in the document content, we can also maintain index values based on delta-changes
. Moreover, if we manage Blocks
through multiple Quill
instances, managing at the Line
level becomes more straightforward, and maintaining index becomes simpler. However, during diff
, we will need a Block
tree-level implementation. It's relatively easier to implement diff
when dealing with Blocks
with the same id
, but when there is a need for diff
across Blocks
, the implementation might become more complex.
Once we have split the line index-length
index, the next step is to break down the original complete diff-index-length
into line-level content. Here, it is crucial to handle the line-identifying node, namely \n
, attributes
, with special consideration. Because all modifications to this node are directly applied to the entire line. For instance, when a line changes from a second-level heading to a first-level heading, the entire line needs to be highlighted as a style change. Of course, there may also be content additions or deletions within the heading, and these highlights can be displayed in different colors. This is one of the reasons we need to maintain line-level Range
.
Next, we need to address the issue of line wrapping due to long single-line content. Since we have already segmented the diff
results into lines, we can focus on how to render the highlights. As mentioned earlier, we cannot directly draw the rect
obtained from calling getBounds
onto the text. Therefore, we can consider splitting a paragraph into three parts: the first line head
, content body
, and last line tail
. Only the beginning and end of each line will have partial highlighted areas, which require separate calculation of rect
. The body
part will always have a complete rect
which can be rendered directly. Therefore, using three rect
s to represent the highlight of single-line content will suffice. The data returned from getBounds
is sufficient to support the processing of single-line content into three sections. We need to obtain the rect
of the first head
and last tail
, and calculate the rect
of the body
part based on these two rect
s. Depending on the number of actual line breaks, different scenarios need to be considered: for single-line content, only the head
is sufficient; for two lines, both head
and tail
are needed for rendering with body
acting as a placeholder for positioning; for multiple lines, head
, body
, and tail
need to render their respective contents to ensure the integrity of the layers.
After solving the above two issues, we can apply delta
to the diff
algorithm to obtain results, and divide them line by line to construct a new Range
. Here we want the left view to reflect DELETE
content and the right view to reflect INSERT + RETAIN
content. By storing the constructed Range
in different arrays based on the different types of diff
, and finally obtaining position information with editor.getBounds
based on the Range
, constructing new layer DOM
at relevant positions to achieve highlighting is all we need to do.
To summarize the overall process, in order to implement a diff
based on virtual layers, we need a diff
algorithm, construct Range
, calculate Rect
, and render DOM
. In fact, to achieve this skill well is quite complex, especially with many edge cases to handle. For example, some text may have different fonts or styles applied, resulting in different rendering heights compared to regular text. If the diff
falls on these edges, it may cause issues in calculating our rect
, leading to problems in rendering the styles of layer nodes. Here we have not addressed such issues, only established the entire process without special focus on edge cases. For a complete demo, you can directly visit https://codesandbox.io/p/sandbox/quill-diff-view-369jt6
.