Exploring the Ability of Selective Commenting in Rich Text

Selective commenting, the ability to attach a comment to a selected range of text, is an important feature when building an online document platform, especially for products with heavyweight document management workflows. Feedback on documents helps users improve the quality of what they maintain. Even when used purely as a discussion area, selective commenting is valuable, particularly where a document is incomplete or imperfect, and when integrated with product systems it provides input beyond the document's own content. This article therefore looks at how collaborative algorithms can resolve position conflicts and make selective commenting possible in online documents.

Description

In practice, implementing selective commenting is not overly challenging from an interaction perspective. Just imagine selecting text in a document, then triggering a comment button on the onMouseUp event. Users can input comments upon clicking the button, and subsequently relay the comment's position and data to persistent storage. This brings to mind a famous riddle - "How many steps does it take to put an elephant into a refrigerator? The answer is three steps: open the refrigerator door, put the elephant in, and close the door. However, putting a giraffe into the refrigerator takes four steps: open the refrigerator door, take out the elephant, put the giraffe in, and close the door."

Implementing selective commenting is akin to putting an elephant in the refrigerator. The key challenge here lies in identifying the comment's position. If not using a rich text editor, a potential solution could involve storing the specific hierarchy of the DOM, essentially saving a path array. During rendering and resizing, the script needs to retrieve and calculate this path accordingly. Ideally, each text node should have an id for easy referencing, and upon persistence, store the id and the offset. However, creating individual ids for each text node may not be straightforward and could significantly impact the data structure, necessitating changes from data entity to rendering.
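As a rough illustration of the path-array idea, the sketch below uses a plain tree type in place of real DOM nodes. The names TreeNodeLike, PathAnchor, findPath and resolvePath are made up for this example and are not part of any library; a real implementation would walk childNodes instead.

```typescript
// Stand-in for a DOM node (hypothetical type, used so the sketch runs anywhere).
type TreeNodeLike = { label: string; children: TreeNodeLike[] };

// What would be persisted: child indexes from the root down to a text node, plus a character offset.
type PathAnchor = { path: number[]; offset: number };

// Depth-first search for the target; returns the child-index path or null if it is not in the tree.
function findPath(root: TreeNodeLike, target: TreeNodeLike): number[] | null {
  if (root === target) return [];
  let position = 0;
  for (const child of root.children) {
    const rest = findPath(child, target);
    if (rest) return [position, ...rest];
    position++;
  }
  return null;
}

// Follow a stored path back down the tree; returns null when the structure no longer matches.
function resolvePath(root: TreeNodeLike, path: number[]): TreeNodeLike | null {
  let current: TreeNodeLike | undefined = root;
  for (const childIndex of path) {
    current = current.children[childIndex];
    if (!current) return null;
  }
  return current;
}

const anchoredText: TreeNodeLike = { label: "text: hello world", children: [] };
const articleRoot: TreeNodeLike = {
  label: "article",
  children: [
    { label: "h1", children: [] },
    { label: "p", children: [{ label: "strong", children: [] }, anchoredText] },
  ],
};

// Saving: second child of the root, then second child of that paragraph.
const savedAnchor: PathAnchor = { path: findPath(articleRoot, anchoredText) ?? [], offset: 6 };
// Rendering: walk the same path to find the node again.
const resolvedNode = resolvePath(articleRoot, savedAnchor.path);
```

A path like this is only valid while the tree keeps its shape, which is exactly why it suits static content better than editable documents.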

For static content, various methods exist to address the persistence of comment positions. Yet, in dynamic online documents, alterations in document content can potentially affect the positions of comments. For instance, if content is added at position 0, the initial comment position of [2, 6] could become inaccurate. To tackle this challenge, collaborative algorithms need to be introduced to track document changes and recalculate comment positions. It's vital to note that we focus here on selective commenting in non-collaborative scenarios. Introducing online collaborative editing capabilities eliminates the need for manual position calculations, as the backend can function as a collaborative client utilizing algorithms to address position changes.

Operational Transformation (OT)

Firstly, let's discuss syncing comment positions in editing scenarios. Selective commenting typically comprises two main components - displaying comment positions in the document and a comments panel on the right side. Here, we primarily focus on displaying comment positions in the document, ensuring they correctly follow edits. Relevant code snippets can be found at https://github.com/WindrunnerMax/QuillBlocks/blob/master/examples/comment-ot.html.

Consider this - comment sections in a document are essentially a style for the editor, similar to emphasizing text. By adding attributes like { comment: id }, we can represent comment sections, which the editor engine can help adjust seamlessly. Implementing this approach is simple within the editor, requiring only applying format to the selected content.

const onFormatComment = (id: string) => {
  editor.format("comment", id);
}
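To make the { comment: id } representation concrete, here is a minimal sketch of what applying such an attribute to a range does to a list of insert ops. It is a simplified stand-in written for this article, not Quill's implementation; AttrInsertOp, withAttrs and formatRange are hypothetical names.

```typescript
// Simplified text-only insert op, loosely following the delta shape.
type AttrInsertOp = { insert: string; attributes?: Record<string, string> };

// Build an op, omitting the attributes key when there is nothing to store.
function withAttrs(text: string, attributes?: Record<string, string>): AttrInsertOp {
  return attributes && Object.keys(attributes).length > 0
    ? { insert: text, attributes }
    : { insert: text };
}

// Merge `attrs` into every character in [index, index + length), splitting ops where needed.
function formatRange(
  ops: AttrInsertOp[],
  index: number,
  length: number,
  attrs: Record<string, string>,
): AttrInsertOp[] {
  const result: AttrInsertOp[] = [];
  const end = index + length;
  let offset = 0;
  for (const op of ops) {
    const opStart = offset;
    const opEnd = offset + op.insert.length;
    offset = opEnd;
    // Clamp the formatted range to this op's own extent.
    const from = Math.max(opStart, Math.min(index, opEnd));
    const to = Math.max(opStart, Math.min(end, opEnd));
    const before = op.insert.slice(0, from - opStart);
    const inside = op.insert.slice(from - opStart, to - opStart);
    const after = op.insert.slice(to - opStart);
    if (before) result.push(withAttrs(before, op.attributes));
    if (inside) result.push(withAttrs(inside, { ...op.attributes, ...attrs }));
    if (after) result.push(withAttrs(after, op.attributes));
  }
  return result;
}

// Commenting on "world" in "Hello world" with comment id "c1".
const commentedOps = formatRange([{ insert: "Hello world" }], 6, 5, { comment: "c1" });
```

The comment id ends up inside the document content itself, which is the property the following paragraphs discuss.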

While this method is convenient, keep in mind that this is selective commenting without real-time collaborative editing, where editors typically rely on an editing lock to stop users from overwriting each other's content. A key drawback of this approach is that comment data is embedded in the content, so everyone with comment permission writes to the document throughout the process. The backend therefore cannot simply store each client's full copy of the document. Instead, it should apply the same format operation to its own data so that all clients stay in sync. Otherwise, when several people comment at the same time, for example during an approval process, one save can overwrite another.

By ensuring a single document instance exists, we can address both draft editing locks and review capabilities. However, things get more complicated when the platform has both an online state and a draft state and allows commenting on each. Draft-state comments cover internal modifications and review annotations, while online-state comments are mainly user feedback and discussion. The editing solution described above suffices for the draft state, but it does not solve commenting in the online state. Picture a user commenting on version A while the document is edited in the backend to produce version B. How do we carry the comments made on version A over to version B? This is the problem we examine next.

Before tackling this issue, let's ponder its essence - transforming comments based on document changes. Rather than solving it directly, we can abstract this process and define a selection list to store comment positions.

const COMMENT_LIST = [
  { index: 12, length: 2 },
  { index: 17, length: 4 },
];

Previously we used format to achieve this, which meant the comments were written into the delta itself. Here, to demonstrate the actual effect, the comments are rendered on a virtual layer instead. The virtual layer was already abstracted into a reusable capability in the earlier article on diff algorithm implementation and comparison views, so we won't go into it again. Rendering through a virtual layer means the comment data is completely separate from the delta, so it can be stored independently and the two states are fully decoupled.

Next, when the view is initialized, we render the virtual layer, and on the comment button defined earlier we stop event bubbling and prevent the default behavior. In particular, clicking the comment button should not take focus away from the editor, which is why the default behavior of mousedown is prevented.

const applyComment = document.querySelector(".apply-comment");
applyComment.onmousedown = e => {
  e.stopPropagation();
  e.preventDefault();
};

Next, we need to focus on the functionality required when clicking the comment button, which is actually quite straightforward. The main tasks involve storing the position of the selection, rendering it on the virtual layer, and finally moving the selection position to the position of the comment, essentially collapsing the selection to that position.

applyComment.onclick = e => {
  const selection = editor.getSelection();
  if (selection) {
    const sel = { ...selection };
    console.log("Add comment:", sel);
    COMMENT_LIST.push(sel);
    editor.renderLayer(COMMENT_LIST);
    editor.setSelection(sel.index + sel.length);
  }
};

Now for the key point: every edit triggers a content change event that carries an atomic op (a delta). We can use this op to transform each comment's position, recalculating it from the changes the op describes, and then re-render all comments on the virtual layer. As a result, comments follow their text as we edit. Syncing comment positions across document versions works on the same principle, whether one op or many are involved: because multiple ops can be composed into a single op, it comes down to the same problem and can be solved the same way.

editor.on("text-change", delta => {
  for(const item of COMMENT_LIST){
    const { index, length } = item;
    const start = delta.transformPosition(index);
    const end = delta.transformPosition(index + length, true);
    item.index = start;
    item.length = end - start;
  }
  editor.renderLayer(COMMENT_LIST);
});

Here, let's take a closer look at the transformPosition method. It transforms an index against a delta, which is particularly useful for cursors and selections. The second parameter can be confusing at first, but its meaning becomes clearer through the transform method. When it is true, the delta on which the method is called (this) is treated as having happened first. Because every delta starts from position 0, two deltas can conflict when they operate on the same position, and this flag decides which of them takes precedence.

const a = new Delta().insert('a');
const b = new Delta().insert('b').retain(5).insert('c');

// Ob' = OT(Oa, Ob)
a.transform(b, true);  // new Delta().retain(1).insert('b').retain(5).insert('c');
a.transform(b, false); // new Delta().insert('b').retain(6).insert('c');
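For positions, the same tie-breaking question shows up when text is inserted exactly at a comment boundary. The sketch below is a self-contained approximation of position transformation over a simplified op list; PosOp, transformPos and transformCommentRange are names invented here, and the logic is an assumption modelled on the behavior described above rather than the library's source.

```typescript
// Simplified op shape: only the lengths matter when transforming a position.
type PosOp = { retain: number } | { insert: string } | { delete: number };
type CommentSpan = { index: number; length: number };

// Move `index` through the ops. With priority = true, an insert landing exactly
// on the index leaves it in place; with priority = false the index is pushed right.
function transformPos(ops: PosOp[], index: number, priority = false): number {
  let offset = 0; // how far the ops have advanced through the document
  for (const op of ops) {
    if (offset > index) break;
    if ("delete" in op) {
      // Only the deleted characters in front of the index pull it left.
      index -= Math.min(op.delete, index - offset);
      continue;
    }
    if ("insert" in op) {
      if (offset < index || !priority) index += op.insert.length;
      offset += op.insert.length;
    } else {
      offset += op.retain;
    }
  }
  return index;
}

// Start without priority (text inserted at the start pushes the comment right),
// end with priority (text inserted at the end does not extend the comment).
function transformCommentRange(ops: PosOp[], span: CommentSpan): CommentSpan {
  const start = transformPos(ops, span.index);
  const end = transformPos(ops, span.index + span.length, true);
  return { index: start, length: end - start };
}

// Inserting "xxx" at position 0 moves a comment at [0, 3) to [3, 6).
const shiftedSpan = transformCommentRange([{ insert: "xxx" }], { index: 0, length: 3 });
```

Using different priorities for the two ends is a design choice: it keeps text typed at either edge outside the comment. Swapping the flags would make the comment absorb such text instead.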

Returning to the scenario where we're dealing with consumer-side comments on our online documents, let's illustrate the issue with a specific example. At a certain point, the online document is in state A with content yyy. While in draft mode, we make edits to the document, transitioning it to state B with content xxxyyy, where we've added 3 x characters before yyy. Now, imagine a user has made a comment on content yyy in the online A version, with the tag index [0, 3]. Later on, we publish version B online. If the comment still retains the position [0, 3] at this point, it will be out of place since the content it refers to is now xxx, which does not align with the user’s original selection. Therefore, we need to recalculate the position of the comment based on the A -> B transformation, shifting [0, 3] to [3, 6].

One crucial point: we do not synchronize consumer-side comments into the draft. Synchronizing comments while users are commenting and authors are editing at the same time amounts to a simplified form of collaborative editing, and the complexity quickly becomes hard to manage; in that case it is probably more practical to integrate a mature collaborative system directly. Building on the earlier method of atomically transforming comment positions, all we need is to find the ops that describe the change from version A to version B and transform the comment positions against that delta. So how do we determine the change from A to B? There are two options:

  • One solution involves recording the ops between versions. Since the draft mode document is essentially derived from the previous online document version, we can ideally keep a comprehensive record of the ops during document changes. Whenever necessary, we can retrieve the relevant ops for transformation. This method essentially implements common OT collaboration practices. By recording ops, it becomes more convenient to handle finer operations like rollback, word count tracking, or tracing authorship at the character level.

  • Another approach is to perform a diff on content when publishing versions. If our online document system did not initially have ops recording or collaborative capabilities pre-built, introducing such features suddenly would incur high costs. Introducing full-fledged collaborative capabilities solely for commenting might not be entirely necessary. Therefore, a simpler and cost-effective approach in this scenario would be to directly perform a diff between the two versions. For insights into the performance overhead of diff, refer to the relevant content in the previous document diff algorithm implementation and comparison view.
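The first option can be sketched as replaying a stored op log over a persisted comment range. Everything here is illustrative: the LoggedChange storage shape, the version numbering and the function names are assumptions for the example, and the position transform is a simplified local helper rather than a library call.

```typescript
type LogOp = { retain: number } | { insert: string } | { delete: number };
// Hypothetical storage shape: the ops that produced each version from the previous one.
type LoggedChange = { version: number; ops: LogOp[] };
type StoredRange = { index: number; length: number };

// Simplified position transform; priority = true keeps the index in place
// when text is inserted exactly at it.
function shiftIndex(ops: LogOp[], index: number, priority = false): number {
  let offset = 0;
  for (const op of ops) {
    if (offset > index) break;
    if ("delete" in op) {
      index -= Math.min(op.delete, index - offset);
      continue;
    }
    if ("insert" in op) {
      if (offset < index || !priority) index += op.insert.length;
      offset += op.insert.length;
    } else {
      offset += op.retain;
    }
  }
  return index;
}

// Replay every change after `from` up to and including `to`.
// Assumes the log is sorted by ascending version.
function rebaseRange(log: LoggedChange[], from: number, to: number, range: StoredRange): StoredRange {
  let index = range.index;
  let length = range.length;
  for (const change of log) {
    if (change.version <= from || change.version > to) continue;
    const start = shiftIndex(change.ops, index);
    const end = shiftIndex(change.ops, index + length, true);
    index = start;
    length = end - start;
  }
  return { index, length };
}

// "yyy" (version 0) becomes "xxyyy" (version 1) and then "xxxyyy" (version 2).
const changeLog: LoggedChange[] = [
  { version: 1, ops: [{ insert: "xx" }] },
  { version: 2, ops: [{ retain: 2 }, { insert: "x" }] },
];
// A comment made on version 0 at [0, 3) is carried forward to version 2.
const rebasedRange = rebaseRange(changeLog, 0, 2, { index: 0, length: 3 });
```

Because each logged change records what the author actually did, the replay preserves intent; the cost is that the log has to exist and be kept for every version users may have commented on.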

The atomic transform shown earlier already covers the op-recording approach, so next let's demonstrate the diff-based algorithm using the example above. First, represent the online state and the draft state as deltas, as in the example, and mark the comment position. Note that comment positions are persisted in our database as index and length, and are only converted to a delta representation for the transformation. Next, compose the online content with the comment delta to get what is actually displayed to users, with the comment marked. Then diff the online state against the draft state, in the order online -> draft. The following step is the transformation itself: the comment delta must be transformed against the diff, because the draft already contains those changes. Following Ob' = OT(Oa, Ob), where Oa is the diff and Ob is the comment delta, what we want is Ob'. Composing the draft with the transformed comment delta gives the final commented content, and we can check that the comment still covers its original text. Finally, since the new comment positions have to be stored in the database, the delta must be converted back to index and length. Alternatively, we can use transformPosition directly to compute the new positions, and then build a delta representation from them for applying and storing.

const Delta = Quill.import("delta");
// Online status
const online = new Delta().insert("yyy");
// Draft status
const draft = new Delta().insert("xxxyyy");
// Position of comment
const comment = { index: 0, length: 3 };
// Delta representation of comment position
const commentDelta = new Delta().retain(comment.index).retain(comment.length, { id: "xxx" });
// Actual content displayed in the online version
const onlineContent = online.compose(commentDelta);
// [{ "insert": "yyy", "attributes": { "id": "xxx" } }]
// `diff` result
const diff = online.diff(draft); // [{ "insert": "xxx" }]
// Updated comment delta representation
const nextCommentDelta = diff.transform(commentDelta); 
// [{ "retain": 3 }, { "retain": 3, "attributes": { "id": "xxx" } }]
// Actual content of the online version after the update
const nextOnlineContent = draft.compose(nextCommentDelta);
// [{ "insert": "xxx" }, { "insert": "yyy", "attributes": { "id": "xxx" } }]
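The conversion between the stored index/length form and the retain-based delta form is not shown above, so here is a minimal sketch of both directions. It works on plain op arrays and assumes the comment delta contains only retain ops; CommentRetain, CommentRecord, recordToOps and opsToRecords are hypothetical names.

```typescript
// A comment delta restricted to retain ops, with the comment id as the only attribute.
type CommentRetain = { retain: number; attributes?: { id: string } };
// The form persisted in the database.
type CommentRecord = { id: string; index: number; length: number };

// Stored record -> ops: skip `index` characters, then mark `length` characters.
function recordToOps(record: CommentRecord): CommentRetain[] {
  const ops: CommentRetain[] = [];
  if (record.index > 0) ops.push({ retain: record.index });
  if (record.length > 0) ops.push({ retain: record.length, attributes: { id: record.id } });
  return ops;
}

// Ops -> stored records. If a transform split one comment into several pieces
// (text inserted in the middle), the pieces are merged into one span from the
// first marked character to the last. That merge is a choice made for this sketch.
function opsToRecords(ops: CommentRetain[]): CommentRecord[] {
  const records: CommentRecord[] = [];
  let offset = 0;
  for (const op of ops) {
    if (op.attributes) {
      const id = op.attributes.id;
      const seen = records.filter(record => record.id === id)[0];
      if (seen) seen.length = offset + op.retain - seen.index;
      else records.push({ id, index: offset, length: op.retain });
    }
    offset += op.retain;
  }
  return records;
}

// The transformed comment delta from the example converts back to index 3, length 3.
const recordsToStore = opsToRecords([{ retain: 3 }, { retain: 3, attributes: { id: "xxx" } }]);
```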

In addition, when using diff to synchronize comments, there is another point to pay attention to, which is that using the diff approach may lead to inconsistency in intentions. By uniformly calculating diff instead of fully recording ops, there may be data accuracy loss. For example, if we have multiple consecutive blocks of xxx, and one of these blocks with consumer comments is deleted during editing, based on our actual intention, the comment associated with the deleted block should disappear or be collapsed in the next version update. However, in practice, this may not happen as expected. This discrepancy arises because diff calculates based on content, so determining which xxx block was deleted is just an algorithmic solution rather than an intentional one. Therefore, in such cases where complete intentions need to be ensured, additional tagging information needs to be introduced or the first approach of recording ops needs to be adopted, or even a complete collaborative algorithm needs to be introduced to ensure the completeness of intentions.
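The loss of intent can be reproduced with a very small example. The sketch below handles only a single contiguous deletion and uses a deliberately naive content comparison (longest common prefix); it is not the diff algorithm used by any particular library, and all names are made up for the illustration.

```typescript
type Deletion = { start: number; count: number };
type MarkedSpan = { index: number; length: number };

// Content-based guess for a single contiguous deletion: keep the longest
// common prefix, so the deletion is placed as late as possible.
function guessDeletion(before: string, after: string): Deletion {
  let prefix = 0;
  while (prefix < after.length && before.charAt(prefix) === after.charAt(prefix)) prefix++;
  return { start: prefix, count: before.length - after.length };
}

// Adjust a comment span for a deletion: shift left by what was removed in
// front of it, shrink by what was removed inside it.
function applyDeletion(span: MarkedSpan, del: Deletion): MarkedSpan {
  const spanEnd = span.index + span.length;
  const delEnd = del.start + del.count;
  const removedBefore = Math.max(0, Math.min(delEnd, span.index) - del.start);
  const overlap = Math.max(0, Math.min(spanEnd, delEnd) - Math.max(span.index, del.start));
  return { index: span.index - removedBefore, length: span.length - overlap };
}

// Two identical blocks; the comment is on the first one.
const beforeText = "xxxxxx";
const afterText = "xxx";
const commentOnFirstBlock: MarkedSpan = { index: 0, length: 3 };

// What the author actually did: delete the first block.
const recordedDeletion: Deletion = { start: 0, count: 3 };
// What a content comparison concludes: the second block was deleted.
const guessedDeletion = guessDeletion(beforeText, afterText);

const viaRecordedOps = applyDeletion(commentOnFirstBlock, recordedDeletion); // comment collapses
const viaDiff = applyDeletion(commentOnFirstBlock, guessedDeletion); // comment survives on the other block
```

Both deletions produce the same text, so no content-based comparison can tell them apart; only the recorded op carries the author's intent.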

Having discussed extensively about recording and transforming operations concerning comment positions, let's not forget about the comment panel on the right side. This part actually does not involve very complex operations; usually, it only requires communication with the document editor to obtain the actual top distance of comments from the document's top to perform position calculations. This can be directly achieved using CSS transform: translateY(Npx);. However, there are many details to consider, such as when to update comment positions, avoiding overlap of multiple comment cards, possibly needing to move comment cards when selecting comments, etc., which require a more intricate implementation in terms of interaction. Of course, there are many ways to implement interactive comment display, such as displaying specific comment content using Hover or clicking on comments within the document, which can be tailored according to specific needs.
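One simple way to keep comment cards from overlapping is to place them top to bottom and push each card down until it clears the previous one. The sketch below shows that idea in isolation; CardInput, CardPlacement, layoutCards and the 8px default gap are assumptions for the example, and the top values are expected to come from the editor (for instance the bounds of each comment's first character).

```typescript
// Desired top of each card (its comment's distance from the document top) and the card's height.
type CardInput = { id: string; top: number; height: number };
type CardPlacement = { id: string; translateY: number };

// Greedy layout: sort by desired top, then move a card down whenever it
// would overlap the card placed before it.
function layoutCards(cards: CardInput[], gap = 8): CardPlacement[] {
  const sorted = cards.slice().sort((a, b) => a.top - b.top);
  const placements: CardPlacement[] = [];
  let nextFree = Number.NEGATIVE_INFINITY; // first y coordinate not taken by an earlier card
  for (const card of sorted) {
    const y = Math.max(card.top, nextFree);
    placements.push({ id: card.id, translateY: y });
    nextFree = y + card.height + gap;
  }
  return placements;
}

// Cards "a" and "b" want overlapping positions; "c" is far enough away to stay put.
const cardLayout = layoutCards([
  { id: "b", top: 30, height: 40 },
  { id: "a", top: 10, height: 50 },
  { id: "c", top: 200, height: 20 },
]);
// Each result can be applied as: element.style.transform = `translateY(${translateY}px)`
```

This greedy pass only ever pushes cards down. Keeping the currently selected card aligned with its comment would need an extra step that lays out the cards above it upward from that fixed point.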

Another point worth noting is how to obtain the position of comments from the document's top. Editors typically provide relevant APIs for this purpose; for instance, in Quill, the specific range's rect can be obtained using editor.getBounds(index, 0). Why is it important to pay attention to this? Because the implementation here is quite interesting; since our selection is not necessarily a complete DOM element and may involve selecting only a few characters within a text, we cannot directly get the position of this DOM node, as it might be within a long paragraph where line breaks have occurred, causing vertical offset. In such cases, we need to construct a Range and use Range.getClientRects method to get selection information. Typically, taking the position of the first character within the selection directly using Range.getBoundingClientRect suffices. After obtaining this position, additional calculations based on the editor's position information are needed, which won't be elaborated here.

const node = $0; // DevTools console: $0 is the element selected in the Elements panel, here <strong>123123</strong>
const text = node.firstChild; // "123123"
const range = new Range();
range.setStart(text, 0);
range.setEnd(text, 1);
const rangeRect = range.getBoundingClientRect();
const editorRect = editor.container.getBoundingClientRect();
const selectionRect = editor.getBounds(editor.getSelection().index, 0);
rangeRect.top - editorRect.top === selectionRect.top; // true
rangeRect.left - editorRect.left === selectionRect.left; // true

CRDT

In the above implementation, we used the OT approach to resolve the synchronization issue of comment positions. Essentially, we are solving the synchronization problem through collaboration. Similarly, we can also use the CRDT collaborative solution to address this problem. Here, we utilize yjs to implement comment position synchronization similar to the aforementioned OT. The relevant code for this part can be found in https://codesandbox.io/p/devbox/comment-crdt-psm548.

First, we define the yjs data structure, a Y.Doc. To make testing easier, we use IndexedDB for storage instead of communicating with a yjs backend over WebSocket, which lets us test everything locally. yjs has a built-in rich text type, obtained with getText, that in practice is equivalent to the quill-delta data structure. We bind this data structure to the editor with y-quill, the binding provided by yjs.

const ydoc = new Y.Doc();
new IndexeddbPersistence("y-indexeddb", ydoc);
// ...
const ytext = ydoc.getText("quill");
new QuillBinding(ytext, editor);

Next, we initialize a comment list as well, which is the content we persist. The difference is that this time we store CRDT relative positions, that is, the RelativePosition of yjs. Such a position is anchored to an item inside the document rather than to an absolute index, which follows from how the collaboration algorithm works. We won't go into detail here; the earlier articles on the OT and CRDT collaboration algorithms cover it. We then initialize the virtual layer, which is again used to display the comment positions. Before rendering, each relative position has to be converted to an absolute position. Note that a relative position is created from ytext, while an absolute position is created from a relative position using ydoc.

const COMMENT_LIST: [string, string][] = [];
const layerDOM = initLayerDOM();
const renderAllCommentWithRelativePosition = () => {
  const ranges: Range[] = [];
  for (const item of COMMENT_LIST) {
    const start = JSON.parse(item[0]);
    const end = JSON.parse(item[1]);
    const startPosition = Y.createAbsolutePositionFromRelativePosition(
      start,
      ydoc,
    );
    const endPosition = Y.createAbsolutePositionFromRelativePosition(end, ydoc);
    if (startPosition && endPosition) {
      ranges.push({
        index: startPosition.index,
        length: endPosition.index - startPosition.index,
      });
    }
  }
  renderLayer(layerDOM, ranges);
};

Similarly, on the comment button defined earlier we stop event propagation and prevent the default behavior. In particular, clicking the comment button should not take focus away from the editor, so its default behavior must be prevented.

const applyComment = document.querySelector(".apply-comment") as HTMLDivElement;
applyComment.onmousedown = (e) => {
  e.stopPropagation();
  e.preventDefault();
};

Next comes the behavior when the comment button is clicked. Here the selection has to be converted to relative positions. The createRelativePositionFromTypeIndex method takes our data type and an index and returns a relative position, which identifies the anchored item by its client and clock. After obtaining the relative positions for the start and end of the selection, we store them in COMMENT_LIST, render them on the virtual layer, and finally move the selection to the end of the comment.

applyComment.onclick = () => {
  const selection = editor.getSelection();
  if (selection) {
    const sel = { ...selection };
    console.log("Adding comment:", sel);
    const start = Y.createRelativePositionFromTypeIndex(ytext, sel.index);
    const end = Y.createRelativePositionFromTypeIndex(
      ytext,
      sel.index + sel.length,
    );
    COMMENT_LIST.push([JSON.stringify(start), JSON.stringify(end)]);
    renderAllCommentWithRelativePosition();
    editor.setSelection(sel.index + sel.length);
  }
};

So, when the text content changes, we just need to re-render it. Since we marked the relative position, we don't need to transform the selection here. We just need to re-render the virtual layer. After adding a comment and editing the content, we found that our comment position can follow the initial selection correctly. This also indicates that CRDT can synchronize comment positions based on relative positions. We don't need to transform or diff them, just keep the data structure complete for storage and updates. The subsequent comment panel implementation is basically the same. We use the Range object operation to obtain the comment position, then calculate the height based on the editor's position information.

editor.on("text-change", () => {
  renderAllCommentWithRelativePosition();
});
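To see why no transformation is needed, it helps to look at a toy model of the idea. The sketch below is not yjs and does not use its API: it is a deliberately tiny, single-client stand-in in which every character has a (client, clock) id, deletions leave tombstones, and a relative position is simply the id of the character it sits in front of. All type and function names are invented for the illustration.

```typescript
type ItemId = { client: number; clock: number };
type CharItem = { id: ItemId; char: string; deleted: boolean };
type ToyDoc = { client: number; clock: number; items: CharItem[] };
// null means "end of text".
type RelPos = { item: ItemId | null };

// Array slot of the visible character at `index` (items.length if index is at the end).
function slotOf(doc: ToyDoc, index: number): number {
  let visible = 0;
  let slot = 0;
  for (const item of doc.items) {
    if (!item.deleted) {
      if (visible === index) return slot;
      visible++;
    }
    slot++;
  }
  return slot;
}

function insertText(doc: ToyDoc, index: number, text: string): void {
  const fresh: CharItem[] = [];
  for (let i = 0; i < text.length; i++) {
    fresh.push({ id: { client: doc.client, clock: doc.clock++ }, char: text.charAt(i), deleted: false });
  }
  doc.items.splice(slotOf(doc, index), 0, ...fresh);
}

// Deleting only marks items, so their ids stay resolvable.
function deleteText(doc: ToyDoc, index: number, length: number): void {
  let visible = 0;
  for (const item of doc.items) {
    if (item.deleted) continue;
    if (visible >= index && visible < index + length) item.deleted = true;
    visible++;
  }
}

function toRelative(doc: ToyDoc, index: number): RelPos {
  const item = doc.items[slotOf(doc, index)];
  return { item: item ? item.id : null };
}

// Count the visible characters in front of the anchored item.
function toAbsolute(doc: ToyDoc, rel: RelPos): number {
  let visible = 0;
  for (const item of doc.items) {
    if (rel.item && item.id.client === rel.item.client && item.id.clock === rel.item.clock) return visible;
    if (!item.deleted) visible++;
  }
  return visible;
}

function visibleText(doc: ToyDoc): string {
  return doc.items.filter(item => !item.deleted).map(item => item.char).join("");
}

const toyDoc: ToyDoc = { client: 1, clock: 0, items: [] };
insertText(toyDoc, 0, "hello world!");
// Comment on "world": persist the anchors as JSON, just like COMMENT_LIST does.
const persisted = JSON.stringify({ start: toRelative(toyDoc, 6), end: toRelative(toyDoc, 11) });
insertText(toyDoc, 0, "Oh, ");
const anchors = JSON.parse(persisted) as { start: RelPos; end: RelPos };
const resolvedStart = toAbsolute(toyDoc, anchors.start);
const resolvedEnd = toAbsolute(toyDoc, anchors.end);
```

The stored anchors never change; only their resolution to an index does, which is why the text-change handler can simply re-render.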

Daily Challenge

https://github.com/WindrunnerMax/EveryDay

References

https://quilljs.com/docs/delta
https://docs.yjs.dev/api/relative-positions
https://www.npmjs.com/package/quill-delta/v/4.2.2