English

Exploring Rich Text: Document Virtual Scrolling

Virtual scrolling is a technique that optimizes the performance of long lists by rendering only the visible portion of the document data, rather than the entire document structure. It improves browser efficiency by calculating the necessary rendered list items based on the visible area height and the container's scroll position, while avoiding the rendering of unnecessary view content. The advantage of virtual scrolling is that it greatly reduces DOM operations, thereby reducing rendering time and memory consumption. It solves issues such as slow page loading and lag, thus improving user experience.

Description

Recently, we received feedback from users regarding performance issues with our document editor when working with large documents that contained a lot of tables. The long content caused noticeable lag during the editing process and took a long time to render for the end user, resulting in a poor user experience. To address this, I tested a large document and found that the Largest Contentful Paint (LCP) reached 6896ms for the initial screen. Even with various resources cached, the First Contentful Paint (FCP) still took 4777ms. The rendering time for the initial screen of the editor alone was 2505ms, and the entire application's Time to Interactive (TTI) reached 13343ms. Even under simulated rapid input, the frames per second (FPS) could only maintain at 5+, and the DOM count reached 24k+. As a result, this problem was quite serious, and I embarked on a lengthy journey of research and optimization.

Solution Research

During my research, I found that there were hardly any articles on performance optimization for online document editing. So, for me, it was essentially starting from scratch to research the entire solution. However, the community had various performance optimization solutions for virtual scrolling, which greatly helped in implementing the overall solution. Furthermore, I also considered whether it was appropriate to put all the content in one document. It seemed no different from putting all the code in one file. I felt that there might be better solutions in terms of organizational structure. However, that was a different issue and here, we focused on addressing the performance issues of large documents.

Progressive Pagination Loading: By using a data-driven approach, we can progressively fetch data in chunks. This can be done either by requesting pages one by one or by using Server-Sent Events (SSE). The data can then be rendered gradually, reducing the initial rendering time. During rendering, we can also dynamically render based on the currently displayed page, reducing unnecessary rendering and improving performance. Notion, for example, completely relies on data-driven pagination loading. However, it does not implement data lazy loading. It's important to note that lazy loading and lazy rendering are two different concepts. In practice, this solution heavily relies on the data design of the document. If the structure is similar to nested JSON blocks, implementing a similar solution would be relatively simple. However, if the document is described using a flat structure, especially with nested block concepts, implementing this approach becomes relatively difficult.
Canvas Pagination Rendering: Many online document editors now use Canvas for rendering, such as Google Docs and Tencent Docs. This reduces DOM operations. The advantage of using Canvas is the ability to customize rendering logic and achieve complex rendering and layout effects. However, the drawback is that everything needs to be manually laid out. This may limit flexibility for document editors with complex content. In practice, drawing documents with Canvas is similar to how Word works. During initialization, a blank space structure with a fixed height based on the number of pages is constructed. As the user scrolls, the Canvas for a specific page within the viewport range is mounted to render its content on demand.
Line-level Virtual Scrolling: Most DOM-based online document editors have the concept of lines or paragraphs, such as Feishu Docs, Shimo Docs, and Yuque. Organizing content into paragraphs is the most natural way due to the structure expressed by the DOM. Hence, line-level virtual scrolling can be implemented, rendering only the lines within the currently visible range. Typically, we only apply virtual scrolling to the direct child elements of the main document, the line elements. Content within nested structures, such as lines within code blocks, are not subject to virtual scrolling. This simplifies the complexity of virtual scrolling while ensuring rendering performance.
Block-level Virtual Scrolling: Notion has popularized the trend of document editors being block-based. This approach allows better organization of document content and facilitates block structure reuse and management. With this approach, line-based representation would also be based on blocks. For example, Feishu Docs also uses this structure to organize content. In this case, we can implement block-level virtual scrolling by rendering only the visible blocks within the current range. In practice, when dealing with larger independent blocks, there may still be room for performance optimization. Feishu Docs, for instance, handles code blocks differently and implements virtual scrolling even within nested structures. For non-block-based document editors, block-level virtual scrolling is still a good choice. This elevates the granularity of virtual scrolling to blocks, allowing the implementation of virtual scrolling for complex structures such as code blocks, tables, and flowcharts.

Virtual Scrolling

Before diving into the implementation, I pondered an interesting question: why does virtual scrolling optimize performance? When we perform DOM operations in the browser or manage windows on a PC, are these elements truly present? The answer is clear: these views, windows, and DOM elements are all simulated through graphics. Although we can easily achieve various operations using system or browser APIs, the content is ultimately drawn by the system. It's fundamentally based on external input signals generating simulated state and behavior, including collision detection, which is all represented by a large amount of computation performed by the system.

Then right after, recently I wanted to learn the basic operations of Canvas, so I implemented a very basic graphic editor engine. Because the browser's Canvas only provides the most basic graphic operations, and lacks the convenience of DOM operations, all interaction events need to be simulated through mouse and keyboard events. In this process, an important point is to determine whether two graphics intersect, in order to decide whether to redraw the graphics on demand to improve performance. So let's imagine the simplest way to determine this: iterate through all the graphics to check if they intersect with the upcoming graphic. This can involve complex calculations, but if we can determine in advance that certain graphics are impossible to intersect, we can save a lot of unnecessary calculations. Similarly, the situation is the same for layers outside the viewport. If we can determine that a graphic is outside the viewport, there is no need to check its intersection, and it doesn't need to be rendered. The same goes for virtual scrolling – if we can reduce the number of DOM elements, we can reduce a lot of calculations and improve the runtime performance of the entire page. And as for the first-screen performance, it goes without saying that reducing the number of DOM elements will definitely make the first screen rendering faster.

Of course, the above is just my thoughts on improving document editing or runtime performance. In fact, there have been many discussions in the community about optimizing performance with virtual scrolling. For example, reducing the number of DOM elements can reduce the number of DOM elements that the browser needs to render and maintain, thereby reducing memory usage and allowing the browser to respond faster to user interactions. Moreover, browser reflow and repaint operations usually require a large amount of computation, which becomes more frequent and complex as the number of DOM elements increases. By reducing the number of DOM elements that need to be managed through virtual scrolling, rendering performance can be significantly improved. In addition, virtual scrolling also provides faster first-screen rendering, especially for large documents, where rendering the entire document all at once can easily result in long first-screen rendering times. It can also reduce the JavaScript performance overhead of maintaining component states in React, especially when there is Context involved. If not given enough attention, performance degradation issues may arise.

So after studying the advantages of virtual scrolling, we can start studying the implementation of virtual scrolling. Before diving into block-level virtual scrolling in the rich text editor, let's first study how virtual scrolling is generally implemented. Here, we take the List component in ArcoDesign as an example to study a common implementation of virtual scrolling. In the example provided by Arco, we can see that it passes the height attribute. Without this attribute, the virtual list cannot work properly. In fact, Arco calculates the height of the entire container by multiplying the number of list items with the height of each item. It is worth noting that the scroll container should be an element outside the virtual list container. For the area within the viewport, it can be actually offset using transform: translateY(Npx). When scrolling, we need to calculate the nodes that need to be rendered in the current viewport based on the actual scroll distance of the scroll bar, the height of the scroll container, and the height of the elements we defined, while other nodes are not actually rendered, thus achieving virtual scrolling. Of course, there are many other configurations for the Arco virtual list, which will not be covered here.

<List
  {/* ... */}
  virtualListProps={{
    height: 560,
  }}
  {/* ... */}
/>

By analyzing the common virtual scrolling of Arco's list, we can see that implementing virtual scrolling doesn't seem to be that difficult. However, in our online document scenario, implementing virtual scrolling may not be so straightforward. Here, let's first discuss the implementation of image rendering in documents. Usually, when uploading images, we record the size (i.e. width and height) of the image. During actual rendering, we ensure the aspect ratio of the image by using the maximum width and height of the container and object-fit: contain;. Even if the image is not fully loaded, its placeholder height is already fixed. However, in the context of our document structure, the height of our blocks is not fixed, especially for text blocks where the height can vary depending on factors such as font and browser width. We cannot know the height of a text block before it is rendered. This means that we cannot calculate the placeholder height of the document block structure in advance, thus the virtual scrolling of the document block structure needs to address the issue of varying block heights. Therefore, we need to implement a dynamic height virtual scrolling scheduling strategy to handle this scenario. Actually, if it's just dynamic height virtual scrolling, it's not particularly difficult. There are already many implementation solutions in the community. However, our document editor includes many complex modules, such as text selection, comments, anchor jumps, etc., which need to be compatible with the virtual scrolling solution in addition to the virtual scrolling of the document itself.

Module Design

In fact, there are many ways to implement a rich text editor. We won't discuss the difference between drawing rich text based on DOM or Canvas here. Here, we focus on a DOM-based rich text editor, such as Quill, which implements its own view layer drawing, while Slate relies on React to build the view layer. These two approaches have significant differences in implementing the view layer. In this article, we lean towards the implementation approach of Slate, which uses React to construct block-level virtual scrolling. However, in practice, if we have full control over the view layer, there will be more room for performance optimization. For example, it is easier to schedule idle-time rendering and caching strategies, thereby optimizing the experience during fast scrolling. In fact, regardless of the approach, there is not a big difference in terms of the core content of this article. As long as we ensure the correct scheduling of modules controlled by the rich text engine itself, such as the selection module, height calculation module, lifecycle module, etc., and control the actual rendering behavior, any editing engine can apply a virtual scrolling solution.

Rendering Model

First, let's envision the rendering model of the entire document. Whether it's based on a block-based editor or a paragraph-based editor, it cannot escape the concept of lines, because when we describe content, it's usually composed of lines to form a document. Therefore, our document rendering is also described based on lines. Of course, here, the line is just a rather abstract concept. The nested structure within this line structure may be an expression of a block structure, such as code blocks, tables, and so on. Regardless of how the blocks are nested, the outermost layer will always need to encompass the line structure. Even for a pure Blocks document model, we can always find the outer block container DOM structure. Therefore, we need to clearly define the concept of lines here.

In fact, the lines we are concerned with here tend to lean more towards directly describing the main document. If a code block structure is nested within a line of the main document, the entire block structure of this code block is what we need to focus on, and we will not pay too much attention to the internal structure of this code block for now. This can be further optimized, particularly for scenarios involving very large code blocks, but we will not focus on this structural optimization here. In addition, documents drawn on the Canvas or documents expressed in a paginated manner are also not within our scope of concern. As long as an article can be expressed through pagination, we can directly render it as needed. Of course, if necessary, we can also perform paragraph-level on-demand rendering, which can also be considered as further optimization space.

Based on this, we can easily deduce the structure that our document ultimately needs to render. Firstly, there is the placeholder area, which contains content that is not within the viewport and therefore exists in a placeholder form. Next is the buffer, which contains the pre-rendered content. Although this area is not within the viewport, it preloads part of the view content to minimize the occurrence of brief white screens when the user scrolls, and usually, this part can be the size of half the viewport height. Following that is the viewport portion, containing the actual content to be rendered within the viewport. Similarly, within the viewport area, we also need buffer and placeholder to serve as pre-loaded and placeholder areas.

placeholder 
   |
 buffer
   | 
viewport 
   |
 buffer
   | 
placeholder

It is important to note that for the placeholder here, we typically choose to directly use DOM for placement. Some may think that using translate directly is a better choice, as it may be more efficient and trigger GPU acceleration. However, in an ordinary virtual list, using translate should not pose any issues. But in a document structure, the DOM structure can be more complex, and using translate may lead to some unexpected situations, especially in complex style structures. Therefore, using DOM for placement is a simpler approach. In addition, because of the existence of the selection module, when implementing the placeholder, we also need to consider cases where users drag a long selection. That is, if the user continuously scrolls a portion of the viewport, then directly drags it into the placeholder area, if this part of the DOM disappears and is replaced by a placeholder DOM node without special processing, there will be problems mapping the selection to the Model. Therefore, we need to retain these DOM nodes when the user selects, and using DOM for placement here can be more convenient. It can be more difficult to adapt using translate, so the rendering model at this point is as follows.

  placeholder 
      |
selection.anchor 
      |
  placeholder 
      |
    buffer
      | 
   viewpoint 
      |
   buffer
      | 
  placeholder 
      |
selection.focus 
      |
  placeholder

Scroll Scheduling

The essence of implementing virtual scrolling is to calculate the lines that need to be rendered in the current viewport based on the height of the viewport, the scrolling distance of the scroll container, and the height of the line when the user scrolls the view. The two most commonly used APIs for virtual scrolling in the browser are the Scroll Event and Intersection Observer API. The former calculates the position of the viewport by listening for scroll events, while the latter determines the position of an element by observing its visibility. Based on these two APIs, different virtual scrolling solutions can be implemented.

First, let's look at the Scroll Event. This is the most common way of monitoring scrolling - by listening for scroll events, we can obtain the scrolling distance of the scroll container. Then, by calculating the viewport height and scrolling distance, we can calculate the lines that need to be rendered in the current viewport, and then decide whether to render based on the calculated status in the view layer. In practice, a virtual scrolling solution based solely on monitoring the Scroll event is very straightforward. However, it is also more likely to encounter performance issues, and there may still be lag even if marked as a Passive Event. The core idea is to listen for scroll events on the scroll container. When a scroll event is triggered, we need to calculate the nodes within the current viewport based on the scroll position, and then calculate the actual nodes that need to be rendered based on the height of the nodes, thereby achieving virtual scrolling.

As mentioned earlier, it is relatively easy to implement virtual scrolling with a fixed height. However, our document blocks have dynamic heights, and we cannot know their actual heights until the blocks are rendered. So what is the difference between dynamically sized virtual scrolling and fixed-height virtual scrolling?

Firstly, the height of the scrolling container is unknown at the beginning. We can only determine the actual height through the continuous rendering process. Secondly, we cannot directly calculate the nodes that need to be rendered based on the scroll height. In the case of fixed heights, we calculate the starting index cursor for rendering based on the scroll container height and the total height of all nodes in the list. However, in dynamic height virtual scrolling, we cannot obtain the total height, and the length of the rendered nodes is also unknown. Thus, we cannot know how many nodes need to be rendered in this instance.

Furthermore, it is difficult to determine the distance between each node and the top of the scrolling container, as mentioned earlier with translateY. We need this height to support the scrolling area, allowing us to achieve the scrolling effect.

Some may argue that these values cannot be calculated, but that is not the case. Without any optimization, we can brute force iterate through and calculate these values.

Now let's figure out how to calculate the above content. Based on our previous discussion, documents are essentially based on block virtual scrolling. We can directly calculate the total height by summing the heights of all blocks. Here, it's important to note that even though we cannot obtain the height before rendering, we can estimate it based on the data structure and correct the height during actual rendering. Remember that we use placeholder blocks to support the scrolling area. Therefore, we need to calculate the specific placeholders based on the start and end cursors. We will calculate the specific cursor values later, but for now, let's calculate the height of the two placeholder nodes and render them in their respective positions.

const startPlaceHolderHeight = useMemo(() => {
  return heightTable.slice(0, start).reduce((a, b) => a + b, 0);
}, [heightTable, start]);

const endPlaceHolderHeight = useMemo(() => {
  return heightTable.slice(end, heightTable.length).reduce((a, b) => a + b, 0);
}, [end, heightTable]);

return (
  <div
    style={{ height: 500, border: "1px solid #aaa", overflow: "auto", overflowAnchor: "none" }}
    onScroll={onScroll.run}
    ref={onUpdateInformation}
  >
    <div data-index={`0-${start}`} style={{ height: startPlaceHolderHeight }}></div>
    {/* ... */}
  <div data-index={`${end}-${list.length}`} style={{ height: endPlaceHolderHeight }}></div>
  </div>
);

Now we have an approximate estimation of the total height. Next, we need to determine the positions of the start and end cursors, i.e., the actual indices of the blocks to be rendered. For the start cursor, we can directly calculate it based on the scroll height. We iterate through the heights of the nodes until we find a node that exceeds the scroll height. At that point, we consider the cursor to be the index of the first node to be rendered.

For the end cursor, we need to calculate it based on the start cursor and the height of the scrolling container. Similarly, we iterate through the heights of the nodes until we find a node that exceeds the height of the scrolling container. At that point, we consider the cursor to be the index of the last node to be rendered.

Of course, we should not forget the buffer data in this calculation. This is crucial to avoid blank areas during scrolling. Additionally, we are using a brute force method to calculate these cursor values. For modern machines and browsers, the performance cost of performing addition calculations is relatively low. For example, if we perform 10,000 addition operations, the actual time consumption might be less than 1ms.

const getStartIndex = (top: number) => {
  const topStart = top - buffer.current;
  let count = 0;
  let index = 0;
  while (count < topStart) {
    count = count + heightTable[index];
    index++;
  }
  return index;
};

const getEndIndex = (clientHeight: number, startIndex: number) => {
  const topEnd = clientHeight + buffer.current;
  let count = 0;
  let index = startIndex;
  while (count < topEnd) {
    count = count + heightTable[index];
    index++;
  }
  return index;
};

const onScroll = useThrottleFn(
  () => {
    if (!scroll) return void 0;
    const scrollTop = scroll.scrollTop;
    const clientHeight = scroll.clientHeight;
    const startIndex = getStartIndex(scrollTop);
    const endIndex = getEndIndex(clientHeight, startIndex);
    // ...
  },
);

Here we are discussing the most basic principle of virtual scrolling, so there are basically no optimizations in this example. It is obvious that our traversal processing for heights is relatively inefficient. Even though the overhead of performing thousands of addition calculations is not significant, it is still advisable to avoid such extensive computations in large-scale applications, especially in cases where the Scroll Event is triggered at a very high frequency. Therefore, an obvious optimization direction is to implement height caching. In simple terms, we can cache the heights that have already been calculated, so that the cached heights can be used directly in subsequent calculations without the need for re-traversal. When there are height changes that require updating, we can recalculate the cached height from the current node to the newest cached node. Additionally, this approach is equivalent to an incrementally sorted array, and we can solve the lookup problem through methods such as binary search, thereby minimizing the need for extensive traversal calculations.

height: 10 20 30 40  50  60  ...
cache:  10 30 60 100 150 210 ...

IntersectionObserver has now been marked as Baseline Widely Available. All browsers released after March 2019 have implemented this API, and it is now very mature. Next, let's take a look at the implementation of virtual scrolling using the Intersection Observer API, but before delving into the specific implementation, let's first examine the specific application scenarios of IntersectionObserver. Based on its name, we can infer that the main goal of this API is to asynchronously observe the intersection of target elements with their ancestor elements or the top-level document viewport, which is very useful for determining whether an element appears within the viewport.

At this point, we need to consider a potential issue. The application scenario of the IntersectionObserver object is to observe the intersection of target elements with the viewport. However, the core concept of our virtual scrolling is not to render elements outside the viewport. Therefore, there is actually a deviation here. In virtual scrolling, the target elements either do not exist or are not rendered, so their state cannot be observed. To align with the concept of IntersectionObserver, we need to render actual placeholder nodes. For example, for a list of 10,000 nodes, we need to first render 10,000 placeholder nodes. In reality, this is a reasonable approach, unless we initially notice performance issues with the document. In fact, most performance optimization is done later, especially in complex scenarios. For example, if we originally had 10,000 data items and even if only 3 nodes are rendered for each item, rendering placeholder nodes would still reduce the original 30,000 nodes on the page to approximately 10,000 nodes. This is highly meaningful for performance improvement, and if necessary, can continue to be further optimized.

Of course, if we don't use placeholder nodes, we can still implement virtual scrolling using Intersection Observer, but in this case, we would need to rely on Scroll Event to assist in implementing some forced refresh operations, making the overall implementation more complicated. Therefore, let's proceed with the implementation of a virtual scrolling solution based on IntersectionObserver and placeholder nodes. First, we need to create an IntersectionObserver. Similarly, because our scroll container may not necessarily be the window, we need to create the IntersectionObserver on the scroll container. In addition, as discussed earlier, we will add a buffer to the viewport area to preload elements outside the viewport, thereby avoiding blank areas during user scrolling. The size of this buffer is usually selected as half the current viewport height.

useLayoutEffect(() => {
  if (!scroll) return void 0;
  // Viewport threshold set to half the scroll container height
  const margin = scroll.clientHeight / 2;
  const current = new IntersectionObserver(onIntersect, {
    root: scroll,
    rootMargin: `${margin}px 0px`,
  });
  setObserver(current);
  return () => {
    current.disconnect();
  };
}, [onIntersect, scroll]);

Next, we need to manage the state of the placeholder nodes. Since we currently have actual placeholders, we no longer need to estimate the height of the entire container. We only need to render the nodes when they are actually scrolled to the relevant position. We set three states for the nodes: loading state represents the placeholder state, where the node only renders an empty placeholder or a 'loading' indicator. At this point, we do not know the actual height of the node. viewport state represents the actual rendering state of the node, indicating that the node is within the logical viewport. We can record the actual height of the node at this point. placeholder state represents the rendered placeholder state, indicating that the node has scrolled from within the viewport to outside the viewport. The height of the node has been recorded at this point, so we can set the node's height to its actual height.

loading -> viewport <-> placeholder

type NodeState = {
  mode: "loading" | "placeholder" | "viewport";
  height: number;
};

public changeStatus = (mode: NodeState["mode"], height: number): void => {
  this.setState({ mode, height: height || this.state.height });
};

render() {
  return (
    <div ref={this.ref} data-state={this.state.mode}>
      {this.state.mode === "loading" && (
        <div style={{ height: this.state.height }}>loading...</div>
      )}
      {this.state.mode === "placeholder" && <div style={{ height: this.state.height }}></div>}
      {this.state.mode === "viewport" && this.props.content}
    </div>
  );
}

Of course, the observation of our Observer also needs configuration. Here, it is important to note that the callback function of IntersectionObserver only carries information about the target node. We need to use the node information to manage the node state. Therefore, we use WeakMap to establish a relationship between elements and nodes, making it easier for us to handle.

export const ELEMENT_TO_NODE = new WeakMap<Element, Node>();

componentDidMount(): void {
  const el = this.ref.current;
  if (!el) return void 0;
  ELEMENT_TO_NODE.set(el, this);
  this.observer.observe(el);
}

componentWillUnmount(): void {
  const el = this.ref.current;
  if (!el) return void 0;
  this.observer.unobserve(el);
}

Finally, the actual scrolling schedule is needed. When a node appears in the viewport, we need to obtain the node information based on ELEMENT_TO_NODE, and then set the state based on the current viewport information. If the current node is in the viewport, we set the node state to viewport. If the node is leaving the viewport, we need to further determine the current state. If it is not the initial loading state, we can directly set the height to placeholder in the node state. At this point, the node's height is the actual height.

const onIntersect = useMemoizedFn((entries: IntersectionObserverEntry[]) => {
  entries.forEach(entry => {
    const node = ELEMENT_TO_NODE.get(entry.target);
    if (!node) {
      console.warn("Node Not Found", entry.target);
      return void 0;
    }
    const rect = entry.boundingClientRect;
    if (entry.isIntersecting || entry.intersectionRatio > 0) {
      // Entering viewport
      node.changeStatus("viewport", rect.height);
    } else {
      // Leaving viewport
      if (node.state.mode !== "loading") {
        node.changeStatus("placeholder", rect.height);
      }
    }
  });
});

The performance optimization methods continued to be discussed in this article are all based on the implementation of the Intersection Observer API. In the document, each block may contain hundreds of nodes, especially in complex expressions such as tables. Moreover, the number of immediate blocks or rows under the main document is usually not very high, so the optimization for the number of nodes is significant.

Previously, I saw a question on Zhihu about why Python's built-in sort is 100 times faster than self-written quicksort. This made me think of this question every time I see the Intersection Observer API. The main reason for this is that Python's standard library is implemented in C/C++, which makes its execution efficiency much higher than that of Python, an interpreted scripting language. The same issue applies to the Intersection Observer API, which is implemented at the browser's low level using C/C++, resulting in much higher execution efficiency than when we use JS for scroll scheduling. However, with the aid of JIT compilation, perhaps the performance gap is not that big.

State Management

In our document editor, virtual scrolling is not just about simple scroll rendering, but also involves managing various states. Typically, our editor already has a block manager, which manages the entire Block Tree state based on various changes. Essentially, this involves the manipulation of the tree structure, such as when a trigger operation is to "insert { parentId: xxx, id: yyy}", where we need to add a new node yyy under the node xxx. The management of this tree structure depends on the specific business implementation. For example, if the editor retains a block in the tree for the convenience of undo/redo instead of actually deleting it, the block manager's state management becomes about adding only, not deleting. Therefore, the implementation of the block manager depends on the specific editor engine.

Here, our focus is on extending the capabilities of this Block Engine to incorporate virtual scrolling state. If it were just a matter of adding new states, it would be a simple problem. However, we also need to consider the issue of nested block structures, in preparation for our subsequent scenarios. As mentioned earlier, we are currently focusing on the direct management of blocks within the main document. For nested structures, when a direct block is in a placeholder state, we need to set all nested blocks inside it to a placeholder state. This process is recursive and may involve a significant amount of calls, so we need to implement a caching layer to reduce redundant calculations.

Our approach here is to set a cache on each node, which stores references to all the child nodes in the subtree. This is a typical trade-off of space for time, but because it stores references, the space consumption is not significant. The advantage of this approach is that, for example, if a user keeps modifying the structure of a certain child node, caching will only re-calculate the content of that node, while other child nodes can directly access the cached content without the need for re-computation. It should be noted that when appending or removing child nodes from the current node, we need to clear the cache for that node and all parent nodes on the link, and recalculate them as needed for the next call. Actually, due to the granularity of our editor, which is based on changes for scheduling, achieving fine-grained structure management is not very difficult.

public getFlatNode() {
  if (this.flatNodes) return this.flatNodes;
  const nodes: Node[] = [];
  this.children.forEach(node => {
    nodes.push(node);
    nodes.push(...node.getFlatNode());
  });
  this.flatNodes = nodes;
  return nodes;
}

public clearFlatNode() {
  this.flatNodes = null;
}

public clearFlatNodeOnLink() {
  this.clearFlatNode();
  let node: Node | null = this.parent;
  while (node) {
    node.clearFlatNode();
    node = node.parent;
  }
}

Now that we have a complete block manager, the next step is to consider how to schedule and control the rendering behavior. If our editor engine had a self-developed view layer, the controllability would certainly be very high, and it would not be difficult to control rendering behavior or implement rendering caches. However, as mentioned earlier, we tend to use React as the view layer for scheduling. Therefore, we need a more common management solution. The advantage of using React as the view layer is the ability to achieve rich custom view rendering using the ecosystem. However, the problem lies in the difficulty of control, including not only rendering scheduling behavior, but also issues related to the mapping between Model and View, as well as problems related to the reuse of ContentEditable. Nonetheless, these are not the focus of this article. Let's first discuss more common rendering control methods.

First of all, let's think about how to control the rendering of DOM nodes in React. Obviously, we can manage the rendering state through State, or control the rendering through ReactDOM.render/unmountComponentAtNode. As for directly manipulating the DOM through Ref, this method may be difficult to control and may not be the best management approach.

Let's start by looking at ReactDOM.render/unmountComponentAtNode. This API was deprecated in React18, and although it may change in the future, the main issue is that using render will result in the inability to directly share Context, meaning it will detach from the original React Tree and must be re-integrated with Context in order to achieve this, which is clearly not suitable.

Therefore, ultimately, we still control the rendering state through State. At this point, we also need a global document manager to control the state of all block nodes. In React, obviously, we can accomplish this through Context, using global state changes to affect the state of each ReactNode. However, this essentially shifts control to the various child nodes to manage their own state. We may want to have a global controller to manage all the blocks. In order to achieve this, we implement the LayoutModule module to manage all the nodes, and for the nodes themselves, we need to wrap them with an HOC. To facilitate this, we choose class components to accomplish this, allowing us to manage the state of all block structure instances through the LayoutModule module.

class LayoutModule {
  private instances: Map<string, HOC> = new Map();
  // ...
}

class HOC extends React.PureComponent<Props> {
  public readonly id: string;
  public readonly layout: LayoutModule;
  // ...
  constructor(props: Props) {
    // ...
    this.layout.add(this);
  }
  componentWillUnmount(): void {
    this.layout.remove(this);
    // ...
  }
  // ...
}

Using class components, the entire component instantiation is an object, making it easier to write function calls and control states. Of course, these implementations can also be achieved through function components, but using class components might be more convenient. We can then control their state through class methods. Additionally, we need to use ref to obtain the current component's observed node. Using ReactDOM.findDOMNode(this) can acquire the reference to the DOM in a class component, but it has also been deprecated, so it is not recommended to use. Instead, we wrap a layer of DOM and observe this layer of DOM to achieve virtual scrolling. Furthermore, we need to note that the DOM rendering is actually controlled by React and is uncontrollable for our application, so we also need to record prevRef to observe the change in the DOM reference and update the IntersectionObserver observation object accordingly.

type NodeState = {
  mode: "loading" | "placeholder" | "viewport";
  height: number;
};

class HOC extends React.PureComponent<Props> {
  public prevRef: HTMLDivElement | null;
  public ref: React.RefObject<HTMLDivElement>;
  // ...
  componentDidUpdate(prevProps: Props, prevState: State): void {
    if (this.prevProps !== this.ref.current) {
      this.layout.updateObserveDOM(this.prevProps, this.ref.current);
      this.prevProps = this.ref.current;
    }
  }
  public changeStatus = (mode: NodeState["mode"], height: number): void => {
    this.setState({ mode, height: height || this.state.height });
  };
  // ...
  render() {
    return (
      <div ref={this.ref} data-state={this.state.mode}>
        {/* ... */}
      </div>
    );
  }
}

Selection Status

In the selection module, we need to ensure that the view's status can be correctly mapped to the Model. During virtual scrolling, the DOM may not actually be rendered on the page, and the browser's selection expression needs to be determined by the anchorNode and focusNode nodes, so we need to ensure that these two nodes are rendered normally in the DOM tree during the user's selection process. Implementing this capability is not actually complicated, as long as we understand the browser's selection model and ensure that the anchorNode and focusNode nodes are rendered normally. By ensuring the correct rendering of nodes, we do not need to redesign the selection model in the virtual scrolling scenarios. Based on this, let's deduce some scenarios.

Selection within the viewport: When the user selects relevant blocks within the viewport, we can consider this part of the selection to be processed normally with or without virtual scrolling, without the need for additional deduction. The original View Model mapping logic can be maintained.
Scrolling selection outside the viewport: When the user initially selects content within the viewport, the selection is normal. However, if the user later scrolls the viewport, causing part of the selection to scroll outside the viewport, we need to preserve the selection status. Failing to do so could result in the loss of the selection once the user scrolls back. In this case, we need to ensure that the anchorNode and focusNode nodes are correctly rendered, or if the granularity is coarse, ensure that the blocks in which they are located are rendered normally.
Drag-selecting a long range: When the user performs a MouseDown while the anchorNode is within the viewport, and then uses drag to scroll the page, consequently dragging the anchorNode outside the viewport. Similarly, in this scenario, we need to ensure that the block/node where the anchorNode is located is rendered normally, even if it is outside the viewport, to prevent the loss of the selection.
Triggering selection update: When the content within the selection is updated due to certain operations, such as through the editor's API operations on the document content, two scenarios may arise. If the updated content is not the anchorNode or focusNode node, it will not affect the overall selection. Otherwise, we need to recalibrate the selection nodes through the Model after rendering is completed.
Selecting all: The "select all" operation is considered a special selection behavior, and we need to ensure that the first and last nodes of the document are fully rendered. Therefore, in this process, the states of the first and last nodes need to be obtained through the Model, and then forcibly rendered to ensure that the anchorNode and focusNode nodes are correctly rendered, followed by the normal selection mapping logic.

In fact, do you remember that our Intersection Observer API usually requires placeholder nodes to achieve virtual scrolling? So, since the placeholder nodes are already present, if we do not pay special attention to the number of DOM nodes, we can render the block's selection-marking node when creating the placeholder. This can solve some problems, for example, the "select all" operation can be handled without special treatment. If we broaden the scope a bit, when creating the placeholder, we can render the text blocks as well as the Void/Embed structure \u200B nodes at the same time, and only schedule the rendering of complex blocks. In this case, we may not even need to worry about the selection, as the nodes needed for selection mapping are already rendered. We only need to focus on scheduling the virtual scrolling of complex blocks.

Viewport Locking

Viewport locking is a significant module. In the case of virtual scrolling, if we always start browsing from the beginning of the list, viewport locking is usually not necessary. However, for our document system, the situation is different. Let's imagine a scenario: when user A shares a link with an anchor to user B, and user B opens the link and directly navigates to a specific heading or even a specific block content area in the document. If user B then scrolls upward, a problem will arise. As mentioned before, we cannot obtain the actual height of the block before we actually render the content. Therefore, when the user scrolls up, because there is a disparity between the height of our placeholder node and the actual height of the block, visual jumping occurs. The purpose of viewport locking is to address this issue by locking the user's view at the current scroll position.

Before delving into the specifics of virtual scrolling, let's first understand the overflow-anchor property. In the actual implementation of the editor engine, a major challenge lies in achieving compatibility across various browsers. This is evident with the overflow-anchor property, as even browsers based on the Webkit engine, such as Chrome and Safari, differ in their support. Returning to the overflow-anchor property, it is designed to adjust the scroll position to minimize content movement, thereby addressing the visual jumping issue mentioned earlier. This property is enabled by default in supporting browsers. Since Safari does not support this property, and considering that we actually need the value of this jumping disparity, here we need to disable the default behavior of overflow-anchor and proactively control the ability to lock the viewport. However, as obtaining DOM Rect data is unavoidable when locking the viewport, manual intervention in viewport locking may trigger more reflow/repaint actions.

class LayoutModule{
  private scroll: HTMLElement | Window;
  // ...
  public initLayoutModule() {
    // ...
    const dom = this.scroll instanceof Window ? document.body : this.scroll;
    dom.style.overflowAnchor = "none";
  }
}

Apart from overflow-anchor, another property we need to pay attention to is History.scrollRestoration. You might notice that when you navigate to a specific position on a page, then click a hyperlink to go to another page, and then navigate back, the browser remembers the previous scroll position. However, in our case, with virtual scrolling in place, we don't want the browser to control this navigation behavior because it might not accurately remember the position. Now that scrolling behavior needs active management, we need to disable this behavior in the browser.

class LayoutModule{
  // ...
  public initLayoutModule() {
    // ...
    if (history.scrollRestoration) {
      history.scrollRestoration = "manual";
    }
  }
}

Now, let's consider what other scenarios might affect our viewport locking behavior. It's quite evident that during a Resize, container width changes, consequently affecting the height of text blocks, thus requiring adjustments to our viewport locking behavior here as well. Our adjustment strategy here is relatively simple. Consider that the only states in which we need to adjust viewport locking are from loading -> viewport, as in other state changes, the height remains stable due to our placeholder state fetching the real height. However, in the case of Resize, even with placeholder, we might need to reapply viewport locking because the height might not be the actual rendering height. Thus, our logic is to re-mark all nodes in the placeholder state during a Resize for viewport locking.

class HOC extends React.PureComponent<Props> {
  public isNeedLockViewport = true;
  // ...
}

class LayoutModule {
  // ...
  private onResize = (event: EditorResizeEvent) => {
    const { prev, next } = event;
    if (prev.width === next.width) return void 0;
    for (const instance of Object.values(this.instances)) {
      if (instance.state.mode === "placeholder") {
        instance.isNeedLockViewport = true;
      }
    }
  };
}

Next, let's delve into our actual viewport locking method. The approach here is still rather straightforward. When our component undergoes rendering changes, we need to fetch height information via the component's state. Then, based on this height data, we calculate the difference and adjust the scrollbar position accordingly. We also need to obtain information about the scrolling container. If the observed node's top value is above the scrolling container, we need to lock the viewport due to height changes. When adjusting the scrollbar position, we must not use a smooth animation but explicitly set its value to prevent viewport locking failure and avoid value retrieval issues with multiple calls. Additionally, it's important to note that since we calculate height based on actual measurements, using margin might lead to calculation issues, such as margin collapsing. Therefore, our principle here is to use padding wherever possible for spacing adjustments in block structures, minimizing the use of margin.

class LayoutModule {
  public offsetTop: number = 0;
  public bufferHeight: number = 0;
  private scroll: HTMLElement | Window;
  // ...
  public updateLayoutInfo() {
    // ...
    const rect = this.scroll instanceof Element && this.scroll.getBoundingClientRect();
    this.offsetTop = rect ? rect.top : 0;
    const viewportHeight = rect ? rect.height : window.innerHeight;
    this.bufferHeight = Math.max(viewportHeight / 2, 300);
  }
  // ...
  public scrollDeltaY(deltaY: number) {
    const scroll = this.scroll;
    if (scroll instanceof Window){
      scroll.scrollTo({ top: scroll.scrollY + deltaY });
    } else {
      const top = scroll.scrollTop + deltaY;
      scroll.scrollTop = top;
    }
  }
  // ...
}

Quick Scroll

When users scroll quickly, there may be a short white screen due to the existence of virtual scrolling. To avoid this problem as much as possible, we still need a certain scheduling strategy. The buffer we previously set on the view layer can partially solve this problem, but it is still not enough in fast scrolling scenarios. Of course, in reality, the white screen time is usually not too long, and in cases where there are placeholder nodes, the user experience is usually acceptable. Therefore, the optimization strategy here still needs to be based on specific user needs and feedback. After all, one of our goals for virtual scrolling is to reduce memory usage. When scrolling quickly, it is usually necessary to schedule the rendering of more blocks in the scrolling direction. This will inevitably increase memory usage. Therefore, we still need to find a balance between white screens during scrolling and memory usage.

Let's first think about our fast scrolling strategy. After users perform a large range of scrolling, they are likely to continue scrolling in the same direction. Therefore, we can customize the scrolling strategy. In the case of a sudden large number of block renderings or when the scrolling distance within a certain time slice is greater than N times the viewport height, we can determine the scrolling order based on the rendering order of the blocks, and then perform pre-rendering based on this order. The range of pre-rendering and the time interval for rendering scheduling need to be scheduled in the same way. For example, fast rendering should not exceed 100ms between two scheduling events, and the duration of fast rendering can be set to 500ms. The maximum rendering range can be defined as 2000px or N times the viewport length, depending on specific business requirements.

In addition, we can use idle rendering scheduling during idle time to avoid the white screen phenomenon in fast scrolling as much as possible. When users stop scrolling, we can use requestIdleCallback for idle rendering, and control it by manually setting the time interval. It can be similar to the scheduling strategy for fast scrolling, setting the rendering time interval and rendering distance, etc. If the view layer supports node caching, we can even prioritize caching the view layer without actually rendering it in the DOM structure. When the user scrolls to the relevant position, we can directly retrieve it from memory and place it in the node position. In addition, even if the view layer cache is not supported, we can try to calculate and cache the state of the nodes in advance to avoid the delay caused by calculating during rendering. However, this method will also increase memory usage, so we still need to find a balance between efficiency and space occupation.

placeholder
    |
  buffer
    |
  block 1
    |
  block 2
    |
  buffer
    |
pre-render ...
    |
placeholder

Incremental Rendering

In the previous discussion, we mainly talked about the rendering of blocks. Except for the selection module, which may involve the editing status, the other content is more inclined towards controlling the rendering status. However, during editing, new blocks will definitely be inserted, and this part of the content actually needs its management mechanism, otherwise it may cause some unexpected problems. Let's imagine a scenario where users insert a code block through the toolbar or shortcut input. If virtual scrolling is not enabled, the cursor should be placed directly inside the code block. However, due to the existence of virtual scrolling, the first frame will be a placeholder DOM, and then the block structure will be loaded normally. Therefore, since the ContentEditable block structure does not exist at this time, the cursor naturally cannot be placed correctly, and this will usually trigger the fallback strategy of the selection area, resulting in unexpected problems.

Therefore, when inserting nodes, we need to control them. The solution to this problem is very simple. Let's think about when to perform insertion operations. It must be after the entire editor has finished loading. At that time, where should the insertion take place? It is highly likely that the editing will be done in the viewport area. Therefore, our solution is to mark the Layout module as loaded after the editor is initially rendered. At this time, the initial state of the inserted HOC can be considered as viewport. In addition, often we may need to mark the order of HOC with an index tag. If we need to mark it where it is inserted, we usually need to rely on the DOM to determine its index.

Translate into English:

class LayoutModule {
  public isEditorLoaded: boolean = false;
  // ...
  public initLayoutModule() {
    // ...
    this.editor.once("paint", () => {
      this.isEditorLoaded = true;
    });
  }
}

class HOC extends React.PureComponent<Props> {
  public index: number = 0;
  // ...
  constructor(props: Props) {
    // ...
    this.state = {
      mode: "loading"
      // ...
    }
    if (this.layout.isEditorLoaded) {
      this.state.mode = "viewport";
    }
  }
  // ...
}

Actually, the modules we have here all need to provide the capabilities required by the editor engine. In many cases, we need to interact with the external main application, such as comments, anchors, find and replace, etc., which all require obtaining the status of the editor block. For example, our word commenting capability is a common scenario in document applications. The comments panel on the right side usually needs to obtain the height information of the text we select for displaying the position. However, because of the existence of virtual scrolling, this DOM node may not actually exist, so the actual module of the comment will also become virtualized, which means it is progressively loaded as the scrolling progresses. Therefore, we need the ability to interact with the external application. In fact, this part of the capability is relatively simple. We just need to implement an event mechanism to notify the main application when the status of the editor block changes. In addition to managing the block status, it is also very important to change the height value of the viewport lock, otherwise there will be jumping problems in the positioning of the comments panel.

class Event {
  public notifyAttachBlock = (changes: Nodes) => {
    if (!this.layout.isEditorLoaded) return void 0;
    const nodes = changes.filter(node => node.isActive());
    Promise.resolve().then(() => {
      this.emit("attach-block", nodes);
    });
  }

  public notifyDetachBlock = (changes: Nodes) => {
    if (!this.layout.isEditorLoaded) return void 0;
    const nodes = changes.filter(node => !node.isActive());
    Promise.resolve().then(() => {
      this.emit("detach-block", nodes);
    });
  }

  public notifyViewLock = (instance: HOC) => {
    this.emit("view-lock", instance);
  }
}

class HOC extends React.PureComponent<Props> {
  // ...
  componentDidUpdate(prevProps: Props, prevState: State): void {
    // ...
    if (prevState.mode !== "viewport" && this.state.mode === "viewport") {
      const changes = this.layout.blockManager.setBlockState(true);
      this.layout.event.notifyAttachBlock(changes);
    }
    if (prevState.mode !== "placeholder" && this.state.mode === "placeholder") {
      const changes = this.layout.blockManager.setBlockState(false);
      this.layout.event.notifyDetachBlock(changes);
    }
    if (this.isNeedLockViewport && this.state.mode === "viewport" && this.ref.current) {
      // ...
      this.layout.event.notifyViewLock(this);
    }
  }
  // ...
}

Scenario Inference

In our document editor, it is obvious that it is not enough to simply implement virtual scrolling. Various API compatibility must also be provided for it. In fact, the module design described above can also be part of the scenario inference, but the preceding content tends to be more focused on the design of functional modules within the editor, while our current scenario inference tends to be about the scenarios and interaction between the editor and the main application.

Anchor Jump

Anchor jump is a fundamental feature of our document system, especially when users share links, it is used more frequently. Some users even want to share arbitrary text positions. Similar to anchor jump, there may be problems when we have virtual scrolling. Imagine a situation where the user's hash value is in a block, which may not be rendered actually in the case of virtual scrolling. Therefore, both the default strategy of the browser and the capability provided by the editor will become ineffective. Therefore, we need to separately adapt the anchor jump scenario and create a separate module to control the positioning to certain locations.

We can clearly determine that after the virtual scrolling is integrated, the difference from the previous jump lies in the fact that the block structure may not have been rendered yet. In this case, we only need to schedule the block with the anchor to be rendered immediately after the page is loaded, and then schedule the original jump. Since there may be a situation where the jumping occurs during the loading, when the user jumps to a certain node, the block structure above it may be transitioning from loading to viewport state. In this case, we need the viewport locking capability described earlier to ensure that the user's viewport does not cause visual jump due to the difference in height caused by block state changes.

So here we define the locateTo method. In the parameters, we need to specify the Hash Entry that needs to be searched, which represents the structure of the anchor in the rich text data structure. Because we ultimately need to retrieve the DOM nodes through the data, if the blockId is not passed, we also need to find the Block to which the node belongs based on the Entry. In the options, we need to define buffer as the scroll position offset. Since the DOM node may already exist, we pass domKey to try to jump directly to the relevant position through the DOM. Finally, if we can determine the blockId, we will directly pre-render the relevant nodes; otherwise, we need to look up based on the key value from the data.

class Viewport {
  public async locateTo(
    key: string, 
    value: string, 
    options?: { buffer?: number; domKey?: string; blockId?: string }
  ) {
    const { buffer = 0, domKey = key, blockId } = options || {};
    const container = this.editor.getContainer();
    if (blockId) {
      await this.forceRenderBlock(blockId);
    }
    let dom: Element | null = null;
    if (domKey === "id"){
      dom = document.getElementById(value);
    } else {
      dom = container.querySelector(`[${domKey}="${value}"]`);
    }
    if (dom) {
      const rect = dom.getBoundingClientRect();
      const top = rect.top - buffer - this.layout.offsetTop;
      this.layout.scrollDeltaY(top);
      return void 0;
    }
    const entry = this.findEntry(key, value);
    if (entry) {
      await this.forceRenderBlock(entry.blockId);
      this.scrollToEntry(entry);
    }
  }
}

Actually, in most cases, we usually jump to the position of the title, and we don't even jump to the title of a nested block. So in this case, we can even independently schedule the blocks of type Heading, which means that it will be in the viewport state instead of the loading state when the HOC is loaded. In this way, the complexity of scheduling the anchor can be reduced to some extent. Of course, the independent position jump control capability is still necessary, as there are many other modules that may need it besides the anchor.

class HOC extends React.PureComponent<Props> {
  constructor(props: Props) {
    // ...
    if (this.props.block.type === "HEADING") {
      this.state.mode = "viewport";
    }
  }
}

Find and Replace

Find and replace is also a common feature in online documents. Usually, it is based on document data retrieval and marks the relevant positions in the document. It also has the capability to jump and replace. Due to the requirements of document retrieval and virtual layers in find and replace, our control scheduling becomes more dependent when using virtual scrolling. First of all, there is a jump issue in the find and replace scenario, which is similar to the anchor jump mentioned above. We need to render the relevant blocks when jumping and then proceed with the jump. In addition, find and replace also requires the rendering capability of the virtual layer VirtualLayer. When rendering the actual blocks, we also need to render the layer at the same time. In other words, our virtual layer module also needs to be rendered on-demand.

So next, we need to adapt its related API control capability. First of all, let's talk about the part of location jumping. Here, since our goal is to obtain the original data structure through retrieval, we don't need to retrieve the Entry again through key value. We can directly assemble the Entry data, and then find the corresponding Text node based on the mapping of Model and View. After that, use range to get its position information, and finally jump to the relevant position. Of course, the node information here may not necessarily be a Text node, it could also be a Line node and so on, so it's necessary to focus on the implementation of the editor engine. However, one thing to note here is that we need to ensure the rendering state of the Block in advance, which means we need to schedule forceRenderBlock to render the Block before the actual jump.

class Viewport {
  public scrollTo(top: number) {
    this.layout.scrollDeltaY(top - this.layout.offsetTop);
  }
  public getRawRect(entry: Entry) {
    const start = entry.index;
    const blockId = entry.blockId;
    const { node, offset } = this.editor.reflect.getTextNode(start, blockId);
    // ...
    const range = new Range();
    range.setStart(node, offset);
    range.setEnd(node, offset);
    const rect = range.getBoundingClientRect();
    return rect;
  }
  public async locateToEntry(entry: Entry, buffer = 0) {
    await this.forceRenderBlock(entry.blockId);
    const rect = this.getRawRect({ ...entry, len: 0 });
    rect && this.scrollTo(rect.top - buffer);
  }
}

Next, we need to focus on the location jumping in the search and replace process itself. Typically, in the search and replace feature, there are buttons for finding the previous and next occurrences. So in this case, we need to consider one problem. Because our Block may not always be rendered, it's not easy to obtain its height information, so the scheduling of previous and next occurrences might be inaccurate. For example, if we have a block structure nested with lines and code blocks at a position below the document, and we directly iterate through all the state blocks without recursively searching, then there may be a problem of jumping to the completion of the block content before jumping to the code block. Therefore, in the search process, we need to first predict the height. Remember, as we discussed earlier, we have placeholder nodes, so by using the placeholder nodes as the estimated height value, this problem can be solved. However, it still depends on the specific algorithm of the search and replace to determine whether such compatibility control is needed. In essence, we need to ensure that the order of marking the content before and after block rendering remains consistent.

class Viewport {
  public getObservableTop(entry: Entry) {
    const blockId = entry.blockId;
    let state: State | null = this.editor.getState(blockId);
    let node: HTMLElement | null = null
    while (state) {
      if (state.node && state.node.parentNode){
        node = state.node;
        break;
      }
      state = state.parent;
    }
    if (!node) return -999999;
    const rect = node.getBoundingClientRect();
    return rect.top;
  }
}

Then, we need to pay attention to the rendering of the virtual layer in the actual document body, which means the markup displayed in the document. As mentioned earlier, we have integrated the Event module into the Layout module, so next, we need to use the Event module to complete the rendering of the virtual layer. Actually, this part of the logic is quite simple. We just need to render the stored virtual layer nodes onto the block structure at the moment of attach-block, and remove them at the moment of detach-block.

class VirtualLayer {
  // ...
  private onAttachBlock = (nodes: Nodes) => {
    for (const node of nodes) {
      const blockId = node.id;
      this.computeBlockEntriesRect(blockId);
      this.renderBlock(blockId);
    }
  }
  private onDetachBlock = (nodes: Nodes) => {
    for (const node of nodes) {
      const blockId = node.id;
      this.removeBlock(blockId);
    }
  }
  // ...
}

Text Selection Comment

Text selection commentary is also a common feature in online document products. Because comments may have various jump functions, such as the previous and next positions, jumping to the first comment, and positioning when the document opens, we need to adapt to these functions. First, let's consider the position update of the comments. When we open the document, whether it is anchor jumping or positioning the first comments in the document, the document will directly scroll to the corresponding position. If the user scrolls up again at this time, a problem will occur. Due to the existence of the viewport locking function, the scrollbar is constantly adjusting, and the height of the block structure will also change. Therefore, we must adjust the position of the comments equally, otherwise, there will be an offset phenomenon between the comments and the text selection.

Similarly, our comments may encounter situations where the block DOM does not exist, which causes problems obtaining its height. Therefore, our comments content also needs to be rendered on demand, that is, the comments content will only be displayed when scrolling to the block structure. Therefore, we only need to register the callback function for the comment module in the virtual scroll module. We may notice that during the implementation of the virtual scrolling events, the mounting and unmounting of the blocks are asynchronous notifications, while the notification events for locking the viewport are synchronous. This is because the viewport locking must be executed immediately, otherwise, a visual jump will occur. Additionally, we cannot set animations for the comment cards, as it may also cause a visual jump, so we need additional scheduling strategies to resolve this issue.

class CommentModule {
  // ...
  private onLockView = (instance: HOC) => {
    this.computeBlockEntriesRect(instance.id);
    this.renderComments(instance.props.block.id);
  }
  private onAttachBlock = (nodes: Nodes) => {
    for (const node of nodes) {
      const blockId = node.id;
      this.computeBlockEntriesRect(blockId);
      this.renderComments(blockId);
    }
  }
  private onDetachBlock = (nodes: Nodes) => {
    for (const node of nodes) {
      const blockId = node.id;
      this.removeComments(blockId);
    }
  }
  // ...
}

In fact, the updates mentioned earlier may encounter a problem. When we update the content of a block, will it only affect the height of that block? Obviously, it is not. When one of our blocks changes, it is likely to affect all the blocks after it, because our layout engine is from top to bottom, and a change in the height of one block will likely affect other blocks. Therefore, if we update the position information in full, it may cause significant performance consumption. So here we can consider determining the update scope based on the influence range of the 'HOC'. Even due to the clear height change caused by locking the viewport, we can update each position height as needed. We need to consider updating the 'HOC' index range to determine the influence range. For the comments of the current block, we need to update them all, while for the blocks after the current block, we only need to update their heights. Our strategy here is to determine the influence range based on the 'HOC' index, so we need to maintain the 'HOC' index range after any changes.

class CommentModule {
  // ...
  private onLockView = (instance: HOC, delta: number) => {
    this.computeBlockEntriesRect(instance.id);
    this.renderComments(instance.props.block.id);
    const effects = this.layout.instances.filter(it => it.index > instance.index);
    for (const effect of effects) {
      const comments = this.getComments(effect.block.id);
      comments.forEach(comment => {
        comment.top = comment.top + delta;
        comment.update();
      });
    }
  }
  // ...
}

Actually, as we've mentioned multiple times before, we cannot handle scrolling through smooth scheduling because we need explicit height values and viewport locking scheduling. So we can also think about this issue. Since we basically take full control of the document's scrolling behavior, we just need to put the explicit height value in a variable. The main problem with viewport locking scheduling is that we cannot clearly know if we are currently scrolling. If we can clearly perceive when we are scrolling, we just need to schedule the viewport locking and block structure rendering after the scrolling ends. No relevant modules will be scheduled during the scrolling process.

As for this issue, I have an implementation idea, but it has not been specifically carried out. Since our scrolling is mainly to solve the above two problems, we can completely simulate this scrolling animation. In other words, for a fixed scrolling delta value, we can simulate the animation effect through calculation, similar to the transition ease animation effect, and manage all the scrolling progress through Promise.all, then implement subsequent scheduling effects through a queue. When we need to obtain the current state, we can let the scrolling module decide whether to take the scheduling value or scrollTop. After the scrolling is completed, the next task is scheduled. Actually, I think this approach can be considered as a future optimization direction. Even if we don't schedule animation effects, achieving the goal flashing effect by positioning to the relevant position is also a good idea.

    Set Top 100
         |
[ 50, 25, 13, 7, 5 ]
         |
    Promise.all
         |
     Next Task
         |
        ...

Performance Considerations

After we complete the compatibility with various functions, we must evaluate the performance of our virtual scrolling solution. In fact, we need to conduct preliminary performance testing during the early research phase to determine the ROI of implementing this functionality and the allocation of resources.

Performance Indicators

Therefore, to evaluate performance, we need to clearly define our performance indicators. Our common performance testing indicators usually include:

FP - First Paint: The time point of the first rendering, which can be considered as the white screen time in the performance statistics. Until the FP time point, the user sees a completely white screen with no content, and no useful work is perceived by the user.
FCP - First Contentful Paint: The time point of the first rendering with content, considered as the time period of no content in the performance statistics. Until the FCP time point, users see a screen with pixels rendered but no actual content, and no useful information is obtained by the user.
LCP - Largest Contentful Paint: A Core Web Vitals metric used to measure when the largest content element in the viewport becomes visible, which can be used to determine when the main content of the page is fully rendered on the screen.
FMP - First Meaningful Paint: The time of the first rendering of meaningful content, considered completed after the entire page layout and textual content are rendered.
TTI - Time to Interactive: An unofficial web performance progress metric defined as the time point when the previous LongTask is completed, followed by 5 seconds of inactivity in both network and main thread.

Since we want to test the editor engine, or in simpler terms, the performance indicators are not for the main application but rather for testing the performance of the SDK, our indicators may not be as generic. Moreover, since we prefer to conduct testing in an actual online scenario rather than solely based on the development version of the SDK, we have chosen LCP and TTI as our testing standards here. As we do not involve network status, static resources and caching can be enabled, and to prevent the impact of sudden spikes, we can conduct multiple tests and take the average value.

The LCP standard usually emits the completion of the initial rendering in our editor engine. This time point can be considered as the componentDidMount timing of the component. So in this case, our LCP takes this time point, which is also mentioned earlier as the isEditorLoaded in the Layout module. In addition, we can also start calculating from the time when the editor is instantiated, which can more accurately exclude the time consumption of the main application. This solution only needs to define the event trigger in the editor and subtract the timestamps in the HTML.
As for the TTI standard, since the TTI is a non-standard web performance progress metric, we do not need to strictly define this behavior according to the standard. In fact, we only need to find a proxy indicator. As mentioned earlier, we are conducting tests in real-world scenarios online, so all the functionalities exist in the system. Therefore, we can define this indicator based on the user's interaction behavior. The chosen solution in this test is to consider it as fully interactive when the user clicks the publish button and an actual modal for publishing is displayed. This solution can be achieved by using a Tampermonkey script to continuously check the status of the button and automatically simulate the user's publishing interaction behavior.

// HTML
var __MEASURE_START = Date.now(); // or `performance.now`

// Editor
window.__MEASURE_EDITOR_START = Date.now(); // or `performance.now`

// LCP
editor.once("paint", () => {
  const LCP = Date.now() - __MEASURE_START;
  console.log("LCP", LCP);
  const EDITOR_LCP = Date.now() - window.__MEASURE_EDITOR_START;
  console.log("EDITOR_LCP", EDITOR_LCP);
});

// TTI
// ==UserScript==
// @name         TTI
// @run-at      document-start
// ...
// ==/UserScript==
(function () {
  const task = () => {
    const el = document.querySelector(".xxx")?.parentElement;
    el?.click();
    const result = document.querySelector(".modal-xxx");
    if (result) {
      console.log("TTI", Date.now() - __MEASURE_START);
    } else {
      setTimeout(task, 100);
    }
  };
  setTimeout(task, 100);
})();

Performance Testing

In the preliminary performance testing conducted during the early stages of research, the introduction of virtual scrolling has brought significant improvements in performance. Especially for many API documents, a large number of table block structures can quickly degrade performance. The table contains nested block structures and also requires maintaining a large number of states. Therefore, implementing a virtual list is very valuable. So, remember the user feedback we mentioned earlier? We need to perform performance testing on this feedback document using the performance metrics mentioned above. Based on the performance data obtained earlier, we can make comparisons.

Editor Rendering: 2505ms -> 446ms, an improvement of 82.20%.
LCP Metric: 6896ms -> 3376ms, an improvement of 51.04%.
TTI Metric: 13343ms -> 3878ms, an improvement of 70.94%.

However, testing on the document provided by user feedback alone is not enough. We need to design other testing plans to test the document, especially fixed test documents or fixed test plans, which can provide more data references for future performance plans. Therefore, we can design a testing plan. Since our document is composed of block structures, it is obvious that we can generate performance testing benchmarks based on three types of blocks: plain text blocks, basic blocks, and code blocks.

First, let's start with a plain text block scenario. Here, we generate a plain text document consisting of 10,000 characters. In fact, our documents usually do not have a particularly large number of characters. For example, this document is about 37,000 characters, which is already considered a super large document. The majority of documents are less than 10,000 characters. When generating the text, I also noticed an interesting thing. Even randomly generated characters still have a classical Chinese feel to them when selecting Yueyang Tower as the base text. In this case, for plain text blocks, we adopt the strategy of full rendering without scheduling virtual scrolling because plain text is a simple block structure. However, the additional module causes an increase in the overall rendering time.

Editor Rendering: 219ms -> 254ms, an improvement of -13.78%.
FCP Metric: 2276ms -> 2546ms, an improvement of -10.60%.
TTI Metric: 3270ms -> 3250ms, an improvement of 0.61%.

Next up is the benchmark test for basic block structures, where basic block structures refer to simple blocks, such as highlighted blocks and code blocks. Due to the versatility of code blocks and the likelihood of their frequent occurrence in documents, we have chosen code blocks as the benchmark for testing. We will randomly generate 100 basic block structures here, with each block containing randomly generated text marked with random bold and italic styles.

Editor rendering: 488ms -> 163ms, optimized by 66.60%.
FCP metric: 3388ms -> 2307ms, optimized by 30.05%.
TTI metric: 4562ms -> 3560ms, optimized by 21.96%.

Finally, we have the benchmark test for table block structures. Due to the high maintenance state and the potential existence of a large number of individual table cell structures, especially in the case of large tables in many documents, table structures have the maximum performance overhead on the editor engine. The benchmark here involves generating 100 table structures, each containing 4 cells, with each cell randomly filled with text and marked with random bold and italic styles.

Editor rendering: 2739ms -> 355ms, optimized by 87.04%.
FCP metric: 5124ms -> 2555ms, optimized by 50.14%.
TTI metric: 20779ms -> 4354ms, optimized by 79.05%.

Daily Question

https://github.com/WindrunnerMax/EveryDay

References

https://developer.mozilla.org/en/docs/Web/CSS/overflow-anchor
https://developer.mozilla.org/en/docs/Web/API/IntersectionObserver
https://developer.mozilla.org/en/docs/Web/API/IntersectionObserverEntry
https://developer.mozilla.org/en/docs/Web/API/History/scrollRestoration
https://developer.mozilla.org/en/docs/Web/API/Element/getBoundingClientRect
https://arco.design/react/components/list#%E6%97%A0%E9%99%90%E9%95%BF%E5%88%97%E8%A1%A8

Exploring Rich Text: Document Virtual Scrolling#

Description#

Solution Research#

Virtual Scrolling#

Module Design#

Rendering Model#

Scroll Scheduling#

State Management#

Selection Status#

Viewport Locking#

Quick Scroll#

Incremental Rendering#

Scenario Inference#

Anchor Jump#

Find and Replace#

Text Selection Comment#

Performance Considerations#

Performance Indicators#

Performance Testing#

Daily Question#

References#