Virtual scrolling is a technique that optimizes the performance of long lists by rendering only the visible portion of the document data, rather than the entire document structure. It improves browser efficiency by calculating the necessary rendered list items based on the visible area height and the container's scroll position, while avoiding the rendering of unnecessary view content. The advantage of virtual scrolling is that it greatly reduces DOM operations, thereby reducing rendering time and memory consumption. It solves issues such as slow page loading and lag, thus improving user experience.
Recently, we received feedback from users regarding performance issues with our document editor when working with large documents that contained many tables. The long content caused noticeable lag during editing and took a long time to render, resulting in a poor user experience. To investigate, I tested a large document and found that the Largest Contentful Paint (LCP) reached 6896ms for the initial screen. Even with various resources cached, the First Contentful Paint (FCP) still took 4777ms. The rendering time for the editor's initial screen alone was 2505ms, and the entire application's Time to Interactive (TTI) reached 13343ms. Under simulated rapid input, the frame rate hovered around only 5 FPS, and the DOM node count reached 24k+. This problem was clearly serious, and I embarked on a lengthy journey of research and optimization.
During my research, I found that there were hardly any articles on performance optimization for online document editing. So, for me, it was essentially starting from scratch to research the entire solution. However, the community had various performance optimization solutions for virtual scrolling, which greatly helped in implementing the overall solution. Furthermore, I also considered whether it was appropriate to put all the content in one document. It seemed no different from putting all the code in one file. I felt that there might be better solutions in terms of organizational structure. However, that was a different issue and here, we focused on addressing the performance issues of large documents.
Before diving into the implementation, I pondered an interesting question: why does virtual scrolling optimize performance? When we perform DOM operations in the browser or manage windows on a PC, are these elements truly present? The answer is clear: these views, windows, and DOM elements are all simulated through graphics. Although we can easily achieve various operations using system or browser APIs, the content is ultimately drawn by the system. It's fundamentally based on external input signals generating simulated state and behavior, including collision detection, which is all represented by a large amount of computation performed by the system.
Coincidentally, I had recently wanted to learn the basics of Canvas, so I implemented a very basic graphics editor engine. Because the browser's Canvas only provides the most basic drawing operations and lacks the convenience of DOM operations, all interaction events need to be simulated through mouse and keyboard events. In this process, an important task is determining whether two graphics intersect, in order to decide whether a graphic needs to be redrawn on demand and thus improve performance. Imagine the simplest way to determine this: iterate through all the graphics and check whether each intersects with the incoming graphic. This can involve complex calculations, but if we can determine in advance that certain graphics cannot possibly intersect, we save a lot of unnecessary computation. The situation is the same for layers outside the viewport: if we can determine that a graphic is outside the viewport, there is no need to check its intersections, and it does not need to be rendered at all. The same goes for virtual scrolling. If we can reduce the number of DOM elements, we reduce a great deal of computation and improve the runtime performance of the entire page. As for first-screen performance, it goes without saying that fewer DOM elements make the first render faster.
Of course, the above is just my own reasoning about document editing and runtime performance. In fact, there have been many discussions in the community about the performance benefits of virtual scrolling. For example, reducing the number of DOM elements reduces the number of nodes the browser needs to render and maintain, which lowers memory usage and lets the browser respond faster to user interactions. Moreover, browser reflow and repaint operations usually require a large amount of computation, and they become more frequent and more expensive as the number of DOM elements grows. By reducing the number of DOM elements that need to be managed, virtual scrolling can significantly improve rendering performance. In addition, virtual scrolling provides faster first-screen rendering, especially for large documents, where rendering the entire document at once can easily result in long first-screen times. It also reduces the JavaScript overhead of maintaining component state in React, especially when Context is involved; if not handled carefully, this can cause noticeable performance degradation.
After studying the advantages of virtual scrolling, we can move on to its implementation. Before diving into block-level virtual scrolling in a rich text editor, let's first look at how virtual scrolling is generally implemented, taking the List component in ArcoDesign as an example of a common implementation. In the example Arco provides, we can see that it requires a height attribute; without it, the virtual list cannot work properly. Arco calculates the height of the entire container by multiplying the number of list items by the height of each item. It is worth noting that the scroll container should be an element outside the virtual list container, and the area within the viewport is offset using transform: translateY(Npx). When scrolling, we calculate the nodes that need to be rendered in the current viewport based on the actual scroll distance of the scroll bar, the height of the scroll container, and the item height we defined, while other nodes are not actually rendered, thus achieving virtual scrolling. Arco's virtual list has many other configuration options, which will not be covered here.
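To make the fixed-height case concrete, here is a minimal sketch of the calculation an Arco-style virtual list performs. The function name and inputs are my own illustration, not Arco's actual API: given the scroll offset, viewport height, uniform item height, and item count, it yields the slice to render, the translateY offset, and the total container height.

```typescript
interface FixedRange {
  start: number;       // index of the first rendered item
  end: number;         // index just past the last rendered item
  offsetY: number;     // applied as transform: translateY(offsetY px)
  totalHeight: number; // props up the scroll container
}

function computeFixedRange(
  scrollTop: number,
  viewportHeight: number,
  itemHeight: number,
  itemCount: number
): FixedRange {
  // First item whose bottom edge is below the scroll offset.
  const start = Math.max(0, Math.floor(scrollTop / itemHeight));
  // Enough items to cover the viewport, plus one for partial overlap.
  const visible = Math.ceil(viewportHeight / itemHeight) + 1;
  const end = Math.min(itemCount, start + visible);
  return {
    start,
    end,
    offsetY: start * itemHeight,
    totalHeight: itemCount * itemHeight,
  };
}
```

For example, with 1000 items of 50px each in a 300px viewport scrolled to 100px, only items 2 through 8 are actually rendered, offset by 100px inside a 50000px-tall container.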
From the common virtual scrolling in Arco's List, implementing virtual scrolling does not seem that difficult. However, in our online document scenario it is not so straightforward. Let's first discuss how images are rendered in documents. Usually, when an image is uploaded, we record its size (width and height). During actual rendering, we preserve the aspect ratio using the maximum width and height of the container together with object-fit: contain;. Even before the image finishes loading, its placeholder height is already fixed. In our document structure, however, block heights are not fixed; for text blocks in particular, the height varies with factors such as font and browser width, and we cannot know the height of a text block before it is rendered. This means we cannot calculate the placeholder heights of the document's block structure in advance, so virtual scrolling for document blocks must handle varying block heights. We therefore need a dynamic-height virtual scrolling scheduling strategy. Dynamic-height virtual scrolling by itself is not particularly difficult; the community already offers many implementations. But our document editor includes many complex modules, such as text selection, comments, and anchor jumps, which must also be made compatible with the virtual scrolling solution.
In fact, there are many ways to implement a rich text editor. We won't discuss the difference between drawing rich text with the DOM versus Canvas here; we focus on DOM-based rich text editors. Quill, for example, implements its own view-layer drawing, while Slate relies on React to build the view layer. These two approaches differ significantly in how the view layer is implemented. In this article, we lean towards Slate's approach, using React to construct block-level virtual scrolling. In practice, if we have full control over the view layer, there is more room for performance optimization; for example, it is easier to schedule idle-time rendering and caching strategies, improving the experience during fast scrolling. Regardless of the approach, the core content of this article does not change much: as long as we ensure the correct scheduling of the modules controlled by the rich text engine itself, such as the selection module, the height calculation module, and the lifecycle module, and control the actual rendering behavior, any editing engine can adopt a virtual scrolling solution.
First, let's envision the rendering model of the entire document. Whether the editor is block-based or paragraph-based, it cannot escape the concept of lines, because content is usually composed of lines that together form a document. Our document rendering is therefore also described in terms of lines. Of course, the line here is a rather abstract concept: the structure nested within a line may itself express a block structure, such as a code block or a table. However blocks are nested, the outermost layer will always need to encompass the line structure. Even for a pure Blocks document model, we can always find the outer block container's DOM structure. Therefore, we need to clearly define the concept of lines here.
In fact, the lines we are concerned with here lean more towards directly describing the main document. If a code block is nested within a line of the main document, the entire block structure of that code block is what we focus on; we will not pay much attention to its internal structure for now. This can be optimized further, particularly for very large code blocks, but we will not focus on that structural optimization here. In addition, documents drawn on a Canvas and documents expressed in a paginated manner are also out of scope: as long as an article can be expressed through pagination, we can render pages on demand directly. If necessary, we could also perform paragraph-level on-demand rendering within pages, which is further optimization space.
Based on this, we can deduce the structure our document ultimately needs to render. First, there is the placeholder area, which contains content that is not within the viewport and therefore exists only as a placeholder. Next is the buffer, which contains pre-rendered content. Although this area is not within the viewport, it preloads part of the view to minimize brief white screens when the user scrolls; it is usually sized at about half the viewport height. Following that is the viewport portion, containing the content actually rendered within the viewport. Symmetrically, on the other side of the viewport, we again need a buffer and a placeholder to serve as the preload and placeholder areas.
It is important to note that for the placeholder here, we typically choose to use actual DOM nodes for placement. Some may think that using translate directly is a better choice, since it may be more efficient and can trigger GPU acceleration. In an ordinary virtual list, translate poses no issues, but in a document the DOM structure can be far more complex, and translate may lead to unexpected situations, especially within complex style structures. Using DOM placement is therefore the simpler approach. In addition, because of the selection module, the placeholder implementation must also consider users dragging out a long selection. That is, if the user selects part of the viewport and then keeps scrolling, dragging the selection into the placeholder area, and that part of the DOM disappears and is replaced by a placeholder node without special handling, there will be problems mapping the selection to the Model. We therefore need to retain these DOM nodes while the user is selecting, and DOM placement makes this more convenient; adapting this behavior with translate would be harder. The rendering model at this point is as follows.
The essence of virtual scrolling is to calculate, as the user scrolls, which lines need to be rendered in the current viewport based on the viewport height, the scroll distance of the scroll container, and the line heights. The two browser APIs most commonly used for virtual scrolling are the Scroll Event and the Intersection Observer API. The former calculates the viewport position by listening for scroll events, while the latter determines an element's position by observing its visibility. Different virtual scrolling solutions can be built on these two APIs.
First, let's look at the Scroll Event. This is the most common way of monitoring scrolling: by listening for scroll events we obtain the scroll distance of the container, then from the viewport height and scroll distance we calculate the lines that need to be rendered in the current viewport, and the view layer decides what to render based on that computed state. In practice, a virtual scrolling solution based solely on the Scroll event is very straightforward. However, it is also more likely to run into performance issues, and lag can occur even when the listener is marked as a Passive Event. The core idea is to listen for scroll events on the scroll container and, whenever one fires, calculate the nodes within the current viewport from the scroll position, then determine the nodes that actually need to be rendered from their heights, thereby achieving virtual scrolling.
As mentioned earlier, it is relatively easy to implement virtual scrolling with a fixed height. However, our document blocks have dynamic heights, and we cannot know their actual heights until the blocks are rendered. So what is the difference between dynamically sized virtual scrolling and fixed-height virtual scrolling?
Firstly, the height of the scroll container is unknown at the beginning; we can only determine the actual height through the ongoing rendering process. Secondly, we cannot directly calculate the nodes to render from the scroll height. In the fixed-height case, we calculate the starting index cursor for rendering from the scroll container height and the total height of all nodes in the list. In dynamic-height virtual scrolling, however, we cannot obtain the total height, and the number of rendered nodes is also unknown, so we cannot know in advance how many nodes need to be rendered.
Furthermore, it is difficult to determine the distance between each node and the top of the scroll container, the translateY mentioned earlier. We need this height to prop up the scrollable area so that the scrolling effect works.
Some may argue that these values simply cannot be calculated, but that is not the case: without any optimization, we can brute-force iterate and compute them.
Now let's figure out how to calculate the above content. Based on our previous discussion, documents are essentially based on block virtual scrolling. We can directly calculate the total height by summing the heights of all blocks. Here, it's important to note that even though we cannot obtain the height before rendering, we can estimate it based on the data structure and correct the height during actual rendering. Remember that we use placeholder blocks to support the scrolling area. Therefore, we need to calculate the specific placeholders based on the start and end cursors. We will calculate the specific cursor values later, but for now, let's calculate the height of the two placeholder nodes and render them in their respective positions.
Now we have an approximate estimation of the total height. Next, we need to determine the positions of the start and end cursors, i.e., the actual indices of the blocks to be rendered. For the start cursor, we can directly calculate it based on the scroll height. We iterate through the heights of the nodes until we find a node that exceeds the scroll height. At that point, we consider the cursor to be the index of the first node to be rendered.
For the end cursor, we need to calculate it based on the start cursor and the height of the scrolling container. Similarly, we iterate through the heights of the nodes until we find a node that exceeds the height of the scrolling container. At that point, we consider the cursor to be the index of the last node to be rendered.
Of course, we should not forget the buffer in this calculation, which is crucial for avoiding blank areas during scrolling. Also note that we are using a brute-force method to compute these cursor values. For modern machines and browsers, the cost of simple additions is low; performing 10,000 additions may well take less than 1ms.
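The brute-force calculation described above can be sketched as follows. This is a minimal illustration under assumed names; `heights` holds each block's estimated or measured height, and the result gives the start/end cursors plus the heights of the two placeholder nodes.

```typescript
interface RenderRange {
  start: number;             // index of first rendered block
  end: number;               // index just past the last rendered block
  topPlaceholder: number;    // height of the placeholder above the range
  bottomPlaceholder: number; // height of the placeholder below the range
}

function computeRange(
  heights: number[],
  scrollTop: number,
  viewportHeight: number,
  buffer: number // usually half the viewport height
): RenderRange {
  const top = Math.max(0, scrollTop - buffer);
  const bottom = scrollTop + viewportHeight + buffer;
  // Start cursor: walk until accumulated height passes the buffered top.
  let start = 0;
  let acc = 0;
  while (start < heights.length && acc + heights[start] <= top) {
    acc += heights[start];
    start++;
  }
  const topPlaceholder = acc;
  // End cursor: keep walking until the buffered bottom is covered.
  let end = start;
  while (end < heights.length && acc < bottom) {
    acc += heights[end];
    end++;
  }
  // Everything after `end` collapses into the bottom placeholder.
  let bottomPlaceholder = 0;
  for (let i = end; i < heights.length; i++) bottomPlaceholder += heights[i];
  return { start, end, topPlaceholder, bottomPlaceholder };
}
```

For instance, with five 100px blocks, a 100px viewport scrolled to 150px, and a 50px buffer, blocks 1 and 2 are rendered while 100px and 200px placeholders prop up the two ends.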
Here we are discussing the most basic principle of virtual scrolling, so this example contains essentially no optimizations. Obviously, our traversal over the heights is relatively inefficient. Even though thousands of additions are cheap, it is still advisable to avoid such repeated computation in large-scale applications, especially since the Scroll Event fires at a very high frequency. An obvious optimization, therefore, is height caching. In simple terms, we cache the accumulated heights that have already been calculated, so that subsequent calculations can use them directly without re-traversal. When a height changes, we recalculate the cache from the changed node onward. Because the cached sums form an increasing sequence, we can also solve the lookup problem with binary search, minimizing the need for full traversals.
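A sketch of that height cache, assuming heights change rarely relative to scroll events: prefix sums are rebuilt lazily from the first stale index, and the start cursor is found by binary search over the monotonically increasing sums.

```typescript
class HeightCache {
  private heights: number[];
  private prefix: number[];  // prefix[i] = total height of blocks 0..i-1
  private dirtyFrom: number; // first index whose prefix sum is stale

  constructor(heights: number[]) {
    this.heights = heights.slice();
    this.prefix = new Array(heights.length + 1).fill(0);
    this.dirtyFrom = 0;
  }

  // Correct an estimated height after the block actually renders.
  update(index: number, height: number): void {
    this.heights[index] = height;
    this.dirtyFrom = Math.min(this.dirtyFrom, index);
  }

  private rebuild(): void {
    // Recompute only from the changed node onward.
    for (let i = this.dirtyFrom; i < this.heights.length; i++) {
      this.prefix[i + 1] = this.prefix[i] + this.heights[i];
    }
    this.dirtyFrom = this.heights.length;
  }

  total(): number {
    this.rebuild();
    return this.prefix[this.heights.length];
  }

  // Binary search: index of the block containing vertical offset `offset`.
  indexAt(offset: number): number {
    this.rebuild();
    let lo = 0;
    let hi = this.heights.length - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.prefix[mid + 1] <= offset) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }
}
```

With this in place, each scroll event costs O(log n) for the lookup instead of an O(n) walk, and an edit only pays for re-summing the suffix after the changed block.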
IntersectionObserver has now been marked as Baseline Widely Available; all browsers released since March 2019 implement this API, and it is very mature. Next, let's look at implementing virtual scrolling with the Intersection Observer API. Before the concrete implementation, consider its intended use: as its name suggests, the API asynchronously observes the intersection of a target element with an ancestor element or the top-level document viewport, which is very useful for determining whether an element appears within the viewport.
At this point, we need to consider a potential mismatch. IntersectionObserver observes the intersection of target elements with the viewport, but the core concept of virtual scrolling is precisely not to render elements outside the viewport. In virtual scrolling, the target elements either do not exist or are not rendered, so their state cannot be observed. To align with IntersectionObserver's model, we need to render actual placeholder nodes; for a list of 10,000 items, we first render 10,000 placeholder nodes. This is in fact reasonable unless the document shows performance problems from the very start; most performance optimization is done after the fact, especially in complex scenarios. For example, with 10,000 data items, even if each item renders only 3 nodes, replacing them with placeholder nodes still reduces roughly 30,000 nodes on the page to about 10,000. That is a meaningful improvement, and it can be optimized further if necessary.
Of course, we could implement virtual scrolling without placeholder nodes using Intersection Observer, but we would then need the Scroll Event to assist with forced refresh operations, making the overall implementation more complicated. So let's proceed with a solution based on IntersectionObserver plus placeholder nodes. First, we create an IntersectionObserver. Because our scroll container is not necessarily the window, we create the IntersectionObserver on the scroll container itself. As discussed earlier, we also add a buffer around the viewport to preload elements just outside it, avoiding blank areas while the user scrolls; this buffer is usually half the viewport height.
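Wiring this up might look like the sketch below; the function names are illustrative, not from any specific library. The buffer is expressed through rootMargin, which expands the observation area by half the viewport height on top and bottom, and root points at the scroll container rather than the default viewport.

```typescript
// Build the rootMargin string that expands observation by half the
// viewport height above and below (the "buffer" from the text).
function makeRootMargin(viewportHeight: number): string {
  const buffer = Math.round(viewportHeight / 2);
  return `${buffer}px 0px ${buffer}px 0px`;
}

function createObserver(
  scroller: Element, // the scroll container, not necessarily `window`
  onChange: (entries: IntersectionObserverEntry[]) => void
): IntersectionObserver {
  return new IntersectionObserver(onChange, {
    root: scroller,
    rootMargin: makeRootMargin(scroller.clientHeight),
  });
}
```

With this configuration, a node's callback fires while it is still half a viewport away, giving it time to render before the user actually scrolls it into view.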
Next, we manage the state of the placeholder nodes. Since we now have actual placeholders, we no longer need to estimate the height of the entire container; we only render nodes when they scroll near the relevant position. Each node has one of three states: the loading state is the initial placeholder state, in which the node renders only an empty placeholder or a 'loading' indicator, and its actual height is unknown; the viewport state is the actually-rendered state, indicating the node is within the logical viewport, at which point we can record its real height; the placeholder state is the post-render placeholder state, indicating the node has scrolled out of the viewport after having been rendered, so its height is known and the placeholder can be set to that exact height.
Of course, the Observer's observation also needs configuration. It is important to note that the IntersectionObserver callback only carries information about the target node, and we need to map that element back to our own node state. We therefore use a WeakMap to establish the relationship between elements and nodes, making the bookkeeping easier.
Finally comes the actual scroll scheduling. When a node appears in the viewport, we look up its node information via ELEMENT_TO_NODE and set its state based on the current viewport information. If the node is in the viewport, we set its state to viewport. If the node is leaving the viewport, we further check its current state: if it is not the initial loading state, we can set its state directly to placeholder, at which point the placeholder height is the node's actual, measured height.
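The scheduling just described can be sketched as below. The shapes of `BlockNode` and `ELEMENT_TO_NODE` follow the text but are assumptions; the key logic is the state transition, which only downgrades a node to a sized placeholder once its real height has been measured.

```typescript
type BlockState = "loading" | "viewport" | "placeholder";

interface BlockNode {
  state: BlockState;
  height: number | null; // unknown until first rendered in the viewport
}

// Maps observed elements back to our node state (see the WeakMap above).
const ELEMENT_TO_NODE = new WeakMap<Element, BlockNode>();

function nextState(node: BlockNode, isIntersecting: boolean): BlockState {
  if (isIntersecting) return "viewport"; // render for real, record height
  // Leaving the viewport: only switch to a sized placeholder if the
  // real height has been measured; otherwise stay in `loading`.
  return node.state === "loading" ? "loading" : "placeholder";
}

function onIntersect(entries: IntersectionObserverEntry[]): void {
  for (const entry of entries) {
    const node = ELEMENT_TO_NODE.get(entry.target);
    if (!node) continue;
    if (entry.isIntersecting && node.height === null) {
      node.height = entry.boundingClientRect.height; // record measured height
    }
    node.state = nextState(node, entry.isIntersecting);
  }
}
```

Note that a node that has never been rendered deliberately stays in loading when it leaves the observation area, because we have no height to give its placeholder yet.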
The performance optimizations discussed in the rest of this article are all based on the Intersection Observer API implementation. In a document, each block may contain hundreds of nodes, especially in complex structures such as tables, while the number of immediate blocks or rows directly under the main document is usually not very high, so optimizing by block count is significant.
Some time ago, I saw a question on Zhihu asking why Python's built-in sort is 100 times faster than hand-written quicksort, and I think of it whenever I see the Intersection Observer API. The main reason is that Python's standard library is implemented in C/C++, so it executes far faster than interpreted Python code. The same applies to the Intersection Observer API: it is implemented in the browser's native C/C++ layer, so it is much more efficient than doing scroll scheduling ourselves in JS. Although, with the aid of JIT compilation, the gap may not be that large in practice.
In our document editor, virtual scrolling is not just about simple scroll rendering, but also involves managing various states. Typically, our editor already has a block manager, which manages the entire Block Tree state based on various changes. Essentially, this involves the manipulation of the tree structure, such as when a trigger operation is to "insert { parentId: xxx, id: yyy}", where we need to add a new node yyy under the node xxx. The management of this tree structure depends on the specific business implementation. For example, if the editor retains a block in the tree for the convenience of undo/redo instead of actually deleting it, the block manager's state management becomes about adding only, not deleting. Therefore, the implementation of the block manager depends on the specific editor engine.
Here, our focus is on extending the capabilities of this Block Engine to incorporate virtual scrolling state. If it were just a matter of adding new states, it would be a simple problem. However, we also need to consider the issue of nested block structures, in preparation for our subsequent scenarios. As mentioned earlier, we are currently focusing on the direct management of blocks within the main document. For nested structures, when a direct block is in a placeholder state, we need to set all nested blocks inside it to a placeholder state. This process is recursive and may involve a significant amount of calls, so we need to implement a caching layer to reduce redundant calculations.
Our approach here is to set a cache on each node, which stores references to all the child nodes in the subtree. This is a typical trade-off of space for time, but because it stores references, the space consumption is not significant. The advantage of this approach is that, for example, if a user keeps modifying the structure of a certain child node, caching will only re-calculate the content of that node, while other child nodes can directly access the cached content without the need for re-computation. It should be noted that when appending or removing child nodes from the current node, we need to clear the cache for that node and all parent nodes on the link, and recalculate them as needed for the next call. Actually, due to the granularity of our editor, which is based on changes for scheduling, achieving fine-grained structure management is not very difficult.
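A minimal sketch of that cache, with an assumed block-tree API: each node caches references to all descendants, and any structural change clears the cache on the node and every ancestor on the path to the root, to be lazily rebuilt on the next query.

```typescript
class Block {
  children: Block[] = [];
  parent: Block | null = null;
  private subtreeCache: Block[] | null = null;

  append(child: Block): void {
    child.parent = this;
    this.children.push(child);
    this.invalidate(); // clear caches up the parent chain
  }

  // All descendants, cached as references (space traded for time).
  descendants(): Block[] {
    if (this.subtreeCache) return this.subtreeCache;
    const out: Block[] = [];
    for (const child of this.children) {
      // Untouched subtrees return their cached arrays without recomputation.
      out.push(child, ...child.descendants());
    }
    this.subtreeCache = out;
    return out;
  }

  private invalidate(): void {
    let node: Block | null = this;
    while (node) {
      node.subtreeCache = null;
      node = node.parent;
    }
  }
}
```

Because only the modified node and its ancestors are invalidated, repeated edits under one child node never force sibling subtrees to recompute their descendant lists, which is exactly the property we need when recursively pushing a placeholder state into nested blocks.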
Now that we have a complete block manager, the next step is to consider how to schedule and control the rendering behavior. If our editor engine had a self-developed view layer, the controllability would certainly be very high, and it would not be difficult to control rendering behavior or implement rendering caches. However, as mentioned earlier, we tend to use React as the view layer for scheduling. Therefore, we need a more common management solution. The advantage of using React as the view layer is the ability to achieve rich custom view rendering using the ecosystem. However, the problem lies in the difficulty of control, including not only rendering scheduling behavior, but also issues related to the mapping between Model and View, as well as problems related to the reuse of ContentEditable. Nonetheless, these are not the focus of this article. Let's first discuss more common rendering control methods.
First, let's think about how to control the rendering of DOM nodes in React. Obviously, we can manage rendering state through State, or control rendering through ReactDOM.render/unmountComponentAtNode. As for directly manipulating the DOM through a Ref, that approach is difficult to control and is not the best management method here.
Let's start with ReactDOM.render/unmountComponentAtNode. This API was deprecated in React 18, and although that may change in the future, the main issue is that a separate render call cannot directly share Context: it detaches from the original React Tree, and the Context would have to be re-wired manually to work, which is clearly unsuitable.
Therefore, we ultimately control rendering state through State. We also need a global document manager to control the state of all block nodes. In React, we could do this through Context, using global state changes to drive the state of each ReactNode, but that essentially shifts control into each child node managing its own state. We instead want a global controller that manages all the blocks. To achieve this, we implement a LayoutModule to manage all the nodes, and wrap each node with an HOC. To make this easier, we choose class components, which allows the LayoutModule to manage the state of every block instance directly.
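Stripped of React specifics, the LayoutModule is essentially an instance registry; the sketch below shows the idea under assumed names, with `BlockInstance` standing in for the class-component wrapper (whose `setState` would be React's in practice).

```typescript
type BlockRenderState = "loading" | "viewport" | "placeholder";

interface BlockInstance {
  id: string;
  setState(state: BlockRenderState): void; // a class component's setState
}

class LayoutModule {
  private instances = new Map<string, BlockInstance>();

  // Called from the HOC on mount/unmount.
  register(instance: BlockInstance): void {
    this.instances.set(instance.id, instance);
  }

  unregister(id: string): void {
    this.instances.delete(id);
  }

  // The global controller can flip any block's render state directly,
  // without threading the change through Context.
  setBlockState(id: string, state: BlockRenderState): void {
    this.instances.get(id)?.setState(state);
  }
}
```

The wrapper component registers itself in componentDidMount and unregisters in componentWillUnmount, so the scroll scheduler never holds a reference to an unmounted instance.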
With class components, each mounted component is an instance object, making it easy to call methods and control state. These implementations could also be achieved with function components, but class components are more convenient here, and we can then drive their state through class methods. Additionally, we need a ref to obtain the component's observed node. ReactDOM.findDOMNode(this) can acquire a class component's DOM reference, but it has also been deprecated and is not recommended. Instead, we wrap an extra layer of DOM and observe that wrapper to implement virtual scrolling. Note also that the actual DOM rendering is controlled by React and is not directly controllable by our application, so we record a prevRef to detect changes in the DOM reference and update the IntersectionObserver's observed target accordingly.
In the selection module, we need to ensure the view's state can still be mapped correctly to the Model. During virtual scrolling, parts of the DOM may not actually be rendered on the page, while the browser expresses the selection through the anchorNode and focusNode, so we must ensure these two nodes remain rendered in the DOM tree throughout the user's selection process. Implementing this is not complicated: as long as we understand the browser's selection model and guarantee that the anchorNode and focusNode are rendered, we do not need to redesign the selection model for virtual scrolling scenarios. Based on this, let's walk through some scenarios.
- When the user's selection occurs entirely within the viewport, the involved nodes are all rendered, so the normal View-Model
mapping logic can be maintained.
- During the selection process, ensure that the anchorNode
and focusNode
nodes are correctly rendered, or, if the granularity is coarse, ensure that the blocks in which they are located are rendered normally.
- The user presses MouseDown
while the anchorNode
is within the viewport, then drags to scroll the page, consequently dragging the anchorNode
outside the viewport. Similarly, in this scenario we need to ensure that the block/node where the anchorNode
is located stays rendered, even outside the viewport, to prevent the loss of the selection.
- For API
operations on the document content, two scenarios may arise. If the updated content is not the anchorNode
or focusNode
node, the overall selection is unaffected; otherwise, we need to recalibrate the selection nodes through the Model
after rendering is completed.
- When the selection is set programmatically, it is first applied to the Model
, and the target blocks are then forcibly rendered to ensure that the anchorNode
and focusNode
nodes are correctly rendered, followed by the normal selection mapping logic.

In fact, do you remember that our Intersection Observer API
usually requires placeholder nodes to achieve virtual scrolling? So, since the placeholder nodes are already present, if we do not pay special attention to the number of DOM
nodes, we can render the block's selection-marking node when creating the placeholder. This can solve some problems, for example, the "select all" operation can be handled without special treatment. If we broaden the scope a bit, when creating the placeholder, we can render the text blocks as well as the Void/Embed
structures' zero-width space (\u200B)
nodes at the same time, and only schedule the rendering of complex blocks. In this case, we may not even need to worry about the selection, as the nodes needed for selection mapping are already rendered. We only need to focus on scheduling the virtual scrolling of complex blocks.
Viewport locking is a significant module. In the case of virtual scrolling, if we always start browsing from the beginning of the list, viewport locking is usually not necessary. However, for our document system, the situation is different. Let's imagine a scenario: when user A
shares a link with an anchor to user B
, and user B
opens the link and directly navigates to a specific heading or even a specific block content area in the document. If user B
then scrolls upward, a problem will arise. As mentioned before, we cannot obtain the actual height of the block before we actually render the content. Therefore, when the user scrolls up, because there is a disparity between the height of our placeholder node and the actual height of the block, visual jumping occurs. The purpose of viewport locking is to address this issue by locking the user's view at the current scroll position.
Before delving into the specifics of virtual scrolling, let's first understand the overflow-anchor
property. In the actual implementation of the editor engine, a major challenge lies in achieving compatibility across various browsers. This is evident with the overflow-anchor
property: even browsers sharing WebKit ancestry, such as Chrome
(whose Blink engine forked from WebKit) and Safari
, differ in their support for it. Returning to the overflow-anchor
property, it is designed to adjust the scroll position to minimize content movement, thereby addressing the visual jumping issue mentioned earlier. This property is enabled by default in supporting browsers. Since Safari
does not support this property, and considering that we actually need the value of this jumping disparity, here we need to disable the default behavior of overflow-anchor
and proactively control the ability to lock the viewport. However, as obtaining DOM
Rect
data is unavoidable when locking the viewport, manual intervention in viewport locking may trigger more reflow/repaint
actions.
Apart from overflow-anchor
, another property we need to pay attention to is History.scrollRestoration
. You might notice that when you navigate to a specific position on a page, then click a hyperlink to go to another page, and then navigate back, the browser remembers the previous scroll position. However, in our case, with virtual scrolling in place, we don't want the browser to control this navigation behavior because it might not accurately remember the position. Now that scrolling behavior needs active management, we need to disable this behavior in the browser.
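Taking scroll ownership away from the browser is a couple of lines at editor startup. Sketched here against a structural type rather than the real window object, so the feature-detection guard is visible on its own; the function name is illustrative:

```typescript
interface HistoryLike {
  scrollRestoration?: "auto" | "manual";
}

// Take scroll restoration away from the browser: with virtual
// scrolling, a restored offset would point into placeholder
// heights and land in the wrong place anyway.
function disableScrollRestoration(history: HistoryLike): boolean {
  if (!("scrollRestoration" in history)) return false; // older browsers
  history.scrollRestoration = "manual";
  return true;
}
```

In the browser this is simply `disableScrollRestoration(window.history)`, paired with `overflow-anchor: none` on the scroll container's CSS.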
Now, let's consider what other scenarios might affect our viewport locking behavior. It's quite evident that during a Resize
, container width changes, consequently affecting the height of text blocks, thus requiring adjustments to our viewport locking behavior here as well. Our adjustment strategy here is relatively simple. Consider that the only states in which we need to adjust viewport locking are from loading -> viewport
, as in other state changes, the height remains stable due to our placeholder
state fetching the real height. However, in the case of Resize
, even with placeholder
, we might need to reapply viewport locking because the height might not be the actual rendering height. Thus, our logic is to re-mark all nodes in the placeholder
state during a Resize
for viewport locking.
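The Resize strategy above is mechanical: walk the block states and flag every node still in the placeholder state, so the next measurement pass re-applies viewport locking for it. A sketch, with the state names taken from this article's model and the record shape an assumption:

```typescript
type BlockState = "loading" | "placeholder" | "viewport";

interface BlockRecord {
  state: BlockState;
  needsViewportLock: boolean;
}

// On Resize, placeholder heights were estimated for the old container
// width, so any of them may change once actually rendered. Mark them
// all for viewport-lock compensation; returns how many were marked.
function markPlaceholdersOnResize(blocks: BlockRecord[]): number {
  let marked = 0;
  for (const block of blocks) {
    if (block.state === "placeholder") {
      block.needsViewportLock = true;
      marked++;
    }
  }
  return marked;
}
```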
Next, let's delve into our actual viewport locking method. The approach here is still rather straightforward. When our component undergoes rendering changes, we need to fetch height information via the component's state. Then, based on this height data, we calculate the difference and adjust the scrollbar position accordingly. We also need to obtain information about the scrolling container. If the observed node's top
value is above the scrolling container, we need to lock the viewport due to height changes. When adjusting the scrollbar position, we must not use a smooth
animation but explicitly set its value to prevent viewport locking failure and avoid value retrieval issues with multiple calls. Additionally, it's important to note that since we calculate height based on actual measurements, using margin
might lead to calculation issues, such as margin collapsing. Therefore, our principle here is to use padding
wherever possible for spacing adjustments in block structures, minimizing the use of margin
.
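The adjustment described above reduces to one computation: when a block whose top edge sits above the scroll container changes from its estimated height to its real height, shift the scrollbar by the same delta so the content under the user's eyes does not move. A minimal sketch of that compensation; the ScrollerLike shape is an assumption standing in for the real scroll container:

```typescript
interface ScrollerLike {
  scrollTop: number;
}

// If a block above the current viewport grows or shrinks by `delta`
// pixels, compensate the scrollbar so visible content stays fixed.
// Returns true when a correction was applied.
function lockViewport(
  scroller: ScrollerLike,
  blockTop: number,   // block's top, in the scroller's coordinate space
  oldHeight: number,  // placeholder (estimated) height
  newHeight: number,  // real rendered height
): boolean {
  const delta = newHeight - oldHeight;
  // Only blocks entirely above the viewport can push content down.
  if (delta === 0 || blockTop >= scroller.scrollTop) return false;
  // Explicit assignment, never a smooth animation, so repeated
  // corrections always read back a consistent scrollTop.
  scroller.scrollTop += delta;
  return true;
}
```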
When users scroll quickly, there may be a short white screen due to the existence of virtual scrolling. To avoid this problem as much as possible, we still need a certain scheduling strategy. The buffer
we previously set on the view layer can partially solve this problem, but it is still not enough in fast scrolling scenarios. Of course, in reality, the white screen time is usually not too long, and in cases where there are placeholder nodes, the user experience is usually acceptable. Therefore, the optimization strategy here still needs to be based on specific user needs and feedback. After all, one of our goals for virtual scrolling is to reduce memory usage. When scrolling quickly, it is usually necessary to schedule the rendering of more blocks in the scrolling direction. This will inevitably increase memory usage. Therefore, we still need to find a balance between white screens during scrolling and memory usage.
Let's first think about our fast scrolling strategy. After users perform a large range of scrolling, they are likely to continue scrolling in the same direction. Therefore, we can customize the scrolling strategy. In the case of a sudden large number of block renderings or when the scrolling distance within a certain time slice is greater than N
times the viewport height, we can determine the scrolling order based on the rendering order of the blocks, and then perform pre-rendering based on this order. The range of pre-rendering and the time interval for rendering scheduling need to be scheduled in the same way. For example, fast rendering should not exceed 100ms
between two scheduling events, and the duration of fast rendering can be set to 500ms
. The maximum rendering range can be defined as 2000px
or N
times the viewport length, depending on specific business requirements.
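The triggering rule above can be sketched as a small detector: it compares scroll distance per time slice against N viewport heights and, once exceeded, returns a pre-render range in the scroll direction capped at a pixel budget. The concrete thresholds in the test (N = 3, 2000px cap) are illustrative placeholders chosen per business requirements:

```typescript
interface FastScrollConfig {
  viewportHeight: number;
  thresholdViewports: number; // N: slice distance that counts as "fast"
  maxRangePx: number;         // cap on how far ahead we pre-render
}

interface PrerenderRange {
  from: number; // px offset where pre-rendering starts
  to: number;   // px offset where pre-rendering stops
}

// Given scrollTop at the start and end of one time slice, decide
// whether to pre-render ahead in the scroll direction.
function planFastScrollPrerender(
  prevTop: number,
  currTop: number,
  cfg: FastScrollConfig,
): PrerenderRange | null {
  const distance = currTop - prevTop;
  if (Math.abs(distance) < cfg.thresholdViewports * cfg.viewportHeight) {
    return null; // normal scrolling: the regular buffer is enough
  }
  const ahead = Math.min(cfg.maxRangePx, cfg.thresholdViewports * cfg.viewportHeight);
  return distance > 0
    ? { from: currTop + cfg.viewportHeight, to: currTop + cfg.viewportHeight + ahead }
    : { from: Math.max(0, currTop - ahead), to: currTop };
}
```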
In addition, we can use idle rendering scheduling during idle time to avoid the white screen phenomenon in fast scrolling as much as possible. When users stop scrolling, we can use requestIdleCallback
for idle rendering, and control it by manually setting the time interval. It can be similar to the scheduling strategy for fast scrolling, setting the rendering time interval and rendering distance, etc. If the view layer supports node caching, we can even prioritize caching the view layer without actually rendering it in the DOM structure. When the user scrolls to the relevant position, we can directly retrieve it from memory and place it in the node position. In addition, even if the view layer cache is not supported, we can try to calculate and cache the state of the nodes in advance to avoid the delay caused by calculating during rendering. However, this method will also increase memory usage, so we still need to find a balance between efficiency and space occupation.
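Idle scheduling follows the same pattern. Here is a sketch where the idle scheduler is injected, so the loop can be driven by requestIdleCallback in the browser or by a synchronous fake elsewhere; the batch size is a made-up knob, not a value from the article:

```typescript
type IdleScheduler = (work: () => void) => void;

// Render pending blocks a few at a time whenever the host reports
// idle time, stopping early if the user starts scrolling again.
function scheduleIdleRender(
  pending: string[],               // ids of blocks not yet rendered
  renderBlock: (id: string) => void,
  requestIdle: IdleScheduler,      // e.g. cb => requestIdleCallback(() => cb())
  batchSize = 2,
  isScrolling: () => boolean = () => false,
): void {
  const step = () => {
    if (isScrolling()) return; // user took over: abandon idle work
    for (let i = 0; i < batchSize && pending.length > 0; i++) {
      renderBlock(pending.shift()!);
    }
    if (pending.length > 0) requestIdle(step); // keep draining on idle
  };
  requestIdle(step);
}
```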
In the previous discussion, we mainly talked about the rendering of blocks. Apart from the selection module, which touches editing state, the other content leans toward controlling rendering state. However, during editing, new blocks will inevitably be inserted, and this also needs its own management mechanism, otherwise unexpected problems may occur. Imagine a scenario where users insert a code block through the toolbar or a shortcut. Without virtual scrolling, the cursor would be placed directly inside the code block. With virtual scrolling, however, the first frame is a placeholder DOM, and the block structure loads afterwards. Since the ContentEditable
block structure does not exist at this time, the cursor naturally cannot be placed correctly, and this will usually trigger the fallback strategy of the selection area, resulting in unexpected problems.
Therefore, when inserting nodes, we need to control them. The solution to this problem is very simple. Let's think about when to perform insertion operations. It must be after the entire editor has finished loading. At that time, where should the insertion take place? It is highly likely that the editing will be done in the viewport area. Therefore, our solution is to mark the Layout
module as loaded after the editor is initially rendered. At this time, the initial state of the inserted HOC
can be considered as viewport
. In addition, often we may need to mark the order of HOC
with an index
tag. If we need to mark it where it is inserted, we usually need to rely on the DOM to determine its index
.
Actually, the modules we have here all need to provide the capabilities required by the editor engine. In many cases, we need to interact with the external main application, such as comments, anchors, find and replace, etc., which all require obtaining the status of the editor block. For example, our word commenting capability is a common scenario in document applications. The comments panel on the right side usually needs to obtain the height information of the text we select for displaying the position. However, because of the existence of virtual scrolling, this DOM
node may not actually exist, so the actual module of the comment will also become virtualized, which means it is progressively loaded as the scrolling progresses. Therefore, we need the ability to interact with the external application. In fact, this part of the capability is relatively simple. We just need to implement an event mechanism to notify the main application when the status of the editor block changes. In addition to managing the block status, it is also very important to change the height value of the viewport lock, otherwise there will be jumping problems in the positioning of the comments panel.
In our document editor, it is obvious that it is not enough to simply implement virtual scrolling. Various API compatibility must also be provided for it. In fact, the module design described above can also be part of the scenario inference, but the preceding content tends to be more focused on the design of functional modules within the editor, while our current scenario inference tends to be about the scenarios and interaction between the editor and the main application.
Anchor jump is a fundamental feature of our document system, especially when users share links, it is used more frequently. Some users even want to share arbitrary text positions. Similar to anchor jump, there may be problems when we have virtual scrolling. Imagine a situation where the user's hash
value points to a block which, with virtual scrolling, may not actually be rendered. Both the browser's default strategy and the capability provided by the editor then become ineffective, so we need to adapt the anchor-jump scenario separately and create a dedicated module to control positioning to target locations.
We can clearly determine that after the virtual scrolling is integrated, the difference from the previous jump lies in the fact that the block structure may not have been rendered yet. In this case, we only need to schedule the block with the anchor to be rendered immediately after the page is loaded, and then schedule the original jump. Since there may be a situation where the jumping occurs during the loading, when the user jumps to a certain node, the block structure above it may be transitioning from loading
to viewport
state. In this case, we need the viewport locking capability described earlier to ensure that the user's viewport does not cause visual jump due to the difference in height caused by block state changes.
So here we define the locateTo
method. In the parameters, we need to specify the Hash Entry
that needs to be searched, which represents the structure of the anchor in the rich text data structure. Because we ultimately need to retrieve the DOM
nodes through the data, if the blockId
is not passed, we also need to find the Block
to which the node belongs based on the Entry
. In the options
, we need to define buffer
as the scroll position offset. Since the DOM
node may already exist, we pass domKey
to try to jump directly to the relevant position through the DOM
. Finally, if we can determine the blockId
, we will directly pre-render the relevant nodes; otherwise, we need to look up based on the key value
from the data.
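Put together, the resolution order of locateTo is: try the already-rendered DOM via domKey, otherwise resolve the owning block (from the passed blockId or a Model lookup by key value), pre-render it, then jump. A minimal sketch of that dispatch; the function names on the deps object are assumptions for illustration, not the engine's real API:

```typescript
interface LocateDeps {
  scrollToDom(domKey: string, buffer: number): boolean; // true if node existed
  findBlockIdByEntry(entryKey: string): string | null;  // Model lookup
  forceRenderBlock(blockId: string): void;              // schedule real render
  scrollToBlock(blockId: string, buffer: number): void;
}

interface LocateOptions {
  buffer?: number;   // scroll position offset
  domKey?: string;   // try a direct DOM jump first
  blockId?: string;  // skip the Model lookup when already known
}

// Returns a tag describing which path was taken, for clarity.
function locateTo(
  entryKey: string,
  opts: LocateOptions,
  deps: LocateDeps,
): "dom" | "block" | "not-found" {
  const buffer = opts.buffer ?? 0;
  // 1. The DOM node may already be rendered: cheapest path.
  if (opts.domKey && deps.scrollToDom(opts.domKey, buffer)) return "dom";
  // 2. Otherwise resolve the owning block, from options or the Model.
  const blockId = opts.blockId ?? deps.findBlockIdByEntry(entryKey);
  if (!blockId) return "not-found";
  // 3. Pre-render the block, then perform the original jump.
  deps.forceRenderBlock(blockId);
  deps.scrollToBlock(blockId, buffer);
  return "block";
}
```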
Actually, in most cases, we usually jump to the position of the title, and we don't even jump to the title of a nested block. So in this case, we can even independently schedule the blocks of type Heading
, which means that it will be in the viewport
state instead of the loading
state when the HOC is loaded. In this way, the complexity of scheduling the anchor can be reduced to some extent. Of course, the independent position jump control capability is still necessary, as there are many other modules that may need it besides the anchor.
Find and replace is also a common feature in online documents. Usually, it is based on document data retrieval and marks the relevant positions in the document. It also has the capability to jump and replace. Due to the requirements of document retrieval and virtual layers in find and replace, our control scheduling becomes more dependent when using virtual scrolling. First of all, there is a jump issue in the find and replace scenario, which is similar to the anchor jump mentioned above. We need to render the relevant blocks when jumping and then proceed with the jump. In addition, find and replace also requires the rendering capability of the virtual layer VirtualLayer
. When rendering the actual blocks, we also need to render the layer at the same time. In other words, our virtual layer module also needs to be rendered on-demand.
So next, we need to adapt its related API
control capability. First of all, let's talk about the part of location jumping. Here, since our goal is to obtain the original data structure through retrieval, we don't need to retrieve the Entry
again through key value
. We can directly assemble the Entry
data, and then find the corresponding Text
node based on the mapping of Model
and View
. After that, use range
to get its position information, and finally jump to the relevant position. Of course, the node information here may not necessarily be a Text
node, it could also be a Line
node and so on, so it's necessary to focus on the implementation of the editor engine. However, one thing to note here is that we need to ensure the rendering state of the Block
in advance, which means we need to schedule forceRenderBlock
to render the Block
before the actual jump.
Next, we need to focus on the location jumping in the search and replace process itself. Typically, in the search and replace feature, there are buttons for finding the previous and next occurrences. So in this case, we need to consider one problem. Because our Block
may not always be rendered, it's not easy to obtain its height information, so the scheduling of previous and next occurrences might be inaccurate. For example, if we have a block structure nested with lines and code blocks at a position below the document, and we directly iterate through all the state blocks without recursively searching, then there may be a problem of jumping to the completion of the block content before jumping to the code block. Therefore, in the search process, we need to first predict the height. Remember, as we discussed earlier, we have placeholder nodes, so by using the placeholder nodes as the estimated height value, this problem can be solved. However, it still depends on the specific algorithm of the search and replace to determine whether such compatibility control is needed. In essence, we need to ensure that the order of marking the content before and after block rendering remains consistent.
Then, we need to pay attention to the rendering of the virtual layer in the actual document body, which means the markup displayed in the document. As mentioned earlier, we have integrated the Event
module into the Layout
module, so next, we need to use the Event
module to complete the rendering of the virtual layer. Actually, this part of the logic is quite simple. We just need to render the stored virtual layer nodes onto the block structure at the moment of attach-block
, and remove them at the moment of detach-block
.
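The attach/detach hookup for the virtual layer amounts to a pair of event subscriptions. A sketch with a minimal emitter; the event names are taken from the article, everything else is illustrative:

```typescript
type Handler = (blockId: string) => void;

// Minimal event bus for the Layout/Event module's block lifecycle.
class BlockEvents {
  private handlers = new Map<string, Handler[]>();
  on(event: "attach-block" | "detach-block", fn: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(fn);
    this.handlers.set(event, list);
  }
  emit(event: "attach-block" | "detach-block", blockId: string): void {
    for (const fn of this.handlers.get(event) ?? []) fn(blockId);
  }
}

// The virtual layer renders its stored marks when a block mounts
// and removes them when the block unmounts.
function bindVirtualLayer(
  events: BlockEvents,
  renderMarks: (blockId: string) => void,
  removeMarks: (blockId: string) => void,
): void {
  events.on("attach-block", renderMarks);
  events.on("detach-block", removeMarks);
}
```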
Commenting on selected text is also a common feature in online document products. Because comments involve various jump functions, such as previous and next positions, jumping to the first comment, and positioning when the document opens, we need to adapt to all of them. First, consider the position updates of the comments. When we open the document, whether through an anchor jump or by positioning to the first comment, the document scrolls directly to the corresponding position. If the user then scrolls up, a problem occurs: because of viewport locking, the scrollbar is constantly adjusting and the heights of block structures change, so we must adjust the positions of the comments by the same amount, otherwise the comments will drift out of alignment with the selected text.
Similarly, our comments may encounter situations where the block DOM does not exist, which causes problems obtaining its height. Therefore, our comments content also needs to be rendered on demand, that is, the comments content will only be displayed when scrolling to the block structure. Therefore, we only need to register the callback function for the comment module in the virtual scroll module. We may notice that during the implementation of the virtual scrolling events, the mounting and unmounting of the blocks are asynchronous notifications, while the notification events for locking the viewport are synchronous. This is because the viewport locking must be executed immediately, otherwise, a visual jump will occur. Additionally, we cannot set animations for the comment cards, as it may also cause a visual jump, so we need additional scheduling strategies to resolve this issue.
In fact, the updates mentioned earlier may encounter a problem. When we update the content of a block, will it only affect the height of that block? Obviously, it is not. When one of our blocks changes, it is likely to affect all the blocks after it, because our layout engine is from top to bottom, and a change in the height of one block will likely affect other blocks. Therefore, if we update the position information in full, it may cause significant performance consumption. So here we can consider determining the update scope based on the influence range of the 'HOC'. Even due to the clear height change caused by locking the viewport, we can update each position height as needed. We need to consider updating the 'HOC' index range to determine the influence range. For the comments of the current block, we need to update them all, while for the blocks after the current block, we only need to update their heights. Our strategy here is to determine the influence range based on the 'HOC' index, so we need to maintain the 'HOC' index range after any changes.
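The influence-range rule above, fully recompute comments on the changed block but only shift everything below it, can be sketched as follows; the comment record shape is an assumption for illustration:

```typescript
interface CommentPosition {
  blockIndex: number; // HOC index of the block the comment belongs to
  top: number;        // absolute top used by the comments panel
}

// A block at `changedIndex` changed height by `delta` pixels.
// Comments on that block are recomputed from a fresh measurement,
// while comments on later blocks are merely shifted by the delta.
function updateCommentPositions(
  comments: CommentPosition[],
  changedIndex: number,
  delta: number,
  remeasure: (c: CommentPosition) => number, // full recompute for the changed block
): void {
  for (const c of comments) {
    if (c.blockIndex === changedIndex) {
      c.top = remeasure(c);  // layout inside the block changed
    } else if (c.blockIndex > changedIndex) {
      c.top += delta;        // only pushed down/up, content unchanged
    }
    // blocks before changedIndex are unaffected in a top-down layout
  }
}
```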
Actually, as we've mentioned multiple times before, we cannot handle scrolling through smooth scheduling because we need explicit height values and viewport locking scheduling. So we can also think about this issue. Since we basically take full control of the document's scrolling behavior, we just need to put the explicit height value in a variable. The main problem with viewport locking scheduling is that we cannot clearly know if we are currently scrolling. If we can clearly perceive when we are scrolling, we just need to schedule the viewport locking and block structure rendering after the scrolling ends. No relevant modules will be scheduled during the scrolling process.
As for this issue, I have an implementation idea, but it has not been specifically carried out. Since our scrolling is mainly to solve the above two problems, we can completely simulate this scrolling animation. In other words, for a fixed scrolling delta
value, we can simulate the animation effect through calculation, similar to the transition ease
animation effect, and manage all the scrolling progress through Promise.all
, then implement subsequent scheduling effects through a queue. When we need to obtain the current state, we can let the scrolling module decide whether to take the scheduling value or scrollTop
. After the scrolling is completed, the next task is scheduled. I think this approach can be considered a future optimization direction. Even without scheduling animation effects, jumping directly to the relevant position and flashing the target node to highlight it is also a good idea.
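The simulated-scroll idea can be sketched as a pure step generator: given a delta and a frame count, produce the eased scrollTop sequence, which a scrolling module can then play back frame by frame (via requestAnimationFrame in practice) and resolve a Promise when done. The ease-out cubic curve is one arbitrary choice standing in for a CSS-like ease:

```typescript
// Ease-out cubic: fast start, gentle landing, similar to CSS `ease`.
function easeOutCubic(t: number): number {
  return 1 - Math.pow(1 - t, 3);
}

// Produce the explicit scrollTop for each animation frame, so the
// scrolling module always knows the exact value without reading back
// the browser's (possibly mid-smooth-scroll) scrollTop.
function simulatedScrollFrames(
  startTop: number,
  delta: number,
  frames: number,
): number[] {
  const out: number[] = [];
  for (let i = 1; i <= frames; i++) {
    out.push(Math.round(startTop + delta * easeOutCubic(i / frames)));
  }
  return out;
}
```

After the last frame is applied, the queued viewport-locking and block-rendering tasks can be scheduled.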
After we complete the compatibility with various functions, we must evaluate the performance of our virtual scrolling solution. In fact, we need to conduct preliminary performance testing during the early research phase to determine the ROI of implementing this functionality and the allocation of resources.
Therefore, to evaluate performance, we need to clearly define our performance indicators. Common performance testing indicators usually include:

- FP - First Paint: the time point of the first rendering, which can be considered the white-screen time in performance statistics. Until the FP time point, the user sees a completely white screen with no content and perceives no useful work.
- FCP - First Contentful Paint: the time point of the first rendering with content, considered the no-content period in performance statistics. Until the FCP time point, users see a screen with some pixels rendered but no actual content, and obtain no useful information.
- LCP - Largest Contentful Paint: a Core Web Vitals metric that measures when the largest content element in the viewport becomes visible; it can be used to determine when the main content of the page has finished rendering on screen.
- FMP - First Meaningful Paint: the time of the first rendering of meaningful content, considered complete after the entire page layout and textual content are rendered.
- TTI - Time to Interactive: an unofficial web performance progress metric, defined as the time point after the last LongTask completes that is followed by 5 seconds of inactivity in both the network and the main thread.

Since we want to test the editor engine, or in simpler terms, since the performance indicators are for the SDK rather than the main application, our indicators may not be as generic. Moreover, since we prefer to test in an actual online scenario rather than solely against a development build of the SDK, we have chosen LCP and TTI as our testing standards here. As we do not involve network status, static resources and caching can be enabled, and to prevent the impact of sudden spikes, we can run multiple tests and take the average value.
Regarding the LCP standard, we treat the completion of the initial rendering in our editor engine as the measured point. This time point can be considered the componentDidMount timing of the component, which was mentioned earlier as the isEditorLoaded flag in the Layout module. In addition, we can also start counting from the time the editor is instantiated, which more accurately excludes the time consumed by the main application; this only requires defining an event trigger in the editor and subtracting the timestamps recorded in the HTML.

As for the TTI standard, since TTI is a non-standard web performance progress metric, we do not need to strictly follow the standard definition; we only need a proxy indicator. As mentioned earlier, we are testing in a real-world online scenario where all functionality exists in the system, so we can define this indicator based on user interaction behavior. The solution chosen in this test is to consider the page fully interactive when the user clicks the publish button and the actual publishing modal is displayed. This can be achieved with a Tampermonkey script that continuously checks the button's status and automatically simulates the user's publishing interaction.

In the preliminary performance testing conducted during the early stages of research, the introduction of virtual scrolling brought significant performance improvements. This is especially true for API documents, where a large number of table block structures can quickly degrade performance: tables contain nested block structures and must maintain a large amount of state, so implementing a virtual list is very valuable. So, remember the user feedback we mentioned earlier? We ran the performance metrics above against that feedback document and compared the results with the performance data obtained earlier.
- Editor first-screen rendering: 2505ms -> 446ms, an improvement of 82.20%.
- LCP metric: 6896ms -> 3376ms, an improvement of 51.04%.
- TTI metric: 13343ms -> 3878ms, an improvement of 70.94%.

However, testing only on the document provided by user feedback is not enough. We need to design other testing plans, especially fixed test documents or fixed test procedures, which can provide more data references for future performance work. Since our document is composed of block structures, we can generate performance testing benchmarks based on three types of blocks: plain text blocks, basic blocks, and table blocks.
First, let's start with a plain text block scenario. Here, we generate a plain text document consisting of 10,000 characters. In fact, our documents usually do not have a particularly large number of characters. For example, this document is about 37,000 characters, which is already considered a super large document. The majority of documents are less than 10,000 characters. When generating the text, I also noticed an interesting thing. Even randomly generated characters still have a classical Chinese feel to them when selecting Yueyang Tower as the base text. In this case, for plain text blocks, we adopt the strategy of full rendering without scheduling virtual scrolling because plain text is a simple block structure. However, the additional module causes an increase in the overall rendering time.
- Editor first-screen rendering: 219ms -> 254ms, a regression of 13.78%.
- FCP metric: 2276ms -> 2546ms, a regression of 10.60%.
- TTI metric: 3270ms -> 3250ms, an improvement of 0.61%.

Next up is the benchmark test for basic block structures, where basic blocks refer to simple blocks such as highlighted blocks and code blocks. Due to the versatility of code blocks and the likelihood of their frequent occurrence in documents, we have chosen code blocks as the benchmark for testing. We will randomly generate 100
basic block structures here, with each block containing randomly generated text marked with random bold and italic styles.
- Editor first-screen rendering: 488ms -> 163ms, optimized by 66.60%.
- FCP metric: 3388ms -> 2307ms, optimized by 30.05%.
- TTI metric: 4562ms -> 3560ms, optimized by 21.96%.

Finally, we have the benchmark test for table block structures. Due to their heavy state maintenance and the potentially large number of individual table-cell structures, especially the large tables found in many documents, table structures impose the greatest performance overhead on the editor engine. The benchmark here involves generating 100
table structures, each containing 4
cells, with each cell randomly filled with text and marked with random bold and italic styles.
- Editor first-screen rendering: 2739ms -> 355ms, optimized by 87.04%.
- FCP metric: 5124ms -> 2555ms, optimized by 50.14%.
- TTI metric: 20779ms -> 4354ms, optimized by 79.05%.