In the previous planning, we needed to implement an editor architecture based on the MVC
pattern, splitting the application into three core components: controller, model, and view. When a command is executed through the controller, the current data model is modified, which is then reflected in the rendering of the view. Simply put, it involves constructing a data model that describes the document structure and content and using a custom execCommand
to modify the data model. This implementation of an L1
level rich text editor, by abstracting the data model, addresses issues in rich text such as dirty data and challenges in implementing complex functionalities.
Related articles on building a rich text editor project from scratch:
In the design of the entire system architecture, the most crucial core concept is state synchronization. If we take the state model as the foundation, then the state synchronization we need to maintain can be summarized into the following two aspects:
In fact, these two state synchronizations are a forward dependent process: user operations lead to synchronization in the state model, and changes in the state model are then synchronized to the view layer, which in turn serves as the basis for user operations. For example, when a user selects a portion of text by dragging, the selected range needs to be synchronized to the state model. Subsequently, when a deletion operation is executed, the corresponding text in the data needs to be removed, followed by refreshing the view to display the new DOM
structure. This cycle continues to ensure state synchronization, input execution, and view refresh in subsequent interactions.
Therefore, our primary goal is state synchronization. Although it may seem simple with just these two principles, it is not as straightforward in practice. When we perform state synchronization, we heavily rely on browser-related APIs
such as selection, input, keyboard events, etc. At this point, we must address browser-related issues, like the inability of ContentEditable
to fully prevent IME
input or the compatibility issues of EditContext
which require further enhancements, all of which are challenges to be tackled.
As we incorporate more browser APIs
for implementation, the considerations for browser compatibility also increase. This is why many rich text editor implementations explore alternatives to ContentEditable
, such as DingTalk's custom selection regions or Google Docs' Canvas
document drawing. While these alternatives reduce reliance on certain browser APIs
, they are still inherently tied to browser implementations. Even a document drawn with Canvas
still relies on browser APIs
for input handling, position calculations, and more.
Returning to our streamlined editor model, previous articles have mentioned the ContentEditable
attribute and execCommand
command. While using document.execCommand
to modify HTML
via commands may seem simple, its controllability is notably poor. The behavior of execCommand
commands varies across browsers, presenting the browser compatibility behavior mentioned earlier. Unfortunately, these behaviors are beyond our control and are default behaviors of the browsers.
Therefore, to achieve greater extensibility, control, and resolve the mismatch between data and view, the L1
rich text editor adopts the concept of a custom data model. By abstracting a data structure from the DOM
tree, where identical data structures ensure consistent rendering of HTML
, and directly controlling the data model with custom commands, the editor guarantees the consistency of the rendered HTML document. Regarding selection representation, continual normalization of the selection Model
based on DOM
selections is crucial.
This leads us to the topic of the MVC
model architecture we are discussing today. We organize the editor project by managing related packages through a monorepo
structure, naturally forming a layered architecture. However, before diving into that, we can implement the most basic editor in an HTML
file initially simple-mvc.html. Our focus will primarily be on implementing the basic ability to bold text, emphasizing control over the entire process. As for input capabilities, which pose a more complex challenge, we will address them in a separate section.
Firstly, we need to define the data model, comprising two parts: nodes describing document content and operations targeting data structures. Starting with describing the content of the document, we use a flat data structure to represent content and deliberately avoid multi-level DOM
nesting to simplify the DOM
structure description.
In the above data, type
represents the node type, while text
represents the text content. However, merely describing the data structure in the data model is not enough. We also need to include additional states to describe positional information, such as start
and len
in the data above. This part of the data is crucial for us to compute selection transformations.
Therefore, this data model section should not just be referred to as data, but should rather be termed as state. Next, we will focus on operations related to the data structure, specifically operations like insertion, deletion, and modification on the data model. Here, we have simply defined the operation of extracting data, while the complete compose
operation can be referred to in delta.ts.
The operation of extracting data serves as the foundation for executing the compose
operation. When we have the original text and change descriptions, we need to transform each into iterator objects to extract data separately, constructing a new data model in the process. The iterator part here first defines two methods, peekLength
and hasNext
, to determine if there is any remaining part of data that can be obtained and whether iteration can continue.
The handling of the next
method is more complex. Our main aim here is to acquire a portion of the text
. Note that each call to next
does not cross nodes, meaning that at most, next
will take the length of the insert
stored in the current index
. If the content taken exceeds the length of a single op
, theoretically its corresponding attributes are inconsistent, so they cannot be directly merged.
When calling the next
method, if the length
parameter does not exist, it defaults to Infinity
. Then, we obtain the current node at index
, calculate the remaining length of the current node, and if the length
to be taken is greater than the remaining length, we take the remaining length; otherwise, we take the desired length
. Subsequently, we extract the text
content based on the offset
and length
.
Thus, we have briefly defined the state that describes the data model and the iterator that can be used to extract data structure. This section forms the foundation for describing the content and modifications of the data structure. Although we have simplified many details here, making it appear relatively simple, the actual implementation involves many complexities, such as implementing immutable
to reduce redundant rendering and ensure performance.
The view layer is primarily responsible for rendering the data model. In this part, we can utilize React
for rendering, but in this simple example, we can directly create DOM
elements. Hence, here we iterate through the data model, create corresponding DOM
nodes based on the node type, and then insert them into the div
with contenteditable
.
Here we have additionally added the data-leaf
attribute to mark leaf nodes for the purpose of updating selections that need to fall on a specific DOM
node. The MODEL_TO_DOM
and DOM_TO_MODEL
mappings are used to maintain the relationship between the Model
and the DOM
since we need to retrieve corresponding values based on both DOM
and MODEL
.
This way, we have defined a very simple view layer, where performance issues are not a concern in the provided example. However, in a real-world scenario like in React
, completing the view layer involves dealing with various challenges due to the behavior of uncontrolled ContentEditable
, such as maintaining key values, checking for dirty DOM
elements, minimizing redundant renders, batch scheduling refreshes, and correcting selections.
The controller is the most complex part of our architecture, involving extensive logic. Our editor controller model needs to be implemented on top of the data structure and the view layer. Hence, we describe it last, coincidentally making it the last part of the MVC
model sequence under Controller
. The primary function at the controller layer is synchronization, which means keeping the data model and the view layer in sync.
For instance, our view layer is rendered based on the data model. If we input content at a certain node, we need to synchronize this content with the data model. Failing to sync the data model correctly can lead to issues with calculating the selection's length. This, in turn, can cause problems with synchronization of selection indices, where controlled and uncontrolled scenarios need to be differentiated.
First and foremost, we focus on synchronizing selections. Selections are fundamental to editor operations, where the selected state serves as the reference point for operations. The essence of synchronization lies in using browser APIs to sync with the data model. The browser's selection change can be tracked through the selectionchange
event, allowing us to capture the latest selection information.
By using window.getSelection()
, we can obtain the current selection's details, and then with getRangeAt
, we can retrieve the Range
object of the selection. Using this Range
object, we can determine the start and end positions of the selection. With these start and end positions, we can obtain the corresponding positions through the previously established mapping.
Obtaining the corresponding DOM
node from the selection node is not always straightforward, as the browser's selection position rules are uncertain for our model. Therefore, we need to find the target leaf node based on the selection node. For instance, in the common text selection scenario, the selection is on a text node, while in a triple-click selection, it's on the entire line DOM
node.
Hence, using closest
here handles only the simplest text node selection. More complex situations require normalization. DOM_TO_MODEL
serves as a state mapping, and once the latest selection positions are obtained, the actual selection position in the DOM
needs to be updated for correcting the browser's selection state.
On the contrary, the updateDOMselection
method works oppositely. While the event handling updates the Model
selection based on the DOM
selection, updateDOMselection
updates the DOM
selection based on the Model
selection. At this point, having only start/len
, finding the corresponding DOM
is not straightforward, and thus, DOM node lookup is necessary.
This involves considerable DOM
searching, so in practical scenarios, it's important to limit the scope of selections as much as possible. In our actual design, we base the search on lines to look for span
type nodes. Subsequently, we need to iterate through the whole leaves
array, fetch the corresponding state using DOM_TO_MODEL
, and then acquire the nodes and offsets required to construct the range.
Once we find the target DOM
node, we can create a modelRange
and set it as the browser's selection. However, it is important to check if the current selection is the same as the original one here. Imagine setting the selection again, it would trigger a SelectionChange
event, causing an infinite loop, which definitely needs to be avoided.
Dealing with selections is not less complicated than dealing with input methods. Here, we simply synchronize the browser selection with our model selection and the key remains the synchronization of states. Next, we can synchronize the data model, in here it means implementing the actual command execution instead of using document.execCommand
directly.
Now is the time to put our previously defined data iterator to use. The operation target also needs to be achieved using a range
, for example, a text segment like 123123
with a selection of start: 3, len: 2
, and a strong
type, data within this interval will turn into 123[12 strong]3
, which involves truncating long data.
Firstly, we construct a retain
array based on the selection we want to operate. Although this part should ideally use ops
for operations, it would require more implementation of compose
. Therefore, we simply use an array and an index here.
We then need to define an iterator and merge it with the retain
array. Here, we move the pointer with the 0
index and extract data within the index, change the type
with the 1
index, and fix the 2
index as Infinity
to cover all remaining data. The crucial length
here is calculated as the minimum of the two values to perform data truncation effectively.
In the end, remember that what we are maintaining is not just data representation, but also the description of the entire data state. Therefore, we still need to refresh all the data at the end to ensure the correctness of the final data model. At this point, we need to call render
to re-render the view layer, and then refresh the browser's selection.
By doing this, we have defined a relatively complex controller layer. The controller layer here mainly synchronizes the data model and the state of the view layer, and implements the most basic command operations, of course, without handling many complex edge cases. In actual editor implementation, this part of the logic can be very complex because we need to deal with many issues such as input methods, selection models, clipboards, and so on.
With our basic editor MVC
model implemented, it naturally can be abstracted into an independent package
. Luckily, we manage the project in the form of a monorepo
. Thus, we can abstract it into four core packages: core
, delta
, react
, and utils
, corresponding to the core logic of the editor, data model, view layer, and utility functions respectively. The specific editor module implementations are then defined in the form of plugins in the plugin
package.
The Core
module encapsulates the core logic of the editor, including modules such as clipboard, history operations, input handling, selection management, and state management. All these modules are referenced through an instantiated editor
object. Besides the layered logic implementation, we also hope to achieve modularity in the modules by allowing extension capabilities. Modules can be referenced, extended, and then reloaded onto the editor.
In fact, there are dependencies within the Core
module itself. For example, the selection module depends on event module event dispatching. This is mainly because modules need to depend on instances of other modules during construction to initialize their own data and events. Therefore, the order of instantiation of modules becomes important, but when we actually talk about it, we simply follow the order as defined above, rather than following the direct dependencies directed acyclic graph order.
The clipboard
module mainly handles data serialization and deserialization, as well as clipboard operations. Typically, the DOM structure of rich text editors may not be very standard. For instance, in slate
, we see attributes like data-slate-node
, data-slate-leaf
, which can be understood as template structures.
Therefore, when serializing, it is necessary to serialize this complex structure into relatively standard HTML
. Especially since many styles are not achieved using semantic tags, but through the style
attribute, standardizing them becomes crucial.
Deserialization involves converting HTML
into the editor's data model, which is essential for cross-editor content pasting. The internal data structures of editors are usually not consistent, so for cross-editors, a relatively standard intermediate structure is required. This is also an unwritten rule in editors, where serialization in editor A
should be as standard as possible for editor B
to deserialize effectively.
The collect
module fetches related data based on selection data. For instance, when a user selects a portion of text and performs a copy operation, this module needs to extract the selected data content before serialization can occur. Additionally, the collect
module can also retrieve an op
node at a certain position, handle inherited marks
, etc.
The editor
module is a module aggregator for the editor, primarily responsible for managing the entire editor's lifecycle, such as instantiation, mounting the DOM
, destruction, and other states. This module needs to combine all modules and also focus on organizing the modules' dependency relationships in a directed graph. Most of the essential editor APIs
should be exposed by this module.
The event
module is an event distribution module where native event bindings are implemented. All events within the editor should be dispatched from this module. This approach allows for greater customization space, such as extending plugin-level event execution. It can also reduce the probability of memory leaks because as long as we ensure that the editor's destruction method is called, all events can be correctly unmounted.
The history
module maintains the history of operations in the editor. Implementing undo
and redo
in the editor is quite complex; we need to perform atomic operations rather than storing a snapshot of the editor's full data. It involves maintaining two stacks to handle data transfer. Additionally, we need to implement extensions on top of this, such as automatic grouping, operation merging, collaborative processing, etc.
Automatic grouping refers to merging consecutive high-frequency user operations into one operation. Operation merging means that we can implement merging through APIs
, for instance, when a user uploads an image, performs other input operations, and then after a successful upload, the final operation should be combined with the image upload operation. Collaborative processing follows a principle that we can only undo our own operations and not the operations collaborated by others.
The input
module handles input in the editor, where input is one of the core operations in the editor. We need to handle input methods, keyboard, mouse, and other input operations. Interacting with input methods requires a lot of compatibility handling, including candidate words, predictive words, shortcuts, accents, etc. Mobile input method compatibility is even more troublesome, with compatibility issues for mobile input methods outlined separately in draft
.
For instance, a common example is that ContentEditable
cannot truly prevent input from an IME
, which means we cannot fully prevent Chinese input. In the example below, inputting English and numbers will not trigger a response, but Chinese can be input normally. This is one reason why many editors choose to customize the selection and input, such as VSCode
, DingTalk documents, etc.
The model
module is used to map the relationship between the DOM
view and the state model. This part serves as a bridge between the view layer and the data model. Often, we need to retrieve the state model through the DOM
, and vice versa, we might need to obtain the corresponding DOM
view through the state model. This section uses a WeakMap
to maintain mappings to achieve state synchronization.
The perform
module encapsulates the basic module for executing changes to the data model. Since constructing basic delta
operations can be complex, for instance, executing changes to the marks
attribute requires filtering out the \n
operation, while conversely, operations on line attributes need to filter out normal text operations. It's necessary to encapsulate these operations to simplify the cost of executing changes.
The plugin
module implements the editor's plugin mechanism. Pluginization is essential, and theoretically, all formats beyond plain text should be implemented by plugins. The pluginization here mainly provides basic plugin definitions and types, manages the lifecycle of plugins, and includes capabilities such as grouping by method call, method dispatch priority, etc.
The rect
module handles the position information of the editor. Often, we need to calculate the position based on DOM
nodes and provide the node's relative position in the editor. Particularly in many additional capabilities, such as virtual scroll viewport locking, virtual layers for comparison views, height positioning for commenting capabilities, etc. Additionally, the position information of the selection is also important, such as the popup location of a floating toolbar.
The ref
module implements the reference transfer of the editor's position, which actually utilizes collaborative transform
to handle index information, similar to Slate
's PathRef
. For example, when a user uploads an image, other content insertion operations may take place at the same time, causing the image's index value to change. Using the ref
module allows us to obtain the latest index value.
The schema
module is used to define the data application rules for the editor. Here, we need to define methods for data attributes to be processed. For example, bold marks need to inherit the bold attribute after input, while inline code inline types do not need to be inherited. Elements like images, horizontal lines should be defined as occupying a whole line as a Void
type, while Mention
, Emoji
, etc., need to be defined as Embed
types.
The selection
module is used to handle selections in the editor, which are the core operational benchmarks. We need to manage selection synchronization, selection correction, and so on. In reality, selection synchronization is a very complex matter. Mapping from the browser's DOM
to the selection model itself requires careful design, and selection correction involves handling a multitude of boundary cases.
As mentioned earlier, considering the following DOM
structure, if we were to express a collapsed selection to the left of the character 4
, there could be various ways to achieve this position depending on the browser's default behavior. Therefore, we need to ensure the mapping of this selection ourselves, as well as the correction logic in unconventional states.
The state
module maintains the core state of the editor. After passing basic data during editor instantiation, the content we subsequently maintain becomes the state, rather than the originally passed data content. Our state modification methods will also be implemented here, especially in maintaining the state using Immutable/Key
, ensuring the immutability of state to reduce redundant rendering.
The Delta
module encapsulates the editor's data model. We need to describe the editor's content and content changes based on the data model. Additionally, it encapsulates many operations on the data model such as compose
, transform
, invert
, diff
, Iterator
, and others.
The Delta
implementation is based on a transformation of Quill
's data model. Quill
's data model design is excellent, especially encapsulating methods for operational transformation based on OT
. However, there are still some inconvenient aspects in the design. Therefore, referencing EtherPad
's data implementation, we have modified some parts of the implementation based on that, and we will elaborate on the data structure design in the future.
Furthermore, it is important to note that our Delta
implementation is primarily used to describe documents and changes, serving as a form of serialization and deserialization. As mentioned, after initializing the editor, the data we maintain becomes an inherent state rather than the initially provided data content. Consequently, many methods at the controller level will have separate designs, such as maintaining the immutable
state.
The attributes
module maintains operations related to describing text attributes. We have simplified attribute implementation here, where AttributeMap
defines a type of <string, string>
. The specific module further defines methods such as compose
, invert
, transform
, diff
, to carry out operations like merging, reversing, transforming, and computing differences on attributes.
The delta
module implements the entire editor's data model, where delta
uses ops
to represent a linear structure of the data model. The structures of the ops
mainly include three types of operations: insert
for inserting text, delete
for deleting text, and retain
for retaining text/moving pointers, along with methods like compose
, transform
, and others.
The mutate
module implements an immutable version of the delta
module and introduces \n
as an independent operation. The initial design of the controller was based on data changes but has been transformed into maintaining the original state. Consequently, this implementation has been moved to the delta
module, making it directly correspond to editor state maintenance, suitable for unit testing, among other purposes.
The utils
module encapsulates helper methods for op
and delta
, with methods like clone
for deep copying op
, delta
, and equivalent operations. Due to our new design, there is no need to introduce lodash
's related methods. Additionally, it includes methods for data checking and formatting, such as determining start/end of strings, splitting \n
, and others.
The React
module serves as the adapter for the view layer, providing basic node types like Text
, Void
, Embed
, and rendering modes that encapsulate a dispatch pattern in line with the Core
mode. It also offers wrapped HOC
nodes, Hooks
, and methods to extend the view layer through plugins.
The hooks
module implements a method to retrieve the editor instance, making it convenient to access the editor instance in React
components, which, of course, depends on our Provider
component. Additionally, the read-only status method is also implemented here. The maintenance of the read-only status was originally maintained within the plugin itself, but later it was extracted into the React
component, making it easier to switch between editing and read-only states.
The model
module implements the built-in data model of the editor, which essentially corresponds to the State
of the Core
layer, i.e., the data model of Block/Line/Leaf
. Apart from adhering to the patterns required by DOM
nodes, other functionalities like dirty DOM
detection methods are also implemented. Moreover, there is a special EOL
node here, which is a special LeafModel
that schedules the rendering of end-of-line nodes based on a specific strategy.
The plugin
module implements the plugin mechanism of the editor. Here, pluginization mainly extends the basic plugin definitions and types. For example, the return value type of plugin methods defined in the Core
as any
needs to be specifically defined here as ReactNode
. Furthermore, it implements plugins for rendering, which are types that do not maintain state at the core level, mainly for pluginizing node types of Wrap
.
The preset
module presets the API components exposed by the editor, such as Context
, Void
, Embed
, Editable
components, etc., mainly providing basic components for building the editor view and extending components at the plugin layer. It also encapsulates many interactive implementations, such as automatic focus, selection synchronization, view refresh adapters, etc.
The Utils
module implements many common utility functions, mainly to handle common logic within the editor, such as debounce, throttle, etc. It also includes helper methods for handling DOM
structures, event dispatching methods, event binding decorators, list operations, internationalization, clipboard operations, etc.
Here we have implemented a simple editor MVC
architecture example, and from there, naturally abstracted the core modules of the editor, data model, view layer, utility functions, etc., followed by a brief description of each. In the future, we will describe the design of the editor's data model, introduce our Delta
data structure method, and discuss relevant applications in the editor. Data structure design is crucial because the core operations of the editor are all based on the data model. Failing to understand the design of the data structure may lead to difficulties in comprehending many operational models of the editor.