In the previous article on the Operational Transformation (OT) algorithm for rich text, we discussed why collaboration is necessary, why atomic operations alone cannot achieve it, why operations need to be transformed, how to transform them, when to apply them, and how the server schedules collaboration. This is the fundamental knowledge needed to implement collaboration. In practice, there are many mature collaboration implementations available today, such as ot.js, ShareDB, ot-json, EasySync, and more. This article uses ShareDB as the OT collaboration framework to illustrate collaboration.
Integrating a collaboration framework is not a simple task, especially when it comes to implementing the OT algorithm itself. 'OT' stands for 'Operational Transformation', which means that the basis of implementing 'OT' is an atomic description of the content and of the operations on it. In the realm of rich text, the classic 'Operation' models include the delta model used by quill, which describes and manipulates the entire document through three operations: retain, insert, and delete. There is also the JSON model used by slate, which describes and manipulates the entire document through operations such as insert_text, split_node, and remove_text. With this foundation in place, the transformation of every 'Op' against every other still has to be implemented, which is a rather cumbersome but essential task. Taking the open-source editors quill and slate as examples: for quill, the transformation of its delta data structure is already implemented and can be called directly through the official quill-delta package. For slate, however, the official packages only provide the atomic operation API and no transformation; instead, the community-maintained slate-ot package implements the transformation of its JSON data, which can also be called directly.
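To make the retain / insert / delete vocabulary concrete, here is a minimal sketch that applies such operations to a plain string. It is a simplified stand-in written for this article, not the quill-delta API: the `DeltaOp` type and the `applyDelta` helper are assumptions, and attributes and embeds are ignored.

```typescript
// Simplified stand-in for a Delta op: exactly one of the three fields is set.
// This is NOT the quill-delta API (no attributes, no embeds); it only mirrors
// the retain / insert / delete vocabulary described above.
type DeltaOp = { retain: number } | { insert: string } | { delete: number };

// Apply a list of ops to a plain string, walking a cursor through the input.
function applyDelta(text: string, ops: DeltaOp[]): string {
  let cursor = 0; // position in the ORIGINAL text
  let result = "";
  for (const op of ops) {
    if ("retain" in op) {
      // keep the next `retain` characters unchanged
      result += text.slice(cursor, cursor + op.retain);
      cursor += op.retain;
    } else if ("insert" in op) {
      // new text is emitted without consuming any input
      result += op.insert;
    } else {
      // deleted characters are skipped in the input
      cursor += op.delete;
    }
  }
  // Anything after the last op is kept implicitly.
  return result + text.slice(cursor);
}

const edited = applyDelta("Hello World", [
  { retain: 6 },
  { delete: 5 },
  { insert: "ShareDB" },
]);
// edited === "Hello ShareDB"
```

Because every change is expressed as a walk over the document from left to right, two such changes can be compared position by position, which is what makes a transformation between them possible.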
In the realm of rich text, there are several collaboration implementations available for reference, especially among open-source rich text engines, and these solutions are relatively mature. Other fields, however, may have no ready-made implementation, in which case we have to consult the documentation and implement it ourselves. For example, suppose we are building collaboration for a self-developed mind map that uses a custom data structure, and no existing implementation fits; we would then need to implement the operation transformation ourselves. For a mind map, atomic operations are relatively easy to implement, so the main effort goes into the transformation. If the mind map maintains its data as a JSON structure, we can refer to the implementation of json0 or slate-ot, and reading their unit tests in particular makes the specific behavior easier to understand. Based on these implementations, we can either write an 'OT' transformation of our own or maintain an intermediate data structure layer and perform the transformation on that layer. Alternatively, if the mind map maintains a linear, text-like structure, we can refer to the implementations of rich-text and quill-delta. Implementing atomic operations is likely to be harder in this case, but an intermediate data structure can still be maintained to achieve 'OT'.
With these references in hand, integrating OT collaboration is mostly a matter of understanding and implementation: there is a general direction to follow rather than no idea of where to start. The guiding principle is that the most suitable approach is the best one, so the cost of implementation should be weighed, and there is no need to force the data into a particular structure. Representing a mind map as linear text, as mentioned above, may be quite a stretch. It is not impossible, though: the Table in Google Docs is a completely linear structure, yet it supports tables nested within tables, where each cell behaves like a document that can embed any rich text structure. Such complex functionality is achieved on top of a linear structure.
The concepts of json0 and rich-text mentioned above may be hard to grasp at first, so the Counter and Quill examples below demonstrate how to use ShareDB to implement collaboration and what json0 and rich-text actually do. For specific API calls, refer to the ShareDB documentation; this article only covers the most basic collaborative operations. All the code is available at https://github.com/WindrunnerMax/Collab. Note that this is a 'pnpm' workspace monorepo project, so be sure to install dependencies using 'pnpm'.
First, let's run a basic collaborative example called 'Counter', whose main functionality is to maintain a single total that multiple clients can each increment by '1'. The example is located at https://github.com/WindrunnerMax/Collab/tree/master/packages/ot-counter. Its directory structure can be listed with tree --dirsfirst -I node_modules.
Here is a brief explanation of the role of each folder and file. The public folder stores static resource files, and its contents are moved to the build folder when the client is packaged. The server folder stores the implementation of the OT server, which is compiled into js files and placed in the build folder at runtime. The src folder contains the client code, mainly the views and the OT client implementation. babel.config.js holds the configuration for babel, rollup.config.js is the configuration file for packaging the client, and rollup.server.js is the configuration file for packaging the server. package.json and tsconfig.json should be familiar to everyone, so I won't go into detail.
Let's start with json0. At first glance it is not obvious what json0 is; in fact, it is the default type that ships with sharedb. sharedb provides many mechanisms for handling operations, such as the server-side scheduling of atomic 'Op's mentioned earlier, but it does not provide the actual implementation of the transformation. Complex business requirements inevitably lead to complex data structures, so the transformation and processing of operations is delegated to the business to implement; in sharedb these implementations are called OT Types.
OT Types actually define a set of interfaces, and a type must implement these interfaces before it can be registered in sharedb. These implementations are exactly the OT operation transformations that we need to write. For example, the transform function transform(op1, op2, side) -> op1' must satisfy apply(apply(snapshot, op1), transform(op2, op1, 'left')) == apply(apply(snapshot, op2), transform(op1, op2, 'right')) to guarantee that the transformed operations converge to the same result. Similarly, the compose function compose(op1, op2) -> op must satisfy apply(apply(snapshot, op1), op2) == apply(snapshot, compose(op1, op2)). The detailed documentation and requirements can be found at https://github.com/ottypes/docs.
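The transform property above can be checked on a toy type. The sketch below is a hypothetical OT type invented for illustration (a single string insert per op); `InsertOp`, `apply`, and `transform` are assumptions and do not come from any published OT type. The compose function is left out because two inserts cannot in general be merged into one op of this shape.

```typescript
// Hypothetical toy OT type: an op inserts `text` at index `pos` of a string.
type InsertOp = { pos: number; text: string };
type Side = "left" | "right";

function apply(snapshot: string, op: InsertOp): string {
  return snapshot.slice(0, op.pos) + op.text + snapshot.slice(op.pos);
}

// transform(op1, op2, side) -> op1': rewrite op1 so it can run AFTER op2.
// When both insert at the same index, `side` breaks the tie: the "left"
// op keeps its position, the "right" op is shifted past the other insert.
function transform(op1: InsertOp, op2: InsertOp, side: Side): InsertOp {
  const shift = op2.pos < op1.pos || (op2.pos === op1.pos && side === "right");
  return shift ? { pos: op1.pos + op2.text.length, text: op1.text } : op1;
}

// Two clients edit the same snapshot concurrently at the same index.
const snapshot = "ac";
const op1: InsertOp = { pos: 1, text: "b" };
const op2: InsertOp = { pos: 1, text: "X" };

// Left-hand side of the property: op1 first, then op2 transformed against it.
const viaOp1 = apply(apply(snapshot, op1), transform(op2, op1, "left"));
// Right-hand side: op2 first, then op1 transformed against it.
const viaOp2 = apply(apply(snapshot, op2), transform(op1, op2, "right"));
// Both paths produce "aXbc".
```

The tie-breaking `side` argument is what keeps the two paths from producing "abXc" on one client and "aXbc" on the other when both inserts target the same index.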
This may look cumbersome at first glance, and the formulas may suggest that some mathematical background is required. Although the essence of operation transformation is just adjusting indices to ensure convergence, writing it yourself still takes time. Fortunately, the open-source community already offers many implementations for reference. sharedb also ships with a default type, json0, a JSON OT type that can edit any JSON document. It is not limited to JSON documents as such: our counter is also implemented with json0, because the counter only needs a single field in a JSON object. Going back to json0, it supports the following operations:
Inserting, deleting, moving, and replacing items in a list; inserting, deleting, and replacing object properties; atomic numerical addition; embedding arbitrary subtypes; and embedded string editing, using the text0 OT type as a subtype.
json0 is also an invertible type, which means that every operation has an inverse operation that undoes the original one. It is not perfect, however: it cannot move objects, cannot perform a set-if-null (an object insert with first-writer-wins semantics), and cannot efficiently insert many items into a list at once. You can also take a look at json1, which is a superset of json0 and solves some of the problems that json0 has. To find out whether a particular operation is supported, simply check whether it is defined in the documentation. For the counter in this example, we need the {p:[path], na:x} 'Op', which adds x to the number at [path]. The detailed documentation can be found at https://github.com/ottypes/json0.
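As a rough illustration of what the {p:[path], na:x} component does to a snapshot, the sketch below applies it by hand. The `applyNumberAdd` helper and the `num` field are hypothetical names chosen for this example; json0 itself exposes its own apply function, which is not reproduced here.

```typescript
// Shape of the number-add component described above: {p: [path], na: x}.
type NumberAddOp = { p: Array<string | number>; na: number };
type Json = { [key: string]: any };

// Hypothetical helper (not part of json0): apply one `na` component by walking
// the path and adding `na` to the number found there. Returns a new snapshot
// instead of mutating the input.
function applyNumberAdd(snapshot: Json, op: NumberAddOp): Json {
  const next: Json = JSON.parse(JSON.stringify(snapshot)); // deep copy
  let container: any = next;
  // Walk every path segment except the last to reach the parent container.
  for (const key of op.p.slice(0, -1)) container = container[key];
  const last = op.p[op.p.length - 1];
  if (typeof container[last] !== "number") {
    throw new Error("na can only target a number");
  }
  container[last] += op.na;
  return next;
}

// Two clients increment the same counter concurrently.
const start = { num: 0 };
const fromClientA: NumberAddOp = { p: ["num"], na: 1 };
const fromClientB: NumberAddOp = { p: ["num"], na: 1 };

const orderAB = applyNumberAdd(applyNumberAdd(start, fromClientA), fromClientB);
const orderBA = applyNumberAdd(applyNumberAdd(start, fromClientB), fromClientA);
// Both orders end at { num: 2 }.
```

Addition commutes, so the two orders agree without either op being rewritten, which is why a counter is such a convenient first example for collaboration.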
Next, let's look at the server-side implementation. Its main job is to instantiate ShareDB and obtain a document instance through the collection and id. Once the document is ready, a callback is triggered to start the HTTP server. If the document does not exist at that point, it has to be initialized, and the data initialized here is the data that clients receive when they subscribe. We won't go into the specific API of the instance here (see https://share.github.io/sharedb/api/ for details); the point is to describe what it does. This is of course a very simple implementation, and a real production environment would also need to integrate features such as routing and a database.
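The "fetch, create if missing, then start the server" flow can be sketched without a backend. The `DocLike` interface below is a hand-written stand-in for the small part of a ShareDB document this flow relies on (a callback-style fetch and create, and a type that is null while the document does not exist); `ensureDocument`, `FakeDoc`, and the `{ num: 0 }` initial data are assumptions for illustration, not the code from the repository.

```typescript
// Hand-written stand-in for the subset of a ShareDB document used here.
// `type` stays null until the document has been created.
interface DocLike<T> {
  type: unknown;
  data: T | undefined;
  fetch(callback: (err?: Error) => void): void;
  create(data: T, callback: (err?: Error) => void): void;
}

// Fetch the document, create it with `initial` if it is missing, then call
// `ready` (the place where the HTTP server would be started).
function ensureDocument<T>(doc: DocLike<T>, initial: T, ready: () => void): void {
  doc.fetch((err) => {
    if (err) throw err;
    if (doc.type === null) {
      // First start: this initial data is what subscribers will receive.
      doc.create(initial, (createErr) => {
        if (createErr) throw createErr;
        ready();
      });
      return;
    }
    // The document already exists, so its data is left untouched.
    ready();
  });
}

// In-memory fake so the flow can be exercised without a real backend.
class FakeDoc<T> implements DocLike<T> {
  type: unknown = null;
  data: T | undefined = undefined;
  fetch(callback: (err?: Error) => void): void {
    callback();
  }
  create(data: T, callback: (err?: Error) => void): void {
    this.type = "json0";
    this.data = data;
    callback();
  }
}

const counterDoc = new FakeDoc<{ num: number }>();
const readyLog: string[] = [];
ensureDocument(counterDoc, { num: 0 }, () => {
  readyLog.push("http server started");
});
```

With the real library, the document would come from a connection obtained from the ShareDB backend rather than from a fake, but the order of the steps stays the same.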
On the client side, we mainly establish a shared connection and obtain the document instance through the 'collection' and 'id', which gives us access to the document created on the server. We then subscribe to the document snapshot and listen for 'Op' events to update the displayed data. In this example we never modify the data directly; every change is submitted as an operation through the 'client', so there is no need to work out atomic operations from changed data. In the 'Quill' example below, by contrast, we have to listen for document changes, which is the more suitable approach when a complete set of atomic operations is already implemented.
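The client-side wiring described above can be sketched as follows. `CounterDoc` is a stand-in interface for the subset of a ShareDB client document used by the counter (subscribe, an 'op' event, and submitOp); `bindCounter`, `LocalCounterDoc`, and the `num` field are hypothetical names for illustration, and the fake applies ops locally instead of talking to a server.

```typescript
// Number-add component in the json0 style: {p: [path], na: x}.
type AddOp = { p: string[]; na: number };

// Stand-in for the subset of a ShareDB client document used by the counter.
interface CounterDoc {
  data: { num: number };
  subscribe(callback: (err?: Error) => void): void;
  on(event: "op", listener: () => void): void;
  submitOp(op: AddOp[]): void;
}

// Wire a document to a view: render once after subscribing, re-render on
// every op, and return a function that increments through submitOp
// rather than writing to doc.data directly.
function bindCounter(doc: CounterDoc, render: (value: number) => void): () => void {
  doc.subscribe((err) => {
    if (err) throw err;
    render(doc.data.num); // initial snapshot
  });
  doc.on("op", () => render(doc.data.num));
  return () => doc.submitOp([{ p: ["num"], na: 1 }]);
}

// Tiny local fake that applies ops immediately and notifies listeners.
class LocalCounterDoc implements CounterDoc {
  data = { num: 0 };
  private listeners: Array<() => void> = [];
  subscribe(callback: (err?: Error) => void): void {
    callback();
  }
  on(_event: "op", listener: () => void): void {
    this.listeners.push(listener);
  }
  submitOp(op: AddOp[]): void {
    for (const component of op) this.data.num += component.na;
    this.listeners.forEach((listener) => listener());
  }
}

const rendered: number[] = [];
const increment = bindCounter(new LocalCounterDoc(), (value) => rendered.push(value));
increment();
increment();
// rendered is now [0, 1, 2]
```

Because the view only ever reads the snapshot and writes through submitted ops, the same render path handles both local clicks and ops arriving from other clients.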
Next, let's run an example built on the rich text editor 'Quill'. Its main functionality is collaborative editing inside the 'quill' rich text editor, with collaborators' cursors kept in sync. The example is located at https://github.com/WindrunnerMax/Collab/tree/master/packages/ot-quill. Its directory structure can be listed with tree --dirsfirst -I node_modules.
Here is a brief explanation of the role of each folder and file. The public folder stores static resource files, which are moved to the build folder when the client is packaged. The server folder stores the implementation of the OT server, which is also compiled into js files and placed in the build folder at runtime. The src folder contains the client code, mainly the view and the OT client implementation. rollup.config.js is the configuration file for packaging the client, and rollup.server.js is the configuration file for packaging the server. package.json and tsconfig.json should be familiar to everyone, so I won't go into detail.
The data structure of quill is not JSON but Delta. Delta describes and manipulates the entire document through three operations: retain, insert, and delete. We therefore cannot use json0 to describe this data structure and need a different OT type, rich-text. The rich-text type is implemented on top of the official quill-delta package. For more details, please refer to https://www.npmjs.com/package/rich-text and https://www.npmjs.com/package/quill-delta.
The client side consists of three main parts: instantiating quill, instantiating the ShareDB client, and implementing the communication between quill and the ShareDB client. The quill part instantiates the editor, registers the cursor plugin, generates a random id, and derives a random color from that id. The ShareDB client part registers the rich-text OT type and creates the ws connection between the client and the server. The communication part listens for events on quill and on doc, mainly to handle 'Op's and cursors.
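Keeping cursors in sync means that a remote change must also shift the positions of the cursors already on screen. The sketch below shows the idea on a simplified retain / insert / delete list; `CursorOp` and `transformCursor` are assumptions written for this article, and quill-delta ships its own position transformation that also handles priorities, which this sketch does not.

```typescript
// Simplified op shape; not the quill-delta API (no attributes, no embeds).
type CursorOp = { retain: number } | { insert: string } | { delete: number };

// Move a cursor index so that it still points at the same place after `ops`
// have been applied to the document.
function transformCursor(index: number, ops: CursorOp[]): number {
  let offset = 0; // position in the document BEFORE the change
  let result = index;
  for (const op of ops) {
    if (offset > index) break; // the rest of the change lies after the cursor
    if ("insert" in op) {
      // Text inserted at or before the cursor pushes it to the right.
      result += op.insert.length;
    } else if ("delete" in op) {
      // Only the deleted characters that sit before the cursor pull it back.
      result -= Math.min(op.delete, index - offset);
      offset += op.delete;
    } else {
      offset += op.retain;
    }
  }
  return result;
}

// A collaborator's cursor sits at index 8 of "Hello World".
const afterPrefixInsert = transformCursor(8, [{ insert: "Hi " }]); // 11
const afterRangeDelete = transformCursor(8, [{ retain: 6 }, { delete: 5 }]); // 6
const afterLaterInsert = transformCursor(8, [{ retain: 9 }, { insert: "x" }]); // 8
```

An insert exactly at the cursor position moves the cursor in this sketch; a real implementation has to decide that tie per cursor, depending on whose change it is.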