A First Look at Serialization and Deserialization of Rich Text
In a rich text editor, serialization and deserialization are crucial steps involving content copying, pasting, importing, exporting, and more. When users copy content within the editor, the rich text is converted into standard HTML format and stored in the clipboard. During a paste operation, the editor must then parse and convert this HTML content into the editor's proprietary JSON structure for unified content management across editors.
When using online document editors, you might wonder how formatting can be directly copied instead of just plain text, even allowing for copying content from a browser to Office Word while retaining formatting. It might seem like magic, but once we understand the basics of clipboard operations, the underlying implementation becomes clear.
In terms of clipboard operations, while copying, we might think we are copying plain text only, but clearly, copying plain text alone cannot achieve the functionalities mentioned. The clipboard can indeed store complex content. Taking Word as an example, when we copy text from Word, several key values are written into the clipboard:
text/plain
text/html
text/rtf
image/png
The text/plain looks familiar, resembling the commonly seen Content-Type or MIME-Type. Hence, one might consider the clipboard as a type of Record<string, string>. However, let's not overlook the image/png type. Since files can be directly copied to the clipboard, the commonly used form for clipboard types is Record<string, string | File>. For example, when copying this text, the clipboard would contain the following content:
text/plain
For example, when copying this text, the clipboard would contain the following content
text/html
<meta charset="utf-8"><strong style="...">For example, when copying this text</strong><em style="...">the clipboard would contain the following content</em>
When performing a paste operation, reading the content from the clipboard is all that is required. For instance, when copying content from Yuque to Feishu, Yuque writes text/plain and text/html to the clipboard. On paste, Feishu can check for the text/html key; if it is present, the HTML is read and parsed into Feishu's proprietary format, so the content is pasted with its formatting intact. If text/html is not present, the content of text/plain is written directly into Feishu's private JSON data.
One more point deserves attention: in the example above, copying requires converting JSON to an HTML string, and pasting requires converting the HTML string back to JSON. These are the serialization and deserialization steps, and they incur performance costs and potential content loss. These costs can often be reduced. Typically, when pasting within the same application, the clipboard data can be mapped directly to the current JSON data, avoiding HTML parsing and preserving content integrity. For instance, Feishu writes separate clipboard keys, docx/text and data-lark-record-data, as distinct JSON data sources.
Having understood how the clipboard works, let's discuss serialization. When it comes to copying, many may think of clipboard.js, suitable for higher compatibility (e.g., IE), but for modern browsers, utilizing the HTML5 standard API directly is more advisable. In browsers, two commonly used APIs for copying are document.execCommand("copy") and navigator.clipboard.write/writeText.
document.execCommand("selectAll");
const res = document.execCommand("copy");
console.log(res); // true
For deserialization, that is, pasting, we have document.execCommand("paste") and navigator.clipboard.read/readText at our disposal. However, it is essential to note that calling document.execCommand("paste") in a web page consistently fails and returns false, while clipboard.read requires user authorization. This issue came up previously when researching trusted events in browser extensions: even with the clipboardRead permission declared in the manifest, the clipboard cannot be read directly, and the read must be executed in a Content Script or even through chrome.debugger.
document.addEventListener("paste", (e) => {
  const data = e.clipboardData;
  console.log(data);
});
const res = document.execCommand("paste");
console.log(res); // false

navigator.clipboard.read().then(res => {
  for (const item of res) {
    item.getType("text/html").then(console.log).catch(() => null);
  }
});
That, however, is not the focus here. What we are concerned with is the serialization and deserialization of content, specifically the design of the copy-paste module of a rich text editor. This module also has broader uses, such as exporting Word documents and generating Markdown. The design of this module should therefore address the following key issues:
Pluginization. The modules in the editor are inherently designed to be pluginized, and as such, the design of the clipboard module for serialization/deserialization formats should also allow for flexible extension. Particularly when adapting to specific private formats of editors like Feishu, Yuque, etc., it should be possible to freely control related behaviors.
Universality. Since rich text editing requires mapping between the DOM and selection MODEL, the generated DOM structure is typically complex. When copying content from a document to the clipboard, we aim for a more standardized structure to enable better parsing when pasting to other platforms such as Feishu, Word, etc.
Integrity. It is essential that during serialization and deserialization, the content's integrity is maintained, meaning no content loss occurs due to these processes. This might involve compromising on performance to ensure content integrity. However, for the editor's own format, performance is the main concern. Since the registered modules are consistent, it should be possible to directly apply the data without traversing the entire parsing process.
Thus, this article will use slate as an example to handle the design of the clipboard module for nested structures and quill for flat structures. Moreover, using the content of Feishu documents as a case study, it will cover the serialization and deserialization design based on different types such as inline structures, paragraph structures, composite structures, embedded structures, and block-level structures.
The basic data structure of slate is a tree-structured JSON type, and relevant implementations can be found at https://github.com/WindRunnerMax/DocEditor. Let's take headers and bold formatting as an example to describe their basic content structure:
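The fragment below is a minimal sketch of such a structure rather than the exact format used in the repository. The key names heading and bold, as well as the SketchText and SketchBlock types, are assumptions made for illustration; the editor itself refers to these keys through constants such as HEADING_KEY and BOLD_KEY.

```ts
// Assumed shapes for illustration; the real types live in the editor core.
type SketchText = { text: string; bold?: boolean };
type SketchBlock = {
  children: (SketchBlock | SketchText)[];
  heading?: { id: string; type: "h1" | "h2" | "h3" };
};

// A heading line followed by a paragraph that contains one bold leaf.
const slateFragment: SketchBlock[] = [
  { heading: { id: "intro", type: "h1" }, children: [{ text: "Heading" }] },
  { children: [{ text: "plain " }, { text: "bold", bold: true }] },
];

console.log(JSON.stringify(slateFragment, null, 2));
```

Each line is an object with children; line-level formatting sits on the line object, while inline formatting sits on the text leaf.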
In fact, the data structure in slate is very similar to a nested DOM structure, to the extent that the two can correspond one-to-one. For instance, even the zero-width characters rendered in an Embed structure exist in the data structure. Therefore, ideally, this JSON structure should be directly convertible to the corresponding DOM structure during serialization, and back again during deserialization.
However, complete correspondence is an ideal scenario. The actual organization of content in rich text editors may vary. For example, when implementing a blockquote structure, the outer wrapping blockquote tag may either be present in the data structure itself or dynamically rendered based on line attributes during rendering. In such cases, directly serializing it into complete HTML from the data structure's perspective may not be feasible.
// Structure Rendered
[
  {
    blockquote: true,
    children: [
      { children: [{ text: "Quote Block Line 1" }] },
      { children: [{ text: "Quote Block Line 2" }] },
    ],
  },
];
// Dynamically Rendered
[
  { children: [{ text: "Quote Block Line 1" }], blockquote: true },
  { children: [{ text: "Quote Block Line 2" }], blockquote: true },
];
Additionally, our implemented editor will necessarily be pluginized, and in the clipboard module, we cannot accurately determine how plugins organize data structures. In the world of rich text editors, there are unwritten rules; the content we write into the clipboard needs to have a standardized structure as much as possible to facilitate pasting content across editors. Therefore, if we aim to ensure standardized data, the clipboard module should provide basic serialization and deserialization interfaces, while the actual implementation is left to the plugins themselves.
Based on this fundamental concept, let's first look at the serialization implementation - the conversion process from JSON structure to HTML. As mentioned earlier, for the editor's format, the focus is on performance. As the registered modules are uniform, it should be possible to directly apply the data without the need for the entire parsing process. Hence, we also need to write an additional application/x-doc-editor key in the clipboard to directly store Fragment data.
Next, let's think about how to write the content to the clipboard and the scenarios where it will be triggered. Besides using Ctrl+C to copy content, users might also want to trigger the copy action through a button. For example, in Feishu, users can copy entire lines/blocks via the toolbar. Therefore, we cannot directly write data using clipboardData in the OnCopy event; we need to actively trigger an additional Copy event.
As mentioned earlier, navigator.clipboard.write can also write to the clipboard, and calling this API does not require actually triggering the Copy event. However, writing data this way may throw exceptions. Additionally, this API is only available in a secure context such as HTTPS; otherwise, navigator.clipboard is not defined at all.
In the example below, the document must have focus, and there needs to be a click on the page within a certain delay. Otherwise, a DOMException will be thrown. Even when the focus is on the page, executing the code will still throw a DOMException, indicating that the application/x-doc-editor type is not supported.
(async () => {
  await new Promise((resolve) => setTimeout(resolve, 3000));
  const params = {
    "text/plain": "Editor",
    "text/html": "<span>Editor</span>",
    "application/x-doc-editor": '[{"children":[{"text":"Editor"}]}]',
  };
  const dataItems = {};
  for (const [key, value] of Object.entries(params)) {
    const blob = new Blob([value], { type: key });
    dataItems[key] = blob;
  }
  // DOMException: Type application/x-doc-editor not supported on write.
  navigator.clipboard.write([new ClipboardItem(dataItems)]);
})();
Since this API does not support writing custom types, we need to actively trigger a Copy event to write to the clipboard. Although we can embed this data as an HTML attribute value in text/html, we choose to handle it separately here. Hence, with the same data, we use document.execCommand to write to the clipboard by creating a new textarea element.
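The part of that flow which fills the clipboard can be sketched as follows. This is only an illustration under stated assumptions: TransferLike is a hypothetical stand-in for the DataTransfer object exposed as event.clipboardData, fillTransfer is a made-up helper name, and the browser wiring is shown in comments so that the sketch also runs outside a browser.

```ts
// Minimal stand-in for the part of DataTransfer used here (assumption),
// so the sketch can run without a browser.
interface TransferLike {
  setData(format: string, data: string): void;
}

// Hypothetical helper: copy every key/value pair into the clipboard payload.
function fillTransfer(transfer: TransferLike, payload: Record<string, string>): void {
  for (const format of Object.keys(payload)) {
    transfer.setData(format, payload[format]);
  }
}

// In the browser, the helper would be called from a one-off copy listener:
//   const onCopy = (e: ClipboardEvent) => {
//     e.preventDefault(); // keep the browser from overwriting our data
//     if (e.clipboardData) fillTransfer(e.clipboardData, payload);
//   };
//   document.addEventListener("copy", onCopy);
//   document.execCommand("copy"); // after selecting a temporary textarea
//   document.removeEventListener("copy", onCopy);

// Mock transfer used to exercise the helper here.
const written: Record<string, string> = {};
const mockTransfer: TransferLike = {
  setData: (format, data) => {
    written[format] = data;
  },
};

fillTransfer(mockTransfer, {
  "text/plain": "Editor",
  "text/html": "<span>Editor</span>",
  "application/x-doc-editor": '[{"children":[{"text":"Editor"}]}]',
});
console.log(Object.keys(written));
```

Unlike navigator.clipboard.write, setData inside a Copy event accepts the custom application/x-doc-editor type, which is why the event-based route is used here.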
It is evident that due to textarea.select(), the focus of the original editor will be lost. Therefore, it is crucial to note that when performing the copy operation, the current selection value needs to be recorded. After writing to the clipboard, the focus should be set back to the editor, and the selection restored.
Next, let's delve into the definition of pluginization. Here, the Context is quite straightforward, simply requiring the recording of the current processing Node and the already processed html node. Within the plugin, we need to implement the serialize method to serialize the Node into HTML, while willSetToClipboard is a Hook definition that gets invoked when about to write to the clipboard.
// packages/core/src/clipboard/utils/types.ts
/** Fragment => HTML */
export type CopyContext = {
  /** Node base */
  node: BaseNode;
  /** HTML target */
  html: Node;
};

// packages/core/src/plugin/modules/declare.ts
abstract class BasePlugin {
  /** Serialize Fragment to HTML */
  public serialize?(context: CopyContext): void;
  /** Content about to be written to clipboard */
  public willSetToClipboard?(context: CopyContext): void;
}
Since our specific transformations are implemented within plugins, our main task is to schedule the execution of plugins. To facilitate data handling, we are not using the Immutable form here. Our Context object remains consistent throughout the scheduling process. This means that all methods within plugins handle processing in-place. Therefore, scheduling is directly done through the plugin component, fetching the html node from the context after calling.
The crucial aspect lies in our serialize scheduling method. The core idea is as follows: when processing a text line, we create an empty Fragment node as the line node. We then iterate through the text values of the current line, create a text node for each Text value, build a context object from it, and dispatch the plugins at the PLUGIN_TYPE.INLINE level, inserting the serialized HTML node into the line node.
Subsequently, for each line node, we similarly dispatch the plugins at the PLUGIN_TYPE.BLOCK level, place the processed content into the root node, and return it. This completes the basic serialization of text lines. Adding extra identifiers to the DOM nodes here helps us handle deserialization idempotently later on.
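The two dispatch levels for a text line can be sketched as follows. This is a simplified illustration, not the code from the repository: VNode replaces real DOM nodes so that the sketch runs without a browser, the bold and heading keys are assumed names, and callPlugins stands in for the plugin scheduler.

```ts
// Simplified stand-in for DOM nodes (assumption); a null tag plays the role of a Fragment.
type VNode = { tag: string | null; attrs: Record<string, string>; children: (VNode | string)[] };
type LeafSketch = { text: string; bold?: boolean };
type LineSketch = { children: LeafSketch[]; heading?: string };
type CopySketchContext = { node: LeafSketch | LineSketch; html: VNode | string };
type SketchPlugin = { level: "inline" | "block"; serialize(context: CopySketchContext): void };

const vnode = (tag: string | null, children: (VNode | string)[], attrs: Record<string, string> = {}): VNode =>
  ({ tag, attrs, children });

// Hypothetical plugins: each one wraps context.html and replaces it in place.
const boldSketch: SketchPlugin = {
  level: "inline",
  serialize(context) {
    const node = context.node;
    if ("text" in node && node.bold) context.html = vnode("strong", [context.html]);
  },
};
const headingSketch: SketchPlugin = {
  level: "block",
  serialize(context) {
    const node = context.node;
    if ("children" in node && node.heading) {
      context.html = vnode(node.heading, [context.html], { "data-type": "heading" });
    }
  },
};
const sketchPlugins: SketchPlugin[] = [boldSketch, headingSketch];

function callPlugins(level: "inline" | "block", context: CopySketchContext): void {
  for (const plugin of sketchPlugins) {
    if (plugin.level === level) plugin.serialize(context);
  }
}

function serializeLine(line: LineSketch): VNode | string {
  const lineFragment = vnode(null, []);
  for (const leaf of line.children) {
    // Inline level: one context per text value of the line.
    const leafContext: CopySketchContext = { node: leaf, html: leaf.text };
    callPlugins("inline", leafContext);
    lineFragment.children.push(leafContext.html);
  }
  // Block level: the whole line node is handed to the block plugins.
  const lineContext: CopySketchContext = { node: line, html: lineFragment };
  callPlugins("block", lineContext);
  return lineContext.html;
}

// Renders the virtual nodes to a string (no escaping; illustration only).
function render(node: VNode | string): string {
  if (typeof node === "string") return node;
  const inner = node.children.map(render).join("");
  if (node.tag === null) return inner;
  const attrs = Object.keys(node.attrs).map(key => ` ${key}="${node.attrs[key]}"`).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

console.log(render(serializeLine({ heading: "h1", children: [{ text: "Hello " }, { text: "World", bold: true }] })));
// prints: <h1 data-type="heading">Hello <strong>World</strong></h1>
```

Because every plugin mutates the same context object, the scheduler only has to read context.html back after the call, which mirrors the in-place design described above.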
After the basic line structure processing is completed, attention also needs to be paid to the outer `Node` node. The data processing here is similar to that of line nodes. However, it is important to note that this is a recursive structure. The `JSON` structure is processed in a depth-first traversal: text nodes and line nodes are handled first, followed by the outer block structures, working from the inside out so that the entire `DOM` tree structure is covered.
// packages/core/src/clipboard/modules/copy.ts
if (this.reflex.isBlock(current)) {
  const blockFragment = document.createDocumentFragment();
  current.children.forEach(child => this.serialize(child, blockFragment));
  const context: CopyContext = { node: current, html: blockFragment };
  this.plugin.call(CALLER_TYPE.SERIALIZE, context, PLUGIN_TYPE.BLOCK);
  root.appendChild(context.html);
  return root as T;
}
On the other hand, the deserialization process is relatively simple. The Paste event cannot be triggered at will; it must be triggered by a trusted user event. Therefore, we can only read the values in clipboardData through this event. Besides the keys written during copying, the data of interest here also includes the files field. Deserialization is likewise implemented in the plugins, which again modify the Context in place.
// packages/core/src/clipboard/utils/types.ts
/** HTML => Fragment */
export type PasteContext = {
  /** Target Node */
  nodes: BaseNode[];
  /** Base HTML */
  html: Node;
  /** Base FILE */
  files?: File[];
};
/** Clipboard => Context */
export type PasteNodesContext = {
  /** Base Node */
  nodes: BaseNode[];
};

// packages/core/src/plugin/modules/declare.ts
abstract class BasePlugin {
  /** Deserialize HTML into Fragment */
  public deserialize?(context: PasteContext): void;
  /** Pasted content is about to be applied to the editor */
  public willApplyPasteNodes?(context: PasteNodesContext): void;
}
The dispatching here is similar to serialization. If there is an application/x-doc-editor key in the clipboard, the value is read directly. If there are files to be processed, all plugins are scheduled to handle them. Otherwise, the value of text/html needs to be read. If it does not exist, then the content of text/plain is directly read, and the constructed JSON is applied to the editor.
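The priority order described above can be sketched as a small pure function. The ClipboardSnapshot and PasteSource types and the pickPasteSource name are hypothetical; in the editor the values would come from clipboardData.getData and clipboardData.files inside the Paste event.

```ts
// Hypothetical snapshot of what was read from clipboardData in the Paste event.
type ClipboardSnapshot = {
  data: Record<string, string | undefined>; // MIME type -> string payload
  fileCount: number; // number of entries in clipboardData.files
};
type PasteSource =
  | { kind: "fragment"; nodes: unknown[] }
  | { kind: "files" }
  | { kind: "html"; html: string }
  | { kind: "text"; text: string };

const PRIVATE_KEY = "application/x-doc-editor";

function pickPasteSource(snapshot: ClipboardSnapshot): PasteSource {
  const fragment = snapshot.data[PRIVATE_KEY];
  if (fragment) {
    // Same editor on both sides: apply the JSON directly and skip HTML parsing.
    // A real implementation would also guard against malformed JSON.
    return { kind: "fragment", nodes: JSON.parse(fragment) as unknown[] };
  }
  if (snapshot.fileCount > 0) return { kind: "files" }; // handed to all plugins
  const html = snapshot.data["text/html"];
  if (html) return { kind: "html", html }; // goes through deserialization
  return { kind: "text", text: snapshot.data["text/plain"] || "" };
}

const fromOtherEditor = pickPasteSource({
  data: { "text/plain": "Editor", "text/html": "<span>Editor</span>" },
  fileCount: 0,
});
console.log(fromOtherEditor.kind); // prints: html
```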
The key point here is the processing of text/html, which involves deserializing HTML nodes into Fragment nodes. As with serialization, the data is handled recursively. First, a DOMParser object is used to parse the HTML; the nodes are then processed in a depth-first traversal, from the inside out, with the plugins scheduled to do the actual conversion.
// packages/core/src/clipboard/modules/paste.ts
const parser = new DOMParser();
const html = parser.parseFromString(textHTML, TEXT_HTML);
// ...
const root: BaseNode[] = [];
// NOTE: Termination condition. `Text`, `Image`, and other nodes will be processed here
if (current.childNodes.length === 0) {
  if (isDOMText(current)) {
    const text = current.textContent || "";
    root.push({ text });
  } else {
    const context: PasteContext = { nodes: root, html: current };
    this.plugin.call(CALLER_TYPE.DESERIALIZE, context);
    return context.nodes;
  }
  return root;
}
const children = Array.from(current.childNodes);
for (const child of children) {
  const nodes = this.deserialize(child);
  nodes.length && root.push(...nodes);
}
const context: PasteContext = { nodes: root, html: current };
this.plugin.call(CALLER_TYPE.DESERIALIZE, context);
return context.nodes;
Next, we will use slate as an example to handle the design of the nested clipboard module. Taking the content of Feishu documents as the source and target, we will process serialization and deserialization plugins based on inline structures, paragraph structures, composite structures, embedded structures, and block structures under the above basic pattern scheduling, categorized by type.
Inline structures refer to bold, italics, underline, strikethrough, inline code blocks, and other inline structure styles. Let's take bold as an example to handle serialization and deserialization. For the serialization of inline structures, we simply need to wrap a strong node around it if it is a text node. Note that we need to handle this in place.
// packages/plugin/src/bold/index.tsx
export class BoldPlugin extends LeafPlugin {
  public serialize(context: CopyContext) {
    const { node, html } = context;
    if (node[BOLD_KEY]) {
      const strong = document.createElement("strong");
      // NOTE: Using the 'Wrap Base Node' plus in-place replacement approach
      strong.appendChild(html);
      context.html = strong;
    }
  }
}
For deserialization, we also need preprocessing as a prerequisite. We need to handle pure text content first, which is a common handling method, i.e., when all nodes are text nodes, we need to add a first-level line node. Also, we need to format the data. Ideally, we should filter all nodes with a Normalize step, but here we will simply handle empty node data.
public willApplyPasteNodes(context: PasteNodesContext): void {
  const nodes = context.nodes;
  const queue: BaseNode[] = [...nodes];
  while (queue.length) {
    const node = queue.shift();
    if (!node) continue;
    node.children && queue.push(...node.children);
    // FIX: Handle the scenario of nodes without text, for example <div><div></div></div>
    if (node.children && !node.children.length) {
      node.children.push({ text: "" });
    }
  }
}
When processing the content, we check whether the HTML node carries bold formatting and, if so, apply the bold format to all text nodes in the Node tree processed so far. In-place data processing is required here as well. A method applyMark has been encapsulated to handle text node formatting. Interestingly, because our goal is to construct the entire JSON, we do not need to use slate's Transforms module to operate on the Model.
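A helper of this kind might look like the sketch below. It is not the applyMark from the repository: the node types, the applyMarkSketch and deserializeBold names, and the use of a plain tag string in place of the HTML node are all assumptions made so that the example runs without a DOM.

```ts
// Assumed node shapes for illustration.
type MarkLeaf = { text: string; bold?: boolean; italic?: boolean };
type MarkBlock = { children: MarkNode[] };
type MarkNode = MarkLeaf | MarkBlock;

// Hypothetical counterpart of applyMark: walks the tree breadth-first
// and sets the given marks on every text leaf, in place.
function applyMarkSketch(nodes: MarkNode[], marks: { bold?: boolean; italic?: boolean }): MarkNode[] {
  const queue: MarkNode[] = nodes.slice();
  while (queue.length) {
    const node = queue.shift();
    if (!node) continue;
    if ("text" in node) Object.assign(node, marks);
    else queue.push(...node.children);
  }
  return nodes;
}

// The real plugin receives an HTML node; a tag name is enough for the sketch.
type PasteSketchContext = { nodes: MarkNode[]; tag: string };

function deserializeBold(context: PasteSketchContext): void {
  if (context.tag === "strong" || context.tag === "b") {
    applyMarkSketch(context.nodes, { bold: true });
  }
}

const pastedContext: PasteSketchContext = {
  tag: "strong",
  nodes: [{ children: [{ text: "a" }, { children: [{ text: "b" }] }] }],
};
deserializeBold(pastedContext);
console.log(JSON.stringify(pastedContext.nodes));
```

Since the children of the strong node have already been converted by the time the plugin runs, the mark only has to be pushed down onto the existing text leaves.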
Paragraph structure refers to styles such as headings, line heights, and text alignment. Here we take headings as an example to handle serialization and deserialization. For serializing paragraph structure, when the Node is a heading node, we simply construct relevant HTML nodes, wrap the original nodes in place, and assign them to the context, using a nested node approach.
// packages/plugin/src/heading/index.tsx
export class HeadingPlugin extends BlockPlugin {
  public serialize(context: CopyContext): void {
    const element = context.node as BlockElement;
    const heading = element[HEADING_KEY];
    if (!heading) return void 0;
    const id = heading.id;
    const type = heading.type;
    const node = document.createElement(type);
    node.id = id;
    node.setAttribute("data-type", HEADING_KEY);
    node.appendChild(context.html);
    context.html = node;
  }
}
Deserialization, on the other hand, involves the opposite operation. It checks if the current HTML node being processed is a heading node, and if so, converts it into a Node node. In this case, in-place data processing is also required. Unlike inline nodes, the heading format needs to be applied to all line nodes using applyLineMarker.
Composite structure here refers to styled structures like block quotes, ordered lists, unordered lists, etc. Let's take block quotes as an example to handle serialization and deserialization. When serializing composite structures, we also need to wrap the related HTML nodes when the Node is a block quote node.
// packages/plugin/src/quote-block/index.tsx
export class QuoteBlockPlugin extends BlockPlugin {
  public serialize(context: CopyContext): void {
    const element = context.node as BlockElement;
    const quote = element[QUOTE_BLOCK_KEY];
    if (!quote) return void 0;
    const node = document.createElement("blockquote");
    node.setAttribute("data-type", QUOTE_BLOCK_KEY);
    node.appendChild(context.html);
    context.html = node;
  }
}
Deserialization involves checking if the node is a block quote node and constructing the corresponding Node. The difference from the heading module is that while headings apply formatting to relevant line nodes, block quotes nest a layer of structure within the original node.
// packages/plugin/src/quote-block/index.tsx
export class QuoteBlockPlugin extends BlockPlugin {
  public deserialize(context: PasteContext): void {
    const { nodes, html } = context;
    if (!isHTMLElement(html)) return void 0;
    if (isMatchTag(html, "blockquote")) {
      const current = applyLineMarker(this.editor, nodes, {
        [QUOTE_BLOCK_ITEM_KEY]: true,
      });
      context.nodes = [{ children: current, [QUOTE_BLOCK_KEY]: true }];
    }
  }
}
Embedded structure here refers to image, video, flowchart, and other styled structures. Let's take images as an example to handle serialization and deserialization of embedded structures. When serializing embedded structures, we simply need to construct the related HTML node when the Node is an image node. Unlike previous nodes, at this point we do not need to nest DOM nodes; we just replace the standalone node in place.
For deserialization, we check whether the current HTML node being processed is an image node, and if so, convert it to a Node node. The key difference from the previous conversions is that no nested structure is needed this time; we only need to set children to a zero-width character as a placeholder. In practice, pasting image content usually also requires transferring the original src to our own service. For example, the images in Feishu are temporary links, so in production the resources need to be transferred.
Block structure refers to highlighted blocks, code blocks, tables, and other structural styles. Here, we will use highlighted blocks as an example to handle serialization and deserialization. Highlighted blocks are a customized structure in Feishu, essentially a nested Editable structure. The two layers of callout nesting here are for compatibility with Feishu's structure. Serializing block structures in slate is similar to handling block quote structures: the composite structure is simply nested in the outer layer.
Deserialization involves determining if the current HTML node being processed is a highlighted block node, and if so, converting it to a Node node. The handling here is similar to that of block quotes, but with an additional layer of nesting in the outer structure.
The fundamental data structure of quill is a flat structure in JSON format, and the related DEMO implementations can be found at https://github.com/WindRunnerMax/BlockKit. Let's take headers and bold formatting as an example to describe the basic content structure:
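The ops below are a minimal sketch of such a flat structure rather than the exact format used in the repository. The SketchOp type is modeled on the Quill Delta format, and the attribute names heading and bold are assumptions made for illustration.

```ts
// Assumed op shape, modeled on the Quill Delta format.
type SketchOp = { insert: string; attributes?: Record<string, string> };

// A heading line followed by a paragraph that contains one bold segment.
const deltaOps: SketchOp[] = [
  { insert: "Heading" },
  { insert: "\n", attributes: { heading: "h1" } }, // line format sits on the EOL op
  { insert: "plain " },
  { insert: "bold", attributes: { bold: "true" } },
  { insert: "\n" },
];

console.log(deltaOps.map(op => op.insert).join(""));
```

There is no nesting here: inline formats are attributes on text ops, and line formats are attributes on the op that ends the line.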
The serialization scheme is similar to slate, where we need to provide basic serialization and deserialization interfaces in the clipboard module, while the specific implementation belongs to the plugin itself. When it comes to serialization methods, we iterate through lines in a basic line-by-line manner, first handling the text in Delta structure, then addressing the formatting of line structures. However, due to the flat data structure of delta, we cannot handle it recursively. Instead, we should loop until we reach EOL to update the current line node with a new one.
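The loop described above can be sketched as follows. This is an illustration under stated assumptions: the serializeLeaf and serializeLineWrapper functions stand in for the inline-level and line-level plugin calls, HTML is built as strings instead of DOM nodes so that the code runs without a browser, and every EOL is assumed to be its own op.

```ts
// Assumed op shape, modeled on the Quill Delta format.
type FlatOp = { insert: string; attributes?: Record<string, string> };

const escapeHtml = (text: string): string =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Stand-in for the inline-level plugins (hypothetical).
function serializeLeaf(op: FlatOp): string {
  const text = escapeHtml(op.insert);
  return op.attributes && op.attributes.bold ? `<strong>${text}</strong>` : text;
}

// Stand-in for the line-level plugins (hypothetical).
function serializeLineWrapper(inner: string, attrs: Record<string, string> = {}): string {
  const tag = attrs.heading || "div";
  return `<${tag} data-node="true">${inner}</${tag}>`;
}

function serializeOps(ops: FlatOp[]): string {
  let html = "";
  let line = "";
  for (const op of ops) {
    if (op.insert === "\n") {
      // EOL reached: wrap the collected leaves, then start a new line node.
      html += serializeLineWrapper(line, op.attributes);
      line = "";
    } else {
      line += serializeLeaf(op);
    }
  }
  if (line) html += serializeLineWrapper(line); // trailing content without EOL
  return html;
}

console.log(
  serializeOps([
    { insert: "Heading" },
    { insert: "\n", attributes: { heading: "h1" } },
    { insert: "plain " },
    { insert: "bold", attributes: { bold: "true" } },
    { insert: "\n" },
  ])
);
// prints: <h1 data-node="true">Heading</h1><div data-node="true">plain <strong>bold</strong></div>
```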
The overall deserialization process is closer to that of slate, since the data is again processed from HTML: we recurse depth-first, handling the leaf nodes first and then the outer nodes on top of the delta that has already been produced. The final output data structure is flat, so no special attention to Normalization is needed.
// packages/core/src/clipboard/modules/paste.ts
public deserialize(current: Node): Delta {
  const delta = new Delta();
  // Termination conditions for handling Text, Image, and other nodes
  if (!current.childNodes.length) {
    if (isDOMText(current)) {
      const text = current.textContent || "";
      delta.insert(text);
    } else {
      const context: DeserializeContext = { delta, html: current };
      this.editor.plugin.call(CALLER_TYPE.DESERIALIZE, context);
      return context.delta;
    }
    return delta;
  }
  const children = Array.from(current.childNodes);
  for (const child of children) {
    const newDelta = this.deserialize(child);
    delta.ops.push(...newDelta.ops);
  }
  const context: DeserializeContext = { delta, html: current };
  this.editor.plugin.call(CALLER_TYPE.DESERIALIZE, context);
  return context.delta;
}
Additionally, for the handling of block-level nested structures, our approach may be more complex, but it is still in the design phase in the current implementation. The serialization process is similar to the following workflow. Unlike the previous structure, when dealing with block structures, the clipboard's serialization module is called directly and the content is embedded.
The deserialization process is relatively more complex because we need to maintain the reference relationships of nested structures. Although the HTML content parsed through DOMParser itself is nested, our baseline parsing method targets a flat Delta structure. However, structures like block and table need nested referenced structures, and the relationship with the id needs to be established according to a convention.
Next, we will use the delta data structure as an example to handle the design of a flat clipboard module. Similarly, based on inline structure, paragraph structure, composite structure, embedded structure, and block-level structure, under the scheduling of the above basic patterns, plugins for serialization and deserialization will be implemented according to different types.
Inline structure refers to inline styles such as bold, italic, underline, strikethrough, and inline code. Here, we take bold as an example to handle serialization and deserialization. The serialization of inline structure is basically consistent with slate. From this point on, we exercise the implementation through unit tests.
Paragraph structure refers to styles such as headings, line height, and text alignment. Here, we will focus on serialization and deserialization with headings as an example. To serialize paragraph structure, when a Node is a heading node, we construct the related HTML node, wrap the original node in place, and assign it to the context, using nested nodes as well.
Deserialization, on the other hand, involves identifying if the current HTML node being processed is a heading node and then converting it to a Node node. In this case, in-place data processing is required as well. Unlike inline nodes, the heading format needs to be applied to all line nodes using applyLineMarker.
In this context, composite structure refers to block quotes, ordered lists, unordered lists, and similar structured styles. Here, we use block quotes as an example to handle serialization and deserialization. To serialize composite structures, we also need to construct related HTML nodes for wrapping when the Node is a block quote node. In a flat structure, handling composite structures would typically occur during rendering, so the serialization process is similar to handling headings.
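For a flat op list, applying a line format could look like the sketch below. It is a guess at the idea rather than the applyLineMarker from the repository: the LineOp type and the heading attribute name are assumptions, and the sketch returns a new array instead of mutating the delta in place.

```ts
// Assumed op shape, modeled on the Quill Delta format.
type LineOp = { insert: string; attributes?: Record<string, string> };

// Hypothetical flat counterpart of applyLineMarker: merge the marker into every
// EOL op, and close the last line with an EOL op if the input has none.
function applyLineMarkerSketch(ops: LineOp[], marker: Record<string, string>): LineOp[] {
  const result: LineOp[] = ops.map(op =>
    op.insert === "\n" ? { insert: "\n", attributes: { ...op.attributes, ...marker } } : op
  );
  const last = result[result.length - 1];
  if (!last || last.insert !== "\n") {
    result.push({ insert: "\n", attributes: { ...marker } });
  }
  return result;
}

// <h1>Title</h1> has already been reduced to a plain text op by the leaf pass.
const headingOps = applyLineMarkerSketch([{ insert: "Title" }], { heading: "h1" });
console.log(JSON.stringify(headingOps));
```

Because line formats live on the EOL op in a flat structure, marking a line means touching only those ops and leaving the text ops unchanged.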
Deserialization involves identifying whether the node is a block quote node and constructing the corresponding Node node. Unlike the heading module, where the format is applied to the relevant line nodes, block quotes involve nesting a layer of structure on the original node. The deserialization structure handling is similar to the heading handling, as the HTML structure is nested, applying the quote format across all line nodes during application.
Embed structure refers to styles like images, videos, flowcharts, etc. Here we'll focus on the serialization and deserialization of images. When serializing embed structures, we simply need to construct the related HTML node when the Node is an image node. Unlike previous nodes, there's no need to nest DOM nodes here; we can simply replace the standalone node in place.
For deserialization, we check whether the current HTML node being processed is an image node, and if so, convert it to a Node node. As before, pasting image content often requires migrating the original src to our own service. For example, images in Lark are temporary links, so in production the resources need to be migrated.
Block Structure refers to highlighted blocks, code blocks, tables, and other structural styles. Here we use block structure as an example to handle serialization and deserialization. Nesting structures are not yet implemented, thus only the test cases for the mentioned deltas diagram are implemented here. The primary approach is to proactively call the serialization method when reference relationships exist to write them to HTML.
Deserialization involves determining if the current HTML node being processed is a block-level node, and if so, converting it into a Node node. The approach here is to generate an id when encountering the block node while traversing nodes in a depth-first manner, place it in the deltas, and then reference that node in the ROOT structure.
it("deserialize", () => {
  const deltas: Record<string, Delta> = {};
  const plugin = getMockedPlugin({
    deserialize(context) {
      const { html } = context;
      if (!isHTMLElement(html)) return void 0;
      if (isMatchHTMLTag(html, "div") && html.hasAttribute("data-block")) {
        const id = html.getAttribute("data-block")!;
        deltas[id] = context.delta;
        context.delta = new Delta().insert(" ", { _ref: id });
      }
    },
  });
  editor.plugin.register(plugin);
  const parser = new DOMParser();
  const transferHTMLText = `<div data-node="true"><div data-block="id"><div data-node="true">inside</div></div></div>`;
  const html = parser.parseFromString(transferHTMLText, TEXT_HTML);
  const rootDelta = editor.clipboard.pasteModule.deserialize(html.body);
  deltas[ROOT_BLOCK] = rootDelta;
  expect(deltas).toEqual({
    [ROOT_BLOCK]: new Delta().insert(" ", { _ref: "id" }),
    id: new Delta().insert("inside"),
  });
});