In the previous article, we discussed many capabilities of rich text engines and collaboration. In this article, we lean towards the specific implementation of application components. In some scenarios, such as when writing documentation for a component library, we hope to have the ability for real-time preview. This means that users can directly write code in the document and see the real-time preview on the page. This allows users to have a more intuitive understanding of how to use the components, which is a functionality found in many component library documents. Therefore, in this article, we will focus on the real-time preview of React
components and discuss the implementation of related capabilities. The relevant code mentioned in this article is available at https://github.com/WindrunnerMax/ReactLive
and the implementation effect in the rich text document can be referred to at https://windrunnermax.github.io/DocEditor/
.
First, let's briefly discuss the relevant scenarios. In fact, the API documentation for many component libraries is directly generated from Markdown
, such as Arco-Design
, which is actually generated from md
files for component application examples and API tables. When we use it, we find that we cannot directly edit code on the official website for real-time preview. This is because this method directly uses a loader
to compile the md
file into jsx
syntax based on certain rules. This effectively means that the code is directly generated from md
, and then it goes through the complete code packaging process. Since there are statically deployed API documents, there must also be dynamically rendered component API documents, such as MUI
. It also uses a loader
to process md
placeholder files and loads the corresponding jsx
components into specified positions. The rendering method not only involves static compilation, but also has the ability for dynamic rendering. The code examples on the official website can be edited in real-time and the preview effect can be seen immediately.
This kind of small-scale Playground
capability application is quite common. It is smaller in scale and does not require capabilities similar to code-sandbox
for complete demonstrations. For technical colleagues, using Markdown
to create documents is not a difficult task, but Markdown
is not a widely accepted capability and still requires a certain learning cost. Rich text capabilities are relatively easier to accept. Where there are scenarios, there are requirements. We also hope to implement dynamic rendering of components in rich text, and this ability is suitable to be developed as a third-party plugin that can be loaded on demand. In addition, in the implementation of rich text, there may be some very complex scenarios, such as the folding table capability commonly used in third-party interfaces. This is not a common scenario, and the cost of implementing it in rich text would be particularly high, especially in terms of implementing interactions. The return on investment (ROI
) would be relatively low. In reality, most companies have their own API interface platforms. Therefore, using OpenAPI
to directly generate complex components like folding tables from the interface platform is a relatively acceptable approach. Both of the above scenarios actually require the ability to dynamically render components. The understanding of the Playground
capability is relatively straightforward, while the reason for dynamic rendering of components in the API interface platform is that our data structure is probably not uniform. For example, some text needs to be bold. The lowest-cost solution is to directly assemble it into the <strong />
tag and then render it within an existing component library's folding table.
Here we briefly discuss the possible approaches for implementing preview capabilities in rich text. The structure of the preview block is actually very simple, it is nothing more than a part of the code block, while the other part is the real-time preview during editing. In rich text, the implementation of code blocks generally involves many examples, for example, when using slate
, the decorate
capability can be used, or a common approach can be adopted in quill
by using prismjs
or lowlight
to parse the entire code block. Then, the parsed parts can be sequentially placed as the content of text
with the parsed attributes in the data structure. When rendering, the corresponding styles can be rendered based on the attributes. It may even be possible to directly embed a code editor, but this makes it more difficult to perform document-level search and replace, and requires attention to event bubbling. For the preview area, the main task is to mark the rendered content as Embed/Void
to avoid the selection change affecting the editor's Model
.
Next, let's move on to the main topic of how to dynamically render React
components to achieve real-time preview. First, let's explore the implementation direction. In fact, we can simply consider that implementing a dynamically rendering component is essentially transitioning from a string to executable code. So, if in Js
we can directly execute code, there are two methods: eval
and new Function
. We certainly cannot use eval
because the code executed by eval
runs in the current scope, which means it can access and modify variables in the current scope. Although some restrictions are put in place when using strict mode, it is still not entirely secure. This may lead to security risks and unexpected side effects. Therefore, we definitely need to use new Function
to implement dynamic code execution.
So now that we have a clear direction, we can continue to study how to render React
code. After all, the browser cannot directly execute React
code. The relevant code in the article is available at https://github.com/WindrunnerMax/ReactLive
and the implementation effect can also be previewed online on Git Pages
.
Earlier we also mentioned that browsers cannot directly execute React
code. One of the issues is that the browser doesn't understand what this component is. For example, if we import a <Button />
component from a component library, the browser doesn't understand the syntax of <Button />
. Of course, we'll discuss the issue of the Button
component's dependencies later. So, when we write React
components, JSX
is actually compiled to React.createElement
. Starting from 17
, you can use the jsx
method from react/jsx-runtime
, but here we are still using React.createElement
. So, what we need to do now is to compile the React
string, converting from JSX
to a function call format, similar to the following:
Babel
is a widely used JS
compiler, usually used to transform the latest version of JS
code into older version codes that browsers can understand. We can use Babel
to compile JSX
syntax. babel-standalone
incorporates the core functionality and common plugins of Babel
, which can be directly referenced in the browser, allowing us to use Babel
to transform JS
code directly in the browser.
In this case, we are actually using babel 6.x
. The babel-standalone
in 6.x
version is just 791KB
in size, while @babel/standalone
in 7.x
version is already 2.77MB
. However, the 7.x
version supports direct types definition for TS
with @types/babel__standalone
, and we can use @types/babel-core
as an alternative to use babel-standalone
. Using Babel
is very straightforward. We just need to pass the code and configure the relevant presets
to obtain the code we want. However, what we obtain is still a code string. Also, we discovered that we cannot use the <> </>
syntax, after all, it's a package from 6 years ago. This is handled normally in @babel/standalone
.
Since we dynamically render components based on user input, security is a consideration, and using Babel
has the advantage of allowing us to easily register plugins. We can handle some processing during code parsing, such as only allowing users to define a component function named App
. If any other function is declared, a parsing failure exception will be thrown, and we can also choose to remove the current node. Of course, this is still not enough. We will need to continue discussing security-related issues in the future.
Also, we can do a simple benchmark here. Using the following code, we generated 1000 'Button' components, each containing a 'div' structure nested within it, to test the speed of compilation using babel
. The results show that the actual speed is quite good and sufficient for small-scale playground
scenarios.
SWC
is the abbreviation for Speedy Web Compiler
, which is a fast TypeScript/JavaScript
compiler written in Rust
, and is also a library that supports both Rust
and JavaScript
. Created to address the slow compilation speed in web development, SWC
performs exceptionally well in terms of compilation speed compared to traditional compilers. It can utilize multiple CPU cores to process code in parallel, significantly improving compilation speed, especially for large projects or projects with a large number of files. The rspack
we used previously is based on SWC
.
For us, the main purpose of using SWC
is its ability to compile quickly. We can directly use swc-wasm
, which is the WebAssembly
version of SWC
and can be used directly in the browser. Because SWC
must be asynchronously loaded, we need to define the entire process as an asynchronous function to wait for the loading to complete before using synchronous code transformation. In addition, similar to Babel
, we can write plugins to handle intermediate products in the parsing process, but this needs to be implemented in Rust
and involves a certain learning curve. For now, we focus on the code transformation capabilities.
Here we are still using 1000
Button
components nested with div
structures to perform a simple benchmark
. The results show that the actual compilation speed is very fast, with the main time being spent on the initial wasm
loading. Efficiency will be greatly improved if the page is refreshed without disabling caching and directly utilizing the 304
results, thus maintaining a relatively high level of speed after the initial load.
Sucrase
is an alternative to Babel
that enables super fast development builds. It focuses on compiling non-standard language extensions, such as JSX
, TypeScript
, and Flow
. Due to its narrower support range, Sucrase
can adopt a performance-oriented but less scalable and maintainable architecture. Its parser is a fork of Babel
's parser, reduced to a subset of the solutions used by Babel
.
Similarly, we use Sucrase
to improve compilation speed. It can be loaded directly in the browser and has a relatively small package size, making it ideal for small Playground
scenarios. However, due to the extensive use of advanced techniques for transformation and lacking a lengthy processing flow similar to Babel
, Sucrase
is unable to handle intermediate code products with plugins. Therefore, when there is a need to process code, we must use regular expressions to manually match and handle related code.
Here we are still using 1000
Button
components nested with div
structures to perform a simple benchmark
. The results show that the actual compilation speed is very fast, significantly faster than Babel
overall, but slightly inferior to SWC
. However, considering that SWC
requires a longer initialization time, using Sucrase
is still a good choice overall.
In the previous section, we solved the first issue of browsers not being able to directly execute React
code, which is that the browser does not recognize code like <Button />
as a React
component. We need to compile it into Js
code that the browser can understand. Therefore, in this section, we need to address two issues. The first is how to let the browser know how to find the Button
object, which is the dependency problem. After compiling the <Button />
component into React.createElement(Button, null)
, the browser is not informed of what the Button
object is or where to find it. The second issue is how to construct appropriate code after handling the compiled code and dependency problem, and how to place it within new Function
to obtain a true React
component instance.
Here we need to briefly review new Function
and the with
syntax because we will use them later. Using the Function
constructor allows us to dynamically create function objects, similar to how eval
dynamically executes code. However, unlike eval
, which has access to the local scope, functions created with the Function
constructor only execute within the global scope. Its syntax is new Function(arg0, arg1, /* ... */ argN, functionBody)
.
The with
statement sets the scope of the code to a specific object. Its syntax is with (expression) statement
, where expression
is an object, and statement
is a statement or block. with
specifies the scope of the code to a specific object, and its internal variables reference the properties of that object. If an accessed key
is not a property of the object, the scope continues to search until reaching window
. If the property is still not found on window
, a ReferenceError
is thrown. We can use with
to specify the scope of the code, but it increases the length of the scope chain, and its usage is not allowed in strict mode.
Next, let's address the dependency problem of the component, using the <Button />
component as an example. After compilation, we need React
and Button
as dependencies. However, as mentioned earlier, new Function
is in the global scope and does not access the current scope's values. Therefore, we need to find a way to pass the relevant dependencies into our code for it to execute properly. One approach could be to directly assign the relevant variables to the window
, but this method is not elegant and is overly intrusive. Instead, we can consider the parameters of the new Function
statement: all parameters except the last one are simply arguments, and the last one is the function body. Therefore, we can first construct an object, place all the dependencies in it, and then declare all the object's key
as parameters and pass their value
as parameter values during function construction and execution.
Using parameters is a good approach, but it may become less controllable as we use a large number of variables. In such cases, if we want to implement additional functionality, such as restricting user access to the window
, using with
may be a better choice. Let's first use with
to achieve the basic capability of accessing dependencies.
This kind of implementation seems to be more elegant. By using a sandbox
variable to hold all dependencies, accessing dependencies becomes more controllable. In fact, we may not want the user's code to have such high permission to access all global objects. For example, we may want to restrict users from accessing the window
. Of course, we can directly put window: {}
in the sandbox
variable because when searching upwards in the scope, it stops when window
is found. However, an obvious problem is that we cannot enumerate and put all global objects in the parameter. At this point, we need to use with
because when using with
, we first access this variable, so if we can proxy when accessing this variable, returning null
for those not in the whitelist is enough. At this point, we also need to bring in the Proxy
object. We can use with
together with Proxy
to restrict user access, and we will expand on this in the security section later.
In the aforementioned code, we solved the dependency issue and briefly addressed security concerns. However, so far we have only been dealing with strings and have not yet transformed them into actual React
components. Here, we focus on generating React
component objects from strings. Similarly, we still use new Function
to execute the code. However, we need to concatenate the code string into the form we want to bring out the generated object. For example, the <Button />
component, after being compiled by the compiler, we will get React.createElement(Button, null)
. Therefore, when constructing the function, if we only use new Function("sandbox", "React.createElement(Button, null)")
, even if we execute it, we will not get the component instance because this function has no return value. So, we need to concatenate it into return React.createElement(Button, null)
, so that we can get our first method, concatenate render
to get the returned component instance. Additionally, users may often write several components at the same level, usually requiring us to nest a layer of div
or React.Fragment
at the outermost level.
Although it seems able to meet our needs, it is important to note that we must enable the production
and other configurations of the compiler. Additionally, we must avoid extra user input such as import
statements. Otherwise, for example, the Babel
compilation result in such cases where we use the concatenated return
form will obviously cause syntax errors. So, can we change our approach and directly compile the return
part of the code, such as return <Button />
, in the compiler? In fact, this is possible in Sucrase
because it does not pay particular attention to syntax, but compiles as much as possible. However, in Babel
, it will throw a 'return' outside of function
exception, and in SWC
, it will throw a Return statement is not allowed here
exception. Even though our ultimate goal is to place it in the new Function
to construct the function, using return
is reasonable, however, the compiler is not aware of this, so we still need to pay attention to this limitation.
Since this approach has many limitations and requires attention and adaptation in many places, we need to change our approach. When compiling the code, it should fully comply with the syntax rules and not require attention to user input. We only need to extract the compiled components. We can achieve this by using the passed dependency. First, generate a random id
, then configure an empty object, and assign the compiled component to this object. Finally, in the rendering function, return it using the object and id
.
Here we still use the <Button />
component as an example. Let's compare it directly with the result compiled with Babel
. Even if we haven't turned on production
mode, the result of the compilation still complies with the syntax rules. Due to the reference passing, we can extract the compiled component instance using ___BRIDGE___
and the randomly generated id
.
Additionally, we can relatively comprehensively open up the capabilities of the component by using conventions to fix a function name such as App
. When splicing the code, we can use ___BRIDGE___["id-xxx"] = React.createElement(App);
. Afterwards, users can have more freedom to implement related interactions with the component, such as using useEffect
and other Hooks
. This conventional approach is more flexible and commonly seen in applications, such as conventional routing. Below is the result of compiling and splicing App
as the function name, which can be placed in new Function
and used the reference of the dependency to obtain the final generated component instance.
In the previous section, we discussed how to solve the problems of code compilation, component dependencies, and building code, and finally obtain the instance of the component. In this section, we mainly discuss how to render the component on the page, which is actually quite simple. We can choose several methods to achieve the final rendering.
In React
, we usually render components directly using ReactDOM.render
. Similarly, we can use this method to render the component, as we have already obtained the component instance. We simply find a suitable div
to mount and render the component on the DOM
.
Of course, we can also try a different approach. We can delegate the rendering capability to the user, meaning that we can specify that the user can execute ReactDOM.render
in the code. We can encapsulate this method once to ensure that the user can only render components to our fixed DOM
structure. Alternatively, we can directly pass ReactDOM
to the user code to execute the rendering logic, although this is not advisable due to lack of control. However, if we can fully trust the user input, this rendering method is acceptable.
In fact, rendering React
components in a Markdown
editor is a common practice, for example, dynamic rendering during editing and static rendering when consuming components. When consuming, dynamic rendering of components is the scenario we mentioned at the beginning, and Markdown
frameworks usually support SSR
. Therefore, we also need to support SSR
for static rendering of components. In fact, we can dynamically compile code to obtain React
components, then use ReactDOMServer.renderToString
(which returns data-reactid
to signal to React
that the content has been server-rendered and should not be re-rendered on the client) or ReactDOMServer.renderToStaticMarkup
to generate HTML tags, known as "dehydration". These can then be placed in HTML and returned to the client. On the client side, we can use ReactDOM.hydrate
to inject events into the components, known as "rehydration", thus achieving SSR
server-side rendering. Below is a DEMO
implemented using express
, which essentially represents the most basic principle of SSR
.
Since we've chosen to render components dynamically, security inevitably needs to be considered. For example, in the simplest form of attack, a user could write a function in the code to obtain the current user's Cookie
, and then construct an XHR
object or use fetch
to send the Cookie
to their server. If the website doesn't have HttpOnly
enabled and this code gets stored, every other user who visits the page in the future will unknowingly send their Cookie
to the malicious server, enabling the attacker to obtain other users' Cookie
information. This poses a serious threat known as persistent XSS attack. Additionally, as mentioned earlier, if malicious code executes on the server side in an SSR rendering mode, it would be an even more dangerous operation. Therefore, it's crucial to consider the security implications of user actions.
In reality, as long as user input is being accepted and executed as code, we cannot completely guarantee that this behavior is secure. What we need to be mindful of is to never trust user input. The safest approach is to not allow user input at all. However, for the current scenario, that's not feasible. Therefore, it's essential to ensure that user input is within manageable limits. For example, only allowing internal company input for document creation, and ensuring that externally received content is strictly for consumption without being stored and displayed to other users. This can significantly mitigate the risk of malicious attacks. Nevertheless, even with these measures in place, we still strive to securely execute user-inputted code. The most common approach is to restrict user access to global objects like window
.
In the previous section, we also mentioned that new Function
is in the global scope and does not read the variables in the defining scope. However, since we are constructing a function, we can completely pass all the variables from the window
to this function and assign null
to the variable names. This way, when searching for values in the scope, we will directly obtain the values we passed in without continuing to search upwards. This approach can be used whether by using parameters or constructing with
. It also allows us to limit user access through a whitelist. Of course, the properties of this object can be as many as thousands, which may not appear so elegant.
The Proxy
object can create a proxy for another object, which can intercept and redefine the basic operations of the object, such as property lookup, assignment, enumeration, function calls, and so on. Thus, in combination with the previous use of with
, we can delegate all object access and assignments to sandbox
to achieve more precise control over object access. Below is a simple sandbox implementation using Proxy
, which allows us to limit user access through a whitelist. If the accessed object is not in the whitelist, it returns null
; if it is in the whitelist, it returns the object itself.
In this implementation, the with
statement determines whether the accessed field is in the object using the in
operator, thus deciding whether to continue looking up through the scope chain. Therefore, we need to ensure that has
always returns true
to prevent code from accessing the global object through the scope chain. Furthermore, functions like alert
and setTimeout
must run in the window
scope. These functions share the characteristic of being non-constructible and lacking a prototype
property, which we can use for filtering and binding window
when fetching.
If you've used Google Chrome plugins like TamperMonkey
, ViolentMonkey
, or ScriptCat
, you may have noticed the presence of two objects, window
and unsafeWindow
. The window
object is a secure and isolated environment, while unsafeWindow
is the window object in the user's page. For a long time, I used to believe that the window
object accessible in these plugins was actually provided by the Content Scripts of the browser extensions, and that unsafeWindow
was the user page's window
. This led me to spend a significant amount of time exploring how to directly access the user page's window
object from the Content Scripts of the browser extensions. However, my quest ended in failure. One interesting concept in this pursuit was the implementation of an escape from the browser extension. Since Content Scripts and Inject Scripts share the DOM, it was once possible to escape through the DOM. However, this approach has long been outdated.
In addition, FireFox
also provides a wrappedJSObject
to help us access the page's window
object from Content Scripts. However, this feature may also be removed in future versions due to security concerns. So, why do we know that they are actually the same browser environment? Aside from inspecting the source code, we can also verify the script's effect in the browser through the following code. It shows that modifications to window
are actually synchronized with unsafeWindow
, proving that they are indeed the same reference.
// TamperMonkey: https://github.com/Tampermonkey/tampermonkey/blob/07f668cd1cabb2939220045839dec4d95d2db0c8/src/content.js#L476 - Not updated for a long time // ViolentMonkey: https://github.com/violentmonkey/violentmonkey/blob/ecbd94b4e986b18eef34f977445d65cf51fd2e01/src/injected/web/gm-global-wrapper.js#L141 // ScriptCat: https://github.com/scriptscat/scriptcat/blob/0c4374196ebe8b29ae1a9c61353f6ff48d0d8843/src/runtime/content/utils.ts#L175 // wrappedJSObject: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Sharing_objects_with_page_scripts
Upon careful observation, in the last two lines of the verification code, we actually circumvented the sandbox limitations of these extensions, thus enabling direct access to unsafeWindow
without needing @grant unsafeWindow
. This leads us to consider whether limiting the user's code access to objects like window
is sufficient to guarantee security. Clearly, it's not enough. We need to handle various cases to minimize the possibility of users bypassing the sandbox, such as controlling user access to this
.
When it comes to 'with', we can briefly talk about the knowledge of Symbol.unscopables
. We can focus on the following example. In the second part, we added a property to the object's prototype chain, and this property happens to have the same name as our with
variable. Additionally, the value in this property is accessed within the with
, leading to unexpected behavior. This issue was even exposed in the well-known framework Ext.js v4.2.1
. In order to address this issue, TC39
introduced the Symbol.unscopables
rule. After ES6
, this rule is applied to each array method.
In the previous discussion, we have been using methods to restrict user access to global variables or isolate the current environment in order to implement a sandbox. However, we can also adopt a different approach. Placing the user's code within an iframe
for execution allows us to isolate the user's code in an independent environment, effectively achieving the sandbox effect. This approach is quite common. For instance, CodeSandbox
uses this method for implementation. We can directly use the contentWindow
of the iframe
to access the window
object and then execute the user's code using this object. This enables us to achieve isolation of the user's access to the environment. Furthermore, we can also use the sandbox
attribute of the iframe
to restrict user behavior, such as limiting allow-forms
form submission, allow-popups
pop-ups, and allow-top-navigation
navigation modification, thus creating a more secure sandbox.
Similarly, we can also add a layer of proxies to ensure that all object accesses within the iframe
are done using the global object of the iframe
. If the object is not found, it will continue to access the originally passed value. Additionally, when compiling functions, we can use this completely isolated window
environment for execution, thereby achieving a completely isolated code execution environment.