Previously, we introduced the development of Chrome extensions from scratch. In reality, the overall architecture of browser-level extensions is quite complex. Although there are unified standards currently, the specific implementations in different browsers vary, and becoming a developer and publishing on the Chrome Web Store requires a registration fee of $5. If we only hope to perform some lightweight script writing on web pages, using browser extension-level capabilities will seem a bit costly. Therefore, in this article, we mainly discuss the implementation of lightweight scripts at the web level in browsers.
In the previous article on building Chrome extensions from scratch, we used TypeScript to implement the entire extension and Rspack as the bundler. Lightweight scripts can be written directly in JavaScript, but they become increasingly hard to maintain as their features grow, so here we still use TypeScript. For the build tool, we choose Rollup to bundle the scripts. The implementations discussed in this article can be found in my personal script collection at https://github.com/WindrunnerMax/TKScript.
Of course, browsers cannot run web-level user scripts directly, so we need a runtime environment for them. There are currently many script managers, such as GreaseMonkey, TamperMonkey and ScriptCat.
There are also many script aggregation websites for sharing scripts, such as GreasyFork. When we researched browser extensions earlier, we found that extension permissions are very broad, and script managers are themselves browser extensions, so choosing a trustworthy one is crucial. For example, TamperMonkey was open source in its early versions, but its repository has not been updated since 2018, so the current TamperMonkey is effectively a closed-source extension. Extensions uploaded to the Chrome Web Store do go through some review. Even so, a closed-source implementation is a reason for caution with a tool as powerful as a user script manager, and this is worth considering when choosing one.
Under the hood, script managers are still browser extensions. They wrap extension capabilities and expose some of them as APIs that user scripts can call, and many interesting implementation details are involved. For example, a script can access both window and unsafeWindow, which raises the question of how to build a fully isolated window sandbox. Likewise, web pages cannot access cross-origin resources, so it is worth studying how a CustomEvent-based communication mechanism lets the Inject Script reach cross-origin resources. Two further topics are how createElementNS can be used to inject the runtime and scripts at the HTML level, and what the //# sourceURL comment does once the script code has been assembled. Interested readers are encouraged to study ScriptCat, a user script manager developed by Chinese developers whose Chinese comments make the code easy to follow. This article focuses mainly on application. It starts from the basic UserScript capabilities, moves on to building scripts with Rollup, and then explores script implementation through an example.
When GreaseMonkey first implemented a script manager, it used a UserScript block as the script's metadata description. It also provided many advanced APIs prefixed with GM, such as the cross-origin GM.xmlHttpRequest, which together amount to a complete specification. Most later script managers follow or stay compatible with this specification so they can reuse the existing ecosystem. This is still a headache for developers, because we cannot control which extension a user installs. If our script relies on an API that only one particular extension implements, the script will not work in other extensions. This is especially true once the script is published on a script platform, where there is no way to ship a separate channel package for each manager. It is therefore best to stick to the metadata and APIs supported by the major extensions and avoid unnecessary trouble.
I had also long been curious about how user scripts get installed from GreasyFork. I could not find any special event handling behind the install button, nor could I work out how the site detects the installed script manager and communicates with it. After a brief study, I found that as long as the URL of a user script file ends with .user.js, the script manager's installation feature is triggered automatically. The manager also records the installation source so it can check for updates, for example when the browser starts. Script managers generally follow this convention. Now that we understand how installation works, I will share my own best practice for distributing scripts later in this article.
In this section, we mainly introduce the common usage of Meta and API. A complete example script can be found at https://github.com/WindrunnerMax/TKScript/blob/gh-pages/copy-currency.user.js.
Metadata follows a fixed format, mainly so that script managers can parse properties such as the script's name and the sites it matches. Each property line must start with a double-slash // comment, and block comments /* */ must not be used. All metadata must also be placed between // ==UserScript== and // ==/UserScript== to be recognized as valid. Each property is written on its own line as // @key value.
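To make the format concrete, here is a small TypeScript sketch of how a manager might read such a block. It assumes every property fits on a single // @key value line. parseMetaBlock and UserScriptMeta are hypothetical names, not part of any manager's API.

```typescript
// Hypothetical helper (not part of any script manager): collect "// @key value" lines
// from the UserScript block. Repeated keys such as @match keep every value.
type UserScriptMeta = Record<string, string[]>;

function parseMetaBlock(source: string): UserScriptMeta {
  const start = source.indexOf("// ==UserScript==");
  const end = source.indexOf("// ==/UserScript==");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No valid ==UserScript== block found");
  }
  const meta: UserScriptMeta = {};
  // Split on both LF and CRLF so Windows line endings do not break the regex below
  for (const line of source.slice(start, end).split(/\r?\n/)) {
    // Keys may carry a locale suffix such as "name:zh-CN"; flag keys such as "noframes" have no value
    const match = /^\s*\/\/\s*@(\S+)(?:\s+(.*))?$/.exec(line);
    if (!match) continue;
    const [, key = "", rawValue = ""] = match;
    (meta[key] ??= []).push(rawValue.trim());
  }
  return meta;
}
```

Repeated keys are kept as arrays, because properties such as @match, @include and @require may legitimately appear several times.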
The commonly used properties are as follows:
@name: The name of the script. Together with @namespace it uniquely identifies the script. It can be localized, for example // @name:zh-CN Text Selected Copy (Generic).
@author: The author of the script, for example // @author Czy.
@license: The license of the script, for example // @license MIT License.
@description: A description of what the script does, shown to the user in the management dialog when the script is installed. It can also be localized, for example // @description:zh-CN Universal version of website copy support.
@namespace: The namespace of the script, used to tell scripts apart, for example // @namespace https://github.com/WindrunnerMax/TKScript.
@version: The version number of the script. The script manager usually compares this field when checking for updates, for example // @version 1.1.2.
@updateURL: The update check address. When checking for updates, the manager requests this address first and compares its @version field to decide whether an update is needed. The file at this address should contain only the metadata, not the script body.
@downloadURL: The script download address (over https). If the check against @updateURL shows that an update is needed, the manager requests this address to fetch the latest script. If it is not specified, the address the script was installed from is used.
@include: Matches pages using * as a wildcard for any characters, and also accepts regular expressions. A script can have any number of @include rules, for example // @include http://www.example.org/*.bar.
@exclude: Uses the same syntax as @include and also accepts any number of rules. @exclude takes precedence over @include, for example // @exclude /^https?://www\.example\.com/.*$/.
@match: A stricter matching syntax based on Chrome's Match Patterns, for example // @match *://*.google.com/foo*bar.
@icon: The icon shown in the script management interface. Almost any image works, but 32x32 pixels is the most suitable size.
@resource: Each @resource is downloaded once when the script is installed and stored with the script on the user's disk. These resources can then be accessed through GM_getResourceText and GM_getResourceURL, for example // @resource name https://xxx/xxx.png.
@require: Other scripts the script depends on, usually libraries that provide global objects. For example, to reference jQuery, use // @require https://cdn.staticfile.org/jquery/3.7.1/jquery.min.js.
@run-at: Specifies when the script runs. The commonly supported values are:
- document-start: as early as possible, before the page's own scripts run.
- document-end: once the DOM has been parsed, around DOMContentLoaded, while resources such as images may still be loading.
- document-idle: after DOMContentLoaded, once the page has settled.
The default depends on the manager. It is usually document-end, while TamperMonkey defaults to document-idle.
@noframes: When present, the script runs only in the top-level document and not in nested frames. It takes no arguments. Without it, the script also runs inside iframes.
@grant: The permissions the script needs, such as unsafeWindow, GM.setValue and GM.xmlHttpRequest. If @grant is not specified, it generally defaults to none, meaning no special permissions are requested.
The API is a set of functions and objects that the script manager provides to extend a script's capabilities. They give scripts more advanced abilities on web pages, such as cross-origin requests, modifying page layouts, data storage, notifications and clipboard handling. The Beta version of TamperMonkey even lets scripts read and write HttpOnly cookies once permission is granted. Using the API also follows a fixed pattern: the relevant permissions must be declared with @grant in the Meta block before use, so that the script manager can inject the corresponding functions. Otherwise, the script may not run properly. Function names can also differ between managers (for example GM.setValue versus GM_setValue), so always check the relevant documentation.
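As an illustration of why the documentation matters, the sketch below hides that naming difference behind one small interface. It assumes the promise-based GM.getValue/GM.setValue pair and the older GM_getValue/GM_setValue pair. The in-memory fallback exists only so the code also runs outside a script manager. createValueStore is a hypothetical helper.

```typescript
// Hypothetical wrapper: prefer the promise-based GM.getValue/GM.setValue, fall back to the
// older GM_getValue/GM_setValue, and finally to an in-memory Map so the sketch runs anywhere.
// The GM functions only exist when they are declared via @grant in the metadata block.
interface ValueStore {
  get<T>(key: string, fallback: T): Promise<T>;
  set(key: string, value: unknown): Promise<void>;
}

function createValueStore(): ValueStore {
  const g: any = globalThis;
  if (typeof g.GM?.getValue === "function" && typeof g.GM?.setValue === "function") {
    return {
      get: (key, fallback) => g.GM.getValue(key, fallback),
      set: (key, value) => g.GM.setValue(key, value),
    };
  }
  if (typeof g.GM_getValue === "function" && typeof g.GM_setValue === "function") {
    return {
      // The legacy functions are synchronous; async wraps their results in promises
      get: async (key, fallback) => g.GM_getValue(key, fallback),
      set: async (key, value) => {
        g.GM_setValue(key, value);
      },
    };
  }
  const memory = new Map<string, unknown>();
  return {
    get: async <T>(key: string, fallback: T): Promise<T> =>
      memory.has(key) ? (memory.get(key) as T) : fallback,
    set: async (key, value) => {
      memory.set(key, value);
    },
  };
}
```

In a real script, the matching functions still have to be declared, for example with // @grant GM.getValue and // @grant GM.setValue.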
Firefox also provides wrappedJSObject, which lets Content Scripts access the page's window object, although this feature may be removed in future versions for security reasons. So how can we tell whether window and unsafeWindow actually live in the same environment? Besides reading the source code, we can verify it in the browser with a small test script. Modifications made to window turn out to be visible through unsafeWindow, which shows that the two refer to the same object.
With @grant none, the script manager considers the current environment safe and no longer guards against privileged access, so window refers to the page's original window object. Looking closely, we can also see that the sandbox can be bypassed. Code evaluated dynamically, for example through eval or the Function constructor, runs in the global scope and can reach unsafeWindow even without @grant unsafeWindow. This is not a significant issue, because the script manager already exposes unsafeWindow anyway, and the trick fails if the page's CSP does not allow unsafe-eval. Still, we might consider countermeasures, such as simply disabling Function and eval. However, even if we block direct access to the Function object, it can still be reached through constructors, like (function(){}).constructor. A window sandbox therefore requires an ongoing back-and-forth between attack and defense. For instance, mini-programs forbid Function, eval, setTimeout and setInterval for dynamic code execution, and as a result the community has started writing JavaScript interpreters by hand. In our scenario, we could even create an about:blank iframe and use its window object as the isolated environment.
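The two observations above can be expressed as a short sketch:

- refersToSameObject writes a marker through one reference and reads it back through the other, to check whether they point to the same object.
- getRealGlobal reaches the real global object through the Function constructor.

Both helper names are hypothetical. The second helper assumes the code runs in the page's own JavaScript world and that the page's CSP allows unsafe-eval.

```typescript
// Hypothetical helper: write a random marker through one reference and read it back through the other.
function refersToSameObject(a: object, b: object): boolean {
  const key = `__probe_${Math.random().toString(36).slice(2)}`;
  const left = a as Record<string, unknown>;
  const right = b as Record<string, unknown>;
  left[key] = true;
  const same = right[key] === true;
  delete left[key]; // clean up the marker
  return same;
}

// Code compiled by the Function constructor is non-strict and resolves names in the global scope,
// so `this` is the real global object rather than a sandbox proxy (assuming the code runs in the
// page's own JavaScript world). On pages whose CSP disallows 'unsafe-eval', this throws.
function getRealGlobal(): typeof globalThis {
  return new Function("return this")();
}
```

Inside a user script, refersToSameObject(window, unsafeWindow) corresponds to the first check, and comparing getRealGlobal() with unsafeWindow corresponds to the second.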
Next, let's briefly discuss how to isolate a sandbox environment. As observed earlier, printing window directly outputs a Proxy object, so we can also use Proxy to build a simple sandbox. The goal is to proxy the window object so that all writes go to a new object and leave the original untouched. When reading a value, we first look it up on the new object and fall back to window if it is not there. When writing, we only touch the new object. We also use the with statement to make a specific object, here the context we created, the scope for identifier lookup in the code. As a result, reads of window properties return the correct values, while all writes stay confined to the sandbox.
So far, we have used Proxy to build an isolated sandbox for the window object. In short, our goal is a clean window sandbox, where nothing the website itself executes can affect our window object. For example, if the website mounts a $$ object on window, we do not want the developer's script to see it directly. Our sandbox is completely isolated, but a user script manager has a different goal. For example, if a user script needs to register an event on window, the handler should be attached to the original window object. We therefore need to distinguish whether a property being read or written belongs to the original window or was added by the web page. To do that, we need to record the keys of the original window object as early as possible, before the page's own scripts can modify it. In effect, the sandbox then runs against a whitelist. This leads to the next topic: how to run scripts at document-start, that is, before the page loads.
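One way to implement that whitelist is to snapshot, as early as possible, every property name reachable from the global object, including inherited ones. The sketch below assumes it is called at document-start before page scripts run. createBuiltinKeyChecker is a hypothetical name.

```typescript
// Hypothetical helper: snapshot every property name reachable on the global object (own keys
// and its prototype chain) before page scripts run. Names in the snapshot can later be routed
// to the real window, while anything else counts as added afterwards.
function createBuiltinKeyChecker(realGlobal: object): (key: PropertyKey) => boolean {
  const builtins = new Set<PropertyKey>();
  // Walk the prototype chain so inherited members such as addEventListener are included
  for (let current: object | null = realGlobal; current !== null; current = Object.getPrototypeOf(current)) {
    for (const key of Reflect.ownKeys(current)) builtins.add(key);
  }
  return (key) => builtins.has(key);
}
```

A sandbox's get trap could then read names that pass the check from the real window and everything else from its own storage. A $$ added later by the page would stay invisible to the user script.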
In fact, document-start is a very important part of a user script manager. If we can guarantee that our script runs first, we can do almost anything at the language level, such as modifying the window object, installing hook functions, modifying prototype chains and intercepting events. The manager's capabilities themselves come from the browser extension, so the question is how to expose this extension capability to web pages. Most of us have probably written a helper that loads a JavaScript file dynamically and asynchronously by creating a script tag and appending it to the page.
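For reference, a typical version of that loader looks roughly like this. To keep the sketch self-contained, it uses small structural types instead of the DOM lib. ScriptLike, DocumentLike and loadScript are hypothetical names. In a browser you would pass document, casting it if your compiler complains.

```typescript
// Structural stand-ins for the DOM types so the sketch compiles without the DOM lib.
interface ScriptLike {
  src: string;
  async: boolean;
  onload: ((...args: any[]) => unknown) | null;
  onerror: ((...args: any[]) => unknown) | null;
}

interface DocumentLike {
  createElement(tagName: "script"): ScriptLike;
  head: { appendChild(node: any): unknown } | null;
}

// The usual dynamic loader: create a <script>, wait for load/error, append it to <head>.
function loadScript(src: string, doc: DocumentLike): Promise<void> {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = src;
    script.async = true;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    // At document-start <head> may not exist yet; this helper is meant for later phases
    if (!doc.head) {
      reject(new Error("<head> is not available yet"));
      return;
    }
    doc.head.appendChild(script);
  });
}
```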
This raises an obvious problem. If we load the script around DOMContentLoaded, when the body tag is being constructed, we certainly cannot achieve document-start. Even injecting right after the head tag has finished parsing is too late, because many websites put JavaScript resources inside head. Since the first element created for any page is the html tag, the natural choice is to insert the script at the html tag level. This combines two extension features. The extension's chrome.tabs.executeScript performs the dynamic code execution, and a content.js declared with "run_at": "document_start" exchanges messages with the extension to confirm which tab has been injected. The approach looks simple, yet this seemingly simple problem took me a long time to work out. It currently works with Manifest V2. In Manifest V3, chrome.tabs.executeScript has been removed and replaced with chrome.scripting.executeScript. At present, that API can inject the manager's framework code but not user scripts, because it cannot execute dynamically generated code.
You might wonder why, if the script manager's framework and the user scripts are injected the same way, the Sources panel of the browser's developer tools shows only an entry like userscript.html?name=xxxxxx.user.js and not the manager's injected code. This is because the script manager appends a comment to the end of the user script, similar to //# sourceURL=chrome.runtime.getURL(xxx.user.js). The sourceURL comment makes the browser treat the specified URL as the script's source URL and list the script under that URL in the Sources panel. This is extremely useful for debugging and tracing code, especially dynamically generated or inline scripts.
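Here is a sketch of the idea, assuming the code is later evaluated with the Function constructor or eval. Appending a sourceURL comment is all it takes for DevTools to show the code under that name. withSourceURL is a hypothetical helper, and the URL format simply mirrors the userscript.html?name=... entries mentioned above.

```typescript
// Hypothetical helper: append a sourceURL comment so DevTools lists dynamically evaluated
// code under a readable name instead of an anonymous entry.
function withSourceURL(code: string, url: string): string {
  // Keep the directive on its own final line; strip line breaks so the URL cannot end the comment early
  const safeUrl = url.replace(/[\r\n]/g, "");
  return `${code}\n//# sourceURL=${safeUrl}`;
}
```

For example, new Function(withSourceURL(code, "userscript.html?name=demo.user.js")) should appear in the Sources panel under that name. The name demo is only a placeholder.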
Remember our initial question? Even after building the sandbox, we still need to pass this object to the user script. We cannot expose these variables to the website itself. Yet the script runs in the user's page and needs them, because otherwise it could not access the page's window object. So next we discuss how to safely pass our advanced methods to the user script. In the injected code that the sourceURL mapping exposes in the Sources panel, we can clearly see that these variables are made available directly through closures and with. We also need to pay attention to this. So when calling the wrapped function, we pass the current scope's variables in as arguments and call it with an explicitly bound this.
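Here is one way to do this, sketched under the assumption that the script body is compiled with the Function constructor. Each context entry becomes a named parameter, so inside the script it behaves like a closure variable, and apply binds this explicitly. runWithContext is a hypothetical name, and the context values stand in for whatever a manager would actually provide.

```typescript
// Hypothetical sketch: expose selected values to user-script code as function parameters,
// which then behave like closure variables inside the script, and bind `this` explicitly.
function runWithContext(code: string, context: Record<string, unknown>, thisArg: unknown): unknown {
  // Keys must be valid JavaScript identifiers because they become parameter names
  const names = Object.keys(context);
  const values = names.map((name) => context[name]);
  const fn = new Function(...names, code);
  // apply() sets `this` for the script body and passes the values in parameter order
  return fn.apply(thisArg, values);
}
```

For example, passing { unsafeWindow: pageWindow } as the context and pageWindow as thisArg makes both unsafeWindow and this refer to the page window inside the script. Anything not in the context is not declared as a parameter.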
We all know that browsers enforce cross-origin restrictions, so why can our scripts reach cross-origin endpoints through GM.xmlHttpRequest? As mentioned before, scripts run on the user's page as an Inject Script, so they are subject to those restrictions. The solution is relatively simple: the request is not actually sent from the page's window but from the browser extension. We therefore need to discuss how the user page and the browser extension communicate.
The Content Script shares the DOM and the event flow with the Inject Script, so there are two ways to communicate. The first common method is window.addEventListener plus window.postMessage. The obvious problem is that the Web page can receive these messages too. Even if we generate a random token to verify the message source, the page itself can still easily intercept the messages. Another method is therefore usually used: document.addEventListener plus document.dispatchEvent with a CustomEvent. Here, the event name should be random. The background generates a unique random event name when it injects the framework, and both the Content Script and the Inject Script use that name for communication. This keeps the page from capturing the messages exchanged during method calls.
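Below is a minimal sketch of that channel, assuming both sides already know the random event name. A plain EventTarget stands in for document so the code runs outside a page, and createChannel and randomEventName are hypothetical names. Payloads are sent as JSON strings. This is a conservative choice, on the assumption that object details may not cross cleanly between the content-script world and the page world.

```typescript
// Hypothetical sketch of the Content Script <-> Inject Script channel. In a page, both sides
// would pass `document` as the target; a plain EventTarget stands in for it here.
function randomEventName(): string {
  // Generated once per injection (for example by the background) and handed to both sides
  return `evt-${Date.now().toString(36)}-${Math.random().toString(36).slice(2)}`;
}

function createChannel(target: EventTarget, eventName: string) {
  return {
    send(payload: unknown): void {
      // Serialize to a string so the payload survives regardless of how objects cross worlds
      target.dispatchEvent(new CustomEvent(eventName, { detail: JSON.stringify(payload) }));
    },
    listen(handler: (payload: unknown) => void): () => void {
      const listener = (event: Event) => {
        handler(JSON.parse((event as CustomEvent<string>).detail));
      };
      target.addEventListener(eventName, listener);
      // Return an unsubscribe function
      return () => target.removeEventListener(eventName, listener);
    },
  };
}
```

Every listener on document sees every event with that name. Real code would therefore probably use one name per direction, or tag each message with its sender, so that a side does not process its own messages.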
When building the Chrome extension we used Rspack. This time, we switch to Rollup as the build tool. Rspack is better suited to bundling complete Web applications, while Rollup is better suited to bundling utility libraries. Our Web scripts are single-file scripts, which makes Rollup the better fit. If you want to experience the build speed of Rust-based tooling, you can still use Rspack, or even use SWC directly. Here I did not use Babel but ESBuild to compile the script, which has proven to be very efficient.
As mentioned earlier, script manager APIs are compatible with GreaseMonkey, but each manager may also offer its own unique APIs. This is to be expected, since there is little point in every manager implementing exactly the same features. Browsers also differ from one another, and some of these API differences can only be detected at runtime. To support every platform, we therefore need channel packages, a concept that is very common in Android development. Maintaining every package by hand is clearly impractical. Modern build tools make maintenance easier and channel packages more convenient, because environment variables and TreeShaking make channel builds easy to produce. Combined with the synchronization features of script managers and script websites, each channel package can then be distributed separately.
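To illustrate what the source of a channel-aware script can look like, here is a sketch in which process.env.CHANNEL selects a branch. The channel names and the API each branch picks are hypothetical. What matters is that @rollup/plugin-replace substitutes a string literal for process.env.CHANNEL. Each condition then becomes a constant the bundler can evaluate, so TreeShaking removes the unused branches.

```typescript
// Hypothetical channel names and API choices, for illustration only. During the build,
// process.env.CHANNEL is replaced by a string literal, so a condition such as
// "GREASY" === "GREASY" becomes statically known and the other branches are dropped.
function getClipboardApiName(): string {
  // Keep each comparison written directly against process.env.CHANNEL so that,
  // after replacement, every condition is a literal the bundler can evaluate.
  if (process.env.CHANNEL === "GREASY") return "GM_setClipboard";
  if (process.env.CHANNEL === "TAMPER") return "GM.setClipboard";
  return "navigator.clipboard.writeText";
}
```

A channel build is then started with the variable set, for example by running CHANNEL=GREASY rollup -c in a POSIX shell.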
This part is similar to packaging various SDKs. Suppose we have multiple scripts and want each project directory packaged into its own bundle. Rollup can build multiple inputs and outputs at once. We export an array of configurations from rollup.config.js, specifying the entry file with input, the output file with output and the plugins with plugins. The bundles we produce generally use the iife format (an immediately invoked function expression), which suits code loaded through script tags.
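Here is a sketch of such an array. It assumes a layout where each script lives in packages/<name>/src/index.ts and is emitted as dist/<name>.user.js. createScriptConfigs is a hypothetical helper, and the actual plugins (such as the ESBuild, replace and PostCSS plugins) are passed in rather than shown.

```typescript
// Hypothetical layout: packages/<name>/src/index.ts is built to dist/<name>.user.js.
// Each entry in the returned array becomes a separate IIFE bundle.
interface ScriptBuildConfig {
  input: string;
  output: { file: string; format: "iife" };
  plugins: unknown[];
}

function createScriptConfigs(packages: readonly string[], plugins: unknown[] = []): ScriptBuildConfig[] {
  return packages.map(
    (name): ScriptBuildConfig => ({
      input: `packages/${name}/src/index.ts`,
      output: { file: `dist/${name}.user.js`, format: "iife" },
      plugins,
    }),
  );
}
```

Since Rollup accepts an array of configurations, rollup.config.js can export the result of createScriptConfigs as its default export.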
If you use @updateURL to check for updates, you also need to build a separate meta file. This works like the process above: use an empty blank.js as the input and then inject the meta data. Note that format should be set to es, because the output must not be wrapped in an immediately invoked function such as (function () {})();.
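Here is a sketch of the corresponding meta-only entry. It assumes the metadata block is attached through Rollup's output.banner option, though a plugin hook would work as well. createMetaConfig is a hypothetical helper.

```typescript
// Hypothetical sketch of the extra meta-only build. The empty blank.js entry produces no code,
// and the metadata block is added through output.banner (an assumption; a plugin hook works too).
// format "es" leaves the output unwrapped, whereas "iife" would wrap it in (function () { ... })();
interface MetaBuildConfig {
  input: string;
  output: { file: string; format: "es"; banner: string };
}

function createMetaConfig(name: string, metaBlock: string): MetaBuildConfig {
  return {
    input: "blank.js",
    output: { file: `dist/${name}.meta.js`, format: "es", banner: metaBlock },
  };
}
```

The resulting .meta.js URL is what goes into @updateURL, while the full .user.js URL goes into @downloadURL.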
We also mentioned channel packaging earlier. If you want to build channel bundles, there are several key points to note:
1. Use process.env.CHANNEL to tell channels apart.
2. Although process is written as a variable in the code, at build time it is treated as a string. We use @rollup/plugin-replace to replace the process.env.CHANNEL string with the environment variable set when the build command runs.
3. Let TreeShaking remove the code of the other channels. TreeShaking is a static analysis, so the Boolean value of each channel condition must be determinable from the code itself after replacement.
We should also not use modules such as rollup-plugin-terser to minify the output, especially for scripts distributed on platforms like GreasyFork. Scripts run with very high privileges, so their code must remain reviewable. For similar reasons, packages like jQuery generally should not be bundled into the script. They are declared as external and loaded with @require instead, and platforms like GreasyFork also review externally loaded packages against a whitelist. In most cases we need document-start to run code as early as possible, but at that point the head tag is not yet complete, so the timing of CSS injection needs special attention. If the script runs at document-start, you usually need to inject CSS manually instead of relying on the default injection of rollup-plugin-postcss. Apart from that, there are not many special considerations for the Rollup build here; it is basically like a regular front-end engineering project. For the complete configuration, see https://github.com/WindrunnerMax/TKScript/blob/master/rollup.config.js.
Although the main build is complete, we seem to have overlooked a major piece: generating the Meta description that script managers read. Fortunately, an existing Rollup plugin handles this and can be used directly. Implementing such a plugin is not complicated in itself:
1. Prepare a meta.json file that describes the various settings in JSON.
2. Traverse it to generate the metadata string.
3. Inject that string into the output file in one of Rollup's hook functions.
The package also does a lot more, such as checking field formats and beautifying the output.
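The core of such a plugin can be sketched as a function that turns a meta.json-style object into an aligned metadata block. buildMetaBlock is a hypothetical name. It assumes that array values mean repeated keys (such as several @match lines) and that true marks a flag such as @noframes. It also leaves out the validation a real plugin performs.

```typescript
// Hypothetical sketch of the core of such a plugin: turn a meta.json-style object into an
// aligned UserScript block. Real plugins add validation and formatting options on top.
type MetaJson = Record<string, string | string[] | true>;

function buildMetaBlock(meta: MetaJson): string {
  const entries: Array<[string, string]> = [];
  for (const [key, value] of Object.entries(meta)) {
    if (value === true) {
      entries.push([key, ""]); // flag-style keys such as "noframes"
    } else {
      // Arrays become repeated keys, e.g. several @match lines
      for (const item of Array.isArray(value) ? value : [value]) entries.push([key, item]);
    }
  }
  // Pad keys to a common width so the values line up
  const width = Math.max(0, ...entries.map(([key]) => key.length)) + 1;
  const lines = entries.map(([key, value]) => `// @${key.padEnd(width)}${value}`.trimEnd());
  return ["// ==UserScript==", ...lines, "// ==/UserScript=="].join("\n");
}
```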
In this section we implement an example user script. We often copy and paste code with Ctrl+C and Ctrl+V, but copy and paste is useful well beyond coding, for homework and reports too. Back when I was a student, writing reports without it would have been a nightmare. The problem is that some websites stop us from copying and pasting freely. So here we implement a general solution for bypassing copy restrictions in the browser. For the full code, see the generic text selection and copy script in https://github.com/WindrunnerMax/TKScript.
Some websites disable copying with CSS, which shows up as text that simply cannot be selected. This is especially common on document library websites. For instance, if you search Baidu for an internship report, many of the results cannot be copied. We can of course press F12 to see the text, but when it is deeply nested and split across elements, copying it through F12 is still cumbersome. We could also use $0.innerText to get the text directly, but that is cumbersome too. Let's look at how CSS disables text selection.
If you have worked on text editing, for example with Void block elements in rich text editors, the user-select CSS property is easy to understand. user-select controls whether the user can select text. It has no effect on content loaded as part of the browser's own user interface, except in text boxes.
When we inspect these websites, we can clearly see user-select: none;. To lift the restriction, the obvious approach is CSS priority: an !important declaration that forcibly overrides the value on all elements. This is a fairly universal approach and easily handles the vast majority of pages that block copying this way.
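Here is a sketch of that override, together with a way to inject it at document-start. The selector and the text value are assumptions. Because the head tag may not exist yet at that point, the helper falls back to the html element, in line with the earlier note about CSS injection timing. injectStyle and the structural types are hypothetical names.

```typescript
// Hypothetical sketch: re-enable selection everywhere with !important. A page rule that is also
// !important and has higher specificity would still win, so some sites need a targeted selector.
const UNLOCK_SELECT_CSS = `
*, *::before, *::after {
  -webkit-user-select: text !important;
  user-select: text !important;
}`;

// Structural stand-ins so the sketch compiles without the DOM lib; pass the real document in a browser.
interface StyleLike {
  textContent: string | null;
}
interface StyleDocumentLike {
  createElement(tagName: "style"): StyleLike;
  head: { appendChild(node: any): unknown } | null;
  documentElement: { appendChild(node: any): unknown };
}

// At document-start <head> may not exist yet, so fall back to the <html> element.
function injectStyle(css: string, doc: StyleDocumentLike): StyleLike {
  const style = doc.createElement("style");
  style.textContent = css;
  (doc.head ?? doc.documentElement).appendChild(style);
  return style;
}
```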
Websites often do more than use CSS to prevent copying. Content drawn with Canvas is beyond the scope of this article, so here we discuss how events are used to restrict copying. The main events involved are copy and selectstart, handled through oncopy and onselectstart. The copy event fires when the user copies, for example with Ctrl+C or the right-click menu, so a page can simply cancel it to prevent copying. Similarly, calling preventDefault on the selectstart event prevents the user from selecting text, which naturally prevents copying as well. For simplicity, this can be done with DOM0 handlers. Assigning handlers that call preventDefault to document.oncopy and document.onselectstart in the console is enough to make copying impossible.
Before looking at how to handle these events, let's look at the getEventListeners method. Chrome provides getEventListeners to retrieve all event listeners registered on an object. It is only available in the console, so it is not universal and serves purely as a debugging convenience.
Why discuss a method that is not universal? Because of the following problem. DOM0 event handlers can be removed simply by setting the property to null, but DOM2 listeners must be removed with removeEventListener. How do we obtain the function reference that was passed to addEventListener if nobody saved it? This is where getEventListeners comes in: we can use it to retrieve the listeners and then remove them with removeEventListener. Of course, this only works for debugging in the console and is not a universal solution.
So perhaps we need a different approach, since removing event listeners is quite a hassle. As the saying goes, the finest dishes often need only the simplest cooking. If we cannot remove the listeners, we can keep them from running. Given the event flow model in JS, the natural idea is to stop the event from bubbling.
This seems fine at first. However, if the page itself listens for the event on body, stopping the bubbling on document is obviously too late and has no effect, so this solution is not versatile enough. Since stopping the bubbling does not work, we stop the event in the capture phase instead. Combined with the script manager's document-start, this ensures our capture listener is the first to run. The approach handles DOM0 events and works just as well for DOM2 events.
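Here is a sketch of the capture-phase approach, assuming it runs at document-start with window (or document) as the target. stopEventsAtCapture is a hypothetical helper. It stops propagation without calling preventDefault, so page handlers are skipped while the browser's default copy and selection behavior is preserved.

```typescript
// Hypothetical sketch: register capture-phase listeners as early as possible (document-start)
// and stop the events there, so handlers the page registers later never run. preventDefault()
// is deliberately not called, so the browser's own copy/selection behavior still happens.
function stopEventsAtCapture(
  target: EventTarget,
  types: readonly string[] = ["copy", "selectstart"],
): () => void {
  // stopImmediatePropagation also skips later listeners on the same target
  const stop = (event: Event) => event.stopImmediatePropagation();
  for (const type of types) target.addEventListener(type, stop, true);
  // Return an undo function, handy on pages with editors where this causes side effects
  return () => {
    for (const type of types) target.removeEventListener(type, stop, true);
  };
}
```

Calling stopEventsAtCapture(window) registers listeners that run before any listener on document or body. Keyboard, focus or blur events could be added to the list, with the side effects discussed next.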
This is already a fairly universal solution that lifts the restrictions on most websites. However, intercepting events in the capture phase can have side effects. For example, if we stop keyboard events in the capture phase and then try to edit a document in Yuque, we run into problems. Like Feishu, Yuque processes document text line by line. I assume its editor prevents the default contenteditable behavior and takes over keyboard events completely, so with those events stopped it can no longer insert line breaks or open shortcut menus. Similarly, if we stop the propagation of the copy event, the editor may fall back to the browser's default copy behavior, whereas editors usually format the copied content. This therefore needs care on pages with editing features. It is not a big problem on pages that only display content, and overall it does more good than harm.
Not long ago I found another interesting trick: focus and blur events can also restrict text selection. If a page uses these events to keep forcing focus onto another element, such as a button, we will find that text on that page can no longer be selected normally.
The principle is simple. Elements such as HTMLInputElement, HTMLSelectElement, HTMLTextAreaElement, HTMLAnchorElement and HTMLButtonElement can receive focus, and selecting text also involves focus. Focus can only be in one place at a time, so the page can forcibly move it elsewhere, for example onto a button. The text content then cannot get focus, so it cannot be selected normally.
Likewise, we can stop these events in the capture phase, handling focus and blur respectively. However, this may interfere with the page's own focus management, so there is another way. We start a MutationObserver at document-start and remove matching DOM nodes as soon as they are added to the DOM tree, so they have little chance to cause problems. This is not a universal solution, though, and usually has to be handled case by case.
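Here is a sketch of the MutationObserver callback, using structural types so it does not depend on the DOM lib. The predicate that recognizes the offending nodes is site-specific and is left to the caller. removeMatchingNodes and isFocusThief are hypothetical names.

```typescript
// Hypothetical sketch of the MutationObserver callback: remove newly added nodes that match a
// site-specific predicate (for example, an element used to steal focus).
interface RemovableNode {
  nodeName: string;
  remove(): void;
}
interface MutationRecordLike {
  // Only the roots of inserted subtrees appear here, so the predicate may need to inspect descendants
  addedNodes: ArrayLike<RemovableNode>;
}

function removeMatchingNodes(
  records: readonly MutationRecordLike[],
  shouldRemove: (node: RemovableNode) => boolean,
): number {
  let removed = 0;
  for (const record of records) {
    for (const node of Array.from(record.addedNodes)) {
      if (shouldRemove(node)) {
        node.remove();
        removed++;
      }
    }
  }
  return removed;
}

// In a user script running at document-start, it could be wired up roughly like this,
// where isFocusThief is a hypothetical, site-specific predicate:
// new MutationObserver((records) => removeMatchingNodes(records as any, isFocusThief))
//   .observe(document.documentElement, { childList: true, subtree: true });
```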
With the methods above, we have written and bundled the script. Here I'd like to share a best practice for distributing scripts. Script websites like GreasyFork can usually sync source code: we enter a script link once, and script updates are synchronized automatically, so we don't need to update every site by hand. The question is where that link comes from. We can use GitHub Pages to host the script and provide the link, and GitHub Actions to automate the build.
The whole process works like this:
1. Configure GitHub Actions on GitHub, so that pushing code triggers an automatic build.
2. After the build completes, push the output to GitHub Pages automatically.
3. Take the script link from GitHub Pages once and fill it in on the various script websites. If there are several channel packages, each gets its own link.
This automates the whole process. Since the code is hosted on GitHub, jsDelivr can also serve it as a CDN.