I'm thrilled to share that the installation count for my browser extension has finally passed one thousand: 1.7k+ installs on Firefox Add-ons and 1k+ on the Chrome Web Store. In fact, the Firefox statistics reflect a weekly average of installations, and the actual installs on any given day tend to exceed that average significantly. The Chrome Web Store, for its part, only reports a rounded figure once installs pass 1k, so the actual number there is likely higher than 1k as well.
Before developing the extension, I had implemented a script with related functionality, which has garnered 2,688k+ installs on GreasyFork. There were two main reasons for creating the extension. First, I wanted to learn extension development; I had found real-world uses for it at work, especially when specific tasks required bypassing browser limitations. Second, I discovered that some code I had published on GreasyFork under the GPL license had been repackaged into a plugin that included ads, and it surprisingly amassed 400k+ installs.
Hence, I used my scripting skills to develop this browser extension, primarily for learning purposes. I built the entire development environment from scratch and addressed numerous compatibility issues. Next, let's delve into those challenges and their solutions. You can find the project repository on GitHub; if you enjoy it, please give it a star! 😁
As mentioned earlier, we built the development environment from the ground up, which meant choosing a bundler for the extension. I opted for rspack, although webpack or rollup would work just as well; I prefer rspack because I'm more familiar with it and it offers faster build times, and the configuration is quite similar across all of these tools. It's also worth noting that we use a build-level (watch mode) packaging approach here; a dev server isn't particularly suitable for v3 extensions at this time.
A key point to remember when developing a browser extension is that we need to define multiple entry files, each bundled into a single output file; we must avoid generating multiple chunks for a single entry. This applies to CSS files as well: one entry, one output. Moreover, the output filenames should not include hash suffixes, which would make the files impossible to locate. None of this is a major concern as long as you pay attention to your configuration.
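As a point of reference, a minimal sketch of such a configuration might look like this; the entry names, paths, and random inject filename are illustrative assumptions rather than the project's exact setup:

```ts
// rspack.config.ts: a minimal sketch; entry names and paths are illustrative.
import path from "path";
import type { Configuration } from "@rspack/core";

// A fresh random name for the inject script on every build (see below).
const INJECT_FILE = `inject.${Math.random().toString(36).slice(2, 10)}`;

const config: Configuration = {
  entry: {
    popup: "./src/popup/index.tsx",
    content: "./src/content/index.ts",
    worker: "./src/worker/index.ts",
    [INJECT_FILE]: "./src/inject/index.ts",
  },
  output: {
    filename: "[name].js", // no hash suffix: one predictable file per entry
    path: path.resolve(__dirname, "dist"),
  },
  optimization: {
    splitChunks: false, // never split an entry into multiple chunks
    runtimeChunk: false,
  },
};

export default config;
```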
You'll notice that the output filename for INJECT_FILE is dynamic. Since the inject script needs to be injected into the browser page, the injection approach can cause naming conflicts, so the filename generated by each build differs and changes with every release. The names used for simulated event communication are likewise uniquely generated each time.
Chrome has been strongly pushing v3 extensions, which means manifest_version must be specified as 3. However, submitting a version with manifest_version: 3 to Firefox triggers a warning against its usage. Personally, I also prefer to avoid v3, because its numerous restrictions make many functionalities difficult to implement properly; we'll discuss this further later. Since Chrome mandates v3 while Firefox recommends v2, we need a compatibility scheme covering both the Chromium and Gecko engines.
In fact, this resembles a multi-platform build scenario, where we package the same code for multiple platforms. For cross-platform compilation, my go-to method has been process.env together with __DEV__. However, conditional compilation of this kind, with extensive process.env.PLATFORM === xxx checks, easily leads to deep nesting that hurts readability. After all, Promise exists to solve the problem of asynchronous callback hell; reintroducing nesting for the sake of cross-platform compilation hardly seems like a wise solution.
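To make the problem concrete, here is a hedged sketch of what such env-based branching tends to look like once conditions accumulate; the platform names are illustrative:

```ts
// Env-based conditional compilation quickly nests: the bundler inlines
// process.env.PLATFORM, but the branches stay in the source.
if (process.env.PLATFORM === "chromium") {
  if (process.env.NODE_ENV === "development") {
    // chromium + development behavior
  } else {
    // chromium + production behavior
  }
} else {
  // gecko behavior, with the same inner branching repeated
}
```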
In C/C++ there's an interesting preprocessor known as the C Preprocessor. It isn't part of the compiler itself but runs as a separate step in the compilation process. Simply put, the C Preprocessor is a text-replacement tool that substitutes macros directly into the source text, guiding the compiler to complete the necessary preprocessing before actual compilation. Directives such as #include, #define, and #ifdef all belong to the preprocessor. Here we focus mainly on the conditional-compilation directives: #if/#endif, #ifdef/#endif, and #ifndef/#endif.
We can implement something similar with build tools. The C Preprocessor is a preprocessing tool that takes no part in the actual compilation, which makes it quite similar to a loader in webpack, and its raw-text replacement can be fully reproduced inside a loader. We can use comments to simulate directives like #ifdef and #endif, which avoids the deep-nesting problem entirely. The string-replacement logic is also easy to control: for instance, we can remove lines that don't match the platform condition while retaining those that do, achieving the same effect as #ifdef/#endif. Comments also help in complex scenarios. I've encountered intricate SDK packaging where internal and external behaviors differed so much that cross-platform setups would otherwise require multiple builds, leaving users to configure the build tools themselves; comment directives allowed complete packaging without users adjusting loader configurations. That said, this is closely tied to specific business scenarios and serves merely as a reference.
Initially I considered processing the source directly with regexes, but that proved cumbersome, particularly with nesting, where the logic becomes hard to manage. Eventually I realized that since code is structured line by line, per-line handling is the most natural approach. And because the directives are comments that will ultimately be deleted anyway, we can simply trim each line's whitespace and match the tags, even when they are indented. This streamlined the approach significantly: a line opening with the #IFDEF directive sets the processing state to true, and the terminating #ENDIF switches it back. Since the end goal is merely to delete certain code sections, non-qualifying lines can simply be blanked out. To handle nesting, however, we must maintain a stack: push the index of each opening #IFDEF and pop when we encounter its #ENDIF, while tracking the current processing state; if the state is true when popping, we must decide whether to mark it false to conclude processing for that block. Additionally, a debug option can emit the post-processed files for specific modules.
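A minimal sketch of such a loader, assuming #IFDEF/#ENDIF comment directives and a PLATFORM environment variable; option handling and the debug output are omitted:

```ts
// conditional-compile-loader.ts: a sketch; the directive spelling and the
// PLATFORM variable follow the article, the rest is illustrative.
const IFDEF = "#IFDEF";
const ENDIF = "#ENDIF";

export default function (source: string): string {
  const platform = (process.env.PLATFORM || "chromium").toLowerCase();
  const stack: number[] = []; // indices of the currently open #IFDEF lines
  let keep = true;            // whether lines in the current region are kept
  let droppedAt = -1;         // stack depth at which we started dropping

  const lines = source.split("\n").map((line, index) => {
    const trimmed = line.trim();
    // e.g. `// #IFDEF CHROMIUM` or `// #IFDEF CHROMIUM|GECKO`
    if (trimmed.startsWith(`// ${IFDEF}`)) {
      stack.push(index);
      const platforms = trimmed
        .slice(`// ${IFDEF}`.length)
        .trim()
        .toLowerCase()
        .split("|");
      if (keep && !platforms.includes(platform)) {
        keep = false;
        droppedAt = stack.length; // remember where dropping began
      }
      return ""; // directives themselves are always removed
    }
    if (trimmed.startsWith(`// ${ENDIF}`)) {
      if (stack.length === droppedAt) {
        keep = true; // the dropped block is fully closed
        droppedAt = -1;
      }
      stack.pop();
      return "";
    }
    return keep ? line : ""; // blank out lines that don't match the platform
  });

  return lines.join("\n");
}
```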
In practical usage, take the registration of the Badge as an example: the comment directives let each platform execute its own code. And of course, if the platforms need similar definitions, we can simply redefine the variables directly in each branch.
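A hedged illustration of what that can look like; `cross` as a unified alias for chrome/browser is an assumption here:

```ts
// After preprocessing, only the block for the target platform survives.
declare const cross: typeof chrome; // assumed unified chrome/browser alias

export const setBadge = (count: number) => {
  // #IFDEF CHROMIUM
  // v3 exposes the badge via the `action` API.
  cross.action.setBadgeText({ text: String(count) });
  // #ENDIF
  // #IFDEF GECKO
  // v2 still uses `browserAction`.
  cross.browserAction.setBadgeText({ text: String(count) });
  // #ENDIF
};
```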
One key capability of browser extensions is document_start, meaning the code injected by the extension executes before the site's own JS code. This gives our code ample room for hooks. Consider the potential if we could run whatever JS we wanted the instant the page starts loading: we could manipulate the current page at will. Although we can't hook the creation of literal objects, page code must eventually call the APIs the browser provides, and as long as an API is invoked, we can find ways to intercept the call and retrieve the data we want. For instance, we could intercept calls to Function.prototype.call, but for that interception to be effective, our hook must be in place before any other code on the page runs; if the function has already been called, intercepting it afterwards is pointless.
We might all wonder what this kind of interception is actually good for. Take a simple example: in a certain library, all text is rendered through a canvas. Since there is no DOM, if we want to obtain the document's content, we cannot directly copy it. A feasible solution is to hijack the document.createElement function: when the element being created is a canvas, we grab the canvas object in advance to obtain its ctx. And since actually rendering text still requires calling the context2DPrototype.fillText method, hijacking that method lets us extract the text as it is rendered. We can then build the DOM elsewhere, allowing copying whenever we want.
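A minimal sketch of the fillText side of that idea, assuming our code already runs at document-start:

```ts
// Hook CanvasRenderingContext2D.prototype.fillText and record the text the
// page renders; where the text is collected to is up to the caller.
const originalFillText = CanvasRenderingContext2D.prototype.fillText;

CanvasRenderingContext2D.prototype.fillText = function (
  text: string,
  x: number,
  y: number,
  maxWidth?: number
): void {
  console.log("rendered text:", text); // e.g. mirror it into a hidden DOM node
  if (maxWidth === undefined) {
    originalFillText.call(this, text, x, y);
  } else {
    originalFillText.call(this, text, x, y, maxWidth);
  }
};
```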
Now, returning to the problem at hand: if we can ensure the script runs first, we can accomplish nearly anything at the language level, such as modifying the window object, hooking function definitions, altering prototype chains, blocking events, and so forth. This capability ultimately stems from browser extensions, and the challenge for a script manager is how to expose it to Web pages. For our discussion, let's assume user scripts run on the browser page as Inject Scripts rather than Content Scripts. Based on this assumption, a first attempt would likely be a dynamic, asynchronous loader for JS scripts, similar to the following:
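A representative sketch of such a loader; the function name and URL handling are illustrative:

```ts
// Dynamically create a <script> tag and resolve when it loads: the naive
// asynchronous approach whose timing problems are discussed next.
const loadScriptAsync = (url: string): Promise<void> => {
  return new Promise((resolve, reject) => {
    const script = document.createElement("script");
    script.src = url;
    script.async = true;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load: ${url}`));
    document.body.appendChild(script);
  });
};
```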
Now there's a clear problem: if we load the script after the body tag has been constructed, roughly at the DOMContentLoaded moment, we will certainly not achieve document-start. Even handling it right after the head tag completes is ineffective, since many websites place JS resources inside the head, so that timing is also too late. In reality, the biggest issue is that the entire process is asynchronous: by the time the external script finishes loading, a lot of the page's JS code has already executed, defeating our goal of "executing first".
So let's explore the concrete implementations, starting with the v2 extension for Gecko-based browsers. For any page, the first element to exist is the html tag, so it's clear we just need to insert the script at the html-tag level. We combine chrome.tabs.executeScript, which lets the extension's background execute code dynamically, with a Content Script declared with "run_at": "document_start" that establishes message communication to confirm the tab to inject. This method may look very simple, yet this seemingly straightforward problem had me pondering for quite some time.
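A hedged sketch of that pairing, with a placeholder file name and a simplified message shape:

```ts
// Content Script (v2, run_at: document_start): report this tab to background.
chrome.runtime.sendMessage("inject-me");

// Background: execute code in the sender's tab; the injected code appends a
// <script> to <html>, which is the only element that exists this early.
chrome.runtime.onMessage.addListener((message, sender) => {
  if (message !== "inject-me" || sender.tab?.id === undefined) return;
  const code = `
    var script = document.createElement("script");
    script.src = chrome.runtime.getURL("inject.js"); // placeholder file name
    document.documentElement.appendChild(script);
  `;
  chrome.tabs.executeScript(sender.tab.id, { code, runAt: "document_start" });
});
```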
This approach actually looks quite good; it can basically achieve document-start. But only basically, which means some scenarios still go wrong. Looking closely at the implementation, there is a communication step, Content Script --> Background. Communication means asynchronous processing, and asynchronous processing takes time; in that time the user's page may already have executed a significant amount of code, so we occasionally fail to truly achieve document-start and the script does not run first.
So how can we address this? In v2 we know for certain that the Content Script itself is reliably executed at document-start, but a Content Script is not an Inject Script: it cannot access the page's window object and therefore cannot effectively hijack the page's functions. The problem sounds complex, but once understood, the solution is actually quite simple. We build on the original Content Script by introducing an additional Content Script whose code is entirely equivalent to the original Inject Script, except that it is attached to window. We can write a bundler plugin to accomplish this.
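A sketch of what the plugin's generated output might look like; the key name is illustrative, and in reality it is randomly generated at build time:

```ts
// Generated Content Script: the whole Inject Script is wrapped in a function
// and parked on the Content Script's (isolated) window under a random key.
(window as unknown as Record<string, () => void>)["__INJECT_xxxxxx__"] =
  function () {
    // ... the original Inject Script code is bundled into this body ...
  };
```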
This code attaches a randomly generated key to the Content Script's own window object (the potential-conflict concern mentioned earlier), and its value is the script we want to inject into the page. Now that we can access this function, how do we make it execute on the user's page? Here we use the same document.documentElement.appendChild method to create a script, and the implementation is exceptionally clever: by combining the two Content Scripts with toString, we obtain the code as a string and inject it directly into the page, thereby achieving a true document-start.
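A sketch of the second Content Script's side of the trick, using the same illustrative key name as above:

```ts
// Read the parked function, serialize it with toString, and inject it as
// inline code so it executes synchronously at document-start.
const w = window as unknown as Record<string, () => void>;
const fn = w["__INJECT_xxxxxx__"];
const script = document.createElement("script");
script.textContent = `;(${fn.toString()})();`;
document.documentElement.appendChild(script);
script.remove(); // clean up the tag; the code has already run
delete w["__INJECT_xxxxxx__"];
```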
As previously mentioned, Chrome no longer permits submissions of v2 extensions, meaning we can only submit v3 code. However, v3 comes with very strict Content Security Policy (CSP) restrictions that effectively disallow dynamic execution of code, so the approaches outlined above all become ineffective, leading us to write something akin to the following code.
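A sketch of that code, with a placeholder script name:

```ts
// v3 Content Script: CSP forbids inline code, so point the <script> at a
// web_accessible_resources file instead.
const script = document.createElement("script");
script.src = chrome.runtime.getURL("inject.js"); // placeholder file name
script.onload = () => script.remove();
document.documentElement.appendChild(script);
```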
It looks as though we create the script tag immediately in the Content Script and execute code right away, but does this achieve our document-start goal? Unfortunately, no. It works when the page is first opened, but afterwards this script effectively behaves like any other external script: Chrome queues it alongside the page's own scripts, and with strong caching it is uncertain which script will execute first. That instability is unacceptable, so we definitely cannot meet the document-start objective this way.
In fact, this alone indicates that v3 is not fully mature; many capabilities are not adequately supported. The official team has since devised several solutions to this problem, but given that we have no way to determine the user's browser version, many compatibility paths still need to be handled.
Since Chrome 109, chrome.scripting.registerContentScripts can register scripts dynamically, and Chrome 111 allows scripts with world: 'MAIN' to be declared directly in the Manifest. The compatibility burden still falls on the developer, though: on older browsers that do not support world: 'MAIN', the script is silently treated as an ordinary Content Script, which I find rather tricky to manage.
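A sketch of the dynamic registration path, with a placeholder file name; the fallback branch is where the earlier Content Script trick would go:

```ts
// Register the inject script into the page's MAIN world at document_start.
chrome.scripting
  .registerContentScripts([
    {
      id: "inject",
      js: ["inject.js"], // placeholder file name
      matches: ["<all_urls>"],
      runAt: "document_start",
      world: "MAIN",
    },
  ])
  .catch(() => {
    // Older browsers: `world` is unsupported, so fall back to the
    // Content Script + toString injection described earlier.
  });
```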
Consider that many of our resource references are plain strings, such as the icons reference in manifest.json; unlike in our Web applications, these do not reference resources by their actual import paths, so the bundler never incorporates them as content. The concrete symptom is that modifying such a resource does not trigger the bundler's HMR. Therefore, we need to manually register these files as bundle dependencies, and we also need to copy them to the target output folder. This isn't overly complex; a small plugin can do it. Besides images and other static resources, the locales language files need the same treatment.
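A minimal sketch of such a plugin, assuming rspack/webpack-style hooks; directory names are illustrative and error handling is omitted:

```ts
import fs from "fs";
import path from "path";
import type { Compiler } from "@rspack/core";

export class FilesPlugin {
  apply(compiler: Compiler) {
    const publicDir = path.resolve(__dirname, "public");
    // Register the directory so that edits retrigger builds.
    compiler.hooks.afterCompile.tap("FilesPlugin", (compilation) => {
      compilation.contextDependencies.add(publicDir);
    });
    // Copy the static resources into the output folder after each emit.
    compiler.hooks.afterEmit.tap("FilesPlugin", () => {
      const output = compiler.options.output.path as string;
      fs.cpSync(publicDir, output, { recursive: true });
    });
  }
}
```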
As with the static resources just discussed, there is a similar issue with generating the manifest.json file: we likewise need to register it as a contextDependency with the bundler. And recalling that we must stay compatible with both Chromium and Gecko, manifest.json itself needs to be made compatible too. To avoid maintaining two configuration files, we can use ts-node to generate manifest.json dynamically, writing the configuration file through whatever logic we need.
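A sketch of what the dynamic manifest source might look like; the name, version, and field values are illustrative:

```ts
// manifest.ts: evaluated with ts-node at build time; a plain runtime check
// (or the #IFDEF comment directives) picks the platform-specific fields.
const isGecko = process.env.PLATFORM === "gecko";

export const manifest = {
  name: "Force Copy", // illustrative
  version: "1.0.0",
  manifest_version: isGecko ? 2 : 3,
  ...(isGecko
    ? {
        background: { scripts: ["worker.js"] },
        browser_action: { default_popup: "popup.html" },
      }
    : {
        background: { service_worker: "worker.js" },
        action: { default_popup: "popup.html" },
      }),
};

// A build step would write JSON.stringify(manifest, null, 2) to dist/manifest.json.
```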
A browser extension contains many modules, commonly including background/worker, popup, content, inject, devtools, and so on. Each module serves a different purpose, and they collaborate to deliver the extension's functionality. Clearly, with so many modules responsible for distinct functions, we need to establish communication capabilities among them.
Since the entire project is built with TS, we aim for a fully typed communication scheme; static type checking helps us avoid numerous issues in complex features. Taking Popup to Content as the example, we'll create a unified data-communication solution, designing the relevant classes for every module in the extension that needs to communicate.
First, we need to define the communication key values, since the type field determines what kind of information a message carries. To prevent value conflicts, we increase the complexity of our key values using reduce.
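A sketch of that idea; the type names and the mangling scheme are illustrative:

```ts
// Derive collision-resistant message keys from plain names via reduce.
const CONTENT_TYPES = ["CopyAction", "StateQuery"] as const;
type ContentType = (typeof CONTENT_TYPES)[number];

export const CONTENT_MESSAGE_TYPE = CONTENT_TYPES.reduce(
  (acc, key) => ({ ...acc, [key]: `__CONTENT_${key}_${key.length}__` }),
  {} as { [K in ContentType]: string }
);
// => { CopyAction: "__CONTENT_CopyAction_10__", StateQuery: "__CONTENT_StateQuery_10__" }
```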
If you've used redux, you may have run into this challenge: how to tie the type to the payload it carries. For example, we want TS to automatically infer that when type is A, the payload type is { x: number }, and when type is B, the inferred type should be { y: string }. A relatively straightforward declarative example would be as follows:
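Sketched here with the A/B names from above:

```ts
// A discriminated union ties each type to its payload; TS narrows the
// payload automatically inside each branch.
type Message =
  | { type: "A"; payload: { x: number } }
  | { type: "B"; payload: { y: string } };

const onMessage = (message: Message) => {
  if (message.type === "A") {
    console.log(message.payload.x); // inferred as number
  } else {
    console.log(message.payload.y); // inferred as string
  }
};
```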
However, this approach is not very elegant; we'd prefer a more refined declaration of the types. Fortunately, generics let us accomplish this, although we need to tackle the problem step by step: first establish a type Map expressing the type -> payload mapping, then transform it into a structure of type -> { type: T, payload: Map[T] }, from which we can derive the Tuple.
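A sketch of the transformation, reusing the A/B example; the type names are illustrative:

```ts
// Step 1: a plain map from type to payload.
type PayloadMap = { A: { x: number }; B: { y: string } };

// Step 2: lift it to type -> { type: T; payload: Map[T] }.
type ToMessageMap<M> = { [K in keyof M]: { type: K; payload: M[K] } };

// Step 3: index by all keys to obtain the union of variants.
type MessageUnion = ToMessageMap<PayloadMap>[keyof PayloadMap];
// => { type: "A"; payload: { x: number } } | { type: "B"; payload: { y: string } }
```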
Now we can wrap all of this in a namespace, together with some basic type-conversion helpers, to make it easier to call. In fact, to simplify our function calls further, we can also process the parameters by casting them internally to the required parameter types.
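A sketch of that encapsulation; the names are illustrative:

```ts
namespace PopupToContent {
  export type Map = { A: { x: number }; B: { y: string } };
  export type Message = {
    [K in keyof Map]: { type: K; payload: Map[K] };
  }[keyof Map];

  // Build a well-typed message while casting loose parameters internally.
  export const make = <K extends keyof Map>(type: K, payload: Map[K]): Message =>
    ({ type, payload } as Message);
}

const msg = PopupToContent.make("A", { x: 1 }); // payload checked against "A"
```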
To keep our type expressions clear, we will temporarily avoid function-parameter syntax and instead denote the types in an object format of type -> payload. Having defined the request types this way, we now define the data types of the returned responses in the same type -> payload format, with each response type matching its request type.
Next, we define the entire event-communication Bridge. Since we are sending data from Popup to Content, we must specify which Tab we are sending to, which requires querying the currently active Tab. Sending uses cross.tabs.sendMessage, while receiving uses cross.runtime.onMessage.addListener. Given the potential variety of communication channels, we also need to check the message source, which can be done by checking the key it was sent with.
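A sketch of the bridge, reusing the PopupToContent namespace sketched above; `cross` (a unified chrome/browser alias) and the key constant are assumptions:

```ts
declare const cross: typeof chrome; // assumed unified chrome/browser alias
const BRIDGE_KEY = "__POPUP_TO_CONTENT__"; // illustrative source marker

export class PopupContentBridge {
  // Popup side: find the active tab and send a typed message to it.
  static async postToContent(data: PopupToContent.Message) {
    const [tab] = await cross.tabs.query({ active: true, currentWindow: true });
    if (tab?.id === undefined) return null;
    return cross.tabs.sendMessage(tab.id, { key: BRIDGE_KEY, data });
  }

  // Content side: verify the source key, then respond immediately.
  static onPopupMessage(handler: (data: PopupToContent.Message) => unknown) {
    cross.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
      if (!msg || msg.key !== BRIDGE_KEY) return;
      sendResponse(handler(msg.data));
    });
  }
}
```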
It is important to note that although the API definition includes sendResponse for asynchronous data responses, testing reveals that this function cannot actually be called asynchronously here; it must be executed immediately within the callback. The term asynchronous refers to the overall event-communication process, so we define the bridge in terms of immediate data responses.
Furthermore, communication between content and inject requires a somewhat specialized encapsulation. Content Scripts share the DOM and the event flow with the Inject Script, which means we can effectively implement communication in two ways (see the sketch after this list):

1. window.addEventListener + window.postMessage. One obvious issue with this approach is that the messages can also be received by the Web page itself. Even if we generate random tokens to validate the source of the messages, this method can still be easily intercepted by the page, so it is not very secure.
2. document.addEventListener + document.dispatchEvent + CustomEvent, i.e. custom events. Here it is crucial that the event names are random: by generating a unique random event name in the background during the injection of the framework, and subsequently using that name for communication between the Content Script and the Inject Script, we prevent the messages generated by our method calls from being intercepted by users.

It is important to note that all transmitted data must be serializable. If not, Gecko-based browsers will consider the values cross-origin objects, since they genuinely cross different Contexts; anything else would be akin to directly sharing memory.
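A sketch of the CustomEvent channel; in practice the event name would come from the background at injection time rather than being generated in place:

```ts
// An illustrative random event name; the real one is generated in the
// background and shared with both sides.
const EVENT_NAME = "__MSG_" + Math.random().toString(36).slice(2);

// Content Script side: dispatch a custom event carrying serializable data.
document.dispatchEvent(
  new CustomEvent(EVENT_NAME, { detail: { type: "A", payload: { x: 1 } } })
);

// Inject Script side: listen on the same randomly named event.
document.addEventListener(EVENT_NAME, (event) => {
  console.log("received:", (event as CustomEvent).detail);
});
```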
Earlier we discussed the various limitations of Google's heavily promoted v3. One significant restriction is its CSP (Content Security Policy), which no longer allows dynamic execution of code. That breaks tools like our DevServer's HMR, yet hot updates are a feature we genuinely need, so we are left with a less refined solution.
We can create a plugin for our build tool that uses ws.Server to start a WebSocket server. Then, from worker.js, the Service Worker we intend to start, we connect to that server with new WebSocket and listen for messages. Upon receiving a reload message from the server, we execute chrome.runtime.reload() to reload the extension.
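A sketch of both sides; the port is illustrative, and reconnect logic is omitted:

```ts
// Build-plugin side (Node): start a ws server and broadcast after builds.
import WebSocket from "ws";

const wss = new WebSocket.Server({ port: 3333 }); // illustrative port
export const broadcastReload = () => {
  wss.clients.forEach((client) => client.send("reload"));
};
// e.g. call broadcastReload() from the bundler's afterDone hook.
```

```ts
// worker.js (Service Worker) side: reload the extension on demand.
const ws = new WebSocket("ws://localhost:3333");
ws.onmessage = (event) => {
  if (event.data === "reload") chrome.runtime.reload();
};
```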
On the server side, we send the reload message to the client after each build completes, for example within the afterDone hook, which gives us a simple extension-reload capability. However, this introduces another problem: in v3, Service Workers are not persistent, so the WebSocket connection is torn down whenever the Service Worker is destroyed. This has made it hard for many Chrome extensions to transition smoothly from v2 to v3, and this behavior will likely be improved in the future.
With this, we have implemented the extension's entire hot-update scheme. We can additionally leverage the extension's Install event to re-execute the Content/Inject Script injection at that moment, achieving a comprehensive hot update. Of course, the script injection must be idempotent. Note also that there is no Uninstall event in the extension, so removing previously injected side effects has to be managed by convention, for example by calling specific global cleanup methods.
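A sketch of the re-injection step, assuming idempotent content scripts and a placeholder file name:

```ts
// Re-inject the Content Script into all open tabs on install/update.
chrome.runtime.onInstalled.addListener(async () => {
  const tabs = await chrome.tabs.query({});
  for (const tab of tabs) {
    if (tab.id === undefined || !tab.url?.startsWith("http")) continue;
    chrome.scripting
      .executeScript({ target: { tabId: tab.id }, files: ["content.js"] })
      .catch(() => void 0); // privileged pages cannot be injected
  }
});
```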
Interestingly, the multilingual solution provided by the browser is not very practical. The files we store in locales are merely placeholders, intended to let the extension marketplace recognize which languages our extension supports; the actual multilingual implementation occurs within our Popup. For example, the data in packages/force-copy/public/locales/zh_CN looks like this:
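An illustrative messages.json in the standard extension-locale format (not the verbatim file contents):

```json
{
  "name": { "message": "Force Copy" },
  "description": { "message": "Force Copy" }
}
```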
In reality, there are many front-end multilingual solutions available. Since our extension does not contain much multilingual content (it is just a Popup layer, after all), there is no need to create a separate index.html page; if that were necessary, adopting a community multilingual solution would be worthwhile, but for now we keep it simple.

First, we ensure complete type coverage. Our extension uses English as the base language, so the configuration is also written in English. Since we want a better grouping scheme, the structure may be nested fairly deeply, and the type definitions must be comprehensive enough to support our multilingual requirements.
Next, we define the I18n class along with a global cache for languages. In the I18n class, we implement functions for calling, generating multilingual configurations on demand, and retrieving multilingual configurations. To use it, we instantiate with new I18n(cross.i18n.getUILanguage()); and retrieve translations by calling i18n.t("Information.GitHub").
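A minimal sketch of such a class, assuming an English base configuration and dot-path lookup; the cache and on-demand generation are simplified:

```ts
// The base (English) configuration doubles as the type source.
const en = {
  Information: { GitHub: "GitHub" },
  Operation: { Copy: "Copy" },
} as const;

type Config = typeof en;
const LANGS: Record<string, Config> = { en }; // global language cache

export class I18n {
  private config: Config;
  constructor(language: string) {
    // Fall back to English when the UI language has no configuration.
    this.config = LANGS[language.replace("-", "_")] ?? en;
  }
  // Look up a dot-separated path such as "Information.GitHub".
  t(path: string): string {
    const value = path
      .split(".")
      .reduce<unknown>(
        (acc, key) => (acc as Record<string, unknown> | undefined)?.[key],
        this.config
      );
    return typeof value === "string" ? value : path;
  }
}

const i18n = new I18n("en");
i18n.t("Information.GitHub"); // => "GitHub"
```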
Developing browser extensions is quite a complex endeavor, especially when the result must be compatible with both v2 and v3; many design decisions are needed to keep everything functional on v3. The shift to v3 has reduced some flexibility, but it has also enhanced security to some extent. Still, the inherent permissions of browser extensions remain very high: even in v3, we can use the CDP (Chrome DevTools Protocol) in Chrome to accomplish a wide array of tasks. The sheer breadth of what extensions can do makes it daunting to install one without a clear understanding of it, particularly when the source code is not open. High extension permissions can lead to severe issues, such as leaks of user data, and even a strict review process with mandatory code uploads like Firefox's cannot eliminate every potential risk.