---
title: Browser Automation
---
import current from '/version.js';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
Headless automation involves controlling "headless browsers" to access websites
and submit or download data.  It is also possible to automate browsers using
custom browser extensions.
The [SheetJS standalone scripts](/docs/getting-started/installation/standalone)
can be added to any website by inserting a `SCRIPT` tag.  Headless browsers
usually provide utility functions for running custom snippets in the browser and
passing data back to the automation script.
## Use Case
This demo focuses on exporting table data to a workbook.  Headless browsers do
not generally support passing objects between the browser context and the
automation script, so the file data must be generated in the browser context
and sent back to the automation script for saving in the file system.
```mermaid
sequenceDiagram
  autonumber off
  actor U as User
  participant C as Controller
  participant B as Browser
  U->>C: run script
  rect rgba(255, 0, 0, 0.25)
    C->>B: launch browser
    C->>B: load URL
  end
  rect rgba(0, 127, 0, 0.25)
    C->>B: add SheetJS script
  end
  rect rgba(255, 0, 0, 0.25)
    C->>B: ask for file
    Note over B: scrape tables
    Note over B: generate workbook
    B->>C: file bytes
  end
  rect rgba(0, 127, 0, 0.25)
    C->>U: save file
  end
```
Key Steps (click to hide)
1) Launch the headless browser and load the target site.
2) Add the standalone SheetJS build to the page in a `SCRIPT` tag.
3) Add a script to the page (in the browser context) that will:
- Make a workbook object from the first table using `XLSX.utils.table_to_book`
- Generate the bytes for an XLSB file using `XLSX.write`
- Send the bytes back to the automation script
4) When the automation context receives data, save to a file
Integration Details and Demo (click to show)
The steps are marked in the comments:
{`\
var page = require('webpage').create();
page.onConsoleMessage = function(msg) { console.log(msg); };
\n\
/* (1) Load the target page */
page.open('https://sheetjs.com/demos/table', function() {
\n\
  /* (2) Load the standalone SheetJS build from the CDN */
  page.includeJs("https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js", function() {
\n\
    /* (3) Run the snippet in browser and return data */
    var bin = page.evaluateJavaScript([ "function(){",
\n\
      /* find first table */
      "var table = document.body.getElementsByTagName('table')[0];",
\n\
      /* call table_to_book on first table */
      "var wb = XLSX.utils.table_to_book(table);",
\n\
      /* generate XLSB file and return binary string */
      "return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
    "}" ].join(""));
\n\
    /* (4) write data to file */
    require("fs").write("SheetJSPhantomJS.xlsb", bin, "wb");
\n\
    phantom.exit();
  });
});`}
:::caution pass
PhantomJS is very finicky and will hang if there are script errors.  It is
strongly recommended to add verbose logging and to lint scripts before use.
:::
**Demo**
:::note
This demo was last tested on 2023 September 14 against PhantomJS 2.1.1
:::
1) Download and unzip the PhantomJS release from the official website[^1].
2) Save the `SheetJSPhantom.js` code snippet to `SheetJSPhantom.js`.
3) Run the `phantomjs` program and pass the script as the first argument.
For example, if the macOS Archive Utility unzipped the `2.1.1` release, binaries
will be placed in `phantomjs-2.1.1-macosx/bin/` and the command will be:
```bash
./phantomjs-2.1.1-macosx/bin/phantomjs SheetJSPhantom.js
```
When the script finishes, the file `SheetJSPhantomJS.xlsb` will be created.
This file can be opened with Excel.