diff --git a/index.html b/index.html index 303d2bb..8782f02 100644 --- a/index.html +++ b/index.html @@ -75,22 +75,13 @@
Parser and writer for various spreadsheet formats. Pure-JS cleanroom -implementation from official specifications, related documents, and test files. -Emphasis on parsing and writing robustness, cross-format feature compatibility -with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
-This is the community version. We also offer a pro version with performance -enhancements, additional features like styling, and dedicated support.
-Community Translations of this README:
- - - - - - - +The SheetJS Community Edition offers battle-tested open-source solutions for +extracting useful data from almost any complex spreadsheet and generating new +spreadsheets that will work with legacy and modern software alike.
+SheetJS Pro offers solutions beyond data processing: +Edit complex templates with ease; let out your inner Picasso with styling; make +custom sheets with images/graphs/PivotTables; evaluate formula expressions and +port calculations to web apps; automate common spreadsheet tasks, and much more!
@@ -111,19 +102,20 @@ enhancements, additional features like styling, and dedicated support.
In the browser, just add a script tag:
+Getting Started +Standalone Browser Scripts
+The complete browser standalone build is saved to dist/xlsx.full.min.js and
+can be directly added to a page with a script tag:
<script lang="javascript" src="dist/xlsx.full.min.js"></script>unpkg makes the latest version available at:
For example, unpkg makes the latest version available at:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>With npm:
-$ npm install xlsxThe complete single-file version is generated at dist/xlsx.full.min.js
A slimmer build is generated at dist/xlsx.mini.min.js. Compared to full build:
With bower:
$ bower install js-xlsxECMAScript Modules
+The ECMAScript Module build is saved to xlsx.mjs and can be directly added to
+a page with a script tag using type=module:
<script type="module">
+import { read, writeFileXLSX } from "./xlsx.mjs";
+
+/* load the codepage support library for extended support with older formats */
+import { set_cptable } from "./xlsx.mjs";
+import * as cptable from './dist/cpexcel.full.mjs';
+set_cptable(cptable);
+</script>The npm package also exposes the module
+with the module parameter, supported in Angular and other projects:
import { read, writeFileXLSX } from "xlsx";
+
+/* load the codepage support library for extended support with older formats */
+import { set_cptable } from "xlsx";
+import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
+set_cptable(cptable);Deno
+The sheetjs package is hosted by Deno:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+/* load the codepage support library for extended support with older formats */
+import * as cptable from 'https://deno.land/x/sheetjs/dist/cpexcel.full.mjs';
+XLSX.set_cptable(cptable);NodeJS
+With npm:
+$ npm install xlsxBy default, the module supports require:
var XLSX = require("xlsx");The module also ships with xlsx.mjs for use with import:
import * as XLSX from 'xlsx/xlsx.mjs';
+
+/* load 'fs' for readFile and writeFile support */
+import * as fs from 'fs';
+XLSX.set_fs(fs);
+
+/* load the codepage support library for extended support with older formats */
+import * as cpexcel from 'xlsx/dist/cpexcel.full.mjs';
+XLSX.set_cptable(cpexcel);Photoshop and InDesign
+dist/xlsx.extendscript.js is an ExtendScript build for Photoshop and InDesign
+that is included in the npm package. It can be directly referenced with a
+#include directive:
#include "xlsx.extendscript.js"
+
+For broad compatibility with JavaScript engines, the library is written using
+ECMAScript 3 language dialect as well as some ES5 features like Array#forEach.
+Older browsers require shims to provide missing functions.
To use the shim, add the shim before the script tag that loads xlsx.js:
<!-- add the shim first -->
+<script type="text/javascript" src="shim.min.js"></script>
+<!-- after the shim is referenced, add the library -->
+<script type="text/javascript" src="xlsx.full.min.js"></script>The script also includes IE_LoadFile and IE_SaveFile for loading and saving
+files in Internet Explorer versions 6-9. The xlsx.extendscript.js script
+bundles the shim in a format suitable for Photoshop and other Adobe products.
Most scenarios involving spreadsheets and data can be broken into 5 parts:
+Acquire Data: Data may be stored anywhere: local or remote files, +databases, HTML TABLE, or even generated programmatically in the web browser.
+Extract Data: For spreadsheet files, this involves parsing raw bytes to +read the cell data. For general JS data, this involves reshaping the data.
+Process Data: From generating summary statistics to cleaning data +records, this step is the heart of the problem.
+Package Data: This can involve making a new spreadsheet or serializing
+with JSON.stringify or writing XML or simply flattening data for UI tools.
Release Data: Spreadsheet files can be uploaded to a server or written +locally. Data can be presented to users in an HTML TABLE or data grid.
+A common problem involves generating a valid spreadsheet export from data stored
+in an HTML table. In this example, an HTML TABLE on the page will be scraped,
+a row will be added to the bottom with the date of the report, and a new file
+will be generated and downloaded locally. XLSX.writeFile takes care of
+packaging the data and attempting a local download:
// Acquire Data (reference to the HTML table)
+var table_elt = document.getElementById("my-table-id");
+
+// Extract Data (create a workbook object from the table)
+var workbook = XLSX.utils.table_to_book(table_elt);
+
+// Process Data (add a new row)
+var ws = workbook.Sheets["Sheet1"];
+XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});
+
+// Package and Release Data (`writeFile` tries to write and save an XLSB file)
+XLSX.writeFile(workbook, "Report.xlsb");This library tries to simplify steps 2 and 4 with functions to extract useful
+data from spreadsheet files (read / readFile) and generate new spreadsheet
+files from data (write / writeFile). Additional utility functions like
+table_to_book work with other common data sources like HTML tables.
This documentation and various demo projects cover a number of common scenarios +and approaches for steps 1 and 5.
+Utility functions help with step 3.
+Data processing should fit in any workflow
+The library does not impose a separate lifecycle. It fits nicely in websites +and apps built using any framework. The plain JS data objects play nice with +Web Workers and future APIs.
+"Acquiring and Extracting Data" describes +solutions for common data import scenarios.
+"Writing Workbooks" describes solutions for common data +export scenarios involving actual spreadsheet files.
+"Utility Functions" details utility functions for +translating JSON Arrays and other common JS structures into worksheet objects.
+JavaScript is a powerful language for data processing
+The "Common Spreadsheet Format" is a simple object +representation of the core concepts of a workbook. The various functions in the +library provide low-level tools for working with the object.
+For friendly JS processing, there are utility functions for converting parts of +a worksheet to/from an Array of Arrays. The following example combines powerful +JS Array methods with a network request library to download data, select the +information we want and create a workbook file:
+The goal is to generate a XLSB workbook of US President names and birthdays.
+Acquire Data
+Raw Data
+https://theunitedstates.io/congress-legislators/executive.json has the desired +data. For example, John Adams:
+{
+ "id": { /* (data omitted) */ },
+ "name": {
+ "first": "John", // <-- first name
+ "last": "Adams" // <-- last name
+ },
+ "bio": {
+ "birthday": "1735-10-19", // <-- birthday
+ "gender": "M"
+ },
+ "terms": [
+ { "type": "viceprez", /* (other fields omitted) */ },
+ { "type": "viceprez", /* (other fields omitted) */ },
+ { "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
+ ]
+}Filtering for Presidents
+The dataset includes Aaron Burr, a Vice President who was never President!
+Array#filter creates a new array with the desired rows. A President served
+at least one term with type set to "prez". To test if a particular row has
+at least one "prez" term, Array#some is another native JS function. The
+complete filter would be:
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));Lining up the data
+For this example, the name will be the first name combined with the last name
+(row.name.first + " " + row.name.last) and the birthday will be the subfield
+row.bio.birthday. Using Array#map, the dataset can be massaged in one call:
const rows = prez.map(row => ({
+ name: row.name.first + " " + row.name.last,
+ birthday: row.bio.birthday
+}));The result is an array of "simple" objects with no nesting:
+[
+ { name: "George Washington", birthday: "1732-02-22" },
+ { name: "John Adams", birthday: "1735-10-19" },
+ // ... one row per President
+]Extract Data
+With the cleaned dataset, XLSX.utils.json_to_sheet generates a worksheet:
const worksheet = XLSX.utils.json_to_sheet(rows);XLSX.utils.book_new creates a new workbook and XLSX.utils.book_append_sheet
+appends a worksheet to the workbook. The new worksheet will be called "Dates":
const workbook = XLSX.utils.book_new();
+XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");Process Data
+Fixing headers
+By default, json_to_sheet creates a worksheet with a header row. In this case,
+the headers come from the JS object keys: "name" and "birthday".
The headers are in cells A1 and B1. XLSX.utils.sheet_add_aoa can write text
+values to the existing worksheet starting at cell A1:
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });Fixing Column Widths
+Some of the names are longer than the default column width. Column widths are
+set by setting the "!cols" worksheet property.
The following line sets the width of column A to approximately 10 characters:
+worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 charactersOne Array#reduce call over rows can calculate the maximum width:
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
+worksheet["!cols"] = [ { wch: max_width } ];Note: If the starting point was a file or HTML table, XLSX.utils.sheet_to_json
+will generate an array of JS objects.
Package and Release Data
+XLSX.writeFile creates a spreadsheet file and tries to write it to the system.
+In the browser, it will try to prompt the user to download the file. In NodeJS,
+it will write to the local directory.
XLSX.writeFile(workbook, "Presidents.xlsx");Complete Example
+// Uncomment the next line for use in NodeJS:
+// const XLSX = require("xlsx"), axios = require("axios");
+
+(async() => {
+ /* fetch JSON data and parse */
+ const url = "https://theunitedstates.io/congress-legislators/executive.json";
+ const raw_data = (await axios(url, {responseType: "json"})).data;
+
+ /* filter for the Presidents */
+ const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
+
+ /* flatten objects */
+ const rows = prez.map(row => ({
+ name: row.name.first + " " + row.name.last,
+ birthday: row.bio.birthday
+ }));
+
+ /* generate worksheet and workbook */
+ const worksheet = XLSX.utils.json_to_sheet(rows);
+ const workbook = XLSX.utils.book_new();
+ XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
+
+ /* fix headers */
+ XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
+
+ /* calculate column width */
+ const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
+ worksheet["!cols"] = [ { wch: max_width } ];
+
+ /* create an XLSX file and try to save to Presidents.xlsx */
+ XLSX.writeFile(workbook, "Presidents.xlsx");
+})();For use in the web browser, assuming the snippet is saved to snippet.js,
+script tags should be used to include the axios and xlsx standalone builds:
<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
+<script src="snippet.js"></script>File formats are implementation details
+The parser covers a wide gamut of common spreadsheet file formats to ensure that +"HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.
+The writer supports a number of common output formats for broad compatibility +with the data ecosystem.
+To the greatest extent possible, data processing code should not have to worry +about the specific file formats involved.
The demos directory includes sample projects for:
Platforms and Integrations
denoelectron applicationnw.js applicationChrome / Chromium extensionsDownload a Google Sheet locallyAdobe ExtendScriptHeadless Browserscanvas-datagridinternet explorerOther examples are included in the showcase.
-The node version automatically requires modules for additional features. Some -of these modules are rather large in size and are only needed in special -circumstances, so they do not ship with the core. For browser use, they must -be included directly:
-<!-- international support from js-codepage -->
-<script src="dist/cpexcel.js"></script>An appropriate version for each dependency is included in the dist/ directory.
-The complete single-file version is generated at dist/xlsx.full.min.js
A slimmer build is generated at dist/xlsx.mini.min.js. Compared to full build:
Webpack and Browserify builds include optional modules by default. Webpack can
-be configured to remove support with resolve.alias:
/* uncomment the lines below to remove support */
- resolve: {
- alias: { "./dist/cpexcel.js": "" } // <-- omit international support
- }Since the library uses functions like Array#forEach, older browsers require
-shims to provide missing functions.
To use the shim, add the shim before the script tag that loads xlsx.js:
<!-- add the shim first -->
-<script type="text/javascript" src="shim.min.js"></script>
-<!-- after the shim is referenced, add the library -->
-<script type="text/javascript" src="xlsx.full.min.js"></script>The script also includes IE_LoadFile and IE_SaveFile for loading and saving
-files in Internet Explorer versions 6-9. The xlsx.extendscript.js script
-bundles the shim in a format suitable for Photoshop and other Adobe products.
API
+Extract data from spreadsheet bytes
+var workbook = XLSX.read(data, opts);The read method can extract data from spreadsheet bytes stored in a JS string,
+"binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).
Read spreadsheet bytes from a local file and extract data
+var workbook = XLSX.readFile(filename, opts);The readFile method attempts to read a spreadsheet file at the supplied path.
+Browsers generally do not allow reading files in this way (it is deemed a
+security risk), and attempts to read files in this way will throw an error.
The second opts argument is optional. "Parsing Options"
+covers the supported properties and behaviors.
Examples
+Here are a few common scenarios (click on each subtitle to see the code):
Prior to SheetJS, APIs for processing spreadsheet files were format-specific. -Third-party libraries either supported one format, or they involved a separate -set of classes for each supported file type. Even though XLSB was introduced in -Excel 2007, nothing outside of SheetJS or Excel supported the format.
-To promote a format-agnostic view, SheetJS starts from a pure-JS representation -that we call the "Common Spreadsheet Format". -Emphasizing a uniform object representation enables new features like format -conversion (reading an XLSX template and saving as XLS) and circumvents the mess -of classes. By abstracting the complexities of the various formats, tools -need not worry about the specific file type!
-A simple object representation combined with careful coding practices enables -use cases in older browsers and in alternative environments like ExtendScript -and Web Workers. It is always tempting to use the latest and greatest features, -but they tend to require the latest versions of browsers, limiting usability.
-Utility functions capture common use cases like generating JS objects or HTML. -Most simple operations should only require a few lines of code. More complex -operations generally should be straightforward to implement.
-Excel pushes the XLSX format as default starting in Excel 2007. However, there -are other formats with more appealing properties. For example, the XLSB format -is spiritually similar to XLSX but files often tend up taking less than half the -space and open much faster! Even though an XLSX writer is available, other -format writers are available so users can take advantage of the unique -characteristics of each format.
-The primary focus of the Community Edition is correct data interchange, focused -on extracting data from any compatible data representation and exporting data in -various formats suitable for any third party interface.
-For parsing, the first step is to read the file. This involves acquiring the -data and feeding it into the library. Here are a few common scenarios:
-readFile is only available in server environments. Browsers have no API for
-reading arbitrary files given a path, so another strategy must be used.
if(typeof require !== 'undefined') XLSX = require('xlsx');
-var workbook = XLSX.readFile('test.xlsx');
-/* DO SOMETHING WITH workbook HERE */readFile wraps the File logic in Photoshop and other ExtendScript targets.
-The specified path should be an absolute path:
#include "xlsx.extendscript.js"
-/* Read test.xlsx from the Documents folder */
-var workbook = XLSX.readFile(Folder.myDocuments + '/' + 'test.xlsx');
-/* DO SOMETHING WITH workbook HERE */The extendscript demo includes a more complex example.
The table_to_book and table_to_sheet utility functions take a DOM TABLE
-element and iterate through the child nodes.
var workbook = XLSX.utils.table_to_book(document.getElementById('tableau'));
-/* DO SOMETHING WITH workbook HERE */Multiple tables on a web page can be converted to individual worksheets:
-/* create new workbook */
-var workbook = XLSX.utils.book_new();
+ Local file in a NodeJS server (click to show)
+readFile uses fs.readFileSync under the hood:
+var XLSX = require("xlsx");
-/* convert table 'table1' to worksheet named "Sheet1" */
-var ws1 = XLSX.utils.table_to_sheet(document.getElementById('table1'));
-XLSX.utils.book_append_sheet(workbook, ws1, "Sheet1");
+var workbook = XLSX.readFile("test.xlsx");
+For Node ESM, the readFile helper is not enabled. Instead, fs.readFileSync
+should be used to read the file data as a Buffer for use with XLSX.read:
+import { readFileSync } from "fs";
+import { read } from "xlsx/xlsx.mjs";
-/* convert table 'table2' to worksheet named "Sheet2" */
-var ws2 = XLSX.utils.table_to_sheet(document.getElementById('table2'));
-XLSX.utils.book_append_sheet(workbook, ws2, "Sheet2");
-
-/* workbook now has 2 worksheets */
-Alternatively, the HTML code can be extracted and parsed:
-var htmlstr = document.getElementById('tableau').outerHTML;
-var workbook = XLSX.read(htmlstr, {type:'string'});
+const buf = readFileSync("test.xlsx");
+/* buf is a Buffer */
+const workbook = read(buf);Note: for a more complete example that works in older browsers, check the demo
-at http://oss.sheetjs.com/sheetjs/ajax.html. The xhr demo
-includes more examples with XMLHttpRequest and fetch.
readFile uses Deno.readFileSync under the hood:
// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+const workbook = XLSX.readFile("test.xlsx");Applications reading files must be invoked with the --allow-read flag. The
+deno demo has more examples
For modern websites targeting Chrome 76+, File#arrayBuffer is recommended:
// XLSX is a global from the standalone script
+
+async function handleDropAsync(e) {
+ e.stopPropagation(); e.preventDefault();
+ const f = e.dataTransfer.files[0];
+ /* f is a File */
+ const data = await f.arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+}
+drop_dom_element.addEventListener("drop", handleDropAsync, false);For maximal compatibility, the FileReader API should be used:
function handleDrop(e) {
+ e.stopPropagation(); e.preventDefault();
+ var f = e.dataTransfer.files[0];
+ /* f is a File */
+ var reader = new FileReader();
+ reader.onload = function(e) {
+ var data = e.target.result;
+ /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
+ var workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+ };
+ reader.readAsArrayBuffer(f);
+}
+drop_dom_element.addEventListener("drop", handleDrop, false);https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
+Starting with an HTML INPUT element with type="file":
<input type="file" id="input_dom_element">For modern websites targeting Chrome 76+, Blob#arrayBuffer is recommended:
// XLSX is a global from the standalone script
+
+async function handleFileAsync(e) {
+ const file = e.target.files[0];
+ const data = await file.arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+}
+input_dom_element.addEventListener("change", handleFileAsync, false);For broader support (including IE10+), the FileReader approach is recommended:
function handleFile(e) {
+ var file = e.target.files[0];
+ var reader = new FileReader();
+ reader.onload = function(e) {
+ var data = e.target.result;
+ /* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
+ var workbook = XLSX.read(e.target.result);
+
+ /* DO SOMETHING WITH workbook HERE */
+ };
+ reader.readAsArrayBuffer(file);
+}
+input_dom_element.addEventListener("change", handleFile, false);The oldie demo shows an IE-compatible fallback scenario.
For modern websites targeting Chrome 42+, fetch is recommended:
// XLSX is a global from the standalone script
+
+(async() => {
+ const url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
+ const data = await (await fetch(url)).arrayBuffer();
+ /* data is an ArrayBuffer */
+ const workbook = XLSX.read(data);
+
+ /* DO SOMETHING WITH workbook HERE */
+})();For broader support, the XMLHttpRequest approach is recommended:
var url = "http://oss.sheetjs.com/test_files/formula_stress_test.xlsx";
/* set up async GET request */
@@ -493,108 +723,139 @@ includes more examples with XMLHttpRequest and fetch.<
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
-}
+};
req.send();The xhr demo includes a longer discussion and more examples.
http://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
For modern browsers, Blob#arrayBuffer can read data from files:
async function handleDropAsync(e) {
- e.stopPropagation(); e.preventDefault();
- const f = evt.dataTransfer.files[0];
- const data = await f.arrayBuffer();
- const workbook = XLSX.read(data);
+ Local file in a PhotoShop or InDesign plugin (click to show)
+readFile wraps the File logic in Photoshop and other ExtendScript targets.
+The specified path should be an absolute path:
+#include "xlsx.extendscript.js"
- /* DO SOMETHING WITH workbook HERE */
-}
-drop_dom_element.addEventListener('drop', handleDropAsync, false);
-For maximal compatibility, the FileReader API should be used:
-function handleDrop(e) {
- e.stopPropagation(); e.preventDefault();
- var f = e.dataTransfer.files[0];
- var reader = new FileReader();
- reader.onload = function(e) {
- var workbook = XLSX.read(e.target.result);
+/* Read test.xlsx from the Documents folder */
+var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
+The extendscript demo includes a more complex example.
+
+
+ Local file in an Electron app (click to show)
+readFile can be used in the renderer process:
+/* From the renderer process */
+var XLSX = require("xlsx");
+
+var workbook = XLSX.readFile(path);
+Electron APIs have changed over time. The electron demo
+shows a complete example and details the required version-specific settings.
+
+
+ Local file in a mobile app with React Native (click to show)
+The react demo includes a sample React Native app.
+Since React Native does not provide a way to read files from the filesystem, a
+third-party library must be used. The following libraries have been tested:
+
+The base64 encoding returns strings compatible with the base64 type:
+import XLSX from "xlsx";
+import { FileSystem } from "react-native-file-access";
+
+const b64 = await FileSystem.readFile(path, "base64");
+/* b64 is a base64 string */
+const workbook = XLSX.read(b64, {type: "base64"});
+
+The ascii encoding returns binary strings compatible with the binary type:
+import XLSX from "xlsx";
+import { readFile } from "react-native-fs";
+
+const bstr = await readFile(path, "ascii");
+/* bstr is a binary string */
+const workbook = XLSX.read(bstr, {type: "binary"});
+
+
+ NodeJS Server File Uploads (click to show)
+read can accept a NodeJS buffer. readFile can read files generated by a
+HTTP POST request body parser like formidable:
+const XLSX = require("xlsx");
+const http = require("http");
+const formidable = require("formidable");
+
+const server = http.createServer((req, res) => {
+ const form = new formidable.IncomingForm();
+ form.parse(req, (err, fields, files) => {
+ /* grab the first file */
+ const f = Object.entries(files)[0][1];
+ const path = f.filepath;
+ const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
- };
- reader.readAsArrayBuffer(f);
-}
-drop_dom_element.addEventListener('drop', handleDrop, false);
+ });
+}).listen(process.env.PORT || 7262);The server demo has more advanced examples.
Data from file input elements can be processed using the same APIs as in the -drag-and-drop example.
-Using Blob#arrayBuffer:
async function handleFileAsync(e) {
- const file = e.target.files[0];
- const data = await file.arrayBuffer();
- const workbook = XLSX.read(data);
+ Download files in a NodeJS process (click to show)
+Node 17.5 and 18.0 have native support for fetch:
+const XLSX = require("xlsx");
+
+const data = await (await fetch(url)).arrayBuffer();
+/* data is an ArrayBuffer */
+const workbook = XLSX.read(data);
+For broader compatibility, third-party modules are recommended.
+request requires a null encoding to yield Buffers:
+var XLSX = require("xlsx");
+var request = require("request");
+
+request({url: url, encoding: null}, function(err, resp, body) {
+ var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
-}
-input_dom_element.addEventListener('change', handleFileAsync, false);
-Using FileReader:
-function handleFile(e) {
- var files = e.target.files, f = files[0];
- var reader = new FileReader();
- reader.onload = function(e) {
- var workbook = XLSX.read(e.target.result);
+});
+axios works the same way in browser and in NodeJS:
+const XLSX = require("xlsx");
+const axios = require("axios");
+
+(async() => {
+ const res = await axios.get(url, {responseType: "arraybuffer"});
+ /* res.data is a Buffer */
+ const workbook = XLSX.read(res.data);
+
+ /* DO SOMETHING WITH workbook HERE */
+})();
+
+
+ Download files in an Electron app (click to show)
+The net module in the main process can make HTTP/HTTPS requests to external
+resources. Responses should be manually concatenated using Buffer.concat:
+const XLSX = require("xlsx");
+const { net } = require("electron");
+
+const req = net.request(url);
+req.on("response", (res) => {
+ const bufs = []; // this array will collect all of the buffers
+ res.on("data", (chunk) => { bufs.push(chunk); });
+ res.on("end", () => {
+ const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
- };
- reader.readAsArrayBuffer(f);
-}
-input_dom_element.addEventListener('change', handleFile, false);
-The oldie demo shows an IE-compatible fallback scenario.
+ });
+});
+req.end();More specialized cases, including mobile app file processing, are covered in the -included demos
-Note that older versions of IE do not support HTML5 File API, so the Base64 mode -is used for testing.
On OSX you can get the Base64 encoding with:
-$ <target_file base64 | pbcopyOn Windows XP and up you can get the Base64 encoding using certutil:
> certutil -encode target_file target_file.b64(note: You have to open the file and remove the header and footer lines)
-The most common and interesting formats (XLS, XLSX/M, XLSB, ODS) are ultimately -ZIP or CFB containers of files. Neither format puts the directory structure at -the beginning of the file: ZIP files place the Central Directory records at the -end of the logical file, while CFB files can place the storage info anywhere in -the file! As a result, to properly handle these formats, a streaming function -would have to buffer the entire file before commencing. That belies the -expectations of streaming, so we do not provide any streaming read API.
-When dealing with Readable Streams, the easiest approach is to buffer the stream -and process the whole thing at the end. This can be done with a temporary file -or by explicitly concatenating the stream:
-var fs = require('fs');
-var XLSX = require('xlsx');
-function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
+and process the whole thing at the end:
+var fs = require("fs");
+var XLSX = require("xlsx");
+
+function process_RS(stream, cb) {
var buffers = [];
- stream.on('data', function(data) { buffers.push(data); });
- stream.on('end', function() {
+ stream.on("data", function(data) { buffers.push(data); });
+ stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer, {type:"buffer"});
@@ -602,26 +863,281 @@ or by explicitly concatenating the stream:
cb(workbook);
});
}
-More robust solutions are available using modules like concat-stream.
- Writing to filesystem first (click to show)
-This example uses tempfile to generate file names:
-var fs = require('fs'), tempfile = require('tempfile');
-var XLSX = require('xlsx');
-function process_RS(stream/*:ReadStream*/, cb/*:(wb:Workbook)=>void*/)/*:void*/{
- var fname = tempfile('.sheetjs');
- console.log(fname);
- var ostream = fs.createWriteStream(fname);
- stream.pipe(ostream);
- ostream.on('finish', function() {
- var workbook = XLSX.readFile(fname);
- fs.unlinkSync(fname);
+ ReadableStream in the browser (click to show)
+When dealing with ReadableStream, the easiest approach is to buffer the stream
+and process the whole thing at the end:
+// XLSX is a global from the standalone script
- /* DO SOMETHING WITH workbook IN THE CALLBACK */
- cb(workbook);
+async function process_RS(stream) {
+ /* collect data */
+ const buffers = [];
+ const reader = stream.getReader();
+ for(;;) {
+ const res = await reader.read();
+ if(res.value) buffers.push(res.value);
+ if(res.done) break;
+ }
+
+ /* concat */
+ const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
+
+ let off = 0;
+ for(const u8 of arr) {
+ out.set(u8, off);
+ off += u8.length;
+ }
+
+ return out;
+}
+
+const data = await process_RS(stream);
+/* data is Uint8Array */
+const workbook = XLSX.read(data);
+
+More detailed examples are covered in the included demos
+
+Processing JSON and JS Data
+JSON and JS data tend to represent single worksheets. This section will use a
+few utility functions to generate workbooks:
+Create a new Worksheet
+var workbook = XLSX.utils.book_new();
+The book_new utility function creates an empty workbook with no worksheets.
+Append a Worksheet to a Workbook
+XLSX.utils.book_append_sheet(workbook, worksheet, sheet_name);
+The book_append_sheet utility function appends a worksheet to the workbook.
+The third argument specifies the desired worksheet name. Multiple worksheets can
+be added to a workbook by calling the function multiple times.
+API
+Create a worksheet from an array of arrays of JS values
+var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
+The aoa_to_sheet utility function walks an "array of arrays" in row-major
+order, generating a worksheet object. The following snippet generates a sheet
+with cell A1 set to the string A1, cell B1 set to B2, etc:
+var worksheet = XLSX.utils.aoa_to_sheet([
+ ["A1", "B1", "C1"],
+ ["A2", "B2", "C2"],
+ ["A3", "B3", "C3"]
+])
+"Array of Arrays Input" describes the function and the
+optional opts argument in more detail.
+Create a worksheet from an array of JS objects
+var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
+The json_to_sheet utility function walks an array of JS objects in order,
+generating a worksheet object. By default, it will generate a header row and
+one row per object in the array. The optional opts argument has settings to
+control the column order and header output.
+"Array of Objects Input" describes the function and
+the optional opts argument in more detail.
+Examples
+"Zen of SheetJS" contains a detailed example "Get Data
+from a JSON Endpoint and Generate a Workbook"
+x-spreadsheet is an interactive
+data grid for previewing and modifying structured data in the web browser. The
+xspreadsheet demo includes a sample script with the
+xtos function for converting from x-spreadsheet data object to a workbook.
+https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
+
+ Records from a database query (SQL or no-SQL) (click to show)
+The database demo includes examples of working with
+databases and query results.
+
+
+ Numerical Computations with TensorFlow.js (click to show)
+@tensorflow/tfjs and other libraries expect data in simple
+arrays, well-suited for worksheets where each column is a data vector. That is
+the transpose of how most people use spreadsheets, where each row is a vector.
+When recovering data from tfjs, the returned data points are stored in a typed
+array. An array of arrays can be constructed with loops. Array#unshift can
+prepend a title row before the conversion:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+/* suppose xs and ys are vectors (1D tensors) -> tfarr will be a typed array */
+const tfdata = tf.stack([xs, ys]).transpose();
+const shape = tfdata.shape;
+const tfarr = tfdata.dataSync();
+
+/* construct the array of arrays */
+const aoa = [];
+for(let j = 0; j < shape[0]; ++j) {
+ aoa[j] = [];
+ for(let i = 0; i < shape[1]; ++i) aoa[j][i] = tfarr[j * shape[1] + i];
+}
+/* add headers to the top */
+aoa.unshift(["x", "y"]);
+
+/* generate worksheet */
+const worksheet = XLSX.utils.aoa_to_sheet(aoa);
+The array demo shows a complete example.
+
+
+Processing HTML Tables
+API
+Create a worksheet by scraping an HTML TABLE in the page
+var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
+The table_to_sheet utility function takes a DOM TABLE element and iterates
+through the rows to generate a worksheet. The opts argument is optional.
+"HTML Table Input" describes the function in more detail.
+Create a workbook by scraping an HTML TABLE in the page
+var workbook = XLSX.utils.table_to_book(dom_element, opts);
+The table_to_book utility function follows the same logic as table_to_sheet.
+After generating a worksheet, it creates a blank workbook and appends the
+spreadsheet.
+The options argument supports the same options as table_to_sheet, with the
+addition of a sheet property to control the worksheet name. If the property
+is missing or no options are specified, the default name Sheet1 is used.
+Examples
+Here are a few common scenarios (click on each subtitle to see the code):
+
+ HTML TABLE element in a webpage (click to show)
+<!-- include the standalone script and shim. this uses the UNPKG CDN -->
+<script src="https://unpkg.com/xlsx/dist/shim.min.js"></script>
+<script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+
+<!-- example table with id attribute -->
+<table id="tableau">
+ <tr><td>Sheet</td><td>JS</td></tr>
+ <tr><td>12345</td><td>67</td></tr>
+</table>
+
+<!-- this block should appear after the table HTML and the standalone script -->
+<script type="text/javascript">
+ var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
+
+ /* DO SOMETHING WITH workbook HERE */
+</script>
+Multiple tables on a web page can be converted to individual worksheets:
+/* create new workbook */
+var workbook = XLSX.utils.book_new();
+
+/* convert table "table1" to worksheet named "Sheet1" */
+var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
+XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
+
+/* convert table "table2" to worksheet named "Sheet2" */
+var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
+XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
+
+/* workbook now has 2 worksheets */
+Alternatively, the HTML code can be extracted and parsed:
+var htmlstr = document.getElementById("tableau").outerHTML;
+var workbook = XLSX.read(htmlstr, {type:"string"});
+
+
+ Chrome/Chromium Extension (click to show)
+The chrome demo shows a complete example and details the
+required permissions and other settings.
+In an extension, it is recommended to generate the workbook in a content script
+and pass the object back to the extension:
+/* in the worker script */
+chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
+ /* pass a message like { sheetjs: true } from the extension to scrape */
+ if(!msg || !msg.sheetjs) return;
+ /* create a new workbook */
+ var workbook = XLSX.utils.book_new();
+ /* loop through each table element */
+ var tables = document.getElementsByTagName("table")
+ for(var i = 0; i < tables.length; ++i) {
+ var worksheet = XLSX.utils.table_to_sheet(tables[i]);
+ XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
+ }
+ /* pass back to the extension */
+ return cb(workbook);
+});
+
+
+ Server-Side HTML Tables with Headless Chrome (click to show)
+The headless demo includes a complete demo to convert HTML
+files to XLSB workbooks. The core idea is to add the script to the page, parse
+the table in the page context, generate a base64 workbook and send it back
+for further processing:
+const XLSX = require("xlsx");
+const { readFileSync } = require("fs"), puppeteer = require("puppeteer");
+
+const url = `https://sheetjs.com/demos/table`;
+
+/* get the standalone build source (node_modules/xlsx/dist/xlsx.full.min.js) */
+const lib = readFileSync(require.resolve("xlsx/dist/xlsx.full.min.js"), "utf8");
+
+(async() => {
+ /* start browser and go to web page */
+ const browser = await puppeteer.launch();
+ const page = await browser.newPage();
+ await page.goto(url, {waitUntil: "networkidle2"});
+
+ /* inject library */
+ await page.addScriptTag({content: lib});
+
+ /* this function `s5s` will be called by the script below, receiving the Base64-encoded file */
+ await page.exposeFunction("s5s", async(b64) => {
+ const workbook = XLSX.read(b64, {type: "base64" });
+
+ /* DO SOMETHING WITH workbook HERE */
});
-}
+
+ /* generate XLSB file in webpage context and send back result */
+ await page.addScriptTag({content: `
+ /* call table_to_book on first table */
+ var workbook = XLSX.utils.table_to_book(document.querySelector("TABLE"));
+
+ /* generate XLSX file */
+ var b64 = XLSX.write(workbook, {type: "base64", bookType: "xlsb"});
+
+ /* call "s5s" hook exposed from the node process */
+ window.s5s(b64);
+ `});
+
+ /* cleanup */
+ await browser.close();
+})();
+
+
+ Server-Side HTML Tables with Headless WebKit (click to show)
+The headless demo includes a complete demo to convert HTML
+files to XLSB workbooks using PhantomJS. The core idea
+is to add the script to the page, parse the table in the page context, generate
+a binary workbook and send it back for further processing:
+var XLSX = require('xlsx');
+var page = require('webpage').create();
+
+/* this code will be run in the page */
+var code = [ "function(){",
+ /* call table_to_book on first table */
+ "var wb = XLSX.utils.table_to_book(document.body.getElementsByTagName('table')[0]);",
+
+ /* generate XLSB file and return binary string */
+ "return XLSX.write(wb, {type: 'binary', bookType: 'xlsb'});",
+"}" ].join("");
+
+page.open('https://sheetjs.com/demos/table', function() {
+ /* Load the browser script from the UNPKG CDN */
+ page.includeJs("https://unpkg.com/xlsx/dist/xlsx.full.min.js", function() {
+ /* The code will return an XLSB file encoded as binary string */
+ var bin = page.evaluateJavaScript(code);
+
+ var workbook = XLSX.read(bin, {type: "binary"});
+ /* DO SOMETHING WITH workbook HERE */
+
+ phantom.exit();
+ });
+});
+
+
+ NodeJS HTML Tables without a browser (click to show)
+NodeJS does not include a DOM implementation and Puppeteer requires a hefty
+Chromium build. jsdom is a lightweight alternative:
+const XLSX = require("xlsx");
+const { readFileSync } = require("fs");
+const { JSDOM } = require("jsdom");
+
+/* obtain HTML string. This example reads from test.html */
+const html_str = fs.readFileSync("test.html", "utf8");
+/* get first TABLE element */
+const doc = new JSDOM(html_str).window.document.querySelector("table");
+/* generate workbook */
+const workbook = XLSX.utils.table_to_book(doc);
Working with the Workbook
@@ -694,56 +1210,74 @@ files and output the contents in various formats. The source is available at
XLSX.utils.sheet_to_formulae generates a list of formulae
-Writing Workbooks
-For writing, the first step is to generate output data. The helper functions
-write and writeFile will produce the data in various formats suitable for
-dissemination. The second step is to actual share the data with the end point.
-Assuming workbook is a workbook object:
+Packaging and Releasing Data
+
+Writing Workbooks
+API
+Generate spreadsheet bytes (file) from data
+var data = XLSX.write(workbook, opts);
+The write method attempts to package data from the workbook into a file in
+memory. By default, XLSX files are generated, but that can be controlled with
+the bookType property of the opts argument. Based on the type option,
+the data can be stored as a "binary string", JS string, Uint8Array or Buffer.
+The second opts argument is required. "Writing Options"
+covers the supported properties and behaviors.
+Generate and attempt to save file
+XLSX.writeFile(workbook, filename, opts);
+The writeFile method packages the data and attempts to save the new file. The
+export file format is determined by the extension of filename (SheetJS.xlsx
+signals XLSX export, SheetJS.xlsb signals XLSB export, etc).
+The writeFile method uses platform-specific APIs to initiate the file save. In
+NodeJS, fs.readFileSync can create a file. In the web browser, a download is
+attempted using the HTML5 download attribute, with fallbacks for IE.
+Generate and attempt to save an XLSX file
+XLSX.writeFileXLSX(workbook, filename, opts);
+The writeFile method embeds a number of different export functions. This is
+great for developer experience but not amenable to dead code elimination using
+the current toolset. When only XLSX exports are needed, this method avoids
+referencing the other export codecs.
+The second opts argument is optional. "Writing Options"
+covers the supported properties and behaviors.
+Examples
- nodejs write a file (click to show)
-XLSX.writeFile uses fs.writeFileSync in server environments:
-if(typeof require !== 'undefined') XLSX = require('xlsx');
+ Local file in a NodeJS server (click to show)
+writeFile uses fs.writeFileSync in server environments:
+var XLSX = require("xlsx");
+
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsb');
-/* at this point, out.xlsb is a file that you can distribute */
+XLSX.writeFile(workbook, "out.xlsb");
+For Node ESM, the writeFile helper is not enabled. Instead, fs.writeFileSync
+should be used to write the file data to a Buffer for use with XLSX.write:
+import { writeFileSync } from "fs";
+import { write } from "xlsx/xlsx.mjs";
+
+const buf = write(workbook, {type: "buffer", bookType: "xlsb"});
+/* buf is a Buffer */
+const workbook = writeFileSync("out.xlsb", buf);
- Photoshop ExtendScript write a file (click to show)
+ Local file in a Deno application (click to show)
+writeFile uses Deno.writeFileSync under the hood:
+// @deno-types="https://deno.land/x/sheetjs/types/index.d.ts"
+import * as XLSX from 'https://deno.land/x/sheetjs/xlsx.mjs'
+
+XLSX.writeFile(workbook, "test.xlsx");
+Applications writing files must be invoked with the --allow-write flag. The
+deno demo has more examples
+
+
+ Local file in a PhotoShop or InDesign plugin (click to show)
writeFile wraps the File logic in Photoshop and other ExtendScript targets.
The specified path should be an absolute path:
#include "xlsx.extendscript.js"
+
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsx');
+XLSX.writeFile(workbook, "out.xlsx");
/* at this point, out.xlsx is a file that you can distribute */
The extendscript demo includes a more complex example.
- Browser add TABLE element to page (click to show)
-The sheet_to_html utility function generates HTML code that can be added to
-any DOM element.
-var worksheet = workbook.Sheets[workbook.SheetNames[0]];
-var container = document.getElementById('tableau');
-container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
-
-
- Browser upload file (ajax) (click to show)
-A complete example using XHR is included in the XHR demo, along
-with examples for fetch and wrapper libraries. This example assumes the server
-can handle Base64-encoded files (see the demo for a basic nodejs server):
-/* in this example, send a base64 string to the server */
-var wopts = { bookType:'xlsx', bookSST:false, type:'base64' };
-
-var wbout = XLSX.write(workbook,wopts);
-
-var req = new XMLHttpRequest();
-req.open("POST", "/upload", true);
-var formdata = new FormData();
-formdata.append('file', 'test.xlsx'); // <-- server expects `file` to hold name
-formdata.append('data', wbout); // <-- `data` holds the base64-encoded data
-req.send(formdata);
-
-
- Browser save file (click to show)
+ Download a file in the browser to the user machine (click to show)
XLSX.writeFile wraps a few techniques for triggering a file save:
-
@@ -757,17 +1291,17 @@ XP and Windows 7. The shim must be included in the containing HTML page.
There is no standard way to determine if the actual file has been downloaded.
/* output format determined by filename */
-XLSX.writeFile(workbook, 'out.xlsb');
+XLSX.writeFile(workbook, "out.xlsb");
/* at this point, out.xlsb will have been downloaded */
- Browser save file (compatibility) (click to show)
+ Download a file in legacy browsers (click to show)
XLSX.writeFile techniques work for most modern browsers as well as older IE.
For much older browsers, there are workarounds implemented by wrapper libraries.
FileSaver.js implements saveAs.
Note: XLSX.writeFile will automatically call saveAs if available.
/* bookType can be any supported output type */
-var wopts = { bookType:'xlsx', bookSST:false, type:'array' };
+var wopts = { bookType:"xlsx", bookSST:false, type:"array" };
var wbout = XLSX.write(workbook,wopts);
@@ -776,14 +1310,47 @@ Note: XLSX.writeFile will automatically call saveAs if
Downloadify uses a Flash SWF button
to generate local files, suitable for environments where ActiveX is unavailable:
Downloadify.create(id,{
- /* other options are required! read the downloadify docs for more info */
- filename: "test.xlsx",
- data: function() { return XLSX.write(wb, {bookType:"xlsx", type:'base64'}); },
- append: false,
- dataType: 'base64'
+ /* other options are required! read the downloadify docs for more info */
+ filename: "test.xlsx",
+ data: function() { return XLSX.write(wb, {bookType:"xlsx", type:"base64"}); },
+ append: false,
+ dataType: "base64"
});
The oldie demo shows an IE-compatible fallback scenario.
+
+ Browser upload file (ajax) (click to show)
+A complete example using XHR is included in the XHR demo, along
+with examples for fetch and wrapper libraries. This example assumes the server
+can handle Base64-encoded files (see the demo for a basic nodejs server):
+/* in this example, send a base64 string to the server */
+var wopts = { bookType:"xlsx", bookSST:false, type:"base64" };
+
+var wbout = XLSX.write(workbook,wopts);
+
+var req = new XMLHttpRequest();
+req.open("POST", "/upload", true);
+var formdata = new FormData();
+formdata.append("file", "test.xlsx"); // <-- server expects `file` to hold name
+formdata.append("data", wbout); // <-- `data` holds the base64-encoded data
+req.send(formdata);
+
+
+ PhantomJS (Headless Webkit) File Generation (click to show)
+The headless demo includes a complete demo to convert HTML
+files to XLSB workbooks using PhantomJS. PhantomJS
+fs.write supports writing files from the main process but has a different
+interface from the NodeJS fs module:
+var XLSX = require('xlsx');
+var fs = require('fs');
+
+/* generate a binary string */
+var bin = XLSX.write(workbook, { type:"binary", bookType: "xlsx" });
+/* write to file */
+fs.write("test.xlsx", bin, "wb");
+Note: The section "Processing HTML Tables" shows how
+to generate a workbook from HTML tables in a page in "Headless WebKit".
+
The included demos cover mobile apps and other special deployments.
Writing Examples
@@ -824,6 +1391,193 @@ Stream. They are only exposed in NodeJS.
stream.pipe(conv); conv.pipe(process.stdout);
https://github.com/sheetjs/sheetaki pipes write streams to nodejs response.
+
+Generating JSON and JS Data
+JSON and JS data tend to represent single worksheets. The utility functions in
+this section work with single worksheets.
+The "Common Spreadsheet Format" section describes
+the object structure in more detail. workbook.SheetNames is an ordered list
+of the worksheet names. workbook.Sheets is an object whose keys are sheet
+names and whose values are worksheet objects.
+The "first worksheet" is stored at workbook.Sheets[workbook.SheetNames[0]].
+API
+Create an array of JS objects from a worksheet
+var jsa = XLSX.utils.sheet_to_json(worksheet, opts);
+Create an array of arrays of JS values from a worksheet
+var aoa = XLSX.utils.sheet_to_json(worksheet, {...opts, header: 1});
+The sheet_to_json utility function walks a workbook in row-major order,
+generating an array of objects. The second opts argument controls a number of
+export decisions including the type of values (JS values or formatted text). The
+"JSON" section describes the argument in more detail.
+By default, sheet_to_json scans the first row and uses the values as headers.
+With the header: 1 option, the function exports an array of arrays of values.
+Examples
+x-spreadsheet is an interactive
+data grid for previewing and modifying structured data in the web browser. The
+xspreadsheet demo includes a sample script with the
+stox function for converting from a workbook to x-spreadsheet data object.
+https://oss.sheetjs.com/sheetjs/x-spreadsheet is a live demo.
+
+ Previewing data in a React data grid (click to show)
+react-data-grid is a data grid tailored for
+react. It expects two properties: rows of data objects and columns which
+describe the columns. For the purposes of massaging the data to fit the react
+data grid API it is easiest to start from an array of arrays.
+This demo starts by fetching a remote file and using XLSX.read to extract:
+import { useEffect, useState } from "react";
+import DataGrid from "react-data-grid";
+import { read, utils } from "xlsx";
+
+const url = "https://oss.sheetjs.com/test_files/RkNumber.xls";
+
+export default function App() {
+ const [columns, setColumns] = useState([]);
+ const [rows, setRows] = useState([]);
+ useEffect(() => {(async () => {
+ const wb = read(await (await fetch(url)).arrayBuffer(), { WTF: 1 });
+
+ /* use sheet_to_json with header: 1 to generate an array of arrays */
+ const data = utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], { header: 1 });
+
+ /* see react-data-grid docs to understand the shape of the expected data */
+ setColumns(data[0].map((r) => ({ key: r, name: r })));
+ setRows(data.slice(1).map((r) => r.reduce((acc, x, i) => {
+ acc[data[0][i]] = x;
+ return acc;
+ }, {})));
+ })(); });
+
+ return <DataGrid columns={columns} rows={rows} />;
+}
+
+
+ Populating a database (SQL or no-SQL) (click to show)
+The database demo includes examples of working with
+databases and query results.
+
+
+ Numerical Computations with TensorFlow.js (click to show)
+@tensorflow/tfjs and other libraries expect data in simple
+arrays, well-suited for worksheets where each column is a data vector. That is
+the transpose of how most people use spreadsheets, where each row is a vector.
+A single Array#map can pull individual named rows from sheet_to_json export:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+const key = "age"; // this is the field we want to pull
+const ages = XLSX.utils.sheet_to_json(worksheet).map(r => r[key]);
+const tf_data = tf.tensor1d(ages);
+All fields can be processed at once using a transpose of the 2D tensor generated
+with the sheet_to_json export with header: 1. The first row, if it contains
+header labels, should be removed with a slice:
+const XLSX = require("xlsx");
+const tf = require('@tensorflow/tfjs');
+
+/* array of arrays of the data starting on the second row */
+const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1}).slice(1);
+/* dataset in the "correct orientation" */
+const tf_dataset = tf.tensor2d(aoa).transpose();
+/* pull out each dataset with a slice */
+const tf_field0 = tf_dataset.slice([0,0], [1,tensor.shape[1]]).flatten();
+const tf_field1 = tf_dataset.slice([1,0], [1,tensor.shape[1]]).flatten();
+The array demo shows a complete example.
+
+
+Generating HTML Tables
+API
+Generate HTML Table from Worksheet
+var html = XLSX.utils.sheet_to_html(worksheet);
+The sheet_to_html utility function generates HTML code based on the worksheet
+data. Each cell in the worksheet is mapped to a <TD> element. Merged cells
+in the worksheet are serialized by setting colspan and rowspan attributes.
+Examples
+The sheet_to_html utility function generates HTML code that can be added to
+any DOM element by setting the innerHTML:
+var container = document.getElementById("tavolo");
+container.innerHTML = XLSX.utils.sheet_to_html(worksheet);
+Combining with fetch, constructing a site from a workbook is straightforward:
+
+ Vanilla JS + HTML fetch workbook and generate table previews (click to show)
+<body>
+ <style>TABLE { border-collapse: collapse; } TD { border: 1px solid; }</style>
+ <div id="tavolo"></div>
+ <script src="https://unpkg.com/xlsx/dist/xlsx.full.min.js"></script>
+ <script type="text/javascript">
+(async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const workbook = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+
+ let output = [];
+ /* loop through the worksheet names in order */
+ workbook.SheetNames.forEach(name => {
+
+ /* generate HTML from the corresponding worksheets */
+ const worksheet = workbook.Sheets[name];
+ const html = XLSX.utils.sheet_to_html(worksheet);
+
+ /* add a header with the title name followed by the table */
+ output.push(`<H3>${name}</H3>${html}`);
+ });
+ /* write to the DOM at the end */
+ tavolo.innerHTML = output.join("\n");
+})();
+ </script>
+</body>
+
+
+ React fetch workbook and generate HTML table previews (click to show)
+It is generally recommended to use a React-friendly workflow, but it is possible
+to generate HTML and use it in React with dangerouslySetInnerHTML:
+function Tabeller(props) {
+ /* the workbook object is the state */
+ const [workbook, setWorkbook] = React.useState(XLSX.utils.book_new());
+
+ /* fetch and update the workbook with an effect */
+ React.useEffect(() => { (async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const wb = XLSX.read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+ setWorkbook(wb);
+ })(); });
+
+ return workbook.SheetNames.map(name => (<>
+ <h3>name</h3>
+ <div dangerouslySetInnerHTML={{
+ /* this __html mantra is needed to set the inner HTML */
+ __html: XLSX.utils.sheet_to_html(workbook.Sheets[name])
+ }} />
+ </>));
+}
+The react demo includes more React examples.
+
+
+ VueJS fetch workbook and generate HTML table previews (click to show)
+It is generally recommended to use a VueJS-friendly workflow, but it is possible
+to generate HTML and use it in VueJS with the v-html directive:
+import { read, utils } from 'xlsx';
+import { reactive } from 'vue';
+
+const S5SComponent = {
+ mounted() { (async() => {
+ /* fetch and parse workbook -- see the fetch example for details */
+ const workbook = read(await (await fetch("sheetjs.xlsx")).arrayBuffer());
+ /* loop through the worksheet names in order */
+ workbook.SheetNames.forEach(name => {
+ /* generate HTML from the corresponding worksheets */
+ const html = utils.sheet_to_html(workbook.Sheets[name]);
+ /* add to state */
+ this.wb.wb.push({ name, html });
+ });
+ })(); },
+ /* this state mantra is required for array updates to work */
+ setup() { return { wb: reactive({ wb: [] }) }; },
+ template: `
+ <div v-for="ws in wb.wb" :key="ws.name">
+ <h3>{{ ws.name }}</h3>
+ <div v-html="ws.html"></div>
+ </div>`
+};
+The vuejs demo includes more React examples.
+
Interface
XLSX is the exposed variable in the browser and the exported node variable
@@ -847,6 +1601,13 @@ If o is omitted, the writer will use the third argument as the call
Utilities
Utilities are available in the XLSX.utils object and are described in the
Utility Functions section:
+Constructing:
+book_new creates an empty workbookbook_append_sheet adds a worksheet to a workbookImporting:
Row Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM, ODS
+Column Properties: XLSX/M, XLSB, BIFF8 XLS, XLML, SYLK, DOM
+Row and Column properties are not extracted by default when reading from a file
+and are not persisted by default when writing to a file. The option
+cellStyles: true must be passed to the relevant read or write function.
Column Properties
The !cols array in each worksheet, if present, is a collection of ColInfo
objects which have the following properties:
type ColInfo = {
@@ -1630,6 +2400,23 @@ objects which have the following properties:
level?: number; // 0-indexed outline / group level
MDW?: number; // Excel's "Max Digit Width" unit, always integral
};Row Properties
+The !rows array in each worksheet, if present, is a collection of RowInfo
+objects which have the following properties:
type RowInfo = {
+ /* visibility */
+ hidden?: boolean; // if true, the row is hidden
+
+ /* row height is specified in one of the following ways: */
+ hpx?: number; // height in screen pixels
+ hpt?: number; // height in points
+
+ level?: number; // 0-indexed outline / group level
+};Outline / Group Levels Convention
+The Excel UI displays the base outline level as 1 and the max level as 8.
+Following JS conventions, SheetJS uses 0-indexed outline levels wherein the base
+outline level is 0 and the max level is 7.
There are three different width types corresponding to the three different ways
@@ -1653,6 +2440,17 @@ when changing the pixel width, delete the Row Heights Excel internally stores row heights in points. The default resolution is 72 DPI
+or 96 PPI, so the pixel and point size should agree. For different resolutions
+they may not agree, so the library separates the concepts. Even though all of the information is made available, writers are expected to
+follow the priority order: Column Widths Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
to pixels and back, repeating for all possible MDW and selecting the MDW that
@@ -1667,34 +2465,6 @@ follow the priority order:wch and width
Implementation details (click to show)
+
+
+hpx pixel height if availablehpt point height if available
The !rows array in each worksheet, if present, is a collection of RowInfo
-objects which have the following properties:
type RowInfo = {
- /* visibility */
- hidden?: boolean; // if true, the row is hidden
-
- /* row height is specified in one of the following ways: */
- hpx?: number; // height in screen pixels
- hpt?: number; // height in points
-
- level?: number; // 0-indexed outline / group level
-};Note: Excel UI displays the base outline level as 1 and the max level as 8.
-The level field stores the base outline as 0 and the max level as 7.
Excel internally stores row heights in points. The default resolution is 72 DPI -or 96 PPI, so the pixel and point size should agree. For different resolutions -they may not agree, so the library separates the concepts.
-Even though all of the information is made available, writers are expected to -follow the priority order:
-hpx pixel height if availablehpt point height if availableThe cell.w formatted text for each cell is produced from cell.v and cell.z
format. If the format is not specified, the Excel General format is used.
@@ -3648,21 +4418,21 @@ range limits will be silently truncated:
Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.
-Core Spreadsheet Formats
+XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.
The format is standardized in ECMA-376 and later in ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.
-BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced
the concept of a workbook (XLW files) but also had single-sheet XLS format.
The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12
@@ -3671,85 +4441,64 @@ extended the format in various ways but largely stuck to the same record format.
files in these formats, so record lengths and fields were determined by writing
in all of the supported formats and comparing files. Excel 2016 can generate
BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.
BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.
The MS-XLS specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.
-Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.
The MS-XLSB specification covers the basics of the file format, and other
specifications expand on serialization of features like properties.
Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.
Excel TXT uses tab as the delimiter and code page 1200.
-Notes:
-0x49 0x44 ("ID") are treated as Symbolic
+Like in Excel, files starting with 0x49 0x44 ("ID") are treated as Symbolic
Link files. Unlike Excel, if the file does not have a valid SYLK header, it
will be proactively reinterpreted as CSV. There are some files with semicolon
delimiter that align with a valid SYLK file. For the broadest compatibility,
-all cells with the value of ID are automatically wrapped in double-quotes.
Support for other formats is generally far XLS/XLSB/XLSX support, due in large
+all cells with the value of ID are automatically wrapped in double-quotes.
Miscellaneous Workbook Formats
+Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.
-The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.
Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.
Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.
-The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.
-All versions of Works were limited to a single worksheet.
Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format with additional record types.
@@ -3760,42 +4509,33 @@ BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes aWksSSWorkBook stream similar to Lotus FM3/FMT files.
-iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has been used up through the current release (Numbers 11.2).
The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.
-ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but tools like LO/OO add undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.
-UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.
-Miscellaneous Worksheet Formats
Many older formats supported only one worksheet:
-DBF is really a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes @@ -3803,47 +4543,48 @@ files compatible with Visual FoxPro extensions.
Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.
-There is no real documentation. All knowledge was gathered by saving files in various versions of Excel to deduce the meaning of fields. Notes:
Plain formulae are stored in the RC form.
+Column widths are rounded to integral characters.
+Lotus Formatted Text (PRN)
+There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.
-There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:
"0.3" -> "=""0.3""
+Since Excel automatically converts numbers-as-strings to numbers, numeric
+string constants are converted to formulae: "0.3" -> "=""0.3""
DIF technically expects numeric cells to hold the raw numeric data, but Excel +permits formatted numbers (including dates)
+DIF technically has no support for formulae, but Excel will automatically +convert plain formulae. Array formulae are not preserved.
+HTML
Excel HTML worksheets include special metadata encoded in styles. For example,
mso-number-format is a localized string containing the number format. Despite
the metadata the output is valid HTML, although it does accept bare & symbols.
&<
looks for those tags and overrides the default interpretation. For example, text
like <td>12345</td> will be parsed as numbers but <td t="s">12345</td> will
be parsed as text.
-Excel RTF worksheets are stored in clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.
-Ethercalc is an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.