### Usage

Most scenarios involving spreadsheets and data can be broken into 5 parts:

1) **Acquire Data**:  Data may be stored anywhere: local or remote files,
   databases, HTML TABLE, or even generated programmatically in the web browser.

2) **Extract Data**:  For spreadsheet files, this involves parsing raw bytes to
   read the cell data. For general JS data, this involves reshaping the data.

3) **Process Data**:  From generating summary statistics to cleaning data
   records, this step is the heart of the problem.

4) **Package Data**:  This can involve making a new spreadsheet or serializing
   with `JSON.stringify` or writing XML or simply flattening data for UI tools.

5) **Release Data**:  Spreadsheet files can be uploaded to a server or written
   locally.  Data can be presented to users in an HTML TABLE or data grid.

A common problem involves generating a valid spreadsheet export from data stored
in an HTML table.  In this example, an HTML TABLE on the page will be scraped,
a row will be added to the bottom with the date of the report, and a new file
will be generated and downloaded locally. `XLSX.writeFile` takes care of
packaging the data and attempting a local download:

```js
// Acquire Data (reference to the HTML table)
var table_elt = document.getElementById("my-table-id");

// Extract Data (create a workbook object from the table)
var workbook = XLSX.utils.table_to_book(table_elt);

// Process Data (add a new row)
var ws = workbook.Sheets["Sheet1"];
XLSX.utils.sheet_add_aoa(ws, [["Created "+new Date().toISOString()]], {origin:-1});

// Package and Release Data (`writeFile` tries to write and save an XLSB file)
XLSX.writeFile(workbook, "Report.xlsb");
```

This library tries to simplify steps 2 and 4 with functions to extract useful
data from spreadsheet files (`read` / `readFile`) and generate new spreadsheet
files from data (`write` / `writeFile`).  Additional utility functions like
`table_to_book` work with other common data sources like HTML tables.

This documentation and various demo projects cover a number of common scenarios
and approaches for steps 1 and 5.

Utility functions help with step 3.

["Acquiring and Extracting Data"](#acquiring-and-extracting-data) describes
solutions for common data import scenarios.

["Packaging and Releasing Data"](#packaging-and-releasing-data) describes
solutions for common data export scenarios.

["Processing Data"](#processing-data) describes solutions for
common workbook processing and manipulation scenarios.

["Utility Functions"](#utility-functions) details utility functions for
translating JSON Arrays and other common JS structures into worksheet objects.

### The Zen of SheetJS

_Data processing should fit in any workflow_

The library does not impose a separate lifecycle.  It fits nicely in websites
and apps built using any framework.  The plain JS data objects play nice with
Web Workers and future APIs.

_JavaScript is a powerful language for data processing_

The ["Common Spreadsheet Format"](#common-spreadsheet-format) is a simple object
representation of the core concepts of a workbook.  The various functions in the
library provide low-level tools for working with the object.
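
As a rough illustration of what "simple object representation" means, the
sketch below hand-writes a tiny worksheet-like object.  It assumes the layout
described in the "Common Spreadsheet Format" section (cell objects with a type
`t` and a value `v`, keyed by A1-style address, plus a `"!ref"` range).  The
variable names are made up for this example and no library call is involved:

```js
// Hypothetical hand-written worksheet: three text cells and one number cell.
const sketch_ws = {
  "!ref": "A1:B2",                        // range covered by the worksheet
  A1: { t: "s", v: "Name" },              // t: "s" marks a text cell
  B1: { t: "s", v: "Index" },
  A2: { t: "s", v: "George Washington" },
  B2: { t: "n", v: 1 }                    // t: "n" marks a number cell
};

// A plain object can be serialized and restored with no special handling.
const round_trip = JSON.parse(JSON.stringify(sketch_ws));

// Cell addresses are the keys that do not start with "!".
const addresses = Object.keys(round_trip).filter(key => key[0] !== "!");
console.log(addresses.join(","), round_trip.A2.v);
```

Because the object holds only strings and numbers, it survives a JSON round
trip unchanged, which is the property that makes it easy to hand to other
contexts such as Web Workers.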

For friendly JS processing, there are utility functions for converting parts of
a worksheet to/from an Array of Arrays.  The following example combines powerful
JS Array methods with a network request library to download data, select the
information we want and create a workbook file:

<details>
  <summary><b>Get Data from a JSON Endpoint and Generate a Workbook</b> (click to show)</summary>

The goal is to generate an XLSX workbook of US President names and birthdays.

**Acquire Data**

_Raw Data_

<https://theunitedstates.io/congress-legislators/executive.json> has the desired
data.  For example, John Adams:

```js
{
  "id": { /* (data omitted) */ },
  "name": {
    "first": "John",          // <-- first name
    "last": "Adams"           // <-- last name
  },
  "bio": {
    "birthday": "1735-10-19", // <-- birthday
    "gender": "M"
  },
  "terms": [
    { "type": "viceprez", /* (other fields omitted) */ },
    { "type": "viceprez", /* (other fields omitted) */ },
    { "type": "prez", /* (other fields omitted) */ } // <-- look for "prez"
  ]
}
```

_Filtering for Presidents_

The dataset includes Aaron Burr, a Vice President who was never President!

`Array#filter` creates a new array with the desired rows.  A President served
at least one term with `type` set to `"prez"`.  To test if a particular row has
at least one `"prez"` term, `Array#some` is another native JS function.  The
complete filter would be:

```js
const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));
```
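
To see the filter in isolation, the sketch below runs it against a small
hand-written sample instead of the downloaded dataset.  The `sample_data` and
`sample_prez` names and the trimmed-down records are assumptions made for this
illustration; only the `name` and `terms` fields are included:

```js
// Hypothetical three-record sample in the same shape as the raw dataset.
const sample_data = [
  { name: { first: "George", last: "Washington" },
    terms: [ { type: "prez" }, { type: "prez" } ] },
  { name: { first: "John", last: "Adams" },
    terms: [ { type: "viceprez" }, { type: "viceprez" }, { type: "prez" } ] },
  { name: { first: "Aaron", last: "Burr" },
    terms: [ { type: "viceprez" } ] }
];

// Keep a row when at least one of its terms is a presidential term.
const sample_prez = sample_data.filter(row => row.terms.some(term => term.type === "prez"));

console.log(sample_prez.map(row => row.name.last)); // [ 'Washington', 'Adams' ]
```

The Burr record is dropped because none of its terms has `type` set to `"prez"`.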

_Lining up the data_

For this example, the name will be the first name combined with the last name
(`row.name.first + " " + row.name.last`) and the birthday will be the subfield
`row.bio.birthday`.  Using `Array#map`, the dataset can be massaged in one call:

```js
const rows = prez.map(row => ({
  name: row.name.first + " " + row.name.last,
  birthday: row.bio.birthday
}));
```

The result is an array of "simple" objects with no nesting:

```js
[
  { name: "George Washington", birthday: "1732-02-22" },
  { name: "John Adams", birthday: "1735-10-19" },
  // ... one row per President
]
```

**Extract Data**

With the cleaned dataset, `XLSX.utils.json_to_sheet` generates a worksheet:

```js
const worksheet = XLSX.utils.json_to_sheet(rows);
```

`XLSX.utils.book_new` creates a new workbook and `XLSX.utils.book_append_sheet`
appends a worksheet to the workbook. The new worksheet will be called "Dates":

```js
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");
```

**Process Data**

_Fixing headers_

By default, `json_to_sheet` creates a worksheet with a header row. In this case,
the headers come from the JS object keys: "name" and "birthday".

The headers are in cells A1 and B1.  `XLSX.utils.sheet_add_aoa` can write text
values to the existing worksheet starting at cell A1:

```js
XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });
```
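
Another way to control the header text is to build the Array of Arrays up
front, with the header row first.  The sketch below is plain JS over a
hand-written sample, so the `sample_rows` and `aoa` names are assumptions for
this illustration, and the library call is only mentioned in a comment:

```js
// Hypothetical sample in the flattened { name, birthday } shape.
const sample_rows = [
  { name: "George Washington", birthday: "1732-02-22" },
  { name: "John Adams", birthday: "1735-10-19" }
];

// Header row first, then one array per record in column order.
const aoa = [["Name", "Birthday"]].concat(
  sample_rows.map(row => [row.name, row.birthday])
);

console.log(aoa.length); // 3 (one header row plus two data rows)
// With the library loaded, XLSX.utils.aoa_to_sheet(aoa) would build a worksheet.
```

This trades the convenience of `json_to_sheet` for explicit control over both
the header text and the column order.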

_Fixing Column Widths_

Some of the names are longer than the default column width.  Column widths are
set by [setting the `"!cols"` worksheet property](#row-and-column-properties).

The following line sets the width of column A to approximately 10 characters:

```js
worksheet["!cols"] = [ { wch: 10 } ]; // set column A width to 10 characters
```

One `Array#reduce` call over `rows` can calculate the maximum width:

```js
const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
worksheet["!cols"] = [ { wch: max_width } ];
```
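
The same idea can be extended to every column.  The sketch below is one
possible approach, not something the library provides: it maps over a list of
keys and runs the `reduce` once per column.  The `sample_rows`, `keys` and
`cols` names are made up for this illustration:

```js
// Hypothetical sample in the flattened { name, birthday } shape.
const sample_rows = [
  { name: "George Washington", birthday: "1732-02-22" },
  { name: "John Adams", birthday: "1735-10-19" }
];

// Column order: one entry per worksheet column, left to right.
const keys = ["name", "birthday"];

// For each column, take the longest value, with a floor of 10 characters.
const cols = keys.map(key => ({
  wch: sample_rows.reduce((w, r) => Math.max(w, String(r[key]).length), 10)
}));

console.log(cols); // [ { wch: 17 }, { wch: 10 } ]
// The result has the shape expected by the "!cols" property:
//   worksheet["!cols"] = cols;
```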

Note: If the starting point was a file or HTML table, `XLSX.utils.sheet_to_json`
will generate an array of JS objects.

**Package and Release Data**

`XLSX.writeFile` creates a spreadsheet file and tries to write it to the system.
In the browser, it will try to prompt the user to download the file.  In NodeJS,
it will write to the current working directory.

```js
XLSX.writeFile(workbook, "Presidents.xlsx");
```

**Complete Example**

```js
// Uncomment the next line for use in NodeJS:
// const XLSX = require("xlsx"), axios = require("axios");

(async() => {
  /* fetch JSON data and parse */
  const url = "https://theunitedstates.io/congress-legislators/executive.json";
  const raw_data = (await axios(url, {responseType: "json"})).data;

  /* filter for the Presidents */
  const prez = raw_data.filter(row => row.terms.some(term => term.type === "prez"));

  /* flatten objects */
  const rows = prez.map(row => ({
    name: row.name.first + " " + row.name.last,
    birthday: row.bio.birthday
  }));

  /* generate worksheet and workbook */
  const worksheet = XLSX.utils.json_to_sheet(rows);
  const workbook = XLSX.utils.book_new();
  XLSX.utils.book_append_sheet(workbook, worksheet, "Dates");

  /* fix headers */
  XLSX.utils.sheet_add_aoa(worksheet, [["Name", "Birthday"]], { origin: "A1" });

  /* calculate column width */
  const max_width = rows.reduce((w, r) => Math.max(w, r.name.length), 10);
  worksheet["!cols"] = [ { wch: max_width } ];

  /* create an XLSX file and try to save to Presidents.xlsx */
  XLSX.writeFile(workbook, "Presidents.xlsx");
})();
```
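
If adding `axios` is not desirable, the download step can probably be handled
with the standard `fetch` API available in browsers and recent NodeJS releases.
The helper name `fetch_json` and its `fetch_impl` parameter are made up for
this sketch; the parameter exists only so the function can be exercised with a
stand-in instead of a real network request:

```js
// Hypothetical helper: download a URL and parse the response body as JSON.
// `fetch_impl` defaults to the global `fetch`; a stand-in can be passed instead.
async function fetch_json(url, fetch_impl = fetch) {
  const response = await fetch_impl(url);
  // `ok` is false for HTTP error statuses such as 404 or 500.
  if (!response.ok) throw new Error("Request failed with status " + response.status);
  return response.json();
}

// Possible usage inside the async function from the complete example:
//   const raw_data = await fetch_json(url);
```

The rest of the example would stay the same, since it only depends on
`raw_data` being the parsed array.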

For use in the web browser, assuming the snippet is saved to `snippet.js`,
script tags should be used to include the `axios` and `xlsx` standalone builds:

```html
<script src="https://cdn.sheetjs.com/xlsx-latest/package/dist/xlsx.full.min.js"></script>
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>
<script src="snippet.js"></script>
```


</details>

_File formats are implementation details_

The parser covers a wide gamut of common spreadsheet file formats to ensure that
"HTML-saved-as-XLS" files work as well as actual XLS or XLSX files.

The writer supports a number of common output formats for broad compatibility
with the data ecosystem.

To the greatest extent possible, data processing code should not have to worry
about the specific file formats involved.