---
title: Stream Export
sidebar_position: 11
hide_table_of_contents: true
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Many platforms offer methods to write files. These methods typically expect the
entire file to be generated before writing. Large workbook files may exceed
platform-specific size limits.

Some platforms also offer a "streaming" or "incremental" approach. Instead of
writing the entire file at once, these methods can accept small chunks of data
and incrementally write to the filesystem.
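
As a rough illustration of the incremental approach, the following NodeJS sketch
writes a file one chunk at a time instead of building the whole string first.
The `writeChunks` helper, the sample rows and the temporary file name are
hypothetical and are not part of the SheetJS API.

```js
const fs = require("fs");
const os = require("os");
const path = require("path");

/* hypothetical helper: write each chunk separately, resolve once flushed */
function writeChunks(file, chunks) {
  return new Promise((resolve, reject) => {
    const out = fs.createWriteStream(file);
    out.on("error", reject);
    out.on("finish", resolve);
    /* small incremental writes (this sketch ignores backpressure) */
    for (const chunk of chunks) out.write(chunk);
    out.end(); // signal that no more data will be written
  });
}

/* sample data (assumption): three short CSV rows */
const rows = ["a,b\n", "1,2\n", "3,4\n"];
const file = path.join(os.tmpdir(), "SheetJSChunks.csv");
const done = writeChunks(file, rows);
done.then(() => console.log("wrote " + fs.statSync(file).size + " bytes"));
```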

The [Streaming Write](/docs/demos/bigdata/stream#streaming-write) demo includes
live browser demos and notes for platforms that do not support SheetJS streams.

:::tip pass

This feature was expanded in version `0.20.3`. It is strongly recommended to
[upgrade to the latest version](/docs/getting-started/installation/).

:::

## Streaming Basics

SheetJS streams use the NodeJS push streams API. It is strongly recommended to
review the official NodeJS "Stream" documentation[^1].
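
For readers new to the API, a push stream is a `Readable` whose producer calls
`push` to hand over chunks and `push(null)` to signal the end. The following is
a minimal sketch using only the NodeJS `stream` module. The `makeStream` helper
and its sample lines are illustrative assumptions, not SheetJS code.

```js
const { Readable } = require("stream");

/* illustrative helper: create a text stream that pushes one line per read */
function makeStream(lines) {
  let i = 0;
  return new Readable({
    read() {
      /* push the next chunk, or null to mark the end of the stream */
      this.push(i < lines.length ? lines[i++] + "\n" : null);
    }
  });
}

/* consumers receive chunks through "data" events */
const chunks = [];
const rs = makeStream(["a,b", "1,2"]);
rs.setEncoding("utf8");
rs.on("data", (chunk) => chunks.push(chunk));
const ended = new Promise((resolve) => rs.on("end", resolve));
```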

<details>
  <summary><b>Historical Note</b> (click to show)</summary>

NodeJS push streams were introduced in 2012. The text streaming methods `to_csv`
and `to_html` are supported in NodeJS v0.10 and later while the object streaming
method `to_json` is supported in NodeJS v0.12 and later.

The first SheetJS streaming write function, `to_csv`, was introduced in 2017. It
used and still uses the battle-tested NodeJS streaming API.

Years later, browser vendors opted to standardize a different stream API.

For maximal compatibility, the library uses NodeJS push streams.

</details>

#### NodeJS ECMAScript Module Support

In CommonJS modules, libraries can load the `stream` module using `require`.
SheetJS libraries will load streaming support where applicable.

Due to ESM limitations, libraries cannot freely import the `stream` module.

:::danger ECMAScript Module Limitations

The original specification only supported top-level imports:

```js
import { Readable } from 'stream';
```

If a module is unavailable, there is no way for scripts to gracefully fail or
ignore the error.

---

Patches to the specification added two different solutions to the problem:

- "dynamic imports" will throw errors that can be handled by libraries. Dynamic
imports return Promises, so they taint APIs that do not use Promise-based methods.

```js
/* Readable will be undefined if stream cannot be imported */
const Readable = await (async() => {
  try {
    return (await import("stream"))?.Readable;
  } catch(e) { /* silently ignore error */ }
})();
```

- "import maps" control module resolution, allowing library users to manually
shunt unsupported modules.

**These patches were released after browsers adopted ESM!** A number of browsers
and other platforms support top-level imports but do not support the patches.

---

**Due to ESM woes, it is strongly recommended to use CommonJS when possible!**

:::

For maximal platform support, SheetJS libraries expose a special `set_readable`
method to provide a `Readable` implementation:

```js title="SheetJS NodeJS ESM streaming support"
import { stream as SheetJStream } from 'xlsx';
import { Readable } from 'stream';

SheetJStream.set_readable(Readable);
```
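
The dynamic import pattern can be combined with `set_readable`. The sketch below
is one possible arrangement, not the library's own loader. `installReadable` is
a hypothetical helper, `target` stands for the `stream` object exported by
`xlsx`, and a mock object is used so the snippet runs without the library.

```js
/* hypothetical helper: try to load Readable and hand it to `target` */
async function installReadable(target) {
  let Readable;
  try {
    Readable = (await import("stream")).Readable;
  } catch (e) { /* stream module unavailable: leave streaming disabled */ }
  if (Readable) target.set_readable(Readable);
  return !!Readable; // true when a Readable implementation was installed
}

/* mock standing in for the `stream` export of the `xlsx` module */
const mock = { set_readable(R) { this.Readable = R; } };
const installed = installReadable(mock);
```

Because the helper is asynchronous, callers must wait for the returned Promise
before calling any streaming method.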

## Worksheet Export

The worksheet export methods accept a SheetJS worksheet object.

### CSV Export

**Export worksheet data in "Comma-Separated Values" (CSV)**

```js
var csvstream = XLSX.stream.to_csv(ws, opts);
```

`to_csv` creates a NodeJS text stream. The options mirror the non-streaming
[`sheet_to_csv`](/docs/api/utilities/csv#delimiter-separated-output) method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and
streams CSV rows to the terminal.

<Tabs groupId="mod">
  <TabItem value="cjs" label="CommonJS">

```js title="Streaming CSV Print Example"
const XLSX = require("xlsx");

(async() => {
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_csv(ws).pipe(process.stdout);
})();
```

  </TabItem>
  <TabItem value="esm" label="ESM">

```js title="Streaming CSV Print Example"
import { read, stream } from "xlsx";
import { Readable } from "stream";
stream.set_readable(Readable);

var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_csv(ws).pipe(process.stdout);
```

  </TabItem>
</Tabs>
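
Since the result is a standard NodeJS stream, it can also be consumed with event
listeners instead of `pipe`. The sketch below counts newline characters as
chunks arrive. `countLines` is a hypothetical helper, and `Readable.from`
produces a stand-in for the stream returned by `to_csv` so the snippet runs
without a workbook. The chunk boundaries of the real stream are not assumed.

```js
const { Readable } = require("stream");

/* hypothetical helper: count newline characters without buffering the text */
function countLines(textstream) {
  return new Promise((resolve, reject) => {
    let count = 0;
    textstream.on("data", (chunk) => {
      const str = String(chunk); // chunks may be strings or Buffers
      for (let i = 0; i < str.length; ++i) if (str[i] === "\n") ++count;
    });
    textstream.on("end", () => resolve(count));
    textstream.on("error", reject);
  });
}

/* stand-in for XLSX.stream.to_csv(ws); note the row split across chunks */
const csvstream = Readable.from(["a,b\n1,", "2\n3,4\n"]);
const total = countLines(csvstream);
total.then((n) => console.log(n + " newline characters"));
```

Quoted CSV fields may contain newlines, so this count is only an upper bound on
the number of records.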

### JSON Export

**Export worksheet data to "Arrays of Arrays" or "Arrays of Objects"**

```js
var jsonstream = XLSX.stream.to_json(ws, opts);
```

`to_json` creates a NodeJS object stream. The options mirror the non-streaming
[`sheet_to_json`](/docs/api/utilities/array#array-output) method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and
streams JSON rows to the terminal. A `Transform`[^2] stream generates text from
the object streams.

<Tabs groupId="mod">
  <TabItem value="cjs" label="CommonJS">

```js title="Streaming Objects Print Example"
const XLSX = require("xlsx");
const { Transform } = require("stream");

/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

(async() => {
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
})();
```

  </TabItem>
  <TabItem value="esm" label="ESM">

```js title="Streaming Objects Print Example"
import { read, stream } from "xlsx";
import { Readable, Transform } from "stream";
stream.set_readable(Readable);

/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
```

  </TabItem>
</Tabs>
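
In recent NodeJS versions, `Readable` streams are async iterable, which can be
more convenient than a `Transform` when rows should be processed in script
logic. The following sketch filters rows as they arrive. `collectRows` is a
hypothetical helper, and `Readable.from` with sample rows stands in for the
object stream returned by `to_json`.

```js
const { Readable } = require("stream");

/* hypothetical helper: gather row objects that satisfy a predicate */
async function collectRows(objstream, keep) {
  const out = [];
  for await (const row of objstream) if (keep(row)) out.push(row);
  return out;
}

/* stand-in for XLSX.stream.to_json(ws); the rows are sample data */
const jsonstream = Readable.from([
  { Name: "A", Index: 1 }, { Name: "B", Index: 2 }, { Name: "C", Index: 3 }
]);
const picked = collectRows(jsonstream, (row) => row.Index >= 2);
picked.then((rows) => console.log(rows.length + " rows kept"));
```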

### HTML Export

**Export worksheet data to HTML TABLE**

```js
var htmlstream = XLSX.stream.to_html(ws, opts);
```

`to_html` creates a NodeJS text stream. The options mirror the non-streaming
[`sheet_to_html`](/docs/api/utilities/html#html-table-output) method.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and
streams HTML TABLE rows to the terminal.

<Tabs groupId="mod">
  <TabItem value="cjs" label="CommonJS">

```js title="Streaming HTML Print Example"
const XLSX = require("xlsx");

(async() => {
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  var ws = wb.Sheets[wb.SheetNames[0]];
  XLSX.stream.to_html(ws).pipe(process.stdout);
})();
```

  </TabItem>
  <TabItem value="esm" label="ESM">

```js title="Streaming HTML Print Example"
import { read, stream } from "xlsx";
import { Readable } from "stream";
stream.set_readable(Readable);

var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
var ws = wb.Sheets[wb.SheetNames[0]];
stream.to_html(ws).pipe(process.stdout);
```

  </TabItem>
</Tabs>

## Workbook Export

The workbook export methods accept a SheetJS workbook object.

### XLML Export

**Export workbook data to SpreadsheetML2003 XML files**

```js
var xlmlstream = XLSX.stream.to_xlml(wb, opts);
```

`to_xlml` creates a NodeJS text stream. The options mirror the non-streaming
[`write`](/docs/api/write-options) method using the `xlml` book type.

The following NodeJS script fetches https://docs.sheetjs.com/pres.numbers and
writes a SpreadsheetML2003 workbook to `SheetJStream.xml.xls`:

<Tabs groupId="mod">
  <TabItem value="cjs" label="CommonJS">

```js title="Streaming XLML Write Example"
const XLSX = require("xlsx"), fs = require("fs");

(async() => {
  var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
  var wb = XLSX.read(ab);
  XLSX.stream.to_xlml(wb).pipe(fs.createWriteStream("SheetJStream.xml.xls"));
})();
```

  </TabItem>
  <TabItem value="esm" label="ESM">

```js title="Streaming XLML Write Example"
import { read, stream } from "xlsx";
import { Readable } from "stream";
import { createWriteStream } from "fs";
stream.set_readable(Readable);

var ab = await (await fetch("https://docs.sheetjs.com/pres.numbers")).arrayBuffer();
var wb = read(ab);
stream.to_xlml(wb).pipe(createWriteStream("SheetJStream.xml.xls"));
```

  </TabItem>
</Tabs>
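
`pipe` returns immediately and does not report when the file is complete. One
option, sketched below with the standard NodeJS `stream.pipeline` function, is
to wait for the whole chain to finish and to surface errors from either end.
`saveStream` is a hypothetical helper, the output path is an assumption, and a
`Readable.from` stand-in replaces the stream returned by `to_xlml`.

```js
const fs = require("fs");
const os = require("os");
const path = require("path");
const { Readable, pipeline } = require("stream");
const { promisify } = require("util");

/* hypothetical helper: resolves after all data is written, rejects on error */
const saveStream = (src, file) =>
  promisify(pipeline)(src, fs.createWriteStream(file));

/* stand-in for XLSX.stream.to_xlml(wb); the content is placeholder text */
const xmlstream = Readable.from(["<?xml version=\"1.0\"?>", "<Workbook/>"]);
const outfile = path.join(os.tmpdir(), "SheetJStream.xml.xls");
const saved = saveStream(xmlstream, outfile);
saved.then(() => console.log("finished writing " + outfile));
```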

[^1]: See ["Stream"](https://nodejs.org/api/stream.html) in the NodeJS documentation.
[^2]: See [`Transform`](https://nodejs.org/api/stream.html#class-streamtransform) in the NodeJS documentation.