---
title: Large Datasets
pagination_prev: demos/extensions/index
pagination_next: demos/engines/index
sidebar_custom_props:
  summary: Dense Mode + Incremental CSV / HTML / JSON Export
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';
					
						
For maximal compatibility, the library reads entire files at once and generates
entire files at once. Browsers and other JS engines enforce tight memory limits.
In these cases, the library offers strategies to reduce memory usage by using
platform-specific APIs.
					
						
## Dense Mode

`read`, `readFile` and `aoa_to_sheet` accept the `dense` option. When enabled,
the methods create worksheet objects that store cells in arrays of arrays:

```js
var dense_wb = XLSX.read(ab, {dense: true});

var dense_sheet = XLSX.utils.aoa_to_sheet(aoa, {dense: true});
```
					
						
<details>
  <summary><b>Historical Note</b> (click to show)</summary>

The earliest versions of the library aimed for IE6+ compatibility.  In early
testing, both in Chrome 26 and in IE6, the most efficient worksheet storage for
small sheets was a large object whose keys were cell addresses.

Over time, V8 (the engine behind Chrome and NodeJS) evolved in a way that made
the array of arrays approach more efficient but reduced the performance of the
large object approach.

In the interest of preserving backwards compatibility, the library opts to make
the array of arrays approach available behind a special `dense` option.

</details>

The various API functions will seamlessly handle dense and sparse worksheets.
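
To make the difference concrete, the following sketch builds a tiny worksheet by
hand in both layouts and reads a cell from each. It does not load the library.
It assumes that sparse worksheets key cells by A1-style address and that dense
worksheets keep rows in a `"!data"` array, as in recent releases (older releases
may store rows differently). `encodeCol` and `getCell` are hypothetical helpers,
not part of the SheetJS API.

```js
// Hypothetical helper: 0-indexed column number -> letters (0 -> "A", 26 -> "AA")
function encodeCol(c) {
  var s = "";
  for (++c; c > 0; c = Math.floor((c - 1) / 26)) s = String.fromCharCode(((c - 1) % 26) + 65) + s;
  return s;
}

// Hypothetical helper: read the cell at 0-indexed row r, column c from either layout
function getCell(ws, r, c) {
  // assumption: dense worksheets keep rows in ws["!data"]
  if (Array.isArray(ws["!data"])) return (ws["!data"][r] || [])[c];
  // sparse worksheets use A1-style addresses as keys
  return ws[encodeCol(c) + (r + 1)];
}

// the same 2x2 sheet written out by hand in both layouts
var sparse = {
  "!ref": "A1:B2",
  A1: { t: "s", v: "Name" },         B1: { t: "s", v: "Index" },
  A2: { t: "s", v: "Bill Clinton" }, B2: { t: "n", v: 42 }
};
var dense = {
  "!ref": "A1:B2",
  "!data": [
    [ { t: "s", v: "Name" },         { t: "s", v: "Index" } ],
    [ { t: "s", v: "Bill Clinton" }, { t: "n", v: 42 } ]
  ]
};

console.log(getCell(sparse, 1, 1).v, getCell(dense, 1, 1).v); // 42 42
```

In application code the library utilities are usually preferable to manual
lookups, since they accept both kinds of worksheet.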
					
						
## Streaming Write

The streaming write functions are available in the `XLSX.stream` object.  They
take the same arguments as the corresponding non-streaming write functions:

- `XLSX.stream.to_csv` is the streaming version of `XLSX.utils.sheet_to_csv`.
- `XLSX.stream.to_html` is the streaming version of `XLSX.utils.sheet_to_html`.
- `XLSX.stream.to_json` is the streaming version of `XLSX.utils.sheet_to_json`.

"Stream" refers to the NodeJS push streams API.
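
As a rough illustration of the push stream model, the sketch below uses only the
NodeJS `stream` module. The `rowStream` helper is hypothetical and merely stands
in for a function such as `XLSX.stream.to_csv`: NodeJS calls `_read` when the
consumer wants more data, and the producer answers by pushing one row at a time
and finally `null`.

```js
var Readable = require("stream").Readable;

// Hypothetical stand-in for a streaming writer: emits one CSV line per row
function rowStream(rows) {
  var R = 0;
  var stream = new Readable();
  // NodeJS calls _read whenever the consumer is ready for more data
  stream._read = function () {
    if (R < rows.length) stream.push(rows[R++].join(",") + "\n");
    else stream.push(null); // null signals the end of the stream
  };
  return stream;
}

var chunks = [];
var strm = rowStream([["Name", "Index"], ["Bill Clinton", 42], ["GeorgeW Bush", 43]]);
strm.on("data", function (chunk) { chunks.push(chunk.toString()); });
strm.on("end", function () { console.log(chunks.join("")); });
```

Only one row needs to be materialized at a time, which is why this style suits
large worksheets.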
					
						
<details>
  <summary><b>Historical Note</b> (click to show)</summary>

NodeJS push streams were introduced in 2012. The text streaming methods `to_csv`
and `to_html` are supported in NodeJS v0.10 and later while the object streaming
method `to_json` is supported in NodeJS v0.12 and later.

The first streaming write function, `to_csv`, was introduced in early 2017. It
used and still uses the same NodeJS streaming API.

Years later, browser vendors are settling on a different stream API.

For maximal compatibility, the library uses NodeJS push streams.

</details>
					
						
### NodeJS

In a CommonJS context, NodeJS Streams and `fs` immediately work with SheetJS:

```js
const XLSX = require("xlsx"); // "just works"
```
					
						
:::warning ECMAScript Module Machinations

In NodeJS ESM, the `stream` dependency must be loaded manually:

```js
import * as XLSX from 'xlsx';
import { Readable } from 'stream';

XLSX.stream.set_readable(Readable); // manually load stream helpers
```

Additionally, for file-related operations in NodeJS ESM, `fs` must be loaded:

```js
import * as XLSX from 'xlsx';
import * as fs from 'fs';

XLSX.set_fs(fs); // manually load fs helpers
```

**It is strongly encouraged to use CommonJS in NodeJS whenever possible.**

:::
					
						
**`XLSX.stream.to_csv`**

This example reads the workbook file passed as an argument to the script, pulls
the first worksheet, converts to CSV and writes to `SheetJSNodeJStream.csv`:

```js
var XLSX = require("xlsx"), fs = require("fs");

var wb = XLSX.readFile(process.argv[2]);
var ws = wb.Sheets[wb.SheetNames[0]];
var ostream = fs.createWriteStream("SheetJSNodeJStream.csv");

// highlight-next-line
XLSX.stream.to_csv(ws).pipe(ostream);
```
					
						
**`XLSX.stream.to_json`**

`XLSX.stream.to_json` uses Object-mode streams. A `Transform` stream can be used
to generate a text stream for streaming to a file or the screen:

```js
var XLSX = require("xlsx"), Transform = require("stream").Transform;
var wb = XLSX.readFile(process.argv[2], {dense: true});
var ws = wb.Sheets[wb.SheetNames[0]];

/* this Transform stream converts JS objects to text */
var conv = new Transform({writableObjectMode:true});
conv._transform = function(obj, e, cb){ cb(null, JSON.stringify(obj) + "\n"); };

/* pipe `to_json` -> transformer -> standard output */
// highlight-next-line
XLSX.stream.to_json(ws, {raw: true}).pipe(conv).pipe(process.stdout);
```
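
Row objects do not have to be converted to text. As a sketch of the alternative,
an object-mode `Writable` can consume each row directly. The example below runs
without the library: `objectStream` is a hypothetical stand-in for
`XLSX.stream.to_json(ws)`, and the `Index` field is an assumed column name.

```js
var stream = require("stream");

// Hypothetical stand-in for XLSX.stream.to_json(ws): an object-mode stream of rows
function objectStream(rows) {
  var R = 0;
  var src = new stream.Readable({ objectMode: true });
  src._read = function () { src.push(R < rows.length ? rows[R++] : null); };
  return src;
}

// Object-mode Writable: receives one row object per write call
var total = 0, count = 0;
var sink = new stream.Writable({ objectMode: true });
sink._write = function (row, enc, cb) { total += row.Index; ++count; cb(); };
sink.on("finish", function () { console.log(count + " rows, Index sum " + total); });

objectStream([
  { Name: "Bill Clinton", Index: 42 },
  { Name: "GeorgeW Bush", Index: 43 },
  { Name: "Barack Obama", Index: 44 }
]).pipe(sink);
```

Because each row is handled and then discarded, the full array of objects is
never held in memory.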
					
						
**Demo**

:::note Tested Deployments

This demo was tested in the following deployments:

| Node Version | Date       | Node Status when tested |
|:-------------|:-----------|:------------------------|
| `0.12.18`    | 2024-02-23 | End-of-Life             |
| `4.9.1`      | 2024-02-23 | End-of-Life             |
| `6.17.1`     | 2024-02-23 | End-of-Life             |
| `8.17.0`     | 2024-02-23 | End-of-Life             |
| `10.24.1`    | 2024-02-23 | End-of-Life             |
| `12.22.12`   | 2024-02-23 | End-of-Life             |
| `14.21.3`    | 2024-02-23 | End-of-Life             |
| `16.20.2`    | 2024-02-23 | End-of-Life             |
| `18.19.1`    | 2024-02-23 | Maintenance LTS         |
| `20.11.1`    | 2024-02-23 | Active LTS              |
| `21.6.2`     | 2024-02-23 | Current                 |

While streaming methods work in End-of-Life versions of NodeJS, production
deployments should upgrade to a Current or LTS version of NodeJS.

:::
					
						
1) Install the [NodeJS module](/docs/getting-started/installation/nodejs)

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz`}
</CodeBlock>

2) Download [`SheetJSNodeJStream.js`](pathname:///stream/SheetJSNodeJStream.js):

```bash
curl -LO https://docs.sheetjs.com/stream/SheetJSNodeJStream.js
```

3) Download [the test file](https://sheetjs.com/pres.xlsx):

```bash
curl -LO https://sheetjs.com/pres.xlsx
```

4) Run the script:

```bash
node SheetJSNodeJStream.js pres.xlsx
```
					
						
<details>
  <summary><b>Expected Output</b> (click to show)</summary>

The console will display a list of objects:

```json
{"Name":"Bill Clinton","Index":42}
{"Name":"GeorgeW Bush","Index":43}
{"Name":"Barack Obama","Index":44}
{"Name":"Donald Trump","Index":45}
{"Name":"Joseph Biden","Index":46}
```

The script will also generate `SheetJSNodeJStream.csv`:

```csv
Name,Index
Bill Clinton,42
GeorgeW Bush,43
Barack Obama,44
Donald Trump,45
Joseph Biden,46
```

</details>
					
						
### Browser

:::note Tested Deployments

Each browser demo was tested in the following environments:

| Browser     | Date       |
|:------------|:-----------|
| Chrome 121  | 2024-02-23 |
| Safari 17.3 | 2024-02-23 |

:::
					
						
NodeJS streaming APIs are not available in the browser.  The following function
supplies a pseudo stream object compatible with the `to_csv` function:

```js
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
  XLSX.stream.set_readable(() => ({
    __done: false,
    // this function will be assigned by the SheetJS stream methods
    _read: function() { this.__done = true; },
    // this function is called by the stream methods
    push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
    resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
  }));
  return XLSX.stream.to_csv(ws, opts);
}

// assuming `workbook` is a workbook, stream the first sheet
const ws = workbook.Sheets[workbook.SheetNames[0]];
const strm = sheet_to_csv_cb(ws, (csv)=>{ if(csv != null) console.log(csv); });
strm.resume();
```
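
To see how the pseudo stream is driven, the sketch below swaps the library for a
hypothetical producer, `fake_to_csv`. It follows the contract described in the
comments above: it assigns `_read` on the object returned by the factory and
calls `push` once per row, then `push(null)`. This is an illustration of the
batching behavior under that assumption, not the library's implementation.

```js
// factory for the pseudo stream object (same shape as in sheet_to_csv_cb)
function make_pseudo_stream(cb, batch) {
  return {
    __done: false,
    _read: function () { this.__done = true; }, // replaced by the producer
    push: function (d) { if (!this.__done) cb(d); if (d == null) this.__done = true; },
    resume: function pump() {
      // call _read up to `batch` times, then yield to the event loop
      for (var i = 0; i < batch && !this.__done; ++i) this._read();
      if (!this.__done) setTimeout(pump.bind(this), 0);
    }
  };
}

// Hypothetical producer: pushes one CSV row per _read call, then null
function fake_to_csv(rows, strm) {
  var R = 0;
  strm._read = function () { strm.push(R < rows.length ? rows[R++].join(",") : null); };
  return strm;
}

var out = [];
var rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]];
var strm = fake_to_csv(rows, make_pseudo_stream(function (csv) {
  if (csv != null) out.push(csv);
  else console.log(out.length + " rows: " + out.join(" | "));
}, 2)); // batch of 2 rows per turn of the event loop
strm.resume();
```

With `batch` set to 2, the first two rows are delivered synchronously inside
`resume` and the rest arrive on later turns of the event loop, which gives the
page a chance to stay responsive between batches.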
					
						
#### Web Workers

For processing large files in the browser, it is strongly encouraged to use Web
Workers. The [Worker demo](/docs/demos/bigdata/worker#streaming-write) includes
examples using the File System Access API.
					
						
<details>
  <summary><b>Web Worker Details</b> (click to show)</summary>

Typically, the file and stream processing occurs in the Web Worker.  CSV rows
can be sent back to the main thread in the callback:

<CodeBlock language="js" title="worker.js">{`\
/* load standalone script from CDN */
importScripts("https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js");
\n\
function sheet_to_csv_cb(ws, cb, opts, batch = 1000) {
  XLSX.stream.set_readable(() => ({
    __done: false,
    // this function will be assigned by the SheetJS stream methods
    _read: function() { this.__done = true; },
    // this function is called by the stream methods
    push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; },
    resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); }
  }));
  return XLSX.stream.to_csv(ws, opts);
}
\n\
/* this callback will run once the main context sends a message */
self.addEventListener('message', async(e) => {
  try {
    postMessage({state: "fetching " + e.data.url});
    /* Fetch file */
    const res = await fetch(e.data.url);
    const ab = await res.arrayBuffer();
\n\
    /* Parse file */
    postMessage({state: "parsing"});
    const wb = XLSX.read(ab, {dense: true});
    const ws = wb.Sheets[wb.SheetNames[0]];
\n\
    /* Generate CSV rows */
    postMessage({state: "csv"});
    const strm = sheet_to_csv_cb(ws, (csv) => {
      if(csv != null) postMessage({csv});
      else postMessage({state: "done"});
    });
    strm.resume();
  } catch(e) {
    /* Pass the error message back */
    postMessage({error: String(e.message || e) });
  }
}, false);`}
</CodeBlock>

The main thread will receive messages with CSV rows for further processing:

```js title="main.js"
worker.onmessage = function(e) {
  if(e.data.error) { console.error(e.data.error); /* show an error message */ }
  else if(e.data.state) { console.info(e.data.state); /* current state */ }
  else {
    /* e.data.csv is the row generated by the stream */
    console.log(e.data.csv);
  }
};
```

</details>
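
The `main.js` handler in the Web Worker details only logs each message. As one
possible way to collect results instead, the sketch below folds worker messages
into a plain object. `reduce_message` and the shape of `acc` are hypothetical;
the message fields (`error`, `state`, `csv`) are the ones posted by the worker
script. The messages are replayed by hand so the example runs without a worker.

```js
// Hypothetical reducer: fold one worker message into an accumulator
function reduce_message(acc, data) {
  if (data.error) acc.error = data.error;        // worker reported a failure
  else if (data.state) acc.state = data.state;   // progress update
  else if (data.csv != null) { acc.rows += 1; acc.chars += data.csv.length; }
  return acc;
}

var acc = { state: "", error: null, rows: 0, chars: 0 };

// in a page, this would be wired up as:
//   worker.onmessage = function(e) { reduce_message(acc, e.data); };
[
  { state: "parsing" },
  { state: "csv" },
  { csv: "Name,Index" },
  { csv: "Bill Clinton,42" },
  { state: "done" }
].forEach(function (msg) { reduce_message(acc, msg); });

console.log(acc.state, acc.rows, acc.chars); // done 2 25
```

Keeping only counters, rather than every row, avoids rebuilding the whole CSV
in the main thread when the goal is progress reporting.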
					
						
### Live Demo

The following live demo fetches and parses a file in a Web Worker.  The `to_csv`
streaming function is used to generate CSV rows and pass them back to the main
thread for further processing.

:::note pass

For Chromium browsers, the File System Access API provides a modern worker-only
approach. [The Web Workers demo](/docs/demos/bigdata/worker#streaming-write)
includes a live example of CSV streaming write.

:::

The demo has a URL input box.  Feel free to change the URL.  For example,

`https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls`
is an XLS file over 50 MB

`https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx`
is an XLSX file with 300000 rows (approximately 20 MB)
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | <CodeBlock language="jsx" live>{`\ | 
					
						
							|  |  |  | function SheetJSFetchCSVStreamWorker() { | 
					
						
							|  |  |  |   const [__html, setHTML] = React.useState(""); | 
					
						
							|  |  |  |   const [state, setState] = React.useState(""); | 
					
						
							|  |  |  |   const [cnt, setCnt] = React.useState(0); | 
					
						
							| 
									
										
										
										
											2023-06-05 20:12:53 +00:00
										 |  |  |   const [url, setUrl] = React.useState("https://docs.sheetjs.com/test_files/large_strings.xlsx"); | 
					
						
							| 
									
										
										
										
											2023-05-03 03:40:40 +00:00
										 |  |  | \n\ | 
					
						
							| 
									
										
										
										
											2023-05-30 06:41:09 +00:00
										 |  |  |   return ( <> | 
					
						
							|  |  |  |     <b>URL: </b><input type="text" value={url} onChange={(e) => setUrl(e.target.value)} size="80"/> | 
					
						
							|  |  |  |     <button onClick={() => { | 
					
						
							|  |  |  |       /* this mantra embeds the worker source in the function */ | 
					
						
							|  |  |  |       const worker = new Worker(URL.createObjectURL(new Blob([\`\\ | 
					
						
							|  |  |  | /* load standalone script from CDN */ | 
					
						
							|  |  |  | importScripts("https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js"); | 
					
						
							|  |  |  | \n\ | 
					
						
							|  |  |  | function sheet_to_csv_cb(ws, cb, opts, batch = 1000) { | 
					
						
							|  |  |  |   XLSX.stream.set_readable(() => ({ | 
					
						
							| 
									
										
										
										
										 |  |  |     __done: false, | 
					
						
							|  |  |  |     // this function will be assigned by the SheetJS stream methods | 
					
						
							|  |  |  |     _read: function() { this.__done = true; }, | 
					
						
							|  |  |  |     // this function is called by the stream methods | 
					
						
							| 
									
										
										
										
										 |  |  |     push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; }, | 
					
						
							| 
									
										
										
										
										 |  |  |     resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); } | 
					
						
							|  |  |  |   })); | 
					
						
							| 
									
										
										
										
										 |  |  |   return XLSX.stream.to_csv(ws, opts); | 
					
						
							| 
									
										
										
										
										 |  |  | } | 
					
						
							| 
									
										
										
										
										 |  |  | \n\ | 
					
						
							| 
									
										
										
										
										 |  |  | /* this callback will run once the main context sends a message */ | 
					
						
							|  |  |  | self.addEventListener('message', async(e) => { | 
					
						
							|  |  |  |   try { | 
					
						
							|  |  |  |     postMessage({state: "fetching " + e.data.url}); | 
					
						
							|  |  |  |     /* Fetch file */ | 
					
						
							|  |  |  |     const res = await fetch(e.data.url); | 
					
						
							|  |  |  |     const ab = await res.arrayBuffer(); | 
					
						
							|  |  |  | \n\ | 
					
						
							|  |  |  |     /* Parse file */ | 
					
						
							|  |  |  |     let len = ab.byteLength; | 
					
						
							|  |  |  |     if(len < 1024) len += " bytes"; else { len /= 1024; | 
					
						
							|  |  |  |       if(len < 1024) len += " KB"; else { len /= 1024; len += " MB"; } | 
					
						
							|  |  |  |     } | 
					
						
							|  |  |  |     postMessage({state: "parsing " + len}); | 
					
						
							|  |  |  |     const wb = XLSX.read(ab, {dense: true}); | 
					
						
							|  |  |  |     const ws = wb.Sheets[wb.SheetNames[0]]; | 
					
						
							|  |  |  | \n\ | 
					
						
							|  |  |  |     /* Generate CSV rows */ | 
					
						
							|  |  |  |     postMessage({state: "csv"}); | 
					
						
							|  |  |  |     const strm = sheet_to_csv_cb(ws, (csv) => { | 
					
						
							|  |  |  |       if(csv != null) postMessage({csv}); | 
					
						
							|  |  |  |       else postMessage({state: "done"}); | 
					
						
							|  |  |  |     }); | 
					
						
							|  |  |  |     strm.resume(); | 
					
						
							|  |  |  |   } catch(e) { | 
					
						
							|  |  |  |     /* Pass the error message back */ | 
					
						
							|  |  |  |     postMessage({error: String(e.message || e) }); | 
					
						
							|  |  |  |   } | 
					
						
							|  |  |  | }, false); | 
					
						
							|  |  |  |       \`]))); | 
					
						
							|  |  |  |       /* when the worker sends back data, add it to the DOM */ | 
					
						
							|  |  |  |       worker.onmessage = function(e) { | 
					
						
							|  |  |  |         if(e.data.error) return setHTML(e.data.error); | 
					
						
							|  |  |  |         else if(e.data.state) return setState(e.data.state); | 
					
						
							|  |  |  |         setHTML(e.data.csv); | 
					
						
							|  |  |  |         setCnt(cnt => cnt+1); | 
					
						
							|  |  |  |       }; | 
					
						
							|  |  |  |       setCnt(0); setState(""); | 
					
						
							|  |  |  |       /* post a message to the worker with the URL to fetch */ | 
					
						
							|  |  |  |       worker.postMessage({url}); | 
					
						
							|  |  |  |     }}><b>Click to Start</b></button> | 
					
						
							|  |  |  |     <pre>State: <b>{state}</b><br/>Number of rows: <b>{cnt}</b></pre> | 
					
						
							|  |  |  |     <pre dangerouslySetInnerHTML={{ __html }}/> | 
					
						
							|  |  |  |   </> ); | 
					
						
							|  |  |  | }`} | 
					
						
							|  |  |  | </CodeBlock> | 
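
The worker in this demo posts three kinds of messages: `{state}` for progress updates, `{csv}` for each generated row, and `{error}` on failure. As a rough sketch of how the main thread folds those messages into what is displayed, the following TypeScript mirrors the `onmessage` handler. The `WorkerMessage` and `View` types and the `applyMessage` helper are hypothetical names used only for illustration; they are not part of SheetJS, and the sketch checks for the presence of a key where the demo checks truthiness.

```ts
/* Message shapes posted by the worker in the demo (names are illustrative) */
type WorkerMessage = { state: string } | { csv: string } | { error: string };

/* Hypothetical view model mirroring the three React state variables */
interface View { html: string; state: string; cnt: number; }

/* Pure version of the onmessage logic: returns the next view */
function applyMessage(v: View, msg: WorkerMessage): View {
  /* errors replace the displayed text and leave the counter alone */
  if("error" in msg) return { ...v, html: msg.error };
  /* state updates only change the status line */
  if("state" in msg) return { ...v, state: msg.state };
  /* otherwise the message carries one CSV row: show it and count it */
  return { ...v, html: msg.csv, cnt: v.cnt + 1 };
}
```

Only the most recent row is kept, which matches the demo: the page shows the latest CSV row and a running row count instead of accumulating the full output in memory.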
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ### Deno
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Deno does not support NodeJS streams in normal execution, so a wrapper is used: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | <CodeBlock language="ts">{`\ | 
					
						
							|  |  |  | // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" | 
					
						
							|  |  |  | import { stream } from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; | 
					
						
							|  |  |  | \n\ | 
					
						
							| 
									
										
										
										
										 |  |  | /* Callback invoked on each row (string) and at the end (null) */ | 
					
						
							|  |  |  | const csv_cb = (d:string|null) => { | 
					
						
							|  |  |  |   if(d == null) return; | 
					
						
							|  |  |  |   /* The strings include line endings, so raw write ops should be used */ | 
					
						
							|  |  |  |   Deno.stdout.write(new TextEncoder().encode(d)); | 
					
						
							|  |  |  | }; | 
					
						
							| 
									
										
										
										
										 |  |  | \n\ | 
					
						
							| 
									
										
										
										
										 |  |  | /* Prepare \`Readable\` function */ | 
					
						
							|  |  |  | const Readable = () => ({ | 
					
						
							|  |  |  |   __done: false, | 
					
						
							|  |  |  |   // this function will be assigned by the SheetJS stream methods | 
					
						
							|  |  |  |   _read: function() { this.__done = true; }, | 
					
						
							|  |  |  |   // this function is called by the stream methods | 
					
						
							|  |  |  |   push: function(d: any) { | 
					
						
							|  |  |  |     if(!this.__done) csv_cb(d); | 
					
						
							|  |  |  |     if(d == null) this.__done = true; | 
					
						
							|  |  |  |   }, | 
					
						
							|  |  |  |   resume: function pump() { | 
					
						
							|  |  |  |     for(var i = 0; i < 1000 && !this.__done; ++i) this._read(); | 
					
						
							|  |  |  |     if(!this.__done) setTimeout(pump.bind(this), 0); | 
					
						
							|  |  |  |   } | 
					
						
							|  |  |  | }) | 
					
						
							|  |  |  | /* Wire up */ | 
					
						
							|  |  |  | stream.set_readable(Readable); | 
					
						
							| 
									
										
										
										
										 |  |  | \n\ | 
					
						
							| 
									
										
										
										
										 |  |  | /* assuming \`workbook\` is a workbook, stream the first sheet */ | 
					
						
							|  |  |  | const ws = workbook.Sheets[workbook.SheetNames[0]]; | 
					
						
							|  |  |  | stream.to_csv(ws).resume();`} | 
					
						
							| 
									
										
										
										
										 |  |  | </CodeBlock> | 
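
The batching in `resume` can be tried without a workbook. The sketch below rebuilds the same object shape around a stand-in producer. `makeReadable`, `fakeCsvStream` and the injectable `schedule` parameter are hypothetical helpers for illustration, not SheetJS APIs; with the real library, `stream.to_csv` is what assigns `_read`.

```ts
type Chunk = string | null;
type Schedule = (fn: () => void) => void;

interface MiniReadable {
  __done: boolean;
  _read: () => void;
  push: (d: Chunk) => void;
  resume: () => void;
}

/* Same shape as the wrapper above, with the callback, batch size and scheduler as parameters */
function makeReadable(
  cb: (d: Chunk) => void,
  batch = 1000,
  schedule: Schedule = (fn) => { setTimeout(fn, 0); }
): MiniReadable {
  const r: MiniReadable = {
    __done: false,
    /* placeholder: the producer replaces this */
    _read() { r.__done = true; },
    /* forward each chunk; a null chunk marks the end of the stream */
    push(d) { if(!r.__done) cb(d); if(d == null) r.__done = true; },
    /* pull up to `batch` chunks, then yield before pulling more */
    resume() {
      for(let i = 0; i < batch && !r.__done; ++i) r._read();
      if(!r.__done) schedule(() => r.resume());
    }
  };
  return r;
}

/* Stand-in for a stream producer: one row per _read call, then null */
function fakeCsvStream(rows: string[], r: MiniReadable): MiniReadable {
  let i = 0;
  r._read = () => { r.push(i < rows.length ? rows[i++] + "\n" : null); };
  return r;
}
```

Passing a scheduler that queues callbacks makes the batches easy to step through in a test. The default defers to `setTimeout`, as in the wrapper above, so other work can run between batches.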
					
						
							| 
									
										
										
										
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
										 |  |  | :::note Tested Deployments | 
					
						
							| 
									
										
										
										
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
										 |  |  | This demo was last tested on 2024-02-23 against Deno `1.41.0`. | 
					
						
							| 
									
										
										
										
										 |  |  | 
 | 
					
						
							|  |  |  | ::: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | [`SheetJSDenoStream.ts`](pathname:///stream/SheetJSDenoStream.ts) is a small | 
					
						
							| 
									
										
										
										
										 |  |  | example script that downloads https://sheetjs.com/pres.numbers and prints | 
					
						
							| 
									
										
										
										
										 |  |  | CSV rows. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
										 |  |  | 1) Run the script: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ```bash | 
					
						
							|  |  |  | deno run -A https://docs.sheetjs.com/stream/SheetJSDenoStream.ts | 
					
						
							|  |  |  | ``` | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This script will fetch [`pres.numbers`](https://sheetjs.com/pres.numbers) and | 
					
						
							|  |  |  | generate CSV rows. The result will be printed to the terminal window. |