---
title: Synthetic DOM
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

`table_to_book` / `table_to_sheet` / `sheet_add_dom` act on HTML DOM elements.
Traditionally there is no DOM in server-side environments.

:::note

The most robust approach for server-side processing is to automate a headless
web browser. ["Browser Automation"](/docs/demos/net/headless) includes demos.

:::

This demo covers synthetic DOM implementations for non-browser platforms.
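
As a rough sketch of how the three methods relate: `table_to_book` returns a
whole workbook, `table_to_sheet` returns a single worksheet, and `sheet_add_dom`
writes a table into an existing worksheet. The helper below is hypothetical: the
name `stack_tables` and the injected `XLSX` parameter are assumptions made for
illustration, not part of the library. It also assumes that every entry in
`tables` is a TABLE element from a compatible DOM implementation and that the
`origin: -1` option appends below the existing rows.

```js
/* Hypothetical helper: stack several TABLE elements in one worksheet.
   `XLSX` is the SheetJS module; `tables` is an array of TABLE elements. */
function stack_tables(XLSX, tables) {
  if (tables.length === 0) throw new Error("expected at least one table");
  const [first, ...rest] = tables;
  /* create a worksheet from the first table */
  const worksheet = XLSX.utils.table_to_sheet(first);
  /* append each remaining table below the rows already in the worksheet */
  rest.forEach(tbl => XLSX.utils.sheet_add_dom(worksheet, tbl, { origin: -1 }));
  return worksheet;
}
```

Passing the module as a parameter keeps the sketch independent of how SheetJS
is loaded on a given platform.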
					
						
## NodeJS

### JSDOM

JSDOM is a DOM implementation for NodeJS. Given an HTML string, a reference to
the table element works directly with the SheetJS DOM methods:

```js
const XLSX = require("xlsx");
const { JSDOM } = require("jsdom");

/* parse HTML */
const dom = new JSDOM(html_string);
/* get first TABLE element */
const tbl = dom.window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
```
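
When the HTML holds more than one table, one possible extension is to give each
table its own worksheet. The sketch below is hypothetical: `tables_to_book`, the
injected `XLSX` parameter and the `Table1`, `Table2`, ... sheet names are
assumptions chosen for illustration. It relies only on `querySelectorAll` from
the document and on the `book_new`, `table_to_sheet` and `book_append_sheet`
utilities.

```js
/* Hypothetical helper: convert every TABLE in a document to its own worksheet.
   `XLSX` is the SheetJS module; `document` is a DOM document. */
function tables_to_book(XLSX, document) {
  const workbook = XLSX.utils.book_new();
  const tables = Array.from(document.querySelectorAll("table"));
  tables.forEach((tbl, idx) => {
    /* one worksheet per table, named Table1, Table2, ... */
    const worksheet = XLSX.utils.table_to_sheet(tbl);
    XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + (idx + 1));
  });
  return workbook;
}
```

With JSDOM, the call would look like `tables_to_book(XLSX, dom.window.document)`,
and the returned workbook can be passed to `XLSX.writeFile`.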
					
						
<details><summary><b>Complete Demo</b> (click to show)</summary>

:::note

This demo was last tested on 2023 May 18 against JSDOM `22.0.0`.

:::

1) Install SheetJS and JSDOM libraries:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz jsdom@22.0.0`}
</CodeBlock>

2) Save the following script to `SheetJSDOM.js`:

```js title="SheetJSDOM.js"
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string.  This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const tbl = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
```

3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
```

4) Run the script:

```bash
node SheetJSDOM.js
```

The script will create `SheetJSDOM.xlsx`, which can be opened in a spreadsheet
editor.

</details>
					
						
### XMLDOM

XMLDOM provides a DOM framework for NodeJS. Given an HTML string, a reference to
the table element works with the SheetJS DOM methods after patching the object.
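
The patch mainly supplies the `rows` and `cells` collections that browser table
elements provide. The sketch below is one possible variation, not the patch used
in the complete demo: `patch_table` is a hypothetical name, it collects both
`td` and `th` children in document order, and it assumes that `childNodes` is
array-like and that element nodes report `nodeType` 1. It does not cover
`innerHTML`, which may also need a shim.

```js
/* Hypothetical helper: attach `rows` and `cells` arrays to a table element.
   Assumes `tbl` supports getElementsByTagName and each row exposes childNodes. */
function patch_table(tbl) {
  tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
  tbl.rows.forEach(row => {
    /* keep element children (nodeType 1) named TD or TH, in document order */
    row.cells = Array.from(row.childNodes).filter(node =>
      node.nodeType === 1 && /^t[dh]$/i.test(node.tagName)
    );
  });
  return tbl;
}
```

Filtering direct children, instead of searching the whole subtree, keeps cells
of nested tables out of the outer row.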
					
						
<details><summary><b>Complete Demo</b> (click to show)</summary>

:::note

This demo was last tested on 2023 May 18 against XMLDOM `0.8.7`.

:::

1) Install SheetJS and XMLDOM libraries:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz @xmldom/xmldom@0.8.7`}
</CodeBlock>

2) Save the following script to `SheetJSXMLDOM.js`:

```js title="SheetJSXMLDOM.js"
const XLSX = require("xlsx");
const { DOMParser, XMLSerializer } = require("@xmldom/xmldom");

(async() => {
const text = await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text();
const doc = new DOMParser().parseFromString( text, "text/html");
const tbl = doc.getElementsByTagName("table")[0];

/* patch XMLDOM */
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
	var outerHTML = new XMLSerializer().serializeToString(this);
	if(outerHTML.match(/</g).length == 1) return "";
	return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});

const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSXMLDOM.xlsx");
})();
```

3) Run the script:

```bash
node SheetJSXMLDOM.js
```

The script will create `SheetJSXMLDOM.xlsx`, which can be opened in a
spreadsheet editor.

</details>
					
						
### CheerioJS

:::caution

Cheerio does not support a number of fundamental properties out of the box. They
can be shimmed, but it is strongly recommended to use a more compliant library.

:::

CheerioJS provides a DOM-like framework for NodeJS. Given an HTML string, a
reference to the table element works with the SheetJS DOM methods with some
prototype fixes. [`SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js) is a
complete script.

<details><summary><b>Complete Demo</b> (click to show)</summary>

:::note

This demo was last tested on 2023 May 18 against Cheerio `1.0.0-rc.12`

:::

1) Install SheetJS and CheerioJS libraries:

<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz cheerio@1.0.0-rc.12`}
</CodeBlock>

2) Download [the sample script `SheetJSCheerio.js`](pathname:///dom/SheetJSCheerio.js):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSCheerio.js
```

3) Download [the sample `SheetJSTable.html`](pathname:///dom/SheetJSTable.html):

```bash
curl -LO https://docs.sheetjs.com/dom/SheetJSTable.html
```

4) Run the script:

```bash
node SheetJSCheerio.js
```

The script will create a file `SheetJSCheerio.xlsx` that can be opened.

</details>
					
						
## Other Platforms

### DenoDOM

DenoDOM provides a DOM framework for Deno. Given an HTML string, a reference to
the table element works with the SheetJS DOM methods after patching the object.

This example fetches [a sample table](pathname:///dom/SheetJSTable.html):

```ts title="SheetJSDenoDOM.ts"
// @deno-types="https://cdn.sheetjs.com/xlsx-0.19.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.19.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.38/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
  await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(),
  "text/html",
)!;
// highlight-start
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"));

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
// highlight-end
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
```
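
The `!` after the `parseFromString` call suggests that the parser result can be
`null`, and `querySelector` returns `null` when no element matches. A small
guard can turn both cases into readable errors before the table is patched. The
sketch below is hypothetical: `first_table` and its error messages are
assumptions for illustration, and the helper only relies on the `querySelector`
method of whatever document object it receives.

```js
/* Hypothetical helper: return the first TABLE element or fail with a clear error.
   `doc` is any parsed document object that exposes querySelector. */
function first_table(doc) {
  if (doc == null) throw new Error("HTML could not be parsed");
  const tbl = doc.querySelector("table");
  if (tbl == null) throw new Error("no TABLE element found");
  return tbl;
}
```

In the example above, `first_table(doc)` could stand in for the direct
`doc.querySelector("table")` call.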
					
						
<details open><summary><b>Complete Demo</b> (click to hide)</summary>

:::note

This demo was last tested on 2023 May 18 against DenoDOM `0.1.38`.

:::

1) Save the previous codeblock to `SheetJSDenoDOM.ts`.

2) Run the script with the `--allow-net` and `--allow-write` permissions:

```bash
deno run --allow-net --allow-write SheetJSDenoDOM.ts
```

The script will create `SheetJSDenoDOM.xlsx`, which can be opened in a
spreadsheet editor.

</details>