---
sidebar_position: 3
hide_table_of_contents: true
title: Reading Files
---
					
						
**`XLSX.read(data, options)`**

`read` attempts to parse `data` and return [a workbook object](/docs/csf/book).

The [`type`](#input-type) property of the `options` object determines how
`data` is interpreted. For string data, the default interpretation is Base64.

**`XLSX.readFile(filename, options)`**

`readFile` attempts to read a local file with the specified `filename`.

:::caution pass

This method only works in specific environments. It does not work in browsers!

The [NodeJS installation note](/docs/getting-started/installation/nodejs#usage)
includes additional instructions for non-standard use cases.

:::
					
						
## Parsing Options

The read functions accept an options argument:

| Option Name | Default | Description                                          |
| :---------- | ------: | :--------------------------------------------------- |
|`type`       |         | Input data encoding (see Input Type below)           |
|`raw`        | false   | If true, plain text parsing will not parse values ** |
|`dense`      | false   | If true, use a dense worksheet representation **     |
|`codepage`   |         | If specified, use code page when appropriate **      |
|`cellFormula`| true    | Save formulae to the `.f` field                      |
|`cellHTML`   | true    | Parse rich text and save HTML to the `.h` field      |
|`cellNF`     | false   | Save number format string to the `.z` field          |
|`cellStyles` | false   | Save style/theme info to the `.s` field              |
|`cellText`   | true    | Generate formatted text and save to the `.w` field   |
|`cellDates`  | false   | Store dates as type `d` (default is `n`)             |
|`dateNF`     |         | If specified, use the string for date code 14 **     |
|`sheetStubs` | false   | Create cell objects of type `z` for stub cells       |
|`sheetRows`  | 0       | If >0, read the first `sheetRows` rows **            |
|`bookDeps`   | false   | If true, parse calculation chains                    |
|`bookFiles`  | false   | If true, add raw files to book object **             |
|`bookProps`  | false   | If true, only parse enough to get book metadata **   |
|`bookSheets` | false   | If true, only parse enough to get the sheet names    |
|`bookVBA`    | false   | If true, copy VBA blob to `vbaraw` field **          |
|`password`   | ""      | If defined and file is encrypted, use password **    |
|`WTF`        | false   | If true, throw errors on unexpected file features ** |
|`sheets`     |         | If specified, only parse specified sheets **         |
|`PRN`        | false   | If true, allow parsing of PRN files **               |
|`xlfn`       | false   | If true, preserve `_xlfn.` prefixes in formulae **   |
|`FS`         |         | DSV Field Separator override                         |
|`UTC`        | true    | If explicitly false, parse text dates in local time  |
					
						
- Even if `cellNF` is false, formatted text will be generated and saved to `.w`.
- In some cases, sheets may be parsed even if `bookSheets` is false.
- Excel aggressively tries to interpret values from CSV and other plain text.
  This leads to surprising behavior! The `raw` option suppresses value parsing.
- `bookSheets` and `bookProps` combine to give both sets of information.
- `Deps` will be an empty object if `bookDeps` is false.
- `bookFiles` behavior depends on file type:
    * `keys` array (paths in the ZIP) for ZIP-based formats
    * `files` hash (mapping paths to objects representing the files) for
      ZIP-based formats
    * `cfb` object for formats using CFB containers
- `sheetRows-1` rows will be generated when looking at the JSON object output
  (since the header row is counted as a row when parsing the data).
- By default all worksheets are parsed.  `sheets` restricts based on input type:
    * number: zero-based index of worksheet to parse (`0` is first worksheet)
    * string: name of worksheet to parse (case insensitive)
    * array of numbers and strings to select multiple worksheets
- `bookVBA` merely exposes the raw VBA CFB object.  It does not parse the data.
  XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
  the VBA entries alongside the core Workbook entry, so the library generates a
  new blob from the XLS CFB container that works in XLSM and XLSB files.
- `codepage` is applied to BIFF2 - BIFF5 files without `CodePage` records and to
  CSV files without BOM in `type:"binary"`.  BIFF8 XLS always defaults to 1200.
- `PRN` affects parsing of text files without a common delimiter character.
- Currently only XOR encryption is supported.  An error will be thrown for
  files employing other encryption methods.
- Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
  user. SheetJS will strip `_xlfn.` normally. The `xlfn` option preserves them.
- `WTF` is mainly for development.  By default, the parser will suppress read
  errors on single worksheets, allowing you to read from the worksheets that do
  parse properly. Setting `WTF:true` forces those errors to be thrown.
- By default, "sparse" mode worksheets are generated. Individual cells are
  accessed by indexing the worksheet object with an A1-Style address.  "dense"
  worksheets store cells in an array of arrays at `sheet["!data"]`.
- `UTC` applies to CSV, Text and HTML formats.  When explicitly set to `false`,
  the parsers will assume the files are specified in local time. By default, as
  is the case for other file formats, dates and times are interpreted in UTC.

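
To illustrate the sparse and dense layouts described in the notes above, the
following sketch reads the same cell from two hand-written worksheet-like
objects. The objects are stand-ins rather than parser output, and `col_name`
and `get_cell` are hypothetical helpers written for this example. The sketch
assumes dense rows are indexed by zero-based row and then zero-based column.

```js
/* hand-written stand-ins; real worksheets come from XLSX.read */
var sparse = {
  "!ref": "A1:A2",
  "A1": { t: "s", v: "Name" },
  "A2": { t: "n", v: 42 }
};
var dense = {
  "!ref": "A1:A2",
  "!data": [
    [ { t: "s", v: "Name" } ], // row 0 holds A1
    [ { t: "n", v: 42 } ]      // row 1 holds A2
  ]
};

/* hypothetical helper: zero-based column index to letters (0 is "A", 26 is "AA") */
/* the library ships XLSX.utils.encode_cell; this copy keeps the sketch standalone */
function col_name(c) {
  var s = "";
  for (++c; c > 0; c = Math.floor((c - 1) / 26)) {
    s = String.fromCharCode(((c - 1) % 26) + 65) + s;
  }
  return s;
}

/* hypothetical helper: fetch a cell by zero-based row and column in either layout */
function get_cell(ws, r, c) {
  /* dense: array of arrays at ws["!data"]; missing rows yield undefined */
  if (ws["!data"] != null) return (ws["!data"][r] || [])[c];
  /* sparse: index the worksheet object with the A1-style address */
  return ws[col_name(c) + (r + 1)];
}

console.log(get_cell(sparse, 1, 0).v); // 42
console.log(get_cell(dense, 1, 0).v);  // 42
```

Code that must handle both layouts can branch on the presence of `"!data"`, as
the helper does here.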
					
						
### Input Type

Strings can be interpreted in multiple ways.  The `type` parameter for `read`
tells the library how to parse the data argument:

| `type`     | expected input                                                  |
|:-----------|:----------------------------------------------------------------|
| `"base64"` | string: Base64 encoding of the file                             |
| `"binary"` | string: binary string (byte `n` is `data.charCodeAt(n)`)        |
| `"string"` | string: JS string (only appropriate for UTF-8 text formats)     |
| `"buffer"` | NodeJS `Buffer`                                                 |
| `"array"`  | array: array of 8-bit unsigned integers (byte `n` is `data[n]`) |
| `"file"`   | string: path of file that will be read (NodeJS only)            |

Some common types are automatically deduced from the data input type, including
NodeJS `Buffer` objects, `Uint8Array` and `ArrayBuffer` objects, and arrays of
numbers.

When a JS `string` is passed with no `type`, the library assumes the data is a
Base64 string. Data from `FileReader#readAsBinaryString` and ASCII data require
the `"binary"` type. DOM strings, including the result of
`FileReader#readAsText`, should use type `"string"`.

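
As a rough illustration of the encodings in the table, the sketch below uses
the NodeJS `Buffer` API to derive Base64, binary string and byte array forms of
the same four bytes. The sample bytes and variable names are made up for this
example, and each comment names the `type` value that describes the
representation.

```js
/* four sample bytes (the ZIP signature "PK\x03\x04"), chosen for illustration */
var bytes = [0x50, 0x4B, 0x03, 0x04];

var buf  = Buffer.from(bytes);      // type: "buffer" (NodeJS Buffer)
var b64  = buf.toString("base64");  // type: "base64"
var bstr = buf.toString("binary");  // type: "binary"
var u8   = new Uint8Array(bytes);   // type: "array"

/* byte n is bstr.charCodeAt(n) for binary strings and u8[n] for arrays */
for (var n = 0; n < bytes.length; ++n) {
  if (bstr.charCodeAt(n) !== u8[n]) throw new Error("mismatch at byte " + n);
}

console.log(b64); // UEsDBA==
```

A string like `b64` is what the library expects when no `type` is given, while
`bstr` would need an explicit `"binary"` type.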
					
						
### Guessing File Type

<details>
  <summary><b>Implementation Details</b> (click to show)</summary>

Excel and other spreadsheet tools read the first few bytes and apply other
heuristics to determine a file type.  This enables file type punning: renaming
files with the `.xls` extension will tell your computer to use Excel to open the
file, but Excel will know how to handle it.  This library applies similar logic:

| Byte 0 | Raw File Type | Spreadsheet Types                                   |
|:-------|:--------------|:----------------------------------------------------|
| `0xD0` | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR   |
| `0x09` | BIFF Stream   | BIFF 2/3/4/5                                        |
| `0x3C` | XML/HTML      | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x50` | ZIP Archive   | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text    |
| `0x49` | Plain Text    | SYLK or plain text                                  |
| `0x54` | Plain Text    | DIF or plain text                                   |
| `0xEF` | UTF-8 Text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0xFF` | UTF-16 Text   | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x00` | Record Stream | Lotus WK\* or Quattro Pro or plain text             |
| `0x7B` | Plain Text    | RTF or plain text                                   |
| `0x0A` | Plain Text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x0D` | Plain Text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x20` | Plain Text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |

DBF files are detected based on the first byte as well as the third and fourth
bytes (corresponding to month and day of the file date).

Works for Windows files are detected based on the `BOF` record with type `0xFF`.

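
The first-byte lookup in the table above can be sketched as a simple map. This
is not the library's implementation: `RAW_TYPES` and `guess_raw_type` are
hypothetical names, the helper only reports the "Raw File Type" column, and it
skips the follow-up heuristics (including the DBF and Works checks). The
`"unknown"` result is a placeholder of this sketch.

```js
/* byte 0 mapped to the raw file type, copied from the table */
var RAW_TYPES = {
  0xD0: "CFB Container",
  0x09: "BIFF Stream",
  0x3C: "XML/HTML",
  0x50: "ZIP Archive",
  0x49: "Plain Text",
  0x54: "Plain Text",
  0xEF: "UTF-8 Text",
  0xFF: "UTF-16 Text",
  0x00: "Record Stream",
  0x7B: "Plain Text",
  0x0A: "Plain Text",
  0x0D: "Plain Text",
  0x20: "Plain Text"
};

/* hypothetical helper: data is an array-like of bytes (Array, Uint8Array, Buffer) */
function guess_raw_type(data) {
  if (!data || data.length === 0) return "unknown";
  /* bytes outside the table fall through to the placeholder */
  return RAW_TYPES[data[0]] || "unknown";
}

console.log(guess_raw_type([0x50, 0x4B, 0x03, 0x04])); // ZIP Archive
console.log(guess_raw_type(Buffer.from("<table>")));   // XML/HTML
```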
					
						
Plain text format guessing follows the priority order:

| Format | Test                                                                |
|:-------|:--------------------------------------------------------------------|
| XML    | `<?xml` appears in the first 1024 characters                        |
| HTML   | starts with `<` and HTML tags appear in the first 1024 characters * |
| XML    | starts with `<` and the first tag is valid                          |
| RTF    | starts with `{\rt`                                                  |
| DSV    | starts with `/sep=.$/`, separator is the specified character        |
| DSV    | more unquoted `\|` chars than `;`, `\t` or `,` in the first 1024    |
| DSV    | more unquoted `;` chars than `\t` or `,` in the first 1024          |
| TSV    | more unquoted `\t` chars than `,` chars in the first 1024           |
| CSV    | one of the first 1024 characters is a comma `","`                   |
| ETH    | starts with `socialcalc:version:`                                   |
| PRN    | `PRN` option is set to true                                         |
| CSV    | (fallback)                                                          |

- HTML tags include: `html`, `table`, `head`, `meta`, `script`, `style`, `div`

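
The delimiter rows of the priority table can be approximated as follows. This
is a simplified sketch rather than the library's code: `guess_delimiter` is a
hypothetical helper, it treats only double quotes as quoting characters, it
reads "more than" as exceeding each of the other counts, and it covers just the
`sep=` line and the counting rules.

```js
/* hypothetical helper: guess the field separator of a delimited text string */
function guess_delimiter(str) {
  /* an explicit first line such as "sep=;" names the separator */
  var m = str.match(/^sep=(.)(?:\r?\n|$)/);
  if (m) return m[1];

  /* count candidate characters outside double quotes in the first 1024 chars */
  var counts = { "|": 0, ";": 0, "\t": 0, ",": 0 };
  var quoted = false;
  var end = Math.min(str.length, 1024);
  for (var i = 0; i < end; ++i) {
    var ch = str.charAt(i);
    if (ch === '"') quoted = !quoted;
    else if (!quoted && ch in counts) ++counts[ch];
  }

  /* priority order: pipe, then semicolon, then tab, otherwise comma */
  if (counts["|"] > counts[";"] && counts["|"] > counts["\t"] &&
      counts["|"] > counts[","]) return "|";
  if (counts[";"] > counts["\t"] && counts[";"] > counts[","]) return ";";
  if (counts["\t"] > counts[","]) return "\t";
  return ",";
}

console.log(JSON.stringify(guess_delimiter("a;b;c\n1;2;3")));     // ";"
console.log(JSON.stringify(guess_delimiter('a\tb\t"x,y,z,w"')));  // "\t"
```

In the second call the commas sit inside quotes, so only the two tabs are
counted and the text is treated as tab-separated.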
					
						
</details>
					
						
<details open>
  <summary><b>Why are random text files valid?</b> (click to hide)</summary>

Excel is extremely aggressive in reading files.  Adding an XLS extension to any
display text file (where the only characters are ANSI display chars) tricks
Excel into thinking that the file is potentially a CSV or TSV file, even if it
is only one column!  This library attempts to replicate that behavior.

The best approach is to validate the desired worksheet and ensure it has the
expected number of rows or columns.  Extracting the range is extremely simple:

```js
/* decode the worksheet range into start (s) and end (e) cell addresses */
var range = XLSX.utils.decode_range(worksheet['!ref']);
/* column and row counts are inclusive of both corners */
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;
```
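
As a sketch of the validation step, the hypothetical helper below checks a
decoded range against minimum row and column counts. It assumes the range has
the shape used in the snippet above (`s` and `e` corners with zero-based `r`
and `c` indices), and the thresholds are placeholders.

```js
/* hypothetical helper: range looks like {s:{r:0,c:0}, e:{r:9,c:2}} */
function has_expected_shape(range, min_rows, min_cols) {
  var nrows = range.e.r - range.s.r + 1;
  var ncols = range.e.c - range.s.c + 1;
  return nrows >= min_rows && ncols >= min_cols;
}

/* a one-column sheet (A1:A10) fails a check that expects at least 3 columns */
var one_column = { s: { r: 0, c: 0 }, e: { r: 9, c: 0 } };
console.log(has_expected_shape(one_column, 2, 3)); // false
console.log(has_expected_shape(one_column, 2, 1)); // true
```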
					
						
</details>