---
title: Reading Files
sidebar_position: 3
hide_table_of_contents: true
---

The main SheetJS method for reading files is `read`. It expects developers to
supply the actual data in a [supported representation](#input-type).

The `readFile` helper method accepts a filename and tries to read the specified
file using standard APIs. *It does not work in web browsers!*

**Parse file data and generate a SheetJS workbook object**

```js
var wb = XLSX.read(data, opts);
```

`read` attempts to parse `data` and return [a workbook object](/docs/csf/book).

The [`type`](#input-type) property of the `opts` object controls how `data` is
interpreted. For string data, the default interpretation is Base64.

**Read a specified file and generate a SheetJS workbook object**

```js
var wb = XLSX.readFile(filename, opts);
```

`readFile` attempts to read a local file with the specified `filename`.

:::caution pass

`readFile` only works on specific platforms. **It does not support web browsers!**

The [NodeJS installation note](/docs/getting-started/installation/nodejs#usage)
includes additional instructions for non-standard use cases.

:::
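
When `readFile` is not available, one option is to read the bytes separately and
pass them to `read`. The following is a minimal sketch for NodeJS, not an
official recipe. It assumes that `XLSX` is the loaded SheetJS library object;
`readWorkbookFromDisk` is a hypothetical helper name.

```js
const fs = require("fs");

// Hypothetical helper: `XLSX` is assumed to be the loaded SheetJS library object
function readWorkbookFromDisk(XLSX, filename, opts) {
  // Read the raw file bytes into a NodeJS Buffer
  const buf = fs.readFileSync(filename);
  // Buffer inputs are detected automatically, so `type` can be omitted
  return XLSX.read(buf, opts);
}
```

In a script, this could be called as `readWorkbookFromDisk(XLSX, "test.xlsx")`.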

:::tip pass

The SheetJS file format import codecs focus on raw data. Not all codecs support
all features. Features not described in the documentation may not be extracted.

[SheetJS Pro](https://sheetjs.com/pro) offers support for additional features,
including styling, images, graphs, and PivotTables.

:::

## Parsing Options

The read functions accept an options argument:

| Option Name   | Default | Description                                        |
|:--------------|:--------|:---------------------------------------------------|
| `type`        |         | [Input data representation](#input-type)           |
| `raw`         | `false` | Disable [value parsing in plaintext formats](#raw) |
| `dense`       | `false` | If true, [generate dense worksheets](#dense)       |
| `codepage`    |         | Use specified [code page encoding](#codepage)      |
| `cellFormula` | `true`  | Save [formulae to the `f` field](#formulae)        |
| `cellHTML`    | `true`  | Parse text and [save HTML to the `h` field](#html) |
| `cellNF`      | `false` | Save [number format to the `z` field](#text)       |
| `cellStyles`  | `false` | Save [style/theme info to the `s` field](#style)   |
| `cellText`    | `true`  | Save [formatted text to the `w` field](#text)      |
| `cellDates`   | `false` | [Generate proper date (type `d`) cells](#dates)    |
| `dateNF`      |         | If specified, [override date code 14](#dates)      |
| `sheetStubs`  | `false` | [Create cells of type `z` for stub cells](#stubs)  |
| `sheetRows`   | `0`     | If >0, read the [specified number of rows](#range) |
| `bookDeps`    | `false` | If true, parse calculation chains                  |
| `bookFiles`   | `false` | Add [raw files](#files) to book object             |
| `bookProps`   | `false` | If true, [only parse book metadata](#metadata)     |
| `bookSheets`  | `false` | If true, [only parse sheet names](#metadata)       |
| `bookVBA`     | `false` | If true, generate [VBA blob](#vba)                 |
| `password`    | `""`    | If specified, [decrypt workbook](#password)        |
| `WTF`         | `false` | [Do not suppress worksheet parsing errors](#wtf)   |
| `sheets`      |         | Only parse [specified sheets](#sheets)             |
| `nodim`       | `false` | If true, calculate [worksheet ranges](#range)      |
| `PRN`         | `false` | If true, [allow parsing of PRN files](#prn)        |
| `xlfn`        | `false` | Use [raw formula function names](#formulae)        |
| `FS`          |         | [DSV Field Separator override](#dsv)               |
| `UTC`         | `true`  | Parse [text dates and times using UTC](#tz)        |

### Cell-Level Options

#### Dates

By default, for consistency with spreadsheet applications, date cells are stored
as numeric cells (type `n`) with special number formats. If `cellDates` is
enabled, date codes are converted to proper Date objects.

Excel file formats (including XLSX, XLSB, and XLS) support a locale-specific
date format, typically stored as date code 14 or the string `m/d/yy`. The
formatted text for some cells will change based on the computer locale. SheetJS
parsers use the `en-US` form by default. If the `dateNF` option is set, that
number format string will be used.

["Dates and Times"](/docs/csf/features/dates) covers features in more detail.

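
To make the idea of a "date code" concrete, the following sketch converts a
numeric date code to a JavaScript `Date`. It is an illustration, not the
library's implementation. It assumes the 1900 date system, codes of 61 or more
(March 1, 1900 onwards), and UTC interpretation; `dateCodeToUTCDate` is a
hypothetical helper name.

```js
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Day 0 of the 1900 date system (assumption: workbook does not use the 1904 system)
const EPOCH_1900 = Date.UTC(1899, 11, 30);

// Hypothetical helper: the integer part counts days, the fraction is the time of day
function dateCodeToUTCDate(code) {
  return new Date(EPOCH_1900 + Math.round(code * MS_PER_DAY));
}

console.log(dateCodeToUTCDate(25569).toISOString());   // 1970-01-01T00:00:00.000Z
console.log(dateCodeToUTCDate(44927.5).toISOString()); // 2023-01-01T12:00:00.000Z
```

When `cellDates` is enabled, the parsers perform this kind of conversion
automatically and store the result in a type `d` cell.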

#### Formulae

For some file formats, the `cellFormula` option must be explicitly enabled to
ensure that formulae are extracted.

Newer Excel functions are serialized with the `_xlfn.` prefix, hidden from the
user. By default, the file parsers will strip `_xlfn.` and similar prefixes.
If the `xlfn` option is enabled, the prefixes will be preserved.

[The "Formulae" docs](/docs/csf/features/formulae#prefixed-future-functions)
cover this in more detail.
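
The effect of the default prefix handling can be pictured with a small sketch.
This is not the parser's actual code: `stripFuturePrefix` is a hypothetical
helper that only handles the `_xlfn.` prefix and does not skip string literals.

```js
// Hypothetical helper: remove every `_xlfn.` prefix from a formula string
function stripFuturePrefix(formula) {
  return formula.replace(/_xlfn\./g, "");
}

// Stored form in the file vs. the form shown to users
console.log(stripFuturePrefix("_xlfn.XLOOKUP(A2,B:B,C:C)")); // XLOOKUP(A2,B:B,C:C)
```

Passing `{ xlfn: true }` in the options skips this step, so the stored form is
what appears in the `f` field.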

#### Formatted Text {#text}

Many common spreadsheet formats (including XLSX, XLSB, and XLS) store numeric
values and number formats. Applications are expected to use the number formats
to display currency strings, dates, and other values.

Under the hood, parsers use the [SSF Number Formatter](/docs/constellation/ssf)
library to generate formatted text.

By default, formatted text is generated. If the `cellText` option is false,
formatted text will not be written.

By default, cell number formats are not preserved. If the `cellNF` option is
enabled, number format strings will be saved to the `z` field of cell objects.

["Number Formats"](/docs/csf/features/nf) covers the features in more detail.

:::note pass

Even if `cellNF` is false, formatted text will be generated and saved to `w`.

:::

#### Text and Cell Styling {#style}

By default, SheetJS CE parsers focus on data extraction.

If the `cellStyles` option is `true`, other styling metadata including
[row](/docs/csf/features/rowprops) and [column](/docs/csf/features/colprops)
properties will be parsed.

:::tip pass

[SheetJS Pro](https://sheetjs.com/pro) offers cell / text styling, conditional
formatting and additional styling options.

:::

#### HTML Formatted Text {#html}

Spreadsheet applications support a limited form of rich text styling.

If the `cellHTML` option is `true`, some file parsers will attempt to translate
the rich text to standard HTML with inner tags for bold text and other styles.

:::tip pass

[SheetJS Pro](https://sheetjs.com/pro) offers additional styling options,
conversions for all supported file formats, and whole-worksheet HTML generation.

:::

### Sheet-Level Options

#### Dense

By default, the `read` and `readFile` functions generate "sparse" worksheets.
When the `dense` option is set to `true`, the functions will generate "dense"
worksheets that may be more efficient in modern browsers.

The ["Cell Storage"](/docs/csf/sheet#cell-storage) section explains worksheet
structures in more detail.

:::note pass

[Utility functions that process SheetJS workbook objects](/docs/api/utilities/)
typically support sparse and dense worksheets.

:::

#### Range

Some file formats, including XLSX and XLS, can self-report worksheet ranges.
`read` and `readFile` assume the self-reported worksheet ranges are correct. If
files include cells outside this range, the parsers will save cell information
but other utility functions will ignore those cells.

If the `sheetRows` option is set, up to `sheetRows` rows will be parsed from the
worksheets. `sheetRows-1` rows will be generated when looking at the JSON object
output (since the header row is counted as a row when parsing the data). The
`!ref` property of the worksheet will hold the adjusted range. For formats that
self-report sheet ranges, the `!fullref` property will hold the original range.

The `nodim` option instructs the parser to ignore self-reported ranges and use
the actual cells in the worksheet to determine the range. This addresses known
issues with non-compliant third-party exporters.
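
As an illustration of the `sheetRows` arithmetic, the following sketch previews
the first few data rows of the first worksheet. It assumes that `XLSX` is the
loaded SheetJS library object and that the first row of the sheet is a header
row; `previewFirstSheet` is a hypothetical helper name.

```js
// Hypothetical helper: `XLSX` is assumed to be the loaded SheetJS library object
function previewFirstSheet(XLSX, data, dataRows) {
  // One extra row is requested because the header row counts as a parsed row
  const wb = XLSX.read(data, { sheetRows: dataRows + 1 });
  const ws = wb.Sheets[wb.SheetNames[0]];
  return {
    rows: XLSX.utils.sheet_to_json(ws), // at most `dataRows` objects
    parsedRange: ws["!ref"],            // adjusted range
    fullRange: ws["!fullref"]           // original range, if the format self-reports it
  };
}
```

For example, `previewFirstSheet(XLSX, data, 5)` requests six rows in total and
should yield at most five row objects.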

#### Stubs

Some file formats, including XLSX and XLS, can specify cells without cell data.
For example, cells covered by a [merged cell block](/docs/csf/features/merges)
are technically invalid but files may include metadata.

By default, the cells are skipped. If the `sheetStubs` option is `true`, these
cells will be parsed as [stub cells](/docs/csf/cell#cell-types).

### Book-Level Options

#### VBA

When a macro-enabled file is parsed, if the `bookVBA` option is `true`, the raw
VBA blob will be stored in the `vbaraw` property of the workbook.

["VBA and Macros"](/docs/csf/features/vba) covers the features in more detail.

<details>
  <summary><b>Implementation Details</b> (click to show)</summary>

The `bookVBA` option merely exposes the raw VBA CFB object. It does not parse
the data.

XLSM and XLSB store the VBA CFB object in `xl/vbaProject.bin`. BIFF8 XLS mixes
the VBA entries alongside the core Workbook entry, so the library generates a
new blob from the XLS CFB container that works in XLSM and XLSB files.

</details>

#### Workbook Metadata {#metadata}

By default, the data from each worksheet is parsed.

If any of the following options are passed, parsers will not parse sheet data.
They will parse enough of the workbook to extract the requested information.

| Option       | Extracted Data      |
|:-------------|:--------------------|
| `bookProps`  | Workbook properties |
| `bookSheets` | Worksheet names     |

The options apply to XLSX, XLSB, XLS and XLML parsers.
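
For example, a sheet picker only needs worksheet names. The following sketch
assumes that `XLSX` is the loaded SheetJS library object; `listSheetNames` is a
hypothetical helper name.

```js
// Hypothetical helper: `XLSX` is assumed to be the loaded SheetJS library object
function listSheetNames(XLSX, data) {
  // `bookSheets` asks the parser to skip worksheet data
  const wb = XLSX.read(data, { bookSheets: true });
  return wb.SheetNames;
}
```

The selected name can then be passed back through the `sheets` option in a
second `read` call to parse only that worksheet.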

#### Worksheets {#sheets}

By default, all worksheets are parsed. The `sheets` option limits which sheets
are parsed.

If the `sheets` option is a number, the number is interpreted as a zero-based
index. For example, `sheets: 2` instructs the parser to read the third sheet.

If the `sheets` option is text, the string is interpreted as a worksheet name.
The name is case-insensitive. `sheets: "Sheet1"` instructs the parser to read
the worksheet named "Sheet1".

If the `sheets` option is an array of numbers and text, each specified worksheet
will be parsed. `sheets: [2, "Sheet1"]` instructs the parser to read the third
sheet and the sheet named "Sheet1". If the third worksheet is coincidentally
named "Sheet1", only one worksheet will be parsed.

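
The selection rules for `sheets` can be sketched as a small function. This is an
illustration of the documented behavior, not the library's implementation;
`resolveSheetIndices` is a hypothetical helper name.

```js
// Hypothetical helper: map a `sheets` option to the zero-based indices it selects
function resolveSheetIndices(sheetNames, sheets) {
  const requested = Array.isArray(sheets) ? sheets : [sheets];
  const lowerNames = sheetNames.map((name) => name.toLowerCase());
  const indices = new Set(); // a Set drops duplicate selections
  for (const item of requested) {
    // Numbers are zero-based indices; text is a case-insensitive sheet name
    const idx = typeof item === "number"
      ? item
      : lowerNames.indexOf(String(item).toLowerCase());
    if (idx >= 0 && idx < sheetNames.length) indices.add(idx);
  }
  return Array.from(indices).sort((a, b) => a - b);
}

// The third sheet is also named "Sheet1", so only one sheet is selected
console.log(resolveSheetIndices(["Data", "Summary", "Sheet1"], [2, "Sheet1"])); // [ 2 ]
```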

### File-Level Options

#### Password Protection {#password}

SheetJS CE currently supports XOR encryption in XLS files. Errors will be thrown
when trying to parse files using unsupported encryption methods.

:::tip pass

[SheetJS Pro](https://sheetjs.com/pro) offers support for additional encryption
schemes, including the AES-CBC schemes used in XLSX / XLSM / XLSB files and the
RC4 schemes used in newer XLS files.

:::

#### Lotus Formatted Text (PRN) {#prn}

Lotus Formatted Text (`PRN`) worksheets are plain text files that do not include
delimiter characters. Each cell in a column has the same width.

If the `PRN` option is set, the plaintext parser will attempt to parse some
plaintext files as if they follow the `PRN` format.

:::note pass

If the `PRN` option is set, text files that do not include commas or semicolons
or other common delimiters may not be parsed as expected.

This option should not be enabled unless it is known that the file was exported
from Lotus 1-2-3 or from Excel using the "Lotus Formatted Text (`PRN`)" format.

:::

#### Value Parsing {#raw}

Spreadsheet applications, including Excel, aggressively try to interpret values
from CSV and other plain text. This leads to surprising behavior[^1]!

If the `raw` option is true, value parsing will be suppressed. All cell values
are treated as strings.

The `raw` option affects the following formats: HTML, CSV, PRN, DIF, RTF.

The `raw` option does not affect XLSX, XLSB, XLS and other file formats that
support explicit value typing.

:::note pass

See [Issue #3145](https://git.sheetjs.com/sheetjs/sheetjs/issues/3145) in the
SheetJS CE bug tracker for more details.

:::
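
A minimal sketch of reading CSV text without value parsing follows. It assumes
that `XLSX` is the loaded SheetJS library object and that the CSV text is
already a JS string; `readCsvAsStrings` is a hypothetical helper name.

```js
// Hypothetical helper: `XLSX` is assumed to be the loaded SheetJS library object
function readCsvAsStrings(XLSX, csvText) {
  // `type: "string"` marks the input as a JS string rather than Base64
  // `raw: true` suppresses value parsing, so cells are kept as text
  return XLSX.read(csvText, { type: "string", raw: true });
}
```

With this approach, a field such as `SEPT1` should be kept as the literal text
instead of being interpreted as a date.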

#### Code Page Encoding {#codepage}

Spreadsheet applications support a number of legacy encodings. Plaintext files
will appear different when opened in different computers in different regions.

By default, the parsers use the most common "English (United States)" encodings.
The `codepage` option controls the encoding in BIFF2 - BIFF5 XLS files without
`CodePage` records, some legacy formats including DBF, and in CSV files without
BOM in `type: "binary"`. BIFF8 XLS always defaults to 1200.

The `codepage` support library is not guaranteed to be loaded by default. The
["Installation"](/docs/getting-started/installation/) section describes how to
install and load the support library.

See ["Legacy Codepages"](/docs/constellation/codepage) for more details.

#### Date Processing {#tz}

Plaintext formats may include date and time values without timezone info. The
time `12:30 AM` is ambiguous.

In the wild, there are two popular approaches:

A) Spreadsheet software typically interprets time values using local timezones.
When opening a file in New York, `12:30 AM` will be parsed as `12:30 AM ET`.
When opening a file in Los Angeles, the time will be parsed as `12:30 AM PT`.

B) APIs use [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time), the
most popular global time standard. `12:30 AM` will be parsed as the absolute
moment in time corresponding to `8:30 PM EDT` or `7:30 PM EST`.

By default, the parsers assume files are specified in UTC. When the `UTC` option
is explicitly set to `false`, dates and times are interpreted in the timezone of
the web browser or JavaScript engine.
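
The difference between the two approaches can be reproduced with plain
JavaScript `Date` objects. This sketch does not use the library;
`wallClockToDates` is a hypothetical helper name and the date fields are
arbitrary sample values.

```js
// Hypothetical helper: build both interpretations of the same wall-clock fields
function wallClockToDates(year, month, day, hours, minutes) {
  return {
    // Approach B: the fields are treated as UTC
    utc: new Date(Date.UTC(year, month - 1, day, hours, minutes)),
    // Approach A: the fields are treated as local time of the JavaScript engine
    local: new Date(year, month - 1, day, hours, minutes)
  };
}

const { utc, local } = wallClockToDates(2024, 1, 15, 0, 30);
console.log(utc.toISOString()); // 2024-01-15T00:30:00.000Z
// The gap between the two moments is the local timezone offset in minutes
console.log((local.getTime() - utc.getTime()) / 60000 === local.getTimezoneOffset());
```

The second line prints `true` in any timezone: both objects show `12:30 AM` on
their own clock but represent different absolute moments unless the engine
runs in UTC.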

#### Delimiter-Separated Values {#dsv}

The plaintext parser applies a number of heuristics to determine if files are
CSV (fields separated by commas), TSV (fields separated by tabs), PSV (fields
separated by `|`) or SSV (fields separated by `;`). The heuristics are based on
the presence of characters not in a double-quoted value.

The `FS` option instructs the parser to use the specified delimiter if multiple
delimiter characters are in the text. This bypasses the default heuristics.
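
The following sketch approximates the delimiter heuristics using the priority
order listed under "Guessing File Type". It is a simplified illustration and not
the library's implementation; `guessDelimiter` is a hypothetical helper name.

```js
// Hypothetical helper: pick a field separator for a block of plain text
function guessDelimiter(text, FS) {
  // An explicit override bypasses the heuristics
  if (FS) return FS;

  // A leading `sep=` line names the delimiter directly
  const sep = /^sep=(.)\r?\n/.exec(text);
  if (sep) return sep[1];

  // Count candidate characters that are outside double-quoted values
  const counts = { "|": 0, ";": 0, "\t": 0, ",": 0 };
  let inQuotes = false;
  for (const ch of text.slice(0, 1024)) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (!inQuotes && counts[ch] !== undefined) counts[ch]++;
  }

  if (counts["|"] > Math.max(counts[";"], counts["\t"], counts[","])) return "|";
  if (counts[";"] > Math.max(counts["\t"], counts[","])) return ";";
  if (counts["\t"] > counts[","]) return "\t";
  return ","; // comma is also the fallback
}

console.log(guessDelimiter("a;b;c\n1;2;3\n"));  // ;
console.log(guessDelimiter('"a;b;c",d\n'));     // ,
console.log(guessDelimiter("a;b;c,d\n", ","));  // ,
```

In the last call, the text contains more semicolons than commas, but the
override forces comma-separated parsing.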

#### Internal Files {#files}

Some file formats are structured as larger containers that include sub-files.
For example, XLSX files are ZIP files with XML sub-files.

If the `bookFiles` option is `true`, each sub-file will be preserved in the
workbook. The behavior depends on file type:

- `keys` array (paths in the ZIP) for ZIP-based formats
- `files` hash (mapping paths to objects representing the files) for ZIP-based
  formats
- `cfb` object for formats using CFB containers

#### Parsing Errors {#wtf}

By default, the workbook parser will suppress errors when parsing worksheets.
This ensures the valid worksheets from a multi-sheet workbook are parsed.

If the `WTF` option is enabled, the errors will not be suppressed.

### Input Type

The `type` parameter for `read` controls how data is interpreted:

| `type`   | expected input                                                  |
|:---------|:----------------------------------------------------------------|
| `base64` | string: Base64 encoding of the file                             |
| `binary` | string: binary string (byte `n` is `data.charCodeAt(n)`)        |
| `string` | string: JS string (only appropriate for UTF-8 text formats)     |
| `buffer` | nodejs Buffer                                                   |
| `array`  | array: array of 8-bit unsigned integers (byte `n` is `data[n]`) |
| `file`   | string: path of file that will be read (nodejs only)            |

Some common types are automatically deduced from the data input type, including
NodeJS `Buffer` objects, `Uint8Array` and `ArrayBuffer` objects, and arrays of
numbers.

When a JS `string` is passed with no `type`, the library assumes the data is a
Base64 string. `FileReader#readAsBinaryString` or ASCII data requires `"binary"`
type. DOM strings including `FileReader#readAsText` should use type `"string"`.
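
To show how the representations relate, the following NodeJS sketch encodes the
same four bytes in several forms. It only uses the built-in `Buffer` class and
does not call the library; the sample bytes are the usual start of a ZIP file.

```js
// The same four bytes in the representations accepted by `read`
const bytes = Uint8Array.from([0x50, 0x4b, 0x03, 0x04]);

const asBuffer = Buffer.from(bytes);          // for type "buffer"
const asArray = Array.from(bytes);            // for type "array"
const asBase64 = asBuffer.toString("base64"); // for type "base64"
const asBinary = asBuffer.toString("binary"); // for type "binary"

console.log(asBase64);               // UEsDBA==
console.log(asBinary.charCodeAt(1)); // 75 (byte 1 is 0x4B)
console.log(asArray[1]);             // 75
```

The two string forms are not interchangeable: without an explicit `type`, both
would be treated as Base64.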

### Guessing File Type

<details>
  <summary><b>Implementation Details</b> (click to show)</summary>

Excel and other spreadsheet tools read the first few bytes and apply other
heuristics to determine a file type. This enables file type punning: renaming
files with the `.xls` extension will tell your computer to use Excel to open the
file but Excel will know how to handle it. This library applies similar logic:

| Byte 0 | Raw File Type | Spreadsheet Types                                   |
|:-------|:--------------|:----------------------------------------------------|
| `0xD0` | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR   |
| `0x09` | BIFF Stream   | BIFF 2/3/4/5                                        |
| `0x3C` | XML/HTML      | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x50` | ZIP Archive   | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text    |
| `0x49` | Plain Text    | SYLK or plain text                                  |
| `0x54` | Plain Text    | DIF or plain text                                   |
| `0xEF` | UTF-8 Text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0xFF` | UTF-16 Text   | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x00` | Record Stream | Lotus WK\* or Quattro Pro or plain text             |
| `0x7B` | Plain text    | RTF or plain text                                   |
| `0x0A` | Plain text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x0D` | Plain text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
| `0x20` | Plain text    | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |

DBF files are detected based on the first byte as well as the third and fourth
bytes (corresponding to month and day of the file date).

Works for Windows files are detected based on the `BOF` record with type `0xFF`.

Plain text format guessing follows the priority order:

| Format | Test                                                                |
|:-------|:--------------------------------------------------------------------|
| XML    | `<?xml` appears in the first 1024 characters                        |
| HTML   | starts with `<` and HTML tags appear in the first 1024 characters   |
| XML    | starts with `<` and the first tag is valid                          |
| RTF    | starts with `{\rt`                                                  |
| DSV    | starts with `sep=` followed by field delimiter and line separator   |
| DSV    | more unquoted `\|` chars than `;` `\t` or `,` in the first 1024     |
| DSV    | more unquoted `;` chars than `\t` or `,` in the first 1024          |
| TSV    | more unquoted `\t` chars than `,` chars in the first 1024           |
| CSV    | one of the first 1024 characters is a comma `","`                   |
| ETH    | starts with `socialcalc:version:`                                   |
| PRN    | `PRN` option is set to true                                         |
| CSV    | (fallback)                                                          |

HTML tags include `html`, `table`, `head`, `meta`, `script`, `style`, `div`.

</details>
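
The first-byte table can be expressed as a simple lookup. The following is a
sketch of the idea and not the library's detection code: `guessRawType` is a
hypothetical helper name, it only inspects byte 0, and it ignores the
additional DBF and Works checks.

```js
// Mapping from the first byte to the raw file type listed in the table
const RAW_TYPES = new Map([
  [0xd0, "CFB Container"],
  [0x09, "BIFF Stream"],
  [0x3c, "XML/HTML"],
  [0x50, "ZIP Archive"],
  [0x49, "Plain Text"],
  [0x54, "Plain Text"],
  [0xef, "UTF-8 Text"],
  [0xff, "UTF-16 Text"],
  [0x00, "Record Stream"],
  [0x7b, "Plain Text"],
  [0x0a, "Plain Text"],
  [0x0d, "Plain Text"],
  [0x20, "Plain Text"]
]);

// Hypothetical helper: `bytes` is any array-like of 8-bit unsigned integers
function guessRawType(bytes) {
  return RAW_TYPES.get(bytes[0]) || "Unlisted";
}

console.log(guessRawType([0x50, 0x4b, 0x03, 0x04])); // ZIP Archive
console.log(guessRawType(Buffer.from("<?xml")));     // XML/HTML
```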

<details open>
  <summary><b>Why are random files valid?</b> (click to hide)</summary>

Excel is extremely aggressive in reading files. Adding the XLS extension to any
file tricks Excel into processing the file.

If the file matches certain heuristics, Excel will use a format-specific parser.

If it cannot deduce the file type, Excel will parse the unknown file as if it
were CSV or TSV. SheetJS attempts to replicate that behavior.

</details>

[^1]: The gene [`SEPT1`](https://en.wikipedia.org/wiki/SEPTIN1) was renamed to
`SEPTIN1` to avoid Excel value interpretations: the string `SEPT1` is parsed as
the date "September 1".