forked from sheetjs/docs.sheetjs.com
		
	
		
			
	
	
		
			257 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			257 lines
		
	
	
		
			13 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
|  | ## File Formats
 | ||
|  | 
 | ||
|  | Despite the library name `xlsx`, it supports numerous spreadsheet file formats: | ||
|  | 
 | ||
|  | | Format                                                       | Read  | Write | | ||
|  | |:-------------------------------------------------------------|:-----:|:-----:| | ||
|  | | **Excel Worksheet/Workbook Formats**                         |:-----:|:-----:| | ||
|  | | Excel 2007+ XML Formats (XLSX/XLSM)                          |   ✔   |   ✔   | | ||
|  | | Excel 2007+ Binary Format (XLSB BIFF12)                      |   ✔   |   ✔   | | ||
|  | | Excel 2003-2004 XML Format (XML "SpreadsheetML")             |   ✔   |   ✔   | | ||
|  | | Excel 97-2004 (XLS BIFF8)                                    |   ✔   |   ✔   | | ||
|  | | Excel 5.0/95 (XLS BIFF5)                                     |   ✔   |   ✔   | | ||
|  | | Excel 4.0 (XLS/XLW BIFF4)                                    |   ✔   |   ✔   | | ||
|  | | Excel 3.0 (XLS BIFF3)                                        |   ✔   |   ✔   | | ||
|  | | Excel 2.0/2.1 / Multiplan 4.x DOS (XLS BIFF2)                |   ✔   |   ✔   | | ||
|  | | **Excel Supported Text Formats**                             |:-----:|:-----:| | ||
|  | | Delimiter-Separated Values (CSV/TXT)                         |   ✔   |   ✔   | | ||
|  | | Data Interchange Format (DIF)                                |   ✔   |   ✔   | | ||
|  | | Symbolic Link (SYLK/SLK)                                     |   ✔   |   ✔   | | ||
|  | | Lotus Formatted Text (PRN)                                   |   ✔   |   ✔   | | ||
|  | | UTF-16 Unicode Text (TXT)                                    |   ✔   |   ✔   | | ||
|  | | **Other Workbook/Worksheet Formats**                         |:-----:|:-----:| | ||
|  | | Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)             |   ✔   |   ✔   | | ||
|  | | OpenDocument Spreadsheet (ODS)                               |   ✔   |   ✔   | | ||
|  | | Flat XML ODF Spreadsheet (FODS)                              |   ✔   |   ✔   | | ||
|  | | Uniform Office Format Spreadsheet (标文通 UOS1/UOS2)         |   ✔   |       | | ||
|  | | dBASE II/III/IV / Visual FoxPro (DBF)                        |   ✔   |   ✔   | | ||
|  | | Lotus 1-2-3 (WK1/WK3)                                        |   ✔   |   ✔   | | ||
|  | | Lotus 1-2-3 (WKS/WK2/WK4/123)                                |   ✔   |       | | ||
|  | | Quattro Pro Spreadsheet (WQ1/WQ2/WB1/WB2/WB3/QPW)            |   ✔   |       | | ||
|  | | Works 1.x-3.x DOS / 2.x-5.x Windows Spreadsheet (WKS)        |   ✔   |       | | ||
|  | | Works 6.x-9.x Spreadsheet (XLR)                              |   ✔   |       | | ||
|  | | **Other Common Spreadsheet Output Formats**                  |:-----:|:-----:| | ||
|  | | HTML Tables                                                  |   ✔   |   ✔   | | ||
|  | | Rich Text Format tables (RTF)                                |       |   ✔   | | ||
|  | | Ethercalc Record Format (ETH)                                |   ✔   |   ✔   | | ||
|  | 
 | ||
|  | Features not supported by a given file format will not be written.  Formats with | ||
|  | range limits will be silently truncated: | ||
|  | 
 | ||
|  | | Format                                    | Last Cell  | Max Cols | Max Rows | | ||
|  | |:------------------------------------------|:-----------|---------:|---------:| | ||
|  | | Excel 2007+ XML Formats (XLSX/XLSM)       | XFD1048576 |    16384 |  1048576 | | ||
|  | | Excel 2007+ Binary Format (XLSB BIFF12)   | XFD1048576 |    16384 |  1048576 | | ||
|  | | Numbers 12.0 (NUMBERS)                    | ALL1000000 |     1000 |  1000000 | | ||
|  | | Quattro Pro 9+ (QPW)                      | IV1000000  |      256 |  1000000 | | ||
|  | | Excel 97-2004 (XLS BIFF8)                 | IV65536    |      256 |    65536 | | ||
|  | | Excel 5.0/95 (XLS BIFF5)                  | IV16384    |      256 |    16384 | | ||
|  | | Excel 4.0 (XLS BIFF4)                     | IV16384    |      256 |    16384 | | ||
|  | | Excel 3.0 (XLS BIFF3)                     | IV16384    |      256 |    16384 | | ||
|  | | Excel 2.0/2.1 (XLS BIFF2)                 | IV16384    |      256 |    16384 | | ||
|  | | Lotus 1-2-3 R2 - R5 (WK1/WK3/WK4)         | IV8192     |      256 |     8192 | | ||
|  | | Lotus 1-2-3 R1 (WKS)                      | IV2048     |      256 |     2048 | | ||
|  | 
 | ||
|  | Excel 2003 SpreadsheetML range limits are governed by the version of Excel and | ||
|  | are not enforced by the writer. | ||
|  | 
 | ||
|  | <details> | ||
|  |   <summary><b>File Format Details</b> (click to show)</summary> | ||
|  | 
 | ||
|  | **Core Spreadsheet Formats** | ||
|  | 
 | ||
|  | - **Excel 2007+ XML (XLSX/XLSM)** | ||
|  | 
 | ||
|  | XLSX and XLSM files are ZIP containers containing a series of XML files in | ||
|  | accordance with the Open Packaging Conventions (OPC).  The XLSM format, almost | ||
|  | identical to XLSX, is used for files containing macros. | ||
|  | 
 | ||
|  | The format is standardized in ECMA-376 and later in ISO/IEC 29500.  Excel does | ||
|  | not follow the specification, and there are additional documents discussing how | ||
|  | Excel deviates from the specification. | ||
|  | 
 | ||
|  | - **Excel 2.0-95 (BIFF2/BIFF3/BIFF4/BIFF5)** | ||
|  | 
 | ||
|  | BIFF 2/3 XLS are single-sheet streams of binary records.  Excel 4 introduced | ||
|  | the concept of a workbook (`XLW` files) but also had single-sheet `XLS` format. | ||
|  | The structure is largely similar to the Lotus 1-2-3 file formats.  BIFF5/8/12 | ||
|  | extended the format in various ways but largely stuck to the same record format. | ||
|  | 
 | ||
|  | Multiplan 4 "Normal" files are identical in structure to BIFF2 and use the same | ||
|  | cell value records.  There are some different record types for more advanced | ||
|  | features like Print Settings.  The BIFF2 writer generates files that can be read | ||
|  | in Multiplan 4 and the parser can extract values from "Normal" files. | ||
|  | 
 | ||
|  | There is no official specification for any of these formats.  Excel 95 can write | ||
|  | files in these formats, so record lengths and fields were determined by writing | ||
|  | in all of the supported formats and comparing files.  Excel 2016 can generate | ||
|  | BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2. | ||
|  | 
 | ||
|  | - **Excel 97-2004 Binary (BIFF8)** | ||
|  | 
 | ||
|  | BIFF8 exclusively uses the Compound File Binary container format, splitting some | ||
|  | content into streams within the file.  At its core, it still uses an extended | ||
|  | version of the binary record format from older versions of BIFF. | ||
|  | 
 | ||
|  | The `MS-XLS` specification covers the basics of the file format, and other | ||
|  | specifications expand on serialization of features like properties. | ||
|  | 
 | ||
|  | - **Excel 2003-2004 (SpreadsheetML)** | ||
|  | 
 | ||
|  | Predating XLSX, SpreadsheetML files are simple XML files.  There is no official | ||
|  | and comprehensive specification, although MS has released documentation on the | ||
|  | format.  Since Excel 2016 can generate SpreadsheetML files, mapping features is | ||
|  | pretty straightforward. | ||
|  | 
 | ||
|  | - **Excel 2007+ Binary (XLSB, BIFF12)** | ||
|  | 
 | ||
|  | Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture | ||
|  | with the content separation and ZIP container of XLSX.  For the most part nodes | ||
|  | in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file. | ||
|  | 
 | ||
|  | The `MS-XLSB` specification covers the basics of the file format, and other | ||
|  | specifications expand on serialization of features like properties. | ||
|  | 
 | ||
|  | - **Delimiter-Separated Values (CSV/TXT)** | ||
|  | 
 | ||
|  | Excel CSV deviates from RFC4180 in a number of important ways.  The generated | ||
|  | CSV files should generally work in Excel although they may not work in RFC4180 | ||
|  | compatible readers.  The parser should generally understand Excel CSV. The | ||
|  | writer proactively generates cells for formulae if values are unavailable. | ||
|  | 
 | ||
|  | Excel TXT uses tab as the delimiter and code page 1200. | ||
|  | 
 | ||
|  | Like in Excel, files starting with `0x49 0x44 ("ID")` are treated as Symbolic | ||
|  | Link files.  Unlike Excel, if the file does not have a valid SYLK header, it | ||
|  | will be proactively reinterpreted as CSV.  There are some files with semicolon | ||
|  | delimiter that align with a valid SYLK file.  For the broadest compatibility, | ||
|  | all cells with the value of `ID` are automatically wrapped in double-quotes. | ||
|  | 
 | ||
|  | **Miscellaneous Workbook Formats** | ||
|  | 
 | ||
|  | Support for other formats is generally far behind XLS/XLSB/XLSX support, due in | ||
|  | part to a lack of publicly available documentation.  Test files were produced in | ||
|  | the respective apps and compared to their XLS exports to determine structure. | ||
|  | The main focus is data extraction. | ||
|  | 
 | ||
|  | - **Lotus 1-2-3 (WKS/WK1/WK2/WK3/WK4/123)** | ||
|  | 
 | ||
|  | The Lotus formats consist of binary records similar to the BIFF structure. Lotus | ||
|  | did release a specification decades ago covering the original WK1 format.  Other | ||
|  | features were deduced by producing files and comparing to Excel support. | ||
|  | 
 | ||
|  | Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0. | ||
|  | 
 | ||
|  | Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0. | ||
|  | 
 | ||
|  | - **Quattro Pro (WQ1/WQ2/WB1/WB2/WB3/QPW)** | ||
|  | 
 | ||
|  | The Quattro Pro formats use binary records in the same way as BIFF and Lotus. | ||
|  | Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like | ||
|  | BIFF8 XLS. | ||
|  | 
 | ||
|  | - **Works for DOS / Windows Spreadsheet (WKS/XLR)** | ||
|  | 
 | ||
|  | All versions of Works were limited to a single worksheet. | ||
|  | 
 | ||
|  | Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format | ||
|  | with additional record types. | ||
|  | 
 | ||
|  | Works for Windows 3.x - 5.x uses the same format and WKS extension.  The BOF | ||
|  | record has type `FF` | ||
|  | 
 | ||
|  | Works for Windows 6.x - 9.x use the XLR format.  XLR is nearly identical to | ||
|  | BIFF8 XLS: it uses the CFB container with a Workbook stream.  Works 9 saves the | ||
|  | exact Workbook stream for the XLR and the 97-2003 XLS export.  Works 6 XLS | ||
|  | includes two empty worksheets but the main worksheet has an identical encoding. | ||
|  | XLR also includes a `WksSSWorkBook` stream similar to Lotus FM3/FMT files. | ||
|  | 
 | ||
|  | - **Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)** | ||
|  | 
 | ||
|  | iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary | ||
|  | XML-based format to the current file format based on the iWork Archive (IWA). | ||
|  | This format has been used up through the current release (Numbers 11.2). | ||
|  | 
 | ||
|  | The parser focuses on extracting raw data from tables.  Numbers technically | ||
|  | supports multiple tables in a logical worksheet, including custom titles.  This | ||
|  | parser will generate one worksheet per Numbers table. | ||
|  | 
 | ||
|  | The writer currently exports a small range from the first worksheet. | ||
|  | 
 | ||
|  | - **OpenDocument Spreadsheet (ODS/FODS)** | ||
|  | 
 | ||
|  | ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to | ||
|  | SpreadsheetML.  Both are detailed in the OASIS standard, but tools like LO/OO | ||
|  | add undocumented extensions.  The parsers and writers do not implement the full | ||
|  | standard, instead focusing on parts necessary to extract and store raw data. | ||
|  | 
 | ||
|  | - **Uniform Office Spreadsheet (UOS1/2)** | ||
|  | 
 | ||
|  | UOS is a very similar format, and it comes in 2 varieties corresponding to ODS | ||
|  | and FODS respectively.  For the most part, the difference between the formats | ||
|  | is in the names of tags and attributes. | ||
|  | 
 | ||
|  | **Miscellaneous Worksheet Formats** | ||
|  | 
 | ||
|  | Many older formats supported only one worksheet: | ||
|  | 
 | ||
|  | - **dBASE and Visual FoxPro (DBF)** | ||
|  | 
 | ||
|  | DBF is really a typed table format: each column can only hold one data type and | ||
|  | each record omits type information.  The parser generates a header row and | ||
|  | inserts records starting at the second row of the worksheet.  The writer makes | ||
|  | files compatible with Visual FoxPro extensions. | ||
|  | 
 | ||
|  | Multi-file extensions like external memos and tables are currently unsupported, | ||
|  | limited by the general ability to read arbitrary files in the web browser.  The | ||
|  | reader understands DBF Level 7 extensions like DATETIME. | ||
|  | 
 | ||
|  | - **Symbolic Link (SYLK)** | ||
|  | 
 | ||
|  | <https://oss.sheetjs.com/notes/sylk/> is an informal specification based on our | ||
|  | experimentation and previous documentation efforts. | ||
|  | 
 | ||
|  | - **Lotus Formatted Text (PRN)** | ||
|  | 
 | ||
|  | There is no real documentation, and in fact Excel treats PRN as an output-only | ||
|  | file format.  Nevertheless we can guess the column widths and reverse-engineer | ||
|  | the original layout.  Excel's 240 character width limitation is not enforced. | ||
|  | 
 | ||
|  | - **Data Interchange Format (DIF)** | ||
|  | 
 | ||
|  | There is no unified definition.  Visicalc DIF differs from Lotus DIF, and both | ||
|  | differ from Excel DIF.  Where ambiguous, the parser/writer follows the expected | ||
|  | behavior from Excel.  In particular, Excel extends DIF in incompatible ways: | ||
|  | 
 | ||
|  | - Since Excel automatically converts numbers-as-strings to numbers, numeric | ||
|  |   string constants are converted to formulae: `"0.3" -> "=""0.3""` | ||
|  | - DIF technically expects numeric cells to hold the raw numeric data, but Excel | ||
|  |   permits formatted numbers (including dates) | ||
|  | - DIF technically has no support for formulae, but Excel will automatically | ||
|  |   convert plain formulae.  Array formulae are not preserved. | ||
|  | 
 | ||
|  | - **HTML** | ||
|  | 
 | ||
|  | Excel HTML worksheets include special metadata encoded in styles.  For example, | ||
|  | `mso-number-format` is a localized string containing the number format.  Despite | ||
|  | the metadata the output is valid HTML, although it does accept bare `&` symbols. | ||
|  | 
 | ||
|  | The writer adds type metadata to the TD elements via the `t` tag.  The parser | ||
|  | looks for those tags and overrides the default interpretation. For example, text | ||
|  | like `<td>12345</td>` will be parsed as numbers but `<td t="s">12345</td>` will | ||
|  | be parsed as text. | ||
|  | 
 | ||
|  | - **Rich Text Format (RTF)** | ||
|  | 
 | ||
|  | Excel RTF worksheets are stored in clipboard when copying cells or ranges from a | ||
|  | worksheet.  The supported codes are a subset of the Word RTF support. | ||
|  | 
 | ||
|  | - **Ethercalc Record Format (ETH)** | ||
|  | 
 | ||
|  | [Ethercalc](https://ethercalc.net/) is an open source web spreadsheet powered by | ||
|  | a record format reminiscent of SYLK wrapped in a MIME multi-part message. | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  | 
 |