hyparquet decompressors
This package provides decompressors for various compression codecs. It is designed to be used with hyparquet in order to provide full support for all parquet compression formats.
Introduction
Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets. It supports a number of different compression formats, but most parquet files use snappy compression.
Hyparquet is a fast and lightweight parquet reader that is designed to work in both node.js and the browser.
By default, hyparquet only supports uncompressed and snappy-compressed files (the most common parquet compression codecs). The hyparquet-compressors package extends support to all legal parquet compression formats.
hyparquet-compressors works in both node.js and the browser. It uses js and wasm packages and has no system dependencies.
Hyparquet
To use hyparquet-compressors with hyparquet, simply pass the compressors object to the parquetReadObjects function.
```javascript
import { parquetReadObjects } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

// `file` is the parquet file input expected by hyparquet
const data = await parquetReadObjects({ file, compressors })
```
See the hyparquet repo for more information.
Compression formats
Parquet compression types supported with hyparquet-compressors:
- Uncompressed
- Snappy
- Gzip
- LZO
- Brotli
- LZ4
- ZSTD
- LZ4_RAW
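
The list above follows the order of the `CompressionCodec` enum in the Parquet Thrift definition, where each codec is stored in column metadata as a small integer. As a rough sketch of how a reader could dispatch on that value, the hypothetical `pickDecompressor` helper below resolves a codec id against a map keyed by codec name. The helper name, the `(input, outputLength)` function shape, and the pass-through handling of `UNCOMPRESSED` are illustrative assumptions, not hyparquet's actual internals.

```javascript
// Codec ids as defined by the CompressionCodec enum in parquet.thrift.
const CODEC_NAMES = ['UNCOMPRESSED', 'SNAPPY', 'GZIP', 'LZO', 'BROTLI', 'LZ4', 'ZSTD', 'LZ4_RAW']

/**
 * Hypothetical helper: resolve a decompress function for a codec id.
 * `compressors` is assumed to map codec names to
 * (input: Uint8Array, outputLength: number) => Uint8Array.
 */
function pickDecompressor(codecId, compressors) {
  const name = CODEC_NAMES[codecId]
  if (name === undefined) throw new Error(`unknown parquet codec id: ${codecId}`)
  // Uncompressed pages need no decoding, so pass the bytes through.
  if (name === 'UNCOMPRESSED') return input => input
  const decompress = compressors[name]
  if (!decompress) throw new Error(`unsupported parquet codec: ${name}`)
  return decompress
}
```

A map that lacks an entry for the requested codec fails loudly here instead of returning undecoded bytes, which is usually the safer behavior for a reader.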
Snappy
Snappy decompression uses hysnappy, a fast snappy decompressor built on a minimal wasm module.
Gzip
A new gzip implementation adapted from fflate. It includes modifications to handle repeated back-to-back gzip streams, which sometimes occur in parquet files but are not supported by fflate.
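
To make "back-to-back gzip streams" concrete, the sketch below builds such an input and decodes it. It uses Node's built-in `node:zlib` module purely for illustration (it is not the implementation shipped in this package), and the payload strings are made up for the demo.

```javascript
import { gzipSync, gunzipSync } from 'node:zlib'

// Two independent gzip streams, each with its own header and trailer.
const first = gzipSync(Buffer.from('hello '))
const second = gzipSync(Buffer.from('world'))

// Concatenate them back to back, as seen in some parquet files.
const concatenated = Buffer.concat([first, second])

// A decoder that handles multi-member input yields both payloads in order.
const text = gunzipSync(concatenated).toString('utf8')
console.log(text)
```

A decoder that stops after the first stream's trailer would return only the first payload, which is the failure mode the modified implementation is meant to avoid.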
Brotli
Includes a minimal port of brotli.js. The brotli dictionary is pre-compressed with gzip to minimize the distribution bundle size.
LZ4
A new LZ4 implementation that includes support for the legacy Hadoop LZ4 frame format used in some older parquet files.
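
For readers unfamiliar with the two layouts, here is a minimal sketch of a raw LZ4 block decoder and a wrapper for Hadoop-style framing. It is not the code shipped in this package. The function names are hypothetical, and the framing (each chunk prefixed by a big-endian uint32 decompressed size followed by a big-endian uint32 compressed size) is an assumption based on how other parquet readers describe the legacy format.

```javascript
/** Decode one raw LZ4 block (block format, no frame header). */
function lz4BlockDecode(input, outputLength) {
  const output = new Uint8Array(outputLength)
  let i = 0 // read position in input
  let o = 0 // write position in output
  while (i < input.length) {
    const token = input[i++]

    // High nibble: literal length, extended by extra bytes when it is 15.
    let literalLength = token >> 4
    if (literalLength === 15) {
      let b
      do {
        b = input[i++]
        literalLength += b
      } while (b === 255)
    }
    output.set(input.subarray(i, i + literalLength), o)
    i += literalLength
    o += literalLength

    // The final sequence holds literals only.
    if (i >= input.length) break

    // Two-byte little-endian offset back into the output written so far.
    const offset = input[i] | input[i + 1] << 8
    i += 2
    if (offset === 0 || offset > o) throw new Error('lz4: invalid match offset')

    // Low nibble: match length minus 4, extended the same way as literals.
    let matchLength = token & 0xf
    if (matchLength === 15) {
      let b
      do {
        b = input[i++]
        matchLength += b
      } while (b === 255)
    }
    matchLength += 4

    // Copy byte by byte because the match may overlap the bytes being written.
    for (let end = o + matchLength; o < end; o++) {
      output[o] = output[o - offset]
    }
  }
  return output
}

/** Decode chunks that each carry an assumed 8-byte Hadoop-style size prefix. */
function lz4HadoopDecode(input, outputLength) {
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength)
  const output = new Uint8Array(outputLength)
  let i = 0
  let o = 0
  while (i + 8 <= input.length) {
    const rawSize = view.getUint32(i) // DataView reads big-endian by default
    const compressedSize = view.getUint32(i + 4)
    i += 8
    output.set(lz4BlockDecode(input.subarray(i, i + compressedSize), rawSize), o)
    i += compressedSize
    o += rawSize
  }
  return output
}

// Hand-built block: literals "abc", a 6-byte match at offset 3, then literal "d".
const block = new Uint8Array([0x32, 0x61, 0x62, 0x63, 0x03, 0x00, 0x10, 0x64])
const framed = new Uint8Array([0, 0, 0, 10, 0, 0, 0, 8, ...block])
console.log(new TextDecoder().decode(lz4BlockDecode(block, 10)))
console.log(new TextDecoder().decode(lz4HadoopDecode(framed, 10)))
```

Because the same codec name can refer to either layout in older files, a practical decoder might try the framed interpretation first and fall back to the raw block when the size prefix does not match the expected lengths.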
Zstd
Uses fzstd for Zstandard decompression.
Bundle size
| File | Size |
|---|---|
| hyparquet-compressors.min.js | 116.1 KB |
| hyparquet-compressors.min.js.gz | 75.2 KB |
References
- https://parquet.apache.org/docs/file-format/data-pages/compression/
- https://en.wikipedia.org/wiki/Brotli
- https://en.wikipedia.org/wiki/Gzip
- https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)
- https://en.wikipedia.org/wiki/Snappy_(compression)
- https://en.wikipedia.org/wiki/Zstd
- https://github.com/101arrowz/fflate
- https://github.com/101arrowz/fzstd
- https://github.com/foliojs/brotli.js
- https://github.com/hyparam/hysnappy
