# hyparquet decompressors
[![npm](https://img.shields.io/npm/v/hyparquet-compressors)](https://www.npmjs.com/package/hyparquet-compressors)
[![workflow status](https://github.com/hyparam/hyparquet-compressors/actions/workflows/ci.yml/badge.svg)](https://github.com/hyparam/hyparquet-compressors/actions)
[![mit license](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
![coverage](https://img.shields.io/badge/Coverage-97-darkred)

This package exports a `compressors` object intended to be passed into [hyparquet](https://github.com/hyparam/hyparquet).

[Apache Parquet](https://parquet.apache.org) is a popular columnar storage format, widely used in data engineering, data science, and machine learning to store and process large datasets efficiently. It supports several compression formats, but most Parquet files use snappy compression.

By default, the hyparquet library supports only `uncompressed` and `snappy` compressed files. The `hyparquet-compressors` package extends that support to the other Parquet compression formats, except LZO.
## Usage
```js
import { parquetRead } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

// `file` is assumed to be an ArrayBuffer (or AsyncBuffer) holding the parquet file
await parquetRead({ file, compressors, onComplete: console.log })
```
## Supported compression formats
Parquet compression types supported with `hyparquet-compressors`:
- [X] Uncompressed
- [X] Snappy
- [X] GZip
- [ ] LZO
- [X] Brotli
- [X] LZ4
- [X] ZSTD
- [X] LZ4_RAW
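
To make the checklist concrete, here is a minimal sketch of how a reader might pick a decompressor by codec name from a `compressors`-style lookup table, and why an unchecked codec such as LZO ends in an error. The helper `decompressPage`, the stand-in table `demoCompressors`, the `DEMO_PASSTHROUGH` key, and the `(input, outputLength) => Uint8Array` signature are all assumptions made for illustration. They are not taken from the hyparquet source, and the real decompressors are replaced by a simple byte copy.

```js
// Hypothetical stand-in for a compressors table: codec name -> decompress function.
// The (input, outputLength) signature is an assumption for this sketch.
const demoCompressors = {
  // Placeholder "codec" that only copies bytes; it stands in for a real decompressor.
  DEMO_PASSTHROUGH: (input, outputLength) => input.slice(0, outputLength),
}

/**
 * Decompress a page according to its codec name (hypothetical helper).
 *
 * @param {string} codec codec name, e.g. 'UNCOMPRESSED'
 * @param {Uint8Array} bytes page bytes as stored in the file
 * @param {number} outputLength expected size after decompression
 * @param {Record<string, (input: Uint8Array, outputLength: number) => Uint8Array>} compressors
 * @returns {Uint8Array}
 */
function decompressPage(codec, bytes, outputLength, compressors = {}) {
  // Uncompressed pages need no decompressor
  if (codec === 'UNCOMPRESSED') return bytes

  const decompress = compressors[codec]
  if (!decompress) {
    // A codec with no entry in the table (LZO in the list above) cannot be read
    throw new Error(`unsupported parquet compression codec: ${codec}`)
  }

  const output = decompress(bytes, outputLength)
  if (output.length !== outputLength) {
    throw new Error(`expected ${outputLength} bytes, got ${output.length}`)
  }
  return output
}

const page = new Uint8Array([1, 2, 3, 4])
console.log(decompressPage('DEMO_PASSTHROUGH', page, 4, demoCompressors))
```

Keeping the table as a plain object means support for a codec is just the presence of a key, so a caller could pass only the entries it needs.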
## References
- https://parquet.apache.org/docs/file-format/data-pages/compression/
- https://en.wikipedia.org/wiki/Gzip
- https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)
- https://en.wikipedia.org/wiki/Snappy_(compression)