# Hyparquet Writer

![hyparquet writer parakeet](hyparquet-writer.jpg)

[![npm](https://img.shields.io/npm/v/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![minzipped](https://img.shields.io/bundlephobia/minzip/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![workflow status](https://github.com/hyparam/hyparquet-writer/actions/workflows/ci.yml/badge.svg)](https://github.com/hyparam/hyparquet-writer/actions)
[![mit license](https://img.shields.io/badge/License-MIT-orange.svg)](https://opensource.org/licenses/MIT)
![coverage](https://img.shields.io/badge/Coverage-97-darkred)
[![dependencies](https://img.shields.io/badge/Dependencies-1-blueviolet)](https://www.npmjs.com/package/hyparquet-writer?activeTab=dependencies)

Hyparquet Writer is a JavaScript library for writing [Apache Parquet](https://parquet.apache.org) files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to [hyparquet](https://github.com/hyparam/hyparquet), a JavaScript library for reading parquet files.

## Quick Start

To write a parquet file to an `ArrayBuffer`, use `parquetWriteBuffer` with argument `columnData`. Each column in `columnData` should contain:

- `name`: the column name
- `data`: an array of values, all of the same type
- `type`: the parquet schema type (optional)

```javascript
import { parquetWriteBuffer } from 'hyparquet-writer'

const arrayBuffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
Note: if `type` is not provided, the type will be guessed from the data. The supported parquet types are:
- `BOOLEAN`
- `INT32`
- `INT64`
- `FLOAT`
- `DOUBLE`
- `BYTE_ARRAY`
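
Data often arrives as an array of row objects rather than as columns. The sketch below pivots rows into the `columnData` shape described above. `rowsToColumnData` is a hypothetical helper, not part of hyparquet-writer, and it assumes every row has the same keys as the first row.

```javascript
// Hypothetical helper (not part of hyparquet-writer): pivot an array of
// row objects into the { name, data } column shape used by `columnData`.
// Assumes every row has the same keys as the first row.
function rowsToColumnData(rows) {
  if (rows.length === 0) return []
  return Object.keys(rows[0]).map(name => ({
    name,
    data: rows.map(row => row[name]),
  }))
}

const columnData = rowsToColumnData([
  { name: 'Alice', age: 25 },
  { name: 'Bob', age: 30 },
])
// columnData now holds one { name, data } entry per key: 'name' and 'age'
```

The result can be passed as the `columnData` argument. Since no `type` is set on the columns, the type would be guessed from the data.
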
### Node.js Write to Local Parquet File

To write a local parquet file in Node.js, use `parquetWriteFile` with arguments `filename` and `columnData`:

```javascript
const { parquetWriteFile } = await import('hyparquet-writer')

parquetWriteFile({
  filename: 'example.parquet',
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```

Note: hyparquet-writer is published as an ES module, so dynamic `import()` may be required on the command line.
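
A quick way to sanity-check written output is to look for the Parquet magic bytes: a Parquet file begins and ends with the 4-byte ASCII marker `PAR1`. The following is a minimal sketch. `hasParquetMagic` is a hypothetical helper, not part of hyparquet-writer, and it only checks the markers, not the rest of the file.

```javascript
// Hypothetical helper (not part of hyparquet-writer): check that a byte
// array starts and ends with the Parquet magic marker "PAR1".
// This does not validate the metadata or the column data in between.
function hasParquetMagic(bytes) {
  const magic = [0x50, 0x41, 0x52, 0x31] // 'P', 'A', 'R', '1'
  // Smallest possible layout: magic + 4-byte footer length + magic
  if (bytes.length < 12) return false
  const tailStart = bytes.length - 4
  return magic.every((b, i) => bytes[i] === b && bytes[tailStart + i] === b)
}
```

In Node.js, `fs.readFileSync('example.parquet')` returns a `Buffer`, which is a `Uint8Array`, so it can be passed in directly. An `ArrayBuffer` can be wrapped first with `new Uint8Array(arrayBuffer)`.
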
## Advanced Usage

Options can be passed to `parquetWrite` to adjust parquet file writing behavior:

- `writer`: a generic writer object
- `compression`: use snappy compression (default true)
- `statistics`: write column statistics (default true)
- `rowGroupSize`: number of rows in each row group (default 100000)
- `kvMetadata`: extra key-value metadata to be stored in the parquet footer

```javascript
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
const arrayBuffer = parquetWrite({
  writer,
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
  compression: false, // turn off snappy compression
  statistics: false, // skip writing column statistics
  rowGroupSize: 1000, // rows per row group
  kvMetadata: { // extra key-value pairs stored in the footer
    'key1': 'value1',
    'key2': 'value2',
  },
})
```
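
To get a feel for `rowGroupSize`, the sketch below computes which rows would land in each row group for a given row count, assuming rows are chunked sequentially. `rowGroupRanges` is a hypothetical illustration of the splitting arithmetic, not the library's internal code.

```javascript
// Hypothetical illustration (not part of hyparquet-writer): split numRows
// into [start, end) ranges holding at most rowGroupSize rows each.
function rowGroupRanges(numRows, rowGroupSize = 100000) {
  if (!(rowGroupSize > 0)) throw new Error('rowGroupSize must be positive')
  const ranges = []
  for (let start = 0; start < numRows; start += rowGroupSize) {
    ranges.push([start, Math.min(start + rowGroupSize, numRows)])
  }
  return ranges
}

// 250000 rows with the default size give three ranges:
// [[0, 100000], [100000, 200000], [200000, 250000]]
const ranges = rowGroupRanges(250000)
```
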

## References

- https://github.com/hyparam/hyparquet
- https://github.com/hyparam/hyparquet-compressors
- https://github.com/apache/parquet-format
- https://github.com/apache/parquet-testing