# Hyparquet Writer

![hyparquet writer parakeet](hyparquet-writer.jpg)

[![npm](https://img.shields.io/npm/v/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![minzipped](https://img.shields.io/bundlephobia/minzip/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![workflow status](https://github.com/hyparam/hyparquet-writer/actions/workflows/ci.yml/badge.svg)](https://github.com/hyparam/hyparquet-writer/actions)
[![mit license](https://img.shields.io/badge/License-MIT-orange.svg)](https://opensource.org/licenses/MIT)
![coverage](https://img.shields.io/badge/Coverage-97-darkred)
[![dependencies](https://img.shields.io/badge/Dependencies-1-blueviolet)](https://www.npmjs.com/package/hyparquet-writer?activeTab=dependencies)

Hyparquet Writer is a JavaScript library for writing [Apache Parquet](https://parquet.apache.org) files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to [hyparquet](https://github.com/hyparam/hyparquet), a JavaScript library for reading Parquet files.
## Usage

Call `parquetWrite` with a `columnData` argument. Each column in `columnData` should contain:

- `name`: the column name
- `data`: an array of values, all of the same type
- `type`: the parquet schema type (optional; guessed from the data if not provided)

Example:

```javascript
import { parquetWrite } from 'hyparquet-writer'

const arrayBuffer = parquetWrite({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```
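
Data often arrives as an array of row objects instead of columns. The sketch below shows one way to pivot rows into the `columnData` shape described above. `rowsToColumnData` is a hypothetical helper, not part of hyparquet-writer, and it assumes every row has the same keys as the first row. It omits `type`, so the type would be guessed from the data.

```javascript
// Hypothetical helper, not part of hyparquet-writer: pivot row objects
// into the { name, data } column shape described in the Usage section.
function rowsToColumnData(rows) {
  if (rows.length === 0) return []
  // Assumption: the keys of the first row name every column
  const names = Object.keys(rows[0])
  return names.map(name => ({
    name,
    data: rows.map(row => row[name]),
  }))
}

const rows = [
  { name: 'Alice', age: 25 },
  { name: 'Bob', age: 30 },
  { name: 'Charlie', age: 35 },
]
const columnData = rowsToColumnData(rows)
// columnData[0] is { name: 'name', data: ['Alice', 'Bob', 'Charlie'] }
// columnData[1] is { name: 'age', data: [25, 30, 35] }
```

The result could then be passed as the `columnData` argument of `parquetWrite`.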
## Options

Options can be passed to `parquetWrite` to change parquet file properties:

- `compression`: compress column data with snappy (default `true`)
- `statistics`: write column statistics (default `true`)
- `rowGroupSize`: number of rows in each row group (default `100000`)
- `kvMetadata`: extra key-value metadata to store in the file

```javascript
import { parquetWrite } from 'hyparquet-writer'

const arrayBuffer = parquetWrite({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
  compression: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: {
    'key1': 'value1',
    'key2': 'value2',
  },
})
```
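
To get a feel for what `rowGroupSize` implies, the sketch below lists the row ranges that a given setting would produce. `rowGroupRanges` is a hypothetical helper, not part of hyparquet-writer. It assumes the writer fills row groups in order and puts any remaining rows in the last group, which this README does not state.

```javascript
// Hypothetical helper, not part of hyparquet-writer: list the row ranges
// that a given rowGroupSize would produce, assuming groups are filled in order.
function rowGroupRanges(numRows, rowGroupSize = 100000) {
  const ranges = []
  for (let start = 0; start < numRows; start += rowGroupSize) {
    // The last group holds whatever rows remain
    ranges.push({ start, end: Math.min(start + rowGroupSize, numRows) })
  }
  return ranges
}

const groups = rowGroupRanges(250000)
// groups.length is 3: rows [0, 100000), [100000, 200000), [200000, 250000)
```

Under that assumption, the three-row example above with `rowGroupSize: 1000` would fit in a single row group.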
## References
- https://github.com/hyparam/hyparquet
- https://github.com/hyparam/hyparquet-compressors
- https://github.com/apache/parquet-format
- https://github.com/apache/parquet-testing