# Hyparquet Writer
![hyparquet writer parakeet](hyparquet-writer.jpg)
[![npm](https://img.shields.io/npm/v/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![minzipped](https://img.shields.io/bundlephobia/minzip/hyparquet-writer)](https://www.npmjs.com/package/hyparquet-writer)
[![workflow status](https://github.com/hyparam/hyparquet-writer/actions/workflows/ci.yml/badge.svg)](https://github.com/hyparam/hyparquet-writer/actions)
[![mit license](https://img.shields.io/badge/License-MIT-orange.svg)](https://opensource.org/licenses/MIT)
![coverage](https://img.shields.io/badge/Coverage-95-darkred)
[![dependencies](https://img.shields.io/badge/Dependencies-1-blueviolet)](https://www.npmjs.com/package/hyparquet-writer?activeTab=dependencies)

Hyparquet Writer is a JavaScript library for writing [Apache Parquet](https://parquet.apache.org) files. It is designed to be lightweight, fast, and to store data efficiently. It is a companion to [hyparquet](https://github.com/hyparam/hyparquet), a JavaScript library for reading parquet files.

## Quick Start

To write a parquet file to an `ArrayBuffer`, call `parquetWriteBuffer` with a `columnData` argument. Each column in `columnData` should contain:

- `name`: the column name
- `data`: an array of same-type values
- `type`: the parquet schema type (optional)
```javascript
import { parquetWriteBuffer } from 'hyparquet-writer'

const arrayBuffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'BYTE_ARRAY' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```

Note: if `type` is not provided, it will be guessed from the data. The supported parquet types are:

- `BOOLEAN`
- `INT32`
- `INT64`
- `FLOAT`
- `DOUBLE`
- `BYTE_ARRAY`
- `FIXED_LEN_BYTE_ARRAY`

Strings are represented in parquet as type `BYTE_ARRAY`.

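As a rough illustration of how type guessing can work, here is a hypothetical sketch (not the library's actual inference logic) that picks a parquet type from the JavaScript values in a column:

```javascript
// Hypothetical sketch of guessing a parquet type from JavaScript values.
// Not the library's actual implementation.
function guessType(values) {
  // use the first non-null value as the representative
  const v = values.find(x => x !== null && x !== undefined)
  if (v === undefined) throw new Error('cannot guess type of empty column')
  if (typeof v === 'boolean') return 'BOOLEAN'
  if (typeof v === 'bigint') return 'INT64'
  if (typeof v === 'number') return Number.isInteger(v) ? 'INT32' : 'DOUBLE'
  if (typeof v === 'string') return 'BYTE_ARRAY'
  throw new Error('cannot guess type')
}

guessType([25, 30, 35]) // 'INT32'
guessType(['Alice', 'Bob']) // 'BYTE_ARRAY'
```

Note how a sketch like this mis-guesses `[1, 2.5]` as `INT32` from the first value alone, which is one reason to pass explicit types when you know them.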
### Node.js: Write to a Local Parquet File

To write a local parquet file in Node.js, use `parquetWriteFile` with arguments `filename` and `columnData`:

```javascript
const { parquetWriteFile } = await import('hyparquet-writer')

parquetWriteFile({
  filename: 'example.parquet',
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'BYTE_ARRAY' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
```

Note: hyparquet-writer is published as an ES module, so dynamic `import()` may be required on the command line.

## Advanced Usage

Options can be passed to `parquetWrite` to adjust parquet file writing behavior:

- `writer`: a writer object that receives the output bytes (e.g. `ByteWriter`)
- `compressed`: use snappy compression (default true)
- `statistics`: write column statistics (default true)
- `rowGroupSize`: number of rows in each row group (default 100000)
- `kvMetadata`: extra key-value metadata to be stored in the parquet footer
```javascript
import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()

const arrayBuffer = parquetWrite({
  writer,
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'BYTE_ARRAY' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
  compressed: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: [
    { key: 'key1', value: 'value1' },
    { key: 'key2', value: 'value2' },
  ],
})
```
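For instance, `rowGroupSize` determines how many row groups the written file contains, which affects read parallelism and memory use when the file is consumed. A quick sketch of the arithmetic:

```javascript
// Number of row groups produced for a table of numRows rows
const numRows = 250000
const rowGroupSize = 100000 // the default
const numRowGroups = Math.ceil(numRows / rowGroupSize)
console.log(numRowGroups) // 3
```

Smaller row groups let readers fetch and decode less data at a time, at the cost of more metadata overhead per file.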

### Converted Types

You can provide additional type hints by adding a `converted_type` to `columnData` elements:

```javascript
import { parquetWrite } from 'hyparquet-writer'

parquetWrite({
  columnData: [
    {
      name: 'dates',
      data: [new Date(1000000), new Date(2000000)],
      type: 'INT64',
      converted_type: 'TIMESTAMP_MILLIS',
    },
    {
      name: 'json',
      data: [{ foo: 'bar' }, { baz: 3 }, 'imastring'],
      type: 'BYTE_ARRAY',
      converted_type: 'JSON',
    },
  ],
})
```

Most converted types will be auto-detected if you provide data without explicit types. However, it is still recommended to provide type information when possible: auto-detection can fail or mis-guess (zero rows throws an exception, floats might be typed as int, etc.).
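For example, per the parquet format specification, `TIMESTAMP_MILLIS` stores each value in the underlying `INT64` column as milliseconds since the Unix epoch, which is exactly what `Date.prototype.getTime()` returns:

```javascript
// A Date written with type INT64 / converted_type TIMESTAMP_MILLIS
// is stored as milliseconds since the Unix epoch
const date = new Date(1000000)
const millis = BigInt(date.getTime())
console.log(millis) // 1000000n
```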

## References

- https://github.com/hyparam/hyparquet
- https://github.com/hyparam/hyparquet-compressors
- https://github.com/apache/parquet-format
- https://github.com/apache/parquet-testing