
Hyparquet Writer

Hyparquet Writer is a JavaScript library for writing Apache Parquet files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to the hyparquet library, a JavaScript library for reading parquet files.

Quick Start

To write a parquet file to an ArrayBuffer, call parquetWriteBuffer with a columnData argument. Each column in columnData should contain:

  • name: the column name
  • data: an array of same-type values
  • type: the parquet schema type (optional)

import { parquetWriteBuffer } from 'hyparquet-writer'

const arrayBuffer = parquetWriteBuffer({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})

Note: if type is not provided, it will be guessed from the data. The supported parquet physical types are:

  • BOOLEAN
  • INT32
  • INT64
  • FLOAT
  • DOUBLE
  • BYTE_ARRAY
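
As a sketch of what type guessing might look like, the hypothetical helper below maps JavaScript values onto the physical types above. This is illustrative only, not hyparquet-writer's actual inference logic; note also that the STRING type seen in the examples is a parquet logical annotation stored as BYTE_ARRAY.

```javascript
// Hypothetical sketch of type inference: map a JS value to a parquet
// physical type. Not the library's real implementation.
function guessParquetType(value) {
  if (typeof value === 'boolean') return 'BOOLEAN'
  if (typeof value === 'bigint') return 'INT64'
  if (typeof value === 'number') {
    // Integers that fit in 32 bits become INT32, everything else DOUBLE
    return Number.isInteger(value) && Math.abs(value) < 2 ** 31 ? 'INT32' : 'DOUBLE'
  }
  // Strings and raw bytes are both stored as BYTE_ARRAY
  if (typeof value === 'string' || value instanceof Uint8Array) return 'BYTE_ARRAY'
  throw new Error('cannot guess parquet type for ' + value)
}

console.log(guessParquetType(25))      // INT32
console.log(guessParquetType(3.14))    // DOUBLE
console.log(guessParquetType('Alice')) // BYTE_ARRAY
```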

Write to a Local Parquet File in Node.js

To write a local parquet file in Node.js, use parquetWriteFile with filename and columnData arguments:

const { parquetWriteFile } = await import('hyparquet-writer')

parquetWriteFile({
  filename: 'example.parquet',
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})

Note: hyparquet-writer is published as an ES module, so dynamic import() may be required on the command line.

Advanced Usage

Options can be passed to parquetWrite to adjust parquet file writing behavior:

  • writer: a generic writer object
  • compression: use snappy compression (default true)
  • statistics: write column statistics (default true)
  • rowGroupSize: number of rows in each row group (default 100000)
  • kvMetadata: extra key-value metadata to be stored in the parquet footer

import { ByteWriter, parquetWrite } from 'hyparquet-writer'

const writer = new ByteWriter()
parquetWrite({
  writer,
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
  compression: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: {
    'key1': 'value1',
    'key2': 'value2',
  },
})
const arrayBuffer = writer.getBuffer()
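
A note on rowGroupSize: rows are chunked into groups of at most rowGroupSize rows, with the last group possibly smaller. The number of row groups a write produces is plain arithmetic (this is not library code, just a sketch of the relationship):

```javascript
// Rows are split into ceil(numRows / rowGroupSize) row groups,
// the last one possibly partial. 100000 mirrors the documented default.
function rowGroupCount(numRows, rowGroupSize = 100000) {
  return Math.ceil(numRows / rowGroupSize)
}

console.log(rowGroupCount(250000))     // 3 groups at the default size
console.log(rowGroupCount(3000, 1000)) // 3 groups of 1000 rows each
```

Smaller row groups allow readers to skip more data using column statistics, at the cost of a larger footer and more per-group overhead.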

References