Apache Parquet file writer in JavaScript

Hyparquet Writer

Hyparquet Writer is a JavaScript library for writing Apache Parquet files. It is designed to be lightweight and fast, and to store data efficiently. It is a companion to hyparquet, a JavaScript library for reading Parquet files.

Usage

Call parquetWrite with a columnData argument. Each column in columnData should contain:

  • name: the column name
  • data: an array of same-type values
  • type: the parquet schema type (optional, type guessed from data if not provided)

Example:

import { parquetWrite } from 'hyparquet-writer'

const arrayBuffer = parquetWrite({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
})
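
Data often arrives as an array of row objects rather than as columns. The sketch below shows one way to pivot rows into the columnData shape described above. Note that rowsToColumnData is a hypothetical helper, not part of hyparquet-writer. It assumes every row has the same keys as the first row, and it leaves out type so that the type is guessed from the data.

```javascript
// Hypothetical helper (not part of hyparquet-writer): pivot row objects
// into the [{ name, data }] shape that columnData expects.
// Assumes every row has the same keys as the first row.
function rowsToColumnData(rows) {
  if (rows.length === 0) return []
  return Object.keys(rows[0]).map(name => ({
    name,
    data: rows.map(row => row[name]),
  }))
}

const rows = [
  { name: 'Alice', age: 25 },
  { name: 'Bob', age: 30 },
  { name: 'Charlie', age: 35 },
]

// One entry per key, each holding that key's values in row order
const columnData = rowsToColumnData(rows)
```

The resulting array can be passed to parquetWrite as columnData. To control the schema type, add a type field to each column after pivoting.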

Options

Options can be passed to parquetWrite to change parquet file properties:

  • compression: use snappy compression (default true)
  • statistics: write column statistics (default true)
  • rowGroupSize: number of rows in each row group (default 100000)
  • kvMetadata: extra key-value metadata

import { parquetWrite } from 'hyparquet-writer'

const arrayBuffer = parquetWrite({
  columnData: [
    { name: 'name', data: ['Alice', 'Bob', 'Charlie'], type: 'STRING' },
    { name: 'age', data: [25, 30, 35], type: 'INT32' },
  ],
  compression: false,
  statistics: false,
  rowGroupSize: 1000,
  kvMetadata: {
    'key1': 'value1',
    'key2': 'value2',
  },
})
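
To get a feel for rowGroupSize, the following sketch estimates how many row groups a file would contain. It assumes rows are split into consecutive groups of at most rowGroupSize rows, with any remainder in the last group. estimateRowGroups is a hypothetical helper for illustration, not a library function.

```javascript
// Hypothetical helper (not part of hyparquet-writer): estimate the number
// of row groups, assuming rows are split into consecutive groups of at
// most rowGroupSize rows.
function estimateRowGroups(numRows, rowGroupSize = 100000) {
  if (!Number.isInteger(rowGroupSize) || rowGroupSize <= 0) {
    throw new Error('rowGroupSize must be a positive integer')
  }
  return Math.ceil(numRows / rowGroupSize)
}

// 250,000 rows with the default size of 100,000
const withDefault = estimateRowGroups(250000)
// The same rows with rowGroupSize: 1000, as in the example above
const withSmallGroups = estimateRowGroups(250000, 1000)
```

Under this assumption, a smaller rowGroupSize produces more row groups for the same number of rows.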

References