# hyparquet

![hyparquet parakeet](hyparquet.jpg)

[![npm](https://img.shields.io/npm/v/hyparquet)](https://www.npmjs.com/package/hyparquet)
[![minzipped](https://img.shields.io/bundlephobia/minzip/hyparquet)](https://www.npmjs.com/package/hyparquet)
[![workflow status](https://github.com/hyparam/hyparquet/actions/workflows/ci.yml/badge.svg)](https://github.com/hyparam/hyparquet/actions)
[![mit license](https://img.shields.io/badge/License-MIT-orange.svg)](https://opensource.org/licenses/MIT)
![coverage](https://img.shields.io/badge/Coverage-96-darkred)
[![dependencies](https://img.shields.io/badge/Dependencies-0-blueviolet)](https://www.npmjs.com/package/hyparquet?activeTab=dependencies)

Dependency free since 2023!

## What is hyparquet?

**Hyparquet** is a JavaScript library for parsing [Apache Parquet](https://parquet.apache.org) files in the browser. Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for storing large datasets. Hyparquet is designed to read parquet files efficiently over http, so that parquet files in cloud storage can be queried directly from the browser without needing a server.

 - Works in browsers and node.js
 - Pure JavaScript, no dependencies
 - Supports all parquet types, encodings, and compression codecs
 - Minimizes data fetching using HTTP range requests
 - Includes TypeScript definitions

## Parquet Viewer

**Try hyparquet online**: Drag and drop your parquet file onto [hyperparam.app](https://hyperparam.app) to view it directly in your browser. This service is powered by hyparquet's in-browser capabilities.

[![hyperparam parquet viewer](./hyperparam.png)](https://hyperparam.app/)

## Quick Start

### Browser Example

In the browser use `asyncBufferFromUrl` to wrap a url for reading asynchronously over the network.
It is recommended that you filter by row and column to limit fetch size:

```javascript
const { asyncBufferFromUrl, parquetReadObjects } = await import('https://cdn.jsdelivr.net/npm/hyparquet/src/hyparquet.min.js')

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
const file = await asyncBufferFromUrl({ url }) // wrap url for async fetching
const data = await parquetReadObjects({
  file,
  columns: ['Breed Name', 'Lifespan'],
  rowStart: 10,
  rowEnd: 20,
})
```

### Node.js Example

To read the contents of a local parquet file in a node.js environment use `asyncBufferFromFile`:

```javascript
const { asyncBufferFromFile, parquetReadObjects } = await import('hyparquet')

const file = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })
```

Note: hyparquet is published as an ES module, so dynamic `import()` may be required for old versions of node.

## Parquet Writing

To create parquet files from javascript, check out the [hyparquet-writer](https://github.com/hyparam/hyparquet-writer) package.

## Advanced Usage

### Reading Metadata

You can read just the metadata, including the schema and data statistics, using the `parquetMetadataAsync` function. This is useful for getting the schema, number of rows, and column names without reading the entire file.

```javascript
import { asyncBufferFromUrl, parquetMetadataAsync, parquetSchema } from 'hyparquet'

const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
const file = await asyncBufferFromUrl({ url })
const metadata = await parquetMetadataAsync(file)
// Get total number of rows (convert bigint to number)
const numRows = Number(metadata.num_rows)
// Get nested table schema
const schema = parquetSchema(metadata)
// Get top-level column header names
const columnNames = schema.children.map(e => e.element.name)
```

### AsyncBuffer

Hyparquet requires a `file` argument of type `AsyncBuffer`. An `AsyncBuffer` is like a JavaScript `ArrayBuffer`, except that its `slice` method may return a `Promise<ArrayBuffer>`. This makes it a convenient way to represent a remote file.

```typescript
type Awaitable<T> = T | Promise<T>
interface AsyncBuffer {
  byteLength: number
  slice(start: number, end?: number): Awaitable<ArrayBuffer>
}
```

In most cases, use `asyncBufferFromUrl` or `asyncBufferFromFile` to create an `AsyncBuffer` for hyparquet.
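
If neither helper fits, the interface is small enough to implement by hand. Below is a minimal sketch, assuming the data is available as a browser `File` or `Blob` (for example from a drag-and-drop event). The name `asyncBufferFromBlob` is hypothetical and used only for illustration; it is not part of hyparquet's API.

```javascript
/**
 * Wrap a Blob (or File) as an AsyncBuffer-shaped object.
 * Hypothetical helper for illustration, not a hyparquet export.
 */
function asyncBufferFromBlob(blob) {
  return {
    byteLength: blob.size,
    // Blob.slice is lazy: bytes are only read when arrayBuffer() is called
    slice(start, end) {
      return blob.slice(start, end).arrayBuffer()
    },
  }
}
```

The returned object could then be passed as the `file` argument in the same way as the buffers created by the built-in helpers.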

#### asyncBufferFromUrl

If you want to read a parquet file remotely over http, use `asyncBufferFromUrl` to wrap an http url as an `AsyncBuffer` using http range requests.

 - Pass `requestInit` option to provide additional fetch headers for authentication (optional)
 - Pass `byteLength` if you know the file size to save a round trip HEAD request (optional)

```typescript
import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'

const url = 'https://s3.hyperparam.app/wiki_en.parquet'
const requestInit = { headers: { Authorization: 'Bearer my_token' } } // auth header
const byteLength = 415958713 // optional
const file: AsyncBuffer = await asyncBufferFromUrl({ url, requestInit, byteLength })
const data = await parquetReadObjects({ file })
```
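
For intuition, each `slice(start, end)` on a url-backed buffer corresponds to one HTTP range request. The sketch below shows how an exclusive `end` offset maps to the inclusive `Range` header defined by HTTP. `rangeHeader` and `fetchSlice` are hypothetical helpers that assume non-negative offsets; hyparquet's actual implementation may differ.

```javascript
// Build a Range header value from slice-style offsets.
// HTTP byte ranges are inclusive, while slice() ends are exclusive.
function rangeHeader(start, end) {
  return end === undefined ? `bytes=${start}-` : `bytes=${start}-${end - 1}`
}

// Fetch one byte range, merging in any caller-supplied fetch options.
async function fetchSlice(url, start, end, requestInit = {}) {
  const headers = new Headers(requestInit.headers)
  headers.set('Range', rangeHeader(start, end))
  const res = await fetch(url, { ...requestInit, headers })
  if (!res.ok) throw new Error(`fetch failed: ${res.status}`)
  return res.arrayBuffer()
}
```

A server that honors the header replies with status 206 and only the requested bytes. A server that ignores it replies with 200 and the whole file, which this sketch does not guard against.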

#### asyncBufferFromFile

If you are in a node.js environment, use `asyncBufferFromFile` to wrap a local file as an `AsyncBuffer`:

```typescript
import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet'

const file: AsyncBuffer = await asyncBufferFromFile('example.parquet')
const data = await parquetReadObjects({ file })
```

#### ArrayBuffer

You can provide an `ArrayBuffer` anywhere that an `AsyncBuffer` is expected. This is useful if you already have the entire parquet file in memory.
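
This works because a plain `ArrayBuffer` already has `byteLength` and `slice`, so code written against `AsyncBuffer` can handle both. As an illustrative sketch, the hypothetical helper below checks for the 4-byte `PAR1` magic that the Parquet format places at the end of a non-encrypted file, and accepts either kind of buffer.

```javascript
// Hypothetical helper for illustration, not a hyparquet export.
async function hasParquetMagic(file) {
  if (file.byteLength < 4) return false
  // await handles both a synchronous ArrayBuffer and a Promise<ArrayBuffer>
  const tail = await file.slice(file.byteLength - 4, file.byteLength)
  return new TextDecoder().decode(tail) === 'PAR1'
}
```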

### parquetRead vs parquetReadObjects

#### parquetReadObjects

`parquetReadObjects` is a convenience wrapper around `parquetRead` that returns the complete rows as `Promise<Record<string, any>[]>`. This is the simplest way to read parquet files.

```typescript
parquetReadObjects({ file }): Promise<Record<string, any>[]>
```

#### parquetRead

`parquetRead` is the "base" function for reading parquet files.
It returns a `Promise<void>` that resolves when the file has been read, or rejects if an error occurs.
Data is delivered via the `onComplete`, `onChunk`, or `onPage` callbacks passed as arguments.

The reason for this design is that parquet is a column-oriented format, and returning data in row-oriented format requires transposing the column data. This is an expensive operation in javascript. If you don't pass in an `onComplete` argument to `parquetRead`, hyparquet will skip this transpose step and save memory.
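
To make the cost concrete, here is a rough sketch of the column-to-row transpose that a row-oriented result implies. `transposeColumns` is a hypothetical helper for illustration, not hyparquet's internal code, and it assumes all columns have the same length.

```javascript
// Turn an array of columns into an array of rows.
function transposeColumns(columns) {
  const numRows = columns.length ? columns[0].length : 0
  const rows = new Array(numRows)
  for (let i = 0; i < numRows; i++) {
    // one new array is allocated per row
    const row = new Array(columns.length)
    for (let c = 0; c < columns.length; c++) row[c] = columns[c][i]
    rows[i] = row
  }
  return rows
}

console.log(transposeColumns([[1, 2, 3], ['a', 'b', 'c']]))
// [[1, 'a'], [2, 'b'], [3, 'c']]
```

Every cell is visited once and every row needs its own allocation, so for large tables this step adds both time and memory on top of decoding.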

### Chunk Streaming

The `onChunk` callback receives column-oriented data as soon as it is ready. `onChunk` always receives top-level columns, including structs, assembled as a single column. This may require waiting for all sub-columns to load before assembly can occur.

The `onPage` callback receives column-oriented page data as soon as it is ready. `onPage` will NOT assemble struct columns and always receives individual sub-column data. Note that `onPage` _will_ assemble nested lists.

In some cases, `onPage` can return data sooner than `onChunk`.

```typescript
interface ColumnData {
  columnName: string
  columnData: ArrayLike<any>
  rowStart: number
  rowEnd: number
}
await parquetRead({
  file,
  onChunk(chunk: ColumnData) {
    console.log('chunk', chunk)
  },
  onPage(chunk: ColumnData) {
    console.log('page', chunk)
  },
})
```
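
One way to consume these callbacks is to accumulate values per column as chunks arrive. The sketch below builds such an `onChunk` handler. `createColumnCollector` is a hypothetical helper, and it assumes that `rowStart` is the absolute index of the chunk's first row, as suggested by the `ColumnData` fields above.

```javascript
// Hypothetical helper for illustration, not a hyparquet export.
function createColumnCollector() {
  /** @type {Map<string, any[]>} */
  const columns = new Map()
  return {
    columns,
    onChunk({ columnName, columnData, rowStart }) {
      let values = columns.get(columnName)
      if (!values) {
        values = []
        columns.set(columnName, values)
      }
      // place values by row index, so chunk arrival order does not matter
      for (let i = 0; i < columnData.length; i++) {
        values[rowStart + i] = columnData[i]
      }
    },
  }
}
```

Under these assumptions, passing `collector.onChunk` as the `onChunk` option of `parquetRead` would leave the column-oriented data in `collector.columns` once the returned promise resolves.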

### Returned row format

By default, the `onComplete` callback receives an **array** of values for each row: `[value]`. If you would prefer each row to be an **object**, `{ columnName: value }`, set the `rowFormat` option to `'object'`.

```javascript
import { parquetRead } from 'hyparquet'

await parquetRead({
  file,
  rowFormat: 'object',
  onComplete: data => console.log(data),
})
```

The `parquetReadObjects` function defaults to `rowFormat: 'object'`.

### Binary columns

Hyparquet defaults to decoding binary columns as utf8 text strings. A parquet `BYTE_ARRAY` column may contain arbitrary binary data or utf8 encoded text data. In theory, a column should be annotated as [LogicalType](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) STRING if it contains utf8 text. But in practice, many parquet files omit this annotation. Hyparquet's default decoding behavior can be disabled by setting the `utf8` option to `false`. The `utf8` option only affects `BYTE_ARRAY` columns _without_ an annotation.
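
The difference matters for columns that hold real binary data. The sketch below is not hyparquet's decoder; it only illustrates what the `utf8` choice means for one value, assuming that raw bytes are surfaced as a `Uint8Array` when decoding is off. `decodeByteArray` is a hypothetical name.

```javascript
const decoder = new TextDecoder() // utf-8 by default

// Decode a BYTE_ARRAY value as text, or pass the bytes through untouched.
function decodeByteArray(bytes, utf8 = true) {
  return utf8 ? decoder.decode(bytes) : bytes
}

const textBytes = new Uint8Array([0x68, 0x69]) // "hi"
const imageBytes = new Uint8Array([0xff, 0xd8]) // JPEG header, not valid utf8

console.log(decodeByteArray(textBytes)) // 'hi'
console.log(decodeByteArray(imageBytes)) // replacement characters, original bytes are lost
console.log(decodeByteArray(imageBytes, false)) // the original Uint8Array
```

If a file stores images, embeddings, or other binary payloads in unannotated `BYTE_ARRAY` columns, setting `utf8` to `false` avoids this lossy conversion.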

If Hyparquet detects a [GeoParquet](https://geoparquet.org/) file, any geospatial column will be marked with the GEOMETRY or GEOGRAPHY logical type and decoded to GeoJSON geometries. Set the `geoparquet` option to `false` to disable this behavior.

## Compression

By default, hyparquet supports uncompressed and snappy-compressed parquet files.
To support the full range of parquet compression codecs (gzip, brotli, zstd, etc.), use the [hyparquet-compressors](https://github.com/hyparam/hyparquet-compressors) package.

```javascript
import { parquetReadObjects } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

const data = await parquetReadObjects({ file, compressors })
```

| Codec         | hyparquet | with hyparquet-compressors |
|---------------|-----------|----------------------------|
| Uncompressed  | ✅        | ✅                         |
| Snappy        | ✅        | ✅                         |
| GZip          | ❌        | ✅                         |
| LZO           | ❌        | ✅                         |
| Brotli        | ❌        | ✅                         |
| LZ4           | ❌        | ✅                         |
| ZSTD          | ❌        | ✅                         |
| LZ4_RAW       | ❌        | ✅                         |

## References

 - https://github.com/apache/parquet-format
 - https://github.com/apache/parquet-testing
 - https://github.com/apache/thrift
 - https://github.com/apache/arrow
 - https://github.com/dask/fastparquet
 - https://github.com/duckdb/duckdb
 - https://github.com/google/snappy
 - https://github.com/hyparam/hightable
 - https://github.com/hyparam/hysnappy
 - https://github.com/hyparam/hyparquet-compressors
 - https://github.com/ironSource/parquetjs
 - https://github.com/zhipeng-jia/snappyjs

Sample project that shows how to build a parquet viewer using hyparquet, react, and [HighTable](https://github.com/hyparam/hightable):

 - Hyparquet Demo: [https://hyparam.github.io/demos/hyparquet/](https://hyparam.github.io/demos/hyparquet/)
 - Hyparquet Demo Source Code: [https://github.com/hyparam/demos/tree/master/hyparquet](https://github.com/hyparam/demos/tree/master/hyparquet)


## Contributions

Contributions are welcome!
If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

Hyparquet development is supported by an open-source grant from Hugging Face :hugs: