**Hyparquet** is a JavaScript library for parsing [Apache Parquet](https://parquet.apache.org) files in the browser. Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for storing large datasets. Hyparquet is designed to read parquet files efficiently over http, so that parquet files in cloud storage can be queried directly from the browser without needing a server.
**Try hyparquet online**: Drag and drop your parquet file onto [hyperparam.app](https://hyperparam.app) to view it directly in your browser. This service is powered by hyparquet's in-browser capabilities.
You can read just the metadata, including schema and data statistics using the `parquetMetadataAsync` function. This is useful for getting the schema, number of rows, and column names without reading the entire file.
Hyparquet requires an argument `file` of type `AsyncBuffer`. An `AsyncBuffer` is similar to a js `ArrayBuffer` but the `slice` method can return async `Promise<ArrayBuffer>`. This makes it a useful way to represent a remote file.
`parquetReadObjects` is a convenience wrapper around `parquetRead` that returns the complete rows as `Promise<Record<string, any>[]>`. This is the simplest way to read parquet files.
The reason for this design is that parquet is a column-oriented format, and returning data in row-oriented format requires transposing the column data. This is an expensive operation in javascript. If you don't pass in an `onComplete` argument to `parquetRead`, hyparquet will skip this transpose step and save memory.
The `onChunk` callback returns column-oriented data as it is ready. `onChunk` will always return top-level columns, including structs, assembled as a single column. This may require waiting for multiple sub-columns to all load before assembly can occur.
The `onPage` callback returns column-oriented page data as it is ready. `onPage` will NOT assemble struct columns and will always return individual sub-column data. Note that `onPage`_will_ assemble nested lists.
In some cases, `onPage` can return data sooner than `onChunk`.
By default, the `onComplete` function returns an **array** of values for each row: `[value]`. If you would prefer each row to be an **object**: `{ columnName: value }`, set the option `rowFormat` to `'object'`.
Hyparquet defaults to decoding binary columns as utf8 text strings. A parquet `BYTE_ARRAY` column may contain arbitrary binary data or utf8 encoded text data. In theory, a column should be annotated as [LogicalType](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) STRING if it contains utf8 text. But in practice, many parquet files omit this annotation. Hyparquet's default decoding behavior can be disabled by setting the `utf8` option to `false`. The `utf8` option only affects `BYTE_ARRAY` columns _without_ an annotation.
If Hyparquet detects a [GeoParquet](https://geoparquet.org/) file, any geospatial column will be marked with the GEOMETRY or GEOGRAPHY logical type and decoded to GeoJSON geometries. Set the `geoparquet` option to `false` to disable this behavior.
By default, hyparquet supports uncompressed and snappy-compressed parquet files.
To support the full range of parquet compression codecs (gzip, brotli, zstd, etc), use the [hyparquet-compressors](https://github.com/hyparam/hyparquet-compressors) package.