* Parquet Query Planner: plan byte ranges, pre-fetch in parallel.
- parquetPlan() returns a list of byte ranges to fetch.
- prefetchAsyncBuffer() pre-fetches all byte ranges in parallel;
  it throws an exception if a non-pre-fetched slice is requested later.
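A minimal sketch of the planner idea described above. The function name prefetchAsyncBuffer comes from the log; the body and the shape of the plan entries (`{ startByte, endByte }`) are illustrative assumptions, not hyparquet's actual implementation:

```javascript
// Hypothetical sketch: start all planned range reads concurrently, and serve
// later slice() calls from the pre-fetched chunks. Throws if a requested slice
// was not covered by the plan.
function prefetchAsyncBuffer(file, ranges) {
  // kick off all range reads in parallel
  const chunks = ranges.map(({ startByte, endByte }) => ({
    startByte,
    endByte,
    promise: file.slice(startByte, endByte),
  }))
  return {
    byteLength: file.byteLength,
    slice(start, end) {
      const chunk = chunks.find(c => c.startByte <= start && end <= c.endByte)
      if (!chunk) throw new Error(`slice(${start}, ${end}) was not pre-fetched`)
      // translate into the chunk's local coordinates
      return chunk.promise.then(buf =>
        buf.slice(start - chunk.startByte, end - chunk.startByte))
    },
  }
}
```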
* types must be the first element. Spotted by publint.dev
* Package test for exports
* Test package.json for string exports
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
* pass custom fetch function to utils
it can be used to implement retry logic.
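One way such a custom fetch could add retry logic is sketched below. The wrapper name fetchWithRetry and its parameters are hypothetical, not part of hyparquet's API; the log only says a custom fetch function can be passed in:

```javascript
// Illustrative sketch: wrap a base fetch function so transient failures are
// retried before giving up. The resulting function has the same (url, init)
// signature as fetch and can be passed wherever a custom fetch is accepted.
function fetchWithRetry(baseFetch, maxRetries = 3) {
  return async function (url, init) {
    let lastError
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const res = await baseFetch(url, init)
        if (res.ok) return res
        lastError = new Error(`fetch failed with status ${res.status}`)
      } catch (err) {
        lastError = err
      }
    }
    throw lastError
  }
}
```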
* Update src/utils.js
Co-authored-by: Kenny Daniel <platypii@gmail.com>
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
Do this by passing rowGroupStart and rowGroupEnd for the rows to
fetch within a row group. If a page is entirely outside those bounds,
we can skip the page. Replaces rowLimit.
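The skip condition described above can be sketched as a small overlap check. The helper name and signature are illustrative, assuming pages carry a starting row index and a row count:

```javascript
// Hypothetical helper: a page is worth reading only if its row range
// [pageStart, pageStart + pageRowCount) overlaps the requested window
// [rowGroupStart, rowGroupEnd). Pages entirely outside can be skipped.
function pageInBounds(pageStart, pageRowCount, rowGroupStart, rowGroupEnd) {
  const pageEnd = pageStart + pageRowCount
  return pageEnd > rowGroupStart && pageStart < rowGroupEnd
}
```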
* Support endpoints that don't support range requests in asyncBufferFromUrl
Before this commit, asyncBufferFromUrl assumed that the body of any
successful response was equivalent to the range it requested. If the
origin server does not support HTTP range requests, this assumption is
usually wrong and leads to parsing failures.
This commit makes asyncBufferFromUrl adjust its behaviour based on the
status code of the response:
- if 200, we got the whole parquet file as the response: save it and
  use the resulting ArrayBuffer to serve all future slice calls.
- if 206, we got a range response and can return it directly.
I have also included test cases to ensure that such responses are
handled correctly, and tweaked existing mocks to include the relevant
status code.
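The 200-vs-206 handling described above can be sketched as follows. This is a simplified illustration, not hyparquet's exact code; the helper name rangeFetch and its signature are assumptions:

```javascript
// Illustrative sketch: request a byte range, then branch on the status code.
// 200 means the server ignored the Range header and sent the whole file,
// so we slice locally; 206 means the body is exactly the requested range.
async function rangeFetch(url, start, end, customFetch = fetch) {
  const res = await customFetch(url, { headers: { range: `bytes=${start}-${end - 1}` } })
  if (res.status === 200) {
    // whole-file response: keep the full ArrayBuffer and slice it ourselves
    const whole = await res.arrayBuffer()
    return whole.slice(start, end)
  }
  if (res.status === 206) {
    // partial content: the body is already the requested range
    return res.arrayBuffer()
  }
  throw new Error(`fetch failed: ${res.status}`)
}
```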
* Fix all lint warnings
* replace switch with if-else
* implement ParquetQueryFilter types
* implement parquetQuery filter tests
* implement parquetQuery filter
* filter before ordering
* apply filters before sorting/slicing
* format types
* add deep equality utility
* document and format equals utility
* use deep equality checks
* update filter tests
* support more types for equality
* make $not unary
* ensure arrays are correctly compared
* support both forms of $not
* add operator tests
* Filter operator tests
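The filter commits above mention deep equality and a $not operator that works in both its unary-operator form and with a literal value. A hedged sketch of those two pieces, with an illustrative operator set (the operator names mirror the log; the implementation details are assumptions, not hyparquet's code):

```javascript
// Deep equality for primitives, arrays, and plain objects.
function equals(a, b) {
  if (a === b) return true
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((v, i) => equals(v, b[i]))
  }
  if (a && b && typeof a === 'object' && typeof b === 'object') {
    const keys = Object.keys(a)
    return keys.length === Object.keys(b).length && keys.every(k => equals(a[k], b[k]))
  }
  return false
}

// Match a value against a MongoDB-style condition: either an operator object
// ({ $gt, $lt, $in, $not, ... }) or a literal compared with deep equality.
function matchValue(value, condition) {
  if (condition && typeof condition === 'object' && !Array.isArray(condition)) {
    return Object.entries(condition).every(([op, arg]) => {
      if (op === '$gt') return value > arg
      if (op === '$lt') return value < arg
      if (op === '$in') return arg.some(x => equals(value, x))
      // unary $not: arg may itself be an operator object or a literal
      if (op === '$not') return !matchValue(value, arg)
      return false
    })
  }
  return equals(value, condition)
}
```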
---------
Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
Co-authored-by: Kenny Daniel <platypii@gmail.com>
* Enable readColumn to read all rows
* Refactor readColumn to use hasRowLimit
* Simplify hasRowLimit condition
* Check less common condition first
* add readColumn test files
* implement readColumn tests for undefined rowLimits
* remove unused variable
* return early if no metadata is present
* address tsc warnings
* add comparison
* clarify that undefined is valid for rowLimit
* remove test files
* verify edge case works when rowLimit is undefined
* add test cases for readColumn
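The hasRowLimit idea from the commits above, sketched very loosely: an undefined rowLimit means "read all rows", so the limit check only applies when a limit was actually given. The function name and the page representation here are hypothetical stand-ins:

```javascript
// Illustrative sketch: read values page by page, stopping early only when a
// row limit was provided. rowLimit === undefined reads every row.
function readColumnValues(pages, rowLimit) {
  const hasRowLimit = rowLimit !== undefined
  const values = []
  for (const page of pages) {
    if (hasRowLimit && values.length >= rowLimit) break
    values.push(...page)
  }
  // trim any overshoot from the last page
  return hasRowLimit ? values.slice(0, rowLimit) : values
}
```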
---------
Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
* build types before publishing to npm
* use prepare instead of prepublishOnly, and make it clear that we only build types
  The doc for prepare vs prepublishOnly is here: https://docs.npmjs.com/cli/v8/using-npm/scripts
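A prepare script that only builds type declarations might look like the fragment below. This is illustrative: the exact tsc flags and output directory are assumptions, not copied from hyparquet's package.json.

```json
{
  "scripts": {
    "prepare": "tsc --declaration --emitDeclarationOnly --outDir types"
  }
}
```

Unlike prepublishOnly, prepare also runs on a plain `npm install` from a git checkout, so the generated types exist in both cases.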
* no jsx in this lib
* relative imports from the root, so that it works from types/
* remove unused hyparquet.d.ts + report differences to jsdoc in files
* try to understand if this is the cause of the failing CI check
tsc fails: https://github.com/hyparam/hyparquet/actions/runs/12040954822/job/33571851170?pr=46
* Revert "try to understand if this is the cause of the failing CI check"
This reverts commit 5e2fc8ca179064369de71793ab1cda3facefddc7.
* not sure what happened, but we just need to ensure the types are created correctly
* increment version
* Explicitly export types for use in downstream typescript projects
* Use new typescript jsdoc imports for smaller package
* Combine some files and use @import jsdoc
* use the local typescript
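The @import JSDoc tag mentioned above was added in TypeScript 5.5: it imports names for type-checking only, so no runtime import statement lands in the published JavaScript. A small illustrative example (ParquetReadOptions and ./types.js are stand-in names, not necessarily hyparquet's):

```javascript
/** @import { ParquetReadOptions } from './types.js' */

/**
 * The @import comment above is type-only: it is invisible at runtime,
 * but lets tsc resolve ParquetReadOptions for the annotation below.
 * @param {ParquetReadOptions} options
 * @returns {string}
 */
function describeRead(options) {
  return `reading ${options.file}`
}
```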
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>