* Parquet Query Planner: plan byte ranges, pre-fetch in parallel.
- parquetPlan() returns the lists of byte ranges to fetch.
- prefetchAsyncBuffer() pre-fetches all byte ranges in parallel and
throws an exception if a slice that was not pre-fetched is requested later.
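
The pre-fetch behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: prefetchRanges, the { startByte, endByte } range shape, and the minimal file object (byteLength plus slice) are hypothetical names chosen for the example.

```javascript
/**
 * Sketch: fetch every planned byte range in parallel, then serve slices
 * only from the fetched data. All names here are hypothetical.
 *
 * @param {{ byteLength: number, slice: (start: number, end?: number) => Promise<ArrayBuffer> | ArrayBuffer }} file
 * @param {{ startByte: number, endByte: number }[]} ranges
 */
function prefetchRanges(file, ranges) {
  // Start all fetches at once instead of awaiting them one by one.
  const chunks = ranges.map(({ startByte, endByte }) => ({
    startByte,
    endByte,
    data: Promise.resolve(file.slice(startByte, endByte)),
  }))
  return {
    byteLength: file.byteLength,
    async slice(start, end = file.byteLength) {
      // Find a pre-fetched range that fully contains the request.
      const chunk = chunks.find(c => c.startByte <= start && end <= c.endByte)
      if (!chunk) throw new Error(`range ${start}-${end} was not pre-fetched`)
      const buffer = await chunk.data
      // Translate file offsets into offsets within the fetched chunk.
      return buffer.slice(start - chunk.startByte, end - chunk.startByte)
    },
  }
}
```

Failing loudly on an unplanned slice makes a gap in the plan visible immediately, rather than hiding it behind an extra network request.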
* types must be the first element in the package.json exports. Spotted by publint.dev
* Package test for exports
* Test package.json for string exports
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
* pass custom fetch function to utils
it can be used to implement retry logic.
* Update src/utils.js
Co-authored-by: Kenny Daniel <platypii@gmail.com>
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
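
As an illustration of the retry logic mentioned above, a fetch-like function can be wrapped before it is handed to the utilities. This is only a sketch: withRetry and its retries / delayMs options are hypothetical, and how the custom fetch function is passed in is not shown in this log.

```javascript
/**
 * Sketch: wrap a fetch-like function so failed requests are retried.
 * `withRetry` and its options are hypothetical names.
 */
function withRetry(fetchFn, { retries = 3, delayMs = 100 } = {}) {
  return async function retryingFetch(url, init) {
    let lastError
    for (let attempt = 0; attempt <= retries; attempt++) {
      try {
        const res = await fetchFn(url, init)
        // Retry on server errors; hand every other response to the caller.
        if (res.status < 500) return res
        lastError = new Error(`HTTP ${res.status}`)
      } catch (err) {
        lastError = err // network-level failure
      }
      if (attempt < retries) {
        // Exponential backoff between attempts.
        await new Promise(resolve => setTimeout(resolve, delayMs * 2 ** attempt))
      }
    }
    throw lastError
  }
}
```

Because the wrapper has the same call shape as fetch, it should be usable anywhere a custom fetch function is accepted.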
Do this by passing rowGroupStart and rowGroupEnd for the rows to
fetch within a row group. If a page is outside those bounds, we can
skip the page. Replaces rowLimit.
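
The page-skipping rule can be sketched as an interval overlap test. The sketch assumes that rows are indexed relative to the start of the row group, that rowGroupEnd is exclusive, and that per-page row counts are already known; pagesToRead and pageRowCounts are hypothetical names, not the library's.

```javascript
/**
 * Sketch: decide which pages of a row group overlap the requested rows.
 * `pageRowCounts` is a hypothetical list of per-page row counts, in file order.
 * Assumes rowGroupEnd is exclusive.
 */
function pagesToRead(pageRowCounts, rowGroupStart, rowGroupEnd) {
  const selected = []
  let pageStart = 0
  pageRowCounts.forEach((numRows, index) => {
    const pageEnd = pageStart + numRows
    // Keep the page only if [pageStart, pageEnd) overlaps [rowGroupStart, rowGroupEnd).
    if (pageEnd > rowGroupStart && pageStart < rowGroupEnd) selected.push(index)
    pageStart = pageEnd
  })
  return selected
}

// Three pages of 100 rows each; rows 150..249 touch only pages 1 and 2.
console.log(pagesToRead([100, 100, 100], 150, 250)) // [ 1, 2 ]
```

Unlike a single rowLimit, a start and end pair lets pages before the requested rows be skipped as well as pages after them.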
* Support endpoints that don't support range requests in asyncBufferFromUrl
Before this commit, asyncBufferFromUrl assumed that the body of any
successful response was equivalent to the range it requested. If
the origin server does not support HTTP range requests, this
assumption is usually wrong and leads to parsing failures.
This commit makes asyncBufferFromUrl adjust its behaviour
based on the status code of the response:
- if 200, we got the whole parquet file as the response. Save it and
use the resulting ArrayBuffer to serve all future slice calls.
- if 206, we got a range response and we can just return that.
I have also included some test cases to ensure that such responses are
handled correctly, and tweaked the existing mocks to include
the relevant status code.
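
The status-code handling described above might look roughly like the sketch below. It is not the code from this commit: rangeAwareBuffer is a hypothetical name, the byte length is assumed to be known up front, and the fetch function is injected as a parameter so the logic can be exercised with a mock.

```javascript
/**
 * Sketch: serve slice() calls from a URL whether or not the server
 * honours Range requests. `rangeAwareBuffer` is a hypothetical name.
 */
function rangeAwareBuffer(url, byteLength, fetchFn = fetch) {
  /** @type {Promise<ArrayBuffer> | undefined} */
  let whole // set once a 200 response has delivered the entire file
  return {
    byteLength,
    async slice(start, end = byteLength) {
      if (whole) return (await whole).slice(start, end)
      // HTTP Range end offsets are inclusive, hence end - 1.
      const res = await fetchFn(url, { headers: { Range: `bytes=${start}-${end - 1}` } })
      if (res.status === 206) {
        // Partial content: the body is exactly the requested range.
        return res.arrayBuffer()
      }
      if (res.status === 200) {
        // Range header ignored: the body is the whole file. Cache it for later slices.
        whole = res.arrayBuffer()
        return (await whole).slice(start, end)
      }
      throw new Error(`unexpected status ${res.status}`)
    },
  }
}
```

Once a 200 response has been cached, later slice calls are served from memory and make no further requests.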
* Fix all lint warnings
* replace switch with if-else
* implement ParquetQueryFilter types
* implement parquetQuery filter tests
* implement parquetQuery filter
* filter before ordering
* apply filters before sorting/slicing
* format types
* add deep equality utility
* document and format equals utility
* use deep equality checks
* update filter tests
* support more types for equality
* make $not unary
* ensure arrays are correctly compared
* support both forms of $not
* add operator tests
* Filter operator tests
---------
Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
Co-authored-by: Kenny Daniel <platypii@gmail.com>
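
For illustration, a deep equality utility of the kind these filter commits mention could be sketched as below. The name deepEqual and the set of supported types (primitives, dates, arrays, plain objects) are assumptions for the example and may not match the utility that was actually added.

```javascript
/**
 * Sketch of a deep equality check for filter values.
 * `deepEqual` is a hypothetical name; the supported types are assumptions.
 */
function deepEqual(a, b) {
  if (a === b) return true
  // Dates are compared by timestamp, and only against other dates.
  if (a instanceof Date || b instanceof Date) {
    return a instanceof Date && b instanceof Date && a.getTime() === b.getTime()
  }
  // Anything else that is not a non-null object has already failed ===.
  if (!a || !b || typeof a !== 'object' || typeof b !== 'object') return false
  // An array never equals a plain object.
  if (Array.isArray(a) !== Array.isArray(b)) return false
  if (Array.isArray(a)) {
    // Arrays must match in length and element by element, in order.
    return a.length === b.length && a.every((item, i) => deepEqual(item, b[i]))
  }
  // Plain objects must have the same keys with deeply equal values.
  const keysA = Object.keys(a)
  if (keysA.length !== Object.keys(b).length) return false
  return keysA.every(key =>
    Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  )
}
```

A check like this lets an equality filter match rows whose values are arrays or nested objects, which strict === would never consider equal.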
* Enable readColumn to read all rows
* Refactor readColumn to use hasRowLimit
* Simplify hasRowLimit condition
* Check less common condition first
* add readColumn test files
* implement readColumn tests for undefined rowLimits
* remove unused variable
* return early if no metadata is present
* address tsc warnings
* add comparison
* clarify that undefined is valid for rowLimit
* remove test files
* verify edge case works when rowLimit is undefined
* add test cases for readColumn
---------
Co-authored-by: Brian Park <park-brian@users.noreply.github.com>