Commit Graph

226 Commits

Author SHA1 Message Date
Kenny Daniel
0e6d7dee6f
Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75)
* Parquet Query Planner: plan byte ranges, pre-fetch in parallel.

 - parquetPlan() that returns lists of byte ranges to fetch.
 - prefetchAsyncBuffer() pre-fetches all byte ranges in parallel.
   throws exception if non-pre-fetched slice is requested later.
2025-04-30 00:49:40 -07:00
Kenny Daniel
1d65bc68bb
Move imports to non-exported functions (yields smaller types) 2025-04-27 12:31:39 -07:00
Kenny Daniel
9a04cbccd3
Convert unsigned types 2025-04-14 23:20:58 -07:00
Sylvain Lesage
447a58eca4
pass custom fetch function to utils (#73)
* pass custom fetch function to utils

it can be used to implement retry logic.

* Update src/utils.js

Co-authored-by: Kenny Daniel <platypii@gmail.com>

---------

Co-authored-by: Kenny Daniel <platypii@gmail.com>
2025-04-15 00:37:05 +02:00
Kenny Daniel
8161983962
Publish v1.12.0 2025-04-11 04:43:11 -07:00
Kenny Daniel
11c7d8174a
LogicalType DECIMAL is not a LogicalTypeSimple 2025-04-11 00:21:55 -07:00
Kenny Daniel
f5274904b7
Add onPage callback to parquetRead 2025-04-10 23:29:58 -07:00
Kenny Daniel
90be536e05
Group selection of a row group into an object 2025-04-10 22:36:10 -07:00
Kenny Daniel
4df7095ab4
Group column decoding params into an object 2025-04-10 19:30:25 -07:00
Kenny Daniel
4645e34f97
Re-order types.d.ts to put important apis up front 2025-04-10 16:33:50 -07:00
Kenny Daniel
8740f14450
Publish v1.11.1 2025-04-09 17:35:49 -07:00
Kenny Daniel
972402d083
Fix handling of dictionary pages from parquet.net 2025-04-09 17:26:47 -07:00
Kenny Daniel
655444bcde
Fix continued data pages
Parquet allows consecutive pages to continue a previously assembled
list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.
2025-04-07 17:40:23 -07:00
Kenny Daniel
6c225888c4
Skip unnecessary pages
Do this by passing rowGroupStart and rowGroupEnd for the rows to
fetch within a rowgroup. If a page is outside those bounds, we can
skip the page. Replaces rowLimit.
2025-04-07 00:40:17 -07:00
Kenny Daniel
ba74d58dd3
Test for reading the last row of files 2025-04-06 22:05:58 -07:00
Kenny Daniel
f9a10da20b
Type thrift 2025-04-03 19:20:00 -07:00
Kenny Daniel
b38b65f7c7
Refactor assembleLists to take a schemaPath 2025-04-02 23:39:55 -07:00
Kenny Daniel
1247f5d606
Split out readPage
Remove dict-page-offset-zero test because it's a malformed parquet file.
2025-04-02 20:27:10 -07:00
Kenny Daniel
6af6f43f44
Export more constants 2025-03-31 23:20:22 -07:00
Kenny Daniel
85e1af66c1
Fix thrift parsing of crypto_metadata 2025-03-25 15:42:48 -07:00
Kenny Daniel
9c201e00e5
Use defaultInitialFetchSize for both metadata and cachedAsyncBuffer 2025-03-20 16:05:41 -07:00
Kenny Daniel
4b094178b3
Move toVarInt to tests 2025-03-20 12:37:24 -07:00
Kenny Daniel
95c47f243d
Add minSize parameter to cachedAsyncBuffer 2025-03-17 23:54:20 -07:00
Kenny Daniel
f37b2aea9f
for is faster than forEach 2025-03-17 10:18:01 -07:00
Kenny Daniel
d7f8d39de3
Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67)
Refactored readColumn to avoid `concat` operations.
This avoids extra copying and allocation.
2025-03-10 23:33:47 -07:00
Kenny Daniel
a9467f6c3d
Remove selfCopyBytes in favor of copyBytes 2025-03-10 20:56:00 -07:00
Kenny Daniel
4bbc7742e5
Comment out unnecessary length read in readRleBitPackedHybrid 2025-03-09 11:20:58 -07:00
Kenny Daniel
791a847e42
Revert "Simplify relative import paths"
This reverts commit e590f4ee03263460a389bdd29678015727cdcd5a.
2025-03-06 08:54:32 -08:00
Kenny Daniel
e590f4ee03
Simplify relative import paths 2025-03-05 14:03:17 -08:00
Kenny Daniel
2456cdc85f
Better error messages 2025-03-04 11:05:22 -08:00
Kenny Daniel
2a302702d4
Fix handling of boolean rle 2025-02-22 13:29:29 -08:00
Johan Levin
bf268e141c
Use prepended length for bit-packed hybrid bool columns (#62) 2025-02-19 11:07:49 -08:00
Kenny Daniel
36d8ea2e1d
Fix handling of signed decimals (#60) 2025-02-07 18:52:48 -08:00
Sean Lynch
725545731d
Support endpoints that don't support range requests in asyncBufferFromUrl (#57)
* Support endpoints that don't support range requests in asyncBufferFromUrl

Before this commit asyncBufferFromUrl assumes that the body of whatever
successful response it gets is equivalent to the range it requested. If
the origin server does not support HTTP range requests then this
assumption is usually wrong and will lead to parsing failures.

This commit changes asyncBufferFromUrl to change its behaviour slightly
based on the status code in the response:
- if 200 then we got the whole parquet file as the response. Save it and
  use the resulting ArrayBuffer to serve all future slice calls.
- if 206 then we got a range response and we can just return that.

I have also included some test cases to ensure that such responses are
handled correctly and also tweaked other existing mocks to also include
the relevant status code.

* Fix all lint warnings

* replace switch with if-else
2025-01-16 11:55:05 -08:00
Brian Park
c9727a4246
Query filter (#56)
* implement ParquetQueryFilter types

* implement parquetQuery filter tests

* implement parquetQuery filter

* filter before ordering

* apply filters before sorting/slicing

* format types

* add deep equality utility

* document and format equals utility

* use deep equality checks

* update filter tests

* support more types for equality

* make $not unary

* ensure arrays are correctly compared

* support both forms of $not

* add operator tests

* Filter operator tests

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-12-21 15:23:57 -08:00
Brian Park
9992316748
Enable readColumn to read all rows (#53)
* Enable readColumn to read all rows

* Refactor readColumn to use hasRowLimit

* Simplify hasRowLimit condition

* Check less common condition first

* add readColumn test files

* implement readColumn tests for undefined rowLimits

* remove unused variable

* return early if no metadata is present

* address tsc warnings

* add comparison

* clarify that undefined is valid for rowLimit

* remove test files

* verify edge case works when rowLimit is undefined

* add test cases for readColumn

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
2024-12-19 18:08:22 -08:00
Kenny Daniel
7ce11ad844
Validate url for asyncBufferFromUrl 2024-12-17 09:25:54 -08:00
Kenny Daniel
f762dba6a8
Use ParquetReadOptions type for parquetRead options (#51) 2024-12-10 16:16:52 -08:00
Sylvain Lesage
09ae9400c5
build types before publishing to npm (#46)
* build types before publishing to npm

* use prepare instead of prepublishOnly + make it clear that we only build types

doc for prepare vs prepublishOnly is here: https://docs.npmjs.com/cli/v8/using-npm/scripts

* no jsx in this lib

* relative imports from the root, so that it works from types/

* remove unused hyparquet.d.ts + report differences to jsdoc in files

* try to understand if this is the cause of the failing CI check

tsc fails: https://github.com/hyparam/hyparquet/actions/runs/12040954822/job/33571851170?pr=46

* Revert "try to understand if this is the cause of the failing CI check"

This reverts commit 5e2fc8ca179064369de71793ab1cda3facefddc7.

* not sure what happens, but we just need to ensure the types are created correctly

* increment version

* Explicitly export types for use in downstream typescript projects

* Use new typescript jsdoc imports for smaller package

* Combine some files and use @import jsdoc

* use the local typescript

---------

Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-12-02 17:47:42 +01:00
Kenny Daniel
82b25df871
Update dependencies 2024-11-29 14:11:04 -08:00
Carlos Bardasano
9aacd15349
Fix metadata timestamp conversion (#45)
* Fix metadata timestamp conversion

* Remove redundant check

Co-authored-by: Kenny Daniel <platypii@gmail.com>

---------

Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-11-29 13:48:06 -08:00
Sylvain Lesage
e738643745
Publish 1.6.1 - fix type of utils and update the doc (#44)
* Publish 1.6.1 - fix types

* update the doc
2024-11-22 21:19:34 +01:00
Sylvain Lesage
6ec836dac5
pass requestInit to fetch utils (#34)
* pass requestInit to fetch utils

It will allow authentication

* add tests
2024-11-08 22:22:30 +01:00
Kenny Daniel
5d21b09b7a
Export cachedAsyncBuffer 2024-10-16 01:29:31 -07:00
Kenny Daniel
bb202dfdeb
Fix util export types 2024-10-04 21:46:35 -07:00
Matthew Peveler
c8033621f9 Export additional types
Signed-off-by: Matthew Peveler <mpeveler@timescale.com>
2024-10-04 21:30:32 -07:00
Kenny Daniel
e6301a8bc8
demo: use web worker for parquet parsing to avoid blocking main thread 2024-09-25 02:22:30 -07:00
Kenny Daniel
9d49dabc15
Query api 2024-09-24 21:01:04 -07:00
Kenny Daniel
9a2f4fdcba
Update dependencies 2024-09-24 16:54:44 -07:00
Kenny Daniel
df02229407
Promisified parquetReadObjects function 2024-08-20 11:30:39 -07:00