Commit Graph

239 Commits

Author SHA1 Message Date
Kenny Daniel
113fbe3ca8
Move hyparquet.js to index.js (#84) 2025-05-30 15:47:02 -07:00
Kenny Daniel
f23b2757ca
Node-specific exports for asyncBufferFromFile (#80)
* Update README for asyncBufferFromFile
* Simplify asyncBufferFromFile
2025-05-30 13:01:20 -07:00
Kenny Daniel
4e2f76df09
parquetReadAsync (#83) 2025-05-26 17:27:15 -07:00
Kenny Daniel
bf6ac3b644
Simplify error messages 2025-05-25 17:49:39 -07:00
Kenny Daniel
9a9519f0b7
Add more details to QueryPlan. (#82)
- Add metadata
- Add rowStart and rowEnd
- Add columns
- Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan
- Rename ranges to fetches
- Rename numRows to groupRows in ColumnDecoder
2025-05-25 15:21:58 -07:00
Kenny Daniel
78f19aaf6d
Move readRowGroup to rowgroup.js 2025-05-25 14:55:30 -07:00
Kenny Daniel
5e846e6b13
Fix page continuation issue #81 2025-05-24 23:35:48 -07:00
Kenny Daniel
5d8f17903e
Omit onComplete from parquetReadObjects 2025-05-22 23:07:04 -07:00
Kenny Daniel
e4504c524d
Fast filter by loading each row group and filtering until rowEnd (#78) 2025-05-19 02:13:37 -07:00
Kenny Daniel
c6bc226180
parquetSchema more generic argument 2025-05-17 17:52:48 -07:00
Kenny Daniel
8dbb74ac78
Convert logical strings 2025-05-15 23:44:09 -07:00
mike-iqmo
dbf3065f8e
Addresses issues with duckdb use of delta encodings (#77)
* Shrunk size of test data
2025-05-14 16:28:58 -07:00
Kenny Daniel
d1d08d02bd
Throw exception for unsupported file_path 2025-05-03 20:38:04 -07:00
Kenny Daniel
0e6d7dee6f
Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75)
- parquetPlan() returns lists of byte ranges to fetch.
- prefetchAsyncBuffer() pre-fetches all byte ranges in parallel and
  throws an exception if a non-pre-fetched slice is requested later.
2025-04-30 00:49:40 -07:00
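The plan-then-prefetch behaviour described in this commit can be illustrated roughly as follows. This is a minimal sketch, not hyparquet's actual implementation: `prefetchRanges`, `sliceFromChunks`, and the `{ startByte, endByte }` range shape are hypothetical names chosen for illustration, and `file` is assumed to expose a `slice(start, end)` method that returns an ArrayBuffer or a Promise of one.

```javascript
// Hypothetical sketch; names and shapes are assumptions, not hyparquet's API.

/**
 * Fetch every planned byte range in parallel.
 * @param {{ slice: (start: number, end: number) => ArrayBuffer | Promise<ArrayBuffer> }} file
 * @param {{ startByte: number, endByte: number }[]} ranges
 */
async function prefetchRanges(file, ranges) {
  // Start all requests before awaiting any of them
  const buffers = await Promise.all(
    ranges.map(({ startByte, endByte }) => file.slice(startByte, endByte))
  )
  return ranges.map((range, i) => ({ ...range, buffer: buffers[i] }))
}

/**
 * Serve a slice from pre-fetched chunks, throwing if no chunk covers it.
 */
function sliceFromChunks(chunks, start, end) {
  const chunk = chunks.find(c => c.startByte <= start && end <= c.endByte)
  if (!chunk) throw new Error(`slice ${start}-${end} was not pre-fetched`)
  // Translate absolute file offsets into offsets within the chunk
  return chunk.buffer.slice(start - chunk.startByte, end - chunk.startByte)
}
```

Throwing on an uncovered slice, rather than silently fetching it, makes a gap in the plan visible immediately.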
Kenny Daniel
1d65bc68bb
Move imports to non-exported functions (yields smaller types) 2025-04-27 12:31:39 -07:00
Kenny Daniel
9a04cbccd3
Convert unsigned types 2025-04-14 23:20:58 -07:00
Sylvain Lesage
447a58eca4
pass custom fetch function to utils (#73)
The custom fetch function can be used to implement retry logic.

* Update src/utils.js

Co-authored-by: Kenny Daniel <platypii@gmail.com>

2025-04-15 00:37:05 +02:00
Kenny Daniel
8161983962
Publish v1.12.0 2025-04-11 04:43:11 -07:00
Kenny Daniel
11c7d8174a
LogicalType DECIMAL is not a LogicalTypeSimple 2025-04-11 00:21:55 -07:00
Kenny Daniel
f5274904b7
Add onPage callback to parquetRead 2025-04-10 23:29:58 -07:00
Kenny Daniel
90be536e05
Group selection of a row group into an object 2025-04-10 22:36:10 -07:00
Kenny Daniel
4df7095ab4
Group column decoding params into an object 2025-04-10 19:30:25 -07:00
Kenny Daniel
4645e34f97
Re-order types.d.ts to put important apis up front 2025-04-10 16:33:50 -07:00
Kenny Daniel
8740f14450
Publish v1.11.1 2025-04-09 17:35:49 -07:00
Kenny Daniel
972402d083
Fix handling of dictionary pages from parquet.net 2025-04-09 17:26:47 -07:00
Kenny Daniel
655444bcde
Fix continued data pages
Parquet allows consecutive pages to continue a previously assembled
list. This broke in hyparquet 1.9.0. Adds a continued_page.parquet test.
2025-04-07 17:40:23 -07:00
Kenny Daniel
6c225888c4
Skip unnecessary pages
Do this by passing rowGroupStart and rowGroupEnd for the rows to
fetch within a rowgroup. If a page is outside those bounds, we can
skip the page. Replaces rowLimit.
2025-04-07 00:40:17 -07:00
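The page-skipping idea in this commit can be sketched as an interval overlap check. This is an illustration only, not the code from the commit: `pagesToDecode` is a hypothetical helper, it assumes the row count of each page is known up front, and it treats rowGroupEnd as exclusive.

```javascript
// Hypothetical sketch; not hyparquet's code. Assumes rowGroupEnd is exclusive.

/**
 * Return the indexes of pages whose rows overlap [rowGroupStart, rowGroupEnd).
 * @param {number[]} pageRowCounts number of rows in each page, in file order
 * @param {number} rowGroupStart first row to read, relative to the row group
 * @param {number} rowGroupEnd row after the last row to read
 */
function pagesToDecode(pageRowCounts, rowGroupStart, rowGroupEnd) {
  const selected = []
  let pageStart = 0
  for (let i = 0; i < pageRowCounts.length; i++) {
    const pageEnd = pageStart + pageRowCounts[i]
    // A page outside the requested bounds can be skipped
    if (pageEnd > rowGroupStart && pageStart < rowGroupEnd) selected.push(i)
    pageStart = pageEnd
  }
  return selected
}
```

With three pages of 100 rows each, `pagesToDecode([100, 100, 100], 150, 250)` returns `[1, 2]`, so the first page is never decoded.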
Kenny Daniel
ba74d58dd3
Test for reading the last row of files 2025-04-06 22:05:58 -07:00
Kenny Daniel
f9a10da20b
Type thrift 2025-04-03 19:20:00 -07:00
Kenny Daniel
b38b65f7c7
Refactor assembleLists to take a schemaPath 2025-04-02 23:39:55 -07:00
Kenny Daniel
1247f5d606
Split out readPage
Remove dict-page-offset-zero test because it's a malformed parquet file.
2025-04-02 20:27:10 -07:00
Kenny Daniel
6af6f43f44
Export more constants 2025-03-31 23:20:22 -07:00
Kenny Daniel
85e1af66c1
Fix thrift parsing of crypto_metadata 2025-03-25 15:42:48 -07:00
Kenny Daniel
9c201e00e5
Use defaultInitialFetchSize for both metadata and cachedAsyncBuffer 2025-03-20 16:05:41 -07:00
Kenny Daniel
4b094178b3
Move toVarInt to tests 2025-03-20 12:37:24 -07:00
Kenny Daniel
95c47f243d
Add minSize parameter to cachedAsyncBuffer 2025-03-17 23:54:20 -07:00
Kenny Daniel
f37b2aea9f
for is faster than forEach 2025-03-17 10:18:01 -07:00
Kenny Daniel
d7f8d39de3
Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67)
Refactored readColumn to avoid `concat` operations.
This avoids extra copying and allocation.
2025-03-10 23:33:47 -07:00
Kenny Daniel
a9467f6c3d
Remove selfCopyBytes in favor of copyBytes 2025-03-10 20:56:00 -07:00
Kenny Daniel
4bbc7742e5
Comment out unnecessary length read in readRleBitPackedHybrid 2025-03-09 11:20:58 -07:00
Kenny Daniel
791a847e42
Revert "Simplify relative import paths"
This reverts commit e590f4ee03263460a389bdd29678015727cdcd5a.
2025-03-06 08:54:32 -08:00
Kenny Daniel
e590f4ee03
Simplify relative import paths 2025-03-05 14:03:17 -08:00
Kenny Daniel
2456cdc85f
Better error messages 2025-03-04 11:05:22 -08:00
Kenny Daniel
2a302702d4
Fix handling of boolean rle 2025-02-22 13:29:29 -08:00
Johan Levin
bf268e141c
Use prepended length for bit-packed hybrid bool columns (#62) 2025-02-19 11:07:49 -08:00
Kenny Daniel
36d8ea2e1d
Fix handling of signed decimals (#60) 2025-02-07 18:52:48 -08:00
Sean Lynch
725545731d
Support endpoints that don't support range requests in asyncBufferFromUrl (#57)
Before this commit, asyncBufferFromUrl assumed that the body of any
successful response was equivalent to the range it requested. If the
origin server does not support HTTP range requests, this assumption is
usually wrong and leads to parsing failures.

This commit makes asyncBufferFromUrl choose its behaviour based on the
status code of the response:
- if 200, the response is the whole parquet file. Save it and use the
  resulting ArrayBuffer to serve all future slice calls.
- if 206, the response is the requested range and can be returned as is.

This commit also adds test cases to ensure that such responses are
handled correctly, and updates existing mocks to include the relevant
status code.

* Fix all lint warnings

* replace switch with if-else
2025-01-16 11:55:05 -08:00
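The status-code handling described in this commit can be sketched as below. This is a simplified illustration, not the commit's actual code: `createRangeResolver`, `fromCache`, and `fromResponse` are hypothetical names, and the network request itself is left out so that only the 200 versus 206 decision is shown.

```javascript
// Hypothetical sketch; not the code from this commit.
function createRangeResolver() {
  /** @type {ArrayBuffer | undefined} */
  let wholeFile // set once a 200 response delivers the entire file
  return {
    // Return a slice of the cached file, or undefined if nothing is cached yet
    fromCache(start, end) {
      return wholeFile ? wholeFile.slice(start, end) : undefined
    },
    // Interpret one response body according to its HTTP status code
    fromResponse(status, body, start, end) {
      if (status === 200) {
        // Server ignored the Range header: keep the whole file for later slices
        wholeFile = body
        return body.slice(start, end)
      } else if (status === 206) {
        // Partial content: the body is exactly the requested range
        return body
      }
      throw new Error(`unexpected status ${status}`)
    },
  }
}
```

A caller would check `fromCache` before issuing a request, so a server without range support is hit at most once.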
Brian Park
c9727a4246
Query filter (#56)
* implement ParquetQueryFilter types

* implement parquetQuery filter tests

* implement parquetQuery filter

* filter before ordering

* apply filters before sorting/slicing

* format types

* add deep equality utility

* document and format equals utility

* use deep equality checks

* update filter tests

* support more types for equality

* make $not unary

* ensure arrays are correctly compared

* support both forms of $not

* add operator tests

* Filter operator tests

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-12-21 15:23:57 -08:00
Brian Park
9992316748
Enable readColumn to read all rows (#53)
* Refactor readColumn to use hasRowLimit

* Simplify hasRowLimit condition

* Check less common condition first

* add readColumn test files

* implement readColumn tests for undefined rowLimits

* remove unused variable

* return early if no metadata is present

* address tsc warnings

* add comparison

* clarify that undefined is valid for rowLimit

* remove test files

* verify edge case works when rowLimit is undefined

* add test cases for readColumn

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
2024-12-19 18:08:22 -08:00
Kenny Daniel
7ce11ad844
Validate url for asyncBufferFromUrl 2024-12-17 09:25:54 -08:00