Kenny Daniel
c0e0c7cfe5
Fix BYTE_STREAM_SPLIT with data page v2 and compression
2025-11-26 16:04:47 -08:00
Kenny Daniel
0a20750193
Pushdown filter (#141)
2025-11-21 03:07:56 -08:00
Kenny Daniel
c3a42b5bc9
Fix plan row boundaries
2025-11-21 00:25:30 -08:00
David Sisson
4b19b19268
feat: use GET to get bytelength as a fallback where HEAD is forbidden (#137)
* feat: use GET to get bytelength as a fallback where HEAD is forbidden
* add protection for servers that don't support partial requests
2025-11-03 12:42:35 -08:00
Sylvain Lesage
e8b1c8e570
Minimal support for GeoParquet (#133)
* Initial support for GeoParquet
* pr comments
* convert crs
* add test file + expected JSON files
* add sentence to README
* Apply suggestion from @platypii
Co-authored-by: Kenny Daniel <platypii@gmail.com>
* PR comments
* update README
* review comment
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2025-10-16 04:22:01 -04:00
Kenny Daniel
d701904253
Add well-known-binary decoder for geometry and geography (#131)
2025-09-30 11:45:39 -07:00
Kenny Daniel
8611663334
Custom string parser option (#129)
2025-09-26 19:07:25 -07:00
Sylvain Lesage
c6429d5abe
try to fix the types again (#120)
* try to fix the types again
* fix test (breaking)
* [breaking] only support object format for parquetReadObjects and parquetQuery
* remove internal types
* remove redundant test
* override __index__ with original data if present
Also: add comments to explain special cases.
* remove the need to slice arrays
* loosen the types to avoid code duplication
* always write the index, because the results should be consistent
* Revert "always write the index, because the results should be consistent"
This reverts commit fd4e3060674fa6e81bd32fc894d7c366103e004a.
2025-09-16 15:29:44 -07:00
Sylvain Lesage
709d6b41fc
fix a bug in parquetQuery, when rowFormat is 'array' (#118)
It silently provided an empty array, instead of throwing an Error, or
providing the data in rowFormat="object".
Here, I (silently) force the rowFormat to "object".
2025-09-05 09:55:21 +02:00
Kenny Daniel
a7bfab0e99
Fix high-precision decimal parsing (#116)
2025-09-01 11:24:20 -07:00
Kenny Daniel
6f5ac750cd
Publish v1.17.1
2025-07-02 15:51:58 -07:00
kroche98
ee192054b2
Skip plan for files with no rows (#98)
2025-07-02 15:46:32 -07:00
Kenny Daniel
8050e0e38d
Fix filter on unselected column (#95)
2025-06-30 01:47:05 -07:00
Kenny Daniel
ef8e1c8c71
Fix bug when encoding length is zero (#93)
2025-06-17 14:16:38 -07:00
Kenny Daniel
1f4e1f2f0b
Fix duckdb empty block (#91)
2025-06-13 00:39:01 -07:00
LiraNuna
8609192b23
Introduce 'custom parsers' option for decoding dates (#87)
2025-06-09 18:02:31 -07:00
LiraNuna
67ab9d5e1a
Plumb ColumnDecoder into convert (#86)
2025-06-03 13:47:55 -07:00
Kenny Daniel
113fbe3ca8
Move hyparquet.js to index.js (#84)
2025-05-30 15:47:02 -07:00
Kenny Daniel
f23b2757ca
Node-specific exports for asyncBufferFromFile (#80)
* Update README for asyncBufferFromFile
* Simplify asyncBufferFromFile
2025-05-30 13:01:20 -07:00
Kenny Daniel
bf6ac3b644
Simplify error messages
2025-05-25 17:49:39 -07:00
Kenny Daniel
9a9519f0b7
Add more details to QueryPlan. (#82)
- Add metadata
- Add rowStart and rowEnd
- Add columns
- Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan
- Rename ranges to fetches
- Rename numRows to groupRows in ColumnDecoder
2025-05-25 15:21:58 -07:00
Kenny Daniel
5e846e6b13
Fix page continuation issue #81
2025-05-24 23:35:48 -07:00
Kenny Daniel
e4504c524d
Fast filter by loading each row group and filtering until rowEnd (#78)
2025-05-19 02:13:37 -07:00
Kenny Daniel
c6bc226180
parquetSchema more generic argument
2025-05-17 17:52:48 -07:00
Kenny Daniel
8dbb74ac78
Convert logical strings
2025-05-15 23:44:09 -07:00
mike-iqmo
dbf3065f8e
Addresses issues with duckdb use of delta encodings (#77)
* Addresses issues with duckdb use of delta encodings
* Shrunk size of test data
2025-05-14 16:28:58 -07:00
Kenny Daniel
0e6d7dee6f
Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75)
* Parquet Query Planner: plan byte ranges, pre-fetch in parallel.
- parquetPlan() that returns lists of byte ranges to fetch.
- prefetchAsyncBuffer() pre-fetches all byte ranges in parallel.
throws exception if non-pre-fetched slice is requested later.
2025-04-30 00:49:40 -07:00
Kenny Daniel
b7db4653e7
Add another column to page_indexed test
2025-04-26 17:18:11 -07:00
Sylvain Lesage
7f0b57e265
types must be the first element (#74)
* types must be the first element. Spotted by publint.dev
* Package test for exports
* Test package.json for string exports
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2025-04-16 21:29:43 +02:00
Kenny Daniel
9a04cbccd3
Convert unsigned types
2025-04-14 23:20:58 -07:00
Sylvain Lesage
447a58eca4
pass custom fetch function to utils (#73)
* pass custom fetch function to utils
it can be used to implement retry logic.
* Update src/utils.js
Co-authored-by: Kenny Daniel <platypii@gmail.com>
---------
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2025-04-15 00:37:05 +02:00
Kenny Daniel
8161983962
Publish v1.12.0
2025-04-11 04:43:11 -07:00
Kenny Daniel
f5274904b7
Add onPage callback to parquetRead
2025-04-10 23:29:58 -07:00
Kenny Daniel
90be536e05
Group selection of a row group into an object
2025-04-10 22:36:10 -07:00
Kenny Daniel
4df7095ab4
Group column decoding params into an object
2025-04-10 19:30:25 -07:00
Kenny Daniel
4645e34f97
Re-order types.d.ts to put important apis up front
2025-04-10 16:33:50 -07:00
Kenny Daniel
972402d083
Fix handling of dictionary pages from parquet.net
2025-04-09 17:26:47 -07:00
Kenny Daniel
655444bcde
Fix continued data pages
Parquet allows consecutive pages to continue a previously assembled
list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.
2025-04-07 17:40:23 -07:00
Kenny Daniel
6c225888c4
Skip unnecessary pages
Do this by passing rowGroupStart and rowGroupEnd for the rows to
fetch within a rowgroup. If a page is outside those bounds, we can
skip the page. Replaces rowLimit.
2025-04-07 00:40:17 -07:00
Kenny Daniel
ba74d58dd3
Test for reading the last row of files
2025-04-06 22:05:58 -07:00
Kenny Daniel
b38b65f7c7
Refactor assembleLists to take a schemaPath
2025-04-02 23:39:55 -07:00
Kenny Daniel
1247f5d606
Split out readPage
Remove dict-page-offset-zero test because it's a malformed parquet file.
2025-04-02 20:27:10 -07:00
Kenny Daniel
6af6f43f44
Export more constants
2025-03-31 23:20:22 -07:00
Kenny Daniel
85e1af66c1
Fix thrift parsing of crypto_metadata
2025-03-25 15:42:48 -07:00
Kenny Daniel
4b094178b3
Move toVarInt to tests
2025-03-20 12:37:24 -07:00
Kenny Daniel
95c47f243d
Add minSize parameter to cachedAsyncBuffer
2025-03-17 23:54:20 -07:00
Kenny Daniel
d7f8d39de3
Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67)
Refactored readColumn to avoid `concat` operations.
This avoids extra copying and allocation.
2025-03-10 23:33:47 -07:00
Kenny Daniel
2cd582ea5a
Remove unnecessary toJson in tests
2025-03-10 19:32:31 -07:00
Kenny Daniel
e590f4ee03
Simplify relative import paths
2025-03-05 14:03:17 -08:00
Kenny Daniel
2456cdc85f
Better error messages
2025-03-04 11:05:22 -08:00