hyparquet

mirror of https://github.com/asadbek064/hyparquet.git synced 2026-05-27 15:19:36 +00:00

Author	SHA1	Message	Date
Kenny Daniel	c0e0c7cfe5	Fix BYTE_STREAM_SPLIT with data page v2 and compression	2025-11-26 16:04:47 -08:00
Kenny Daniel	0a20750193	Pushdown filter (#141)	2025-11-21 03:07:56 -08:00
Kenny Daniel	c3a42b5bc9	Fix plan row boundaries	2025-11-21 00:25:30 -08:00
David Sisson	4b19b19268	feat: use GET to get bytelength as a fallback where HEAD is forbidden (#137) * feat: use GET to get bytelength as a fallback where HEAD is forbidden * add protection for servers that don't support partial requests	2025-11-03 12:42:35 -08:00
Sylvain Lesage	e8b1c8e570	Minimal support for GeoParquet (#133) * Initial support for GeoParquet * pr comments * convert crs * add test file + expected JSON files * add sentence to README * Apply suggestion from @platypii Co-authored-by: Kenny Daniel <platypii@gmail.com> * PR comments * update README * review comment --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-10-16 04:22:01 -04:00
Kenny Daniel	d701904253	Add well-known-binary decoder for geometry and geography (#131)	2025-09-30 11:45:39 -07:00
Kenny Daniel	8611663334	Custom string parser option (#129)	2025-09-26 19:07:25 -07:00
Sylvain Lesage	c6429d5abe	try to fix the types again (#120) * try to fix the types again * fix test (breaking) * [breaking] only support object format for parquetReadObjects and parquetQuery * remove internal types * remove redundant test * override __index__ with original data if present Also: add comments to explain special cases. * remove the need to slice arrays * loosen the types to avoid code duplication * always write the index, because the results should be consistent * Revert "always write the index, because the results should be consistent" This reverts commit fd4e3060674fa6e81bd32fc894d7c366103e004a.	2025-09-16 15:29:44 -07:00
Sylvain Lesage	709d6b41fc	fix a bug in parquetQuery, when rowFormat is 'array' (#118) It silently provided an empty array, instead of throwing an Error, or providing the data in rowFormat="object". Here, I (silently) force the rowFormat to "object".	2025-09-05 09:55:21 +02:00
Kenny Daniel	a7bfab0e99	Fix high-precision decimal parsing (#116)	2025-09-01 11:24:20 -07:00
Kenny Daniel	6f5ac750cd	Publish v1.17.1	2025-07-02 15:51:58 -07:00
kroche98	ee192054b2	Skip plan for files with no rows (#98)	2025-07-02 15:46:32 -07:00
Kenny Daniel	8050e0e38d	Fix filter on unselected column (#95)	2025-06-30 01:47:05 -07:00
Kenny Daniel	ef8e1c8c71	Fix bug when encoding length is zero (#93)	2025-06-17 14:16:38 -07:00
Kenny Daniel	1f4e1f2f0b	Fix duckdb empty block (#91)	2025-06-13 00:39:01 -07:00
LiraNuna	8609192b23	Introduce 'custom parsers' option for decoding dates (#87)	2025-06-09 18:02:31 -07:00
LiraNuna	67ab9d5e1a	Plumb ColumnDecoder into `convert` (#86)	2025-06-03 13:47:55 -07:00
Kenny Daniel	113fbe3ca8	Move hyparquet.js to index.js (#84)	2025-05-30 15:47:02 -07:00
Kenny Daniel	f23b2757ca	Node-specific exports for asyncBufferFromFile (#80) * Update README for asyncBufferFromFile * Simplify asyncBufferFromFile	2025-05-30 13:01:20 -07:00
Kenny Daniel	bf6ac3b644	Simplify error messages	2025-05-25 17:49:39 -07:00
Kenny Daniel	9a9519f0b7	Add more details to QueryPlan. (#82) - Add metadata - Add rowStart and rowEnd - Add columns - Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan - Rename ranges to fetches - Rename numRows to groupRows in ColumnDecoder	2025-05-25 15:21:58 -07:00
Kenny Daniel	5e846e6b13	Fix page continuation issue #81	2025-05-24 23:35:48 -07:00
Kenny Daniel	e4504c524d	Fast filter by loading each row group and filtering until rowEnd (#78)	2025-05-19 02:13:37 -07:00
Kenny Daniel	c6bc226180	parquetSchema more generic argument	2025-05-17 17:52:48 -07:00
Kenny Daniel	8dbb74ac78	Convert logical strings	2025-05-15 23:44:09 -07:00
mike-iqmo	dbf3065f8e	Addresses issues with duckdb use of delta encodings (#77) * Addresses issues with duckdb use of delta encodings * Shrunk size of test data	2025-05-14 16:28:58 -07:00
Kenny Daniel	0e6d7dee6f	Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75) * Parquet Query Planner: plan byte ranges, pre-fetch in parallel. - parquetPlan() that returns lists of byte ranges to fetch. - prefetchAsyncBuffer() pre-fetches all byte ranges in parallel. throws exception if non-pre-fetched slice is requested later.	2025-04-30 00:49:40 -07:00
Kenny Daniel	b7db4653e7	Add another column to page_indexed test	2025-04-26 17:18:11 -07:00
Sylvain Lesage	7f0b57e265	types must be the first element (#74) * types must be the first element. Spotted by publint.dev * Package test for exports * Test package.json for string exports --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-16 21:29:43 +02:00
Kenny Daniel	9a04cbccd3	Convert unsigned types	2025-04-14 23:20:58 -07:00
Sylvain Lesage	447a58eca4	pass custom fetch function to utils (#73) * pass custom fetch function to utils it can be used to implement retry logic. * Update src/utils.js Co-authored-by: Kenny Daniel <platypii@gmail.com> --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-15 00:37:05 +02:00
Kenny Daniel	8161983962	Publish v1.12.0	2025-04-11 04:43:11 -07:00
Kenny Daniel	f5274904b7	Add onPage callback to parquetRead	2025-04-10 23:29:58 -07:00
Kenny Daniel	90be536e05	Group selection of a row group into an object	2025-04-10 22:36:10 -07:00
Kenny Daniel	4df7095ab4	Group column decoding params into an object	2025-04-10 19:30:25 -07:00
Kenny Daniel	4645e34f97	Re-order types.d.ts to put important apis up front	2025-04-10 16:33:50 -07:00
Kenny Daniel	972402d083	Fix handling of dictionary pages from parquet.net	2025-04-09 17:26:47 -07:00
Kenny Daniel	655444bcde	Fix continued data pages. Parquet allows consecutive pages to continue a previously assembled list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.	2025-04-07 17:40:23 -07:00
Kenny Daniel	6c225888c4	Skip unnecessary pages. Do this by passing rowGroupStart and rowGroupEnd for the rows to fetch within a rowgroup. If a page is outside those bounds, we can skip the page. Replaces rowLimit.	2025-04-07 00:40:17 -07:00
Kenny Daniel	ba74d58dd3	Test for reading the last row of files	2025-04-06 22:05:58 -07:00
Kenny Daniel	b38b65f7c7	Refactor assembleLists to take a schemaPath	2025-04-02 23:39:55 -07:00
Kenny Daniel	1247f5d606	Split out readPage. Remove dict-page-offset-zero test because it's a malformed parquet file.	2025-04-02 20:27:10 -07:00
Kenny Daniel	6af6f43f44	Export more constants	2025-03-31 23:20:22 -07:00
Kenny Daniel	85e1af66c1	Fix thrift parsing of crypto_metadata	2025-03-25 15:42:48 -07:00
Kenny Daniel	4b094178b3	Move toVarInt to tests	2025-03-20 12:37:24 -07:00
Kenny Daniel	95c47f243d	Add minSize parameter to cachedAsyncBuffer	2025-03-17 23:54:20 -07:00
Kenny Daniel	d7f8d39de3	Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67) Refactored readColumn to avoid `concat` operations. This avoids extra copying and allocation.	2025-03-10 23:33:47 -07:00
Kenny Daniel	2cd582ea5a	Remove unnecessary toJson in tests	2025-03-10 19:32:31 -07:00
Kenny Daniel	e590f4ee03	Simplify relative import paths	2025-03-05 14:03:17 -08:00
Kenny Daniel	2456cdc85f	Better error messages	2025-03-04 11:05:22 -08:00

194 Commits