hyparquet

mirror of https://github.com/asadbek064/hyparquet.git synced 2026-02-22 04:11:32 +00:00

Author	SHA1	Message	Date
Kenny Daniel	d701904253	Add well-known-binary decoder for geometry and geography (#131)	2025-09-30 11:45:39 -07:00
Kenny Daniel	8611663334	Custom string parser option (#129)	2025-09-26 19:07:25 -07:00
Sylvain Lesage	c6429d5abe	try to fix the types again (#120) * try to fix the types again * fix test (breaking) * [breaking] only support object format for parquetReadObjects and parquetQuery * remove internal types * remove redundant test * override __index__ with original data if present. Also: add comments to explain special cases. * remove the need to slice arrays * loosen the types to avoid code duplication * always write the index, because the results should be consistent * Revert "always write the index, because the results should be consistent". This reverts commit fd4e3060674fa6e81bd32fc894d7c366103e004a.	2025-09-16 15:29:44 -07:00
Sylvain Lesage	709d6b41fc	fix a bug in parquetQuery, when rowFormat is 'array' (#118) It silently provided an empty array, instead of throwing an Error, or providing the data in rowFormat="object". Here, I (silently) force the rowFormat to "object".	2025-09-05 09:55:21 +02:00
Kenny Daniel	a7bfab0e99	Fix high-precision decimal parsing (#116)	2025-09-01 11:24:20 -07:00
Kenny Daniel	6f5ac750cd	Publish v1.17.1	2025-07-02 15:51:58 -07:00
kroche98	ee192054b2	Skip plan for files with no rows (#98)	2025-07-02 15:46:32 -07:00
Kenny Daniel	8050e0e38d	Fix filter on unselected column (#95)	2025-06-30 01:47:05 -07:00
Kenny Daniel	ef8e1c8c71	Fix bug when encoding length is zero (#93)	2025-06-17 14:16:38 -07:00
Kenny Daniel	1f4e1f2f0b	Fix duckdb empty block (#91)	2025-06-13 00:39:01 -07:00
LiraNuna	8609192b23	Introduce 'custom parsers' option for decoding dates (#87)	2025-06-09 18:02:31 -07:00
LiraNuna	67ab9d5e1a	Plumb ColumnDecoder into `convert` (#86)	2025-06-03 13:47:55 -07:00
Kenny Daniel	113fbe3ca8	Move hyparquet.js to index.js (#84)	2025-05-30 15:47:02 -07:00
Kenny Daniel	f23b2757ca	Node-specific exports for asyncBufferFromFile (#80) * Update README for asyncBufferFromFile * Simplify asyncBufferFromFile	2025-05-30 13:01:20 -07:00
Kenny Daniel	bf6ac3b644	Simplify error messages	2025-05-25 17:49:39 -07:00
Kenny Daniel	9a9519f0b7	Add more details to QueryPlan. (#82) - Add metadata - Add rowStart and rowEnd - Add columns - Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan - Rename ranges to fetches - Rename numRows to groupRows in ColumnDecoder	2025-05-25 15:21:58 -07:00
Kenny Daniel	5e846e6b13	Fix page continuation issue #81	2025-05-24 23:35:48 -07:00
Kenny Daniel	e4504c524d	Fast filter by loading each row group and filtering until rowEnd (#78)	2025-05-19 02:13:37 -07:00
Kenny Daniel	c6bc226180	parquetSchema more generic argument	2025-05-17 17:52:48 -07:00
Kenny Daniel	8dbb74ac78	Convert logical strings	2025-05-15 23:44:09 -07:00
mike-iqmo	dbf3065f8e	Addresses issues with duckdb use of delta encodings (#77) * Addresses issues with duckdb use of delta encodings * Shrunk size of test data	2025-05-14 16:28:58 -07:00
Kenny Daniel	0e6d7dee6f	Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75) * Parquet Query Planner: plan byte ranges, pre-fetch in parallel. - parquetPlan() that returns lists of byte ranges to fetch. - prefetchAsyncBuffer() pre-fetches all byte ranges in parallel. Throws exception if non-pre-fetched slice is requested later.	2025-04-30 00:49:40 -07:00
Kenny Daniel	b7db4653e7	Add another column to page_indexed test	2025-04-26 17:18:11 -07:00
Sylvain Lesage	7f0b57e265	types must be the first element (#74) * types must be the first element. Spotted by publint.dev * Package test for exports * Test package.json for string exports --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-16 21:29:43 +02:00
Kenny Daniel	9a04cbccd3	Convert unsigned types	2025-04-14 23:20:58 -07:00
Sylvain Lesage	447a58eca4	pass custom fetch function to utils (#73) * pass custom fetch function to utils. It can be used to implement retry logic. * Update src/utils.js Co-authored-by: Kenny Daniel <platypii@gmail.com> --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-15 00:37:05 +02:00
Kenny Daniel	8161983962	Publish v1.12.0	2025-04-11 04:43:11 -07:00
Kenny Daniel	f5274904b7	Add onPage callback to parquetRead	2025-04-10 23:29:58 -07:00
Kenny Daniel	90be536e05	Group selection of a row group into an object	2025-04-10 22:36:10 -07:00
Kenny Daniel	4df7095ab4	Group column decoding params into an object	2025-04-10 19:30:25 -07:00
Kenny Daniel	4645e34f97	Re-order types.d.ts to put important apis up front	2025-04-10 16:33:50 -07:00
Kenny Daniel	972402d083	Fix handling of dictionary pages from parquet.net	2025-04-09 17:26:47 -07:00
Kenny Daniel	655444bcde	Fix continued data pages. Parquet allows consecutive pages to continue a previously assembled list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.	2025-04-07 17:40:23 -07:00
Kenny Daniel	6c225888c4	Skip unnecessary pages. Do this by passing rowGroupStart and rowGroupEnd for the rows to fetch within a rowgroup. If a page is outside those bounds, we can skip the page. Replaces rowLimit.	2025-04-07 00:40:17 -07:00
Kenny Daniel	ba74d58dd3	Test for reading the last row of files	2025-04-06 22:05:58 -07:00
Kenny Daniel	b38b65f7c7	Refactor assembleLists to take a schemaPath	2025-04-02 23:39:55 -07:00
Kenny Daniel	1247f5d606	Split out readPage. Remove dict-page-offset-zero test because it's a malformed parquet file.	2025-04-02 20:27:10 -07:00
Kenny Daniel	6af6f43f44	Export more constants	2025-03-31 23:20:22 -07:00
Kenny Daniel	85e1af66c1	Fix thrift parsing of crypto_metadata	2025-03-25 15:42:48 -07:00
Kenny Daniel	4b094178b3	Move toVarInt to tests	2025-03-20 12:37:24 -07:00
Kenny Daniel	95c47f243d	Add minSize parameter to cachedAsyncBuffer	2025-03-17 23:54:20 -07:00
Kenny Daniel	d7f8d39de3	Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67) Refactored readColumn to avoid `concat` operations. This avoids extra copying and allocation.	2025-03-10 23:33:47 -07:00
Kenny Daniel	2cd582ea5a	Remove unnecessary toJson in tests	2025-03-10 19:32:31 -07:00
Kenny Daniel	e590f4ee03	Simplify relative import paths	2025-03-05 14:03:17 -08:00
Kenny Daniel	2456cdc85f	Better error messages	2025-03-04 11:05:22 -08:00
Kenny Daniel	2a302702d4	Fix handling of boolean rle	2025-02-22 13:29:29 -08:00
Johan Levin	bf268e141c	Use prepended length for bit-packed hybrid bool columns (#62)	2025-02-19 11:07:49 -08:00
Kenny Daniel	36d8ea2e1d	Fix handling of signed decimals (#60)	2025-02-07 18:52:48 -08:00
Kenny Daniel	5675560266	Use bigint literals	2025-02-07 17:50:34 -08:00
Sean Lynch	725545731d	Support endpoints that don't support range requests in `asyncBufferFromUrl` (#57) * Support endpoints that don't support range requests in asyncBufferFromUrl Before this commit asyncBufferFromUrl assumes that the body of whatever successful response it gets is equivalent to the range it requested. If the origin server does not support HTTP range requests then this assumption is usually wrong and will lead to parsing failures. This commit changes asyncBufferFromUrl to change its behaviour slightly based on the status code in the response: - if 200 then we got the whole parquet file as the response. Save it and use the resulting ArrayBuffer to serve all future slice calls. - if 206 then we got a range response and we can just return that. I have also included some test cases to ensure that such responses are handled correctly and also tweaked other existing mocks to also include the relevant status code. * Fix all lint warnings * replace switch with if-else	2025-01-16 11:55:05 -08:00
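The status-code handling described in commit 725545731d (#57) can be illustrated with a small sketch. This is not hyparquet's `asyncBufferFromUrl` source; `RangeResponse`, `RangeFetcher` and `makeRangeSlicer` are hypothetical names, and the fetcher is injected so the logic can run without a network. A 200 response is treated as the whole file and cached for every later slice, while a 206 response is returned as-is.

```typescript
// Hypothetical minimal response shape (a real fetch Response also satisfies it).
interface RangeResponse {
  status: number
  arrayBuffer(): Promise<ArrayBuffer>
}

// Hypothetical fetcher: requests bytes [start, end) of a remote file.
type RangeFetcher = (start: number, end: number) => Promise<RangeResponse>

// Sketch: returns a slice function that copes with servers ignoring Range.
function makeRangeSlicer(
  fetchRange: RangeFetcher
): (start: number, end: number) => Promise<ArrayBuffer> {
  // Holds the entire file once a server has answered with HTTP 200.
  let whole: ArrayBuffer | undefined

  return async function slice(start: number, end: number): Promise<ArrayBuffer> {
    // Once the full file is cached, serve every later slice locally.
    if (whole) return whole.slice(start, end)

    const res = await fetchRange(start, end)
    if (res.status === 200) {
      // Server ignored the Range header and sent the whole file:
      // keep it and cut out the requested bytes.
      whole = await res.arrayBuffer()
      return whole.slice(start, end)
    } else if (res.status === 206) {
      // Partial Content: the body is exactly the requested range.
      return res.arrayBuffer()
    }
    throw new Error(`fetch failed with status ${res.status}`)
  }
}
```

A real fetcher would send a header such as `Range: bytes=${start}-${end - 1}`, because HTTP byte ranges include their last byte. This sketch does not deduplicate concurrent requests, so several slices issued before the first 200 response arrives would each download the full file.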


189 Commits