hyparquet

mirror of https://github.com/asadbek064/hyparquet.git synced 2026-06-05 18:42:08 +00:00

Author	SHA1	Message	Date
Kenny Daniel	113fbe3ca8	Move hyparquet.js to index.js (#84)	2025-05-30 15:47:02 -07:00
Kenny Daniel	f23b2757ca	Node-specific exports for asyncBufferFromFile (#80) * Update README for asyncBufferFromFile * Simplify asyncBufferFromFile	2025-05-30 13:01:20 -07:00
Kenny Daniel	4e2f76df09	parquetReadAsync (#83)	2025-05-26 17:27:15 -07:00
Kenny Daniel	bf6ac3b644	Simplify error messages	2025-05-25 17:49:39 -07:00
Kenny Daniel	9a9519f0b7	Add more details to QueryPlan. (#82) - Add metadata - Add rowStart and rowEnd - Add columns - Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan - Rename ranges to fetches - Rename numRows to groupRows in ColumnDecoder	2025-05-25 15:21:58 -07:00
Kenny Daniel	78f19aaf6d	Move readRowGroup to rowgroup.js	2025-05-25 14:55:30 -07:00
Kenny Daniel	5e846e6b13	Fix page continuation issue #81	2025-05-24 23:35:48 -07:00
Kenny Daniel	5d8f17903e	Omit onComplete from parquetReadObjects	2025-05-22 23:07:04 -07:00
Kenny Daniel	e4504c524d	Fast filter by loading each row group and filtering until rowEnd (#78)	2025-05-19 02:13:37 -07:00
Kenny Daniel	c6bc226180	parquetSchema more generic argument	2025-05-17 17:52:48 -07:00
Kenny Daniel	8dbb74ac78	Convert logical strings	2025-05-15 23:44:09 -07:00
mike-iqmo	dbf3065f8e	Addresses issues with duckdb use of delta encodings (#77) * Addresses issues with duckdb use of delta encodings * Shrunk size of test data	2025-05-14 16:28:58 -07:00
Kenny Daniel	d1d08d02bd	Throw exception for unsupported file_path	2025-05-03 20:38:04 -07:00
Kenny Daniel	0e6d7dee6f	Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75) * Parquet Query Planner: plan byte ranges, pre-fetch in parallel. - parquetPlan() that returns lists of byte ranges to fetch. - prefetchAsyncBuffer() pre-fetches all byte ranges in parallel. throws exception if non-pre-fetched slice is requested later.	2025-04-30 00:49:40 -07:00
Kenny Daniel	1d65bc68bb	Move imports to non-exported functions (yields smaller types)	2025-04-27 12:31:39 -07:00
Kenny Daniel	9a04cbccd3	Convert unsigned types	2025-04-14 23:20:58 -07:00
Sylvain Lesage	447a58eca4	pass custom fetch function to utils (#73) * pass custom fetch function to utils it can be used to implement retry logic. * Update src/utils.js Co-authored-by: Kenny Daniel <platypii@gmail.com> --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-15 00:37:05 +02:00
Kenny Daniel	8161983962	Publish v1.12.0	2025-04-11 04:43:11 -07:00
Kenny Daniel	11c7d8174a	LogicalType DECIMAL is not a LogicalTypeSimple	2025-04-11 00:21:55 -07:00
Kenny Daniel	f5274904b7	Add onPage callback to parquetRead	2025-04-10 23:29:58 -07:00
Kenny Daniel	90be536e05	Group selection of a row group into an object	2025-04-10 22:36:10 -07:00
Kenny Daniel	4df7095ab4	Group column decoding params into an object	2025-04-10 19:30:25 -07:00
Kenny Daniel	4645e34f97	Re-order types.d.ts to put important apis up front	2025-04-10 16:33:50 -07:00
Kenny Daniel	8740f14450	Publish v1.11.1	2025-04-09 17:35:49 -07:00
Kenny Daniel	972402d083	Fix handling of dictionary pages from parquet.net	2025-04-09 17:26:47 -07:00
Kenny Daniel	655444bcde	Fix continued data pages Parquet allows consecutive pages to continue a previously assembled list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.	2025-04-07 17:40:23 -07:00
Kenny Daniel	6c225888c4	Skip unnecessary pages Do this by passing rowGroupStart and rowGroupEnd for the rows to fetch within a rowgroup. If a page is outside those bounds, we can skip the page. Replaces rowLimit.	2025-04-07 00:40:17 -07:00
Kenny Daniel	ba74d58dd3	Test for reading the last row of files	2025-04-06 22:05:58 -07:00
Kenny Daniel	f9a10da20b	Type thrift	2025-04-03 19:20:00 -07:00
Kenny Daniel	b38b65f7c7	Refactor assembleLists to take a schemaPath	2025-04-02 23:39:55 -07:00
Kenny Daniel	1247f5d606	Split out readPage Remove dict-page-offset-zero test because it's a malformed parquet file.	2025-04-02 20:27:10 -07:00
Kenny Daniel	6af6f43f44	Export more constants	2025-03-31 23:20:22 -07:00
Kenny Daniel	85e1af66c1	Fix thrift parsing of crypto_metadata	2025-03-25 15:42:48 -07:00
Kenny Daniel	9c201e00e5	Use defaultInitialFetchSize for both metadata and cachedAsyncBuffer	2025-03-20 16:05:41 -07:00
Kenny Daniel	4b094178b3	Move toVarInt to tests	2025-03-20 12:37:24 -07:00
Kenny Daniel	95c47f243d	Add minSize parameter to cachedAsyncBuffer	2025-03-17 23:54:20 -07:00
Kenny Daniel	f37b2aea9f	for is faster than forEach	2025-03-17 10:18:01 -07:00
Kenny Daniel	d7f8d39de3	Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67) Refactored readColumn to avoid `concat` operations. This avoids extra copying and allocation.	2025-03-10 23:33:47 -07:00
Kenny Daniel	a9467f6c3d	Remove selfCopyBytes in favor of copyBytes	2025-03-10 20:56:00 -07:00
Kenny Daniel	4bbc7742e5	Comment out unnecessary length read in readRleBitPackedHybrid	2025-03-09 11:20:58 -07:00
Kenny Daniel	791a847e42	Revert "Simplify relative import paths" This reverts commit e590f4ee03263460a389bdd29678015727cdcd5a.	2025-03-06 08:54:32 -08:00
Kenny Daniel	e590f4ee03	Simplify relative import paths	2025-03-05 14:03:17 -08:00
Kenny Daniel	2456cdc85f	Better error messages	2025-03-04 11:05:22 -08:00
Kenny Daniel	2a302702d4	Fix handling of boolean rle	2025-02-22 13:29:29 -08:00
Johan Levin	bf268e141c	Use prepended length for bit-packed hybrid bool columns (#62)	2025-02-19 11:07:49 -08:00
Kenny Daniel	36d8ea2e1d	Fix handling of signed decimals (#60)	2025-02-07 18:52:48 -08:00
Sean Lynch	725545731d	Support endpoints that don't support range requests in `asyncBufferFromUrl` (#57) * Support endpoints that don't support range requests in asyncBufferFromUrl Before this commit asyncBufferFromUrl assumes that the body of whatever successful response it gets is equivalent to the range it requested. If the origin server does not support HTTP range requests then this assumption is usually wrong and will lead to parsing failures. This commit changes asyncBufferFromUrl to change its behaviour slightly based on the status code in the response: - if 200 then we got the whole parquet file as the response. Save it and use the resulting ArrayBuffer to serve all future slice calls. - if 206 then we got a range response and we can just return that. I have also included some test cases to ensure that such responses are handled correctly and also tweaked other existing mocks to also include the relevant status code. * Fix all lint warnings * replace switch with if-else	2025-01-16 11:55:05 -08:00
Brian Park	c9727a4246	Query filter (#56) * implement ParquetQueryFilter types * implement parquetQuery filter tests * implement parquetQuery filter * filter before ordering * apply filters before sorting/slicing * format types * add deep equality utility * document and format equals utility * use deep equality checks * update filter tests * support more types for equality * make $not unary * ensure arrays are correctly compared * support both forms of $not * add operator tests * Filter operator tests --------- Co-authored-by: Brian Park <park-brian@users.noreply.github.com> Co-authored-by: Kenny Daniel <platypii@gmail.com>	2024-12-21 15:23:57 -08:00
Brian Park	9992316748	Enable readColumn to read all rows (#53) * Enable readColumn to read all rows * Refactor readColumn to use hasRowLimit * Simplify hasRowLimit condition * Check less common condition first * add readColumn test files * implement readColumn tests for undefined rowLimits * remove unused variable * return early if no metadata is present * address tsc warnings * add comparison * clarify that undefined is valid for rowLimit * remove test files * verify edge case works when rowLimit is undefined * add test cases for readColumn --------- Co-authored-by: Brian Park <park-brian@users.noreply.github.com>	2024-12-19 18:08:22 -08:00
Kenny Daniel	7ce11ad844	Validate url for asyncBufferFromUrl	2024-12-17 09:25:54 -08:00

239 Commits