hyparquet

mirror of https://github.com/asadbek064/hyparquet.git synced 2026-02-22 20:31:32 +00:00

Author	SHA1	Message	Date
Kenny Daniel	8050e0e38d	Fix filter on unselected column (#95)	2025-06-30 01:47:05 -07:00
Kenny Daniel	ef8e1c8c71	Fix bug when encoding length is zero (#93)	2025-06-17 14:16:38 -07:00
Kenny Daniel	1f4e1f2f0b	Fix duckdb empty block (#91)	2025-06-13 00:39:01 -07:00
LiraNuna	8609192b23	Introduce 'custom parsers' option for decoding dates (#87)	2025-06-09 18:02:31 -07:00
LiraNuna	67ab9d5e1a	Plumb ColumnDecoder into `convert` (#86)	2025-06-03 13:47:55 -07:00
Kenny Daniel	113fbe3ca8	Move hyparquet.js to index.js (#84)	2025-05-30 15:47:02 -07:00
Kenny Daniel	f23b2757ca	Node-specific exports for asyncBufferFromFile (#80) * Update README for asyncBufferFromFile * Simplify asyncBufferFromFile	2025-05-30 13:01:20 -07:00
Kenny Daniel	bf6ac3b644	Simplify error messages	2025-05-25 17:49:39 -07:00
Kenny Daniel	9a9519f0b7	Add more details to QueryPlan. (#82) - Add metadata - Add rowStart and rowEnd - Add columns - Add groupStart, selectStart, selectEnd, and groupRows to GroupPlan - Rename ranges to fetches - Rename numRows to groupRows in ColumnDecoder	2025-05-25 15:21:58 -07:00
Kenny Daniel	5e846e6b13	Fix page continuation issue #81	2025-05-24 23:35:48 -07:00
Kenny Daniel	e4504c524d	Fast filter by loading each row group and filtering until rowEnd (#78)	2025-05-19 02:13:37 -07:00
Kenny Daniel	c6bc226180	parquetSchema more generic argument	2025-05-17 17:52:48 -07:00
Kenny Daniel	8dbb74ac78	Convert logical strings	2025-05-15 23:44:09 -07:00
mike-iqmo	dbf3065f8e	Addresses issues with duckdb use of delta encodings (#77) * Addresses issues with duckdb use of delta encodings * Shrunk size of test data	2025-05-14 16:28:58 -07:00
Kenny Daniel	0e6d7dee6f	Parquet Query Planner: plan byte ranges, pre-fetch in parallel (#75) * Parquet Query Planner: plan byte ranges, pre-fetch in parallel. - parquetPlan() that returns lists of byte ranges to fetch. - prefetchAsyncBuffer() pre-fetches all byte ranges in parallel. throws exception if non-pre-fetched slice is requested later.	2025-04-30 00:49:40 -07:00
Kenny Daniel	b7db4653e7	Add another column to page_indexed test	2025-04-26 17:18:11 -07:00
Sylvain Lesage	7f0b57e265	types must be the first element (#74) * types must be the first element. Spotted by publint.dev * Package test for exports * Test package.json for string exports --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-16 21:29:43 +02:00
Kenny Daniel	9a04cbccd3	Convert unsigned types	2025-04-14 23:20:58 -07:00
Sylvain Lesage	447a58eca4	pass custom fetch function to utils (#73) * pass custom fetch function to utils it can be used to implement retry logic. * Update src/utils.js Co-authored-by: Kenny Daniel <platypii@gmail.com> --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2025-04-15 00:37:05 +02:00
Kenny Daniel	8161983962	Publish v1.12.0	2025-04-11 04:43:11 -07:00
Kenny Daniel	f5274904b7	Add onPage callback to parquetRead	2025-04-10 23:29:58 -07:00
Kenny Daniel	90be536e05	Group selection of a row group into an object	2025-04-10 22:36:10 -07:00
Kenny Daniel	4df7095ab4	Group column decoding params into an object	2025-04-10 19:30:25 -07:00
Kenny Daniel	4645e34f97	Re-order types.d.ts to put important apis up front	2025-04-10 16:33:50 -07:00
Kenny Daniel	972402d083	Fix handling of dictionary pages from parquet.net	2025-04-09 17:26:47 -07:00
Kenny Daniel	655444bcde	Fix continued data pages. Parquet allows consecutive pages to continue a previously assembled list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.	2025-04-07 17:40:23 -07:00
Kenny Daniel	6c225888c4	Skip unnecessary pages. Do this by passing rowGroupStart and rowGroupEnd for the rows to fetch within a rowgroup. If a page is outside those bounds, we can skip the page. Replaces rowLimit.	2025-04-07 00:40:17 -07:00
Kenny Daniel	ba74d58dd3	Test for reading the last row of files	2025-04-06 22:05:58 -07:00
Kenny Daniel	b38b65f7c7	Refactor assembleLists to take a schemaPath	2025-04-02 23:39:55 -07:00
Kenny Daniel	1247f5d606	Split out readPage. Remove dict-page-offset-zero test because it's a malformed parquet file.	2025-04-02 20:27:10 -07:00
Kenny Daniel	6af6f43f44	Export more constants	2025-03-31 23:20:22 -07:00
Kenny Daniel	85e1af66c1	Fix thrift parsing of crypto_metadata	2025-03-25 15:42:48 -07:00
Kenny Daniel	4b094178b3	Move toVarInt to tests	2025-03-20 12:37:24 -07:00
Kenny Daniel	95c47f243d	Add minSize parameter to cachedAsyncBuffer	2025-03-17 23:54:20 -07:00
Kenny Daniel	d7f8d39de3	Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67) Refactored readColumn to avoid `concat` operations. This avoids extra copying and allocation.	2025-03-10 23:33:47 -07:00
Kenny Daniel	2cd582ea5a	Remove unnecessary toJson in tests	2025-03-10 19:32:31 -07:00
Kenny Daniel	e590f4ee03	Simplify relative import paths	2025-03-05 14:03:17 -08:00
Kenny Daniel	2456cdc85f	Better error messages	2025-03-04 11:05:22 -08:00
Kenny Daniel	2a302702d4	Fix handling of boolean rle	2025-02-22 13:29:29 -08:00
Johan Levin	bf268e141c	Use prepended length for bit-packed hybrid bool columns (#62)	2025-02-19 11:07:49 -08:00
Kenny Daniel	36d8ea2e1d	Fix handling of signed decimals (#60)	2025-02-07 18:52:48 -08:00
Kenny Daniel	5675560266	Use bigint literals	2025-02-07 17:50:34 -08:00
Sean Lynch	725545731d	Support endpoints that don't support range requests in `asyncBufferFromUrl` (#57) * Support endpoints that don't support range requests in asyncBufferFromUrl Before this commit asyncBufferFromUrl assumes that the body of whatever successful response it gets is equivalent to the range it requested. If the origin server does not support HTTP range requests then this assumption is usually wrong and will lead to parsing failures. This commit changes asyncBufferFromUrl to change its behaviour slightly based on the status code in the response: - if 200 then we got the whole parquet file as the response. Save it and use the resulting ArrayBuffer to serve all future slice calls. - if 206 then we got a range response and we can just return that. I have also included some test cases to ensure that such responses are handled correctly and also tweaked other existing mocks to also include the relevant status code. * Fix all lint warnings * replace switch with if-else	2025-01-16 11:55:05 -08:00
Kenny Daniel	870187c7de	Update README with Awaitable	2024-12-21 15:31:59 -08:00
Brian Park	c9727a4246	Query filter (#56) * implement ParquetQueryFilter types * implement parquetQuery filter tests * implement parquetQuery filter * filter before ordering * apply filters before sorting/slicing * format types * add deep equality utility * document and format equals utility * use deep equality checks * update filter tests * support more types for equality * make $not unary * ensure arrays are correctly compared * support both forms of $not * add operator tests * Filter operator tests --------- Co-authored-by: Brian Park <park-brian@users.noreply.github.com> Co-authored-by: Kenny Daniel <platypii@gmail.com>	2024-12-21 15:23:57 -08:00
Sylvain Lesage	cb639a0b45	factor tests with it.for() (#55)	2024-12-20 09:53:56 +01:00
Brian Park	9992316748	Enable readColumn to read all rows (#53) * Enable readColumn to read all rows * Refactor readColumn to use hasRowLimit * Simplify hasRowLimit condition * Check less common condition first * add readColumn test files * implement readColumn tests for undefined rowLimits * remove unused variable * return early if no metadata is present * address tsc warnings * add comparison * clarify that undefined is valid for rowLimit * remove test files * verify edge case works when rowLimit is undefined * add test cases for readColumn --------- Co-authored-by: Brian Park <park-brian@users.noreply.github.com>	2024-12-19 18:08:22 -08:00
Kenny Daniel	7ce11ad844	Validate url for asyncBufferFromUrl	2024-12-17 09:25:54 -08:00
Sylvain Lesage	09ae9400c5	build types before publishing to npm (#46) * build types before publishing to npm * use prepare instead of prepublishOnly + make it clear that we only build types doc for prepare vs prepublishOnly is here: https://docs.npmjs.com/cli/v8/using-npm/scripts * no jsx in this lib * relative imports from the root, so that it works from types/ * remove unused hyparquet.d.ts + report differences to jsdoc in files * try to understand if this is the cause of the failing CI check tsc fails: https://github.com/hyparam/hyparquet/actions/runs/12040954822/job/33571851170?pr=46 * Revert "try to understand if this is the cause of the failing CI check" This reverts commit 5e2fc8ca179064369de71793ab1cda3facefddc7. * not sure what happens, but we just need to ensure the types are created correctly * increment version * Explicitly export types for use in downstream typescript projects * Use new typescript jsdoc imports for smaller package * Combine some files and use @import jsdoc * use the local typescript --------- Co-authored-by: Kenny Daniel <platypii@gmail.com>	2024-12-02 17:47:42 +01:00
Kenny Daniel	82b25df871	Update dependencies	2024-11-29 14:11:04 -08:00


182 Commits
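
The status-code handling described in commit 725545731d (#57) can be sketched roughly as follows. This is a minimal illustration of the idea in the commit message, not hyparquet's actual implementation: `interpretRangeResponse`, `rangeTolerantBuffer`, and the injected `fetchFn` parameter are hypothetical names introduced here.

```javascript
// Hypothetical helper (not part of hyparquet): decide what a response body
// means for a requested byte range [start, end).
function interpretRangeResponse(status, body, start, end) {
  if (status === 206) {
    // Partial Content: the body is the requested range itself.
    return { bytes: body, wholeFile: undefined }
  }
  if (status === 200) {
    // The server ignored the Range header and sent the entire file.
    // Keep the full buffer so later slices need no further requests.
    return { bytes: body.slice(start, end), wholeFile: body }
  }
  throw new Error(`unexpected status ${status}`)
}

// Hypothetical wrapper: an object with an async slice(start, end) method.
// `fetchFn` is injected (assumption) so the logic can be exercised with a mock.
function rangeTolerantBuffer(url, fetchFn = globalThis.fetch) {
  let wholeFile // set once a 200 response has delivered the full file
  return {
    async slice(start, end) {
      // Serve from the cached full file if we already have it.
      if (wholeFile) return wholeFile.slice(start, end)
      // HTTP byte ranges are inclusive, hence end - 1.
      const range = end === undefined ? `bytes=${start}-` : `bytes=${start}-${end - 1}`
      const res = await fetchFn(url, { headers: { Range: range } })
      const body = await res.arrayBuffer()
      const out = interpretRangeResponse(res.status, body, start, end)
      wholeFile = out.wholeFile
      return out.bytes
    },
  }
}
```

Separating the status interpretation from the network call keeps the 200/206 decision easy to test without a server, which is in the spirit of the mock-based tests the commit message mentions.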