Commit Graph

149 Commits

Author SHA1 Message Date
Kenny Daniel
95c47f243d
Add minSize parameter to cachedAsyncBuffer 2025-03-17 23:54:20 -07:00
Kenny Daniel
d7f8d39de3
Return typed arrays in onChunk. Change readColumn to return DecodedArray[]. (#67)
Refactored readColumn to avoid `concat` operations.
This avoids extra copying and allocation.
2025-03-10 23:33:47 -07:00
Kenny Daniel
2cd582ea5a
Remove unnecessary toJson in tests 2025-03-10 19:32:31 -07:00
Kenny Daniel
e590f4ee03
Simplify relative import paths 2025-03-05 14:03:17 -08:00
Kenny Daniel
2456cdc85f
Better error messages 2025-03-04 11:05:22 -08:00
Kenny Daniel
2a302702d4
Fix handling of boolean rle 2025-02-22 13:29:29 -08:00
Johan Levin
bf268e141c
Use prepended length for bit-packed hybrid bool columns (#62) 2025-02-19 11:07:49 -08:00
Kenny Daniel
36d8ea2e1d
Fix handling of signed decimals (#60) 2025-02-07 18:52:48 -08:00
Kenny Daniel
5675560266
Use bigint literals 2025-02-07 17:50:34 -08:00
Sean Lynch
725545731d
Support endpoints that don't support range requests in asyncBufferFromUrl (#57)
* Support endpoints that don't support range requests in asyncBufferFromUrl

Before this commit, asyncBufferFromUrl assumed that the body of any
successful response was equivalent to the range it requested. If the
origin server does not support HTTP range requests, this assumption is
usually wrong and leads to parsing failures.

This commit changes asyncBufferFromUrl to change its behaviour slightly
based on the status code in the response:
- if 200 then we got the whole parquet file as the response. Save it and
  use the resulting ArrayBuffer to serve all future slice calls.
- if 206 then we got a range response and we can just return that.

I have also included test cases to ensure that such responses are
handled correctly, and tweaked the existing mocks to include the
relevant status code.

* Fix all lint warnings

* replace switch with if-else
2025-01-16 11:55:05 -08:00
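The status-code handling described in the commit message for #57 can be sketched roughly as follows. This is a minimal illustration, not the code from that commit: the function name `rangeFallbackBuffer` and the injectable `fetchFn` parameter are hypothetical, introduced only so the example can run without a network. It assumes the caller wants an object with an async `slice(start, end)` method.

```javascript
/**
 * Hypothetical sketch of a range-aware buffer (not hyparquet's actual code).
 * fetchFn is an assumed parameter so a mock can stand in for the global fetch.
 *
 * @param {string} url
 * @param {Function} [fetchFn] fetch-compatible function
 * @returns {{ slice(start: number, end?: number): Promise<ArrayBuffer> }}
 */
function rangeFallbackBuffer(url, fetchFn = fetch) {
  // Set once a server answers 200, meaning it ignored the Range header
  // and sent the whole file.
  let whole

  return {
    async slice(start, end) {
      // Serve from the saved full body if we already have it.
      if (whole) return (await whole).slice(start, end)

      // HTTP byte ranges are inclusive, so the last byte is end - 1.
      const last = end === undefined ? '' : end - 1
      const res = await fetchFn(url, { headers: { Range: `bytes=${start}-${last}` } })

      if (res.status === 206) {
        // Partial Content: the body is exactly the requested range.
        return res.arrayBuffer()
      } else if (res.status === 200) {
        // Full body: save it and slice locally, now and on later calls.
        whole = res.arrayBuffer()
        return (await whole).slice(start, end)
      }
      throw new Error(`fetch failed with status ${res.status}`)
    },
  }
}
```

With a server that supports range requests, every `slice` call issues one request. With a server that does not, only the first call hits the network and later calls are served from the saved ArrayBuffer.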
Kenny Daniel
870187c7de
Update README with Awaitable 2024-12-21 15:31:59 -08:00
Brian Park
c9727a4246
Query filter (#56)
* implement ParquetQueryFilter types

* implement parquetQuery filter tests

* implement parquetQuery filter

* filter before ordering

* apply filters before sorting/slicing

* format types

* add deep equality utility

* document and format equals utility

* use deep equality checks

* update filter tests

* support more types for equality

* make $not unary

* ensure arrays are correctly compared

* support both forms of $not

* add operator tests

* Filter operator tests

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-12-21 15:23:57 -08:00
Sylvain Lesage
cb639a0b45
factor tests with it.for() (#55) 2024-12-20 09:53:56 +01:00
Brian Park
9992316748
Enable readColumn to read all rows (#53)
* Enable readColumn to read all rows

* Refactor readColumn to use hasRowLimit

* Simplify hasRowLimit condition

* Check less common condition first

* add readColumn test files

* implement readColumn tests for undefined rowLimits

* remove unused variable

* return early if no metadata is present

* address tsc warnings

* add comparison

* clarify that undefined is valid for rowLimit

* remove test files

* verify edge case works when rowLimit is undefined

* add test cases for readColumn

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
2024-12-19 18:08:22 -08:00
Kenny Daniel
7ce11ad844
Validate url for asyncBufferFromUrl 2024-12-17 09:25:54 -08:00
Sylvain Lesage
09ae9400c5
build types before publishing to npm (#46)
* build types before publishing to npm

* use prepare instead of prepublishOnly + make it clear that we only build types

doc for prepare vs prepublishOnly is here: https://docs.npmjs.com/cli/v8/using-npm/scripts

* no jsx in this lib

* relative imports from the root, so that it works from types/

* remove unused hyparquet.d.ts + report differences to jsdoc in files

* try to understand if this is the cause of the failing CI check

tsc fails: https://github.com/hyparam/hyparquet/actions/runs/12040954822/job/33571851170?pr=46

* Revert "try to understand if this is the cause of the failing CI check"

This reverts commit 5e2fc8ca179064369de71793ab1cda3facefddc7.

* not sure what happens, but we just need to ensure the types are created correctly

* increment version

* Explicitly export types for use in downstream typescript projects

* Use new typescript jsdoc imports for smaller package

* Combine some files and use @import jsdoc

* use the local typescript

---------

Co-authored-by: Kenny Daniel <platypii@gmail.com>
2024-12-02 17:47:42 +01:00
Kenny Daniel
82b25df871
Update dependencies 2024-11-29 14:11:04 -08:00
Sylvain Lesage
cb1e965e02
simulate an async operation (#33) 2024-11-08 22:23:35 +01:00
Sylvain Lesage
6ec836dac5
pass requestInit to fetch utils (#34)
* pass requestInit to fetch utils

It will allow authentication

* add tests
2024-11-08 22:22:30 +01:00
Kenny Daniel
d8cc46b915
No dependencies allowed 2024-10-23 22:51:31 -07:00
Kenny Daniel
a5c34e2950
cachedAsyncBuffer tests 2024-10-16 01:38:02 -07:00
Kenny Daniel
e6301a8bc8
demo: use web worker for parquet parsing to avoid blocking main thread 2024-09-25 02:22:30 -07:00
Kenny Daniel
9d49dabc15
Query api 2024-09-24 21:01:04 -07:00
Kenny Daniel
df02229407
Promisified parquetReadObjects function 2024-08-20 11:30:39 -07:00
Kenny
a2024a781c
Parse column and offset indexes (#29)
* Parse indices

* Add parsed offset indices

* Add parsed column indices

* Test readColumnIndex and readOffsetIndex

* Add more parsed offset indices

* Remove unnecessary toJson when loading expected results

* Add length checks to convertMetadata

* Rename indicies.js to indexes.js

* Rename indices.test.js to indexes.test.js

* Rename *_indices.json to *_indexes.json

* Use asyncBufferFromFile in indexes.test.js

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
2024-08-18 18:23:54 -07:00
ctranstrum
8ace1a47d2
return column names in the order requested (#27)
* return column names in the order requested

* retain correct ordering of columns in object rows as well
2024-08-14 00:01:47 -07:00
ctranstrum
d13d52b606
Add an option to return each row as an object keyed by column name (#25)
* Add an option to return each row as an object keyed by column name

* rename option to rowFormat and address feedback
2024-08-13 09:15:59 -07:00
Kenny Daniel
c6c79c05ca
Fix for issue #23 nested struct assembly 2024-08-02 14:47:04 -07:00
Kenny Daniel
a5122e61d6
utils: asyncBufferFromFile 2024-07-26 15:07:47 -07:00
Kenny Daniel
9ab5004cd8
Bit pack testing 2024-06-13 21:33:28 -07:00
Kenny Daniel
ddb8b16cd0
Fix handling of multiple pages 2024-06-07 23:16:04 -07:00
Kenny Daniel
9db378de2f
toJson tests 2024-05-28 14:24:12 -07:00
Kenny Daniel
f28735c0ce
readVarInt tests 2024-05-28 14:18:04 -07:00
Kenny Daniel
490d1ec800
Fix 3-byte RLE 2024-05-28 13:58:02 -07:00
Kenny Daniel
17f412c2f5
Convert logical date units 2024-05-24 16:55:13 -07:00
Kenny Daniel
efdbf459a5
Convert date and decimal stats 2024-05-24 15:22:59 -07:00
Kenny Daniel
a56420de2f
Parse metadata TimeUnit 2024-05-24 15:17:20 -07:00
Kenny Daniel
2edc14b70e
Convert unsigned ints 2024-05-23 23:35:49 -07:00
Kenny Daniel
10b9b299d8
Fix complex.parquet 2024-05-23 23:20:16 -07:00
Kenny Daniel
c68256575b
Convert logical timestamp 2024-05-23 18:50:57 -07:00
Kenny Daniel
7a08aa3183
Handle repeated with no children 2024-05-23 18:26:16 -07:00
Kenny Daniel
ed3b525a27
Fix nested optional from duckdb#3734 🦆 2024-05-23 18:19:01 -07:00
Kenny Daniel
af7bab33f8
Handle top level repeated from duckdb#2557 🦆 2024-05-23 17:43:36 -07:00
Kenny Daniel
d92cc5fd22
Convert timestamps and json 2024-05-23 16:43:26 -07:00
Kenny Daniel
06578a9419
struct_strings.parquet 2024-05-23 02:10:04 -07:00
Kenny Daniel
7d1d877c9f
Fix metadata parsing of page_type 2024-05-23 00:11:58 -07:00
Kenny Daniel
b8e4496063
Upgrade dataPage to match dictionary type 2024-05-23 00:07:09 -07:00
Kenny Daniel
c4ad05e580
Convert byte arrays to utf8 by default 2024-05-22 22:40:21 -07:00
Kenny Daniel
1f8289b4b2
rle_boolean_encoding.parquet 2024-05-22 19:16:10 -07:00
Kenny Daniel
9369faad46
Code cleanup 🧹 2024-05-22 12:58:37 -07:00