75 Commits

Author SHA1 Message Date
Kenny Daniel
c0e0c7cfe5
Fix BYTE_STREAM_SPLIT with data page v2 and compression 2025-11-26 16:04:47 -08:00
Kenny Daniel
c3a42b5bc9
Fix plan row boundaries 2025-11-21 00:25:30 -08:00
Sylvain Lesage
e8b1c8e570
Minimal support for GeoParquet (#133)
* Initial support for GeoParquet

* pr comments

* convert crs

* add test file + expected JSON files

* add sentence to README

* Apply suggestion from @platypii

Co-authored-by: Kenny Daniel <platypii@gmail.com>

* PR comments

* update README

* review comment

---------

Co-authored-by: Kenny Daniel <platypii@gmail.com>
2025-10-16 04:22:01 -04:00
Kenny Daniel
d701904253
Add well-known-binary decoder for geometry and geography (#131) 2025-09-30 11:45:39 -07:00
Kenny Daniel
a7bfab0e99
Fix high-precision decimal parsing (#116) 2025-09-01 11:24:20 -07:00
kroche98
ee192054b2
Skip plan for files with no rows (#98) 2025-07-02 15:46:32 -07:00
Kenny Daniel
ef8e1c8c71
Fix bug when encoding length is zero (#93) 2025-06-17 14:16:38 -07:00
Kenny Daniel
1f4e1f2f0b
Fix duckdb empty block (#91) 2025-06-13 00:39:01 -07:00
Kenny Daniel
8dbb74ac78
Convert logical strings 2025-05-15 23:44:09 -07:00
mike-iqmo
dbf3065f8e
Addresses issues with duckdb use of delta encodings (#77)
* Addresses issues with duckdb use of delta encodings

* Shrunk size of test data
2025-05-14 16:28:58 -07:00
Kenny Daniel
b7db4653e7
Add another column to page_indexed test 2025-04-26 17:18:11 -07:00
Kenny Daniel
9a04cbccd3
Convert unsigned types 2025-04-14 23:20:58 -07:00
Kenny Daniel
972402d083
Fix handling of dictionary pages from parquet.net 2025-04-09 17:26:47 -07:00
Kenny Daniel
655444bcde
Fix continued data pages
Parquet allows consecutive pages to continue a previously assembled
list. Broke in hyparquet 1.9.0. Added continued_page.parquet test.
2025-04-07 17:40:23 -07:00
Kenny Daniel
ba74d58dd3
Test for reading the last row of files 2025-04-06 22:05:58 -07:00
Kenny Daniel
1247f5d606
Split out readPage
Remove dict-page-offset-zero test because it's a malformed parquet file.
2025-04-02 20:27:10 -07:00
Kenny Daniel
85e1af66c1
Fix thrift parsing of crypto_metadata 2025-03-25 15:42:48 -07:00
Kenny Daniel
2a302702d4
Fix handling of boolean rle 2025-02-22 13:29:29 -08:00
Johan Levin
bf268e141c
Use prepended length for bit-packed hybrid bool columns (#62) 2025-02-19 11:07:49 -08:00
Kenny Daniel
36d8ea2e1d
Fix handling of signed decimals (#60) 2025-02-07 18:52:48 -08:00
Kenny Daniel
870187c7de
Update README with Awaitable 2024-12-21 15:31:59 -08:00
Kenny
a2024a781c
Parse column and offset indexes (#29)
* Parse indicies

* Add parsed offset indices

* Add parsed column indices

* Test readColumnIndex and readOffsetIndex

* Add more parsed offset indices

* Remove unnecessary toJson when loading expected results

* Add length checks to convertMetadata

* Rename indicies.js to indexes.js

* Rename indices.test.js to indexes.test.js

* Rename *_indices.json to *_indexes.json

* Use asyncBufferFromFile in indexes.test.js

---------

Co-authored-by: Brian Park <park-brian@users.noreply.github.com>
2024-08-18 18:23:54 -07:00
Kenny Daniel
c6c79c05ca
Fix for issue #23 nested struct assembly 2024-08-02 14:47:04 -07:00
Kenny Daniel
17f412c2f5
Convert logical date units 2024-05-24 16:55:13 -07:00
Kenny Daniel
efdbf459a5
Convert date and decimal stats 2024-05-24 15:22:59 -07:00
Kenny Daniel
a56420de2f
Parse metadata TimeUnit 2024-05-24 15:17:20 -07:00
Kenny Daniel
2edc14b70e
Convert unsigned ints 2024-05-23 23:35:49 -07:00
Kenny Daniel
c68256575b
Convert logical timestamp 2024-05-23 18:50:57 -07:00
Kenny Daniel
7a08aa3183
Handle repeated with no children 2024-05-23 18:26:16 -07:00
Kenny Daniel
ed3b525a27
Fix nested optional from duckdb#3734 🦆 2024-05-23 18:19:01 -07:00
Kenny Daniel
af7bab33f8
Handle top level repeated from duckdb#2557 🦆 2024-05-23 17:43:36 -07:00
Kenny Daniel
d92cc5fd22
Convert timestamps and json 2024-05-23 16:43:26 -07:00
Kenny Daniel
06578a9419
struct_strings.parquet 2024-05-23 02:10:04 -07:00
Kenny Daniel
7d1d877c9f
Fix metadata parsing of page_type 2024-05-23 00:11:58 -07:00
Kenny Daniel
b8e4496063
Upgrade dataPage to match dictionary type 2024-05-23 00:07:09 -07:00
Kenny Daniel
c4ad05e580
Convert byte arrays to utf8 by default 2024-05-22 22:40:21 -07:00
Kenny Daniel
1f8289b4b2
rle_boolean_encoding.parquet 2024-05-22 19:16:10 -07:00
Kenny Daniel
5eeb05da40
dict-page-offset-zero.parquet 2024-05-21 22:50:50 -07:00
Kenny Daniel
4f7791354c
incorrect_map_schema.parquet 2024-05-21 22:18:39 -07:00
Kenny Daniel
6a75a960da
Convert boolean column 2024-05-21 22:05:29 -07:00
Kenny Daniel
a1ca1ef785
byte_stream_split_extended.gzip.parquet 2024-05-21 17:21:36 -07:00
Kenny Daniel
70387fa345
repeated_no_annotation.parquet 2024-05-20 23:09:31 -07:00
Kenny Daniel
d453313dca
Fix optional structs! 2024-05-20 05:03:33 -07:00
Kenny Daniel
9cd09b8eed
Byte stream split encoding 2024-05-20 04:09:32 -07:00
Kenny Daniel
1689d7473a
Delta length byte array encoding 2024-05-20 02:32:31 -07:00
Kenny Daniel
da72c06ac2
Use hyparquet-compressors for tests (brotli, lz4, zstd) 2024-05-20 02:07:40 -07:00
Kenny Daniel
d4341b803e
Delta byte array encoding 2024-05-18 19:23:11 -07:00
Kenny Daniel
561f06f701
Int_Map test is redundant with nullable.impala.parquet 2024-05-18 18:33:15 -07:00
Kenny Daniel
3583aeb549
nullable.impala.parquet 2024-05-17 22:52:57 -07:00
Kenny
cf4c4ba04d
Assembly of nested column types (#11) 2024-05-17 22:44:03 -07:00