forked from sheetjs/docs.sheetjs.com
		
	V8 Java Binding demo
This commit is contained in:
		
							parent
							
								
									30827f4b7f
								
							
						
					
					
						commit
						234c63dcaa
					
				| @ -244,7 +244,7 @@ | ||||
|   </WorksheetOptions> | ||||
|  </Worksheet> | ||||
|  <Worksheet ss:Name="Bindings"> | ||||
|   <Table ss:ExpandedColumnCount="8" ss:ExpandedRowCount="12" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="65" ss:DefaultRowHeight="16"> | ||||
|   <Table ss:ExpandedColumnCount="8" ss:ExpandedRowCount="15" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="65" ss:DefaultRowHeight="16"> | ||||
|    <Column ss:Index="3" ss:Width="24"/> | ||||
|    <Column ss:Width="31"/> | ||||
|    <Column ss:Width="24"/> | ||||
| @ -317,6 +317,16 @@ | ||||
|     <Cell ss:StyleID="s16"><Data ss:Type="String">✔</Data></Cell> | ||||
|     <Cell ss:StyleID="s16"><Data ss:Type="String">✔</Data></Cell> | ||||
|    </Row> | ||||
|    <Row> | ||||
|     <Cell ss:StyleID="s20" ss:HRef="/docs/demos/engines/v8#java"><Data ss:Type="String">V8</Data></Cell> | ||||
|     <Cell><Data ss:Type="String">Java</Data></Cell> | ||||
|     <Cell ss:StyleID="s16"><Data ss:Type="String">✔</Data></Cell> | ||||
|     <Cell ss:StyleID="s16"><Data ss:Type="String">✔</Data></Cell> | ||||
|     <Cell ss:StyleID="s16"/> | ||||
|     <Cell ss:StyleID="s16"/> | ||||
|     <Cell ss:StyleID="s16"/> | ||||
|     <Cell ss:StyleID="s16"/> | ||||
|    </Row> | ||||
|    <Row> | ||||
|     <Cell ss:StyleID="s20" ss:HRef="/docs/demos/engines/jsc#swift"><Data ss:Type="String">JSC</Data></Cell> | ||||
|     <Cell><Data ss:Type="String">Swift</Data></Cell> | ||||
|  | ||||
| @ -23,8 +23,11 @@ In ["SheetJS Conversion"](#sheetjs-conversion), we will use SheetJS libraries to | ||||
| generate CSV files for the LangChain CSV loader. These conversions can be run in | ||||
| a preprocessing step without disrupting existing CSV workflows. | ||||
| 
 | ||||
| In ["SheetJS Loader"](#sheetjs-loader), we will use SheetJS libraries in a custom | ||||
| loader to directly generate documents and metadata. | ||||
| In ["SheetJS Loader"](#sheetjs-loader), we will use SheetJS libraries in a | ||||
| custom loader to directly generate documents and metadata. | ||||
| 
 | ||||
| ["SheetJS Loader Demo"](#sheetjs-loader-demo) is a complete demo that uses the | ||||
| SheetJS Loader to answer questions based on data from a XLS workbook. | ||||
| 
 | ||||
| :::note Tested Deployments | ||||
| 
 | ||||
| @ -34,6 +37,7 @@ This demo was tested in the following configurations: | ||||
| |:-----------|:--------------------------------------------------------------| | ||||
| | 2024-06-19 | Apple M2 Max 12-Core CPU + 30-Core GPU (32 GB unified memory) | | ||||
| | 2024-06-19 | NVIDIA RTX 4080 SUPER (16 GB VRAM) + i9-10910 (128 GB RAM)    | | ||||
| | 2024-06-19 | NVIDIA RTX 3090 (24 GB VRAM) + Ryzen 9 3900XT (128 GB RAM)    | | ||||
| 
 | ||||
| This explanation was verified against LangChain 0.2. | ||||
| 
 | ||||
| @ -103,7 +107,8 @@ Document { | ||||
| The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be | ||||
| imported in NodeJS scripts that use LangChain and other JavaScript libraries. | ||||
| 
 | ||||
| A simple pre-processing step can convert workbooks to spreadsheets | ||||
| A simple pre-processing step can convert workbooks to CSV files that can be | ||||
| processed by the existing CSV tooling: | ||||
| 
 | ||||
| ```mermaid | ||||
| flowchart LR | ||||
| @ -150,6 +155,23 @@ const csv = utils.sheet_to_csv(first_ws); | ||||
| console.log(csv); | ||||
| ``` | ||||
| 
 | ||||
| :::note pass | ||||
| 
 | ||||
| A number of demos cover spiritually similar workflows: | ||||
| 
 | ||||
| - [Stata](/docs/demos/extensions/stata), [MATLAB](/docs/demos/extensions/matlab) | ||||
| and [Maple](/docs/demos/extensions/maple/) support XLSX data import. The SheetJS | ||||
| integrations generate clean XLSX workbooks from user-supplied spreadsheets. | ||||
| 
 | ||||
| - [TensorFlow.js](/docs/demos/math/tensorflow), [Pandas](/docs/demos/math/pandas) | ||||
| and [Mathematica](/docs/demos/extensions/mathematica) support CSV. The SheetJS | ||||
| integrations generate clean CSVs and use built-in CSV processors. | ||||
| 
 | ||||
| - The ["Command-Line Tools"](/docs/demos/cli/) demo covers techniques for making | ||||
| standalone command-line tools for file conversion. | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
| ### Single Worksheet | ||||
| 
 | ||||
| For a single worksheet, a SheetJS pre-processing step can write the CSV rows to | ||||
| @ -257,6 +279,17 @@ The demo [`LoadOfSheet` loader](pathname:///loadofsheet/loadofsheet.mjs) will | ||||
| generate one Document per data row across all worksheets. It will also attempt | ||||
| to build metadata and attributes for use in self-querying retrievers. | ||||
| 
 | ||||
| ```js title="Sample usage" | ||||
| /* read and parse `data.xlsb` */ | ||||
| const loader = new LoadOfSheet("./data.xlsb"); | ||||
| 
 | ||||
| /* generate documents */ | ||||
| const docs = await loader.load(); | ||||
| 
 | ||||
| /* synthesized attributes for the SelfQueryRetriever */ | ||||
| const attributes = loader.attributes; | ||||
| ``` | ||||
| 
 | ||||
| <details> | ||||
|   <summary><b>Sample SheetJS Loader</b> (click to show)</summary> | ||||
| 
 | ||||
|  | ||||
| @ -25,6 +25,6 @@ ultimately displayed to the user in a HTML table. | ||||
| 
 | ||||
| ## Loading Sheets | ||||
| 
 | ||||
| ["Loading Sheets"](/docs/getting-started/examples/loader) explores deep SheetJS | ||||
| The ["Loader Tutorial"](/docs/getting-started/examples/loader) explores SheetJS | ||||
| integrations. Based on the existing CSV and binary loaders, a spreadsheet loader | ||||
| for LangChain is developed and tested. | ||||
| is developed and tested in a natural language query workflow. | ||||
|  | ||||
| @ -81,8 +81,8 @@ Each browser demo was tested in the following environments: | ||||
| 
 | ||||
| | Browser     | Date       | | ||||
| |:------------|:-----------| | ||||
| | Chrome 120  | 2024-01-30 | | ||||
| | Safari 17.2 | 2024-01-15 | | ||||
| | Chrome 126  | 2024-06-19 | | ||||
| | Safari 17.3 | 2024-06-19 | | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
|  | ||||
| @ -135,8 +135,8 @@ Each browser demo was tested in the following environments: | ||||
| 
 | ||||
| | Browser     | Date       | | ||||
| |:------------|:-----------| | ||||
| | Chrome 120  | 2024-01-15 | | ||||
| | Safari 17.3 | 2024-02-21 | | ||||
| | Chrome 126  | 2024-06-19 | | ||||
| | Safari 17.3 | 2024-06-19 | | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
|  | ||||
| @ -288,7 +288,7 @@ The script will create a file `SheetJSCheerio.xlsx` that can be opened. | ||||
| ### DenoDOM | ||||
| 
 | ||||
| [DenoDOM](https://deno.land/x/deno_dom) provides a DOM framework for Deno. For | ||||
| the tested version (`0.1.43`), the following patches were needed: | ||||
| the tested version (`0.1.46`), the following patches were needed: | ||||
| 
 | ||||
| - TABLE `rows` property (explained above) | ||||
| - TR `cells` property (explained above) | ||||
| @ -299,7 +299,7 @@ This example fetches [a sample table](pathname:///dom/SheetJSTable.html): | ||||
| // @deno-types="https://cdn.sheetjs.com/xlsx-${current}/package/types/index.d.ts" | ||||
| import * as XLSX from 'https://cdn.sheetjs.com/xlsx-${current}/package/xlsx.mjs'; | ||||
| \n\ | ||||
| import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.43/deno-dom-wasm.ts'; | ||||
| import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.46/deno-dom-wasm.ts'; | ||||
| \n\ | ||||
| const doc = new DOMParser().parseFromString( | ||||
|   await (await fetch('https://docs.sheetjs.com/dom/SheetJSTable.html')).text(), | ||||
| @ -323,7 +323,12 @@ XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");`} | ||||
| 
 | ||||
| :::note Tested Deployments | ||||
| 
 | ||||
| This demo was last tested on 2024 January 27 against DenoDOM `0.1.43` | ||||
| This demo was tested in the following deployments: | ||||
| 
 | ||||
| | Architecture | DenoDOM | Deno   | Date       | | ||||
| |:-------------|:--------|:-------|:-----------| | ||||
| | `darwin-x64` | 0.1.46  | 1.44.4 | 2024-06-19 | | ||||
| | `darwin-arm` | 0.1.46  | 1.44.4 | 2024-06-19 | | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
|  | ||||
| @ -970,6 +970,73 @@ cargo run pres.numbers | ||||
| If the program succeeded, the CSV contents will be printed to console and the | ||||
| file `sheetjsw.xlsb` will be created.  That file can be opened with Excel. | ||||
| 
 | ||||
| ### Java | ||||
| 
 | ||||
| [Javet](https://www.caoccao.com/Javet/) is a Java binding to the V8 engine. | ||||
| Javet simplifies conversions between Java data structures and V8 equivalents. | ||||
| 
 | ||||
| Java byte arrays (`byte[]`) are projected in V8 as `Int8Array`. The SheetJS | ||||
| `read` method expects a `Uint8Array`. The following script snippet performs a | ||||
| zero-copy conversion: | ||||
| 
 | ||||
| ```js title="Zero-copy conversion from Int8Array to Uint8Array" | ||||
| // assuming `i8` is an Int8Array | ||||
| const u8 = new Uint8Array(i8.buffer, i8.byteOffset, i8.length); | ||||
| ``` | ||||
| 
 | ||||
| :::note Tested Deployments | ||||
| 
 | ||||
| This demo was last tested in the following deployments: | ||||
| 
 | ||||
| | Architecture | V8 Version    | Javet   | Java    | Date       | | ||||
| |:-------------|:--------------|:--------|:--------|:-----------| | ||||
| | `darwin-x64` | `12.6.228.13` | `3.1.3` | 22      | 2024-06-19 | | ||||
| | `darwin-arm` | `12.6.228.13` | `3.1.3` | 11.0.23 | 2024-06-19 | | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
| 1) Create a new project: | ||||
| 
 | ||||
| ```bash | ||||
| mkdir sheetjs-javet | ||||
| cd sheetjs-javet | ||||
| ``` | ||||
| 
 | ||||
| 2) Download the Javet JAR. There are different archives for different platforms. | ||||
| The following command runs on `darwin-x64` and `darwin-arm`: | ||||
| 
 | ||||
| ```bash | ||||
| curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-macos/3.1.3/javet-macos-3.1.3.jar | ||||
| ``` | ||||
| 
 | ||||
| 3) Download the SheetJS Standalone script and test file. Save both files in the | ||||
| project directory: | ||||
| 
 | ||||
| <ul> | ||||
| <li><a href={`https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js`}>xlsx.full.min.js</a></li> | ||||
| <li><a href="https://docs.sheetjs.com/pres.xlsx">pres.xlsx</a></li> | ||||
| </ul> | ||||
| 
 | ||||
| <CodeBlock language="bash">{`\ | ||||
| curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js | ||||
| curl -LO https://docs.sheetjs.com/pres.xlsx`} | ||||
| </CodeBlock> | ||||
| 
 | ||||
| 4) Download [`SheetJSJavet.java`](pathname:///v8/SheetJSJavet.java): | ||||
| 
 | ||||
| ```bash | ||||
| curl -LO https://docs.sheetjs.com/v8/SheetJSJavet.java | ||||
| ``` | ||||
| 
 | ||||
| 5) Build and run the Java application: | ||||
| 
 | ||||
| ```bash | ||||
| javac -cp ".:javet-macos-3.1.3.jar" SheetJSJavet.java | ||||
| java -cp ".:javet-macos-3.1.3.jar" SheetJSJavet pres.xlsx | ||||
| ``` | ||||
| 
 | ||||
| If the program succeeded, the CSV contents will be printed to console. | ||||
| 
 | ||||
| ## Snapshots | ||||
| 
 | ||||
| At a high level, V8 snapshots are raw dumps of the V8 engine state. It is much | ||||
|  | ||||
							
								
								
									
										30
									
								
								docz/static/v8/SheetJSJavet.java
									
									
									
									
									
										Normal file
									
								
							
							
								
								
								
								
								
									
									
								
							
						
						
									
										30
									
								
								docz/static/v8/SheetJSJavet.java
									
									
									
									
									
										Normal file
									
								
							| @ -0,0 +1,30 @@ | ||||
| import java.nio.file.Files; | ||||
| import java.nio.file.Paths; | ||||
| import java.util.Scanner; | ||||
| import com.caoccao.javet.interop.V8Host; | ||||
| import com.caoccao.javet.interop.V8Runtime; | ||||
| 
 | ||||
| public class SheetJSJavet { | ||||
|   public static void main(String[] args) throws Exception { | ||||
|     /* initialize */ | ||||
|     V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime(); | ||||
| 
 | ||||
|     /* read script file */ | ||||
|     v8Runtime.getExecutor("var global = (function(){ return this; }).call(null);").executeVoid(); | ||||
|     v8Runtime.getExecutor(new Scanner(SheetJSJavet.class.getResourceAsStream("/xlsx.full.min.js")).useDelimiter("\\Z").next()).executeVoid(); | ||||
| 
 | ||||
|     System.out.println(v8Runtime.getExecutor("'SheetJS Version ' + XLSX.version").executeString()); | ||||
| 
 | ||||
|     /* read spreadsheet bytes */ | ||||
|     v8Runtime.getGlobalObject().set("i8", Files.readAllBytes(Paths.get(args[0]))); | ||||
|     v8Runtime.getExecutor("var u8 = new Uint8Array(i8.buffer, i8.byteOffset, i8.length);").executeVoid(); | ||||
| 
 | ||||
|     /* parse workbook */ | ||||
|     v8Runtime.getExecutor("var wb = XLSX.read(u8, {type: 'array'})").executeVoid(); | ||||
| 
 | ||||
|     /* get first worksheet as CSV */ | ||||
|     v8Runtime.getExecutor("var ws = wb.Sheets[wb.SheetNames[0]];").executeVoid(); | ||||
|     String res = v8Runtime.getExecutor("XLSX.utils.sheet_to_csv(ws)").executeString(); | ||||
|     System.out.println(res); | ||||
|   } | ||||
| } | ||||
		Loading…
	
		Reference in New Issue
	
	Block a user