	| title | sidebar_label | description | pagination_prev | pagination_next | 
|---|---|---|---|---|
| Spreadsheet Processing in Mathematica | Mathematica | Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks. | demos/cloud/index | demos/bigdata/index | 
import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';
Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.
:::note
This demo was last tested on 2023 August 21 in Mathematica 13.2.1.
:::
Integration Details
The SheetJS NodeJS module can be
loaded in NodeJS scripts, including scripts invoked using the `"NodeJS"` mode
of the `ExternalEvaluate`[^1] Mathematica function.
:::caution pass
In local testing, there were incompatibilities with recent NodeJS versions.
This is a Mathematica bug.
:::
The current recommendation involves a dedicated command-line tool that leverages SheetJS libraries to perform spreadsheet processing.
Command-Line Tools
The "Command-Line Tools" demo creates xlsx-cli, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.
`ExternalEvaluate`[^2] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
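If the file path contains spaces or other characters that a shell would interpret, building the command by string concatenation can break. One possible alternative, shown here only as a sketch and not as the approach used in this demo, is the built-in `RunProcess` function, which takes the program and its arguments as a list and does not go through a shell. The path below assumes the file sits in the Downloads folder of the current user:

```mathematica
(* RunProcess does not invoke a shell, so "~" is not expanded. *)
(* Build an absolute path from $HomeDirectory instead. *)
path = FileNameJoin[{$HomeDirectory, "Downloads", "pres.numbers"}];

(* The program and each argument are separate list items, *)
(* so no shell quoting is needed. *)
csvdata = RunProcess[{"/usr/local/bin/xlsx-cli", path}, "StandardOutput"];
```

As with the `ExternalEvaluate` call, `csvdata` should hold the CSV text written to standard output.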
`ImportString`[^3] can interpret the CSV data as a `Dataset`[^4]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^5] option
controls how Mathematica parses the data:
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
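To see the effect of `HeaderLines` without running the command-line tool, the same call can be tried on a small inline CSV string. This is a minimal sketch: the sample text is made up for illustration, and the `"CSV"` format is named explicitly so that the import does not rely on format detection.

```mathematica
(* Made-up sample: one header row followed by two data rows *)
csv = "Name,Index\nAlice,1\nBob,2";

(* Treat the first line as column names *)
withHeader = ImportString[csv, {"CSV", "Dataset"}, "HeaderLines" -> 1]

(* Without the option, the header line is read as an ordinary row *)
noHeader = ImportString[csv, {"CSV", "Dataset"}]
```

With the option set, the rows should be keyed by the names from the first line. Without it, the header text should appear as the first data row.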
The following diagram depicts the workbook waltz:
flowchart LR
  subgraph SheetJS operations
    file[(workbook\nfile)]
    csv(CSV)
  end
  csvstr(CSV\nString)
  data[(Dataset)]
  file --> |`xlsx-cli`\nSheetJS Ops| csv
  csv --> |ExternalEvaluate\nMathematica| csvstr
  csvstr --> |ImportString\nMathematica| data
Complete Demo
:::info pass
This demo was tested on macOS. The path names will differ on other platforms.
:::
- Create the standalone `xlsx-cli` binary[^6]:
cd /tmp
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js
- Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
Reading a Local File
- In a new Mathematica notebook, run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
  "Shell" -> "StandardOutput",
  "/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
- Download https://sheetjs.com/pres.numbers and save it to the Downloads folder.
- In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be `"~/Downloads/pres.numbers"` on macOS:
data = SheetJSImportFile["~/Downloads/pres.numbers"]
The result should be displayed in a concise table.
Reading from a URL
`FetchURL`[^7] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.
- In the same notebook, run the following:
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
  path = FetchURL[x];
  SheetJSImportFile[path]
)];
- Test by downloading the test file in the notebook:
data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
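For comparison, the same wrapper could be sketched with the built-in `URLDownload` function in place of `FetchURL`. This is an alternative technique and not the one this demo was tested with. The name `SheetJSImportURLAlt` is hypothetical, and the sketch assumes `SheetJSImportFile` from the earlier step is already defined in the notebook:

```mathematica
(* Hypothetical variant of SheetJSImportURL using URLDownload *)
SheetJSImportURLAlt[url_String] := Module[{file},
  (* With no target given, URLDownload saves to a temporary file *)
  (* and returns a File[...] object, or a Failure on error. *)
  file = URLDownload[url];
  If[FailureQ[file],
    file,                          (* hand the failure back to the caller *)
    SheetJSImportFile[First[file]] (* First extracts the path string *)
  ]
];

data = SheetJSImportURLAlt["https://sheetjs.com/pres.numbers"]
```

Returning the `Failure` object unchanged lets the caller see why a download did not succeed, where otherwise an invalid path would be passed to the command-line tool.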
[^1]: See the `ExternalEvaluate` Node.js example in the Mathematica documentation.
[^2]: See `ExternalEvaluate` in the Mathematica documentation.
[^3]: See `ImportString` in the Mathematica documentation.
[^4]: A `Dataset` will be created when using the `"Dataset"` element in `ImportString`.
[^5]: See `HeaderLines` in the Mathematica documentation.
[^6]: See "Command-line Tools" for more details.