| title | sidebar_label | description | pagination_prev | pagination_next |
|---|---|---|---|---|
| Spreadsheet Processing in Mathematica | Mathematica | Build complex data pipelines in Mathematica Notebooks. Seamlessly create datasets with SheetJS. Leverage the Mathematica ecosystem to analyze data from Excel workbooks. | demos/cloud/index | demos/bigdata/index |
import current from '/version.js'; import CodeBlock from '@theme/CodeBlock';
Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.
:::note
This demo was last tested on 2023 August 21 with Mathematica 13.2.1.
:::
Integration Details
The SheetJS NodeJS module can be
loaded in NodeJS scripts, including scripts invoked using the "NodeJS" mode
of the `ExternalEvaluate`[^1] Mathematica function.
:::caution pass
In local testing, there were incompatibilities with recent NodeJS versions.
This is a Mathematica bug.
:::
The current recommendation involves a dedicated command-line tool that leverages SheetJS libraries to perform spreadsheet processing.
Command-Line Tools
The "Command-Line Tools" demo creates xlsx-cli, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.
`ExternalEvaluate`[^2] can run command-line tools and capture standard output.
The following snippet processes `~/Downloads/pres.numbers` and pulls CSV data
into a variable in Mathematica:
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
`ImportString`[^3] can interpret the CSV data as a `Dataset`[^4]. Typically the
first row of the CSV output is the header row. The `HeaderLines`[^5] option
controls how Mathematica parses the data:
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
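To see the effect of the header option without running the command-line tool, the same `ImportString` call can be tried on an inline string. This is a minimal sketch: the `sample` text and its column names are made up for illustration and do not come from any workbook.

```mathematica
(* Hypothetical CSV text standing in for the output of xlsx-cli *)
sample = "Name,Index\nAlpha,1\nBeta,2";

(* With "HeaderLines" -> 1, the first row supplies the column names *)
withHeader = ImportString[sample, "Dataset", "HeaderLines" -> 1]

(* Without the option, the header row is treated as ordinary data *)
noHeader = ImportString[sample, "Dataset"]

(* Named columns can then be selected from the first result *)
withHeader[All, "Name"]
```

Comparing `withHeader` and `noHeader` in a notebook shows whether the first CSV row was consumed as column names.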
The following diagram depicts the workbook waltz:
flowchart LR
subgraph SheetJS operations
file[(workbook\nfile)]
csv(CSV)
end
csvstr(CSV\nString)
data[(Dataset)]
file --> |`xlsx-cli`\nSheetJS Ops| csv
csv --> |ExternalEvaluate\nMathematica| csvstr
csvstr --> |ImportString\nMathematica| data
Complete Demo
:::info pass
This demo was tested on macOS. The path names will differ on other platforms.
:::
- Create the standalone `xlsx-cli` binary[^6]:
cd /tmp
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz exit-on-epipe commander@2
curl -LO https://docs.sheetjs.com/cli/xlsx-cli.js
npx nexe -t 14.15.3 xlsx-cli.js
- Move the generated `xlsx-cli` to a fixed location in `/usr/local/bin`:
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
Reading a Local File
- In a new Mathematica notebook, run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
- Download https://sheetjs.com/pres.numbers and save it to the Downloads folder.
- In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be `"~/Downloads/pres.numbers"` on macOS:
data = SheetJSImportFile["~/Downloads/pres.numbers"]
The result should be displayed in a concise table.
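`SheetJSImportFile` builds the command by string concatenation, so a path containing spaces would be split by the shell. The following sketch is one possible alternative, not part of the tested demo: it swaps `ExternalEvaluate` for `RunProcess`, which accepts the program and its arguments as a list. The name `SheetJSImportFileChecked` is hypothetical, and the exit code check assumes that `xlsx-cli` returns a nonzero code on failure.

```mathematica
(* Hypothetical variant of SheetJSImportFile; the name is not part of the demo *)
SheetJSImportFileChecked[x_String] := Module[{path, res},
  (* Expand "~" and relative paths to an absolute path *)
  path = ExpandFileName[x];
  If[! FileExistsQ[path], Return[$Failed, Module]];
  (* Arguments are passed as a list, so no shell quoting is involved *)
  res = RunProcess[{"/usr/local/bin/xlsx-cli", path}];
  (* Assumption: xlsx-cli signals errors with a nonzero exit code *)
  If[res["ExitCode"] =!= 0, Return[$Failed, Module]];
  ImportString[res["StandardOutput"], "Dataset", "HeaderLines" -> 1]
];

data = SheetJSImportFileChecked["~/Downloads/pres.numbers"]
```

The function returns `$Failed` when the file is missing or the tool reports an error, which makes problems visible before the CSV text is parsed.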
Reading from a URL
`FetchURL`[^7] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called `SheetJSImportURL`.
- In the same notebook, run the following:
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
- Test by downloading the test file in the notebook:
data = SheetJSImportURL["https://sheetjs.com/pres.numbers"]
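As a possible alternative to the `Utilities` package, the built-in `URLDownload` function can fetch the file. This is a sketch under assumptions: the name `SheetJSImportURLAlt` is hypothetical, it relies on `SheetJSImportFile` having been defined earlier in the same notebook, and it was not part of the tested demo.

```mathematica
(* Hypothetical variant of SheetJSImportURL; requires SheetJSImportFile from the earlier step *)
SheetJSImportURLAlt[x_String] := Module[{file},
  (* URLDownload saves to a temporary file and returns a File object *)
  file = URLDownload[x];
  (* First extracts the path string from File["..."] *)
  SheetJSImportFile[First[file]]
];

data = SheetJSImportURLAlt["https://sheetjs.com/pres.numbers"]
```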
[^1]: See the `ExternalEvaluate` Node.js example in the Mathematica documentation.
[^2]: See `ExternalEvaluate` in the Mathematica documentation.
[^3]: See `ImportString` in the Mathematica documentation.
[^4]: A `Dataset` will be created when using the `"Dataset"` element in `ImportString`.
[^5]: See `HeaderLines` in the Mathematica documentation.
[^6]: See "Command-line Tools" for more details.