---
title: Modern Spreadsheets in Stata
sidebar_label: Stata
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
---

import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';

export const b = {style: {color:"blue"}};

[Stata](https://www.stata.com/) is a statistical software package. It offers a
robust C-based extension system.

[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.

This demo uses SheetJS to pull data from a spreadsheet for further analysis
within Stata. We'll create a Stata native extension that loads the
[Duktape](/docs/demos/engines/duktape) JavaScript engine and uses the SheetJS
library to read data from spreadsheets and convert it to a Stata-friendly format.

```mermaid
flowchart LR
  ofile[(workbook\nXLSB file)]
  nfile[(clean file\nXLSX)]
  data[[Stata\nVariables]]
  ofile --> |Stata Extension\nSheetJS + Duktape| nfile
  nfile --> |Stata command\nimport excel|data
```

The demo will read [a Numbers workbook](https://sheetjs.com/pres.numbers) and
generate variables for each column. A sample Stata session is shown below:

:::note

This demo was last tested by SheetJS users on 2023 October 09.

:::

:::info pass

Stata has limited support for processing spreadsheets through the `import excel`
command[^1]. At the time of writing, it lacked support for XLSB, NUMBERS, and
other common spreadsheet formats.

SheetJS libraries help fill the gap by normalizing spreadsheets to a form that
Stata can understand.

:::

## Integration Details

The current recommendation involves a native plugin that reads arbitrary files
and generates clean XLSX files that Stata can import.

The extension function ultimately pairs the SheetJS `read`[^2] and `write`[^3]
methods to read data from the old file and write a new file:

```js
var wb = XLSX.read(original_file_data, {type: "buffer"});
var new_file_data = XLSX.write(wb, {type: "array", bookType: "xlsx"});
```

The extension function `cleanfile` will take one or two arguments:

`plugin call cleanfile, "pres.numbers"` will generate `sheetjs.tmp.xlsx` from
the first argument (`"pres.numbers"`) and print instructions to load the file.

`plugin call cleanfile, "pres.numbers" verbose` will additionally print CSV
contents of each worksheet in the workbook.

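
The `verbose` report can be pictured as a loop over the worksheets in the
workbook. The following is a minimal sketch and not the code from `cleanfile.c`:
the function name `print_workbook_csv` and the `print` callback are hypothetical,
and the `XLSX` object is passed as a parameter instead of being read from the
global scope. `SheetNames`, `Sheets` and `XLSX.utils.sheet_to_csv` are part of
the SheetJS API. The header line mirrors the sample session output.

```js
// Sketch of a `verbose`-style report (hypothetical helper, not from cleanfile.c)
// XLSX:  the SheetJS object
// wb:    a SheetJS workbook, such as the result of XLSX.read
// print: hypothetical callback that displays one string
function print_workbook_csv(XLSX, wb, print) {
  wb.SheetNames.forEach(function(name, idx) {
    // look up the worksheet object by name
    var ws = wb.Sheets[name];
    print("Worksheet " + idx + " Name: " + name);
    // generate CSV text for the worksheet
    print(XLSX.utils.sheet_to_csv(ws));
  });
}
```

In the plugin, the display step would presumably go through the Stata display
routine described in the next section rather than a JavaScript callback.
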
```mermaid
flowchart LR
  ofile{{File\nName}}
  subgraph JS Operations
    ojbuf[(Buffer\nFile Bytes)]
    wb(((SheetJS\nWorkbook)))
    njbuf[(Buffer\nXLSX bytes)]
  end
  obuf[(File\nbytes)]
  nbuf[(New file\nbytes)]
  nfile[(XLSX\nFile)]
  ofile --> |C\nRead File| obuf
  obuf --> |Duktape\nBuffer Ops| ojbuf
  ojbuf --> |SheetJS\n`read`| wb
  wb --> |SheetJS\n`write`| njbuf
  njbuf --> |Duktape\nBuffer Ops| nbuf
  nbuf --> |C\nWrite File| nfile
```
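
The same flow can be tried outside of Stata. The following is a rough Node.js
analog, assuming the SheetJS module is available and passed in as `XLSX`. The
function name `cleanfile` is borrowed from the plugin for illustration only; the
plugin itself performs the file steps in C and the buffer steps in Duktape. The
`write` call requests a Node.js `Buffer` (`type: "buffer"`) instead of the
`"array"` type shown earlier, since `fs.writeFileSync` accepts a `Buffer`.

```js
var fs = require("fs");

// Node.js analog of the plugin flow (a sketch, not the plugin source)
// XLSX:        the SheetJS object, supplied by the caller
// input_path:  path to the original spreadsheet
// output_path: path where the clean XLSX file will be written
function cleanfile(XLSX, input_path, output_path) {
  // "Read File" step: pull the original file bytes
  var original_file_data = fs.readFileSync(input_path);
  // SheetJS `read`: parse the bytes into a workbook object
  var wb = XLSX.read(original_file_data, {type: "buffer"});
  // SheetJS `write`: generate XLSX bytes from the workbook
  var new_file_data = XLSX.write(wb, {type: "buffer", bookType: "xlsx"});
  // "Write File" step: save the new file
  fs.writeFileSync(output_path, new_file_data);
  // return the worksheet names for inspection
  return wb.SheetNames;
}
```

If the SheetJS Node.js module is installed (an assumption, as installation is
not covered in this demo), a call like
`cleanfile(require("xlsx"), "pres.numbers", "sheetjs.tmp.xlsx")` should produce
a file that `import excel` can read.
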

### C Extensions

Stata C extensions are shared libraries or DLLs that use special Stata methods
for parsing arguments and returning values.

Arguments are passed to the `stata_call` function in the DLL.

`SF_display` and `SF_error` display text and error messages, respectively.

### Duktape JS Engine

This demo uses the [Duktape JavaScript engine](/docs/demos/engines/duktape). The
SheetJS + Duktape demo covers engine integration in more detail.

The [SheetJS Standalone scripts](/docs/getting-started/installation/standalone)
can be loaded in Duktape by reading the source from the filesystem.

## Complete Demo

:::info pass

This demo was tested on Windows x64. The path names and build commands will
differ on other platforms and operating systems.

:::

The [`cleanfile.c`](pathname:///stata/cleanfile.c) extension defines one plugin
function. It can be chained with `import excel`:

```stata
program cleanfile, plugin
plugin call cleanfile, "pres.numbers" verbose
program drop cleanfile
import excel "sheetjs.tmp.xlsx", firstrow
```

### Create Plugin

0) Ensure "Windows Subsystem for Linux" (WSL) and Visual Studio are installed.

1) Open a new "x64 Native Tools Command Prompt" window and create a project
folder `c:\sheetjs-stata`:

```powershell
cd c:\
mkdir sheetjs-stata
cd sheetjs-stata
```

2) Enter WSL:

```powershell
bash
```

3) Download [`stplugin.c`](https://www.stata.com/plugins/stplugin.c) and
[`stplugin.h`](https://www.stata.com/plugins/stplugin.h) from the Stata website:

```bash
curl -LO https://www.stata.com/plugins/stplugin.c
curl -LO https://www.stata.com/plugins/stplugin.h
```

4) Still within WSL, install Duktape:

```bash
curl -LO https://duktape.org/duktape-2.7.0.tar.xz
tar -xJf duktape-2.7.0.tar.xz
mv duktape-2.7.0/src/*.{c,h} .
```

5) Still within WSL, download the demo source
[`cleanfile.c`](https://docs.sheetjs.com/stata/cleanfile.c):

```bash
curl -LO https://docs.sheetjs.com/stata/cleanfile.c
```

6) Exit WSL:

```bash
exit
```

The window will return to the command prompt.

7) Build the DLL:

```powershell
cl /LD cleanfile.c stplugin.c duktape.c
```

### Install Plugin

8) Copy the DLL to `cleanfile.plugin` in the Stata data directory. For example,
with a shared data directory `c:\data`:

```powershell
mkdir c:\data
copy cleanfile.dll c:\data\cleanfile.plugin
```

### Download SheetJS Scripts

9) Move to the `c:\data` directory:

```powershell
cd c:\data
```

10) Enter WSL:

```powershell
bash
```

11) Within WSL, download the SheetJS scripts and the test file:

<CodeBlock language="bash">{`\
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/shim.min.js
curl -LO https://cdn.sheetjs.com/xlsx-${current}/package/dist/xlsx.full.min.js
curl -LO https://sheetjs.com/pres.numbers`}
</CodeBlock>

12) Exit WSL:

```bash
exit
```

The window will return to the command prompt.

### Stata Test

:::note pass

The screenshot in the introduction shows the result of steps 13-19.

:::

13) Open Stata.

14) Move to the `c:\data` directory in Stata:

```stata
cd c:\data
```

15) Load the `cleanfile` plugin:

```stata
program cleanfile, plugin
```

16) Read the `pres.numbers` test file:

```stata
plugin call cleanfile, "pres.numbers" verbose
```

The result will show the data from `pres.numbers`:

<pre>
<b>. plugin call cleanfile, "pres.numbers" verbose</b>{'\n'}
Worksheet 0 Name: Sheet1{'\n'}
Name,Index{'\n'}
Bill Clinton,42{'\n'}
GeorgeW Bush,43{'\n'}
Barack Obama,44{'\n'}
Donald Trump,45{'\n'}
Joseph Biden,46{'\n'}
{'\n'}
Saved to `sheetjs.tmp.xlsx`{'\n'}
<span {...b}>import excel "sheetjs.tmp.xlsx", firstrow</span> will read the first sheet and use headers{'\n'}
for more help, see <span {...b}>import excel</span>
</pre>

17) Close the plugin:

```stata
program drop cleanfile
```

18) Clear the current session:

```stata
clear
```

<p>19) In the result of Step 16, click the link on <code><span {...b}>import
excel "sheetjs.tmp.xlsx", firstrow</span></code></p>

Alternatively, manually type the command:

```stata
import excel "sheetjs.tmp.xlsx", firstrow
```

The output will show the import result:

<pre>
<b>. import excel "sheetjs.tmp.xlsx", firstrow</b>{'\n'}
(2 vars, 5 obs)
</pre>

20) Open the Data Editor (in Browse or Edit mode) and compare to the screenshot:

[^1]: Run `help import excel` in Stata or see ["import excel"](https://www.stata.com/manuals/dimportexcel.pdf) in the Stata documentation.
[^2]: See [`read` in "Reading Files"](/docs/api/parse-options).
[^3]: See [`write` in "Writing Files"](/docs/api/write-options).