forked from sheetjs/docs.sheetjs.com

github

parent 30e51a2244
commit 4a2314409e
				| @ -1,40 +1,38 @@ | ||||
| --- | ||||
| title: Data in Version Control | ||||
| pagination_prev: demos/hosting/index | ||||
| title: GitHub | ||||
| pagination_prev: demos/ml | ||||
| pagination_next: solutions/input | ||||
| --- | ||||
| 
 | ||||
| Git is a popular system for organizing a historical record of source code and | ||||
| changes. Git can also store and track binary data artifacts, but data tools | ||||
| are more effective in processing data stored in plain text formats like CSV. | ||||
| 
 | ||||
| Many official data releases by governments and organizations include XLSX or | ||||
| XLS files. SheetJS trivializes the conversion to CSV. For example, in NodeJS: | ||||
| XLS files. Unfortunately some data sources do not retain older versions. | ||||
| 
 | ||||
| ```js | ||||
| const XLSX = require("xlsx"); | ||||
| Git is a popular system for organizing a historical record of source code and | ||||
| changes.  Git can also store and track binary data artifacts. | ||||
| 
 | ||||
| (async() => { | ||||
|   /* Download Data */ | ||||
|   const f = await fetch("https://docs.sheetjs.com/pres.xlsx"); | ||||
|   const data = await f.arrayBuffer(); | ||||
| GitHub is a popular host for Git repositories.  GitHub's "Flat Data" project | ||||
| explores storing and comparing versions of structured CSV and JSON data. The | ||||
| official "Excel to CSV" example uses SheetJS to generate CSV data from files: | ||||
| 
 | ||||
|   /* Parse workbook */ | ||||
|   // highlight-next-line | ||||
|   const wb = XLSX.read(data); | ||||
| 
 | ||||
|   /* Convert first worksheet to CSV */ | ||||
|   const ws = wb.Sheets[wb.SheetNames[0]]; | ||||
|   // highlight-next-line | ||||
|   const csv = XLSX.utils.sheet_to_csv(ws); | ||||
|   console.log(csv); | ||||
| })(); | ||||
| ```mermaid | ||||
| sequenceDiagram | ||||
|   autonumber | ||||
|   participant R as GH Repo | ||||
|   participant A as GH Action | ||||
|   participant S as Data Source | ||||
|   loop Regular Interval (cron) | ||||
|     A->>R: clone repo | ||||
|     R->>A: old repo | ||||
|     A->>S: fetch file | ||||
|     S->>A: spreadsheet | ||||
|     Note over A: SheetJS<br/>convert to CSV | ||||
|     alt Data changed | ||||
|       Note over A: commit new data | ||||
|       A->>R: push new commit | ||||
|     end | ||||
|   end | ||||
| ``` | ||||
| 
 | ||||
| GitHub's "Flat Data" project explores storing and comparing versions of | ||||
| structured CSV and JSON data. The official "Excel to CSV" example uses SheetJS | ||||
| under the hood to generate CSV data from an XLSX file. | ||||
| 
 | ||||
| This demo covers implementation details elided in the official write-up. | ||||
| 
 | ||||
| ## Flat Data | ||||
| @ -49,7 +47,7 @@ As a project from the company, the entire lifecycle uses GitHub offerings: | ||||
| 
 | ||||
| :::caution | ||||
| 
 | ||||
| A GitHub account is required. At the time of writing (2022 November 08), free | ||||
| A GitHub account is required. At the time of writing (2023 February 11), free | ||||
| GitHub accounts have no Actions usage limits for public repositories. | ||||
| 
 | ||||
| Using private GitHub repositories is not recommended because the Flat Viewer | ||||
| @ -79,14 +77,16 @@ The `githubocto/flat` action can be added as a step in a workflow: | ||||
|           postprocess: ./postprocess.ts | ||||
| ``` | ||||
| 
 | ||||
| The `http_url` will be fetched and saved to `downloaded_filename` in the repo. | ||||
| This action performs the following steps: | ||||
| 
 | ||||
| 1) `http_url` will be fetched and saved to `downloaded_filename` in the repo. | ||||
| This can be approximated with the following command: | ||||
| 
 | ||||
| ```bash | ||||
| curl -L -o data.xlsx https://docs.sheetjs.com/pres.xlsx | ||||
| ``` | ||||
| 
 | ||||
| After saving, the `postprocess` script will be run. When a `.ts` file is the | ||||
| 2) After saving, the `postprocess` script will be run. When a `.ts` file is the | ||||
| script, it will run the script in the Deno runtime. The `postprocess` script is | ||||
| expected to read the downloaded file and create or overwrite files in the repo. | ||||
| This can be approximated with the following command: | ||||
| @ -95,7 +95,7 @@ This can be approximated with the following command: | ||||
| deno run -A ./postprocess.ts data.xlsx | ||||
| ``` | ||||
| 
 | ||||
| The action will then compare the contents of the repo, creating a new commit if | ||||
| 3) The action will compare the contents of the repo, creating a new commit if | ||||
| the source data or artifacts from the `postprocess` script changed. | ||||
| 
 | ||||
| 
 | ||||
| @ -153,7 +153,7 @@ Deno.writeFileSync(out_file, new TextEncoder().encode(csv)); | ||||
| 
 | ||||
| :::note | ||||
| 
 | ||||
| This was tested on 2022 November 08 using the GitHub UI. | ||||
| This was tested on 2023 February 11 using the GitHub UI. | ||||
| 
 | ||||
| ::: | ||||
| 
 | ||||
| @ -81,7 +81,7 @@ run in the web browser, demos will include interactive examples. | ||||
| ### File Hosting Services | ||||
| 
 | ||||
| - [`Dropbox`](/docs/demos/hosting/dropbox) | ||||
| - [`Git`](/docs/demos/git) | ||||
| - [`GitHub`](/docs/demos/hosting/github) | ||||
| 
 | ||||
| ### Platforms and Integrations | ||||
| 
 | ||||
|  | ||||
| @ -172,6 +172,7 @@ const config = { | ||||
|         { from: '/docs/getting-started/demos/', to: '/docs/demos/' }, | ||||
|         { from: '/docs/getting-started/demos/excel', to: '/docs/demos/' }, | ||||
|         { from: '/docs/demos/content', to: '/docs/demos/static/' }, | ||||
|         { from: '/docs/demos/git', to: '/docs/demos/hosting/github/' }, | ||||
|         /* frontend */ | ||||
|         { from: '/docs/demos/angular', to: '/docs/demos/frontend/angular/' }, | ||||
|         { from: '/docs/demos/react', to: '/docs/demos/frontend/react/' }, | ||||
|  | ||||