|  | --- | ||
|  | title: Summary Statistics | ||
|  | sidebar_label: Summary Statistics | ||
|  | pagination_prev: demos/index | ||
|  | pagination_next: demos/frontend/index | ||
|  | --- | ||
|  | 
 | ||
|  | import current from '/version.js'; | ||
|  | import Tabs from '@theme/Tabs'; | ||
|  | import TabItem from '@theme/TabItem'; | ||
|  | import CodeBlock from '@theme/CodeBlock'; | ||
|  | 
 | ||
|  | export const bs = ({borderStyle:"none", background:"none", textAlign:"left" }); | ||
|  | 
 | ||
|  | Summary statistics help people quickly understand datasets and make informed | ||
|  | decisions. Many interesting datasets are stored in spreadsheet files. | ||
|  | 
 | ||
|  | [SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing | ||
|  | data from spreadsheets. | ||
|  | 
 | ||
|  | This demo uses SheetJS to process data in spreadsheets. We'll explore how to | ||
|  | extract spreadsheet data and how to compute simple summary statistics. This | ||
|  | demo will focus on two general data representations: | ||
|  | 
 | ||
|  | - ["Arrays of Objects"](#arrays-of-objects) simplifies processing by translating | ||
|  |   from the SheetJS data model to a more idiomatic data structure. | ||
|  | - ["Dense Worksheets"](#dense-worksheets) directly analyzes SheetJS worksheets. | ||
|  | 
 | ||
|  | :::tip pass | ||
|  | 
 | ||
|  | The [Import Tutorial](/docs/getting-started/examples/import) is a guided example | ||
|  | of extracting data from a workbook. It is strongly recommended to review the | ||
|  | tutorial first. | ||
|  | 
 | ||
|  | ::: | ||
|  | 
 | ||
|  | :::note Tested Deployments | ||
|  | 
 | ||
|  | This browser demo was tested in the following environments: | ||
|  | 
 | ||
|  | | Browser     | Date       | | ||
|  | |:------------|:-----------| | ||
|  | | Chrome 119  | 2024-01-06 | | ||
|  | 
 | ||
|  | ::: | ||
|  | 
 | ||
|  | ## Data Representations
 | ||
|  | 
 | ||
|  | Many worksheets include one header row followed by a number of data rows. Each | ||
|  | row is an "observation" and each column is a "variable". | ||
|  | 
 | ||
|  | :::info pass | ||
|  | 
 | ||
The "Arrays of Objects" approach uses more idiomatic JavaScript patterns. It
is suitable for smaller datasets.
|  | 
 | ||
|  | The "Dense Worksheets" approach is more performant, but the code patterns are | ||
|  | reminiscent of C. The low-level approach is only encouraged when the traditional | ||
|  | patterns are prohibitively slow. | ||
|  | 
 | ||
|  | ::: | ||
|  | 
 | ||
|  | ### Arrays of Objects
 | ||
|  | 
 | ||
|  | The idiomatic JavaScript representation of the dataset is an array of objects. | ||
|  | Variable names are typically taken from the first row. Those names are used as | ||
|  | keys in each observation. | ||
|  | 
 | ||
|  | <table><thead><tr><th>Spreadsheet</th><th>JS Data</th></tr></thead><tbody><tr><td> | ||
|  | 
 | ||
|  |  | ||
|  | 
 | ||
|  | </td><td> | ||
|  | 
 | ||
|  | ```js | ||
|  | [ | ||
|  |   { Name: "Bill Clinton", Index: 42 }, | ||
|  |   { Name: "GeorgeW Bush", Index: 43 }, | ||
|  |   { Name: "Barack Obama", Index: 44 }, | ||
|  |   { Name: "Donald Trump", Index: 45 }, | ||
|  |   { Name: "Joseph Biden", Index: 46 } | ||
|  | ] | ||
|  | ``` | ||
|  | 
 | ||
|  | </td></tr></tbody></table> | ||
|  | 
 | ||
The SheetJS `sheet_to_json` method[^1] can generate arrays of objects from a
worksheet object. For example, the following snippet fetches a test file and
creates an array of objects from the first sheet:
|  | 
 | ||
```js
const url = "https://docs.sheetjs.com/typedarray/iris.xlsx";

/* fetch file and pull file data into an ArrayBuffer */
const file = await (await fetch(url)).arrayBuffer();

/* parse workbook */
const workbook = XLSX.read(file, {dense: true});

/* first worksheet */
const first_sheet = workbook.Sheets[workbook.SheetNames[0]];

/* generate array of objects */
// highlight-next-line
const aoo = XLSX.utils.sheet_to_json(first_sheet);
```
|  | 
 | ||
|  | ### Dense Worksheets
 | ||
|  | 
 | ||
|  | SheetJS "dense" worksheets[^2] store cells in an array of arrays. The SheetJS | ||
|  | `read` method[^3] accepts a special `dense` option to create dense worksheets. | ||
|  | 
 | ||
The following example fetches a file and parses it with the `dense` option:
|  | 
 | ||
|  | ```js | ||
|  | /* fetch file and pull file data into an ArrayBuffer */ | ||
|  | const url = "https://docs.sheetjs.com/typedarray/iris.xlsx"; | ||
|  | const file = await (await fetch(url)).arrayBuffer(); | ||
|  | 
 | ||
|  | /* parse workbook */ | ||
|  | // highlight-next-line | ||
|  | const workbook = XLSX.read(file, {dense: true}); | ||
|  | 
 | ||
|  | /* first worksheet */ | ||
|  | const first_dense_sheet = workbook.Sheets[workbook.SheetNames[0]]; | ||
|  | ``` | ||
|  | 
 | ||
The `"!data"` property of a dense worksheet is an array of arrays of cell
objects[^4]. Each cell object carries attributes such as the data type and the
underlying value.
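
As a rough illustration of that shape, the following sketch builds a small
object by hand instead of parsing a file. The name `sample_sheet` and its
contents are assumptions made for this example. A real dense worksheet produced
by `XLSX.read` carries additional properties.

```js
/* hypothetical hand-written stand-in for a parsed dense worksheet */
const sample_sheet = {
  "!data": [
    /* row 0: header cells (text type "s") */
    [ { t: "s", v: "Name" },         { t: "s", v: "Index" } ],
    /* row 1: one text cell and one numeric cell (type "n") */
    [ { t: "s", v: "Bill Clinton" }, { t: "n", v: 42 } ]
  ]
};

/* cells are addressed by row index, then column index */
const cell = sample_sheet["!data"][1][1];
console.log(cell.t, cell.v);
```

The row index comes first and the column index second, so `["!data"][1][1]` is
the cell in the second row and second column.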
|  | 
 | ||
|  | ## Analyzing Variables
 | ||
|  | 
 | ||
|  | Individual variables can be extracted by looping through the array of objects | ||
|  | and accessing specific keys. For example, using the Iris dataset: | ||
|  | 
 | ||
|  |  | ||
|  | 
 | ||
|  | <Tabs groupId="style"> | ||
|  |   <TabItem name="aoo" value="Array of Objects"> | ||
|  | 
 | ||
|  | The following snippet shows the first entry in the array of objects: | ||
|  | 
 | ||
|  | ```js | ||
|  | { | ||
|  |   "sepal length": 5.1, | ||
|  |   "sepal width": 3.5, | ||
|  |   "petal length": 1.4, | ||
|  |   "petal width": 0.2, | ||
|  |   "class ": "Iris-setosa" | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | The values for the `sepal length` variable can be extracted by indexing each | ||
|  | object. The following snippet prints the sepal lengths: | ||
|  | 
 | ||
|  | ```js | ||
|  | for(let i = 0; i < aoo.length; ++i) { | ||
|  |   const row = aoo[i]; | ||
|  |   const sepal_length = row["sepal length"]; | ||
|  |   console.log(sepal_length); | ||
|  | } | ||
|  | ``` | ||
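
When the values are needed later, they can be collected in an array instead of
printed. The sketch below is one possible approach. The helper name
`aoo_numeric_column` and the sample rows are assumptions made for this example
and are not part of SheetJS.

```js
/* hypothetical helper: collect the numeric values stored under `key` */
function aoo_numeric_column(aoo, key) {
  const out = [];
  for(let i = 0; i < aoo.length; ++i) {
    const row = aoo[i];
    if(row == null) continue;              // skip array holes
    const value = row[key];
    if(typeof value != "number") continue; // skip text, blanks and booleans
    out.push(value);
  }
  return out;
}

/* hand-written rows standing in for `sheet_to_json` output */
const sample_rows = [
  { "sepal length": 5.1,   "class ": "Iris-setosa" },
  { "sepal length": "n/a", "class ": "Iris-setosa" },
  { "sepal length": 4.7,   "class ": "Iris-setosa" }
];
console.log(aoo_numeric_column(sample_rows, "sepal length")); // [ 5.1, 4.7 ]
```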
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSAoOExtractColumn() { | ||
|  |   const [col, setCol] = React.useState([]); | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     const aoo = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]); | ||
|  |     /* store first 5 sepal lengths in an array */ | ||
|  |     const col = []; | ||
|  |     for(let i = 0; i < aoo.length; ++i) { | ||
|  |       const row = aoo[i]; | ||
|  |       const sepal_length = row["sepal length"]; | ||
|  |       col.push(sepal_length); if(col.length >= 5) break; | ||
|  |     } | ||
|  |     setCol(col); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <> | ||
|  |   <b>First 5 Sepal Length Values</b><br/> | ||
|  |   <table><tbody> | ||
|  |     {col.map(sw => (<tr><td>{sw}</td></tr>))} | ||
|  |   </tbody></table> | ||
|  |   </> | ||
|  |   ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  |   <TabItem name="ws" value="Dense Worksheet"> | ||
|  | 
 | ||
|  | The column for the `sepal length` variable can be determined by testing the cell | ||
|  | values in the first row. | ||
|  | 
 | ||
|  | **Finding the column index for the variable** | ||
|  | 
 | ||
|  | The first row of cells will be the first row in the `"!data"` array: | ||
|  | 
 | ||
|  | ```js | ||
|  | const first_row = first_dense_sheet["!data"][0]; | ||
|  | ``` | ||
|  | 
 | ||
When looping over the cells in the first row, each cell must be tested in the
following order:

- the cell object exists (the entry is not null or undefined)
- the cell is a text cell (the `t` property will be `"s"`[^5])
- the cell value (`v` property[^6]) matches `"sepal length"`
|  | 
 | ||
|  | ```js | ||
|  | let C = -1; | ||
|  | for(let i = 0; i < first_row.length; ++i) { | ||
|  |   let cell = first_row[i]; | ||
|  |   /* confirm cell exists */ | ||
|  |   if(!cell) continue; | ||
|  |   /* confirm cell is a text cell */ | ||
|  |   if(cell.t != "s") continue; | ||
|  |   /* compare the text */ | ||
|  |   if(cell.v.localeCompare("sepal length") != 0) continue; | ||
|  |   /* save column index */ | ||
|  |   C = i; break; | ||
|  | } | ||
|  | /* throw an error if the column cannot be found */ | ||
|  | if(C == -1) throw new Error(`"sepal length" column cannot be found! `); | ||
|  | ``` | ||
|  | 
 | ||
|  | **Finding the values for the variable** | ||
|  | 
 | ||
|  | After finding the column index, the rest of the rows can be scanned. This time, | ||
|  | the cell type will be `"n"`[^7] (numeric). The following snippet prints values: | ||
|  | 
 | ||
|  | ```js | ||
|  | const number_of_rows = first_dense_sheet["!data"].length; | ||
|  | for(let R = 1; R < number_of_rows; ++R) { | ||
|  |   /* confirm row exists */ | ||
|  |   let row = first_dense_sheet["!data"][R]; | ||
|  |   if(!row) continue; | ||
|  |   /* confirm cell exists */ | ||
|  |   let cell = row[C]; | ||
|  |   if(!cell) continue; | ||
|  |   /* confirm cell is a numeric cell */ | ||
|  |   if(cell.t != "n") continue; | ||
|  |   /* print raw value */ | ||
|  |   console.log(cell.v); | ||
|  | } | ||
|  | ``` | ||
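
The two steps above can be combined in one function. The following is a minimal
sketch. The helper name `ws_numeric_column` and the hand-written `sample_ws`
object are assumptions made for this example, and the header comparison uses
strict equality rather than `localeCompare`.

```js
/* hypothetical helper: find a header by name, then collect numeric cells */
function ws_numeric_column(ws, name) {
  const data = ws["!data"];
  const first_row = data[0] || [];

  /* step 1: find the column index */
  let C = -1;
  for(let i = 0; i < first_row.length; ++i) {
    const cell = first_row[i];
    if(!cell || cell.t != "s") continue;
    if(cell.v === name) { C = i; break; }
  }
  if(C == -1) throw new Error(`"${name}" column cannot be found!`);

  /* step 2: scan the remaining rows */
  const out = [];
  for(let R = 1; R < data.length; ++R) {
    const row = data[R];
    if(!row) continue;
    const cell = row[C];
    if(!cell || cell.t != "n") continue;
    out.push(cell.v);
  }
  return out;
}

/* hand-written stand-in for a dense worksheet */
const sample_ws = { "!data": [
  [ { t: "s", v: "sepal length" }, { t: "s", v: "class " } ],
  [ { t: "n", v: 5.1 },            { t: "s", v: "Iris-setosa" } ],
  /* missing row (array hole) */,
  [ { t: "s", v: "n/a" },          { t: "s", v: "Iris-setosa" } ],
  [ { t: "n", v: 4.7 },            { t: "s", v: "Iris-setosa" } ]
] };
console.log(ws_numeric_column(sample_ws, "sepal length")); // [ 5.1, 4.7 ]
```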
|  | 
 | ||
|  | **Live Demo** | ||
|  | 
 | ||
The following demo combines both steps and displays the first five sepal
lengths:
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSDensExtractColumn() { | ||
|  |   const [msg, setMsg] = React.useState(""); | ||
|  |   const [col, setCol] = React.useState([]); | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     /* first worksheet */ | ||
|  |     const first_dense_sheet = wb.Sheets[wb.SheetNames[0]]; | ||
|  | 
 | ||
|  |     /* find column index */ | ||
|  |     const first_row = first_dense_sheet["!data"][0]; | ||
|  |     let C = -1; | ||
|  |     for(let i = 0; i < first_row.length; ++i) { | ||
|  |       let cell = first_row[i]; | ||
|  |       /* confirm cell exists */ | ||
|  |       if(!cell) continue; | ||
|  |       /* confirm cell is a text cell */ | ||
|  |       if(cell.t != "s") continue; | ||
|  |       /* compare the text */ | ||
|  |       if(cell.v.localeCompare("sepal length") != 0) continue; | ||
|  |       /* save column index */ | ||
|  |       C = i; break; | ||
|  |     } | ||
|  |     /* throw an error if the column cannot be found */ | ||
|  |     if(C == -1) return setMsg(`"sepal length" column cannot be found! `); | ||
|  | 
 | ||
|  |     /* store first 5 sepal lengths in an array */ | ||
|  |     const col = []; | ||
|  |     const number_of_rows = first_dense_sheet["!data"].length; | ||
|  |     for(let R = 1; R < number_of_rows; ++R) { | ||
|  |       /* confirm row exists */ | ||
|  |       let row = first_dense_sheet["!data"][R]; | ||
|  |       if(!row) continue; | ||
|  |       /* confirm cell exists */ | ||
|  |       let cell = row[C]; | ||
|  |       if(!cell) continue; | ||
|  |       /* confirm cell is a numeric cell */ | ||
|  |       if(cell.t != "n") continue; | ||
|  |       /* add raw value */ | ||
|  |       const sepal_length = cell.v; | ||
|  |       col.push(sepal_length); if(col.length >= 5) break; | ||
|  |     } | ||
|  | 
 | ||
|  |     setCol(col); | ||
|  |     setMsg("First 5 Sepal Length Values"); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <><b>{msg}</b><br/><table><tbody> | ||
|  |     {col.map(sw => (<tr><td>{sw}</td></tr>))} | ||
|  |   </tbody></table></> ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  | </Tabs> | ||
|  | 
 | ||
|  | ## Average (Mean)
 | ||
|  | 
 | ||
|  | For a given sequence of numbers $x_1\mathellipsis x_{count}$ the mean $M$ is | ||
|  | defined as the sum of the elements divided by the count: | ||
|  | 
 | ||
|  | $$ | ||
|  | M[x;count] = \frac{1}{count}\sum_{i=1}^{count} x_i | ||
|  | $$ | ||
|  | 
 | ||
|  | In JavaScript terms, the mean of an array is the sum of the numbers in the array | ||
|  | divided by the total number of numeric values. | ||
|  | 
 | ||
|  | Non-numeric elements and array holes do not affect the sum and do not contribute | ||
|  | to the count. Algorithms are expected to explicitly track the count and cannot | ||
|  | assume the array `length` property will be the correct count. | ||
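
The difference between the tracked count and the `length` property can be seen
with a plain array. The following sketch uses a hypothetical helper named
`array_mean`. Like the functions later in this demo, it returns `0` when no
numeric values are present, which is a choice made here rather than a
mathematical requirement.

```js
/* hypothetical helper: mean of the numeric entries of a plain array */
function array_mean(arr) {
  let sum = 0, cnt = 0;
  for(let i = 0; i < arr.length; ++i) {
    if(typeof arr[i] != "number") continue; // holes read back as undefined
    sum += arr[i]; ++cnt;
  }
  return cnt == 0 ? 0 : sum / cnt;
}

const values = [1, , "two", 3];  // one hole and one string
console.log(values.length);      // 4
console.log(array_mean(values)); // 2
```

Dividing by `values.length` would give `1` instead of `2`.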
|  | 
 | ||
|  | :::info pass | ||
|  | 
 | ||
|  | This definition aligns with the spreadsheet `AVERAGE` function. | ||
|  | 
 | ||
|  | `AVERAGEA` differs from `AVERAGE` in its treatment of string and Boolean values: | ||
|  | string values are treated as zeroes and Boolean values map to their coerced | ||
|  | numeric equivalent (`true` is `1` and `false` is `0`). | ||
|  | 
 | ||
|  | ::: | ||
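
The `AVERAGEA` treatment described above can be sketched for plain arrays. The
helper name `array_averagea` is an assumption made for this example. Spreadsheet
applications have additional edge cases that this sketch does not attempt to
reproduce.

```js
/* hypothetical helper following the AVERAGEA description above */
function array_averagea(arr) {
  let sum = 0, cnt = 0;
  for(let i = 0; i < arr.length; ++i) {
    const v = arr[i];
    switch(typeof v) {
      case "number":  sum += v; ++cnt; break;
      case "boolean": sum += v ? 1 : 0; ++cnt; break; // true is 1, false is 0
      case "string":  ++cnt; break;                   // counted as zero
      default: break; // holes, null and undefined are skipped
    }
  }
  return cnt == 0 ? 0 : sum / cnt;
}

console.log(array_averagea([1, "two", true, 3])); // 1.25
```

The same input has a mean of `2` under the `AVERAGE` definition, since only the
two numbers `1` and `3` contribute.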
|  | 
 | ||
|  | :::note JavaScript Ecosystem | ||
|  | 
 | ||
|  | Some JavaScript libraries implement functions for computing array means. | ||
|  | 
 | ||
|  | | Library                 | Implementation                                | | ||
|  | |:------------------------|:----------------------------------------------| | ||
|  | | `jStat`[^8]             | Textbook sum (divide at end)                  | | ||
|  | | `simple-statistics`[^9] | Neumaier compensated sum (divide at end)      | | ||
|  | | `stdlib.js`[^10]        | Trial mean (`mean`) / van Reeken (`incrmean`) | | ||
|  | 
 | ||
|  | ::: | ||
|  | 
 | ||
|  | ### Textbook Sum
 | ||
|  | 
 | ||
|  | The mean of a sequence of values can be calculated by computing the sum and | ||
|  | dividing by the count. | ||
|  | 
 | ||
|  | <Tabs groupId="style"> | ||
|  |   <TabItem name="aoo" value="Array of Objects"> | ||
|  | 
 | ||
The following function accepts an array of objects and a key. It returns the
mean of the numeric values stored under that key, or `0` if there are none.
|  | 
 | ||
|  | ```js | ||
|  | function aoa_average_of_key(aoo, key) { | ||
|  |   let sum = 0, cnt = 0; | ||
|  |   for(let R = 0; R < aoo.length; ++R) { | ||
|  |     const row = aoo[R]; | ||
|  |     if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |     const field = row[key]; | ||
|  |     if(typeof field != "number") continue; | ||
|  | 
 | ||
|  |     sum += field; ++cnt; | ||
|  |   } | ||
|  |   return cnt == 0 ? 0 : sum / cnt; | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | <details><summary><b>Live Demo</b> (click to show)</summary> | ||
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSAoOAverageKey() { | ||
|  |   const [avg, setAvg] = React.useState(NaN); | ||
|  | 
 | ||
|  |   function aoa_average_of_key(aoo, key) { | ||
|  |     let sum = 0, cnt = 0; | ||
|  |     for(let R = 0; R < aoo.length; ++R) { | ||
|  |       const row = aoo[R]; | ||
|  |       if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |       const field = row[key]; | ||
|  |       if(typeof field != "number") continue; | ||
|  | 
 | ||
|  |       sum += field; ++cnt; | ||
|  |     } | ||
|  |     return cnt == 0 ? 0 : sum / cnt; | ||
|  |   } | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     const aoo = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]); | ||
|  |     setAvg(aoa_average_of_key(aoo, "sepal length")); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <b>The average Sepal Length is {avg}</b> ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  |   <TabItem name="ws" value="Dense Worksheet"> | ||
|  | 
 | ||
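The following function accepts a SheetJS dense worksheet and a column index. It
skips the header row and returns the mean of the numeric cells in that column,
or `0` if there are none.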
|  | 
 | ||
|  | ```js | ||
|  | function ws_average_of_col(ws, C) { | ||
|  |   const data = ws["!data"]; | ||
|  |   let sum = 0, cnt = 0; | ||
|  |   for(let R = 1; R < data.length; ++R) { | ||
|  |     const row = data[R]; | ||
|  |     if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |     const field = row[C]; | ||
|  |     if(!field || field.t != "n") continue; | ||
|  | 
 | ||
|  |     sum += field.v; ++cnt; | ||
|  |   } | ||
|  |   return cnt == 0 ? 0 : sum / cnt; | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | <details><summary><b>Live Demo</b> (click to show)</summary> | ||
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSDenseAverageKey() { | ||
|  |   const [avg, setAvg] = React.useState(NaN); | ||
|  | 
 | ||
|  |   function ws_average_of_col(ws, C) { | ||
|  |     const data = ws["!data"]; | ||
|  |     let sum = 0, cnt = 0; | ||
|  |     for(let R = 1; R < data.length; ++R) { | ||
|  |       const row = data[R]; | ||
|  |       if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |       const field = row[C]; | ||
|  |       if(!field || field.t != "n") continue; | ||
|  | 
 | ||
|  |       sum += field.v; ++cnt; | ||
|  |     } | ||
|  |     return cnt == 0 ? 0 : sum / cnt; | ||
|  |   } | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     const ws = wb.Sheets[wb.SheetNames[0]]; | ||
|  | 
 | ||
|  |     /* find column index */ | ||
|  |     const first_row = ws["!data"][0]; | ||
|  |     let C = -1; | ||
|  |     for(let i = 0; i < first_row.length; ++i) { | ||
|  |       let cell = first_row[i]; | ||
|  |       /* confirm cell exists */ | ||
|  |       if(!cell) continue; | ||
|  |       /* confirm cell is a text cell */ | ||
|  |       if(cell.t != "s") continue; | ||
|  |       /* compare the text */ | ||
|  |       if(cell.v.localeCompare("sepal length") != 0) continue; | ||
|  |       /* save column index */ | ||
|  |       C = i; break; | ||
|  |     } | ||
|  | 
 | ||
|  |     setAvg(ws_average_of_col(ws, C)); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <b>The average Sepal Length is {avg}</b> ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  | </Tabs> | ||
|  | 
 | ||
|  | :::caution pass | ||
|  | 
 | ||
|  | The textbook method suffers from numerical issues when many values of similar | ||
|  | magnitude are summed. As the number of elements grows, the absolute value of the | ||
|  | sum grows to orders of magnitude larger than the absolute values of the | ||
|  | individual values and significant figures are lost. | ||
|  | 
 | ||
|  | ::: | ||
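
The underlying mechanism can be observed with a deliberately extreme input. In
the following sketch, the running sum is already so large that adding `1` has no
effect. The helper name `textbook_sum` and the sample values are assumptions
made for this example.

```js
/* hypothetical helper: plain left-to-right summation */
function textbook_sum(arr) {
  let sum = 0;
  for(let i = 0; i < arr.length; ++i) sum += arr[i];
  return sum;
}

const big_first = [1e16, 1, 1, 1, 1];
console.log(textbook_sum(big_first) === 1e16); // true: each 1 was rounded away
console.log(1e16 + 4 === 1e16);                // false: the exact sum is representable
```

Near `1e16`, adjacent double-precision values are `2` apart, so each individual
addition of `1` rounds back to `1e16`.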
|  | 
 | ||
|  | ### van Reeken
 | ||
|  | 
 | ||
|  | Some of the issues in the textbook approach can be addressed with a differential | ||
|  | technique. Instead of computing the whole sum, it is possible to calculate and | ||
|  | update an estimate for the mean. | ||
|  | 
 | ||
|  | The van Reeken array mean can be implemented in one line of JavaScript code: | ||
|  | 
 | ||
|  | ```js | ||
|  | for(var n = 1, mean = 0; n <= x.length; ++n) mean += (x[n-1] - mean)/n; | ||
|  | ``` | ||
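
Overflow is one case where the two approaches visibly differ. The following
sketch compares them on two very large values. The function names
`textbook_mean` and `van_reeken_mean` are assumptions made for this example, and
both functions assume every element is a number.

```js
/* hypothetical helper: sum first, divide at the end */
function textbook_mean(x) {
  let sum = 0;
  for(let i = 0; i < x.length; ++i) sum += x[i];
  return x.length == 0 ? 0 : sum / x.length;
}

/* hypothetical helper: update a running estimate of the mean */
function van_reeken_mean(x) {
  let mean = 0;
  for(let n = 1; n <= x.length; ++n) mean += (x[n-1] - mean) / n;
  return mean;
}

const huge = [1e308, 1e308];
console.log(textbook_mean(huge));   // Infinity
console.log(van_reeken_mean(huge)); // 1e+308
```

The intermediate sum `2e308` exceeds `Number.MAX_VALUE`, while the running
estimate never leaves the range of the inputs.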
|  | 
 | ||
|  | <details><summary><b>Math details</b> (click to show)</summary> | ||
|  | 
 | ||
Let $M[x;m] = \frac{1}{m}\sum_{i=1}^{m}x_i$ be the mean of the first $m$ elements. Then:
|  | 
 | ||
|  | <table style={bs}><tbody style={bs}><tr style={bs}><td style={bs}> | ||
|  | 
 | ||
|  | $M[x;m+1]$ | ||
|  | 
 | ||
|  | </td><td style={bs}> | ||
|  | 
 | ||
|  | $= \frac{1}{m+1}\sum_{i=1}^{m+1} x_i$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= \frac{1}{m+1}\sum_{i=1}^{m} x_i + \frac{x_{m+1}}{m+1}$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= \frac{m}{m+1}(\frac{1}{m}\sum_{i=1}^{m} x_i) + \frac{x_{m+1}}{m+1}$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= \frac{m}{m+1}M[x;m] + \frac{x_{m+1}}{m+1}$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= (1 - \frac{1}{m+1})M[x;m] + \frac{x_{m+1}}{m+1}$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= M[x;m] + \frac{x_{m+1}}{m+1} - \frac{1}{m+1}M[x;m]$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> </td><td style={bs}> | ||
|  | 
 | ||
|  | $= M[x;m] + \frac{1}{m+1}(x_{m+1}-M[x;m])$ | ||
|  | 
 | ||
|  | </td></tr><tr style={bs}><td style={bs}> | ||
|  | 
 | ||
|  | $new\_mean$ | ||
|  | 
 | ||
|  | </td><td style={bs}> | ||
|  | 
 | ||
|  | $= old\_mean + (x_{m+1}-old\_mean) / (m+1)$ | ||
|  | 
 | ||
|  | </td></tr></tbody></table> | ||
|  | 
 | ||
|  | Switching to zero-based indexing, the relation matches the following expression: | ||
|  | 
 | ||
|  | ```js | ||
|  | new_mean = old_mean + (x[m] - old_mean) / (m + 1); | ||
|  | ``` | ||
|  | 
 | ||
|  | This update can be succinctly implemented in JavaScript: | ||
|  | 
 | ||
|  | ```js | ||
|  | mean += (x[m] - mean) / (m + 1); | ||
|  | ``` | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  | <Tabs groupId="style"> | ||
|  |   <TabItem name="aoo" value="Array of Objects"> | ||
|  | 
 | ||
The following function accepts an array of objects and a key. It updates a
running mean with each numeric value stored under that key and returns `0` if
there are none.
|  | 
 | ||
|  | ```js | ||
|  | function aoa_mean_of_key(aoo, key) { | ||
|  |   let mean = 0, cnt = 0; | ||
|  |   for(let R = 0; R < aoo.length; ++R) { | ||
|  |     const row = aoo[R]; | ||
|  |     if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |     const field = row[key]; | ||
|  |     if(typeof field != "number") continue; | ||
|  | 
 | ||
|  |     mean += (field - mean) / ++cnt; | ||
|  |   } | ||
|  |   return cnt == 0 ? 0 : mean; | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | <details><summary><b>Live Demo</b> (click to show)</summary> | ||
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSAoOMeanKey() { | ||
|  |   const [avg, setAvg] = React.useState(NaN); | ||
|  | 
 | ||
|  |   function aoa_mean_of_key(aoo, key) { | ||
|  |     let mean = 0, cnt = 0; | ||
|  |     for(let R = 0; R < aoo.length; ++R) { | ||
|  |       const row = aoo[R]; | ||
|  |       if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |       const field = row[key]; | ||
|  |       if(typeof field != "number") continue; | ||
|  | 
 | ||
|  |       mean += (field - mean) / ++cnt; | ||
|  |     } | ||
|  |     return cnt == 0 ? 0 : mean; | ||
|  |   } | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     const aoo = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]]); | ||
|  |     setAvg(aoa_mean_of_key(aoo, "sepal length")); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <b>The average Sepal Length is {avg}</b> ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  |   <TabItem name="ws" value="Dense Worksheet"> | ||
|  | 
 | ||
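The following function accepts a SheetJS dense worksheet and a column index. It
skips the header row, updates a running mean with each numeric cell in that
column and returns `0` if there are none.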
|  | 
 | ||
|  | ```js | ||
|  | function ws_mean_of_col(ws, C) { | ||
|  |   const data = ws["!data"]; | ||
|  |   let mean = 0, cnt = 0; | ||
|  |   for(let R = 1; R < data.length; ++R) { | ||
|  |     const row = data[R]; | ||
|  |     if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |     const field = row[C]; | ||
|  |     if(!field || field.t != "n") continue; | ||
|  | 
 | ||
|  |     mean += (field.v - mean) / ++cnt; | ||
|  |   } | ||
|  |   return cnt == 0 ? 0 : mean; | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | <details><summary><b>Live Demo</b> (click to show)</summary> | ||
|  | 
 | ||
|  | ```jsx live | ||
|  | function SheetJSDenseMeanKey() { | ||
|  |   const [avg, setAvg] = React.useState(NaN); | ||
|  | 
 | ||
|  |   function ws_mean_of_col(ws, C) { | ||
|  |     const data = ws["!data"]; | ||
|  |     let mean = 0, cnt = 0; | ||
|  |     for(let R = 1; R < data.length; ++R) { | ||
|  |       const row = data[R]; | ||
|  |       if(typeof row == "undefined") continue; | ||
|  | 
 | ||
|  |       const field = row[C]; | ||
|  |       if(!field || field.t != "n") continue; | ||
|  | 
 | ||
|  |       mean += (field.v - mean) / ++cnt; | ||
|  |     } | ||
|  |     return cnt == 0 ? 0 : mean; | ||
|  |   } | ||
|  | 
 | ||
|  |   React.useEffect(() => { (async() => { | ||
|  |     const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer(); | ||
|  |     const wb = XLSX.read(ab, {dense: true}); | ||
|  |     const ws = wb.Sheets[wb.SheetNames[0]]; | ||
|  | 
 | ||
|  |     /* find column index */ | ||
|  |     const first_row = ws["!data"][0]; | ||
|  |     let C = -1; | ||
|  |     for(let i = 0; i < first_row.length; ++i) { | ||
|  |       let cell = first_row[i]; | ||
|  |       /* confirm cell exists */ | ||
|  |       if(!cell) continue; | ||
|  |       /* confirm cell is a text cell */ | ||
|  |       if(cell.t != "s") continue; | ||
|  |       /* compare the text */ | ||
|  |       if(cell.v.localeCompare("sepal length") != 0) continue; | ||
|  |       /* save column index */ | ||
|  |       C = i; break; | ||
|  |     } | ||
|  | 
 | ||
|  |     setAvg(ws_mean_of_col(ws, C)); | ||
|  |   })(); }, []); | ||
|  | 
 | ||
|  |   return ( <b>The average Sepal Length is {avg}</b> ); | ||
|  | } | ||
|  | ``` | ||
|  | 
 | ||
|  | </details> | ||
|  | 
 | ||
|  |   </TabItem> | ||
|  | </Tabs> | ||
|  | 
 | ||
|  | :::note Historical Context | ||
|  | 
 | ||
|  | This algorithm is generally attributed to Welford[^11]. However, the original | ||
|  | paper does not propose this algorithm for calculating the mean! | ||
|  | 
 | ||
|  | Programmers including Neely[^12] attributed a different algorithm to Welford. | ||
|  | van Reeken[^13] reported success with the algorithm presented in this section. | ||
|  | 
 | ||
|  | Knuth[^14] erroneously attributed this implementation of the mean to Welford. | ||
|  | 
 | ||
|  | ::: | ||
|  | 
 | ||
[^1]: See [`sheet_to_json` in "Utilities"](/docs/api/utilities/array#array-output).
[^2]: See ["Dense Mode" in "Sheet Objects"](/docs/csf/sheet#dense-mode).
[^3]: See [`read` in "Reading Files"](/docs/api/parse-options).
[^4]: See ["Dense Mode" in "Sheet Objects"](/docs/csf/sheet#dense-mode).
[^5]: See ["Cell Types" in "Cell Objects"](/docs/csf/cell#cell-types).
[^6]: See ["Underlying Values" in "Cell Objects"](/docs/csf/cell#underlying-values).
[^7]: See ["Cell Types" in "Cell Objects"](/docs/csf/cell#cell-types).
[^8]: See [`mean()`](https://jstat.github.io/all.html#mean) in the `jStat` documentation.
[^9]: See [`mean`](http://simple-statistics.github.io/docs/#mean) in the `simple-statistics` documentation.
[^10]: See [`incrsum`](https://stdlib.io/docs/api/latest/@stdlib/stats/incr/sum) in the `stdlib.js` documentation.
[^11]: See "Note on a Method for Calculating Corrected Sums of Squares and Products" in Technometrics Vol 4 No 3 (1962 August).
[^12]: See "Comparison of Several Algorithms for Computation of Means, Standard Deviations and Correlation Coefficients" in CACM Vol 9 No 7 (1966 July).
[^13]: See "Dealing with Neely's Algorithms" in CACM Vol 11 No 3 (1968 March).
[^14]: See "The Art of Computer Programming: Seminumerical Algorithms" Third Edition page 232.