docs.sheetjs.com/docz/docs/03-demos/32-extensions/41-ghidra.md

425 lines
12 KiB
Markdown
Raw Normal View History

2024-10-14 04:45:51 +00:00
---
title: Sheets in Ghidra
sidebar_label: Ghidra
pagination_prev: demos/cloud/index
pagination_next: demos/bigdata/index
sidebar_custom_props:
2025-03-27 02:49:13 +00:00
summary: Generate spreadsheets from bitfield tables extracted from programs
2024-10-14 04:45:51 +00:00
---
import current from '/version.js';
import CodeBlock from '@theme/CodeBlock';
[Ghidra](https://ghidra-sre.org/) is a software reverse engineering platform
with a robust Java-based extension system.
[SheetJS](https://sheetjs.com) is a JavaScript library for reading and writing
data from spreadsheets.
The [Complete Demo](#complete-demo) uses SheetJS to export data from a Ghidra
script. We'll create an extension that loads the [V8](/docs/demos/engines/v8)
JavaScript engine through the Ghidra.js[^1] integration and uses the SheetJS
library to export a bitfield table from Apple Numbers to a XLSX workbook.
:::note Tested Deployments
This demo was tested by SheetJS users in the following deployments:
| Architecture | Ghidra | Date |
|:-------------|:---------|:-----------|
| `darwin-arm` | `11.13.1`| 2025-03-17 |
2024-10-14 04:45:51 +00:00
:::
## Integration Details
Ghidra natively supports scripts that are run in Java. JS extension scripts
require a [JavaScript engine](/docs/demos/engines/) with Java bindings.
Ghidra.js[^1] is a Ghidra integration for [RhinoJS](/docs/demos/engines/rhino),
[GraalJS](/docs/demos/engines/graaljs) and [V8](/docs/demos/engines/v8#java).
The current version uses [the Javet V8 binding](https://www.caoccao.com/Javet).
### Loading SheetJS Scripts
The [SheetJS NodeJS module](/docs/getting-started/installation/nodejs) can be
loaded in Ghidra.js scripts using `require`:
```js title="Loading SheetJS scripts in Ghidra.js"
const XLSX = require("xlsx");
```
:::caution pass
SheetJS NodeJS modules must be installed in a folder in the Ghidra script path!
:::
### Bitfields and Sheets
Binary file formats commonly use bitfields to compactly store a set of Boolean
(true or false) flags. For example, in the XLSB file format, the `BrtRowHdr`
record[^2] encodes [row properties](/docs/csf/features/rowprops). Bit offsets
91-96 are interpreted as flags marking if a row is hidden or if it is collapsed.
#### Assembly Implementation
Functions that parse bitfields typically test each bit sequentially:
```nasm title="x86_64 sample assembly with mnemonics"
CASE_1c
41 0f ba e5 1c BT R13D,0x1c
73 69 JNC CASE_1d
;; .... Do some work here (bit offset 28)
CASE_1d
41 0f ba e5 1d BT R13D,0x1d
73 69 JNC CASE_1e
;; .... Do some work here (bit offset 29)
```
:::note pass
The assembly is approximated by the following TypeScript snippet:
```typescript title="Approximate TypeScript"
/* R13 is a 64-bit register */
declare let R13: BigInt;
/* NOTE: asm R13D is technically a live binding */
let R13D: number = Number(R13 & 0xFFFFFFFFn);
if((R13D >> 28) & 1) {
// .... Do some work here (bit offset 28)
}
if((R13D >> 29) & 1) {
// .... Do some work here (bit offset 29)
}
```
:::
#### Array of Objects
A bitmask or bit offset can be paired with a description in a JavaScript object.
For example, in the `BrtRowHdr` record, bit offset 92 indicates whether the row
is hidden (if the bit is set) or visible (if the bit is not set). The offset and
description can be stored as fields in an object:
```js title="Sample metadata for BrtRowHdr offset 92"
const metadata_92 = { Offset: 92, Description: "Hidden flag" };
```
Each object can be stored in an array:
```js title="Array of sample metadata for BrtRowHdr"
const metadata = [
{ Offset: 91, Description: "Collapsed flag" },
{ Offset: 92, Description: "Hidden flag" },
// ...
];
```
This is an ["Array of Objects"](/docs/api/utilities/array#arrays-of-objects).
The SheetJS `json_to_sheet` method[^3] can generate a SheetJS worksheet object
from the array:
```js title="Generating a worksheet from the metadata"
const ws = XLSX.utils.json_to_sheet(metadata);
```
The SheetJS `book_new` method[^4] generates a SheetJS workbook object that can
be written to the filesystem using the `writeFile` method[^5]:
```js title="Exporting the worksheet to file"
const wb = XLSX.utils.book_new(ws, "Offsets");
XLSX.utils.writeFile(wb, "SheetJSGhidra.xlsx");
```
### Java Binding
Ghidra.js exposes a number of globals for interacting with Ghidra, including:
- `currentProgram`: information about the loaded program.
- `JavaHelper`: Java helper to load classes.
Ghidra.js automatically bridges instance methods to Java method calls. It also
handles the plugin and file extension details.
#### Launching the Decompiler
`ghidra.app.decompiler.DecompInterface` is the primary Java interface to the
decompiler. In Ghidra.js, `JavaHelper.getClass` will load the class.
2025-03-27 02:49:13 +00:00
```js title="Launch decompiler process in Ghidra.js"
const DecompInterface = JavaHelper.getClass('ghidra.app.decompiler.DecompInterface');
const decompiler = new DecompInterface();
decompiler.openProgram(currentProgram);
```
<details>
<summary><b>Equivalent Java Code</b> (click to show)</summary>
The Ghidra.js snippet approximates the following Java code:
2024-10-14 04:45:51 +00:00
```java title="Launch decompiler process in Java (snippet)"
import ghidra.app.script.GhidraScript;
import ghidra.app.decompiler.DecompInterface;
import ghidra.program.model.listing.Program;
public class SheetZilla extends GhidraScript {
@Override public void run() throws Exception {
DecompInterface ifc = new DecompInterface();
boolean success = ifc.openProgram(currentProgram);
/* ... do work here ... */
}
}
```
2025-03-27 02:49:13 +00:00
</details>
2024-10-14 04:45:51 +00:00
#### Identifying a Function
The `getGlobalSymbols` method of a symbol table instance will return an array of
symbols matching the given name:
2025-03-27 02:49:13 +00:00
```js title="Find one symbol matching a given name"
2024-10-14 04:45:51 +00:00
/* name of function to find */
const fname = 'MyMethod';
/* find symbols matching the name */
// highlight-next-line
const fsymbs = currentProgram.getSymbolTable().getGlobalSymbols(fname);
/* get first result */
const fsymb = fsymbs[0];
```
The `getFunctionAt` method of a function manager instance will take an address
and return a reference to a function:
2025-03-27 02:49:13 +00:00
```js title="Fetch details about a function identified by a symbol"
2024-10-14 04:45:51 +00:00
/* get address */
const faddr = fsymb.getAddress();
/* find function */
// highlight-next-line
const fn = currentProgram.getFunctionManager().getFunctionAt(faddr);
```
#### Decompiling a Function
2025-03-27 02:49:13 +00:00
The `decompileFunction` method attempts to decompile the referenced function.
Once decompiled, the `getC` method retrieves the decompiled C code:
2024-10-14 04:45:51 +00:00
2025-03-27 02:49:13 +00:00
```js title="Decompile a function identified by a reference"
2024-10-14 04:45:51 +00:00
/* decompile function */
// highlight-next-line
const decomp = decompiler.decompileFunction(fn, 10000, null);
/* get generated C code */
const src = decomp.getDecompiledFunction().getC();
```
## Complete Demo
In this demo, we will inspect the `_TSTCellToCellStorage` method within the
`TSTables` framework of Apple Numbers 14.2. This particular method handles
serialization of cells to the NUMBERS file format.
The implementation has a number of blocks which look like the following script:
```js
if(flags >> 0x0d & 1) {
const field = "numberFormatID";
const current_value = cell[field];
// ... check if current_value is set, do other stuff
}
```
Based on the bit offset and the field name, we will generate the following row:
```js
const mask = 1 << 0x0d; // = 8192 = 0x2000
const name = "number format ID";
const row = { Mask: "0x" + mask.toString(16), "Internal Name": name };
```
Rows will be generated for each block and the final dataset will be exported.
### System Setup
0) Install Ghidra, Xcode, and Apple Numbers.
<details>
<summary><b>Installation Notes</b> (click to show)</summary>
On macOS, Ghidra was installed using Homebrew:
```bash
brew install --cask ghidra
```
</details>
1) Add the base Ghidra folder to the PATH variable. The following shell command
adds to the path for the current `zsh` or `bash` session:
```bash
export PATH="$PATH":$(dirname $(realpath `which ghidraRun`))
```
2) Install `ghidra.js` globally:
```bash
npm install -g ghidra.js
```
:::note pass
If the install fails with a permissions issue, install with the root user:
```bash
sudo npm install -g ghidra.js
```
:::
### Program Preparation
3) Create a temporary folder to hold the Ghidra project:
```bash
2025-03-27 02:49:13 +00:00
mkdir -p /tmp/sheetjs-ghidra
cd /tmp/sheetjs-ghidra
2024-10-14 04:45:51 +00:00
```
4) Copy the `TSTables` framework to the current directory:
```bash
cp /Applications/Numbers.app/Contents/Frameworks/TSTables.framework/Versions/Current/TSTables .
```
5) Create a "thin" binary by extracting the `x86_64` part of the framework:
```bash
lipo TSTables -thin x86_64 -output TSTables.macho
```
:::info pass
When this demo was last tested, the headless analyzer did not support Mach-O fat
binaries. `lipo` creates a new binary with support for one architecture.
:::
6) Analyze the program:
```bash
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -import TSTables.macho
```
<details>
2025-03-27 02:49:13 +00:00
<summary><b>Troubleshooting</b> (click to show)</summary>
**JDK Settings**
On macOS, you may see the prompt:
```
**********************************************************************
JDK 21+ (64-bit) could not be found and must be manually chosen!
**********************************************************************
Enter path to JDK home directory:
```
2025-03-27 02:49:13 +00:00
If prompted, enter the path to the JDK installation. On macOS, JDK installations
are typically stored in `/Library/Java/JavaVirtualMachines`. The required path
is the `/Contents/Home/` folder within the specific JDK installation.
2025-03-27 02:49:13 +00:00
When this demo was last tested, Zulu JDK 21 was installed with Homebrew:
2025-03-27 02:49:13 +00:00
```bash
brew install --cask zulu@21
```
2025-03-27 02:49:13 +00:00
The path to the JDK home directory was:
```
2025-03-27 02:49:13 +00:00
/Library/Java/JavaVirtualMachines/zulu-21.jdk/Contents/Home/
```
---
**Gatekeeper**
2025-03-27 02:49:13 +00:00
macOS Gatekeeper may prevent Ghidra from launching helper programs. When this
happens, macOS will show a message stating that a program was "Not opened":
![macOS apple silicon, decompile not opened notice](pathname:///ghidra/macos_apple_silicon/decompile_not_opened_notice.png)
2025-03-27 02:49:13 +00:00
An administrator account must explicitly allow the supporting programs to run:
2025-03-27 02:49:13 +00:00
1. Open "System Settings" from the Apple menu
2. In the left sidebar, scroll down and select the "Privacy & Security" tab.
3. In the main area, scroll down to the "Security" section
4. In the "Allow applications from" box, select "App Store & Known Developers"
5. Next to '"decompile" was blocked to protect your Mac', click "Allow Anyway".
![macOS apple silicon, privacy security allow application and allow decompile](pathname:///ghidra/macos_apple_silicon/privacy_security_allow_application_and_allow_decompile.png)
2025-03-27 02:49:13 +00:00
6. If macOS shows another `Open "decompile"` popup, click "Open Anyway".
![macOS apple silicon, Open "decompile" modal pop up](pathname:///ghidra/macos_apple_silicon/open_decompile_open_anyway.png)
2025-03-27 02:49:13 +00:00
7. If macOS shows a "Privacy & Security" popup, authenticate with a password by
clicking "Use Password..." and entering the password in the next popup.
![macOS apple silicon, "Privacy & Security" modal pop up](pathname:///ghidra/macos_apple_silicon/privacy_and_security_use_password.png)
</details>
2024-10-14 04:45:51 +00:00
:::note pass
This process may take a while and print a number of Java stacktraces. The errors
can be ignored.
:::
### SheetJS Integration
7) Download [`sheetjs-ghidra.js`](pathname:///ghidra/sheetjs-ghidra.js):
```bash
curl -LO https://docs.sheetjs.com/ghidra/sheetjs-ghidra.js
```
8) Install the [SheetJS NodeJS module](/docs/getting-started/installation/nodejs):
<CodeBlock language="bash">{`\
npm i --save https://cdn.sheetjs.com/xlsx-${current}/xlsx-${current}.tgz`}
</CodeBlock>
9) Run the script:
```bash
$(dirname $(realpath `which ghidraRun`))/support/analyzeHeadless /tmp/sheetjs-ghidra Numbers -process TSTables.macho -noanalysis -scriptPath `pwd` -postScript sheetjs-ghidra.js
```
2025-03-27 02:49:13 +00:00
10) Open the generated `SheetJSGhidraTSTCell.xlsx` spreadsheet with Numbers:
2024-10-14 04:45:51 +00:00
```bash
open -a "Numbers" SheetJSGhidraTSTCell.xlsx
```
2024-10-14 04:45:51 +00:00
[^1]: The project does not have a website. The [source repository](https://github.com/vaguue/Ghidra.js) is publicly available.
[^2]: `BrtRowHdr` is defined in the [`MS-XLSB` specification](/docs/miscellany/references)
[^3]: See [`json_to_sheet` in "Utilities"](/docs/api/utilities/array#array-of-objects-input)
[^4]: See [`book_new` in "Utilities"](/docs/api/utilities/wb)
[^5]: See [`writeFile` in "Writing Files"](/docs/api/write-options)