GP-3980 bsim tutorial
@ -20,7 +20,7 @@
|
||||
# below (POSTGRES_CONFIG_OPTIONS) may be adjusted if required
|
||||
# (e.g., build without openssl use, etc.).
|
||||
#
|
||||
# See https://www.postgresql.org/docs/10/install-procedure.html
|
||||
# See https://www.postgresql.org/docs/15/install-procedure.html
|
||||
# for supported postgresql config options.
|
||||
#
|
||||
# Additional packages may need to be installed include to perform the
|
||||
|
@ -2,21 +2,22 @@
|
||||
|
||||
The ``bsim`` command-line utility, located in the ``support`` directory of a Ghidra distribution, is used to create, populate, and manage BSim databases.
|
||||
It works for all BSim database backends.
|
||||
|
||||
This utility offers a number of commands, many of which have several options.
|
||||
In this section, we cover only a small subset of the possibilities.
|
||||
|
||||
Note that running ``bsim`` with no arguments will print a detailed usage message.
|
||||
Running ``bsim`` with no arguments will print a detailed usage message.
|
||||
|
||||
## Generating Signature Files
|
||||
|
||||
The first step is to create signature files from the binaries in the Ghidra project.
|
||||
Signature files are XML files which contain the BSim vectors and other metadata needed by the BSim server.
|
||||
|
||||
**Important**: If you have the ``postgres_object_files`` project open in Ghidra, close it now.
|
||||
Non-shared projects are locked when open, and the lock will prevent the signature-generating process from accessing the project.
|
||||
**Important**: It's simplest to exit Ghidra before performing the next steps, because:
|
||||
- The H2-backed database can only be accessed by one process at a time.
|
||||
- In case you have the ``postgres_object_files`` project open in Ghidra, signature generation will fail.
|
||||
Non-shared projects are locked when open, and the lock will prevent the signature-generating process from accessing the project.
|
||||
|
||||
To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows)
|
||||
To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows).
|
||||
|
||||
```bash
|
||||
cd <ghidra_install_dir>/support
|
||||
@ -37,6 +38,8 @@ Now, we commit the signatures to the BSim database with the following command (s
|
||||
./bsim commitsigs file:/<database_dir>/example ~/bsim_sigs
|
||||
```
|
||||
|
||||
Once the signatures have been committed, start Ghidra again.
|
||||
|
||||
## Aside: Creating a Database
|
||||
|
||||
We continue to use the database ``example``, so this step isn't necessary for the exercises.
|
||||
@ -64,10 +67,12 @@ For example, you could restrict a BSim query to search only in executables of th
|
||||
Executable categories in BSim are implemented using *program properties*, and function tags in BSim correspond to function tags in Ghidra. Properties and tags both have uses in Ghidra which are independent of BSim.
|
||||
So, if we want a BSim database to record a particular category or tag, we must indicate that explicitly.
|
||||
|
||||
For example, to inform the database that we wish to record the ``ORIGIN`` category, you would execute the command
|
||||
For example, to inform the database that we wish to record the ORIGIN category, you would execute the command
|
||||
|
||||
```bash
|
||||
./bsim addexecategory file:/<database_dir>/example ORIGIN
|
||||
```
|
||||
|
||||
Next Section: [Evaluating_Matches](BSimTutorial_Evaluating_Matches.md)
|
||||
Executable categories can be added to a program using the script ``SetExecutableCategoryScript.java``.
|
||||
|
||||
Next Section: [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md)
|
||||
|
@ -22,7 +22,7 @@ There are a number of ways to initiate a BSim query, including:
|
||||
|
||||
- **BSim -> Search Functions...** from the Code Browser.
|
||||
- Right-click in the Listing and select **BSim -> Search Functions...**
|
||||
- Click on the BSim icon in the toolbar.
|
||||
- Click on the BSim icon ![BSim toolbar icon](images/preferences-web-browser-shortcuts.png) in the Code Browser toolbar.
|
||||
|
||||
For these cases, the function(s) being queried depend on the current selection.
|
||||
If there is no selection, the function containing the current address is queried.
|
||||
@ -44,7 +44,7 @@ From the BSim Search Dialog, you can
|
||||
- Bound the number of results returned for each function.
|
||||
- Set query filters.
|
||||
|
||||
![](./images/bsim_search_dialog.png)
|
||||
![](images/bsim_search_dialog.png)
|
||||
|
||||
#### Selecting a BSim Database
|
||||
|
||||
@ -66,7 +66,7 @@ The respective fields in the dialog set lower bounds for these values for the ma
|
||||
- Sharing rare features contributes more to this score than sharing common features.
|
||||
- There is no upper bound for confidence when considered over all pairs of vectors.
|
||||
However, if you fix a vector *v*, the greatest possible confidence score for a comparison involving *v* occurs when *v* is compared to itself.
|
||||
The resulting confidence value is called the **self significance** of *v*.
|
||||
The resulting confidence value is called the **self-significance** of *v*.
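As a rough illustration of the two properties above, the following sketch uses a toy score: the sum of the weights of the features that two functions share. This is not BSim's actual confidence formula; the feature names, the weights, and the `toy_confidence` helper are all hypothetical.

```bash
# Hypothetical feature weights: rarer features get larger weights.
weights='f_common 0.5
f_medium 2.0
f_rare 6.0'

# toy_confidence "<features of v>" "<features of w>"
# Sums the weights of the features present in both space-separated lists.
# This is a toy model, not the score BSim computes.
toy_confidence() {
  printf '%s\n' "$weights" | awk -v a="$1" -v b="$2" '
    BEGIN {
      n = split(a, fa, " "); for (i = 1; i <= n; i++) in_a[fa[i]] = 1
      m = split(b, fb, " "); for (i = 1; i <= m; i++) in_b[fb[i]] = 1
    }
    ($1 in in_a) && ($1 in in_b) { total += $2 }
    END { printf "%.1f\n", total }
  '
}

v='f_common f_medium f_rare'
self="$(toy_confidence "$v" "$v")"                        # toy self-significance of v
with_common="$(toy_confidence "$v" 'f_common f_medium')"  # shares two common features
with_rare="$(toy_confidence "$v" 'f_rare')"               # shares one rare feature
echo "self=$self common=$with_common rare=$with_rare"
```

With these made-up weights, sharing only the rare feature scores higher than sharing the two more common ones, and no comparison involving `v` scores higher than comparing `v` with itself.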
|
||||
|
||||
Confidence is used to judge the significance of a match.
|
||||
For example, many executables contain a function which simply returns a constant value.
|
||||
@ -79,19 +79,18 @@ The results of a BSim query can be sorted by the similarity and/or confidence of
|
||||
The **Matches per Function** bound controls the number of results returned for a single function.
|
||||
Note that in large collections, certain small or common functions might have substantial numbers of identical matches.
|
||||
|
||||
Filters are discussed in [BSim Filters](BSimTutorial_Filters.md).
|
||||
|
||||
#### Performing the Query
|
||||
|
||||
Click the **Search** button in the dialog to perform a query.
|
||||
|
||||
**Notes**:
|
||||
|
||||
1. Filters are discussed in [BSim Filters](BSimTutorial_Filters.md).
|
||||
1. After successfully issuing a query, you will also see a **Search Function(s)** action (without the ellipsis) in certain contexts.
|
||||
After successfully issuing a query, you will also see a **Search Function(s)** action (without the ellipsis) in certain contexts.
|
||||
This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog).
|
||||
|
||||
## Exercises:
|
||||
## Exercises
|
||||
|
||||
The database `example` contains vectors from a Linux executable used by Ghidra's GNU demangler.
|
||||
The database ``example`` contains vectors from a Linux executable used by Ghidra's GNU demangler.
|
||||
Ghidra ships with several other versions of this executable.
|
||||
We use these different versions to demonstrate some of the capabilities of BSim.
|
||||
|
||||
@ -105,22 +104,19 @@ We use these different versions to demonstrate some of the capabilities of BSim.
|
||||
- Note that the function names **are** present in ``demangler_gnu_v2_41``.
|
||||
1. Using the default query options, query `example` for matches to the function at ``140006760``.
|
||||
1. You should see the following search results:
|
||||
![results](./images/basic_query.png)
|
||||
![results](images/basic_query.png)
|
||||
- In this case, there is exactly one match, the similarity is 1.0, and the matching function has a non-default name (it won't always be this easy).
|
||||
- **Note**: The results window has two tables: the function-level results (upper table) and the executable-level results (lower table).
|
||||
The executable-level results are covered in [Executable-level Results](BSimTutorial_Exe_Results.md)
|
||||
1. Right-click on the row of a match to see the available actions:
|
||||
![actions](./images/actions.png)
|
||||
1. Select the **Compare Functions** action to bring up the side-by-side comparison.
|
||||
- The results window has two tables: the function-level results (upper table) and the executable-level results (lower table).
|
||||
The executable-level results are covered in [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md).
|
||||
1. Right-click on the row of the match and select the **Compare Functions** action to bring up the side-by-side comparison.
|
||||
- The **Listing View** tab shows the disassembly.
|
||||
- The **Decompiler Diff View** tab shows the decompiled code.
|
||||
- Differences in the code are automatically highlighted in blue.
|
||||
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
|
||||
- **Note**: We cover the Decompiler Diff View in greater detail in [Applying Function Signatures](BSimTutorial_Applying_Function_Signatures.md)
|
||||
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
|
||||
1. Examine the diff views to verify that the match is valid.
|
||||
1. Using the `Apply Function Names and Namespaces` action, transfer the name from the search result to the queried function.
|
||||
1. Using the **Apply Name** action, apply the name from the search result to the queried function.
|
||||
|
||||
TODO: explain why there are different apply actions
|
||||
**Note**: We cover the Decompiler Diff View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
|
||||
|
||||
### Exercise 2: Changes to the Source Code
|
||||
|
||||
@ -128,7 +124,7 @@ TODO: explain why there are different apply actions
|
||||
- This executable is based on an earlier version of the source code than the executable in ``example``.
|
||||
1. Navigate to the function ``expandargv`` in ``demangler_gnu_v2_24`` and issue a BSim query.
|
||||
1. What differences do you see in the decompiled code?
|
||||
<details><summary>In demangler_gnu_v2_41...</summary> Answer: The call to dupargv is now in an if clause (and decompiler creates a related local variable) and there are two additional calls to free. </details>
|
||||
<details><summary>In demangler_gnu_v2_41...</summary> The main differences are that the call to dupargv is now in an if clause (and the decompiler creates a related local variable) and that there are two additional calls to free. </details>
|
||||
1. The relevant source files are included with the Ghidra distribution:
|
||||
- ``<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_24/c/argv.c``
|
||||
- ``<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler/gnu_v2_41/c/argv.c``
|
||||
@ -140,9 +136,10 @@ TODO: explain why there are different apply actions
|
||||
``<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41``.
|
||||
- This executable is based on the same source code as the executable in `example` but compiled for a different architecture.
|
||||
- **Note**: this file has the same name as the one used to populate the BSim database, so you will have to give the resulting Ghidra program a different name or import it into a different directory in your Ghidra project.
|
||||
1. Navigate to ``_expandargv`` and issue a BSim query. What differences do you see regarding ``memmove`` and ``memcpy``?
|
||||
<details><summary>In the arm64 version...</summary> Answer: In the arm64_version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
|
||||
1. Examine the ``Listing View`` tab and verify that the architectures are different.
|
||||
1. Navigate to ``_expandargv`` and issue a BSim query.
|
||||
In the decompiler diff view, what differences do you see regarding ``memmove`` and ``memcpy``?
|
||||
<details><summary>In the arm64 version...</summary> In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
|
||||
1. Examine the **Listing View** tab and verify that the architectures are different.
|
||||
|
||||
|
||||
## A Remark on Query Thresholds and Indices
|
||||
|
@ -10,10 +10,10 @@ Next, perform the following steps from the Ghidra Code Browser:
|
||||
|
||||
1. Run the Ghidra script ``CreateH2BSimDatabaseScript.java``.
|
||||
1. In the resulting dialog:
|
||||
1. Enter "example" in the `Database Name` field.
|
||||
1. Select the new directory in the `Database Directory` field.
|
||||
1. Enter "example" in the **Database Name** field.
|
||||
1. Select the new directory in the **Database Directory** field.
|
||||
1. Don't change any of the other fields.
|
||||
1. Click OK.
|
||||
1. Click **OK**.
|
||||
|
||||
## Populating the Database
|
||||
|
||||
|
@ -12,7 +12,7 @@ To enable BSim, perform the following steps:
|
||||
1. Click on the ``Configure`` link of the ``BSim`` entry.
|
||||
1. In the resulting dialog, ensure that the checkbox for ``BSimSearchPlugin`` is checked.
|
||||
|
||||
![](./images/configure.png)
|
||||
![](images/configure.png)
|
||||
|
||||
Next Section: [Creating and Populating a BSim Database from the GUI](BSimTutorial_Creating_Database_From_GUI.md)
|
||||
|
||||
|
@ -1,4 +1,4 @@
|
||||
# Evaluating Matches and Transferring Information
|
||||
# Evaluating Matches and Applying Information
|
||||
|
||||
Summarizing what we've created over the last few sections, we now have:
|
||||
1. A stripped executable (``postgres``).
|
||||
@ -23,16 +23,18 @@ The corresponding function in `postgres` should have a default name.
|
||||
1. Examine this match in the side-by-side decompiler view.
|
||||
Note that the matching function has better data type information due to the debug information.
|
||||
1. Q: Why does the placement of the `double` argument differ between the functions?
|
||||
<details><summary>Answer</summary> Floating point values and integer/pointer values are passed in separate sets of registers.
|
||||
Neither ordering is wrong since both are consistent with the instructions of the function.
|
||||
The debug info records a specific signature (and ordering) for the function, which Ghidra applies.
|
||||
In the version without debug information, the decompiler used heuristics to determine the function's signature.</details>
|
||||
<details><summary>Answer</summary> Floating point values and integer/pointer values are passed in separate sets of registers.
|
||||
Neither ordering is wrong since both are consistent with the instructions of the function.
|
||||
The debug info records a specific signature (and ordering) for the function, which Ghidra applies.
|
||||
In the version without debug information, the decompiler used heuristics to determine the function's signature.</details>
|
||||
|
||||
For matches with a fair number of differences, the decompiler diff panel can get pretty colorful.
|
||||
Furthermore, as you click around, tokens will gain and lose highlight of various colors.
|
||||
Furthermore, as you click around, tokens will gain and lose highlights of various colors.
|
||||
It's worth giving a brief explanation of when highlighting happens and what the different colors mean.
|
||||
Some terminology: if you click on a token in a decompiler panel, that token becomes the *focused token*.
|
||||
|
||||
![Decomp Diff Window](images/decomp_diff.png)
|
||||
|
||||
The colors:
|
||||
|
||||
- Blue is used to highlight differences between the two functions.
|
||||
@ -43,36 +45,70 @@ Certain tokens, such as whitespace tokens or tokens used in variable declaration
|
||||
|
||||
## Exercise: Locking and Unlocking Scrolling
|
||||
|
||||
By default, scrolling in the diff window is synchronized.
|
||||
This means that scrolling within one window will also scroll within the other window.
|
||||
In the decompiler diff window, scrolling works by matching one line in the left function with one line in the right function.
|
||||
The two functions are aligned using those lines.
|
||||
Initially, the functions are aligned using the functions' signatures.
|
||||
|
||||
Before moving on, experiment with locking and unlocking scrolling.
|
||||
As you click around in either function, the "aligning lines" will change.
|
||||
If the focused token has a match, the scrolling is re-centered based on the lines containing the matched tokens.
|
||||
If the focused token does not have a match, the functions will be aligned using the closest token to the focused token which does have a match.
|
||||
|
||||
Synchronized scrolling can be toggled using the ![lock icon](images/lock.gif) and ![unlock icon](images/unlock.gif) icons in the toolbar.
|
||||
|
||||
Exercise:
|
||||
|
||||
1. Experiment with locking and unlocking synchronized scrolling.
|
||||
|
||||
## Exercise: Applying Signatures
|
||||
|
||||
If you are satisfied with a given match, you might want to apply information about the match to the queried function.
|
||||
For example, you might want to apply the name or signature of the function.
|
||||
There are some subtleties which determine how much information is safe to apply.
|
||||
Hence there are three actions available under the **Apply From Other** menu when you right-click in the left panel:
|
||||
|
||||
1. **Function Name** will apply the function's name (and namespace) to the function on the left.
|
||||
1. **Function Signature** will apply the name and namespace and "skeleton" data types.
|
||||
Structure and union data types are not transferred.
|
||||
Instead, empty placeholder structures are created.
|
||||
1. **Function Signature and Data Types** will apply the name and signature with full data types.
|
||||
This may result in many data types being imported into the program (e.g., structures which refer to other structures).
|
||||
|
||||
**Warning**: You should be absolutely certain that the datatypes are exactly the same before applying signatures and data types.
|
||||
If there have been any changes to a datatype's definition, you could end up bringing incorrect datatypes into a program, even using BSim matches with 1.0 similarity.
|
||||
Applying full data types is also problematic for cross-architecture matches.
|
||||
|
||||
Exercise:
|
||||
|
||||
1. Since we know it's safe, apply the function signature and data types to the left function.
|
||||
|
||||
There are similarly-named actions available on rows of the Function Matches table in the BSim Search Results window.
|
||||
The **Status** column contains information about which rows have had their matches applied.
|
||||
|
||||
## Exercise: Comparing Callees
|
||||
|
||||
The token matching algorithm matches a function call in one program to a function call in another by considering the data flow into and out of the ``CALL`` instruction, but it does not do anything with the bodies of the callees.
|
||||
However, given a matched pair of calls, you can bring up a new comparison window and compare their bodies manually.
|
||||
However, given a matched pair of calls, you can bring up a new comparison window for the callees with the **Compare Matching Callees** action.
|
||||
|
||||
Ctrl f in left view
|
||||
FUN_
|
||||
find something
|
||||
1. Click in the left panel of the decompiler diff window and press ``Ctrl-F``.
|
||||
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.
|
||||
1. Right-click on one of the matched tokens and select the **Compare Matching Callees** action.
|
||||
1. In the comparison of the callees, apply the function signature and data types from the right function to the left function.
|
||||
Verify that the update is reflected in the decompiler diff view of the callers.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
## Exercise: Transferring Signatures
|
||||
|
||||
1. Transfer the signatures to the queried function via either:
|
||||
- The `Apply Function Signature to Other Side` action in the diff window.
|
||||
- The `Apply Function Names, Namespaces, and Signatures` action in the BSim Search Results window.
|
||||
|
||||
**Warning**: You should be absolutely certain that the datatypes are the same before applying signatures.
|
||||
If there have been any changes to a datatype's definition, you could end up bringing incorrect datatypes into a program, even using BSim matches with 1.0 similarity.
|
||||
|
||||
## Exercise: Multiple Comparisons
|
||||
|
||||
The function shown in a panel is controlled by a drop-down menu at the top of the panel.
|
||||
This can be useful when you'd like to evaluate multiple matches to a single function.
|
||||
|
||||
Exercise:
|
||||
|
||||
|
||||
1. In the BSim Search Results window, right-click on a table column name, select **Add/Remove Columns**, and enable the **Matches** column.
|
||||
1. Find two functions in ``postgres``, each of which has exactly two matches.
|
||||
Select the corresponding four rows in the matches table and perform the **Compare Functions** action.
|
||||
1. Experiment with the drop-downs in each panel.
|
||||
|
||||
In the next section, we discuss the Executable Results table.
|
||||
|
||||
|
@ -1,37 +1,37 @@
|
||||
# From Matching Functions to Matching Executables
|
||||
|
||||
In this section, we discuss the Executable results table.
|
||||
In this section, we discuss the Executable Results table.
|
||||
Each row of this table corresponds to one executable in the database.
|
||||
The information in one row is an aggregation of all of the function-level matches into that row's executable.
|
||||
Your Executable Results table from the previous query should look similar to the following:
|
||||
|
||||
Using the results window from the previous query, sort the Executable results table
|
||||
by "Function Count" (i.e., the number of results which are in a given executable). You should see the following:
|
||||
|
||||
![](./images/exe_results.png)
|
||||
![executable results](images/exe_results.png)
|
||||
|
||||
If you select a single row in the table and right-click on it, you will see the following actions:
|
||||
|
||||
![](./images/exe_results_actions.png)
|
||||
|
||||
- **Load Executable** will open a read-only copy of the program in the Code Browser.
|
||||
- **Filter on this Executable** applies a filter which restricts the matches shown in the Function Matches table to matches which occur in the given executable.
|
||||
- **Load Executable**
|
||||
Opens a read-only copy of the program in the Code Browser.
|
||||
- **Filter on this Executable**
|
||||
Applies a filter which restricts the matches shown in the Function Matches table to matches which occur in the given executable.
|
||||
|
||||
## Exercise
|
||||
|
||||
1. If you haven't already, sort the Executable results by descending Function Count.
|
||||
What position is `demangler_gnu_v2_33_1`?
|
||||
- <details><summary>A:</summary> 7 </details>
|
||||
1. The Confidence column shows the sum of the confidence scores of all matches into each executable. Sort the Executable results by descending confidence and observe that `demangler_gnu_v2_33_1` is now much further down the list.
|
||||
- <details><summary>What could explain this?</summary>
|
||||
If there are many function matches but the sum of all the confidences is relatively low,
|
||||
it is likely that many of the matches involve small functions. For such a match, it is
|
||||
more likely that the functions agree by chance rather than being derived from the same
|
||||
source code.
|
||||
</details>
|
||||
1. In the Executable match table, right click on `demangler_gnu_v2_33_1` and apply the filter
|
||||
action. Sort the filtered function matches by descending confidence. Starting at the top,
|
||||
perform some code comparisons and convince yourself that the given explanation is correct.
|
||||
- Note: You can remove the filter using the "Settings" icon in the upper right. We'll discuss this further in [BSim Filters](./BSimTutorial_Filters.md)
|
||||
1. Sort the Executable results by descending **Function Count**.
|
||||
An entry in this column shows the number of queried functions which have at least one match in the row's executable (if ``foo`` has 2 or more matches into a given executable, it still only contributes 1 to the function count).
|
||||
What position is ``demangler_gnu_v2_41``?
|
||||
<details><summary>In this table...</summary> It's in the first position.</details>
|
||||
1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable.
|
||||
If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
|
||||
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now much further down the list.
|
||||
<details><summary>What could explain this?</summary> If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions. For such a match, it is more likely that the functions agree by chance rather than being derived from the same source code. </details>
|
||||
1. In the Executable match table, right-click on ``demangler_gnu_v2_41`` and apply the filter action.
|
||||
Sort the filtered function matches by descending confidence.
|
||||
Starting at the top, perform some code comparisons and convince yourself that the given explanation is correct.
|
||||
- **Note**: You can remove the filter using the **Filter Results** icon ![Filter Results](images/exec.png) in the upper right.
|
||||
We'll discuss this further in [BSim Filters](BSimTutorial_Filters.md).
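The aggregation rules described in this exercise can be sketched on made-up data. The snippet below is only an illustration of the rules as stated in the text, not Ghidra's implementation; the function names, executable names, confidence values, and the `aggregate` helper are hypothetical.

```bash
# Hypothetical function-level matches: queried_function  executable  confidence
matches='foo exeA 10.0
foo exeA 25.0
bar exeA 5.0
foo exeB 40.0'

# For each (queried function, executable) pair keep only the highest
# confidence, then per executable count the pairs (Function Count) and
# sum the kept confidences (Confidence).
aggregate() {
  awk '
    {
      key = $1 SUBSEP $2
      c = $3 + 0
      if (!(key in best) || c > best[key]) best[key] = c
    }
    END {
      for (key in best) {
        split(key, parts, SUBSEP)
        count[parts[2]]++
        conf[parts[2]] += best[key]
      }
      for (exe in count) printf "%s %d %.1f\n", exe, count[exe], conf[exe]
    }
  ' | sort
}

summary="$(printf '%s\n' "$matches" | aggregate)"
echo "$summary"   # columns: executable, function count, summed confidence
```

Here ``foo`` has two matches into `exeA`, but it adds only 1 to that executable's function count and only its best confidence (25.0) to the sum.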
|
||||
|
||||
In the next section, describe a technique to restrict queries to functions which are likely to
|
||||
have "interesting" matches.
|
||||
From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action.
|
||||
Keep in mind that such functions can "pollute" the results of a blanket query.
|
||||
In the next section, we demonstrate a technique to restrict queries to functions which are more likely to have meaningful matches.
|
||||
|
||||
Next Section: [Overview Queries](BSimTutorial_Overview.md)
|
||||
Next Section: [Overview Queries](BSimTutorial_Overview_Queries.md)
|
@ -1,36 +1,21 @@
|
||||
# BSim Filters
|
||||
|
||||
There are a number of filters that can be applied to BSim queries, involving names, architectures,
|
||||
compilers, ingest dates, and many other attributes.
|
||||
There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and many other attributes.
|
||||
|
||||
Filters can be applied *server-side* or *client-side*. Server-side filters affect the results sent
|
||||
to Ghidra from a BSim server. Client-side filters apply to the BSim Search results table and can
|
||||
be added and removed at will. However, to "undo" a server-side filter, you have to issue an
|
||||
additional BSim query without the filter.
|
||||
Filters can be applied *server-side* or *client-side*.
|
||||
Server-side filters affect the query results sent to Ghidra from a BSim server.
|
||||
Client-side filters apply to the BSim Search results table and can be added and removed at will.
|
||||
However, to "undo" a server-side filter, you have to issue an additional BSim query without the filter.
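To make the distinction concrete, here is a minimal sketch of a client-side filter acting on results that have already been returned. The rows, the column layout, and the `exclude_exe` helper are hypothetical and do not reflect how Ghidra stores its results.

```bash
# Hypothetical cached query results: function  matching_executable  similarity
results='FUN_001 demangler_gnu_v2_41 1.000
FUN_002 plpgsql.o 0.912
FUN_003 demangler_gnu_v2_41 0.874'

# Client-side "executable name does not equal" filter over the cached rows.
exclude_exe() {
  awk -v name="$1" '$2 != name'
}

filtered="$(printf '%s\n' "$results" | exclude_exe demangler_gnu_v2_41)"
echo "$filtered"

# Removing the filter needs no new query: the cached rows are untouched.
unfiltered_count="$(printf '%s\n' "$results" | wc -l | tr -d ' ')"
echo "$unfiltered_count"
```

A server-side filter would instead mean that the excluded rows never reach the client, which is why undoing it requires a new query.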
|
||||
|
||||
Note that overview queries cannot be filtered.
|
||||
|
||||
Server-side filters can be applied using the `Filters` drop-down in the BSim Search dialog.
|
||||
Server-side filters can be applied using the **Filters** drop-down in the BSim Search dialog.
|
||||
|
||||
## Exercise: Filters
|
||||
|
||||
1. Select all functions in `postgres` and bring up the BSim Search dialog.
|
||||
1. Use the default query bounds.
|
||||
1. Apply an `Executable name does not equal` filter with `demangler_gnu_v2_33_1` as the name to
|
||||
exclude.
|
||||
1. Perform the query and verify that `demangler_gnu_v2_33_1` is not in the list of executables
|
||||
with matches.
|
||||
<p align="center">
|
||||
<img src="./images/search_info.png"/>
|
||||
</p>
|
||||
1. Using the `Search Info` icon, you can see what server-side filters were applied to the query.
|
||||
1. Select all functions in ``postgres`` and bring up the BSim Search dialog.
|
||||
1. Apply an **Executable name does not equal** filter with ``demangler_gnu_v2_41`` as the name to exclude.
|
||||
1. Perform the query and verify ``demangler_gnu_v2_41`` is not in the list of executables with matches.
|
||||
1. Using the **Search Info** icon ![Search Info](images/information.png) in the BSim Search Results toolbar, you can see the server-side filters applied to the query.
|
||||
Verify that this information is correct.
|
||||
<p align="center">
|
||||
<img src="./images/filter_results.png"/>
|
||||
</p>
|
||||
1. Using the `Filter Results` icon, you can apply client-side filters to the query results.
|
||||
Experiment with applying and removing some client-side filters.
|
||||
|
||||
|
||||
Next Section: [Scripting and Visualization](BSimTutorial_Scripting.md)
|
||||
1. Using the **Filter Results** icon ![Filter Results](images/exec.png), you can apply client-side filters to the query results. Experiment with applying and removing some client-side filters.
|
||||
|
||||
Next Section: [Scripting and Visualization](BSimTutorial_Scripting.md)
|
@ -1,7 +1,7 @@
|
||||
# Ghidra Analysis from the Command Line
|
||||
|
||||
For the remaining exercises, we need to populate our BSim database with a number of binaries.
|
||||
We'd like a consistent set of binaries for the tutorial, but we don't want to clutter the Ghidra distribution with dozens of additional executables that aren't actually used by the codebase.
|
||||
We'd like a consistent set of binaries for the tutorial, but we don't want to clutter the Ghidra distribution with dozens of additional executables.
|
||||
Fortunately, the BSim plugin includes a script for building the PostgreSQL backend, and that build process creates hundreds of object files.
|
||||
So we can just build PostgreSQL and harvest the object files we need.
|
||||
|
||||
@ -11,6 +11,9 @@ We do not run any PostgreSQL code, we simply analyze some files produced when bu
|
||||
Note that these files must be built on a machine running Linux.
|
||||
Windows users can build these files in a Linux virtual machine.
|
||||
|
||||
First, download ``postgresql-15.3.tar.gz`` from the PostgreSQL web site.
|
||||
Put this file in ``<ghidra_install_dir>/Ghidra/Features/BSim``.
|
||||
|
||||
To build the files, execute the following commands in a shell: [^1]
|
||||
|
||||
[^1]: You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully.
|
||||
@ -22,13 +25,12 @@ export CFLAGS="-O2 -g"
|
||||
./make-postgres.sh
|
||||
mkdir ~/postgres_object_files
|
||||
cd build
|
||||
find . -name pl*.o -exec cp {} ~/postgres_object_files/ \;
|
||||
find . -name 'p*o' -size +100000c -size -700000c -exec cp {} ~/postgres_object_files/ \;
|
||||
cd os/linux_x86_64/postgresql/bin
|
||||
strip -s postgres
|
||||
```
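The `find` invocation above selects files by name pattern and by size in bytes. The following sketch applies the same tests to throwaway files so the effect of each test is visible. It assumes a `find` that supports the `c` (bytes) size suffix, and the scratch directory and file names are hypothetical.

```bash
# Hypothetical scratch directory; nothing here touches a real PostgreSQL build.
demo_dir="$(mktemp -d)"

# Create dummy "object files" of different sizes (in bytes).
head -c 50000  /dev/zero > "$demo_dir/parse_small.o"   # name matches, too small
head -c 200000 /dev/zero > "$demo_dir/planner.o"       # name matches, size in range
head -c 800000 /dev/zero > "$demo_dir/postgres_big.o"  # name matches, too large
head -c 200000 /dev/zero > "$demo_dir/nodes.o"         # size in range, name does not match

# The pattern is quoted so the shell passes it to find unexpanded.
# -size +100000c : more than 100000 bytes
# -size -700000c : fewer than 700000 bytes
selected="$(find "$demo_dir" -name 'p*o' -size +100000c -size -700000c -exec basename {} \;)"
echo "$selected"

rm -rf "$demo_dir"
```

Only the file that passes both the name test and the two size tests is selected.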

To continue on Windows, transfer the ``~/postgres_object_files`` directory and the (stripped) ``postgres`` executable to your Windows machine.

To continue on Windows, transfer the ``~/postgres_object_files`` directory and the stripped ``postgres`` executable to your Windows machine.

## Importing and Analyzing the Exercise Files

@ -29,7 +29,7 @@ The index drastically reduces the number of vector comparisons needed and allows
databases holding up to 10 million unique vectors, and a *large* template, intended for databases holding up to 100 million unique vectors.

Querying ``foo`` against a BSim database typically yields a number of potential matches.
Each individual match for ``foo`` can be compared to `foo` in a side-by-side view, and certain information (such as function name) can be quickly transferred from a match to ``foo``.
Each individual match for ``foo`` can be compared to `foo` in a side-by-side view, and certain information (such as function name) can be quickly copied from a match to ``foo``.

We frequently call BSim vectors the *BSim signature* of a function, or just the *signature* when the context is clear.

@ -46,7 +46,7 @@ Using BSim involves the following components:
- A *BSim Client*, i.e., an instance of Ghidra with the BSim plugin enabled.
- This is where the reverse engineering happens.
- A *BSim Database*, which stores the BSim signatures.
- Also stores some metadata about each function and the containing executable.
- Also stores some metadata about each function and its containing executable.
- In particular, stores the ghidra:// URL of the associated Ghidra program.
- Does not store disassembly or decompiled functions.
- A *Ghidra Project*, which stores the analyzed programs used to populate the BSim database.
@ -1,57 +0,0 @@
# Overview Queries

An **Overview Query** queries a BSim database for the number of matches to each
function in an executable. The matching functions themselves are not returned.
Similarity and Confidence thresholds apply to an Overview query, but the
"Matches per Function" bound does not.

To perform an Overview Query, select `BSim -> Perform Overview...` from the Code
Browser.

## Exercise 1: Hit Counts and Self-Similarities

1. Perform an Overview query on `postgres` using the default query bounds. You should see
the following result:
![](./images/overview_window.png)
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-similarity. Verify that that is the case for this table.
1. Q: Examine the functions with the highest hit count. Why are there so many matches, and
why do they all have the same BSim feature vector?
- <details><summary>A:</summary> These functions simply return constants. BSim feature vectors
incorporate the fact that a varnode is constant but do not incorporate the specific value.</details>

## Exercise 2: Selections and Queries

Using the hit count column, it is possible to exclude functions with large numbers of matches.

1. In the Overview Table, select all functions whose hit count is 5 or less.
1. Right-click on the selection and perform the `Search Selected Functions` action. Sort the
query results by `Function Count` and verify that `demangler_gnu_v2_33_1` is far down the list.

## Exercise 3: Vector Hashes

Suppose `foo` and `bar` have the same number of hits in the Overview table. There are two
possibilities:
- `foo` and `bar` have distinct feature vectors which happen to have the same number of matches.
- `foo` and `bar` have the same feature vector.

An optional column, `Vector Hash`, can be used to distinguish between these two cases.

1. Enable the `Vector Hash` Column in the Overview Table.
1. Sort the hit count column in ascending order, (multi)sort the Self Significance column in
descending order, then (multi)sort the Vector Hash column in ascending order.
1. Q: What are the first functions in the table with the same vector hash?
- <details><summary>A:</summary> `ts_headline_json_byid_opt` and `ts_headline_jsonb_byid_opt`
</details>
1. Examine the decompiled code of these two functions and verify that they should have identical
BSim vectors.

Next Section: [Queries and Filters](BSimTutorial_Filters.md)
42
GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.md
Normal file
@ -0,0 +1,42 @@
# Overview Queries

An **Overview Query** queries a BSim database for the number of matches to each function in an executable.
The matching functions themselves are not returned.
Similarity and Confidence thresholds can be set for an Overview query, but there is no "Matches per Function" bound and no filters can be set.

To perform an Overview Query, select **BSim -> Perform Overview...** from the Code Browser.

## Exercise 1: Hit Counts and Self-Significance

1. Perform an Overview query on ``postgres`` using the default query thresholds.
You should see the following result:
![overview window](images/overview_window.png)
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance.
Verify that that is the case for this table.
1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions?
<details><summary>Answer:</summary> These are all instances of PostgreSQL statistics-reporting functions. Their bodies are quite similar.</details>

## Exercise 2: Selections and Queries

Using the hit count column, it is possible to exclude functions with large numbers of matches.

1. In the Overview Table, select all functions whose hit count is 2 or less.
1. Right-click on the selection and perform the **Search Selected Functions** action.
Sort the query results by descending **Function Count** and verify that ``demangler_gnu_v2_41`` is far down the list.

## Exercise 3: Vector Hashes

Suppose ``foo`` and ``bar`` have the same number of hits in the Overview table.
There are two possibilities:
1. ``foo`` and ``bar`` have distinct feature vectors which happen to have the same number of matches.
1. ``foo`` and ``bar`` have the same feature vector.

An optional column, **Vector Hash**, can be used to distinguish between these two cases.

1. Enable the **Vector Hash** Column in the Overview Table.
1. Find two functions with the same vector hash.
1. Select the two corresponding rows in the table and then transfer the selection to the Listing using the ![make selection icon](images/text_align_justify.png) icon in the BSim Overview toolbar.
1. In the Listing, press ``Shift-C`` or right-click and perform the **Compare Selected Functions** action.
1. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
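
The idea behind the **Vector Hash** column can also be sketched outside Ghidra: rows that share a hash share a feature vector, while rows that merely share a hit count do not have to. The sketch below assumes the Overview rows have been saved to a CSV file with a hypothetical `name,hit_count,vector_hash` layout. The function names and hash values are invented for illustration and do not come from the tutorial database.

```shell
# Hypothetical export of an Overview table: function name, hit count, vector hash.
overview_csv=$(mktemp)
cat > "$overview_csv" <<'EOF'
name,hit_count,vector_hash
func_a,3,1a2b3c
func_b,3,9f8e7d
func_c,3,1a2b3c
func_d,1,445566
EOF

# Skip the header row, collect function names per hash, and report
# only those hashes that occur for more than one function.
shared=$(awk -F, 'NR > 1 { count[$3]++; names[$3] = names[$3] " " $1 }
    END { for (h in count) if (count[h] > 1) print h ":" names[h] }' "$overview_csv")
echo "$shared"

rm -f "$overview_csv"
```

In this made-up data, `func_a`, `func_b`, and `func_c` all have three hits, but only `func_a` and `func_c` share a hash, so only that pair is reported.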

Next Section: [Queries and Filters](BSimTutorial_Filters.md)
@ -6,17 +6,17 @@ Finally, we briefly mention a few other topics related to BSim.

There are a number of example scripts in the ``BSim`` script category, which demonstrate how to interact with BSim programmatically:

![](./images/script_manager.png)
![](images/script_manager.png)

## Visualizing Features

Finally, if you'd like to see the particular BSim features in a function, you can use the BSim Feature Visualizer.
This plugin allows you to highlight regions of the decompiled code corresponding to a particular feature and to display a graph representing the feature.

To use this plugin, first enable the ``BSimFeatureVisualizerPlugin`` via **File -> Configure ** from the Code Browser.
To use this plugin, first enable the ``BSimFeatureVisualizerPlugin`` via **File -> Configure** from the Code Browser.
You can then bring it up via **BSim -> BSim Feature Visualizer**.

![](./images/feature_visualizer.png)
![](images/feature_visualizer.png)

This is the end of the tutorial.
@ -8,14 +8,14 @@ This tutorial demonstrates how create a small BSim database and walks through so
**Detailed information about BSim can be found in the "BSim" entry of the Ghidra Help**.

1. [Introduction to BSim](BSimTutorial_Intro.md)
1. [Enabling BSim](BSimTutorial_Enabling.md)
1. [Starting Ghidra and Enabling BSim](BSimTutorial_Enabling.md)
1. [Creating and Populating a BSim Database from the GUI](BSimTutorial_Creating_Database_From_GUI.md)
1. [Basic BSim Queries](BSimTutorial_Basic_Queries.md)
1. [Ghidra from the Command Line](BSimTutorial_Ghidra_Command_Line.md)
1. [BSim from the Command Line](BSimTutorial_BSim_Command_Line.md)
1. [Evaluating Matches](BSimTutorial_Evaluating_Matches.md)
1. [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md)
1. [Overview Queries](BSimTutorial_Overview.md)
1. [Overview Queries](BSimTutorial_Overview_Queries.md)
1. [BSim Filters](BSimTutorial_Filters.md)
1. [Scripting and Visualization](BSimTutorial_Scripting.md)
Image size: 54 KiB before, 42 KiB after
BIN GhidraDocs/GhidraClass/BSim/images/decomp_diff.png (Normal file)
Image size after: 185 KiB
Image size: 73 KiB before, 64 KiB after
BIN GhidraDocs/GhidraClass/BSim/images/exec.png (Normal file)
Image size after: 1.0 KiB
Image size before: 7.0 KiB
BIN GhidraDocs/GhidraClass/BSim/images/information.png (Normal file)
Image size after: 778 B
BIN GhidraDocs/GhidraClass/BSim/images/lock.gif (Normal file)
Image size after: 900 B
Image size: 88 KiB before, 43 KiB after
BIN GhidraDocs/GhidraClass/BSim/images/preferences-web-browser-shortcuts.png (Executable file)
Image size after: 955 B
Image size before: 9.3 KiB
BIN GhidraDocs/GhidraClass/BSim/images/text_align_justify.png (Normal file)
Image size after: 209 B
BIN GhidraDocs/GhidraClass/BSim/images/unlock.gif (Normal file)
Image size after: 900 B
@ -1,7 +1,9 @@
##VERSION: 2.0
##MODULE IP: Creative Commons Attribution 2.5
##MODULE IP: Crystal Clear Icons - LGPL 2.1
##MODULE IP: FAMFAMFAM Icons - CC 2.5
##MODULE IP: LGPL 2.1
##MODULE IP: LGPL 3.0
##MODULE IP: Modified Nuvola Icons - LGPL 2.1
##MODULE IP: Nuvola Icons - LGPL 2.1
##MODULE IP: Public Domain
@ -28,20 +30,25 @@ GhidraClass/BSim/BSimTutorial_Exe_Results.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Filters.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Intro.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Overview.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Overview_Queries.md||GHIDRA||||END|
GhidraClass/BSim/BSimTutorial_Scripting.md||GHIDRA||||END|
GhidraClass/BSim/README.md||GHIDRA||||END|
GhidraClass/BSim/images/actions.png||GHIDRA||||END|
GhidraClass/BSim/images/basic_query.png||GHIDRA||||END|
GhidraClass/BSim/images/bsim_search_dialog.png||GHIDRA||||END|
GhidraClass/BSim/images/configure.png||GHIDRA||||END|
GhidraClass/BSim/images/decomp_diff.png||GHIDRA||||END|
GhidraClass/BSim/images/exe_results.png||GHIDRA||||END|
GhidraClass/BSim/images/exe_results_actions.png||GHIDRA||||END|
GhidraClass/BSim/images/exec.png||Crystal Clear Icons - LGPL 2.1||||END|
GhidraClass/BSim/images/feature_visualizer.png||GHIDRA||||END|
GhidraClass/BSim/images/filter_results.png||GHIDRA||||END|
GhidraClass/BSim/images/information.png||FAMFAMFAM Icons - CC 2.5||||END|
GhidraClass/BSim/images/lock.gif||GHIDRA||||END|
GhidraClass/BSim/images/overview_window.png||GHIDRA||||END|
GhidraClass/BSim/images/preferences-web-browser-shortcuts.png||LGPL 3.0||||END|
GhidraClass/BSim/images/script_manager.png||GHIDRA||||END|
GhidraClass/BSim/images/search_info.png||GHIDRA||||END|
GhidraClass/BSim/images/text_align_justify.png||FAMFAMFAM Icons - CC 2.5||||END|
GhidraClass/BSim/images/unlock.gif||GHIDRA||||END|
GhidraClass/Beginner/Images/GhidraLogo64.png||GHIDRA||||END|
GhidraClass/Beginner/Introduction_to_Ghidra_Student_Guide.html||GHIDRA|||This file contains mostly Ghidra content, but also includes code that is available for distribution, without restrictions, from https://github.com/paulrouget/dzslides.|END|
GhidraClass/Beginner/Introduction_to_Ghidra_Student_Guide_withNotes.html||Public Domain|||Slight modification of code that is available for distribution, without restrictions, (original extremely permissive wtf license allows us to change IP to Public Domain),from https://github.com/paulrouget/dzslides.|END|