<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>P-Code Reference Manual</title>
<link rel="stylesheet" type="text/css" href="DefaultStyle.css">
<link rel="stylesheet" type="text/css" href="languages.css">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="pcoderef.html" title="P-Code Reference Manual">
<link rel="next" href="pcodedescription.html" title="P-Code Operation Reference">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr><th colspan="3" align="center">P-Code Reference Manual</th></tr>
<tr>
<td width="20%" align="left"> </td>
<th width="60%" align="center"> </th>
<td width="20%" align="right"> <a accesskey="n" href="pcodedescription.html">Next</a>
</td>
</tr>
</table>
<hr>
</div>
<div class="article">
<div class="titlepage">
<div>
<div><h1 class="title">
<a name="pcoderef_title"></a>P-Code Reference Manual</h1></div>
<div><p class="releaseinfo">Last updated March 2, 2023</p></div>
</div>
<hr>
</div>
<div class="table">
<a name="mytoc.htmltable"></a><table xml:id="mytoc.htmltable" width="90%" frame="none">
<col width="25%">
<col width="25%">
<col width="25%">
<col width="25%">
<tbody>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_copy" title="COPY">COPY</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_add" title="INT_ADD">INT_ADD</a></td>
<td><a class="link" href="pcodedescription.html#cpui_bool_or" title="BOOL_OR">BOOL_OR</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_load" title="LOAD">LOAD</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sub" title="INT_SUB">INT_SUB</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_equal" title="FLOAT_EQUAL">FLOAT_EQUAL</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_store" title="STORE">STORE</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_carry" title="INT_CARRY">INT_CARRY</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_notequal" title="FLOAT_NOTEQUAL">FLOAT_NOTEQUAL</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_branch" title="BRANCH">BRANCH</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_scarry" title="INT_SCARRY">INT_SCARRY</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_less" title="FLOAT_LESS">FLOAT_LESS</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_cbranch" title="CBRANCH">CBRANCH</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sborrow" title="INT_SBORROW">INT_SBORROW</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_lessequal" title="FLOAT_LESSEQUAL">FLOAT_LESSEQUAL</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_branchind" title="BRANCHIND">BRANCHIND</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_2comp" title="INT_2COMP">INT_2COMP</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_add" title="FLOAT_ADD">FLOAT_ADD</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_call" title="CALL">CALL</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_negate" title="INT_NEGATE">INT_NEGATE</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_sub" title="FLOAT_SUB">FLOAT_SUB</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_callind" title="CALLIND">CALLIND</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_xor" title="INT_XOR">INT_XOR</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_mult" title="FLOAT_MULT">FLOAT_MULT</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pseudo-ops.html#cpui_userdefined" title="USERDEFINED">USERDEFINED</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_and" title="INT_AND">INT_AND</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_div" title="FLOAT_DIV">FLOAT_DIV</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_return" title="RETURN">RETURN</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_or" title="INT_OR">INT_OR</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_neg" title="FLOAT_NEG">FLOAT_NEG</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_piece" title="PIECE">PIECE</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_left" title="INT_LEFT">INT_LEFT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_abs" title="FLOAT_ABS">FLOAT_ABS</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_subpiece" title="SUBPIECE">SUBPIECE</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_right" title="INT_RIGHT">INT_RIGHT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_sqrt" title="FLOAT_SQRT">FLOAT_SQRT</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_popcount" title="POPCOUNT">POPCOUNT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sright" title="INT_SRIGHT">INT_SRIGHT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_ceil" title="FLOAT_CEIL">FLOAT_CEIL</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_lzcount" title="LZCOUNT">LZCOUNT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_mult" title="INT_MULT">INT_MULT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_floor" title="FLOAT_FLOOR">FLOAT_FLOOR</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_equal" title="INT_EQUAL">INT_EQUAL</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_div" title="INT_DIV">INT_DIV</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_round" title="FLOAT_ROUND">FLOAT_ROUND</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_notequal" title="INT_NOTEQUAL">INT_NOTEQUAL</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_rem" title="INT_REM">INT_REM</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float_nan" title="FLOAT_NAN">FLOAT_NAN</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_less" title="INT_LESS">INT_LESS</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sdiv" title="INT_SDIV">INT_SDIV</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int2float" title="INT2FLOAT">INT2FLOAT</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sless" title="INT_SLESS">INT_SLESS</a></td>
<td><a class="link" href="pcodedescription.html#cpui_int_srem" title="INT_SREM">INT_SREM</a></td>
<td><a class="link" href="pcodedescription.html#cpui_float2float" title="FLOAT2FLOAT">FLOAT2FLOAT</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_lessequal" title="INT_LESSEQUAL">INT_LESSEQUAL</a></td>
<td><a class="link" href="pcodedescription.html#cpui_bool_negate" title="BOOL_NEGATE">BOOL_NEGATE</a></td>
<td><a class="link" href="pcodedescription.html#cpui_trunc" title="TRUNC">TRUNC</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_slessequal" title="INT_SLESSEQUAL">INT_SLESSEQUAL</a></td>
<td><a class="link" href="pcodedescription.html#cpui_bool_xor" title="BOOL_XOR">BOOL_XOR</a></td>
<td><a class="link" href="pseudo-ops.html#cpui_cpoolref" title="CPOOLREF">CPOOLREF</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_zext" title="INT_ZEXT">INT_ZEXT</a></td>
<td><a class="link" href="pcodedescription.html#cpui_bool_and" title="BOOL_AND">BOOL_AND</a></td>
<td><a class="link" href="pseudo-ops.html#cpui_new" title="NEW">NEW</a></td>
</tr>
<tr>
<td></td>
<td><a class="link" href="pcodedescription.html#cpui_int_sext" title="INT_SEXT">INT_SEXT</a></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</div>
<div class="sect1">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="index"></a>A Brief Introduction to P-Code</h2></div></div></div>
<p>
P-code is a <span class="emphasis"><em>register transfer language</em></span> designed
for reverse engineering applications. The language is general enough
to model the behavior of many different processors. Modeling processors
this way puts their analysis into a common
framework, facilitating the development of retargetable analysis
algorithms and applications.
</p>
<p>
Fundamentally, p-code works by translating individual processor instructions
into a sequence of <span class="bold"><strong>p-code operations</strong></span> that take
parts of the processor state as input and output variables
(<span class="bold"><strong>varnodes</strong></span>). The set of unique p-code operations
(distinguished by <span class="bold"><strong>opcode</strong></span>) comprises a fairly tight set
of the arithmetic and logical actions performed by general-purpose processors.
The direct translation of instructions into these operations is referred
to as <span class="bold"><strong>raw p-code</strong></span>. Raw p-code can be used to directly emulate
instruction execution and generally follows the same control-flow,
although it may add some of its own internal control-flow. The subset of
opcodes that can occur in raw p-code is described in
<a class="xref" href="pcodedescription.html" title="P-Code Operation Reference">the section called “P-Code Operation Reference”</a> and in <a class="xref" href="pseudo-ops.html" title="Pseudo P-CODE Operations">the section called “Pseudo P-CODE Operations”</a>, making up
the bulk of this document.
</p>
<p>
P-code is designed specifically to facilitate the
construction of <span class="emphasis"><em>data-flow</em></span> graphs for follow-on analysis of
disassembled instructions. Varnodes and
p-code operators can be thought of explicitly as nodes in these graphs.
Generation of raw p-code is a necessary first step in graph construction,
but additional steps are required, which introduce some new
opcodes. Two of these,
<span class="bold"><strong>MULTIEQUAL</strong></span> and <span class="bold"><strong>INDIRECT</strong></span>,
are specific to the graph construction process, but other opcodes can be introduced during
subsequent analysis and transformation of a graph and help hold recovered data-type relationships.
All of the new opcodes are described in <a class="xref" href="additionalpcode.html" title="Additional P-CODE Operations">the section called “Additional P-CODE Operations”</a>, none of which can occur
in the original raw p-code translation. Finally, a few of the p-code operators,
<span class="bold"><strong>CALL</strong></span>,
<span class="bold"><strong>CALLIND</strong></span>, and <span class="bold"><strong>RETURN</strong></span>,
may have their input and output varnodes changed during analysis so that they no
longer match their <span class="emphasis"><em>raw p-code</em></span> form.
</p>
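<p>
As a rough illustration of treating varnodes and p-code operations as graph nodes,
the sketch below builds the edges for a short, made-up sequence of operations.
The varnode names, the tuple layout, and the helper build_dataflow_edges are
assumptions for this example only. Each varnode is written at most once here, so none
of the additional construction steps (such as MULTIEQUAL or INDIRECT) come into play.
</p>

```python
# Hypothetical straight-line p-code, written as
# (opcode, input varnode names, output varnode name or None).
ops = [
    ("INT_ADD", ("r1", "r2"), "tmp0"),
    ("INT_MULT", ("tmp0", "r3"), "tmp1"),
    ("COPY", ("tmp1",), "r4"),
]


def build_dataflow_edges(ops):
    """Return directed edges: input varnode to operation, operation to output varnode."""
    edges = []
    for index, (opcode, inputs, output) in enumerate(ops):
        op_node = f"{opcode}@{index}"  # one graph node per operation
        for name in inputs:
            edges.append((name, op_node))
        if output is not None:
            edges.append((op_node, output))
    return edges


edges = build_dataflow_edges(ops)
for source, target in edges:
    print(source, "to", target)
```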
<p>
The core concepts of p-code are:
</p>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="pcoderef_address_space"></a>Address Space</h3></div></div></div>
<p>
The <span class="bold"><strong>address space</strong></span> for p-code is a generalization
of RAM. It is defined simply as an indexed sequence of bytes that can
be read and written by the p-code operations. For a specific byte, the unique index
that labels it is the byte's <span class="bold"><strong>address</strong></span>. An address space has a
name to identify it, a size that indicates the number of distinct
indices into the space, and an <span class="bold"><strong>endianness</strong></span>
associated with it that indicates how integers and other multi-byte
values are encoded into the space. A typical processor
will have a <span class="bold"><strong>ram</strong></span> space, to model
memory accessible via its main data bus, and
a <span class="bold"><strong>register</strong></span> space for modeling the
processor's general purpose registers. Any data that a processor
manipulates must be in some address space. The specification for a
processor is free to define as many address spaces as it needs. There
is always a special address space, called
a <span class="bold"><strong>constant</strong></span> address space, which is
used to encode any constant values needed for p-code operations. Systems generating
p-code also generally use a dedicated <span class="bold"><strong>temporary</strong></span>
space, which can be viewed as a bottomless source of temporary registers. These
are used to hold intermediate values when modeling instruction behavior.
</p>
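<p>
The following minimal sketch models an address space as described above: a name,
a size, an endianness, and an indexed sequence of bytes. The class AddressSpace
and its methods are hypothetical names chosen for this example, and reading
unwritten bytes as zero is an assumption, not something the text specifies.
</p>

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AddressSpace:
    """Hypothetical model of an address space: a named, indexed sequence of bytes."""
    name: str
    size: int          # number of distinct indices (addresses) in the space
    big_endian: bool   # how multi-byte values are encoded into the space
    data: Dict[int, int] = field(default_factory=dict)  # sparse store: address to byte

    def write_int(self, address: int, value: int, nbytes: int) -> None:
        """Encode value into nbytes bytes at address, honoring the endianness."""
        order = "big" if self.big_endian else "little"
        encoded = (value % (2 ** (8 * nbytes))).to_bytes(nbytes, order)
        for i, byte in enumerate(encoded):
            self.data[address + i] = byte

    def read_bytes(self, address: int, nbytes: int) -> bytes:
        """Return nbytes raw bytes; unwritten bytes read as zero (an assumption)."""
        return bytes(self.data.get(address + i, 0) for i in range(nbytes))


# Two illustrative spaces that differ only in endianness.
little = AddressSpace("ram", 2 ** 32, big_endian=False)
big = AddressSpace("ram", 2 ** 32, big_endian=True)
little.write_int(0x1000, 0x11223344, 4)
big.write_int(0x1000, 0x11223344, 4)
print(little.read_bytes(0x1000, 4).hex())  # 44332211
print(big.read_bytes(0x1000, 4).hex())     # 11223344
```

<p>
The same integer lands in memory in opposite byte orders, which is the only role
endianness plays in this model.
</p>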
<p>
P-code specifications allow the addressable unit of an address
space to be bigger than just a byte. Each address space has
a <span class="bold"><strong>wordsize</strong></span> attribute that can be set
to indicate the number of bytes in a unit. A wordsize greater
than one makes little difference to the representation of p-code. All
the offsets into an address space are still represented internally as
a byte offset. The only exceptions are
the <span class="bold"><strong>LOAD</strong></span> and
<span class="bold"><strong>STORE</strong></span> p-code
operations. These operations read a pointer offset that must be scaled properly to get the
right byte offset when dereferencing the pointer. The wordsize attribute has no effect on
any of the other p-code operations.
</p>
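<p>
A small sketch of the scaling step for LOAD and STORE follows. It assumes the scaling
is a plain multiplication of the pointer value by the wordsize of the space; the
function name pointer_to_byte_offset is hypothetical.
</p>

```python
def pointer_to_byte_offset(pointer: int, wordsize: int) -> int:
    """Scale a pointer value read by LOAD or STORE into an internal byte offset.

    Assumption for this sketch: the scaling is a multiplication by the wordsize.
    """
    return pointer * wordsize


print(hex(pointer_to_byte_offset(0x100, 1)))  # 0x100
print(hex(pointer_to_byte_offset(0x100, 2)))  # 0x200
```

<p>
With a wordsize of one the pointer and the byte offset coincide, which is why the
attribute is invisible for byte-addressed spaces.
</p>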
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="pcoderef_varnode"></a>Varnode</h3></div></div></div>
<p>
A <span class="bold"><strong>varnode</strong></span> is a generalization of
either a register or a memory location. It is represented by the formal triple:
an address space, an offset into the space, and a size. Intuitively, a
varnode is a contiguous sequence of bytes in some address space that
can be treated as a single value. All manipulation of data by p-code
operations occurs on varnodes.
</p>
<p>
Varnodes by themselves are just contiguous chunks of bytes,
identified by their address and size, and they have no type. The
p-code operations, however, can force one of three <span class="emphasis"><em>type</em></span> interpretations
on the varnodes: integer, boolean, and floating-point.
</p>
<div class="informalexample"><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; ">
<li class="listitem" style="list-style-type: disc">
Operations that manipulate integers always interpret a varnode as a
two's-complement encoding using the endianness associated with the
address space containing the varnode.
</li>
<li class="listitem" style="list-style-type: disc">
A varnode being used as a boolean value is assumed to be a single byte
that can only take the value 0, for <span class="emphasis"><em>false</em></span>, or 1,
for <span class="emphasis"><em>true</em></span>.
</li>
<li class="listitem" style="list-style-type: disc">
Floating-point operations use the encoding expected by the processor being modeled,
which varies depending on the size of the varnode.
For most processors, these encodings are described by the IEEE 754 standard, but
other encodings are possible in principle.
</li>
</ul></div></div>
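<p>
The three interpretations can be sketched as functions applied to the same untyped
bytes. The Varnode tuple and the helper names below are hypothetical, and the
floating-point helper assumes IEEE 754 single and double precision, which the text
notes is typical but not guaranteed. Whether an integer is read as signed or unsigned
is taken as a parameter here, on the assumption that the choice belongs to the
operation rather than to the varnode.
</p>

```python
import struct
from typing import NamedTuple


class Varnode(NamedTuple):
    space: str   # name of the address space
    offset: int  # byte offset into the space
    size: int    # number of bytes


def as_integer(raw: bytes, big_endian: bool, signed: bool) -> int:
    """Interpret raw bytes as a two's-complement integer."""
    return int.from_bytes(raw, "big" if big_endian else "little", signed=signed)


def as_boolean(raw: bytes) -> bool:
    """Interpret a single byte holding 0 or 1 as false or true."""
    if raw not in (b"\x00", b"\x01"):
        raise ValueError("a boolean varnode is one byte holding 0 or 1")
    return raw == b"\x01"


def as_float(raw: bytes, big_endian: bool) -> float:
    """Interpret 4 or 8 raw bytes as an IEEE 754 value (an assumption)."""
    formats = {4: "!f", 8: "!d"}          # "!" means big-endian byte order
    ordered = raw if big_endian else raw[::-1]
    return struct.unpack(formats[len(raw)], ordered)[0]


v = Varnode("register", 0x10, 4)          # hypothetical 4-byte varnode
raw = b"\xff" * v.size                    # the bytes it happens to hold
print(as_integer(raw, big_endian=False, signed=False))  # 4294967295
print(as_integer(raw, big_endian=False, signed=True))   # -1
print(as_float(bytes.fromhex("0000c03f"), big_endian=False))  # 1.5
```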
<p>
If a varnode is specified as an offset into
the <span class="bold"><strong>constant</strong></span> address space, that
offset is interpreted as a constant, or immediate value, in any p-code
operation that uses that varnode. The size of the varnode, in this
case, can be treated as the size or precision available for the encoding
of the constant. As with other varnodes, constants only have a type forced
on them by the p-code operations that use them.
</p>
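<p>
One way to picture constant varnodes is a lookup routine that never touches memory
when the varnode lives in the constant space. In this sketch the space name
"constant", the helper input_value, and the read_memory callback are all
hypothetical, and truncating the offset to the varnode's size is an assumption about
how the available precision would be applied.
</p>

```python
from typing import Callable, NamedTuple


class Varnode(NamedTuple):
    space: str
    offset: int
    size: int


CONSTANT_SPACE = "constant"  # hypothetical name for the constant address space


def input_value(varnode: Varnode, read_memory: Callable[[Varnode], int]) -> int:
    """Return the value an operation sees for one of its input varnodes."""
    if varnode.space == CONSTANT_SPACE:
        # The offset itself is the value; keep only what fits in size bytes.
        return varnode.offset % (2 ** (8 * varnode.size))
    return read_memory(varnode)


def fake_memory(varnode: Varnode) -> int:
    """Stand-in for a real memory read; always returns 99."""
    return 99


print(input_value(Varnode("constant", 5, 4), fake_memory))  # 5
print(input_value(Varnode("ram", 5, 4), fake_memory))       # 99
```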
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="pcoderef_pcode_operation"></a>P-code Operation</h3></div></div></div>
<p>
A <span class="bold"><strong>p-code operation</strong></span> is the analog of a
machine instruction. All p-code operations have the same basic format
internally. They all take one or more varnodes as input and optionally
produce a single output varnode. The action of the operation is determined by
its <span class="bold"><strong>opcode</strong></span>.
For almost all p-code operations, only the output varnode can have its
value modified; there are no indirect effects of the operation.
The only possible exceptions are <span class="emphasis"><em>pseudo</em></span> operations
(see <a class="xref" href="pseudo-ops.html" title="Pseudo P-CODE Operations">the section called “Pseudo P-CODE Operations”</a>), which are sometimes necessary when there
is incomplete knowledge of an instruction's behavior.
</p>
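<p>
The common format can be sketched as a small record: an opcode, one or more input
varnodes, and an optional output varnode. The PcodeOp class, the tuple encoding of
varnodes, and the specific offsets below are hypothetical and chosen only to
illustrate the shape.
</p>

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Varnode = Tuple[str, int, int]  # (space name, offset, size in bytes)


@dataclass(frozen=True)
class PcodeOp:
    """Hypothetical record for one p-code operation."""
    opcode: str
    inputs: Tuple[Varnode, ...]
    output: Optional[Varnode] = None

    def __post_init__(self) -> None:
        if len(self.inputs) == 0:
            raise ValueError("a p-code operation takes one or more input varnodes")


# COPY reads one varnode and writes one output varnode.
copy_op = PcodeOp("COPY", (("register", 0x0, 4),), ("register", 0x4, 4))
# BRANCH has an input describing the destination but produces no output.
branch_op = PcodeOp("BRANCH", (("ram", 0x2000, 4),))

print(copy_op.output)    # ('register', 4, 4)
print(branch_op.output)  # None
```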
<p>
All p-code operations are associated with the address of the original
processor instruction they were translated from. For a single instruction,
a counter, starting at zero and incrementing by one, is used to enumerate the
multiple p-code operations involved in its translation. The address and
counter as a pair are referred to as the p-code op's
unique <span class="bold"><strong>sequence number</strong></span>. Control-flow of
p-code operations generally follows sequence number order. When execution
of all p-code for one instruction is completed, if the
instruction has <span class="emphasis"><em>fall-through</em></span> semantics, p-code
control-flow picks up with the first p-code operation in sequence corresponding to
the instruction at the fall-through address. Similarly, if a p-code operation
results in a control-flow branch, the first p-code operation in sequence executes
at the destination address.
</p>
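<p>
Sequence numbers can be sketched as (address, counter) pairs. The addresses and
opcode lists below are made up for illustration, and number_ops is a hypothetical
helper. The example assumes an instruction at 0x1000 that falls through to 0x1004.
</p>

```python
from typing import List, NamedTuple, Tuple


class SeqNum(NamedTuple):
    address: int  # address of the originating processor instruction
    counter: int  # position of the operation within that instruction


def number_ops(address: int, opcodes: List[str]) -> List[Tuple[SeqNum, str]]:
    """Attach a sequence number to each operation translated from one instruction."""
    return [(SeqNum(address, i), opcode) for i, opcode in enumerate(opcodes)]


# Hypothetical translations of two consecutive instructions.
first = number_ops(0x1000, ["INT_ADD", "INT_EQUAL", "COPY"])
second = number_ops(0x1004, ["COPY"])

# With fall-through semantics, execution continues at the first operation
# (counter zero) of the instruction at the fall-through address.
for seq, opcode in first + second:
    print(hex(seq.address), seq.counter, opcode)
```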
<p>
The list of possible
opcodes is similar to many RISC-based instruction sets. The effect of
each opcode is described in detail in the following sections,
and a reference table is given
in <a class="xref" href="reference.html" title="Syntax Reference">the section called “Syntax Reference”</a>. In general, the size or
precision of a particular p-code operation is determined by the size
of the varnode inputs or output, not by the opcode.
</p>
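<p>
To illustrate how the varnode size, rather than the opcode, fixes the precision,
the sketch below models INT_ADD as addition modulo the varnode size. The function
int_add is a hypothetical stand-in for an emulator routine, with the size passed in
explicitly instead of being read from varnodes.
</p>

```python
def int_add(a: int, b: int, size: int) -> int:
    """Add two integers as size-byte varnodes, wrapping modulo 2**(8 * size)."""
    return (a + b) % (2 ** (8 * size))


print(int_add(0xFF, 0x01, 1))  # 0   (wraps in one byte)
print(int_add(0xFF, 0x01, 2))  # 256 (fits in two bytes)
```

<p>
The opcode is the same in both calls; only the size of the varnodes changes the result.
</p>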
</div>
</div>
</div>
<div class="navfooter">
<hr>
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right"> <a accesskey="n" href="pcodedescription.html">Next</a>
</td>
</tr>
<tr>
<td width="40%" align="left" valign="top"> </td>
<td width="20%" align="center"> </td>
<td width="40%" align="right" valign="top"> P-Code Operation Reference</td>
</tr>
</table>
</div>
</body>
</html>