mirror of
https://github.com/NationalSecurityAgency/ghidra.git
synced 2024-11-21 19:42:14 +00:00
365 lines
18 KiB
HTML
365 lines
18 KiB
HTML
<html>
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||
<title>8. Using Context</title>
|
||
<link rel="stylesheet" type="text/css" href="DefaultStyle.css">
|
||
<link rel="stylesheet" type="text/css" href="languages.css">
|
||
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
|
||
<link rel="home" href="sleigh.html" title="SLEIGH">
|
||
<link rel="up" href="sleigh.html" title="SLEIGH">
|
||
<link rel="prev" href="sleigh_constructors.html" title="7. Constructors">
|
||
<link rel="next" href="sleigh_ref.html" title="9. P-code Tables">
|
||
</head>
|
||
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
||
<div class="navheader">
|
||
<table width="100%" summary="Navigation header">
|
||
<tr><th colspan="3" align="center">8. Using Context</th></tr>
|
||
<tr>
|
||
<td width="20%" align="left">
|
||
<a accesskey="p" href="sleigh_constructors.html">Prev</a> </td>
|
||
<th width="60%" align="center"> </th>
|
||
<td width="20%" align="right"> <a accesskey="n" href="sleigh_ref.html">Next</a>
|
||
</td>
|
||
</tr>
|
||
</table>
|
||
<hr>
|
||
</div>
|
||
<div class="sect1">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="sleigh_context"></a>8. Using Context</h2></div></div></div>
|
||
<p>
|
||
For most practical specifications, the disassembly and semantic
|
||
meaning of an instruction can be determined by looking only at the
|
||
bits in the encoding of that instruction. SLEIGH syntax reflects this
|
||
fact as every constructor or attached register is ultimately decided
|
||
by examining <span class="emphasis"><em>fields</em></span>, the syntactic representation
|
||
of these instruction bits. In some cases however, the instruction
|
||
encoding itself may not be enough. Additional information, which we
|
||
refer to as <span class="emphasis"><em>context</em></span>, may be necessary to fully
|
||
resolve the meaning of the instruction.
|
||
</p>
|
||
<p>
|
||
In truth, almost every modern processor has multiple modes of
|
||
operation, where the exact interpretation of instructions may depend
|
||
on that mode. Typical examples include distinguishing between a 16-bit
|
||
mode and a 32-bit mode, or between a big endian mode or a little
|
||
endian mode. But for the specification designer, these are of little
|
||
consequence because most software for such a processor will run in
|
||
only one mode without ever changing it. The designer need only pick
|
||
the most popular or most important mode for his projects and design to
|
||
that. If there is in fact software that runs under a different mode,
|
||
the other mode can be described in a separate specification.
|
||
</p>
|
||
<p>
|
||
However, for certain processors or software, the need to distinguish
|
||
between different interpretations of the same instruction encoding,
|
||
based on context, may be a crucial part of the disassembly and
|
||
analysis process. There are two typical situations where this becomes
|
||
necessary.
|
||
</p>
|
||
<div class="informalexample"><div class="itemizedlist"><ul class="itemizedlist compact" style="list-style-type: bullet; ">
|
||
<li class="listitem" style="list-style-type: disc">
|
||
The processor supports two (or more) separate instruction
|
||
sets. The set to use is usually determined by special bits in a status
|
||
register, and a single piece of software frequently switches between
|
||
these modes.
|
||
</li>
|
||
<li class="listitem" style="list-style-type: disc">
|
||
The processor supports instructions that temporarily affect
|
||
the execution of the immediately following instruction(s). For
|
||
example, many processors support hardware <span class="emphasis"><em>loop</em></span> instructions that
|
||
automatically cause the following instructions to repeat without an
|
||
explicit instruction causing the branching and loop counting.
|
||
</li>
|
||
</ul></div></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
SLEIGH solves these problems by introducing <span class="emphasis"><em>context
|
||
variables</em></span>. The syntax for defining these symbols was
|
||
described in <a class="xref" href="sleigh_tokens.html#sleigh_context_variables" title="6.4. Context Variables">Section 6.4, “Context Variables”</a>. As mentioned
|
||
there, the easiest and most common way to use a context variable is as
|
||
just another field to use in our bit patterns. It gives us the extra
|
||
information we need to distinguish between different instructions
|
||
whose encodings are otherwise the same.
|
||
</p>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="sleigh_context_basic"></a>8.1. Basic Use of Context Variables</h3></div></div></div>
|
||
<p>
|
||
Suppose a processor supports the use of two different sets of
|
||
registers in its main addressing mode, based on the setting of a
|
||
status bit which can be changed dynamically. If an instruction is
|
||
executed with this bit cleared, then one set of registers is used, and
|
||
if the bit is set, the other registers are used. The instructions
|
||
otherwise behave identically.
|
||
</p>
|
||
<div class="informalexample"><pre class="programlisting">
|
||
define endian=big;
|
||
define space ram type=ram_space size=4 default;
|
||
define space register type=register_space size=4;
|
||
define register offset=0 size=4 [ r0 r1 r2 r3 r4 r5 r6 r7 ];
|
||
define register offset=0x100 size=4 [ s0 s1 s2 s3 s4 s5 s6 s7 ];
|
||
define register offset=0x200 size=4 [ statusreg ]; # define context bits (if defined, size must be multiple of 4-bytes)
|
||
|
||
define token instr(16)
|
||
op=(10,15) rreg1=(7,9) sreg1=(7,9) imm=(0,6)
|
||
;
|
||
define context statusreg
|
||
mode=(3,3)
|
||
;
|
||
attach variables [ rreg1 ] [ r0 r1 r2 r3 r4 r5 r6 r7 ];
|
||
attach variables [ sreg1 ] [ s0 s1 s2 s3 s4 s5 s6 s7 ];
|
||
|
||
Reg1: rreg1 is mode=0 & rreg1 { export rreg1; }
|
||
Reg1: sreg1 is mode=1 & sreg1 { export sreg1; }
|
||
|
||
:addi Reg1,#imm is op=1 & Reg1 & imm { Reg1 = Reg1 + imm; }
|
||
</pre></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
In this example the symbol <span class="emphasis"><em>Reg1</em></span> uses the 3 bits
|
||
(7,9) to select one of eight registers. If the context
|
||
variable <span class="emphasis"><em>mode</em></span> is set to 0, it selects
|
||
an <span class="emphasis"><em>r</em></span> register, through
|
||
the <span class="emphasis"><em>rreg1</em></span> field. If <span class="emphasis"><em>mode</em></span> is
|
||
set to 1 on the other hand, an <span class="emphasis"><em>s</em></span> register is
|
||
selected instead
|
||
via <span class="emphasis"><em>sreg1</em></span>. The <span class="emphasis"><em>addi</em></span>
|
||
instruction (encoded as 0x0590 for example) can disassemble in one of
|
||
two ways.
|
||
</p>
|
||
<div class="informalexample"><pre class="programlisting">
|
||
addi r3,#0x10 <span class="bold"><strong>OR</strong></span>
|
||
addi s3,#0x10
|
||
</pre></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
This is the same behavior as if <span class="emphasis"><em>mode</em></span> were defined
|
||
as a field instead of a context variable, except that there is nothing
|
||
in the instruction encoding itself which indicates which of the two
|
||
forms will be chosen. An engine doing the disassembly will have global
|
||
state associated with the <span class="emphasis"><em>mode</em></span> variable that will
|
||
make the final decision about which form to generate. The setting of
|
||
this state is (at least partially) out of the control of SLEIGH,
|
||
although see the following sections.
|
||
</p>
|
||
</div>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="sleigh_local_change"></a>8.2. Local Context Change</h3></div></div></div>
|
||
<p>
|
||
SLEIGH can make direct modifications to context variables through
|
||
statements in the disassembly action section of a constructor. The
|
||
left-hand side of an assignment statement in this section can be a context variable,
|
||
see <a class="xref" href="sleigh_constructors.html#sleigh_general_actions" title="7.5.2. General Actions and Pattern Expressions">Section 7.5.2, “General Actions and Pattern Expressions”</a>. Because the result of this
|
||
assignment is calculated in the middle of the instruction disassembly,
|
||
the change in value of the context variable can potentially affect any
|
||
remaining parsing for that instruction. A modal variable is being
|
||
added to what was otherwise a stateless grammar, a common technique in
|
||
many practical parsing engines.
|
||
</p>
|
||
<p>
|
||
Any assignment statement changing a context variable is immediately
|
||
executed upon the successful match of the constructor containing the
|
||
statement and can be used to guide the parsing of the constructor's
|
||
operands. We introduce two more instructions to the example
|
||
specification from the previous section.
|
||
</p>
|
||
<div class="informalexample"><pre class="programlisting">
|
||
:raddi Reg1,#imm is op=2 & Reg1 & imm [ mode=0; ] {
|
||
Reg1 = Reg1 + imm;
|
||
}
|
||
:saddi Reg1,#imm is op=3 & Reg1 & imm [ mode=1; ] {
|
||
Reg1 = Reg1 + imm;
|
||
}
|
||
</pre></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
Notice that both new constructors modify the context
|
||
variable <span class="emphasis"><em>mode</em></span>. The raddi instruction sets mode to
|
||
0 and effectively guarantees that an <span class="emphasis"><em>r</em></span> register
|
||
will be produced by the disassembly. Similarly,
|
||
the <span class="emphasis"><em>saddi</em></span> instruction can force
|
||
an <span class="emphasis"><em>s</em></span> register. Both are in contrast to
|
||
the <span class="emphasis"><em>addi</em></span> instruction, which depends on a global
|
||
state. The changes to <span class="emphasis"><em>mode</em></span> made by these
|
||
instructions only persist for parsing of that single instruction. For
|
||
any following instructions, if the matching constructors
|
||
use <span class="emphasis"><em>mode</em></span>, its value will have reverted to its
|
||
original global state. The same holds for any context variable
|
||
modified with this syntax. If an instruction needs to permanently
|
||
modify the state of a context variable, the designer must use
|
||
constructions described in <a class="xref" href="sleigh_context.html#sleigh_global_change" title="8.3. Global Context Change">Section 8.3, “Global Context Change”</a>.
|
||
</p>
|
||
<p>
|
||
Clearly, the behavior of the above example could be easily replicated
|
||
without using context variables at all and having the selection of a
|
||
register set simply depend directly on the <span class="emphasis"><em>op</em></span>
|
||
field. But, with more complicated addressing modes, local modification
|
||
of context variables can drastically reduce the complexity and size of
|
||
a specification.
|
||
</p>
|
||
<p>
|
||
At the point where a modification is made to a context variable, the
|
||
specification designer has the guarantee that none of the operands of
|
||
the constructor have been evaluated yet, so if their matching depends
|
||
on this context variable, they will be affected by the change. In
|
||
contrast, the matching of any ancestor constructor cannot be
|
||
affected. Other constructors, which are not direct ancestors or
|
||
descendants, may or may not be affected by the change, depending on
|
||
the order of evaluation. It is usually best not to depend on this
|
||
ordering when designing the specification, with the possible exception
|
||
of orderings which are guaranteed
|
||
by <span class="bold"><strong>build</strong></span> directives.
|
||
</p>
|
||
</div>
|
||
<div class="sect2">
|
||
<div class="titlepage"><div><div><h3 class="title">
|
||
<a name="sleigh_global_change"></a>8.3. Global Context Change</h3></div></div></div>
|
||
<p>
|
||
It is possible for an instruction to attempt a permanent change to a
|
||
context variable, which would then affect the parsing of other
|
||
instructions, by using the <span class="bold"><strong>globalset</strong></span>
|
||
directive in a disassembly action. As mentioned in the previous
|
||
section, context variables have an associated global state, which can
|
||
be used during constructor matching. A complete model for this state
|
||
is, unfortunately, outside the scope of SLEIGH. The disassembly engine
|
||
has to make too many decisions about what is getting disassembled and
|
||
what assumptions are being made to give complete control of the
|
||
context to SLEIGH. Because of this caveat, SLEIGH syntax for making
|
||
permanent context changes should be viewed as a suggestion to the
|
||
disassembly engine.
|
||
</p>
|
||
<p>
|
||
For processors that support multiple modes, there are typically
|
||
specific instructions that switch between these modes. Extending the
|
||
example from the previous sections, we add two instructions to the
|
||
specification for permanently switching which register set is being
|
||
used.
|
||
</p>
|
||
<div class="informalexample"><pre class="programlisting">
|
||
:rmode is op=32 & rreg1=0 & imm=0
|
||
[ mode=0; globalset(inst_next,mode); ]
|
||
{}
|
||
:smode is op=33 & rreg1=0 & imm=0
|
||
[ mode=1; globalset(inst_next,mode); ]
|
||
{}
|
||
</pre></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
The register set is, as before, controlled by
|
||
the <span class="emphasis"><em>mode</em></span> variable, and as with a local change to
|
||
context, the variable is assigned to inside the square
|
||
brackets. The <span class="emphasis"><em>rmode</em></span> instruction
|
||
sets <span class="emphasis"><em>mode</em></span> to 0, in order to
|
||
select <span class="emphasis"><em>r</em></span> registers
|
||
via <span class="emphasis"><em>rreg1</em></span>, and <span class="emphasis"><em>smode</em></span>
|
||
sets <span class="emphasis"><em>mode</em></span> to 1 in order to
|
||
select <span class="emphasis"><em>s</em></span> registers. As is described in
|
||
<a class="xref" href="sleigh_context.html#sleigh_local_change" title="8.2. Local Context Change">Section 8.2, “Local Context Change”</a>, these assignments by themselves
|
||
cause only a local context change. However, the
|
||
subsequent <span class="bold"><strong>globalset</strong></span> directives make
|
||
the change persist outside of the the instructions
|
||
themselves. The <span class="bold"><strong>globalset</strong></span> directive
|
||
takes two parameters, the second being the particular context variable
|
||
being changed. The first parameter indicates the first address where
|
||
the new context takes effect. In the example, the expectation is that
|
||
a mode change affects any subsequent instructions. So the first
|
||
parameter to <span class="bold"><strong>globalset</strong></span> here
|
||
is <span class="emphasis"><em>inst_next</em></span>, indicating that the new value
|
||
of <span class="emphasis"><em>mode</em></span> begins at the next address.
|
||
</p>
|
||
<div class="sect3">
|
||
<div class="titlepage"><div><div><h4 class="title">
|
||
<a name="sleigh_contextflow"></a>8.3.1. Context Flow</h4></div></div></div>
|
||
<p>
|
||
A global change to context that affects instruction decoding is typically
|
||
open-ended. I.e. once the mode switching instruction is executed, a permanent change
|
||
is made to the run-time processor state, and all future instruction decoding is
|
||
affected, until another mode switch is encountered. In terms of SLEIGH by default,
|
||
the effect of a <span class="bold"><strong>globalset</strong></span> directive
|
||
follows <span class="emphasis"><em>flow</em></span>. Starting from the address specified in the directive,
|
||
the change in context follows the control-flow of the instructions, through
|
||
branches and calls, until an execution path terminates or another context change
|
||
is encountered.
|
||
</p>
|
||
<p>
|
||
Flow following behavior can be overridden by adding the <span class="bold"><strong>noflow</strong></span>
|
||
attribute to the definition of the context field. (See <a class="xref" href="sleigh_tokens.html#sleigh_context_variables" title="6.4. Context Variables">Section 6.4, “Context Variables”</a>)
|
||
In this case, a <span class="bold"><strong>globalset</strong></span> directive only affects the context
|
||
of a single instruction at the specified address. Subsequent instructions
|
||
retain their original context. This can be useful in a variety of situations but is typically
|
||
used to let one instruction alter the behavior, not necessarily the decoding,
|
||
of the following instruction. In the example below,
|
||
an indirect branch instruction jumps through a link register <span class="emphasis"><em>lr</em></span>. If the previous
|
||
instruction moves the program counter in to <span class="emphasis"><em>lr</em></span>, it communicates this to the
|
||
branch instruction through the <span class="emphasis"><em>LRset</em></span> context variable so that the branch can
|
||
be interpreted as a return, rather than a generic indirect branch.
|
||
</p>
|
||
<div class="informalexample"><pre class="programlisting">
|
||
define context contextreg
|
||
LRset = (1,1) noflow # 1 if the instruction before was a mov lr,pc
|
||
;
|
||
<span class="weak">...</span>
|
||
mov lr,pc is opcode=34 & lr & pc
|
||
[ LRset=1; globalset(inst_next,LRset); ] { lr = pc; }
|
||
<span class="weak">...</span>
|
||
blr is opcode=35 & reg=15 & LRset=0 { goto [lr]; }
|
||
blr is opcode=35 & reg=15 & LRset=1 { return [lr]; }
|
||
</pre></div>
|
||
<p>
|
||
</p>
|
||
<p>
|
||
An alternative to the <span class="bold"><strong>noflow</strong></span> attribute is to simply issue
|
||
multiple directives within a single constructor, so an explicit end to a context change
|
||
can be given. The value of the variable exported to the global state
|
||
is the one in effect at the point where the directive is issued. Thus,
|
||
after one <span class="bold"><strong>globalset</strong></span>, the same context
|
||
variable can be assigned a different value, followed by
|
||
another <span class="bold"><strong>globalset</strong></span> for a different
|
||
address.
|
||
</p>
|
||
<p>
|
||
Because context in SLEIGH is controlled by a disassembly process,
|
||
there are some basic caveats to the use of
|
||
the <span class="bold"><strong>globalset</strong></span> directive. With
|
||
<span class="emphasis"><em>flowing</em></span> context changes,
|
||
there is no guarantee of what global state will be in effect at a
|
||
particular address. During disassembly, at any given
|
||
point, the process may not have uncovered all the relevant directives,
|
||
and the known directives may not necessarily be consistent. In
|
||
general, for most processors, the disassembly at a particular address
|
||
is intended to be absolute. So given enough information, it should be
|
||
possible to make a definitive determination of what the context is at
|
||
a certain address, but there is no guarantee. It is up to the
|
||
disassembly process to fully determine where context changes begin and
|
||
end and what to do if there are conflicts.
|
||
</p>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
<div class="navfooter">
|
||
<hr>
|
||
<table width="100%" summary="Navigation footer">
|
||
<tr>
|
||
<td width="40%" align="left">
|
||
<a accesskey="p" href="sleigh_constructors.html">Prev</a> </td>
|
||
<td width="20%" align="center"> </td>
|
||
<td width="40%" align="right"> <a accesskey="n" href="sleigh_ref.html">Next</a>
|
||
</td>
|
||
</tr>
|
||
<tr>
|
||
<td width="40%" align="left" valign="top">7. Constructors </td>
|
||
<td width="20%" align="center"><a accesskey="h" href="sleigh.html">Home</a></td>
|
||
<td width="40%" align="right" valign="top"> 9. P-code Tables</td>
|
||
</tr>
|
||
</table>
|
||
</div>
|
||
</body>
|
||
</html>
|