Fields are the basic building block for family symbols. The mechanisms
for building up from fields to the
root <span class="emphasis"><em>instruction</em></span> symbol are
the <span class="emphasis"><em>constructor</em></span> and <span class="emphasis"><em>table</em></span>.
</p>
<p>
A <span class="emphasis"><em>constructor</em></span> is the unit of syntax for building
new symbols. In essence a constructor describes how to build a new
family symbol, by describing, in turn, how to build a new display
meaning, how to build a new semantic meaning, and how encodings map to
these new meanings. A <span class="emphasis"><em>table</em></span> is a set of one or
more constructors and is the final step in creating a new family
symbol identifier associated with the pieces defined by
constructors. The name of the table is this new identifier, and it is
this identifier which can be used in the syntax for subsequent
constructors.
</p>
<p>
The difference between a constructor and a table is slightly confusing
at first. In short, the syntactical elements described in this
chapter, for combining existing symbols into new symbols, are all used
to describe a single constructor. Specifications for multiple
constructors are combined to describe a single table. Since many
tables are built with only one constructor, it is natural and correct
to think of a constructor as a kind of table in and of itself. But it
is only the table that has an actual family symbol identifier
associated with it. Most of this chapter is devoted to describing how
to define a single constructor. The issues involved in combining
multiple constructors into a single table are addressed in <a class="xref" href="sleigh_constructors.html#sleigh_tables" title="7.8.&nbsp;Tables">Section&nbsp;7.8, “Tables”</a>.
</p>
<p>
The identifier <span class="emphasis"><em>instruction</em></span> is actually reserved
for the root table, but it should not be written in the table header of a
root constructor. Instead, the header is left blank (the constructor begins
directly with the colon), and the SLEIGH parser uses this blank identifier to
help distinguish assembly mnemonics from operands (see <a class="xref" href="sleigh_constructors.html#sleigh_mnemonic" title="7.3.1.&nbsp;Mnemonic">Section&nbsp;7.3.1, “Mnemonic”</a>).
</p>
<p>
If ‘^’ is used as the first (non-whitespace) character in the
display section of a base constructor, it prevents the first
identifier in the display from being treated as the mnemonic, as
described in <a class="xref" href="sleigh_constructors.html#sleigh_mnemonic" title="7.3.1.&nbsp;Mnemonic">Section&nbsp;7.3.1, “Mnemonic”</a>. This allows
specification of less common situations, where the first part of the
mnemonic, rather than perhaps a later part, needs to be treated as
an operand. An initial ‘^’ character can also facilitate certain
recursive constructions.
</p>
</div>
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="sleigh_bit_pattern"></a>7.4.&nbsp;The Bit Pattern Section</h3></div></div></div>
<p>
Syntactically, this section comes between the
keyword <span class="bold"><strong>is</strong></span> and the delimiter for the
following section, either ‘{’ or ‘[’. The <span class="emphasis"><em>bit pattern
section</em></span> describes a
constructor’s <span class="emphasis"><em>pattern</em></span>, the subset of possible
instruction encodings that the designer wants
to <span class="emphasis"><em>match</em></span> the constructor being defined.
The patterns required for processor specifications can almost always
be described as a mask and value pair. Given a specific instruction
encoding, we can decide whether the encoding matches our pattern by looking
at just the bits specified by the <span class="emphasis"><em>mask</em></span> and seeing
whether they match a specific <span class="emphasis"><em>value</em></span>. The fields, as
defined in <a class="xref" href="sleigh_tokens.html#sleigh_defining_tokens" title="6.1.&nbsp;Defining Tokens and Fields">Section&nbsp;6.1, “Defining Tokens and Fields”</a>, typically give us
our masks. So to construct a pattern, we can simply require that a
field take on a specific value, which is a constraint of the form “field = constant”.
</p>
<p>
It is not necessary for a global symbol that is needed by a
constructor to appear in the display section of the definition. If
the global identifier is used in the pattern section as it would be
for a normal operand definition, but the identifier is not used in the
display section, then the constructor defines an <span class="emphasis"><em>invisible
operand</em></span>. Such an operand behaves and is parsed exactly like
any other operand, but there is absolutely no visible indication of the
operand in the final display of the assembly instruction. The one
common type of instruction that uses this is the relative branch (see
<a class="xref" href="sleigh_constructors.html#sleigh_relative_branches" title="7.5.1.&nbsp;Relative Branches">Section&nbsp;7.5.1, “Relative Branches”</a>), but it is otherwise needed
only in more esoteric instructions. It is useful in situations where
you need to break up the parsing of an instruction along lines that
do not quite match the assembly syntax.
</p>
<p>
A constraint does not have to be of the form “field = constant”,
although this is almost always what is needed. In certain situations,
it may be more convenient to use a different kind of
constraint. Special care should be taken when designing these
constraints because they can substantially deviate from the mask/value
model used to implement most constraints. These more general
constraints are implemented by splitting each one into smaller states,
each of which can be modeled as a mask/value pair. This is all done
automatically, and the designer may inadvertently create huge numbers
of parsing states for a single constraint.
</p>
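<p>
The mask/value model above can be illustrated with a short Python sketch. It is a model only, not part of SLEIGH or Ghidra: the names <span class="emphasis"><em>Field</em></span>, <span class="emphasis"><em>equals</em></span>, <span class="emphasis"><em>matches</em></span>, and <span class="emphasis"><em>split_constraint</em></span> are hypothetical, and the field positions are invented. The last helper splits a general constraint in the most naive way, one mask/value pair per permitted field value, to show how quickly the number of states can grow.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """An instruction field spanning bits lsb..msb (inclusive)."""
    name: str
    lsb: int
    msb: int

    @property
    def width(self) -> int:
        return self.msb - self.lsb + 1

    @property
    def mask(self) -> int:
        return ((1 << self.width) - 1) << self.lsb


def equals(field: Field, value: int) -> tuple:
    """The constraint 'field = value' as a single (mask, value) pair."""
    return field.mask, (value << field.lsb) & field.mask


def matches(pattern: tuple, encoding: int) -> bool:
    """An encoding matches if its masked bits equal the pattern's value."""
    mask, value = pattern
    return (encoding & mask) == value


def split_constraint(field: Field, predicate) -> list:
    """Naively turn a general constraint on one field (e.g. 'field < 3')
    into one mask/value pair per field value that satisfies it."""
    return [equals(field, v) for v in range(1 << field.width) if predicate(v)]


# Hypothetical fields: a 6-bit opcode in bits 10..15 and a 4-bit mode in bits 6..9.
op = Field("op", 10, 15)
mode = Field("mode", 6, 9)
pat = equals(op, 0x15)
print(hex(pat[0]), hex(pat[1]))                        # 0xfc00 0x5400
print(matches(pat, 0x57FF))                            # True
print(len(split_constraint(mode, lambda v: v != 5)))   # 15
```

<p>
The naive split is a worst case; it only shows how a single inequality can expand into many mask/value states.
</p>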
<p>
A constraint can actually be built out of arbitrary
expressions. These <span class="emphasis"><em>pattern expressions</em></span> are more
commonly used in disassembly actions and are defined in
<a class="xref" href="sleigh_constructors.html#sleigh_general_actions" title="7.5.2.&nbsp;General Actions and Pattern Expressions">Section&nbsp;7.5.2, “General Actions and Pattern Expressions”</a>, but they can also be used in
constraints. So in general, a constraint is any equation where the
left-hand side is a single family symbol, the right-hand side is an
arbitrary pattern expression, and the constraint operator is one of
‘=’, ‘!=’, ‘&lt;’, ‘&lt;=’, ‘&gt;’, or ‘&gt;=’.
</p>
<p>
For the sake of these expressions, integers are considered signed
values of arbitrary precision. Expressions can also make use of
parentheses. A family symbol can be used in an expression only if it
can be resolved to a particular specific symbol. This generally means
that a global family symbol, such as a field, must be attached to a
local identifier before it can be used.
</p>
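<p>
Because integers in pattern expressions are signed and of arbitrary precision, the sign of a field value matters and intermediate results never wrap. Python integers behave the same way, so the sketch below can mimic this directly. The 8-bit displacement field and the target formula <span class="emphasis"><em>inst_start + 2 + disp * 2</em></span> are hypothetical, chosen only to show a negative displacement producing a target below the instruction.
</p>

```python
def sext(value: int, bits: int) -> int:
    """Read the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(inst_start: int, disp: int) -> int:
    """Evaluate the hypothetical expression inst_start + 2 + disp * 2,
    where disp is an 8-bit signed field. Python ints are signed and
    unbounded, so nothing wraps or overflows during evaluation."""
    return inst_start + 2 + sext(disp, 8) * 2


print(sext(0xFE, 8))                       # -2
print(hex(branch_target(0x1000, 0xFE)))    # 0xffe
print(hex(branch_target(0x1000, 0x7F)))    # 0x1100
```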
<p>
The left-hand side of an assignment statement can be a context
variable (see <a class="xref" href="sleigh_tokens.html#sleigh_context_variables" title="6.4.&nbsp;Context Variables">Section&nbsp;6.4, “Context Variables”</a>). An
assignment to such a variable changes the context in which the current
instruction is being disassembled and can potentially have a drastic
effect on how the rest of the instruction is disassembled. An
assignment of this form is considered local to the instruction, and the
variable is reset to its original value before parsing other
instructions. The disassembly action may also contain one or
more <span class="bold"><strong>globalset</strong></span> directives, which
make changes to context variables more permanent. This
directive is distinct from the operators in a pattern expression and
must be invoked as a separate statement. See
<a class="xref" href="sleigh_context.html" title="8.&nbsp;Using Context">Section&nbsp;8, “Using Context”</a>, for a discussion of how to
effectively use context variables and
<a class="xref" href="sleigh_context.html#sleigh_global_change" title="8.3.&nbsp;Global Context Change">Section&nbsp;8.3, “Global Context Change”</a>, for details of
the <span class="bold"><strong>globalset</strong></span> directive.
</p>
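<p>
The difference between a local context assignment and a <span class="bold"><strong>globalset</strong></span> can be sketched as a toy Python model. This is a simplification under stated assumptions, not Ghidra's API: the class <span class="emphasis"><em>ContextModel</em></span> and its action tuples are hypothetical, and the real directive also takes an address at which the new value takes effect, which this model ignores by simply carrying the value into every later instruction.
</p>

```python
class ContextModel:
    """Toy model of context variables (hypothetical, not Ghidra's API).

    Ordinary assignments affect only the instruction being disassembled;
    a 'globalset' step copies the variable's current value into the
    persistent context seen by later instructions. The real directive's
    address argument is ignored here."""

    def __init__(self, **initial: int) -> None:
        self.persistent = dict(initial)

    def disassemble(self, actions: list) -> dict:
        local = dict(self.persistent)   # each instruction starts from persistent values
        for action in actions:
            if action[0] == "set":       # like [ mode = 1; ]
                _, name, value = action
                local[name] = value
            elif action[0] == "globalset":
                _, name = action
                self.persistent[name] = local[name]
        return local                     # the context this instruction saw at the end


ctx = ContextModel(mode=0)
print(ctx.disassemble([("set", "mode", 1)])["mode"])   # 1 (local to this instruction)
print(ctx.disassemble([])["mode"])                     # 0 (reset for the next one)
ctx.disassemble([("set", "mode", 2), ("globalset", "mode")])
print(ctx.disassemble([])["mode"])                     # 2 (made persistent)
```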
<p>
Note that there are two syntax forms for the logical operators in a
pattern expression. When an expression is used as part of a
constraint, the “$and” and “$or” forms of the operators must be used
in order to distinguish the bitwise operators from the special pattern
combining operators, ‘&amp;’ and ‘|’ (as described in
<a class="xref" href="sleigh_constructors.html#sleigh_ampandor" title="7.4.2.&nbsp;The '&amp;' and '|' Operators">Section&nbsp;7.4.2, “The '&amp;' and '|' Operators”</a>). However, inside the square brackets
of the disassembly action section, ‘&amp;’ and ‘|’ are interpreted as
the usual logical operators.
</p>
</div>
</div>
<div class="sect2">
<div class="titlepage"><div><div><h3 class="title">
<a name="sleigh_with_block"></a>7.6.&nbsp;The With Block</h3></div></div></div>
<p>
To avoid tedious repetition and to ease the maintenance of specifications
that already have many constructors and tables, SLEIGH provides the <span class="emphasis"><em>with
block</em></span>. It is a syntactic construct that allows a
designer to apply a table header, bit pattern constraints, and/or disassembly
actions to a group of constructors. The block starts at the
<span class="bold"><strong>with</strong></span> directive and ends with a closing brace.
</p>
<div class="informalexample"><pre class="programlisting">
with op1 : mode=1 {
  :reg is reg &amp; ind=0 [ mode=1; ] { <span class="weak">...</span> }
  :[reg] is reg &amp; ind=1 { <span class="weak">...</span> }
}
</pre></div>
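<p>
One way to picture a with block is as a flattening step: every constructor inside it receives the block's table, the block's constraints joined in front with ‘&amp;’, and the block's actions placed before its own. The Python sketch below models that under the rules stated in the following paragraph; <span class="emphasis"><em>Constructor</em></span>, <span class="emphasis"><em>apply_with</em></span>, <span class="emphasis"><em>render</em></span>, and <span class="emphasis"><em>final_value</em></span> are hypothetical helpers, and the with-block action <span class="emphasis"><em>mode=2</em></span> in the second call is invented to show the override.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class Constructor:
    display: str
    constraints: list
    actions: list = field(default_factory=list)
    table: str = ""


def apply_with(table: str, constraints: list, actions: list, body: list) -> list:
    """Flatten a with block: set the table, prepend the block's constraints
    (joined with '&') and run the block's actions before the constructor's."""
    return [Constructor(c.display, constraints + c.constraints,
                        actions + c.actions, table) for c in body]


def render(c: Constructor) -> str:
    text = f"{c.table}: {c.display} is " + " & ".join(c.constraints)
    if c.actions:
        text += " [ " + " ".join(a + ";" for a in c.actions) + " ]"
    return text


def final_value(actions: list, name: str):
    """Run simple 'var=value' actions in order; the last assignment wins."""
    value = None
    for a in actions:
        var, val = (s.strip() for s in a.split("="))
        if var == name:
            value = int(val)
    return value


body = [Constructor("reg", ["reg", "ind=0"], ["mode=1"]),
        Constructor("[reg]", ["reg", "ind=1"])]
for c in apply_with("op1", ["mode=1"], [], body):
    print(render(c))
# op1: reg is mode=1 & reg & ind=0 [ mode=1; ]
# op1: [reg] is mode=1 & reg & ind=1

flat = apply_with("op1", ["mode=1"], ["mode=2"], body)   # hypothetical block action
print(final_value(flat[0].actions, "mode"), final_value(flat[1].actions, "mode"))  # 1 2
```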
<p>
In the example, both constructors are added to the table identified by
<span class="emphasis"><em>op1</em></span>. Both require the context field
<span class="emphasis"><em>mode</em></span> to be equal to 1. The listed constraints take the
form described in <a class="xref" href="sleigh_constructors.html#sleigh_bit_pattern" title="7.4.&nbsp;The Bit Pattern Section">Section&nbsp;7.4, “The Bit Pattern Section”</a>, and they are joined to
those given in the constructor statement as if prepended using ‘&amp;’. Similarly,
the actions take the form described in <a class="xref" href="sleigh_constructors.html#sleigh_disassembly_actions" title="7.5.&nbsp;Disassembly Actions Section">Section&nbsp;7.5, “Disassembly Actions Section”</a>
and are prepended to the actions given in the constructor statement. Prepending
the actions allows a constructor’s own actions to override those in the with block:
both assignments technically occur, but only the later one has a noticeable effect.
</p>
<p>
Expressions are built out of symbols and the binary and unary
operators listed in <a class="xref" href="sleigh_ref.html#syntaxref.htmltable" title="Table&nbsp;5.&nbsp;Semantic Expression Operators and Syntax">Table&nbsp;5, “Semantic Expression Operators and Syntax”</a> in the
Appendix. All expressions evaluate to an integer, floating point, or
boolean value, depending on the final operation of the expression. The
value is then used depending on the kind of statement. Most of the
operators require that their input and output varnodes all be the same
size (see <a class="xref" href="sleigh_constructors.html#sleigh_varnode_sizes" title="7.7.3.&nbsp;Varnode Sizes">Section&nbsp;7.7.3, “Varnode Sizes”</a>). The operators all
have a precedence, which is used by the SLEIGH compiler to determine
the ordering of the final p-code operations. Parentheses can be used
to override this precedence.
</p>
<p>
The dereference operator, which generates <span class="emphasis"><em>LOAD</em></span>
operations (and <span class="emphasis"><em>STORE</em></span> operations), has slightly
unfamiliar syntax. The ‘*’ operator, as is usual in many programming
languages, indicates that the affected variable is a pointer and that
the expression is <span class="emphasis"><em>dereferencing</em></span> the data being
pointed to. Unlike most languages, in SLEIGH, it is not immediately
clear what address space the variable is pointing into because there
may be multiple address spaces defined. In the absence of any other
information, SLEIGH assumes that the variable points into
the <span class="emphasis"><em>default</em></span> space, that is, the address space whose
definition carries
the <span class="bold"><strong>default</strong></span> attribute. If that is not
the space desired, the default can be overridden by putting the
identifier for the space in square brackets immediately after the ‘*’.
</p>
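<p>
The two overrides of the ‘*’ operator, an address space in square brackets and a byte count after a colon, can be pictured with a toy memory model. This Python sketch is an assumption-laden illustration, not how the SLEIGH compiler works: the class <span class="emphasis"><em>Memory</em></span>, the space names <span class="emphasis"><em>ram</em></span> and <span class="emphasis"><em>io</em></span>, little-endian byte order, and reading unwritten bytes as zero are all choices made for the example. A load from the default space corresponds to <span class="emphasis"><em>*:4 ptr</em></span>, and a load with an explicit space to <span class="emphasis"><em>*[io]:1 ptr</em></span>.
</p>

```python
from typing import Optional


class Memory:
    """Toy model of several byte-addressed spaces (hypothetical names)."""

    def __init__(self, default: str, spaces: list, byteorder: str = "little") -> None:
        self.default = default
        self.spaces = {name: {} for name in spaces}
        self.byteorder = byteorder

    def store(self, offset: int, data: bytes, space: Optional[str] = None) -> None:
        mem = self.spaces[space or self.default]
        for i, b in enumerate(data):
            mem[offset + i] = b

    def load(self, offset: int, size: int, space: Optional[str] = None) -> int:
        """Model of '*[space]:size ptr': use the named space, or the default
        space when none is given, and read `size` bytes at the pointer."""
        mem = self.spaces[space or self.default]
        data = bytes(mem.get(offset + i, 0) for i in range(size))
        return int.from_bytes(data, self.byteorder)


m = Memory(default="ram", spaces=["ram", "io"])
m.store(0x100, bytes([0x78, 0x56, 0x34, 0x12]))   # goes to the default space
m.store(0x100, bytes([0xAA]), space="io")
print(hex(m.load(0x100, 4)))               # like *:4 ptr     -> 0x12345678
print(hex(m.load(0x100, 1, space="io")))   # like *[io]:1 ptr -> 0xaa
```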
<p>
It is also frequently not clear what the size of the dereferenced data
is because the pointer variable is typeless. The SLEIGH compiler can
frequently deduce what the size must be by looking at the operation in
the context of the entire statement (see
<a class="xref" href="sleigh_constructors.html#sleigh_varnode_sizes" title="7.7.3.&nbsp;Varnode Sizes">Section&nbsp;7.7.3, “Varnode Sizes”</a>). But in some situations, this
may not be possible, so there is a way to specify the size
explicitly. The operator can be followed by a colon ‘:’ and an integer
indicating the number of bytes being dereferenced. This can be used
with or without the address space override. We give an example of each.
</p>
<p>
The bit range operator can also appear on the left-hand side of an assignment,
with similar behavior and caveats (see <a class="xref" href="sleigh_constructors.html#sleigh_bitrange_assign" title="7.7.2.8.&nbsp;Bit Range Assignments">Section&nbsp;7.7.2.8, “Bit Range Assignments”</a>).
</p>
<p>
We describe the types of semantic statements that are allowed in SLEIGH.
</p>
<div class="sect4">
<div class="titlepage"><div><div><h5 class="title">
<a name="sleigh_assign_statements"></a>7.7.2.1.&nbsp;Assignment Statements and Temporary Variables</h5></div></div></div>
<p>
Of course SLEIGH allows assignment statements with the ‘=’ operator,
where the right-hand side is an arbitrary expression and the left-hand
side is the varnode being assigned. The assigned varnode can be any
specific symbol in the scope of the constructor, either a global
symbol or a local operand.
</p>
<p>
In SLEIGH, the keyword <span class="bold"><strong>local</strong></span>
is used to allocate temporary variables. If an assignment
statement is prepended with <span class="bold"><strong>local</strong></span>,
and the identifier on the left-hand side of the assignment does not match
any symbol in the scope of the constructor, a named temporary varnode is
created in the <span class="emphasis"><em>unique</em></span> address space to hold the
result of the expression. The new symbol becomes part of the local
scope of the constructor and can be referred to in the following
semantic statements. The size of the new varnode is calculated by
examining the statement in context (see
<a class="xref" href="sleigh_constructors.html#sleigh_varnode_sizes" title="7.7.3.&nbsp;Varnode Sizes">Section&nbsp;7.7.3, “Varnode Sizes”</a>). It is also possible to
explicitly indicate the size by using the colon ‘:’ operator followed
by an integer size in bytes. The following examples demonstrate the
temporary variable <span class="emphasis"><em>tmp</em></span> being defined using both
forms.
</p>
<p>
The table symbol associated with the constructor becomes
a <span class="emphasis"><em>reference</em></span> to the varnode being exported, not a
copy of the value. If the table symbol is written to, as the left-hand
side of an assignment statement, in some other constructor, the
exported varnode is affected. A constant can be exported if its size
as a varnode is given explicitly with the ‘:’ operator.
</p>
<p>
It is not legal to put a full expression in
an <span class="bold"><strong>export</strong></span> statement; any expression
must appear in an earlier statement. However, a single ‘&amp;’
operator is allowed as part of the statement, and it behaves as it
would in a normal expression (see
<a class="xref" href="sleigh_constructors.html#sleigh_addressof" title="7.7.1.6.&nbsp;Address-of Operator">Section&nbsp;7.7.1.6, “Address-of Operator”</a>). It causes the address of the
varnode being modified to be exported as an integer constant.
</p>
<p>
The only other operator allowed as part of
an <span class="bold"><strong>export</strong></span> statement is the ‘*’
operator. The semantic meaning of this operator is the same as if it
were used in an expression (see
<a class="xref" href="sleigh_constructors.html#sleigh_star_operator" title="7.7.1.2.&nbsp;The '*' Operator">Section&nbsp;7.7.1.2, “The '*' Operator”</a>), but it is worth examining the
effects of this form of export in detail. Bearing in mind that
an <span class="bold"><strong>export</strong></span> statement exports
a <span class="emphasis"><em>reference</em></span>, using the ‘*’ operator in the
statement exports a <span class="emphasis"><em>dynamic reference</em></span>. The
varnode being modified by the ‘*’ is interpreted as a pointer to
another varnode. It is this varnode being pointed to which is
exported, even though its address may be dynamic and cannot be
determined at disassembly time. This is not the same as dereferencing
the pointer into a temporary variable that is then exported. The
dynamic reference can be both read
and <span class="emphasis"><em>written</em></span>. Internally, the SLEIGH compiler
keeps track of the pointer and inserts a <span class="emphasis"><em>LOAD</em></span>
or <span class="emphasis"><em>STORE</em></span> operation when the symbol associated
with the dynamic reference is referred to in other constructors.
</p>
<p>
In the first example, the effective address of an operand is
calculated from a register <span class="emphasis"><em>reg</em></span> and a field of the
instruction <span class="emphasis"><em>off</em></span>. The constructor does not export
the resulting pointer <span class="emphasis"><em>ea</em></span>; it exports the location
being pointed to by <span class="emphasis"><em>ea</em></span>. Notice the size of this
location (4) is given explicitly with the ‘:’ modifier. The ‘*’
operator can also be used on constant pointers. In the second example,
the constant operand <span class="emphasis"><em>reloc</em></span> is used as the offset
portion of an address into the <span class="emphasis"><em>ram</em></span> address
space. The constant <span class="emphasis"><em>reloc</em></span> is calculated at
disassembly time from the instruction
field <span class="emphasis"><em>abs</em></span>. This is a very common construction for
jump destinations (see <a class="xref" href="sleigh_constructors.html#sleigh_relative_branches" title="7.5.1.&nbsp;Relative Branches">Section&nbsp;7.5.1, “Relative Branches”</a>), but it
can be used in general. This particular combination of a disassembly
time action and a dynamic export is a very general way to construct a
family of varnodes.
</p>
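<p>
The difference between exporting a dynamic reference and exporting a copied value can be sketched as follows. This Python model is hypothetical: <span class="emphasis"><em>DynamicRef</em></span> and the two expansion helpers are invented, and the p-code lines are simplified text rather than real p-code syntax. It shows a parent statement that increments the exported operand: through a dynamic reference the read becomes a <span class="emphasis"><em>LOAD</em></span> and the write a <span class="emphasis"><em>STORE</em></span> through the same pointer, whereas with an exported temporary copy the write never reaches memory.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicRef:
    """Sketch of what 'export *[space]:size ptr;' hands to a parent
    constructor: not a value, but the location ptr points to."""
    space: str
    pointer: str
    size: int

    def read(self, dest: str) -> str:
        return f"{dest} = LOAD {self.space}, {self.pointer} ({self.size} bytes)"

    def write(self, src: str) -> str:
        return f"STORE {self.space}, {self.pointer}, {src} ({self.size} bytes)"


def expand_increment(op: DynamicRef) -> list:
    """Expand a parent's 'op = op + 1;' when op is a dynamic reference:
    the read becomes a LOAD and the write a STORE through the same pointer."""
    return [op.read("t"), "t = INT_ADD t, 1", op.write("t")]


def expand_increment_copy(space: str, pointer: str) -> list:
    """Contrast: if the pointed-to value were copied into a temporary and
    the temporary exported, the write would update only the temporary."""
    return [f"t = LOAD {space}, {pointer}", "t = INT_ADD t, 1"]


for line in expand_increment(DynamicRef("ram", "ea", 4)):
    print(line)
# t = LOAD ram, ea (4 bytes)
# t = INT_ADD t, 1
# STORE ram, ea, t (4 bytes)
```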
<p>
Dynamic references are a key construction for effectively separating
addressing mode implementations from instruction semantics at higher
levels.
</p>
<p>
This section discusses statements that generate p-code branching
operations. These are listed in <a class="xref" href="sleigh_ref.html#branchref.htmltable" title="Table&nbsp;7.&nbsp;Branching Statements">Table&nbsp;7, “Branching Statements”</a>, in the Appendix.
</p>
<p>
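There are six forms covering the gamut of typical assembly language
branches, but in terms of actual semantics there are really only
three: <span class="emphasis"><em>BRANCH</em></span>, <span class="emphasis"><em>CBRANCH</em></span>,
and <span class="emphasis"><em>BRANCHIND</em></span>. The other three reduce to these:
</p>
<div class="informalexample"><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem" style="list-style-type: disc">
<span class="emphasis"><em>CALL</em></span> is semantically equivalent to <span class="emphasis"><em>BRANCH</em></span>,
</li>
<li class="listitem" style="list-style-type: disc">
<span class="emphasis"><em>CALLIND</em></span> is semantically equivalent to <span class="emphasis"><em>BRANCHIND</em></span>, and
</li>
<li class="listitem" style="list-style-type: disc">
<span class="emphasis"><em>RETURN</em></span> is semantically equivalent to <span class="emphasis"><em>BRANCHIND</em></span>.
</li>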
</ul></div></div>
<p>
The reason for this is that calls and returns imply the presence of
some sort of a stack. Typically an assembly language call instruction
does several separate actions, manipulating a stack pointer, storing a
return address, and so on. When translating the call instruction into
p-code, these actions must be implemented with explicit
operations. The final step of the instruction, the actual jump to the
destination of the call, is now just a branch, stripped of its implied
meaning. The <span class="emphasis"><em>CALL</em></span>, <span class="emphasis"><em>CALLIND</em></span>,
and <span class="emphasis"><em>RETURN</em></span> operations are kept distinct from
their <span class="emphasis"><em>BRANCH</em></span> counterparts in order to provide
analysis software a hint as to the higher level meaning of the branch.
</p>
<p>
There are actually two fundamentally different ways of indicating a
destination for these branch operations. By far the most common way to
specify a destination is to give the <span class="emphasis"><em>address</em></span> of a
machine instruction. It bears repeating here that there is typically
more than one p-code operation per machine instruction. So specifying
a <span class="emphasis"><em>destination address</em></span> really means that the
destination is the first p-code operation for the (translated) machine
instruction at that address. For most cases, this is the only kind of
branching needed. The rarer case of <span class="emphasis"><em>p-code
relative</em></span> branching is discussed in the following section
(<a class="xref" href="sleigh_constructors.html#sleigh_pcode_relative" title="7.7.2.6.&nbsp;P-code Relative Branching">Section&nbsp;7.7.2.6, “P-code Relative Branching”</a>), but for the remainder of
this section, we assume the destination is ultimately given as an
address.
</p>
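<p>
The rule that a destination address means “the first p-code operation of the instruction at that address” can be made concrete with a small Python sketch. The helpers <span class="emphasis"><em>layout</em></span> and <span class="emphasis"><em>resolve_branch</em></span>, the addresses, and the placeholder operation names are all hypothetical; the point is only the mapping from machine addresses to p-code indices.
</p>

```python
def layout(translations: dict) -> tuple:
    """Concatenate per-instruction p-code in address order and record the
    index of each instruction's first p-code operation."""
    flat, first = [], {}
    for addr in sorted(translations):
        first[addr] = len(flat)
        flat.extend(translations[addr])
    return flat, first


def resolve_branch(first: dict, dest_addr: int) -> int:
    """A destination address names the first p-code op of that instruction."""
    return first[dest_addr]


# Hypothetical translations: three machine instructions, several ops each.
program = {
    0x100: ["op_a", "op_b", "op_c"],
    0x104: ["op_d", "op_e"],
    0x106: ["BRANCH 0x104"],
}
flat, first = layout(program)
print(resolve_branch(first, 0x104), flat[resolve_branch(first, 0x104)])  # 3 op_d
```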
<p>
There are two ways to specify a branching operation’s destination
address: directly and indirectly. Where a direct address is needed, as
for the <span class="emphasis"><em>BRANCH</em></span>, <span class="emphasis"><em>CBRANCH</em></span>,
and <span class="emphasis"><em>CALL</em></span> instructions, the specification can give
the integer offset of the jump destination within the address space of
the current instruction. Optionally, the offset can be followed by the
name of another address space in square brackets, if the destination
is in a different address space.
</p>
<p>
The bit range operator can appear on the left-hand side of an
assignment. But as with the ‘*’ operator, its meaning is slightly
different when used on this side. The bit range is specified in square
brackets, as before, by giving the integer specifying the least
significant bit of the range, followed by the number of bits in the
range. In contrast with its use on the right, however (see
<a class="xref" href="sleigh_constructors.html#sleigh_bitrange_operator" title="7.7.1.5.&nbsp;Bit Range Operator">Section&nbsp;7.7.1.5, “Bit Range Operator”</a>), the indicated bit range
is filled rather than extracted. Bits obtained from evaluating the
expression on the right are extracted and spliced into the result at
the position given by the bit range.
</p>
<p>
In terms of the rest of the assignment expression, the bit range
operator is again assumed to have a size equal to the minimum number
of bytes needed to hold the bit range. In particular, in order to
satisfy size restrictions (see
<a class="xref" href="sleigh_constructors.html#sleigh_varnode_sizes" title="7.7.3.&nbsp;Varnode Sizes">Section&nbsp;7.7.3, “Varnode Sizes”</a>), the right-hand side must
match this size. Furthermore, it is assumed that any extra bits in the
right-hand side expression are already set to zero.
</p>
<p>
The SLEIGH compiler does not make assumptions about the size of a
constant variable based on the constant value itself. This is true of
values occurring explicitly in the specification and of values that
are calculated dynamically in a disassembly action. As described in
<a class="xref" href="sleigh_constructors.html#sleigh_assign_statements" title="7.7.2.1.&nbsp;Assignment Statements and Temporary Variables">Section&nbsp;7.7.2.1, “Assignment Statements and Temporary Variables”</a>, temporary variables do not
need to have their size given explicitly.
</p>
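<p>
The bit range assignment described earlier in this section, which fills bits rather than extracting them, can be modeled with ordinary integer masking. In this hypothetical Python sketch, <span class="emphasis"><em>bitrange_assign(dest, lsb, count, value)</em></span> stands for <span class="emphasis"><em>dest[lsb,count] = value;</em></span>. Mirroring the stated assumption, the value is not masked, so any stray high bits would spill into neighboring bits of the destination.
</p>

```python
def bitrange_assign(dest: int, lsb: int, count: int, value: int) -> int:
    """Model of 'dest[lsb,count] = value;': the indicated bits of dest are
    filled from value and all other bits of dest are left unchanged.
    Like SLEIGH, this assumes value has no bits set above `count`; they are
    not masked off here, so a dirty value would corrupt neighboring bits."""
    field_mask = ((1 << count) - 1) << lsb
    return (dest & ~field_mask) | (value << lsb)


def bitrange_size(count: int) -> int:
    """Size, in bytes, assumed for the bit range: the minimum that holds it."""
    return (count + 7) // 8


print(hex(bitrange_assign(0x1234, 8, 4, 0xA)))   # 0x1a34
print(bitrange_size(4), bitrange_size(9))        # 1 2
```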
<p>
The SLEIGH compiler can usually fill in the required size by examining
these situations in the context of the entire semantic section. Most
p-code operations have size restrictions on their inputs and outputs,
which when put together can uniquely determine the unspecified
sizes. Referring to <a class="xref" href="sleigh_ref.html#syntaxref.htmltable" title="Table&nbsp;5.&nbsp;Semantic Expression Operators and Syntax">Table&nbsp;5, “Semantic Expression Operators and Syntax”</a> in the
Appendix, all arithmetic and logical operations, both integer and
floating point, must have inputs and outputs all of the same size. The
only exception is that the shift operators, <span class="emphasis"><em>INT_LEFT</em></span>, <span class="emphasis"><em>INT_RIGHT</em></span>,
and <span class="emphasis"><em>INT_SRIGHT</em></span>, currently place no restrictions
on the <span class="emphasis"><em>shift amount</em></span> operand. All the comparison
operators, both integer and floating point, insist that their inputs
are all the same size, and the output must be a boolean variable, with
a size of 1 byte.
</p>
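<p>
The size rules above can be applied mechanically. The Python sketch below resolves unknown sizes for a single operation only; the actual compiler combines such constraints across the whole semantic section, which this model does not attempt. The opcode names are real p-code operations, but the function <span class="emphasis"><em>resolve</em></span>, its grouping of opcodes, and its error messages are hypothetical.
</p>

```python
from typing import Optional

# A few real p-code opcodes, grouped by the size rules described above.
SAME_SIZE = {"INT_ADD", "INT_SUB", "INT_AND", "INT_OR", "INT_XOR", "FLOAT_ADD"}
COMPARE = {"INT_EQUAL", "INT_LESS", "FLOAT_LESS"}


def resolve(op: str, out: Optional[int], ins: list) -> tuple:
    """Fill in unknown (None) varnode sizes for a single operation.
    Returns (output_size, input_sizes) or raises ValueError."""
    known = {s for s in ins if s is not None}
    if op in SAME_SIZE:
        if out is not None:
            known.add(out)
        if len(known) != 1:
            raise ValueError(f"could not resolve sizes for {op}")
        size = known.pop()
        return size, [size] * len(ins)
    if op in COMPARE:
        if out not in (None, 1):
            raise ValueError(f"{op} must produce a 1-byte boolean")
        if len(known) != 1:
            raise ValueError(f"could not resolve input sizes for {op}")
        size = known.pop()
        return 1, [size] * len(ins)
    raise ValueError(f"no size rule modeled for {op}")


print(resolve("INT_ADD", None, [4, None]))    # (4, [4, 4])
print(resolve("INT_LESS", None, [None, 2]))   # (1, [2, 2])
```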
<p>
The operators without a size constraint are the load and store
operators, the extension and truncation operators, and the conversion
operators. As discussed in <a class="xref" href="sleigh_constructors.html#sleigh_star_operator" title="7.7.1.2.&nbsp;The '*' Operator">Section&nbsp;7.7.1.2, “The '*' Operator”</a>, the
‘*’ operator cannot get size information for the dynamic (pointed-to)
object from the pointer itself. The other operators by definition
involve a change of size from input to output.
</p>
<p>
If the SLEIGH compiler cannot discover the sizes of constants and
temporaries, it will report an error stating that it could not resolve
variable sizes for that constructor. This can usually be fixed quickly
by appending the size ‘:’ modifier to the ‘*’ operator, to the
temporary variable definition, or to an explicit integer. Here are
three examples of statements that generate a size resolution error,
each followed by a variation which corrects the error.
</p>
<p>
When the SLEIGH parser analyzes an instruction, it starts with the
root symbol <span class="emphasis"><em>instruction</em></span> and decides which of the
constructors defined under it matches. The matching constructor is
likely to be defined in terms of one or more other family symbols. The
parsing process recurses at this point. Each of the unresolved family
symbols is analyzed in the same way to find the matching specific
symbol. The matching is accomplished either with a table lookup, as
with a field with attached registers, or with the matching algorithm
described in <a class="xref" href="sleigh_constructors.html#sleigh_matching" title="7.8.1.&nbsp;Matching">Section&nbsp;7.8.1, “Matching”</a>. By the end of the
parsing process, we have a tree of specific symbols representing the
parsed instruction. We present a small but complete SLEIGH
and <span class="emphasis"><em>xor</em></span>. The logical operations each take two
operands, <span class="emphasis"><em>reg1</em></span> and <span class="emphasis"><em>op2</em></span>. The
operand <span class="emphasis"><em>reg1</em></span> selects among the eight registers of
the processor, <span class="emphasis"><em>r0</em></span>
through <span class="emphasis"><em>r7</em></span>. The operand <span class="emphasis"><em>op2</em></span>
is a table built out of more complicated addressing modes, determined
by the field <span class="emphasis"><em>mode</em></span>. The addressing mode can
be direct, in which case <span class="emphasis"><em>op2</em></span> is really just the
register selected by <span class="emphasis"><em>reg2</em></span>; immediate,
in which case the same bits are interpreted as a constant
value <span class="emphasis"><em>imm</em></span>; or indirect, in which case
the register <span class="emphasis"><em>reg2</em></span> is interpreted as a pointer to
the actual operand. In every case, the two operands are combined by the
logical operation and the result is stored back
in <span class="emphasis"><em>reg1</em></span>.
</p>
<p>
The parsing proceeds from the root symbol down. Once a particular
matching constructor is found, any disassembly action associated with
that constructor is executed. After that, each operand of the
constructor is resolved in turn.
</p>
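<p>
The top-down parse just described can be sketched in Python for an instruction set shaped like the example: a root table choosing among logical operations, a register operand <span class="emphasis"><em>reg1</em></span>, and a sub-table <span class="emphasis"><em>op2</em></span> whose constructor is selected by <span class="emphasis"><em>mode</em></span>. The 16-bit field positions, opcode values, and mode numbering are invented for this sketch, so it illustrates the recursion rather than decoding any real encoding. Note how the <span class="emphasis"><em>op2</em></span> constructor decides whether the shared low bits are read as <span class="emphasis"><em>reg2</em></span> or <span class="emphasis"><em>imm</em></span>.
</p>

```python
REGS = [f"r{i}" for i in range(8)]
OPCODES = {0x10: "and", 0x11: "or", 0x12: "xor"}   # hypothetical opcode values


def bits(enc: int, lsb: int, msb: int) -> int:
    """Extract the field occupying bits lsb..msb (inclusive)."""
    return (enc >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def parse_op2(enc: int) -> tuple:
    """Table op2: the mode field selects the constructor, and the constructor
    decides whether the shared low bits are read as reg2 or as imm."""
    mode, low = bits(enc, 6, 9), bits(enc, 0, 2)
    if mode == 0:
        return ("op2", "direct", ("reg2", REGS[low]))
    if mode == 1:
        return ("op2", "immediate", ("imm", low))
    if mode == 2:
        return ("op2", "indirect", ("reg2", REGS[low]))
    raise ValueError("no op2 constructor matches")


def parse_instruction(enc: int) -> tuple:
    """Root table: pick the constructor from op, then resolve each operand."""
    op = bits(enc, 10, 15)
    if op not in OPCODES:
        raise ValueError("no instruction constructor matches")
    return ("instruction", OPCODES[op], ("reg1", REGS[bits(enc, 3, 5)]), parse_op2(enc))


print(parse_instruction(0x400A))
# ('instruction', 'and', ('reg1', 'r1'), ('op2', 'direct', ('reg2', 'r2')))
```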
<div class="figure">
<a name="sleigh_encoding_image"></a><div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center"><img src="Diagram1.png" align="middle" width="540" height="225" alt="Two Encodings and the Resulting Specific Symbol Trees"></td></tr></table></div></div>
<p class="title"><b>Figure&nbsp;1.&nbsp;Two Encodings and the Resulting Specific Symbol Trees</b></p>
</div>
<br class="figure-break"><p>
In <a class="xref" href="sleigh_constructors.html#sleigh_encoding_image" title="Figure&nbsp;1.&nbsp;Two Encodings and the Resulting Specific Symbol Trees">Figure&nbsp;1, “Two Encodings and the Resulting Specific Symbol Trees”</a>, we can see the breakdown
of two typical instructions in the example instruction set. For each
instruction, we see how the encodings split into the relevant
fields and the resulting tree of specific symbols. Each node in the
trees is labeled with the base family symbol, the portion of the bit
pattern that matches, and then the resulting specific symbol or
constructor. Notice that the use of the overlapping
fields, <span class="emphasis"><em>reg2</em></span> and <span class="emphasis"><em>imm</em></span>, is
determined by the matching constructor for
the <span class="emphasis"><em>op2</em></span> table. SLEIGH generates the disassembly
and p-code for these encodings by walking the trees.
If the nodes of each tree are replaced with the display information of
the corresponding specific symbol, we see how the disassembly
statement is built.
</p>
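<p>
Building the display by walking the tree can be sketched in a few lines of Python. In this hypothetical model, a leaf is already display text (a register name) or an integer, and a constructor node is a pair of a display template and its children; any identifier in the template that names a child acts as a placeholder for that child’s rendered text. The template strings only approximate SLEIGH display sections, whose exact spacing rules are not modeled.
</p>

```python
import re


def render(node) -> str:
    """Build display text by walking a tree bottom-up. A leaf is display
    text (a register name) or an integer; a constructor node is
    (template, children), and each identifier in the template that names
    a child is replaced by that child's rendered text."""
    if isinstance(node, str):
        return node
    if isinstance(node, int):
        return hex(node)
    template, children = node
    parts = {name: render(child) for name, child in children.items()}
    return re.sub(r"[A-Za-z_]\w*", lambda m: parts.get(m.group(0), m.group(0)), template)


indirect = ("and reg1,op2", {"reg1": "r1", "op2": ("[reg2]", {"reg2": "r2"})})
immediate = ("xor reg1,op2", {"reg1": "r0", "op2": ("imm", {"imm": 5})})
print(render(indirect))    # and r1,[r2]
print(render(immediate))   # xor r0,0x5
```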
<div class="figure">
<a name="sleigh_disassembly_image"></a><div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center"><img src="Diagram2.png" align="middle" width="310" height="151" alt="Two Disassembly Trees"></td></tr></table></div></div>
<p class="title"><b>Figure&nbsp;2.&nbsp;Two Disassembly Trees</b></p>
</div>
<br class="figure-break"><p>
<a class="xref" href="sleigh_constructors.html#sleigh_disassembly_image" title="Figure&nbsp;2.&nbsp;Two Disassembly Trees">Figure&nbsp;2, “Two Disassembly Trees”</a> shows the resulting
disassembly trees corresponding to the specific symbol trees in
<a class="xref" href="sleigh_constructors.html#sleigh_encoding_image" title="Figure&nbsp;1.&nbsp;Two Encodings and the Resulting Specific Symbol Trees">Figure&nbsp;1, “Two Encodings and the Resulting Specific Symbol Trees”</a>. The display information comes
from constructor display sections, the names of attached registers, or
the integer interpretation of fields. The identifiers in a constructor
display section serve as placeholders for the subtrees below them. By
walking the tree, SLEIGH obtains the final illustrated assembly
statements corresponding to the original instruction encodings.
</p>
<p>
A similar procedure produces the resulting p-code translation of the
instruction. If each node in the specific symbol tree is replaced with
the corresponding p-code, we see how the final translation is built.
</p>
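<p>
The depth-first assembly of the p-code translation can be sketched the same way. In this hypothetical Python model, a leaf is a varnode name and a constructor node is a dictionary holding statement templates, its children, and an optional export; each child is translated first, and each child name in the parent’s templates is replaced by the varnode that child exports. The statements are simplified SLEIGH-like text rather than raw p-code, and p-code macros and temporary renaming are ignored.
</p>

```python
import re


def translate(node) -> tuple:
    """Depth-first p-code assembly. A leaf is a varnode name; a constructor
    node is a dict with 'pcode' (statement templates), 'children', and an
    optional 'export'. Children are translated first, and each child name
    in the parent's templates is replaced by the varnode that child exports.
    Returns (ops, exported_varnode)."""
    if isinstance(node, str):
        return [], node
    ops, exports = [], {}
    for name, child in node.get("children", {}).items():
        child_ops, exported = translate(child)
        ops.extend(child_ops)          # a child's p-code comes first
        exports[name] = exported

    def fill(text: str) -> str:
        return re.sub(r"[A-Za-z_]\w*", lambda m: exports.get(m.group(0), m.group(0)), text)

    ops.extend(fill(t) for t in node.get("pcode", []))
    return ops, fill(node.get("export", ""))


op2 = {"pcode": ["tmp = *:4 reg2"], "export": "tmp", "children": {"reg2": "r2"}}
root = {"pcode": ["reg1 = reg1 & op2"], "children": {"reg1": "r1", "op2": op2}}
for line in translate(root)[0]:
    print(line)
# tmp = *:4 r2
# r1 = r1 & tmp
```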
<div class="figure">
<a name="sleigh_pcode_image"></a><div class="figure-contents"><div class="mediaobject" align="center"><table border="0" summary="manufactured viewport for HTML img" style="cellpadding: 0; cellspacing: 0;" width="100%"><tr><td align="center"><img src="Diagram3.png" align="middle" width="405" height="149" alt="Two P-code Trees"></td></tr></table></div></div>
<p class="title"><b>Figure&nbsp;3.&nbsp;Two P-code Trees</b></p>
</div>
<br class="figure-break"><p>
<a class="xref" href="sleigh_constructors.html#sleigh_pcode_image" title="Figure&nbsp;3.&nbsp;Two P-code Trees">Figure&nbsp;3, “Two P-code Trees”</a> lists the final p-code
translation for our example instructions and shows the trees from
which the translation is derived. Symbol names within the p-code for a
particular node, as with the disassembly tree, are placeholders for
the subtree below them. The final translation is put together by
concatenating the p-code from each node, traversing the nodes in a
depth-first order. Thus the p-code of a child tends to come before the
p-code of the parent node (but see
<a class="xref" href="sleigh_constructors.html#sleigh_macros" title="7.9.&nbsp;P-code Macros">Section&nbsp;7.9, “P-code Macros”</a>). Placeholders are filled in with the
appropriate varnode, as determined by the export statement of the root