The s390 architecture defines two special per-CPU data pages
called the "prefix area". In s390-linux terminology this is usually
called "lowcore". This memory area contains system configuration
data like old/new PSW's for system call/interrupt/machine check
handlers and lots of other data. It is normally mapped to logical
address 0. This area can only be accessed when in supervisor mode.
This means that kernel code can dereference NULL pointers, because
accesses to address 0 are allowed. Parts of lowcore can be write
protected, but read accesses and write accesses outside of the write
protected areas are not caught.
To remove this limitation for debugging and testing, remap lowcore to
another address and define a function get_lowcore() which simply
returns the address where lowcore is mapped at. This would normally
introduce a pointer dereference (=memory read). As lowcore is used
for several very often used variables, add code to patch this function
during runtime, so we avoid the memory reads.
For C code get_lowcore() has to be used, for assembly code it is
the GET_LC macro. When using this macro/function a reference is added
to alternative patching. All these locations will be patched to the
actual lowcore location when the kernel is booted or a module is loaded.
To make debugging/bisecting problems easier, this patch adds all the
infrastructure but the lowcore address is still hardwired to 0. This
way the code can be converted on a per function basis, and the
functionality is enabled in a patch after all the functions have
been converted.
Note that this requires at least z16 because the old lpsw instruction
only allowed a 12 bit displacement. z16 introduced lpswey which allows
20 bits (signed), so the lowcore can effectively be mapped from
address 0 - 0x7e000. To use 0x7e000 as address, a 6 byte lgfi
instruction would have to be used in the alternative. To save two
bytes, llilh can be used, but this only allows to set bits 16-31 of
the address. In order to use the llilh instruction, use 0x70000 as
alternative lowcore address. This is still large enough to catch
NULL pointer dereferences into large arrays.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Move Absolute Lowcore Area allocation to the decompressor.
As result, get_abs_lowcore() and put_abs_lowcore() access
brackets become really straight and do not require complex
execution context analysis and LAP and interrupts tackling.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Temporary unsetting of the prefix page in memcpy_absolute() routine
poses a risk of executing code path with unexpectedly disabled prefix
page. This rework avoids the prefix page uninstalling and disabling
of normal and machine check interrupts when accessing the absolute
zero memory.
Although memcpy_absolute() routine can access the whole memory, it is
only used to update the absolute zero lowcore. This rework therefore
introduces a new mechanism for the absolute zero lowcore access and
scraps memcpy_absolute() routine for good.
Instead, an area is reserved in the virtual memory that is used for
the absolute lowcore access only. That area holds an array of 8KB
virtual mappings - one per CPU. Whenever a CPU is brought online, the
corresponding item is mapped to the real address of the previously
installed prefix page.
The absolute zero lowcore access works like this: a CPU calls the
new primitive get_abs_lowcore() to obtain its 8KB mapping as a
pointer to the struct lowcore. Virtual address references to that
pointer get translated to the real addresses of the prefix page,
which in turn gets swapped with the absolute zero memory addresses
due to prefixing. Once the pointer is not needed it must be released
with put_abs_lowcore() primitive:
struct lowcore *abs_lc;
unsigned long flags;
abs_lc = get_abs_lowcore(&flags);
abs_lc->... = ...;
put_abs_lowcore(abs_lc, flags);
To ensure the described mechanism works large segment- and region-
table entries must be avoided for the 8KB mappings. Failure to do
so results in usage of Region-Frame Absolute Address (RFAA) or
Segment-Frame Absolute Address (SFAA) large page fields. In that
case absolute addresses would be used to address the prefix page
instead of the real ones and the prefixing would get bypassed.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>