They replaced the TranslatorX64 tracelets with the new HipHop Intermediate Representation (hhir), along with a new layer of indirection that holds the logic for generating hhir and is in fact referred to by the same name, hhir.
At a high level, it now uses 6 instructions to do what previously took 9, as noted here: "It begins with the same typechecks but the body of the translation is 6 instructions, significantly better than the 9 from TranslatorX64."
"We solved this problem by adding a new layer of indirection. This new
layer is an SSA form intermediate representation, positioned between
the bytecodes in TranslatorX64’s tracelets and the x86 machine code we
want to end up with. It’s strongly typed and designed to facilitate a
number of optimizations we wanted to port from TranslatorX64 as well
as new optimizations in the future. This new IR, named hhir (short for
HipHop Intermediate Representation), completely replaced TranslatorX64
as hhvm’s JIT in May of 2013. While hhir specifically refers to the
representation itself, we often use the name to refer to all the
pieces of code that interact with it. If you’ve looked at our source
code recently you might have noticed that a class named TranslatorX64
still exists and contains a nontrivial amount of code. That’s mostly
an artifact of how the system is designed and is something we plan to
eventually clean up. All of the code left in TranslatorX64 is
machinery required to emit code and link translations together; the
code that understood how to translate individual bytecodes is gone
from TranslatorX64.
When hhir replaced TranslatorX64, it was generating code that was
roughly 5% faster and looked significantly better upon manual
inspection. We followed up its production debut with another
mini-lockdown and got an additional 10% in performance gains on top of
that. To see some of these improvements in action, let’s look at a
function addPositive and part of its translation.
    function addPositive($arr) {
      $n = count($arr);
      $sum = 0;
      for ($i = 0; $i < $n; $i++) {
        $elem = $arr[$i];
        if ($elem > 0) {
          $sum = $sum + $elem;
        }
      }
      return $sum;
    }
This function looks like a lot of PHP code: it loops over an array and
does something with each element. Let’s focus on lines 5 and 6 for
now, along with their bytecode:
    $elem = $arr[$i];
    if ($elem > 0) {
    // line 5
    85: CGetM <L:0 EL:3>
    98: SetL 4
    100: PopC
    // line 6
    101: Int 0
    110: CGetL2 4
    112: Gt
    113: JmpZ 13 (126)
These two lines load an element from an array, store it in a local
variable, then compare the value of that local with 0 and
conditionally jump somewhere based on the result. If you’re interested
in more detail about what’s going on in the bytecode, you can skim
through bytecode.specification. The JIT, both now and back in the
TranslatorX64 days, breaks this code up into two tracelets: one with
just the CGetM, then another with the rest of the instructions (a full
explanation of why this happens isn’t relevant here, but it’s mostly
because we don’t know at compile time what the type of the array
element will be). The translation of the CGetM boils down to a call to
a C++ helper function and isn’t very interesting, so we’ll be looking
at the second tracelet. This commit was TranslatorX64’s official
retirement, so let’s use its parent to see how TranslatorX64
translated this code.
        cmpl $0xa, 0xc(%rbx)
        jnz 0x276004b2
        cmpl $0xc, -0x44(%rbp)
        jnle 0x276004b2
    101: SetL 4
    103: PopC
        movq (%rbx), %rax
        movq -0x50(%rbp), %r13
    104: Int 0
        xor %ecx, %ecx
    113: CGetL2 4
        mov %rax, %rdx
        movl $0xa, -0x44(%rbp)
        movq %rax, -0x50(%rbp)
        add $0x10, %rbx
        cmp %rcx, %rdx
    115: Gt
    116: JmpZ 13 (129)
        jle 0x7608200
The first four lines are typechecks verifying that the value in $elem
and the value on the top of the stack are the types we expect. If
either of them fails, we’ll jump to code that triggers a retranslation
of the tracelet, using the new types to generate a differently
specialized chunk of machine code. The meat of the translation
follows, and the code has plenty of room for improvement. There’s a
dead load on line 8, an easily avoidable register to register move on
line 12, and an opportunity for constant propagation between lines 10
and 16. These are all consequences of the bytecode-at-a-time approach
used by TranslatorX64. No respectable compiler would ever emit code
like this, but the simple optimizations required to avoid it just
don’t fit into the TranslatorX64 model.
Now let’s see the same tracelet translated using hhir, at the same
hhvm revision:
        cmpl $0xa, 0xc(%rbx)
        jnz 0x276004bf
        cmpl $0xc, -0x44(%rbp)
        jnle 0x276004bf
    101: SetL 4
        movq (%rbx), %rcx
        movl $0xa, -0x44(%rbp)
        movq %rcx, -0x50(%rbp)
    115: Gt
    116: JmpZ 13 (129)
        add $0x10, %rbx
        cmp $0x0, %rcx
        jle 0x76081c0
It begins with the same typechecks but the body of the translation is
6 instructions, significantly better than the 9 from TranslatorX64.
Notice that there are no dead loads or register to register moves, and
the immediate 0 from the Int 0 bytecode was propagated down to the cmp
on line 12. Here’s the hhir that was generated between the tracelet
and that translation:
    (00) DefLabel
    (02) t1:FramePtr = DefFP
    (03) t2:StkPtr = DefSP<6> t1:FramePtr
    (05) t3:StkPtr = GuardStk<Int,0> t2:StkPtr
    (06) GuardLoc<Uncounted,4> t1:FramePtr
    (11) t4:Int = LdStack<Int,0> t3:StkPtr
    (13) StLoc<4> t1:FramePtr, t4:Int
    (27) t10:StkPtr = SpillStack t3:StkPtr, 1
    (35) SyncABIRegs t1:FramePtr, t10:StkPtr
    (36) ReqBindJmpLte<129,121> t4:Int, 0
The bytecode instructions have been broken down into smaller, simpler
operations. Many operations hidden in the behavior of certain
bytecodes are explicitly represented in hhir, such as the LdStack on
line 6 which is part of the SetL. By using unnamed temporaries (t1,
t2, etc…) instead of physical registers to represent the flow of
values, we can easily track the definition and use(s) of each value.
This makes it trivial to see if the destination of a load is actually
used, or if one of the inputs to an instruction is really a constant
value from 3 bytecodes ago. For a much more thorough explanation of
what hhir is and how it works, take a look at ir.specification.
This example showed just a few of the improvements hhir made over
TranslatorX64. Getting hhir deployed to production and retiring
TranslatorX64 in May 2013 was a great milestone to hit, but it was
just the beginning. Since then, we’ve implemented many more
optimizations that would be nearly impossible in TranslatorX64, making
hhvm almost twice as efficient in the process. It’s also been crucial
in our efforts to get hhvm running on ARM processors by isolating and
reducing the amount of architecture-specific code we need to
reimplement. Watch for an upcoming post devoted to our ARM port for
more details!"
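To make the quoted point about unnamed temporaries concrete, here is a minimal Python sketch of why an SSA-style form makes these cleanups easy. It is not hhvm's code: the `Instr` class, the op names (`load`, `const`, `mov`, `store_type`, `adjust_sp`, `cmp`, `jle`) and both passes are hypothetical, invented only to mirror the nine-instruction TranslatorX64 body shown above. Since each temporary is defined exactly once, one forward pass can substitute constants and copies, and one backward pass can drop any definition that is never used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Toy IR instruction (hypothetical, not the real hhir)."""
    dst: Optional[str]          # temporary defined here, or None
    op: str
    args: tuple = ()
    side_effect: bool = False   # stores and jumps must never be removed


def propagate(instrs):
    """Forward pass: replace uses of constants and copies with their source."""
    known = {}                  # temporary -> constant value or source temporary
    out = []
    for ins in instrs:
        args = tuple(known.get(a, a) if isinstance(a, str) else a
                     for a in ins.args)
        if ins.op in ("const", "mov"):
            known[ins.dst] = args[0]
        out.append(Instr(ins.dst, ins.op, args, ins.side_effect))
    return out


def eliminate_dead(instrs):
    """Backward pass: keep side effects and definitions that are still used."""
    live = set()
    kept = []
    for ins in reversed(instrs):
        if ins.side_effect or ins.dst in live:
            kept.append(ins)
            live.update(a for a in ins.args if isinstance(a, str))
    kept.reverse()
    return kept


# Rough model of the TranslatorX64 body (assumed mapping, for illustration).
body = [
    Instr("t1", "load", ("stack[0]",)),                  # movq (%rbx), %rax
    Instr("t2", "load", ("local4",)),                    # dead load
    Instr("t3", "const", (0,)),                          # xor %ecx, %ecx
    Instr("t4", "mov", ("t1",)),                         # register-to-register move
    Instr(None, "store_type", ("local4", "Int"), True),
    Instr(None, "store", ("local4", "t1"), True),
    Instr(None, "adjust_sp", (16,), True),
    Instr("t5", "cmp", ("t4", "t3")),
    Instr(None, "jle", ("t5",), True),
]

optimized = eliminate_dead(propagate(body))
print(len(body), "->", len(optimized))                   # prints: 9 -> 6
```

After both passes the dead load, the zeroing instruction, and the copy are gone, and the compare reads `t1` and the literal `0` directly, which matches the shape of the six-instruction hhir translation in the quote. The real JIT works on typed instructions and handles control flow, which this straight-line sketch ignores.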