Why can't the JVM emit prefetch instructions on Windows x86

As the title states, why doesn't the OpenJDK JVM emit prefetch instructions on Windows x86? See OpenJDK Mercurial @ http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/windows_x86/vm/prefetch_windows_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {} 
inline void Prefetch::write(void *loc, intx interval) {}

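There are no comments in the code, and I have found no resources other than the source itself. I am asking because the Linux x86 version does emit them, see http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/linux_x86/vm/prefetch_linux_x86.inline.hpp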

inline void Prefetch::read (void *loc, intx interval) { 
#ifdef AMD64 
    __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval)); 
#endif // AMD64 
} 

inline void Prefetch::write(void *loc, intx interval) { 
#ifdef AMD64 

    // Do not use the 3dnow prefetchw instruction. It isn't supported on em64t. 
    // __asm__ ("prefetchw (%0,%1,1)" : : "r" (loc), "r" (interval)); 
    __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval)); 

#endif // AMD64 
}

2017-06-04 naze

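There is also a prefetch for Solaris x86_64: vm/solaris_x86_64.il https://github.com/openjdk-mirror/jdk7u-hotspot/blob/50bdefc3afe944ca74c3093e7448d6b889cd20d1/src/os_cpu/solaris_x86/vm/solaris_x86_64.il#L122; but none of the prefetches you listed are used to emit prefetches, they are prefetches used by the HotSpot JVM's own machine code. Emitting prefetches into generated (JITted) code lives in the x86 code shared by all operating systems: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/50bdefc3afe944ca74c3093e7448d6b889cd20d1/src/cpu/x86/vm/c1_LIRAssembler_x86.cpp#L1335 'LIR_Assembler::prefetchr' / 'LIR_Assembler::prefetchw' – osgx

Thanks, that at least explains a few things. Maybe add this as a comment and I will accept it. I am still looking for the part where the JVM decides to insert prefetch instructions. – naze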

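All the files you cited contain assembly fragments (inline assembler), which C/C++ software uses in its own code (as apangin, the JVM expert, pointed out, mainly in GC code). And there really is a difference: the Linux, Solaris and BSD x86_64 variants of HotSpot have prefetches in HotSpot itself, while on Windows they are disabled/unimplemented. That is partly strange and partly unexplained, and it may also make the JVM a bit slower on Windows (a few percent; more on platforms without hardware prefetch), but it still will not help sell more paid Solaris support contracts for Sun/Oracle. Ross also guessed that the inline asm syntax may not be supported by the MS C++ compiler, but _mm_prefetch should be (who will open a JDK bug to add it to the file?).

HotSpot is a JIT, and JITted code is emitted (generated) by the JIT as bytes. (A JIT could copy code from its own functions into the generated code, or emit calls to support functions, but in HotSpot prefetches are emitted as bytes.) How can we find out how they are emitted? The simple online way is to find a searchable online copy of jdk8u (or, better, a cross-reference such as metager), for example on GitHub: https://github.com/JetBrains/jdk8u_hotspot, and search for prefetch, prefetch emit, prefetchr or lir_prefetchr. There are some relevant results:

The actual bytes emitted for the JVM's C1 compiler / LIR, in jdk8u_hotspot/src/cpu/x86/vm/assembler_x86.cpp: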

void Assembler::prefetch_prefix(Address src) { 
    prefix(src); 
    emit_int8(0x0F); 
} 

void Assembler::prefetchnta(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rax, src); // 0, src 
} 

void Assembler::prefetchr(Address src) { 
    assert(VM_Version::supports_3dnow_prefetch(), "must support"); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x0D); 
    emit_operand(rax, src); // 0, src 
} 

void Assembler::prefetcht0(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rcx, src); // 1, src 
} 

void Assembler::prefetcht1(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rdx, src); // 2, src 
} 

void Assembler::prefetcht2(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rbx, src); // 3, src 
} 

void Assembler::prefetchw(Address src) { 
    assert(VM_Version::supports_3dnow_prefetch(), "must support"); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x0D); 
    emit_operand(rcx, src); // 1, src 
}
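To make the byte pattern in these emitters easier to see, here is a minimal sketch, not HotSpot code. The helper name encode_prefetch_rax is hypothetical, and it assumes the simplest possible operand, [rax] (ModRM mod=00, rm=000, no prefix). It shows that the register passed to emit_operand above is used only for its number, which lands in the ModRM reg field and selects the prefetch variant.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of HotSpot): encodes a prefetch of the
// memory operand [rax]. opcode2 is the second opcode byte (0x18 for the
// SSE prefetches, 0x0D for the 3DNow! ones); reg_field is the /digit that
// the emitters above pass as rax=0, rcx=1, rdx=2, rbx=3.
std::array<std::uint8_t, 3> encode_prefetch_rax(std::uint8_t opcode2,
                                                std::uint8_t reg_field) {
    assert(reg_field < 8);  // the ModRM reg field is three bits wide
    // ModRM layout: mod (2 bits) | reg (3 bits) | rm (3 bits); mod=00, rm=000.
    const std::uint8_t modrm = static_cast<std::uint8_t>(reg_field << 3);
    return {{0x0F, opcode2, modrm}};
}
```

For example, encode_prefetch_rax(0x18, 1) corresponds to prefetcht0 and encode_prefetch_rax(0x0D, 1) to prefetchw; real operands with displacements or index registers need more ModRM/SIB handling than this sketch covers.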

These are used in C1 LIR, in src/share/vm/c1/c1_LIRAssembler.cpp:

void LIR_Assembler::emit_op1(LIR_Op1* op) { 
    switch (op->code()) { 
... 
    case lir_prefetchr: 
     prefetchr(op->in_opr()); 
     break; 

    case lir_prefetchw: 
     prefetchw(op->in_opr()); 
     break;

Now we know the opcode lir_prefetchr and can search for it, or for lir_prefetchw, and find the only use in src/share/vm/c1/c1_LIR.cpp:

void LIR_List::prefetch(LIR_Address* addr, bool is_store) { 
    append(new LIR_Op1(
      is_store ? lir_prefetchw : lir_prefetchr, 
      LIR_OprFact::address(addr))); 
}

There are also definitions of prefetch instructions elsewhere (for C2, as noted by apangin), in src/cpu/x86/vm/x86_64.ad:

// Prefetch instructions. ... 
instruct prefetchr(memory mem) %{ 
    predicate(ReadPrefetchInstr==3); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHR $mem\t# Prefetch into level 1 cache" %} 
    ins_encode %{ 
    __ prefetchr($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrNTA(memory mem) %{ 
    predicate(ReadPrefetchInstr==0); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch into non-temporal cache for read" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrT0(memory mem) %{ 
    predicate(ReadPrefetchInstr==1); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHT0 $mem\t# prefetch into L1 and L2 caches for read" %} 
    ins_encode %{ 
    __ prefetcht0($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrT2(memory mem) %{ 
    predicate(ReadPrefetchInstr==2); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHT2 $mem\t# prefetch into L2 caches for read" %} 
    ins_encode %{ 
    __ prefetcht2($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchwNTA(memory mem) %{ 
    match(PrefetchWrite mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch to non-temporal cache for write" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

// Prefetch instructions for allocation. 

instruct prefetchAlloc(memory mem) %{ 
    predicate(AllocatePrefetchInstr==3); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHW $mem\t# Prefetch allocation into level 1 cache and mark modified" %} 
    ins_encode %{ 
    __ prefetchw($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocNTA(memory mem) %{ 
    predicate(AllocatePrefetchInstr==0); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch allocation to non-temporal cache for write" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocT0(memory mem) %{ 
    predicate(AllocatePrefetchInstr==1); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHT0 $mem\t# Prefetch allocation to level 1 and 2 caches for write" %} 
    ins_encode %{ 
    __ prefetcht0($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocT2(memory mem) %{ 
    predicate(AllocatePrefetchInstr==2); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHT2 $mem\t# Prefetch allocation to level 2 cache for write" %} 
    ins_encode %{ 
    __ prefetcht2($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%}
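The predicates in these rules amount to a lookup from a flag value to an instruction. As a reading aid only, here is a small sketch of that mapping; the function names are hypothetical, and the values are taken solely from the predicate(...) lines shown above.

```cpp
#include <string>

// Hypothetical helpers (not HotSpot code) that restate the predicates of
// the x86_64.ad rules quoted above as plain lookups.
std::string read_prefetch_mnemonic(int read_prefetch_instr) {
    switch (read_prefetch_instr) {
        case 0: return "PREFETCHNTA";
        case 1: return "PREFETCHT0";
        case 2: return "PREFETCHT2";
        case 3: return "PREFETCHR";
        default: return "";  // no rule quoted above matches other values
    }
}

std::string alloc_prefetch_mnemonic(int allocate_prefetch_instr) {
    switch (allocate_prefetch_instr) {
        case 0: return "PREFETCHNTA";
        case 1: return "PREFETCHT0";
        case 2: return "PREFETCHT2";
        case 3: return "PREFETCHW";
        default: return "";  // no rule quoted above matches other values
    }
}
```

Note that the PrefetchWrite rule quoted above has no predicate at all, so it is not modelled here.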

2017-06-04 16:21:14 osgx

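One of the more interesting parts, where the JVM actually decides whether to prefetch, is https://github.com/JetBrains/jdk8u_hotspot/blob/435f973f98771edfa2126d5e6b6dea9bbf272e86/src/share/vm/opto/macro.cpp – naze

I am actually working on a scientific paper that includes a sentence like "the JVM JIT prefetches". Since there are no real papers about JVM internals, I have to dig for evidence even when it is common knowledge. Academia simply has nothing on this :) – naze

naze, I cannot find how PrefetchAllocationNode is turned into a real opcode; it has some strange ABIO marks. You may need to build the JVM/JDK locally so that all generated files exist, and then search the complete code (possibly with a C++ cross-reference tool; but note that non-C++ files such as asm and ad are not covered by cross-referencers and can only be searched with 'grep'). – osgx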

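As JDK-4453409 indicates, prefetching in the HotSpot JVM was implemented in JDK 1.4 to speed up GC. That was 15 years ago, and nobody will remember now why it was not implemented on Windows. My guess is that Visual Studio (which has always been used to build HotSpot on Windows) basically did not understand prefetch instructions at that time. It looks like a place for improvement.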
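To illustrate what such an improvement might look like, here is a rough sketch built on the _mm_prefetch intrinsic mentioned in the other answer. It is not a tested HotSpot patch: the function names are hypothetical, std::ptrdiff_t stands in for HotSpot's intx, and choosing _MM_HINT_T0 simply mirrors the prefetcht0 used in the Linux version.

```cpp
#include <cstddef>

// _mm_prefetch is declared in <xmmintrin.h> by MSVC, GCC and Clang on x86.
#if defined(_M_IX86) || defined(_M_X64) || defined(__i386__) || defined(__x86_64__)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE_PREFETCH 1
#endif

// Hypothetical stand-ins for Prefetch::read / Prefetch::write.
inline void prefetch_read_sketch(const void* loc, std::ptrdiff_t interval) {
#ifdef SKETCH_HAVE_SSE_PREFETCH
    // Same effect as "prefetcht0 (loc, interval, 1)" in the Linux inline asm.
    _mm_prefetch(static_cast<const char*>(loc) + interval, _MM_HINT_T0);
#else
    (void)loc; (void)interval;  // no-op on other targets, like the Windows stubs
#endif
}

inline void prefetch_write_sketch(const void* loc, std::ptrdiff_t interval) {
    // The Linux version also uses prefetcht0 for writes rather than prefetchw.
    prefetch_read_sketch(loc, interval);
}
```

A prefetch is only a hint and does not change program state, so calling prefetch_read_sketch(buffer, 64) before touching buffer + 64 has no observable effect other than timing.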

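In any case, the code you asked about is used internally by the JVM garbage collector. It is not what the JIT generates. The C2 JIT code generator rules live in the architecture definition file x86_64.ad, and there are rules that translate PrefetchRead, PrefetchWrite and PrefetchAllocation nodes into the corresponding x64 instructions.

The unsettling fact is that PrefetchRead and PrefetchWrite nodes are not created anywhere in the code. They existed only to support the Unsafe.prefetchX intrinsics, which, however, were removed in JDK 9.

The only case where the JIT generates prefetch instructions is the PrefetchAllocation node. You can use -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly to verify that PREFETCHNTA is indeed generated after object allocation, on both Linux and Windows.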

import java.util.Arrays;

class Test {
    public static void main(String[] args) {
        byte[] b = new byte[0];
        for (;;) {
            // Each copy allocates a new array, which keeps the allocation path hot
            b = Arrays.copyOf(b, b.length + 1);
        }
    }
}

java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Test

# {method} {0x00000000176124e0} 'main' '([Ljava/lang/String;)V' in 'Test'
...
0x000000000340e512: cmp $0x100000,%r11d
0x000000000340e519: ja 0x000000000340e60f
0x000000000340e51f: movslq 0x24(%rsp),%r10
0x000000000340e524: add $0x1,%r10
0x000000000340e528: add $0x17,%r10
0x000000000340e52c: mov %r10,%r8
0x000000000340e52f: and $0xfffffffffffffff8,%r8
0x000000000340e533: cmp $0x100000,%r11d
0x000000000340e53a: ja 0x000000000340e496
0x000000000340e540: mov 0x60(%r15),%rbp
0x000000000340e544: mov %rbp,%r9
0x000000000340e547: add %r8,%r9
0x000000000340e54a: cmp 0x70(%r15),%r9
0x000000000340e54e: jae 0x000000000340e496
0x000000000340e554: mov %r9,0x60(%r15)
0x000000000340e558: prefetchnta 0xc0(%r9)
0x000000000340e560: movq $0x1,0x0(%rbp)
0x000000000340e568: prefetchnta 0x100(%r9)
0x000000000340e570: movl $0x200000f5,0x8(%rbp) ; {metadata({type array byte})}
0x000000000340e577: mov %r11d,0xc(%rbp)
0x000000000340e57b: prefetchnta 0x140(%r9)
0x000000000340e583: prefetchnta 0x180(%r9) ;*newarray
 ; - java.util.Arrays::copyOf@... (line 3236)
 ; - Test::main@... (line 9)

2017-06-04 18:20:06 apangin

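I really wonder why this was downvoted. – EJP

+1 for finding out that prefetching is only used in the context of allocation. I would have guessed that prefetching is also performed when iterating over an existing array. It seems my assumption was wrong. Thanks for clarifying. – naze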

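@naze, there is prefetching when iterating over an array; but it is not software prefetch, it is hardware prefetch. You can turn it off and measure to find its effect on Intel: https://software.intel.com/zh-cn/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors (use 'wrmsr -p N 0x1a4' for each core); "older processor models use 0x1A0 bits 9 and 19" - https://stackoverflow.com/a/36339469. Intel's hw prefetch is aggressive, but limited to a 4 KB page: if it catches two memory accesses A and B with a ptrdiff of N = B - A, and B + N is within the same 4 KB, it prefetches. – osgx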
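The 4 KB rule in the comment above can be written down as a tiny check. This is only a simplified model of the rule as stated there, not a description of any particular processor; the function name is hypothetical, and "the same 4 KB" is assumed to mean the 4 KB page that contains B.

```cpp
#include <cstdint>

// Hypothetical model: after accesses at addresses a and b, would the next
// address in the stride, b + (b - a), fall in the same 4 KB page as b?
// Unsigned arithmetic wraps, so descending strides (a > b) work as well.
bool next_stride_in_same_4k_page(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t page_mask = ~std::uint64_t{0xFFF};  // clears the low 12 bits
    const std::uint64_t next = b + (b - a);                 // B + N with N = B - A
    return (next & page_mask) == (b & page_mask);
}
```

With a 64-byte stride, accesses at 0x1000 and 0x1040 pass the check because 0x1080 is in the same page, while accesses at 0x1F80 and 0x1FC0 do not because 0x2000 starts the next page.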
