Non-blocking I/O vs. using threads (how bad is context switching?)

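We use sockets heavily in a program I work on, and we sometimes handle connections from up to about 100 machines at once. We use a mix of non-blocking I/O with a state table to manage it, and traditional Java sockets that use threads.

We have quite a few problems with the non-blocking sockets, and I personally prefer using threads to handle sockets. So my question is:

How much is saved by using non-blocking sockets on a single thread? How bad is the context switching involved in using threads, and how many concurrent connections can you scale to with the threaded model in Java?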

2009-11-17 Benj

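The choice between blocking and non-blocking I/O depends on your server's activity profile. For example, if you have long-lived connections and thousands of clients, blocking I/O may become too expensive because system resources get exhausted. That said, direct (blocking) I/O that does not crowd out the CPU cache can be faster than non-blocking I/O. There is a good article about this: "Writing Java Multithreaded Servers - whats old is new".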
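For comparison with the threaded model, here is a minimal sketch of the single-threaded, non-blocking style: one `Selector` loop accepts and echoes for every connection. The class `SelectorEchoServer`, its `demo` method and the echo behaviour are made up for illustration and are not taken from the question or the article; partial writes are handled with a simple busy loop, which a real server would replace with `OP_WRITE` handling.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one thread serves every connection through a Selector.
final class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private volatile boolean running = true;

    SelectorEchoServer() throws IOException {
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try (Selector sel = selector; ServerSocketChannel srv = server) {
            while (running) {
                sel.select(); // blocks until a channel is ready or wakeup() is called
                for (SelectionKey key : sel.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel client = srv.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(sel, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
                sel.selectedKeys().clear(); // processed keys must be removed by hand
            }
            for (SelectionKey key : sel.keys()) {
                key.channel().close(); // release connections still open at shutdown
            }
        } catch (IOException e) {
            // unrecoverable selector error: the loop simply ends
        }
    }

    // Reads whatever is available and writes it straight back.
    private static void echo(SocketChannel client, ByteBuffer buffer) throws IOException {
        try {
            buffer.clear();
            if (client.read(buffer) < 0) {
                client.close(); // peer closed; closing the channel cancels its key
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // simplification: real code would wait for OP_WRITE
            }
        } catch (IOException e) {
            client.close(); // drop only this connection and keep serving the others
        }
    }

    @Override
    public void close() { // asks the loop to stop; the loop thread releases the channels
        running = false;
        selector.wakeup();
    }

    /** Opens one blocking client per message, all connected at once, and returns the echoes. */
    static String[] demo(String... messages) {
        Socket[] clients = new Socket[messages.length];
        String[] replies = new String[messages.length];
        try (SelectorEchoServer srv = new SelectorEchoServer()) {
            new Thread(srv, "selector-loop").start();
            for (int i = 0; i < clients.length; i++) {
                clients[i] = new Socket(InetAddress.getLoopbackAddress(), srv.server.socket().getLocalPort());
                clients[i].getOutputStream().write(messages[i].getBytes(StandardCharsets.UTF_8));
            }
            for (int i = 0; i < clients.length; i++) {
                byte[] in = new byte[messages[i].getBytes(StandardCharsets.UTF_8).length];
                new DataInputStream(clients[i].getInputStream()).readFully(in);
                replies[i] = new String(in, StandardCharsets.UTF_8);
                clients[i].close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return replies;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", demo("alpha", "beta", "gamma")));
    }
}
```

All the per-connection state here is one shared buffer because echoing is stateless. With a real protocol each connection needs its own parse state attached to the key, which is roughly the "state table" the question mentions and where most of the complexity of this style comes from.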

As for the cost of context switching, it is a cheap operation. Consider the simple test below:

package com; 

import java.util.ArrayList; 
import java.util.List; 
import java.util.Random; 
import java.util.Set; 
import java.util.concurrent.ConcurrentSkipListSet; 
import java.util.concurrent.CountDownLatch; 
import java.util.concurrent.TimeUnit; 
import java.util.concurrent.atomic.AtomicLong; 

public class AAA { 

    private static final long DURATION = TimeUnit.NANOSECONDS.convert(30, TimeUnit.SECONDS); 
    private static final int THREADS_NUMBER = 2; 
    private static final ThreadLocal<AtomicLong> COUNTER = new ThreadLocal<AtomicLong>() { 
     @Override 
     protected AtomicLong initialValue() { 
      return new AtomicLong(); 
     } 
    }; 
    private static final ThreadLocal<AtomicLong> DUMMY_DATA = new ThreadLocal<AtomicLong>() { 
     @Override 
     protected AtomicLong initialValue() { 
      return new AtomicLong(); 
     } 
    }; 
    private static final AtomicLong DUMMY_COUNTER = new AtomicLong(); 
    private static final AtomicLong END_TIME = new AtomicLong(System.nanoTime() + DURATION); 

    private static final List<ThreadLocal<CharSequence>> DUMMY_SOURCE = new ArrayList<ThreadLocal<CharSequence>>(); 
    static { 
     for (int i = 0; i < 40; ++i) { 
      DUMMY_SOURCE.add(new ThreadLocal<CharSequence>()); 
     } 
    } 

    private static final Set<Long> COUNTERS = new ConcurrentSkipListSet<Long>(); 

    public static void main(String[] args) throws Exception { 
     final CountDownLatch startLatch = new CountDownLatch(THREADS_NUMBER); 
     final CountDownLatch endLatch = new CountDownLatch(THREADS_NUMBER); 

     for (int i = 0; i < THREADS_NUMBER; i++) { 
      new Thread() { 
       @Override 
       public void run() { 
        initDummyData(); 
        startLatch.countDown(); 
        try { 
         startLatch.await(); 
        } catch (InterruptedException e) { 
         e.printStackTrace(); 
        } 
        while (System.nanoTime() < END_TIME.get()) { 
         doJob(); 
        } 
        COUNTERS.add(COUNTER.get().get()); 
        DUMMY_COUNTER.addAndGet(DUMMY_DATA.get().get()); 
        endLatch.countDown(); 
       } 
      }.start(); 
     } 
     startLatch.await(); 
     END_TIME.set(System.nanoTime() + DURATION); 

     endLatch.await(); 
     printStatistics(); 
    } 

    private static void initDummyData() { 
     for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) { 
      threadLocal.set(getRandomString()); 
     } 
    } 

    private static CharSequence getRandomString() { 
     StringBuilder result = new StringBuilder(); 
     Random random = new Random(); 
     for (int i = 0; i < 127; ++i) { 
      result.append((char)random.nextInt(0xFF)); 
     } 
     return result; 
    } 

    private static void doJob() { 
     Random random = new Random(); 
     for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) { 
      for (int i = 0; i < threadLocal.get().length(); ++i) { 
       DUMMY_DATA.get().addAndGet(threadLocal.get().charAt(i) << random.nextInt(31)); 
      } 
     } 
     COUNTER.get().incrementAndGet(); 
    } 

    private static void printStatistics() { 
     long total = 0L; 
     for (Long counter : COUNTERS) { 
      total += counter; 
     } 
     System.out.printf("Total iterations number: %d, dummy data: %d, distribution:%n", total, DUMMY_COUNTER.get()); 
     for (Long counter : COUNTERS) { 
      System.out.printf("%f%%%n", counter * 100d/total); 
     } 
    } 
}

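I ran four tests for the two-thread and ten-thread scenarios, and they show a performance loss of about 2.5% (78626 iterations with two threads versus 76754 with ten threads); the threads use system resources roughly equally.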
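The quoted loss can be checked from the two iteration counts. This is just the arithmetic; the class and method names are illustrative and not part of the benchmark.

```java
// Illustrative helper (not part of the original benchmark): relative
// throughput loss between a baseline run and a second run.
final class ThroughputLoss {

    /** Returns how much lower `actual` is than `baseline`, in percent. */
    static double percentLoss(long baseline, long actual) {
        return (baseline - actual) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        // Iteration counts quoted in the answer: two threads vs. ten threads.
        double loss = percentLoss(78626, 76754);
        System.out.printf("loss: %.2f%%%n", loss);
    }
}
```

The result comes out at roughly 2.4%, consistent with the rounded "about 2.5%" figure.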

Also, the authors of java.util.concurrent assume that a context switch takes about 2000-4000 CPU cycles:

public class Exchanger<V> { 
    ... 
    private static final int NCPU = Runtime.getRuntime().availableProcessors(); 
    .... 
    /** 
    * The number of times to spin (doing nothing except polling a 
    * memory location) before blocking or giving up while waiting to 
    * be fulfilled. Should be zero on uniprocessors. On 
    * multiprocessors, this value should be large enough so that two 
    * threads exchanging items as fast as possible block only when 
    * one of them is stalled (due to GC or preemption), but not much 
    * longer, to avoid wasting CPU resources. Seen differently, this 
    * value is a little over half the number of cycles of an average 
    * context switch time on most systems. The value here is 
    * approximately the average of those across a range of tested 
    * systems. 
    */ 
    private static final int SPINS = (NCPU == 1) ? 0 : 2000;
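To make the comment concrete, here is a rough sketch of the spin-then-block idea: poll a memory location a bounded number of times, and park the thread only if nothing arrived. This is not the real `Exchanger` code; `SpinThenParkSlot` and its methods are hypothetical, and the sketch assumes a single consumer and a single hand-off.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical one-shot hand-off slot illustrating spin-then-block.
// This is NOT how Exchanger is implemented; it only shows the idea.
final class SpinThenParkSlot<V> {
    private static final int NCPU = Runtime.getRuntime().availableProcessors();
    // Same heuristic as the quoted snippet: no spinning on a uniprocessor.
    private static final int SPINS = (NCPU == 1) ? 0 : 2000;

    private final AtomicReference<V> value = new AtomicReference<>();
    private volatile Thread waiter;

    /** Publishes the value and wakes the consumer if it has registered as waiting. */
    void put(V v) {
        value.set(v);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    /** Polls the slot up to SPINS times, then parks until a value arrives. */
    V take() {
        for (int i = 0; i < SPINS; i++) {
            V v = value.get();
            if (v != null) {
                return v; // fulfilled while spinning: no context switch paid
            }
        }
        waiter = Thread.currentThread(); // register before the final re-check
        V v;
        while ((v = value.get()) == null) {
            LockSupport.park(this); // may wake spuriously, so re-check in a loop
        }
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        SpinThenParkSlot<String> slot = new SpinThenParkSlot<>();
        Thread producer = new Thread(() -> slot.put("hello"));
        producer.start();
        System.out.println(slot.take());
        producer.join();
    }
}
```

The point of the spin budget is that parking and waking a thread costs about a context switch, so burning up to a couple of thousand polls first is cheaper when the other side is about to respond anyway.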

2009-11-17 14:19:16

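Thanks very much, it's great to have a test case. – Benj 2009-11-17 14:21:52

Thanks for posting the "Writing Java Multithreaded Servers - whats old is new" link. I had forgotten its name and couldn't find it. – 2009-11-17 15:05:09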

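For your question, the best approach is probably to build a test program, collect some hard measurements, and make the decision based on the data. That is what I usually do when making this kind of decision, and having hard numbers helps back up your argument.

Before you start, though, how many threads are you talking about? And what type of hardware are you running your software on?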

2009-11-17 14:04:01

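Good idea. The program I work on is peer-to-peer, where one peer may be connected to other peers. Peers can be Linux/Windows/Mac (various flavours), and it will usually be running on a PC, typically PCs in an office environment (i.e. 2+ CPUs). – Benj 2009-11-17 14:17:55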

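For 100 connections you are unlikely to have a problem with blocking I/O and two threads per connection (one for reading and one for writing). That is the simplest model, IMHO.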
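A minimal sketch of that two-threads-per-connection model, assuming a line-based protocol: the reader thread blocks in `readLine()`, the writer thread blocks on a queue. The `TwoThreadConnection` class, the "echo: " reply and the loopback `demo` are invented for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: blocking I/O with a reader thread and a writer thread per connection.
final class TwoThreadConnection {
    // Sentinel telling the writer to stop; compared by identity, never sent on the wire.
    private static final String CLOSE = new String("close");

    private final Socket socket;
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();

    TwoThreadConnection(Socket socket) {
        this.socket = socket;
    }

    void start() {
        new Thread(this::readLoop, "reader").start();
        new Thread(this::writeLoop, "writer").start();
    }

    // Reader thread: blocks in readLine() and queues one reply per received line.
    private void readLoop() {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                outbox.add("echo: " + line);
            }
        } catch (IOException e) {
            // connection broken: fall through and stop the writer
        } finally {
            outbox.add(CLOSE);
        }
    }

    // Writer thread: blocks on the queue, writes each message, then closes the socket.
    private void writeLoop() {
        try (Socket s = socket;
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String message;
            while ((message = outbox.take()) != CLOSE) {
                out.println(message);
            }
        } catch (IOException | InterruptedException e) {
            // nothing left to send on this connection
        }
    }

    /** Loopback demo: sends one line to a freshly accepted connection and returns the reply. */
    static String demo(String message) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            new Thread(() -> {
                try {
                    new TwoThreadConnection(server.accept()).start();
                } catch (IOException e) {
                    // server socket closed before a client connected
                }
            }, "acceptor").start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("ping"));
    }
}
```

With about 100 peers this means about 200 mostly idle threads. Each connection's logic reads top to bottom as ordinary sequential code, which is the main attraction over a hand-written state table.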

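However, you may find that using JMS is a better way to manage your connections. If you use something like ActiveMQ, you can consolidate all the connections.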

2009-11-17 20:17:06
