2009-11-17 35 views
10

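Non-blocking I/O versus using threads (how bad is context switching?)

We use sockets heavily in a program I work on, and at times we handle connections from up to about 100 machines simultaneously. We use a combination of non-blocking I/O, with a state table to manage it, and traditional Java sockets that use threads.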

We have had quite a few problems with the non-blocking sockets, and I personally much prefer handling sockets with threads. So my question is:

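How much is actually saved by using non-blocking sockets on a single thread? How bad is the context switching involved in using threads, and how many concurrent connections can the threaded model scale to in Java?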

Answers

10

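The choice between blocking and non-blocking I/O depends on your server's activity profile. For example, if you have long-lived connections and thousands of clients, blocking I/O may become too expensive because it exhausts system resources. However, direct (blocking) I/O that does not crowd out the CPU cache is faster than non-blocking I/O. There is a good article about this: Writing Java Multithreaded Servers - whats old is new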
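
For readers who have not seen the non-blocking model, here is a minimal sketch of one thread serving all connections through a `java.nio` `Selector`. The class name `SelectorEcho`, the echo behaviour, the loopback demo and the 1024-byte buffer are my own assumptions for illustration, not the asker's code. The `ByteBuffer` attached to each key plays the role of a per-connection entry in a state table.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class SelectorEcho {

    /** One thread services every connection; per-connection state is the key attachment. */
    static void serve(Selector selector) {
        try {
            while (true) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel ch = ((ServerSocketChannel) key.channel()).accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
                        }
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Simplification: any I/O failure or a closed selector stops the whole loop.
        }
    }

    private static void echo(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (ch.read(buf) < 0) { // peer closed the connection
            key.cancel();
            ch.close();
            return;
        }
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf); // simplification: a real server would register OP_WRITE instead
        }
        buf.clear();
    }

    /** Starts the selector thread on a loopback port, sends "hello" and returns the echo. */
    public static String demo() {
        try {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            Thread loop = new Thread(() -> serve(selector), "selector-loop");
            loop.setDaemon(true);
            loop.start();
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (SocketChannel client = SocketChannel.open(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port))) {
                byte[] msg = "hello".getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(msg));
                ByteBuffer reply = ByteBuffer.allocate(msg.length);
                while (reply.hasRemaining() && client.read(reply) >= 0) {
                    // keep reading until the whole echo has arrived
                }
                return new String(reply.array(), StandardCharsets.UTF_8);
            } finally {
                selector.close();
                server.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Even in this toy form, partial reads, partial writes and connection teardown all have to be tracked by hand, which is where much of the difficulty with non-blocking sockets tends to come from.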

As for the cost of context switching, it is a fairly cheap operation. Consider the simple test below:

package com;

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Runs THREADS_NUMBER CPU-bound threads for DURATION and counts the jobs they
 * complete. Compare the totals for different THREADS_NUMBER values to estimate
 * what the extra context switches cost.
 */
public class AAA {

    private static final long DURATION = TimeUnit.NANOSECONDS.convert(30, TimeUnit.SECONDS);
    private static final int THREADS_NUMBER = 2;

    // Per-thread count of completed jobs.
    private static final ThreadLocal<AtomicLong> COUNTER = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };

    // Per-thread accumulator; it exists only so the JIT cannot discard the work.
    private static final ThreadLocal<AtomicLong> DUMMY_DATA = new ThreadLocal<AtomicLong>() {
        @Override
        protected AtomicLong initialValue() {
            return new AtomicLong();
        }
    };
    private static final AtomicLong DUMMY_COUNTER = new AtomicLong();
    private static final AtomicLong END_TIME = new AtomicLong(System.nanoTime() + DURATION);

    private static final List<ThreadLocal<CharSequence>> DUMMY_SOURCE = new ArrayList<ThreadLocal<CharSequence>>();
    static {
        for (int i = 0; i < 40; ++i) {
            DUMMY_SOURCE.add(new ThreadLocal<CharSequence>());
        }
    }

    // A queue rather than a set, so that two threads with equal counts are both kept.
    private static final Collection<Long> COUNTERS = new ConcurrentLinkedQueue<Long>();

    public static void main(String[] args) throws Exception {
        final CountDownLatch startLatch = new CountDownLatch(THREADS_NUMBER);
        final CountDownLatch endLatch = new CountDownLatch(THREADS_NUMBER);

        for (int i = 0; i < THREADS_NUMBER; i++) {
            new Thread() {
                @Override
                public void run() {
                    initDummyData();
                    // Wait until every worker is initialized so they all start together.
                    startLatch.countDown();
                    try {
                        startLatch.await();
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    while (System.nanoTime() < END_TIME.get()) {
                        doJob();
                    }
                    COUNTERS.add(COUNTER.get().get());
                    DUMMY_COUNTER.addAndGet(DUMMY_DATA.get().get());
                    endLatch.countDown();
                }
            }.start();
        }
        startLatch.await();
        // Restart the clock once all workers are ready.
        END_TIME.set(System.nanoTime() + DURATION);

        endLatch.await();
        printStatistics();
    }

    // Fills this thread's 40 thread-local slots with random 127-character strings.
    private static void initDummyData() {
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            threadLocal.set(getRandomString());
        }
    }

    private static CharSequence getRandomString() {
        StringBuilder result = new StringBuilder();
        Random random = new Random();
        for (int i = 0; i < 127; ++i) {
            result.append((char) random.nextInt(0xFF));
        }
        return result;
    }

    // One unit of work: touch every character of every thread-local string.
    private static void doJob() {
        Random random = new Random();
        for (ThreadLocal<CharSequence> threadLocal : DUMMY_SOURCE) {
            for (int i = 0; i < threadLocal.get().length(); ++i) {
                DUMMY_DATA.get().addAndGet(threadLocal.get().charAt(i) << random.nextInt(31));
            }
        }
        COUNTER.get().incrementAndGet();
    }

    // Prints the total job count and each thread's share of it as a percentage.
    private static void printStatistics() {
        long total = 0L;
        for (Long counter : COUNTERS) {
            total += counter;
        }
        System.out.printf("Total iterations number: %d, dummy data: %d, distribution:%n", total, DUMMY_COUNTER.get());
        for (Long counter : COUNTERS) {
            System.out.printf("%f%%%n", counter * 100d / total);
        }
    }
}

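I ran four tests for the two-thread and ten-thread scenarios. They show a performance loss of about 2.5% (78626 iterations with two threads versus 76754 with ten threads), and the threads used system resources roughly equally.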

In addition, the authors of java.util.concurrent assume that a context switch takes about 2000-4000 CPU cycles:

public class Exchanger<V> { 
    ... 
    private static final int NCPU = Runtime.getRuntime().availableProcessors(); 
    .... 
    /** 
    * The number of times to spin (doing nothing except polling a 
    * memory location) before blocking or giving up while waiting to 
    * be fulfilled. Should be zero on uniprocessors. On 
    * multiprocessors, this value should be large enough so that two 
    * threads exchanging items as fast as possible block only when 
    * one of them is stalled (due to GC or preemption), but not much 
    * longer, to avoid wasting CPU resources. Seen differently, this 
    * value is a little over half the number of cycles of an average 
    * context switch time on most systems. The value here is 
    * approximately the average of those across a range of tested 
    * systems. 
    */ 
    private static final int SPINS = (NCPU == 1) ? 0 : 2000; 
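
To put the 2000-4000 cycle figure into wall-clock terms, here is a small sketch that converts cycles to nanoseconds. The 3 GHz clock speed and the names `ContextSwitchCost` and `cyclesToNanos` are assumptions for illustration only; substitute the clock of your own hardware.

```java
public class ContextSwitchCost {

    /** Converts a cycle count to nanoseconds for a given clock frequency in Hz. */
    public static double cyclesToNanos(long cycles, double clockHz) {
        return cycles * 1e9 / clockHz;
    }

    public static void main(String[] args) {
        double clockHz = 3.0e9; // assumed 3 GHz core, not a measured value
        for (long cycles : new long[] {2000, 4000}) {
            System.out.printf("%d cycles ~ %.0f ns%n", cycles, cyclesToNanos(cycles, clockHz));
        }
    }
}
```

At that assumed clock speed the range works out to roughly 0.7 to 1.3 microseconds per switch. This counts only the switch itself and ignores indirect costs such as cache misses afterwards.
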
+0

Thanks a lot, it's great to have a test case. – Benj 2009-11-17 14:21:52

+0

Thanks for posting the link to "Writing Java Multithreaded Servers - whats old is new". I had forgotten its name and could not find it. – 2009-11-17 15:05:09

1

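For your questions, the best approach is probably to build a test program, collect some hard measurements, and make the decision based on that data. I usually do this when facing such decisions, and it helps to have hard numbers to back up your argument.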

Before you start, though: how many threads are you talking about? And what type of hardware does your software run on?

+0

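Good idea. The program I work on is peer-to-peer, where one peer may be in contact with many others. The peers can be Linux/Windows/Mac (various flavours), and it typically runs on a PC, usually the kind found in office environments (i.e. 2+ CPUs). – Benj 2009-11-17 14:17:55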

1

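For 100 connections you are unlikely to have a problem with blocking I/O and two threads per connection (one for reading and one for writing). That is the simplest model, IMHO.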
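
A minimal sketch of that two-threads-per-connection model, assuming a line-based text protocol: the class name `TwoThreadConnection`, the queues and the loopback `demo()` are hypothetical and only there for illustration. One thread blocks on reads, the other blocks on a queue of outgoing lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TwoThreadConnection {
    private final Socket socket;
    private final BlockingQueue<String> inbound = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> outbound = new LinkedBlockingQueue<>();

    public TwoThreadConnection(Socket socket) {
        this.socket = socket;
    }

    /** Starts one reader and one writer thread for this connection. */
    public void start() {
        startDaemon(this::readLoop, "reader");
        startDaemon(this::writeLoop, "writer");
    }

    private static void startDaemon(Runnable body, String name) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // daemon threads so a blocked writer cannot keep the JVM alive
        t.start();
    }

    public void send(String line) {
        outbound.add(line);
    }

    /** Returns the next received line, or null if none arrives within the timeout. */
    public String receive(long timeoutMillis) throws InterruptedException {
        return inbound.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    private void readLoop() {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) { // blocks until a full line arrives
                inbound.add(line);
            }
        } catch (IOException e) {
            // Socket closed or broken: this sketch simply lets the thread end.
        }
    }

    private void writeLoop() {
        try {
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
            while (true) {
                out.write(outbound.take()); // blocks until there is something to send
                out.write('\n');
                out.flush();
            }
        } catch (IOException | InterruptedException e) {
            // Same simplification as in readLoop.
        }
    }

    /** Connects two endpoints over loopback and exchanges one message each way. */
    public static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket a = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket b = server.accept()) {
            TwoThreadConnection client = new TwoThreadConnection(a);
            TwoThreadConnection peer = new TwoThreadConnection(b);
            client.start();
            peer.start();
            client.send("ping");
            peer.send("pong:" + peer.receive(5000));
            return client.receive(5000);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "pong:ping"
    }
}
```

With 100 peers this means about 200 mostly idle threads, each parked in a blocking call, and the per-connection logic stays linear instead of being spread across a state table.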

However, you may find that JMS is a better way to manage your connections. If you use something like ActiveMQ, you can consolidate all your connections.