2014-08-27 65 views

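Subsetting a data frame with ddply, then applying a function with adply to each subset in R

I'm having some trouble writing the logic for this with plyr. My problem involves two large data frames of different lengths, for example: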

dfSample <- 
structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L, 
33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L, 
16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware", 
"Application Integration", "Application Monitoring", "Application Testing", 
"Audio Visual Support", "Audio Visual Support - CLS", "Audio Visual Support - Non-CLS", 
"Backup Services", "Banner", "Bus and Law", "Business Analysis", 
"Careers", "Common Learning Spaces", "Communication and Marketing", 
"Computer Aided Assessment", "Conference Accounts", "Content Management", 
"Database Services", "Datacentre", "Desktop Monitoring", "Desktop Software", 
"Document Management", "Email", "Email Programs", "Encryption", 
"Eng and the Enviro", "Equipment Disposal", "Estates and Facilities", 
"Examination Papers", "Faculty Engagement", "Filestore Support Services", 
"Finance Services", "General Admin Services", "General InfoSec Advice", 
"Generic Accounts", "Grid Accounts (HPC)", "Health Sciences", 
"High Performance Computing (HPC)", "Hosted webspace (LAMP/IIS)", 
"HR and Payroll Services", "HR General", "HR Recruitment", "HR Systems", 
"Hub Rooms", "Humanities", "ICT Facilities", "ID Card Services", 
"Identity Management (User accounts)", "Identity Services", "Information Policy Breaches", 
"Information Risk Analysis", "iSolutions Admin Services", "iSolutions Administration", 
"IT Training and Development", "Large File Transfer", "Lecture Capture", 
"Lecture Capture - CLS", "Lecture Capture - Non-CLS", "Legacy Corporate Systems", 
"Library Services", "Licence Management", "Managed Print Service", 
"Management Servers", "Media Asset Management", "Media Support", 
"Medicine", "Meet and Greet", "Misuse and Security Incidents", 
"Misuse Of Systems", "Mobile Apps", "Mobile Devices", "Natural and Enviro Sci", 
"Network Access Services", "Network Services", "OS Builds", "Other Learning Systems", 
"Personal Filestore", "Personal web pages", "Phys and Applied", 
"Printing (Managed)", "Printing (Not MPS)", "Project Management and Resourcing", 
"Repair", "Reporting Services", "Request for Software", "Research Filestore", 
"Research Governance", "Research Management", "Research Output", 
    "Resource Filestore", "Risk Analysis and Assessment", "Security", 
"Self Service Help", "Server Monitoring", "Service Hosting", 
"ServiceLine", "Soc and Human Sci", "Software Configuration Management", 
"Software Licensing and Management", "Software Services", "SportRec", 
"Staff Accounts", "Staff Desktop Deployment", "Staff Desktop Services", 
"Staff Desktop Services (Not UoS Build)", "Student Accounts", 
"Student Admin Services", "Student Personal Workstations", "SUSSED", 
"Switchboard", "Switchboard Infrastructure", "System Access Request", 
"Telephony", "University Admin Services", "Unmanaged Printing", 
"Videoconferencing", "Videoconferencing - CLS", "Videoconferencing - Non-CLS", 
"Virtual Learning Environment (VLE)", "Visitor Accounts", "Web Statistics", 
"Windows Core Environment"), class = "factor"), Tkt.Category = structure(c(19L, 
17L, 17L, 17L, 17L, 17L, 2L, 19L, 5L, 2L, 9L, 9L, 9L, 4L, 2L, 
5L, 20L, 2L, 19L, 20L), .Label = c("Communication and Collaboration", 
"Corporate Services", "Data Centre", "Data Storage Services", 
"Desktop IT", "Faculty IT", "Help Services", "HR", "Identity Management (User accounts)", 
"Information Security", "Logistics", "Programmes and Projects", 
"Quality and Testing", "Research Services", "Security", "SLO Corporate Services", 
"Software", "Standard", "Teaching Services", "Underpinning Services", 
"Web Services"), class = "factor"), `CreateDateTime` = structure(c(1370087940, 
1370156160, 1370162340, 1370178840, 1370190000, 1370240400, 1370242920, 
1370243040, 1370243040, 1370243280, 1370243280, 1370243520, 1370243580, 
1370243880, 1370243880, 1370244000, 1370244120, 1370244240, 1370244300, 
1370244360), class = c("POSIXct", "POSIXt")), `ClosingDateTime` = structure(c(1374501300, 
1372068300, 1379062020, 1390487100, 1379062080, 1375090560, 1373984760, 
1370856420, 1370440140, 1370508240, 1370338080, 1370243820, 1370243700, 
1370255520, 1370341440, 1370248680, 1370353560, 1370338800, 1370257140, 
1374222600), class = c("POSIXct", "POSIXt"))), .Names = c("Type", 
"Tkt.Category", "CreateDateTime", "ClosingDateTime" 
), row.names = c(NA, 20L), class = "data.frame") 

and

DF2<- 
structure(list(DateTime = structure(c(1370041200, 1370052000, 
1370062800, 1370073600, 1370084400, 1370095200, 1370106000, 1370116800, 
1370127600, 1370138400, 1370149200, 1370160000, 1370170800, 1370181600, 
1370192400, 1370203200, 1370214000, 1370224800, 1370235600, 1370246400 
), class = c("POSIXct", "POSIXt"))), .Names = "DateTime", row.names = c(NA, 
20L), class = "data.frame") 

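For each Tkt.Category, I want to take every timestamp in DF2 and count the rows in the matching subset of dfSample that satisfy some conditions, as follows: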

QCalc <- function(m) { 
    adply(DF2, 1, transform, q=as.character(
           nrow(subset(m, CreateDateTime <= DateTime & 
               ClosingDateTime >= DateTime)))) 
} 

ServiceQueue <- ddply(dfSample, .(Tkt.Category), QCalc) 

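This doesn't seem to work, so I guess there must be a problem with the way I've written the function, because the following piece of code works when I use all of my data (rather than grouping by Tkt.Category):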

Q <- adply(DF2, 1, transform, q=as.character(
            nrow(subset(dfSample, CreateDateTime <= DateTime & 
                 ClosingDateTime >= DateTime)))) 

When I use ddply, the error message I get is that object 'm' cannot be found. Can someone point me in the right direction to solve this?

Answer


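If we restate your problem, I think we can see a simpler way to solve it. For each ticket category and each timestamp in the list, you want to count how many tickets in that category were created at or before the timestamp and closed at or after it. In SQL we would write something like: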

SELECT Tkt.Category, DateTime, count(*) 
FROM dfSample join DF2 on 
CreateDateTime<= DateTime 
and ClosingDateTime>= DateTime 
GROUP BY Tkt.Category, DateTime 

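But this isn't SQL, it's R, and base R doesn't let us merge on an inequality (though perhaps it should; are you pulling this data from a relational database?). So instead we can use a little trick with merge and avoid plyr altogether: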

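dfSample$id <- rownames(dfSample) 
# No shared columns, so merge() returns every ticket paired with every timestamp
DFc <- merge(dfSample, DF2, by = NULL) 
# Keep only the pairs where the ticket was open at that timestamp
DFlimited <- DFc[DFc$CreateDateTime <= DFc$DateTime & DFc$ClosingDateTime >= DFc$DateTime, ] 
# Count the open tickets per category and timestamp
DFagg <- aggregate(id ~ Tkt.Category + DateTime, data = DFlimited, length) 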
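
To see what the merge step is doing, here is a small sketch on made-up data. The `tickets` and `stamps` data frames and all their values are invented for illustration and are not taken from the question. Because the two data frames share no column names, `merge()` pairs every ticket with every timestamp, and the filter then keeps only the pairs where the ticket was open:

```r
# Toy data, invented for illustration only
tickets <- data.frame(
  Tkt.Category    = c("A", "A", "B"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 08:00", "2013-06-01 10:00",
                                 "2013-06-01 09:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 11:00",
                                 "2013-06-01 18:00"), tz = "UTC")
)
stamps <- data.frame(
  DateTime = as.POSIXct(c("2013-06-01 10:30", "2013-06-01 15:00"), tz = "UTC")
)
tickets$id <- rownames(tickets)

# Cartesian product: 3 tickets x 2 timestamps = 6 rows
crossed <- merge(tickets, stamps, by = NULL)

# Keep the (ticket, timestamp) pairs where the ticket was open
open <- crossed[crossed$CreateDateTime <= crossed$DateTime &
                crossed$ClosingDateTime >= crossed$DateTime, ]

# One row per category and timestamp, holding the number of open tickets
counts <- aggregate(id ~ Tkt.Category + DateTime, data = open, FUN = length)
names(counts)[names(counts) == "id"] <- "q"
print(counts)
```

Note that a category with no open tickets at a given timestamp simply does not appear in `counts`, since the filter drops those pairs before aggregating. The adply version in the question would report a zero for them instead.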

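The merge approach can be quite slow, depending on the size of your tables, because it is essentially doing a cross join (every row of one table paired with every row of the other) and then filtering. If you find that to be the case, take a look at the data.table package; you can see this Stack Overflow question for more information.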
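
If the merged table is too large to hold in memory, one alternative is to count within each group without ever building the full set of pairs. The following is only a sketch under stated assumptions: `count_open` is a hypothetical helper and the toy data is invented. It also sidesteps what is most likely the cause of the "object 'm' not found" error: `transform()` evaluates its arguments in the data frame and then in the frame it was called from, which under `adply` is plyr's own internals rather than the body of `QCalc`, so `m` is not visible there. Passing the subset as an ordinary function argument avoids that lookup.

```r
# Hypothetical helper: for one subset of tickets `m`, count how many were
# open at each timestamp in `stamps` (a POSIXct vector)
count_open <- function(m, stamps) {
  q <- vapply(seq_along(stamps), function(i) {
    sum(m$CreateDateTime <= stamps[i] & m$ClosingDateTime >= stamps[i])
  }, integer(1))
  data.frame(DateTime = stamps, q = q)
}

# Toy data, invented for illustration only
tickets <- data.frame(
  Tkt.Category    = c("A", "A", "B"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 08:00", "2013-06-01 10:00",
                                 "2013-06-01 09:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 11:00",
                                 "2013-06-01 18:00"), tz = "UTC")
)
stamps <- as.POSIXct(c("2013-06-01 10:30", "2013-06-01 15:00"), tz = "UTC")

# Base R: split by category, count within each piece, then recombine
pieces <- split(tickets, tickets$Tkt.Category)
result <- do.call(rbind, lapply(names(pieces), function(k) {
  cbind(Tkt.Category = k, count_open(pieces[[k]], stamps))
}))
print(result)

# If plyr is installed, the same helper can be handed to ddply directly
if (requireNamespace("plyr", quietly = TRUE)) {
  result_plyr <- plyr::ddply(tickets, "Tkt.Category", count_open, stamps = stamps)
  print(result_plyr)
}
```

This still compares every ticket with every timestamp, so the running time grows with the product of the two table sizes, but it only ever holds one group and one timestamp's comparison in memory at a time. To group by more than one column with plyr, the second argument to `ddply` can be a character vector such as `c("Type", "Tkt.Category")`.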


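I'm having trouble merging the two data frames, because they have different lengths (one has 70,816 rows, the other 2,921). I tried using all = TRUE, but it keeps freezing my computer. Is there another way to do this? – NarT 2014-08-28 14:45:44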


I wanted to use plyr because, further on, I will have to group the counts by Type and Tkt.Category. – NarT 2014-08-28 14:47:57