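Using ddply to subset a data frame by group, then applying a function to each subset with adply (R)

I'm having some trouble writing this logic with plyr. My problem involves two large data frames of different lengths; here is a small example: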
dfSample <-
structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L,
33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L,
16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware",
"Application Integration", "Application Monitoring", "Application Testing",
"Audio Visual Support", "Audio Visual Support - CLS", "Audio Visual Support - Non-CLS",
"Backup Services", "Banner", "Bus and Law", "Business Analysis",
"Careers", "Common Learning Spaces", "Communication and Marketing",
"Computer Aided Assessment", "Conference Accounts", "Content Management",
"Database Services", "Datacentre", "Desktop Monitoring", "Desktop Software",
"Document Management", "Email", "Email Programs", "Encryption",
"Eng and the Enviro", "Equipment Disposal", "Estates and Facilities",
"Examination Papers", "Faculty Engagement", "Filestore Support Services",
"Finance Services", "General Admin Services", "General InfoSec Advice",
"Generic Accounts", "Grid Accounts (HPC)", "Health Sciences",
"High Performance Computing (HPC)", "Hosted webspace (LAMP/IIS)",
"HR and Payroll Services", "HR General", "HR Recruitment", "HR Systems",
"Hub Rooms", "Humanities", "ICT Facilities", "ID Card Services",
"Identity Management (User accounts)", "Identity Services", "Information Policy Breaches",
"Information Risk Analysis", "iSolutions Admin Services", "iSolutions Administration",
"IT Training and Development", "Large File Transfer", "Lecture Capture",
"Lecture Capture - CLS", "Lecture Capture - Non-CLS", "Legacy Corporate Systems",
"Library Services", "Licence Management", "Managed Print Service",
"Management Servers", "Media Asset Management", "Media Support",
"Medicine", "Meet and Greet", "Misuse and Security Incidents",
"Misuse Of Systems", "Mobile Apps", "Mobile Devices", "Natural and Enviro Sci",
"Network Access Services", "Network Services", "OS Builds", "Other Learning Systems",
"Personal Filestore", "Personal web pages", "Phys and Applied",
"Printing (Managed)", "Printing (Not MPS)", "Project Management and Resourcing",
"Repair", "Reporting Services", "Request for Software", "Research Filestore",
"Research Governance", "Research Management", "Research Output",
"Resource Filestore", "Risk Analysis and Assessment", "Security",
"Self Service Help", "Server Monitoring", "Service Hosting",
"ServiceLine", "Soc and Human Sci", "Software Configuration Management",
"Software Licensing and Management", "Software Services", "SportRec",
"Staff Accounts", "Staff Desktop Deployment", "Staff Desktop Services",
"Staff Desktop Services (Not UoS Build)", "Student Accounts",
"Student Admin Services", "Student Personal Workstations", "SUSSED",
"Switchboard", "Switchboard Infrastructure", "System Access Request",
"Telephony", "University Admin Services", "Unmanaged Printing",
"Videoconferencing", "Videoconferencing - CLS", "Videoconferencing - Non-CLS",
"Virtual Learning Environment (VLE)", "Visitor Accounts", "Web Statistics",
"Windows Core Environment"), class = "factor"), Tkt.Category = structure(c(19L,
17L, 17L, 17L, 17L, 17L, 2L, 19L, 5L, 2L, 9L, 9L, 9L, 4L, 2L,
5L, 20L, 2L, 19L, 20L), .Label = c("Communication and Collaboration",
"Corporate Services", "Data Centre", "Data Storage Services",
"Desktop IT", "Faculty IT", "Help Services", "HR", "Identity Management (User accounts)",
"Information Security", "Logistics", "Programmes and Projects",
"Quality and Testing", "Research Services", "Security", "SLO Corporate Services",
"Software", "Standard", "Teaching Services", "Underpinning Services",
"Web Services"), class = "factor"), `CreateDateTime` = structure(c(1370087940,
1370156160, 1370162340, 1370178840, 1370190000, 1370240400, 1370242920,
1370243040, 1370243040, 1370243280, 1370243280, 1370243520, 1370243580,
1370243880, 1370243880, 1370244000, 1370244120, 1370244240, 1370244300,
1370244360), class = c("POSIXct", "POSIXt")), `ClosingDateTime` = structure(c(1374501300,
1372068300, 1379062020, 1390487100, 1379062080, 1375090560, 1373984760,
1370856420, 1370440140, 1370508240, 1370338080, 1370243820, 1370243700,
1370255520, 1370341440, 1370248680, 1370353560, 1370338800, 1370257140,
1374222600), class = c("POSIXct", "POSIXt"))), .Names = c("Type",
"Tkt.Category", "CreateDateTime", "ClosingDateTime"
), row.names = c(NA, 20L), class = "data.frame")
and
DF2<-
structure(list(DateTime = structure(c(1370041200, 1370052000,
1370062800, 1370073600, 1370084400, 1370095200, 1370106000, 1370116800,
1370127600, 1370138400, 1370149200, 1370160000, 1370170800, 1370181600,
1370192400, 1370203200, 1370214000, 1370224800, 1370235600, 1370246400
), class = c("POSIXct", "POSIXt"))), .Names = "DateTime", row.names = c(NA,
20L), class = "data.frame")
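For each Tkt.Category, I want to count, for every DateTime in DF2, the rows of dfSample that satisfy certain conditions (created at or before that time and closed at or after it), as follows: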
QCalc <- function(m) {
adply(DF2, 1, transform, q=as.character(
nrow(subset(m, CreateDateTime <= DateTime &
ClosingDateTime >= DateTime))))
}
ServiceQueue <- ddply(dfSample, .(Tkt.Category), QCalc)
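This doesn't seem to work, so I guess there must be a problem with the way I wrote the function, because the code below works when I use all of my data (rather than grouping by Tkt.Category):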
Q <- adply(DF2, 1, transform, q=as.character(
nrow(subset(dfSample, CreateDateTime <= DateTime &
ClosingDateTime >= DateTime))))
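A minimal sketch, assuming the goal is only the count per DF2 timestamp: the same number can be computed without adply or transform by looping over the timestamps and summing a logical vector. The helper name `countOpen` and the toy data below are hypothetical stand-ins for dfSample and DF2.

```r
# Hypothetical toy data standing in for dfSample and DF2.
tickets <- data.frame(
  CreateDateTime  = as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00",
                                 "2013-06-01 12:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 13:00", "2013-06-01 11:30",
                                 "2013-06-01 14:00"), tz = "UTC")
)
times <- data.frame(DateTime = as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:15",
                                             "2013-06-01 12:30"), tz = "UTC"))

# countOpen (hypothetical helper): for each time t, count the rows of m
# with CreateDateTime <= t and ClosingDateTime >= t.
countOpen <- function(m, t) {
  vapply(seq_along(t), function(i) {
    sum(m$CreateDateTime <= t[i] & m$ClosingDateTime >= t[i])
  }, integer(1))
}

countOpen(tickets, times$DateTime)
# [1] 0 2 2
```

With the real data this would be `countOpen(dfSample, DF2$DateTime)`.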
When I use ddply, the error message I get is that object 'm' cannot be found. Can anyone point me in the right direction to solve this?
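The error most likely comes from `transform()`: it evaluates its `q = ...` argument with non-standard evaluation, first among the columns of the current DF2 row and then in the environment of the function that called it (here, plyr's internals), not in `QCalc`'s environment. At top level `dfSample` still resolves because it lives in the global environment, but the local argument `m` does not. A minimal sketch of one way around this, assuming plyr is installed: replace `transform` with an anonymous function defined inside `QCalc`, so `m` is found by ordinary lexical scoping. The data frames `dfToy` and `DF2toy` are hypothetical stand-ins for dfSample and DF2.

```r
library(plyr)

# Hypothetical toy data standing in for dfSample and DF2.
dfToy <- data.frame(
  Tkt.Category    = c("A", "A", "B"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00",
                                 "2013-06-01 12:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 13:00", "2013-06-01 11:30",
                                 "2013-06-01 14:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
DF2toy <- data.frame(DateTime = as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:15",
                                             "2013-06-01 12:30"), tz = "UTC"))

QCalc <- function(m) {
  # The anonymous function is created inside QCalc, so `m` is visible to it
  # through normal lexical scoping; transform() is no longer involved.
  adply(DF2toy, 1, function(row) {
    data.frame(q = sum(m$CreateDateTime <= row$DateTime &
                       m$ClosingDateTime >= row$DateTime))
  })
}

# One block of DF2toy rows per Tkt.Category, each with a count column q.
ServiceQueue <- ddply(dfToy, .(Tkt.Category), QCalc)
```

Here `q` is kept numeric instead of being wrapped in `as.character()`, which makes later summing easier; that is a design choice, not a requirement of the fix.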
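I'm having trouble merging the two data frames because they are of different lengths (one has 70,816 rows, the other 2,921). I tried using all = TRUE, but it keeps freezing my computer. Is there another way to do this? – NarT 2014-08-28 14:45:44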
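If the two data frames share no column, `merge()` has no key to join on and returns their Cartesian product, which for 70,816 × 2,921 rows is about 207 million rows; that would explain the freeze. The count does not actually need a merge. A sketch of a faster alternative, assuming every ticket has ClosingDateTime >= CreateDateTime: the number of tickets open at time t equals the number created at or before t minus the number closed strictly before t, and both counts come from `findInterval()` on sorted timestamps (`left.open` needs R 3.3.0 or later). The helper name `countOpenFast` and the toy vectors are hypothetical.

```r
# Hypothetical toy timestamps standing in for dfSample and DF2 columns.
create_t <- as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00",
                         "2013-06-01 12:00"), tz = "UTC")
close_t  <- as.POSIXct(c("2013-06-01 13:00", "2013-06-01 11:30",
                         "2013-06-01 14:00"), tz = "UTC")
times    <- as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:15",
                         "2013-06-01 12:30"), tz = "UTC")

# countOpenFast (hypothetical helper): tickets with create <= t <= closing
# for every t in `times`, without building any merged table.
# Assumes closing >= create for every ticket.
countOpenFast <- function(create, closing, times) {
  t <- as.numeric(times)
  # Number of tickets created at or before each t.
  created <- findInterval(t, sort(as.numeric(create)))
  # Number of tickets closed strictly before each t.
  closed <- findInterval(t, sort(as.numeric(closing)), left.open = TRUE)
  created - closed
}

countOpenFast(create_t, close_t, times)
# [1] 0 2 2
```

With the real data this would be `countOpenFast(dfSample$CreateDateTime, dfSample$ClosingDateTime, DF2$DateTime)`; each column is sorted once instead of every ticket being compared with every timestamp.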
I want to use plyr because, further on, I will have to group the counts by Type and Tkt.Category. – NarT 2014-08-28 14:47:57
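Grouping by both columns only changes the `.variables` argument of `ddply()`. A minimal sketch, assuming plyr is installed and reusing the question's per-timestamp condition; the toy data frame and the result name `queueByTypeCat` are hypothetical. Each (Type, Tkt.Category) group yields one row per timestamp.

```r
library(plyr)

# Hypothetical toy data standing in for dfSample and DF2.
tickets <- data.frame(
  Type            = c("Banner", "Banner", "Security"),
  Tkt.Category    = c("Corporate Services", "Corporate Services", "Desktop IT"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00",
                                 "2013-06-01 12:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 13:00", "2013-06-01 11:30",
                                 "2013-06-01 14:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
times <- as.POSIXct(c("2013-06-01 09:00", "2013-06-01 11:15",
                      "2013-06-01 12:30"), tz = "UTC")

# For each (Type, Tkt.Category) subset m, count the tickets open at each time.
queueByTypeCat <- ddply(tickets, .(Type, Tkt.Category), function(m) {
  data.frame(
    DateTime = times,
    q = vapply(seq_along(times), function(i) {
      sum(m$CreateDateTime <= times[i] & m$ClosingDateTime >= times[i])
    }, integer(1))
  )
})
```

The result has columns Type, Tkt.Category, DateTime and q, so further summaries (for example totals per Type) can be taken with another `ddply()` call on it.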