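R: iterating through a 50k-row data frame takes a long time
I am writing a simple program that should split one .tsv file into multiple .csv files. The problem is that it takes far too long (I think 9 minutes for ~50k rows is terrible performance). Could someone look at my code and tell me what I am doing wrong?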
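I have a table that contains the name of the participant, the name of the media file, a timestamp, and some coordinate data. The data can contain one or more participants, and each participant works with two media files. I want to create a separate csv file for each media file that a given participant worked with.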
For example, with 2 participants P1 and P2, each working with media files M1 and M2, I want to create P1_M1.csv, P1_M2.csv, P2_M1.csv, and P2_M2.csv.
The data looks like this:
P1 | M1 | data...
P1 | M1 | data...
...
P1 | M2 | data...
...
P2 | M1 | data...
...
...
Here is my code:
data = read.table("./data.tsv", header = T, sep = "\t", stringsAsFactors = F) # load data from tsv
# function for creating csv file
writeData = function(filename, d){
  filename = paste("./", filename, ".csv", sep = "")
  write.csv(d, file = filename, row.names = F)
}
# initialize auxiliary variables
participantName = ""
mediaName = ""
# initialize empty dataframe
subdata <- data.frame(TimeStamp = numeric(), GazeLeftX = integer(), GazeLeftY = integer(), GazeRightX = integer(), GazeRightY = integer())
# for each row in original data...
for(r in seq_len(nrow(data)))
{
  # check if last participant is same as participant on current row
  if(participantName != data[r, 'ParticipantName']){
    # check if last participant is not empty (i.e. some participant was already processed)
    if(participantName != ""){
      # if it is not, then the participant and also his work on the media file ended, so write data to csv
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      # empty auxiliary dataframe and also mediaName
      subdata = subdata[0,]
      mediaName = ""
    }
    # we detected new participant so record it into last participant variable
    participantName = data[r, 'ParticipantName']
  }
  # do same checks for media file because only the media file can change while the participant stays the same
  if(mediaName != data[r, 'MediaName']){
    if(mediaName != ""){
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      subdata = subdata[0,]
    }
    mediaName = data[r, 'MediaName']
  }
  # in every iteration append current row to the auxiliary dataframe
  # (TimeStamp must name the first column inside data.frame(), not the rbind argument)
  subdata = rbind(subdata,
                  data.frame(TimeStamp = data[r, 'EyeTrackerTimestamp'],
                             GazeLeftX = data[r, 'GazeLeftX'],
                             GazeLeftY = data[r, 'GazeLeftY'],
                             GazeRightX = data[r, 'GazeRightX'],
                             GazeRightY = data[r, 'GazeRightY']))
}
# if there are any data left in auxiliary dataframe, save it to csv
if(nrow(subdata) != 0){
  writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
}
See '?split'. Try, for example, 'split(data, data[, c("ParticipantName", "MediaName")])'. – nicola
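@nicola Thank you very much, that is great. If you like, you can post it as an answer and I will mark it as the solution. I have just one problem now: my code creates only one csv file, but that is probably just some silly mistake in my code :) – Gondil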
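For reference, here is a minimal sketch of how the split suggestion could be turned into one csv file per participant/media pair. It is not the code either commenter actually used. The sample data, the out_dir location, and the explicit interaction() key are assumptions made for illustration; only the column names come from the question. Building the key with sep = "_" makes the group names match the desired file names, so each group gets its own file instead of overwriting a single one.

```r
# Hypothetical sample data; only the column names are taken from the question.
data <- data.frame(
  ParticipantName     = c("P1", "P1", "P1", "P2", "P2"),
  MediaName           = c("M1", "M1", "M2", "M1", "M2"),
  EyeTrackerTimestamp = c(1, 2, 3, 4, 5),
  GazeLeftX  = c(10L, 11L, 12L, 13L, 14L),
  GazeLeftY  = c(20L, 21L, 22L, 23L, 24L),
  GazeRightX = c(30L, 31L, 32L, 33L, 34L),
  GazeRightY = c(40L, 41L, 42L, 43L, 44L),
  stringsAsFactors = FALSE
)

# Assumption: write to a temporary directory rather than "./"
out_dir <- tempdir()

# Keep only the output columns and rename the timestamp as in the question's loop
keep <- c("EyeTrackerTimestamp", "GazeLeftX", "GazeLeftY", "GazeRightX", "GazeRightY")
out <- data[, keep]
names(out)[names(out) == "EyeTrackerTimestamp"] <- "TimeStamp"

# One key per row, e.g. "P1_M1"; drop = TRUE omits combinations with no rows
key <- interaction(data$ParticipantName, data$MediaName, sep = "_", drop = TRUE)

# A named list with one data frame per participant/media combination
groups <- split(out, key)

# Write every group to its own file, named after its key
for (name in names(groups)) {
  write.csv(groups[[name]],
            file = file.path(out_dir, paste0(name, ".csv")),
            row.names = FALSE)
}

written <- file.path(out_dir, paste0(names(groups), ".csv"))
```

This avoids growing a data frame row by row with rbind, which copies the accumulated rows on every iteration; split groups all rows in a single pass.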