2016-12-28 80 views
2

I want to create a plot like the ones produced by the software called Fathom. How do I create a "clustered dot plot" for categorical data?

http://fathom.concord.org/help/HelpFiles/_img331.png

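I have a two-way table of absolute frequency data and would like to create something like a fluctuation plot, with the key difference that the individual data points are visible. I have tried ggfluctuation(...), levelplots(...) and various packages (such as ggplot2), without success. I could not find any help on any forum.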

If anyone could point me in the right direction or help me write some code that achieves this, I would be very grateful.

+1

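Hi Darshan. I would be happy to provide some sample data, but I am not sure how to post it on this forum. Can you suggest the best format for you to accept and run it? – Nevil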

+2

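Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo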

+0

OK, here is a sample data set. I am looking for a plot with "set" on the y-axis and "grade" on the x-axis, where the "freq" vector drives the number of dots displayed in each cell. sample_data <- data.frame("set" = c("09t0101 TJ", "09t0102 MW", "09t0201 EH", "09t0202 NH"), "grade" = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3", "4", "4", "4", "4"), "freq" = sample.int(length(0:10), 16, replace = TRUE)) – Nevil

Answer

3

Here is an improved version.

sample_data = structure(list(set = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), class = "factor", .Label = c("09t0101 TJ", 
"09t0102 MW", "09t0201 EH", "09t0202 NH")), grade = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", 
"2", "3", "4"), class = "factor"), freq = c(7L, 8L, 2L, 3L, 11L, 
4L, 11L, 3L, 3L, 8L, 3L, 8L, 3L, 9L, 3L, 2L)), .Names = c("set", 
"grade", "freq"), row.names = c(NA, -16L), class = "data.frame") 

group = unique(sample_data$set) # Unique 'set' values for the y-axis
max_x = length(unique(sample_data$grade)) # Number of 'grade' values to plot on the x-axis
max_y = length(group) # Number of 'set' values to plot on the y-axis
pdf(file="plot.pdf",width=8,height=6)
par(mar = c(5, 10, 4, 2)) # c(bottom, left, top, right)
plot(max_x,max_y,xlim=c(0.5,max_x+0.5),ylim=c(0.5,max_y +0.5),pch=NA,xlab="Grades",ylab=NA,xaxt="n",yaxt="n",asp=1) # asp = 1 is IMPORTANT: it keeps the circles round
axis(side = 2, at=c(1:length(group)), labels=c(as.vector(group)),las=2)
axis(side = 1, at=c(1:length(unique(sample_data$grade))), labels=c(as.vector(unique(sample_data$grade))))

r = 0.15 # Spacing between neighbouring circle centres (each circle has radius r/2.25)

for (i in 1:length(group)){
  a = subset(sample_data,sample_data$set==group[i]) # Rows of the data.frame for the i-th 'set'

  for (j in 1:nrow(a)){
    matrix_sz = ceiling(sqrt(a$freq[j])) # Side of the smallest square grid that can accommodate the frequency
    matrix_x = matrix(nrow = matrix_sz, ncol = matrix_sz) # Initialise matrix of relative x co-ordinates
    matrix_y = matrix(nrow = matrix_sz, ncol = matrix_sz) # Initialise matrix of relative y co-ordinates
    matrix_x[,1] = -1*((matrix_sz/2) - 0.5) # Relative x co-ordinate of the first column
    matrix_y[1,] = 1*((matrix_sz/2) - 0.5) # Relative y co-ordinate of the first row

    # Fill in the other relative co-ordinates if the grid is larger than 1x1
    if (matrix_sz > 1){
      for (column in 2:matrix_sz){
        matrix_x[,column] = matrix_x[,column - 1] + 1
      }
      for (row in 2:matrix_sz){
        matrix_y[row,] = matrix_y[row-1,] - 1
      }
    }

    # Co-ordinates of the centre of the square grid
    xx = as.integer(a$grade[j])
    yy = i
    fq = 1 # Index of the next point to plot, counted against 'freq'

    # Plot circles around the centre using the relative co-ordinates
    for (row in 1:matrix_sz){
      for (column in 1:matrix_sz){
        if (fq > a$freq[j]){break} # Stop once the required number of points has been plotted
        xx1 = xx + r * matrix_x[row, column]
        yy1 = yy + r * matrix_y[row, column]
        # points (x = xx1, y = yy1, pch=1)
        fq = fq + 1
        symbols (x = xx1, y = yy1, circles=c(r/2.25),add =TRUE,inches=FALSE,bg = "gray")
      }
    }
  }
}
dev.off()
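
For readers who want to see the layout rule on its own, here is a minimal sketch of the same row-by-row grid filling written without loops. The helper name cluster_offsets is hypothetical and not part of the code above; it only computes the relative offsets that the nested loops produce through matrix_x and matrix_y.

```r
# Hypothetical helper (an assumption, not part of the answer's code):
# relative grid offsets for n points, filled row by row from the top-left,
# mirroring the matrix_x / matrix_y logic of the loops.
cluster_offsets <- function(n) {
  if (n < 1) {
    return(data.frame(dx = numeric(0), dy = numeric(0)))
  }
  size <- ceiling(sqrt(n))  # side of the smallest square grid holding n points
  k <- seq_len(n) - 1       # zero-based index of each point
  col <- k %% size          # column within the grid, 0 = leftmost
  row <- k %/% size         # row within the grid, 0 = top
  half <- (size - 1) / 2    # shift so the grid is centred on (0, 0)
  data.frame(dx = col - half, dy = half - row)
}

cluster_offsets(3)
```

Multiplying dx and dy by r and adding them to the cell centre (xx, yy) should give the same circle centres as xx1 and yy1, which would allow a single symbols() call per cell with vectors for x and y.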


+0

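Hi Darshan, this looks very promising! Thank you for the time you have put into this project. I am curious why some points "drift" a little away from the main cluster, for example the grade 2 points for '09t0201 EH'. I will go through your code line by line and try to work out how it does this, but that will take me some time. Any comments you can add to explain it would be gratefully received! – Nevil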

+0

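Ah! I think I know why some points drift. The variable 'theta' still increases in steps of pi/4, when it needs to increase in smaller steps the further the points are from the 'centre' of the cluster. This also affects the 'hypotenuse' values. I can see how you plot a 'spiral' of points that snap to a kind of 'integer grid'. Clever! We just need to generalise your approach to frequencies of any size, not only those below 10 ..... – Nevil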

+0

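Googling around for algorithms that generate a square spiral of points turned up this (although it is not in R). Could this kind of solution be adapted, to avoid basing the plot on an underlying circular structure? http://stackoverflow.com/questions/398299/looping-in-a-spiral – Nevil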
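
One way the square-spiral idea could be sketched in R is shown below. This is an illustration only, not code from the linked question or from the answer: the function name spiral_offsets is hypothetical, and the winding direction (right, up, left, down) is an arbitrary assumption.

```r
# Hypothetical helper (an assumption for illustration): the first n integer
# grid positions of a square spiral that starts at (0, 0) and winds outwards.
spiral_offsets <- function(n) {
  x <- integer(n)
  y <- integer(n)
  if (n < 2) {
    return(data.frame(dx = x, dy = y))
  }
  step_x <- c(1L, 0L, -1L, 0L)  # right, up, left, down
  step_y <- c(0L, 1L, 0L, -1L)
  cx <- 0L                      # current x position
  cy <- 0L                      # current y position
  dir <- 1L                     # index into step_x / step_y
  run <- 1L                     # length of the current straight run
  i <- 1L                       # number of points placed so far
  while (i < n) {
    for (turn in 1:2) {         # each run length is used for two sides
      for (s in seq_len(run)) {
        if (i >= n) break
        cx <- cx + step_x[dir]
        cy <- cy + step_y[dir]
        i <- i + 1L
        x[i] <- cx
        y[i] <- cy
      }
      dir <- dir %% 4L + 1L     # turn 90 degrees counter-clockwise
    }
    run <- run + 1L
  }
  data.frame(dx = x, dy = y)
}

spiral_offsets(9)
```

Because every offset is an integer grid position, scaling dx and dy by a spacing such as r and adding them to the cell centre keeps all points on a regular lattice, with the cluster growing outwards from the centre for any frequency.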