2016-05-30

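Set difference: finding the distinct members of two groups in Tableau Desktop

How do I implement the following set-difference / group-distinction task in Tableau Desktop?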

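I have products that either have or have not been adjusted (adjusted = 1 or adjusted = 0). Note that each product may be listed more than once, because the real dataset is a stacked time-series matrix of the products.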

I want to know how many products were adjusted at least once and how many were never adjusted.

Here is how I would do it in R:

Sample data:

dat <- data.frame(
    product = c("4005808588763", "4005808250004", "4005808157822", 
    "4005800031052", "4005808855735", "4005808651818", "4005808322053", 
    "4005808236879", "4005800091629", "4005808361434", "42277248", 
    "4005808224074", "9005800249858", "42277835", "4005808627356", 
    "8005800010985", "4005808323197", "4005808186129", "4005800059254", 
    "4005808818587", "4005900175410", "72140018627", "4005800059292", 
    "72140008499", "4005808125968", "42269847", "4005808675173", 
    "72140016371", "4005808765157", "400590", "4005808816019", 
    "4005800062575", "4005808293872", "4005900143952", "8850029006536", 
    "4005800136986", "42231493", "4005808715688", "4005800053085", 
    "4005800059629", "4005808847419", "4005800031656", "4005900273994", 
    "4005900261038", "6009661219022", "42240181", "8850029016030", 
    "4005900146274", "42176152", "4005808158096"), 
    adjusted = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
     1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 
     0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 
     1L),
    stringsAsFactors = FALSE  # keep product IDs as character (already the default since R 4.0.0)
)
#   product adjusted 
# 1 4005808588763  1 
# 2 4005808250004  1 
# 3 4005808157822  0 
# 4 4005800031052  1 
# 5 4005808855735  0 
# 6 4005808651818  1 
# 7 4005808322053  1 
# 8 4005808236879  0 
# 9 4005800091629  1 
# 10 4005808361434  0 
# 11  42277248  1 
# 12 4005808224074  1 
# 13 9005800249858  0 
# 14  42277835  0 
# 15 4005808627356  0 
# 16 8005800010985  0 
# 17 4005808323197  0 
# 18 4005808186129  1 
# 19 4005800059254  0 
# 20 4005808818587  0 
# 21 4005900175410  1 
# 22 72140018627  1 
# 23 4005800059292  1 
# 24 72140008499  1 
# 25 4005808125968  1 
# 26  42269847  0 
# 27 4005808675173  1 
# 28 72140016371  1 
# 29 4005808765157  1 
# 30 400590  0 
# 31 4005808816019  0 
# 32 4005800062575  0 
# 33 4005808293872  1 
# 34 4005900143952  0 
# 35 8850029006536  1 
# 36 4005800136986  1 
# 37  42231493  1 
# 38 4005808715688  1 
# 39 4005800053085  0 
# 40 4005800059629  0 
# 41 4005808847419  0 
# 42 4005800031656  1 
# 43 4005900273994  1 
# 44 4005900261038  1 
# 45 6009661219022  1 
# 46  42240181  1 
# 47 8850029016030  1 
# 48 4005900146274  1 
# 49  42176152  0 
# 50 4005808158096  1 

Split the data into two data frames:

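library(dplyr)  # filter() comes from dplyr

g1 <- filter(dat, adjusted == 0)  # rows where the product was not adjusted
g2 <- filter(dat, adjusted == 1)  # rows where the product was adjusted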

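Find the unique product IDs: id_1 holds the products adjusted at least once, id_2 the products that were never adjusted: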

(id_1 <- unique(g2$product)) 
# [1] "4005808588763" "4005808250004" "4005800031052" "4005808651818" "4005808322053" 
# [6] "4005800091629" "42277248"  "4005808224074" "4005808186129" "4005900175410" 
# [11] "72140018627" "4005800059292" "72140008499" "4005808125968" "4005808675173" 
# [16] "72140016371" "4005808765157" "4005808293872" "8850029006536" "4005800136986" 
# [21] "42231493"  "4005808715688" "4005800031656" "4005900273994" "4005900261038" 
# [26] "6009661219022" "42240181"  "8850029016030" "4005900146274" "4005808158096" 

(id_2 <- setdiff(unique(g1$product), id_1)) 
# [1] "4005808157822" "4005808855735" "4005808236879" "4005808361434" "9005800249858" 
# [6] "42277835"  "4005808627356" "8005800010985" "4005808323197" "4005800059254" 
# [11] "4005808818587" "42269847"  "400590" "4005808816019" "4005800062575" 
# [16] "4005900143952" "4005800053085" "4005800059629" "4005808847419" "42176152" 
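If only the two counts are needed, the split-and-setdiff steps can be collapsed into one grouped summary. This is a sketch assuming dplyr (already used above for filter()); dat_ts is a small made-up stacked dataset in which product IDs repeat, as they would in the real time series, and the names dat_ts, per_product and counts are hypothetical.

```r
library(dplyr)

# Hypothetical stacked data: product IDs repeat across time periods
dat_ts <- data.frame(
  product  = c("A", "A", "A", "B", "B", "C", "D", "D", "E"),
  adjusted = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One row per product: TRUE if the product was adjusted at least once
per_product <- dat_ts %>%
  group_by(product) %>%
  summarise(ever_adjusted = any(adjusted == 1))

# Number of products per group (FALSE = never adjusted, TRUE = adjusted at least once)
counts <- per_product %>% count(ever_adjusted)
counts
```

Run on dat from the question, the same pipeline should report 20 products with ever_adjusted FALSE and 30 with TRUE, i.e. the lengths of id_2 and id_1 above.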

Since I'm very new to Tableau, I don't know how to go about implementing a query like this.


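Your question would be easier to follow if you showed your data as a simple table instead of leaving readers to infer from R code what your fields represent. It looks like you have 2 fields: id and value. What defines a time_bucket? Does id identify a product? –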


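@AlexBlakemore: Thanks for your reply, and sorry: I completely forgot to print the dataset so you could get a visual sense of its structure. Never mind what a time bucket is. I simply have products that appear more than once in the dataset, with either adjusted = 0 or adjusted = 1. – Rappster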

Answer


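Tableau has at least two features that are useful for problems like this: computed sets and LOD (level of detail) calculations. There are other options as well.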
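For the LOD route, one possible calculated field (an untested suggestion, not part of the original answer) is { FIXED [Product_ID] : MAX([Adjusted]) }, which gives every row the maximum Adjusted value of its product: 1 if that product was ever adjusted, 0 otherwise. The R sketch below only mimics that per-row, product-level value with base R's ave() so the idea can be checked outside Tableau; dat_ts and distinct_counts are hypothetical names and the data are made up.

```r
# Hypothetical stacked data: product IDs repeat across time periods
dat_ts <- data.frame(
  product  = c("A", "A", "A", "B", "B", "C", "D", "D", "E"),
  adjusted = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Analog of { FIXED [Product_ID] : MAX([Adjusted]) }:
# every row receives the maximum 'adjusted' value of its product
dat_ts$ever_adjusted <- ave(dat_ts$adjusted, dat_ts$product, FUN = max)

# Distinct products per value of the LOD-style field (like COUNT DISTINCT of Product_ID)
distinct_counts <- tapply(dat_ts$product, dat_ts$ever_adjusted,
                          function(ids) length(unique(ids)))
distinct_counts
```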

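Here is one way to do it, using a (computed) set based on the Product_ID field to indicate which products had at least one price adjustment. Select the Product_ID field, right-click it, and create a set. On the General tab choose the "Use all" option, then switch to the Condition tab. There, choose "By field", select the field Adjusted with the aggregation Sum, and set the condition to > 0. In SQL terms, the new set contains the Product_IDs HAVING SUM(Adjusted) > 0.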

Then put the set on the Rows shelf, showing IN/OUT, and put COUNT DISTINCT(Product_ID) on the Columns shelf to show how many products are in the set and how many are not.
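To check the Tableau result against R output, the set condition and the IN/OUT view can be mirrored in base R. This is an analog under stated assumptions, not what Tableau runs internally: rowsum() plays the role of SUM(Adjusted) grouped by Product_ID, the > 0 filter stands in for HAVING, and counting unique IDs stands in for COUNT DISTINCT(Product_ID). The data and the names dat_ts, in_set, label and countd are hypothetical.

```r
# Hypothetical stacked data: product IDs repeat across time periods
dat_ts <- data.frame(
  product  = c("A", "A", "A", "B", "B", "C", "D", "D", "E"),
  adjusted = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# SUM(adjusted) per product: a one-column matrix with product IDs as row names
sums <- rowsum(dat_ts$adjusted, dat_ts$product)

# Set members: products HAVING SUM(adjusted) > 0
in_set <- rownames(sums)[sums[, 1] > 0]

# IN/OUT label for every row, then the number of distinct products per label
label  <- ifelse(dat_ts$product %in% in_set, "In", "Out")
countd <- tapply(dat_ts$product, label, function(ids) length(unique(ids)))
countd

# Plain row counts are larger because products repeat, hence COUNT DISTINCT
table(label)
```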


Sounds good, I'll give it a try. Thanks! – Rappster