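Filter rows in R based on values in multiple rows

I am trying to filter out several unwanted rows of data in R, but I don't know how to go about it.

The data I am using looks a bit like this: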

Category  Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3  Veg Potatoes  0  0  0 
4  Veg Onions  0  0  0 
5  Veg Carrots  0  0  0 
6 Dairy Yoghurt  0  0  0 
7 Dairy  Milk  0  1  0 
8 Dairy Cheese  0  0  0

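I only want to keep the categories in which at least one item has a positive value in at least one shop.

In this case, I want to get rid of all the Veg rows, because no shop sells any vegetables. I want to keep all the Fruit rows, and I want to keep all the Dairy rows, even those with zero values in every shop, because one of the Dairy rows does have a value greater than 0.

I tried using colSums after group_by(Category), hoping it would sum the contents of each category in turn, but it didn't work. I also tried adding a rowSums column at the end and filtering on it, but that way I can only filter individual rows, not rows based on the whole category.

While I can filter out individual all-zero rows (e.g. row 3), my difficulty is with rows like 6 and 8, where the value is zero for every shop but which I want to keep because another Dairy row has a value greater than zero.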

Source

2017-07-31 Rose

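1) subset/ave. rowSums(...) > 0 has one element for each row. That element is TRUE if the row contains a non-zero value. This assumes that negative values are not possible. (If negative values are possible, use rowSums(DF[-1:-2]^2) > 0 instead.) It also assumes that the shops are all the columns other than the first two, so it works for any number of shops. ave then produces TRUE for every row of each group in which any of those values is TRUE, and subset keeps only the rows of those groups. No packages are used.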

subset(DF, ave(rowSums(DF[-1:-2]) > 0, Category, FUN = any))

giving:

Category Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
6 Dairy Yoghurt  0  0  0 
7 Dairy Milk  0  1  0 
8 Dairy Cheese  0  0  0
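
To make the one-liner in 1) easier to follow, here is a minimal sketch that splits it into its intermediate steps. The variable names row_has_sales and group_has_sales are only illustrative and are not part of the original answer; the data frame is rebuilt inline with the values from the question.

```r
# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Step 1: one logical per row, TRUE if the row has any sales
# (assumes no negative values, as the answer states).
row_has_sales <- rowSums(DF[-1:-2]) > 0

# Step 2: apply any() within each Category; every row of a group gets
# TRUE when at least one row of that group is TRUE.
group_has_sales <- ave(row_has_sales, DF$Category, FUN = any)

# Step 3: keep only the rows whose group has sales.
result <- DF[group_has_sales, ]
print(result)
```

Because ave returns a vector as long as its input, the group-level answer is already aligned with the rows, which is why it can be used directly as a row filter.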

1a) A variation of this, if you don't mind hard-coding the shops, would be:

subset(DF, ave(Shop1 + Shop2 + Shop3 > 0, Category, FUN = any))

2) dplyr

library(dplyr) 
DF %>% group_by(Category) %>% filter(any(Shop1, Shop2, Shop3)) %>% ungroup

giving:

# A tibble: 5 x 5
    Category Item Shop1 Shop2 Shop3 
    <fctr> <fctr> <int> <int> <int> 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3 Dairy Yoghurt  0  0  0 
4 Dairy Milk  0  1  0 
5 Dairy Cheese  0  0  0
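
Passing integer columns straight to any() relies on coercion to logical, which R typically reports with a warning. As a hedged variation (not the original answer's code), the comparison can be written out explicitly; this sketch assumes the dplyr package is installed and rebuilds the question's data inline.

```r
library(dplyr)

# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

result <- DF %>%
  group_by(Category) %>%
  # Within each group, keep all rows if any row has a positive value
  # in any of the three (hard-coded) shop columns.
  filter(any(Shop1 > 0 | Shop2 > 0 | Shop3 > 0)) %>%
  ungroup()

print(result)
```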

3) Filter/split. Another base R solution is:

do.call("rbind", Filter(function(x) any(x[-1:-2]), split(DF, DF$Category)))

giving:

 Category Item Shop1 Shop2 Shop3 
Dairy.6 Dairy Yoghurt  0  0  0 
Dairy.7 Dairy Milk  0  1  0 
Dairy.8 Dairy Cheese  0  0  0 
Fruit.1 Fruit Apples  4  6  0 
Fruit.2 Fruit Oranges  0  2  7
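
The split/Filter/rbind chain in 3) can be unpacked as in the following sketch. The names pieces and keep are illustrative only, and the predicate compares against 0 explicitly rather than relying on coercion, which is a small departure from the one-liner.

```r
# Rebuild the example data (same values as in the question).
DF <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Step 1: split into a named list with one data frame per category.
pieces <- split(DF, DF$Category)

# Step 2: keep only the pieces that contain a positive shop value
# (all columns except the first two are treated as shops).
keep <- Filter(function(x) any(x[-1:-2] > 0), pieces)

# Step 3: stack the surviving pieces back into one data frame.
result <- do.call("rbind", keep)
print(result)
```

Note that split orders the pieces by category level, so the rows come back grouped by category rather than in their original order.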

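4) dplyr/tidyr. Use gather to convert the data to long form, where there is one row per value, and then filter the groups using any. Finally, convert back to wide form.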

library(dplyr) 
library(tidyr) 
DF %>% 
    gather(shop, value, -(Category:Item)) %>% 
    group_by(Category) %>% 
    filter(any(value)) %>% 
    ungroup %>% 
    spread(shop, value)

giving:

# A tibble: 5 x 5 
    Category Item Shop1 Shop2 Shop3 
* <fctr> <fctr> <int> <int> <int> 
1 Dairy Cheese  0  0  0 
2 Dairy Milk  0  1  0 
3 Dairy Yoghurt  0  0  0 
4 Fruit Apples  4  6  0 
5 Fruit Oranges  0  2  7

Note: The input in reproducible form:

Lines <- " Category  Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
3  Veg Potatoes  0  0  0 
4  Veg Onions  0  0  0 
5  Veg Carrots  0  0  0 
6 Dairy Yoghurt  0  0  0 
7 Dairy  Milk  0  1  0 
8 Dairy Cheese  0  0  0" 

DF <- read.table(text = Lines)

Source

2017-07-31 12:34:44

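This is great: feed ave a logical vector as its first argument, and the final output can be used directly for subsetting. – lmo

Wow, thank you for the multiple solutions and the clear explanations! – Rose

Here is a base R method using rowSums, ave, and [.

dat[ave(rowSums(dat[grep("Shop", names(dat))]), dat$Category, FUN=max) > 0,]

rowSums calculates the total sales per row across the shop variables (using grep to select those columns). The resulting vector is fed to ave, which groups it by dat$Category and returns the maximum row total for each group. Finally, the original data frame is subset according to whether that maximum is positive.

This returns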

Category Item Shop1 Shop2 Shop3 
1 Fruit Apples  4  6  0 
2 Fruit Oranges  0  2  7 
6 Dairy Yoghurt  0  0  0 
7 Dairy Milk  0  1  0 
8 Dairy Cheese  0  0  0

data

dat <- 
structure(list(Category = structure(c(2L, 2L, 3L, 3L, 3L, 1L, 
1L, 1L), .Label = c("Dairy", "Fruit", "Veg"), class = "factor"), 
    Item = structure(c(1L, 6L, 7L, 5L, 2L, 8L, 4L, 3L), .Label = c("Apples", 
    "Carrots", "Cheese", "Milk", "Onions", "Oranges", "Potatoes", 
    "Yoghurt"), class = "factor"), Shop1 = c(4L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L), Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L 
    ), Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Category", 
"Item", "Shop1", "Shop2", "Shop3"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))
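
To show what each stage of this one-liner produces, here is a minimal sketch with the intermediate results named. The names shop_cols, row_totals and group_max are illustrative, and dat is rebuilt with a plain data.frame call holding the same values as the structure above.

```r
# Compact reconstruction of dat (same values as the structure above).
dat <- data.frame(
  Category = c("Fruit", "Fruit", "Veg", "Veg", "Veg", "Dairy", "Dairy", "Dairy"),
  Item = c("Apples", "Oranges", "Potatoes", "Onions", "Carrots",
           "Yoghurt", "Milk", "Cheese"),
  Shop1 = c(4L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  Shop2 = c(6L, 2L, 0L, 0L, 0L, 0L, 1L, 0L),
  Shop3 = c(0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Positions of the columns whose names contain "Shop".
shop_cols <- grep("Shop", names(dat))

# Total sales per row across those columns.
row_totals <- rowSums(dat[shop_cols])

# Largest row total within each category, repeated for every row of it.
group_max <- ave(row_totals, dat$Category, FUN = max)

# Keep the rows whose category has a positive maximum.
result <- dat[group_max > 0, ]
print(result)
```

As with the rowSums approach in the other answer, this relies on sales never being negative; a category whose rows cancel out within a row would otherwise be misjudged.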

Source

2017-07-31 12:33:14 lmo

Nice. I was about to post df[!!ave(rowSums(df[3:5]), df$Category, FUN = function(i) sum(i) > 0), ] – Sotos
