Is the following algorithm for finding distinct strings valid?

I recently came across a question that went like this:

"Given an array of strings, return the number of distinct strings in that array."

I came up with this solution:

1. Get number_of_strings, which equals the number of strings in the input array 
2. Get number_of_non_redundant, which equals the length of the input array cast as a set 
3. Return 2 times number_of_non_redundant - number_of_strings

So, my question is: does this algorithm work for all data sets?

Source

2012-08-15 mjgpy3

Where does 2 * non_redundant - num_strings come from? Isn't the answer just the length of the set? – 2012-08-15 17:40:52

Isn't 'number_of_non_redundant' already the answer? – Chris 2012-08-15 17:41:12

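As others have pointed out, simply returning number_of_non_redundant appears to be the answer to this problem, and the ideal way to compute it is hashing.

Here is one possible way to determine number_of_non_redundant: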

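1) Create a hash set (language specific).

2) Iterate through the entire array. For each element, check whether it already exists in the hash set, and add it if it does not.

3) Return the size of the hash set.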

Using a hash set here gives constant-time add and contains operations (on average).
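The three steps above can be sketched in Python as follows. This is only an illustration, not the answerer's own code: the function name `count_distinct` is an assumption, and Python's built-in `set` stands in for the "language specific" hash set.

```python
def count_distinct(strings):
    """Count distinct strings by inserting each one into a hash set."""
    seen = set()              # step 1: create the hash set
    for s in strings:         # step 2: visit every element once
        if s not in seen:     # membership test, O(1) on average
            seen.add(s)       # insertion, O(1) on average
    return len(seen)          # step 3: the size of the set is the answer


print(count_distinct(["a", "a", "a", "d", "d", "d"]))  # prints 2
```

The explicit membership check mirrors step 2 as written; in Python it could be dropped, since `set.add` ignores elements that are already present.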

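Also, I'd like to point out that you can't (at least not in any language I know of) simply cast an array to a set. Casting is a constant-time operation. These are two different data structures, so to take the elements from the array and place them in a set, something has to iterate over the array and insert each element into the set.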
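Python is one place where this is easy to observe: `set(items)` reads like a cast, but it is a constructor call that consumes its argument element by element. The sketch below counts how many elements are read during the conversion. The helper name `distinct_and_reads` is hypothetical and exists only for this illustration.

```python
def distinct_and_reads(items):
    """Return (number of distinct items, number of items read by set())."""
    reads = 0

    def counting_iter():
        # Wrap the input so that every element handed to set() is counted.
        nonlocal reads
        for item in items:
            reads += 1
            yield item

    distinct = set(counting_iter())  # the "cast": set() drains the iterator
    return len(distinct), reads


print(distinct_and_reads(["a", "a", "a", "d", "d", "d"]))  # prints (2, 6)
```

All six elements are read to produce a set of size two, so the conversion costs time proportional to the length of the array.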

Source

2012-08-15 18:07:29

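Consider the string array ["a", "a", "a", "d", "d", "d"].

number_of_strings is 6; number_of_non_redundant is 2. You propose returning 2 * 2 - 6 = -2. So... no, your algorithm does not work for all data sets.

Unless I'm badly misunderstanding the problem, though, just returning number_of_non_redundant will always work, since it is by definition what you want to return. :)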
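The counterexample can be checked directly. This is a small sketch in Python. The function name `proposed_count` is an assumption; its variable names follow the question's three steps.

```python
def proposed_count(strings):
    """The formula from the question: 2 * non_redundant - number_of_strings."""
    number_of_strings = len(strings)
    number_of_non_redundant = len(set(strings))
    return 2 * number_of_non_redundant - number_of_strings


words = ["a", "a", "a", "d", "d", "d"]
print(proposed_count(words))  # prints -2
print(len(set(words)))        # prints 2, the actual number of distinct strings
```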

Source

2012-08-15 17:40:35 Dougal

Thanks, this is a very solid answer. – mjgpy3 2012-08-15 18:06:13

How about first sorting the array lexicographically, then traversing it with a flag variable to track changes between the i-th and (i-1)-th elements?
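A minimal Python sketch of the sorting idea is shown below. It is one possible reading of the suggestion rather than the answerer's exact method: instead of a separate flag variable, it increments a counter whenever an element differs from its predecessor. The function name `count_distinct_sorted` is an assumption.

```python
def count_distinct_sorted(strings):
    """Count distinct strings by sorting and comparing adjacent elements."""
    ordered = sorted(strings)  # sorts a copy; equal strings become adjacent
    count = 0
    for i, s in enumerate(ordered):
        # A new distinct value starts at index 0 and wherever the
        # current element differs from the previous one.
        if i == 0 or s != ordered[i - 1]:
            count += 1
    return count


print(count_distinct_sorted(["d", "a", "d", "a", "a", "d"]))  # prints 2
```

Sorting costs O(n log n) comparisons, so this is typically slower than the hash set approach, but it needs no hash table.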

Source

2012-08-15 17:57:24 linello

The algorithm does not work for all data sets. It may work for specific examples.

say n = number of non redundant strings 
p = number of strings in original array

For your formula to be correct, 2n - p = n => n = p

Your algorithm works only when (number of non redundant strings = length of original array), which means only when the original array is already a set.
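Spelled out with the same symbols (n distinct strings, p strings in total, so n ≤ p always holds), the condition and the size of the error are:

$$
2n - p = n \iff n = p, \qquad n - (2n - p) = p - n \ge 0 .
$$

So whenever the array contains duplicates (p > n), the formula undercounts by exactly p - n. With n = 2 and p = 6 it gives 2 - 4 = -2.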

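Just a hint: use hashing if you have enough memory available; otherwise you can do it in place with sorting, though that is slower than hashing.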

Source

2012-08-15 18:00:33 Jeff
