2016-05-30 123 views
2

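Search strings for certain words or phrases

I have over 1000 strings and a fixed array of [sub]strings. I want to know which of my strings contain any of the substrings. (Again, the substrings are constant.) I also want to make sure that whole words are matched, not arbitrary runs of characters.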

What is the most efficient way to do this? Can I do better than calling indexOf() for every substring on each of the 1000 strings?

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell"  
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"] 
str1.indexOf(fixedSearchStrings) // returns nil. "During" is not the word "ring". 
str2.indexOf(fixedSearchStrings) // returns 2. "knock on the door" substring found, no need to check further in the sentence. 
+0

Convert each search string to a regular expression that allows only whitespace or punctuation immediately before and after it. – Sulthan
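To illustrate the comment above, here is a minimal sketch in current Swift. The helper name containsWholePhrase is hypothetical, and the choice to treat "not a letter or digit" as the boundary (rather than listing whitespace and punctuation explicitly) is an assumption, not something stated in the thread.

```swift
import Foundation

// Hypothetical helper (not from the thread): true if `phrase` occurs in `text`
// with no letter or digit directly before or after it.
func containsWholePhrase(_ phrase: String, in text: String) -> Bool {
    // Escape the phrase so characters such as "." or "?" are matched literally.
    let escaped = NSRegularExpression.escapedPattern(for: phrase)
    // Negative lookbehind / lookahead: the neighbours must not be letters or digits.
    let pattern = "(?<![\\p{L}\\p{N}])" + escaped + "(?![\\p{L}\\p{N}])"
    return text.range(of: pattern, options: [.regularExpression, .caseInsensitive]) != nil
}

let sample1 = "During the winter holiday I'll go skiing."
let sample2 = "Do knock on the door or chime the bell"
print(containsWholePhrase("ring the", in: sample1))           // false: "ring" is inside "During"
print(containsWholePhrase("knock on the door", in: sample2))  // true
```

This still runs one search per phrase, so it addresses the whole-word requirement but not the efficiency question.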

Answers

1

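Consider this. The advantage of this solution is that, because fixedSearchStrings is known in advance, you build the index only once and can then reuse it efficiently for every string you search.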

class Index 
{ 
    var indexes: [String: Index] 
    var terminated: Bool = false 

    init() { 
     indexes = [String: Index]() 
    } 

    func searchFor(keywords: [String]) -> String? { 

     var ws = keywords 
     if ws.count > 0 { 

      let word = ws.removeFirst() 
      if let i = indexes[word] { 

       if i.terminated { 
        return word 
       } else { 

        if let rval = i.searchFor(ws) { 
         return "\(word) \(rval)" 
        } 
       } 
      } 
     } 
     return nil 
    } 

    func add(words: [String]) { 

     var ws = words 
     if ws.count > 0 { 
      let word = ws.removeFirst() 
      var index: Index! 
      if let i = indexes[word] { 
       index = i 
      } else { 
       let i = Index() 
       indexes[word] = i 
       index = i 
      } 
      index.add(ws) 
      index.terminated = ws.count == 0 || index.terminated 
     } 
    } 
} 

class SearchEngine { 

    var index: Index! 

    func buildIndex(keywords: [String]) { 

     index = Index() 
     for keyword in keywords { 
      let words = keyword.characters.split(" ").map(String.init) 
      index.add(words) 
     } 
    } 

    func firstEntryIn(string: String) -> String? { 

     var strArr = string.characters.split(" ").map(String.init) 
     var rval: String? 
     while strArr.count > 0 { 

      if let r = index.searchFor(strArr) { 
       rval = r 
       break 
      } 
      strArr.removeFirst() 
     } 
     return rval 
    } 
} 

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"] 

let se = SearchEngine() 
se.buildIndex(fixedSearchStrings) 
se.firstEntryIn(str1) 
se.firstEntryIn(str2) 

Result:

nil 
"knock on the door" 
0

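import Foundation

// Returns true if str contains any element of array, ignoring case.
// Note: this is a plain substring test, so "ring the" also matches inside "During the".
func foundSubString(str: String, array: [String]) -> Bool {
    let lowered = str.lowercaseString
    // A for-in loop is safe when array is empty, unlike repeat-while.
    for candidate in array {
        if lowered.rangeOfString(candidate.lowercaseString) != nil {
            return true
        }
    }
    return false
}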

Usage

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window"] 
let exist: Bool = foundSubString(str2,array: fixedSearchStrings) 

Result: exist is true, because str2 contains "chime the bell".

If you want more detail about what was found, for example which search strings matched and their positions in the array:

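// Returns whether anything matched, plus each matching search string with its index in array.
// Like foundSubString, this is a case-insensitive substring test, not a whole-word match.
func foundSubString2(str: String, array: [String]) -> (Bool, [(String, Int)]) {
    let lowered = str.lowercaseString
    var matched = [(String, Int)]()

    // enumerate() pairs each search string with its index and is safe for an empty array.
    for (index, candidate) in array.enumerate() {
        if lowered.rangeOfString(candidate.lowercaseString) != nil {
            matched.append((candidate, index))
        }
    }

    // matched is empty when nothing was found.
    return (!matched.isEmpty, matched)
}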

Usage

let str1 = "During the winter holiday I'll go skiing." 
let str2 = "Do knock on the door or chime the bell" 
let fixedSearchStrings = ["ring the", "chime the bell", "knock on the door", "knock on the window", "knock on the door"] 
let (exist,matched) = foundSubString2(str2,array: fixedSearchStrings) 
if exist { print (matched) } 

Result: exist is true, and matched contains ("chime the bell", 1), ("knock on the door", 2) and ("knock on the door", 4).

0

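Use a regular expression. This will be roughly 1000 times faster than indexOf or similar methods. Internally, the regular expression builds a state machine that can match all of the required strings in a single pass.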

+0

Can you provide sample code? – Daniel

+0

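The regular expression should look something like '^ | (ring the)|(chime the bell)| ... |(last string to match)| $'. For how to use regular expressions in Swift, see http://stackoverflow.com/questions/28776945/swift-regex-matching – Sorin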
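Since sample code was requested, here is one possible sketch of the single-regex approach in current Swift using NSRegularExpression. The helper names makePhraseRegex and firstPhrase are hypothetical, and the pattern uses \b word boundaries instead of the '^ | ... | $' form from the comment, which assumes every search string starts and ends with a word character and that the list is not empty. Whether this is actually faster than repeated substring searches depends on the regex engine and the data, so it is worth measuring.

```swift
import Foundation

// Hypothetical helper: builds one case-insensitive pattern of the form
// \b(?:phrase1|phrase2|...)\b from the fixed search strings.
func makePhraseRegex(_ phrases: [String]) throws -> NSRegularExpression {
    let alternatives = phrases
        .map { NSRegularExpression.escapedPattern(for: $0) }  // match each phrase literally
        .joined(separator: "|")
    return try NSRegularExpression(pattern: "\\b(?:" + alternatives + ")\\b",
                                   options: [.caseInsensitive])
}

// Returns the first matching phrase as it appears in `text`, or nil if none matches.
func firstPhrase(in text: String, using regex: NSRegularExpression) -> String? {
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(match.range, in: text) else { return nil }
    return String(text[range])
}

// Compile once, then reuse the same regex for every string.
let phraseRegex = try! makePhraseRegex(["ring the", "chime the bell",
                                        "knock on the door", "knock on the window"])
print(firstPhrase(in: "During the winter holiday I'll go skiing.", using: phraseRegex) ?? "no match")  // no match
print(firstPhrase(in: "Do knock on the door or chime the bell", using: phraseRegex) ?? "no match")     // knock on the door
```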