Counting occurrences of elements from a list in a string?

I'm trying to count the number of times verbal contractions appear in some speeches I've collected. One particular speech looks like this:

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")

So in this case, I'd like to count four (4) contractions. I have a list of contractions; here are the first few terms:

contractions = {"ain't": "am not; are not; is not; has not; have not", 
"aren't": "are not; am not", 
"can't": "cannot",...}

To begin with, my code looks like this:

count = 0
for word in speech:
    if word in contractions:
        count = count + 1
print(count)

I'm not getting anywhere with this, though, because the code iterates over every single letter rather than over whole words.
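To see the problem concretely, here is a minimal sketch using a shortened speech and a hypothetical two-entry contractions dict (the keys are assumptions, not the full dict from the question). Looping over a str yields one-character strings, so no item can ever match a multi-character key:

```python
# Shortened speech for illustration.
speech = "I've changed the path"
# Hypothetical subset of the contractions dict.
contractions = {"i've": "i have", "we're": "we are"}

# Iterating over a str yields single characters, not words.
chars = list(speech)[:5]
print(chars)  # ['I', "'", 'v', 'e', ' ']

# The loop from the question therefore never finds a match.
count = sum(1 for ch in speech if ch in contractions)
print(count)  # 0
```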


2015-10-06 blacksite

for word in speech.split(' '): – Monkpit

I don't get what the values in your dictionary are doing. By the way, you have a dict, not a list. –

I've added a lot to my answer that should give you some extras. – colidyre

Use str.split() to split the string on whitespace:

for word in speech.split():

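This splits on arbitrary whitespace: spaces, tabs, newlines and a few more exotic whitespace characters, with any number of them in a row treated as a single separator.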
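A quick check of that behavior, using a version of the speech that (as an assumption for illustration) contains a newline, a double space and a tab. split() with no argument produces only non-empty words:

```python
speech = ("I've changed the path of the economy, and I've increased jobs in our own\n"
          "home state.  We're headed\tin the right direction")

words = speech.split()  # no argument: split on any run of whitespace
print(words[:3])        # ["I've", 'changed', 'the']
print(len(words))       # 22
print("" in words)      # False: no empty strings from the double space
```

Note that punctuation stays attached ("economy,", "state."), which is why the next step strips it.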

You probably also want to lowercase your words with str.lower() (otherwise "Ain't" won't be found, for example) and strip off punctuation:

from string import punctuation

count = 0
for word in speech.lower().split():
    word = word.strip(punctuation)  # trim punctuation from both ends of the word
    if word in contractions:
        count += 1

I used the str.strip() method here; it removes every character found in the string.punctuation string from the start and end of the word.
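Putting those pieces together, here is a sketch wrapped in a function. The function name count_contractions and the four-entry dict are hypothetical (the real dict is longer); the code assumes the dict keys are lowercase. Because strip() only trims the ends, the apostrophe inside "i've" survives while a lone "-" token becomes an empty string that matches nothing:

```python
from string import punctuation

def count_contractions(text, contractions):
    """Count words in text that are keys of contractions (keys assumed lowercase)."""
    count = 0
    for word in text.lower().split():
        # Trim leading/trailing punctuation; inner apostrophes are kept.
        word = word.strip(punctuation)
        if word in contractions:
            count += 1
    return count

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")
# Hypothetical subset of the contractions dict; the values don't matter for counting.
contractions = {"i've": "i have", "we're": "we are", "you've": "you have", "can't": "cannot"}

print(count_contractions(speech, contractions))  # 4
print("state.".strip(punctuation))               # state
```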


2015-10-06 20:28:23

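You are iterating over a string, so the items are characters. To get words out of a string you can use a naive method such as str.split(), which builds a list for you; you can then iterate over that list of strings (the words, split on str.split()'s argument, which by default means splitting on whitespace). There is even re.split(), which is more powerful, but I don't think you need to split the text with a regular expression.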
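If you do want re.split(), here is a minimal sketch. It assumes a "word" is a run of letters, digits, underscores or apostrophes; everything else (spaces, commas, periods) acts as a separator. Empty strings appear when the text starts or ends with a separator, so they are filtered out:

```python
import re

speech = "I've changed the path of the economy, and I've increased jobs."

# Split on runs of characters that are neither word characters nor apostrophes.
tokens = [t for t in re.split(r"[^\w']+", speech.lower()) if t]
print(tokens.count("i've"))  # 2
print(tokens[-1])            # jobs
```

One caveat of this word definition: apostrophes used as quote marks at the edge of a word would be kept.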

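What you have to do, at the very least, is either lowercase the string with str.lower() or put every possible variant (including capitalized ones) into your dict. I strongly recommend the first alternative; the latter isn't really feasible. Stripping punctuation is also worth doing. All of this is still naive, though: if you need a more sophisticated approach, you have to split the text with a word tokenizer. NLTK is a good starting point; see the nltk tokenizer. But I strongly feel this isn't your main problem and won't really affect solving it. :)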
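Lowercasing the text once keeps the dict keys simple. If you also want to know how often each contraction occurs, not just the total, collections.Counter can tally the matches. A sketch, again with a hypothetical subset of the dict:

```python
from collections import Counter
from string import punctuation

speech = ("I've changed the path of the economy, and I've increased jobs in our own "
          "home state. We're headed in the right direction - you've all been a great help.")
# Hypothetical subset of the contractions dict (lowercase keys).
contractions = {"i've": "i have", "we're": "we are", "you've": "you have"}

# Lowercase once, split on whitespace, trim punctuation from each word.
words = (w.strip(punctuation) for w in speech.lower().split())
per_contraction = Counter(w for w in words if w in contractions)

print(per_contraction["i've"])        # 2
print(sum(per_contraction.values()))  # 4: the total the question asks for
```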

speech = """I've changed the path of the economy, and I've increased jobs in our own home state. We're headed in the right direction - you've all been a great help."""
# Maybe this dict makes more sense (list items as values). But for your question it doesn't matter.
contractions = {"ain't": ["am not", "are not", "is not", "has not", "have not"], "aren't": ["are not", "am not"], "i've": ["i have"]}  # ...

# With re you can define advanced regexes, but maybe
# `from string import punctuation` (the suggestion from Martijn Pieters' answer)
# is still enough for you.
import re

def abbreviation_counter(input_text, abbreviation_dict):
    count = 0
    # What you want is a list of words; str.split() does this job for you.
    # With no argument it splits on any run of whitespace (split(" ") would
    # split on single spaces only). If you really need better methods (see the
    # answer text above), you have to use a word tokenizer or write your own.
    for word in input_text.split():
        # Also clean each word (remove ',', ';', ...). The advantage of using re
        # over `from string import punctuation` is that you have more control
        # over what you remove: you can easily add or remove any punctuation
        # mark. That can be very handy, or overpowered; in the latter case,
        # just stick to Martijn Pieters' solution.
        if re.sub(',|;', '', word).lower() in abbreviation_dict:
            count += 1

    return count

print(abbreviation_counter(speech, contractions))
# prints 2. Yeah, it worked: I've included "i've" in your dict :)
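To illustrate the "more control" point from the comments above, here is a sketch that builds the character class from a string of marks you choose. The string marks_to_remove and the helper clean() are hypothetical names; re.escape() makes marks such as "." or "?" safe inside the class. Unlike str.strip(), re.sub() removes the marks anywhere in the word, not only at the ends:

```python
import re

# Hypothetical set of marks to remove; edit this string to add or drop marks.
marks_to_remove = ",;.!?"
strip_pattern = re.compile("[" + re.escape(marks_to_remove) + "]")

def clean(word):
    """Remove the chosen marks anywhere in word, then lowercase it."""
    return strip_pattern.sub("", word).lower()

print(clean("State."))  # state
print(clean("I've,"))   # i've  (the apostrophe is not in the set)
```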

It's a tad frustrating to answer at the same time as Martijn Pieters did, but I hope I still added some value for you. That's why I edited my answer to give some hints for future work.


2015-10-06 20:24:21 colidyre

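Thanks for your input, but I've already moved on from this problem. That said, your solution did work! I just don't want to go back and reformat my whole 'contractions' dict :) – blacksite

Yes, it was just a suggestion. If it helped in any way, I'd be glad to get credit for my work. :) – colidyre

Got you :) – blacksite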

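A for loop in Python iterates over all the elements of an iterable. In the case of a string, the elements are characters.

You need to split the string into a list (or tuple) of strings containing the words. You can use .split(delimiter) for that.

Your problem is quite common, so Python has a shortcut: speech.split() splits on any number of spaces/tabs/newlines, so you get only your words in the list.

So your code should look like this: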

count = 0
for word in speech.split():
    if word in contractions:
        count = count + 1
print(count)

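speech.split(" ") would work too, but it only splits on spaces, not on tabs or newlines, and if there are double spaces you'll get empty elements in your result list.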
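The difference is easy to see side by side on a short sample string (made up for illustration) containing a double space and a tab:

```python
text = "I've  changed\tthe path"

print(text.split(" "))  # ["I've", '', 'changed\tthe', 'path']
print(text.split())     # ["I've", 'changed', 'the', 'path']
```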


2015-10-06 20:38:18 cg909
