2013-04-26 58 views
2

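Reading a passage of text into a vector of strings

I am trying to read a passage of text into a vector of strings and then build a dictionary that records how many times each word occurs. So far it only loads the first word of the text, and I don't know how to proceed. I know I am unclear on how to use these member functions properly.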

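#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Declared before main so that the call below compiles.
void make_dictionary(istream& file, vector<string>& words, vector<int>& count);

int main()
{
    ifstream input1;
    input1.open("Base_text.txt");

    vector<string> base_file;
    vector<int> base_count;

    if (input1.fail())
    {
        cout<<"Input file 1 opening failed."<<endl;
        exit(1);
    }

    make_dictionary(input1, base_file, base_count);
}

void make_dictionary(istream& file, vector<string>& words, vector<int>& count)
{
    string line;

    // operator>> extracts one whitespace-separated word per iteration
    while (file>>line)
    {
        words.push_back(line);
    }

    cout<<words[0];
}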

Expected output:

This is some simple base text to use for comparison with other files. 
You may use your own if you so choose; your program shouldn't actually care. 
For getting interesting results, longer passages of text may be useful. 
In theory, a full novel might work, although it will likely be somewhat slow. 

Actual output:

This 

Answers

1

Well, you only print the first word (the idea is to show you why you have to love the STL):

cout<<words[0]; 

You could use either

for(const string& word : words)    cout<<word<<' '; 

or

for(size_t i=0; i<words.size(); ++i) cout<<words[i]<<' '; 

to print all of them. A very simple way to count the words is to use a map in place of the vector:

map<string,size_t> words; 
... 
string word; 
while (file>>word)   ++words[word]; 
... 
for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second; 

WhozCraig offered a challenge. To order the words by frequency:

multimap<size_t,string,greater<size_t>> byFreq; 
for(const auto& w : words) byFreq.insert(make_pair(w.second, w.first)); 
for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first; 

All together (ideone):

#include <iostream> 
#include <map> 
#include <string> 
#include <functional> 
#include <utility> 
#include <cctype> 
using namespace std; 

int main() 
{ 
    map<string,size_t> words; 
    string word; 

    while (cin>>word) 
    { 
        // lowercase each word so that "The" and "the" share one entry;
        // the unsigned char cast keeps tolower well defined for all char values
        for(char& c : word) c = static_cast<char>(tolower(static_cast<unsigned char>(c))); 
        ++words[word]; 
    } 
    cout<<" ----- By word: ------" ; 
    for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second; 
    cout<<endl<<endl<<" ----- By frequency: ------"; 
    // the key is the count; greater<size_t> orders it from highest to lowest
    multimap<size_t,string,greater<size_t>> byFreq; 
    for(const auto& w : words) byFreq.insert(make_pair(w.second, w.first)); 
    for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first; 
    return 0; 
} 
+0

Any idea how I would go about keeping track of the number of occurrences of each word? – iamthewalrus 2013-04-26 19:46:14

+1

@AndyMiller, a map, maybe? – chris 2013-04-26 19:46:51

+0

@WhozCraig offered a challenge. To sort by frequency: – qPCR4vir 2013-04-27 21:01:22

1

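I think you have to move cout << words[0] inside the loop; otherwise it is called only once, after the loop has finished. Even then, it would print just the first word on every iteration. So print the last word each time: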

while (file>>line) 
{ 
    words.push_back(line); 
    cout<<words.back(); // or cout << line, same thing really 
} 

One last thing: while (file >> line) reads word by word, not line by line as the variable name suggests. If you want whole lines, use while (getline(file, line)).
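To make the difference concrete, here is a minimal sketch that reads a stream line by line with getline and then splits each line into words with an istringstream. The helper name read_lines_as_words is made up for this illustration and does not appear in the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads the stream line by
// line and splits every line into its whitespace-separated words.
std::vector<std::vector<std::string>> read_lines_as_words(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line))            // one whole line per iteration
    {
        std::istringstream line_stream(line);
        std::vector<std::string> words;
        std::string word;
        while (line_stream >> word)           // one word per iteration
            words.push_back(word);
        lines.push_back(words);
    }
    return lines;
}
```

Any std::istream can be passed in, for example the ifstream from the question or std::cin. For a plain word count the extra line structure is not needed, so the simple while (file >> word) loop is probably enough.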

+0

Any ideas on how to go about keeping track of the number of occurrences of each word? – iamthewalrus 2013-04-26 19:50:36

1

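Reading the words of a text file into a vector of strings is fairly straightforward. The code below assumes that the name of the file to parse is the first command-line argument.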

#include <iostream> 
#include <fstream> 
#include <iterator> 
#include <vector> 
#include <string> 
#include <map> 
#include <cstdlib> 
using namespace std; 

int main(int argc, char *argv[]) 
{ 
    if (argc < 2) 
     return EXIT_FAILURE; 

    // open file and read all words into the vector. 
    ifstream inf(argv[1]); 
    istream_iterator<string> inf_it(inf), inf_eof; 
    vector<string> words(inf_it, inf_eof); 

    // for populating a word-count dictionary: 
    map<string, unsigned int> dict; 
    for (auto &it : words) 
     ++dict[it]; 

    // print the dictionary 
    for (auto &it : dict) 
     cout << it.first << ':' << it.second << endl; 

    return EXIT_SUCCESS; 
} 

However, you should (probably) merge the two operations into a single loop and avoid the intermediate vector entirely:

#include <iostream> 
#include <fstream> 
#include <string> 
#include <map> 
#include <cstdlib> 
using namespace std; 

int main(int argc, char *argv[]) 
{ 
    if (argc < 2) 
     return EXIT_FAILURE; 

    // open file and count each word directly in the map. 
    ifstream inf(argv[1]); 
    map<string, unsigned int> dict; 
    string str; 
    while (inf >> str) 
     ++dict[str]; 

    // print the dictionary 
    for (auto &it : dict) 
     cout << it.first << ':' << it.second << endl; 

    return EXIT_SUCCESS; 
} 

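Sorting it from highest to lowest occurrence is not quite as trivial, but it is doable with a vector used as a sort bed and std::sort(). Stripping leading and trailing non-alphabetic characters (punctuation) would also be an enhancement. Another would be reducing the words to all lower case before inserting them into the map. This allows Ball and ball to occupy a single dictionary slot with a count of 2.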
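The sort-bed idea could look roughly like the sketch below: copy the map entries into a vector of pairs and sort that vector by count. The function name sorted_by_frequency and the alphabetical tie-break are choices made for this illustration, not part of the answer above.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: copies the dictionary into a vector (the "sort bed")
// and orders it from most to least frequent; ties are broken alphabetically.
std::vector<std::pair<std::string, unsigned int>>
sorted_by_frequency(const std::map<std::string, unsigned int>& dict)
{
    std::vector<std::pair<std::string, unsigned int>> entries(dict.begin(),
                                                              dict.end());
    std::sort(entries.begin(), entries.end(),
              [](const std::pair<std::string, unsigned int>& a,
                 const std::pair<std::string, unsigned int>& b)
              {
                  if (a.second != b.second)
                      return a.second > b.second;   // higher count first
                  return a.first < b.first;         // then alphabetical
              });
    return entries;
}
```

Calling it with the dict from the programs above and printing first and second of each entry gives the list ordered by frequency, while the map itself stays untouched.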

0

I have the following implementation, which tries to convert the words to lower case and remove punctuation.

#include <algorithm> 
#include <cctype> 
#include <fstream> 
#include <iostream> 
#include <iterator> 
#include <string> 
#include <unordered_map> 
#include <vector> 

int main() { 
    // read every whitespace-separated token of the file into a vector
    std::vector<std::string> words; 
    { 
        std::ifstream fp("file.txt"); 
        std::copy(std::istream_iterator<std::string>(fp), 
                  std::istream_iterator<std::string>(), 
                  std::back_inserter(words)); 
    } 

    std::unordered_map<std::string, int> frequency; 
    for (const auto& token : words) { 
        std::string word; 
        // keep letters only; taking unsigned char keeps isalpha/tolower well defined
        std::copy_if(token.begin(), token.end(), std::back_inserter(word), 
                     [](unsigned char c) { return std::isalpha(c) != 0; }); 
        std::transform(word.begin(), word.end(), word.begin(), 
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); }); 
        if (!word.empty())   // skip tokens that consisted of punctuation only
            frequency[word]++; 
    } 

    for (const auto& p : frequency) { 
        std::cout << p.first << " => " << p.second << std::endl; 
    } 
    return 0; 
} 

If file.txt has the following content:

hello hello hello bye BYE dog DOG' dog. 

word Word worD w'ord 

The program will produce (an unordered_map does not guarantee any particular order):

word => 4 
dog => 3 
bye => 2 
hello => 3 
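Note that the program above drops every non-letter, which is why w'ord is counted together with word. If only leading and trailing punctuation should go, while characters in the middle of a token are kept, a sketch along these lines might do. The name normalize_word is hypothetical and not taken from any of the answers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: lowercases a token and strips leading and trailing
// non-alphabetic characters, keeping anything in between (e.g. apostrophes).
std::string normalize_word(const std::string& token)
{
    auto is_alpha = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    };

    std::size_t first = 0;
    while (first < token.size() && !is_alpha(token[first]))
        ++first;                              // skip leading punctuation

    std::size_t last = token.size();
    while (last > first && !is_alpha(token[last - 1]))
        --last;                               // skip trailing punctuation

    std::string result = token.substr(first, last - first);
    for (char& c : result)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return result;
}
```

With this variant, DOG' and dog. both become dog, while w'ord stays w'ord and would get its own entry. A token made up of punctuation only yields an empty string, which the caller should skip.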