Problem using libcurl: it does not seem to get the entire page

I'm having trouble getting started with libcurl. The code below does not seem to retrieve the entire page from the specified URL. Where am I going wrong?

#include <stdio.h> 
#include <stdlib.h> 
#include <unistd.h> 
#include <string.h> 
#include <curl/curl.h> 
#include <curl/types.h> 
#include <curl/easy.h> 

using namespace std; 

char buffer[1024]; 

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream) 
{ 
    strncpy(buffer,ptr,size*nmemb); 
    return size*nmemb; 
} 

int main() { 
    CURL *curl; 
    CURLcode res; 


    curl = curl_easy_init(); 
    if(curl) { 
     curl_easy_setopt(curl, CURLOPT_URL, "http://google.co.in"); 
     curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION,1); 
     curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &tobuffer); 

     res = curl_easy_perform(curl); 

     printf("%s",buffer); 

     curl_easy_cleanup(curl); 
    } 
    return 0; 
}

2010-08-26 raj

char buffer[1024];

How can you get the entire page when your buffer size is limited to 1024?

2010-08-26 08:13:32

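I don't know the library, but it looks to me like you are reusing the buffer... if the page you download doesn't fit, you will write over it repeatedly and will probably only see the last fragment. For example, if we copy the alphabet into a 10-character buffer, we get: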

 
ABCDEFGHIJ - first copy stores this 
KLMNOPQRST - second copy stores this 
UVWXYZ  - third copy stores this

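Depending on whether the reported data size includes a terminating 0/NUL character, the buffer may then be seen as "UVWXYZ" (which printf("%s") will print as "UVWXYZ"), or as "UVWXYZQRST" (in which case printf("%s") will keep printing past the end of the buffer until it happens to find a 0/NUL).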
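
The overwriting effect described above can be reproduced without libcurl. The following is a minimal sketch: the names small_buf, store_chunk and store_alphabet are hypothetical, and the three calls merely stand in for libcurl invoking the write callback three times.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 10-byte buffer, plus one zero byte so it can be read as a
 * string afterwards (static storage is zero-initialized). */
static char small_buf[10 + 1];

/* Mimics the question's callback: every chunk is copied to the START of
 * the same buffer, so each call overwrites what the previous one stored.
 * No terminator is written after the chunk. */
static size_t store_chunk(const char *chunk, size_t len)
{
    memcpy(small_buf, chunk, len);
    return len;
}

/* Feeds the alphabet in three chunks, as a transfer might deliver it. */
static const char *store_alphabet(void)
{
    store_chunk("ABCDEFGHIJ", 10);
    store_chunk("KLMNOPQRST", 10);
    store_chunk("UVWXYZ", 6);
    return small_buf;   /* last chunk followed by the stale tail "QRST" */
}
```

Printing the result of store_alphabet() with printf("%s") shows the last chunk plus leftovers from the second one, which is the second outcome described above.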

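res = curl_easy_perform(curl) strongly suggests that it gives you a result/error code. Have you thought of checking its value against what the documentation says it means?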

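You really should learn to diagnose these things yourself... if, instead of copying to a buffer, you put a std::cout statement in your callback function to display the data, you would see how many times it gets called. Break things down until you find the problem.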
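
A diagnostic callback along those lines might look like the sketch below. It is written in C with fprintf() rather than std::cout, to match the question's code. The names count_chunks, call_count and total_bytes are hypothetical; only the four-argument signature is what libcurl expects of a write callback.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counters, used for diagnosis only. */
static unsigned long call_count = 0;
static size_t total_bytes = 0;

/* Same signature as a libcurl write callback. It stores nothing; it only
 * reports how often it is called and how many bytes each call delivers. */
static size_t count_chunks(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    size_t nbytes = size * nmemb;

    (void)ptr;        /* the data itself is ignored in this sketch */
    (void)userdata;
    call_count++;
    total_bytes += nbytes;
    fprintf(stderr, "call %lu: %zu bytes (%zu so far)\n",
            call_count, nbytes, total_bytes);
    return nbytes;    /* report the whole chunk as consumed */
}
```

Registering it with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, count_chunks) in place of tobuffer should show one line per call, which makes it obvious when a page arrives in several pieces.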

2010-08-26 08:14:37

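As you can see in the libcurl documentation for curl_easy_setopt(), the callback function is called as many times as needed to deliver all the bytes of the fetched page.

Your function overwrites the same buffer on every call, with the result that after curl_easy_perform() has finished reading the file, only whatever fit from the final call to tobuffer() is left.

In short, your function tobuffer() must do something other than overwrite the same buffer on each call.

Update

For example, you could do something like the following completely untested code: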

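#include <assert.h>   /* assert(); the question's program already has stdlib.h and string.h */

struct buf {
    char *buffer;
    size_t bufferlen;   /* bytes allocated */
    size_t writepos;    /* bytes stored so far */
} buffer = {0};

size_t tobuffer(char *ptr, size_t size, size_t nmemb, void *stream)
{
    size_t nbytes = size*nmemb;
    if (!buffer.buffer) {
        buffer.buffer = malloc(1024);
        buffer.bufferlen = 1024;
        buffer.writepos = 0;
    }
    assert(buffer.buffer != NULL);
    /* keep doubling until the new chunk fits after the existing data */
    while (buffer.writepos + nbytes > buffer.bufferlen) {
        buffer.bufferlen = 2 * buffer.bufferlen;
        /* assigning directly loses the old block if realloc() fails */
        buffer.buffer = realloc(buffer.buffer, buffer.bufferlen);
        assert(buffer.buffer != NULL);
    }
    memcpy(buffer.buffer+buffer.writepos,ptr,nbytes);
    buffer.writepos += nbytes;   /* the next chunk is appended after this one */
    return nbytes;
}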

At some later point in your program you will need to free the allocated memory, like this:

void freebuffer(struct buf *b) { 
    free(b->buffer); 
    b->buffer = NULL; 
    b->bufferlen = 0; 
    b->writepos = 0; 
}

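Also, note that I have used memcpy() instead of strncpy() to move the data into the buffer. This is important because libcurl makes no claim that the data passed to the callback function is actually a NUL-terminated ASCII string. In particular, if you retrieve a .gif image file, it certainly can (and will) contain zero bytes, which you would want to preserve in your buffer. strncpy() will stop copying after the first NUL it sees in the source data.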
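
The difference can be seen on a few bytes of made-up binary data. This is a small sketch only; sample, strncpy_preserves and memcpy_preserves are hypothetical names, and the five bytes are not a real GIF header.

```c
#include <string.h>

/* Five bytes of "binary" data with a zero byte in the middle. */
static const char sample[5] = { 'G', 'I', '\0', 'F', '8' };

/* Returns 1 if strncpy() reproduced all five bytes, 0 otherwise. */
static int strncpy_preserves(void)
{
    char dst[5];

    memset(dst, '#', sizeof dst);
    /* strncpy() stops reading at the NUL and zero-fills the rest of dst */
    strncpy(dst, sample, sizeof dst);
    return memcmp(dst, sample, sizeof sample) == 0;
}

/* Returns 1 if memcpy() reproduced all five bytes, 0 otherwise. */
static int memcpy_preserves(void)
{
    char dst[5];

    memset(dst, '#', sizeof dst);
    /* memcpy() copies the requested number of bytes without inspecting them */
    memcpy(dst, sample, sizeof dst);
    return memcmp(dst, sample, sizeof sample) == 0;
}
```

Because the stored data may contain zero bytes and is not necessarily terminated, its length has to be tracked separately (writepos in the code above) rather than found with strlen() or printf("%s").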

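As an exercise for the reader, I have left all of the error handling out of this code. You must put some in. Also, I have left in a juicy memory leak in case the call to realloc() fails.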

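Another improvement would be to use the option (CURLOPT_WRITEDATA) that lets the libcurl caller supply the value of the callback's stream parameter. That can be used to allocate and manage your buffer without using global variables. I strongly recommend doing so.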
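
One way this might look is sketched below, under the assumption that the caller owns the buffer and hands its address to libcurl. The names struct membuf and append_chunk are hypothetical. The sketch also keeps the old pointer when realloc() fails, which avoids the leak mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer owned by the caller, not a global. */
struct membuf {
    char  *data;
    size_t len;   /* bytes stored so far */
    size_t cap;   /* bytes allocated */
};

/* Write callback with libcurl's signature. userdata is whatever pointer the
 * caller registered with CURLOPT_WRITEDATA; here, a struct membuf. */
static size_t append_chunk(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct membuf *b = userdata;
    size_t nbytes = size * nmemb;

    if (b->len + nbytes + 1 > b->cap) {          /* +1 for a trailing NUL */
        size_t newcap = b->cap ? b->cap : 1024;
        while (newcap < b->len + nbytes + 1)
            newcap *= 2;
        char *p = realloc(b->data, newcap);      /* old block stays valid on failure */
        if (p == NULL)
            return 0;     /* a count other than nbytes makes libcurl abort */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, ptr, nbytes);       /* append after existing data */
    b->len += nbytes;
    b->data[b->len] = '\0';                      /* convenience for text bodies */
    return nbytes;
}

/* Intended use with libcurl (not compiled as part of this sketch):
 *   struct membuf page = {0};
 *   curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_chunk);
 *   curl_easy_setopt(curl, CURLOPT_WRITEDATA, &page);
 *   res = curl_easy_perform(curl);
 *   ... use page.data and page.len, then free(page.data);
 */
```

Since the buffer travels through the userdata pointer, several transfers can each have their own struct membuf without sharing any global state.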

2010-08-26 08:16:59 RBerteig

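Can you tell me how I can store the entire content in a global char array? – raj 2010-08-26 08:20:46

I have added some sample code. It is untested, but it should be a starting point. – RBerteig 2010-08-26 08:43:52

You are using libcurl to perform a simple GET operation. You can use this sample program as a reference. Why not print the buffer or write it to a file in the callback, as this example does?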

#include <stdio.h> 
#include <stdlib.h> 
#include <unistd.h> 

/* curl.h also pulls in easy.h; curl/types.h no longer exists in current libcurl */
#include <curl/curl.h>

static size_t write_data(void *ptr, size_t size, size_t nmemb, void *stream)
{
    /* fwrite() returns the number of items written; libcurl expects bytes */
    size_t written = fwrite(ptr, size, nmemb, (FILE *)stream);
    return written * size;
}

int main(int argc, char **argv) 
{ 
    CURL *curl_handle; 
    static const char *headerfilename = "head.out"; 
    FILE *headerfile; 
    static const char *bodyfilename = "body.out"; 
    FILE *bodyfile; 

    curl_global_init(CURL_GLOBAL_ALL); 

    /* init the curl session */ 
    curl_handle = curl_easy_init(); 

    /* set URL to get */ 
    curl_easy_setopt(curl_handle, CURLOPT_URL, "http://curl.haxx.se"); 

    /* no progress meter please */ 
    curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 1L); 

    /* send all data to this function */ 
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, write_data); 

    /* open the files */ 
    headerfile = fopen(headerfilename,"w");
    if (headerfile == NULL) {
        curl_easy_cleanup(curl_handle);
        return -1;
    }
    bodyfile = fopen(bodyfilename,"w");
    if (bodyfile == NULL) {
        fclose(headerfile);   /* don't leak the file opened above */
        curl_easy_cleanup(curl_handle);
        return -1;
    }

    /* we want the headers written to this file handle
     * (CURLOPT_HEADERDATA is the current name of CURLOPT_WRITEHEADER) */
    curl_easy_setopt(curl_handle, CURLOPT_HEADERDATA, headerfile);

    /* we want the body written to this file handle instead of stdout */
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, bodyfile);

    /* get it! */ 
    curl_easy_perform(curl_handle); 

    /* close the header and body files */
    fclose(headerfile);
    fclose(bodyfile);

    /* cleanup curl stuff */ 
    curl_easy_cleanup(curl_handle); 

    return 0; 
}

2010-08-26 08:21:22

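I need to process the content after getting it.... saving it to a file and then reading it back before processing is time-consuming – raj 2010-08-26 08:24:42

Then you will have to keep the content in one contiguous block to avoid overwriting it. To do that, you must keep track of how much data has been written so far and write each new chunk at that position. – 2010-08-26 08:26:45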

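You seem to have missed the CURLOPT_WRITEDATA option. It sets the pointer that is passed as the last (fourth) argument of the WRITEFUNCTION callback, tobuffer(char *ptr ...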

curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);

2010-08-26 11:48:38 Biber

Tip: use a string stream. Just replace your buffer with a string stream and output its contents with (string)<streamname>.str(). Works for me!

2011-06-30 10:42:08 AAa

Could you explain the answer in more detail? – ppaulojr 2012-11-18 09:17:54
