Node pdf2json memory leak?

I'm fairly new to Node and not very familiar with memory leaks, but I believe I have one. I have a simple Node/Express app that lets users upload PDF files (each article can have up to 10,000 files). When files are uploaded, they are saved to MongoDB via Mongoose, in a route like the following (Multer setup omitted; note that I use Multer to populate req.files):
app.post('/articles', upload.array('pdfs', 10000), (req, res) => {
  req.files.forEach(function(file) {
    var newArticle = new Article(file);
    newArticle.save();
  });
});
This code executes correctly, and I can add 10,000 files to the database in a single POST. Next, I use Mongoose middleware to parse the PDF files with pdf2json, adding a text field to the DB document after save. The model is as follows:
'use strict'

var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var PDFParser = require("pdf2json");

var articleSchema = new Schema({
  originalname: String,
  normalizedName: String,
  filename: String,
  mimetype: String,
  text: String,
  processed: Boolean,
  pageCount: Number,
  createdAt: Date
});

var pdfParser = new PDFParser();

articleSchema.post('save', function(article) {
  if (!article.processed) {
    pdfParser.on("pdfParser_dataError", errData => {
      console.log('pdfParser Error');
      console.error(errData);
    });
    pdfParser.on("pdfParser_dataReady", pdfData => {
      article.text = pdfParser.getRawTextContent();
      article.processed = true;
      article.save(function (err, article) {
        //if (err) res.send(err)
        //console.log(err);
        console.log('Text parsed: ' + article.originalname);
      });
    });
    pdfParser.loadPDF(__dirname + "/../public/uploads/" + article.filename);
  }
});

module.exports = mongoose.model('Article', articleSchema);
Parsing the PDF files this way appears to be what causes the memory leak. I can watch memory climb with node --trace_gc while the PDFs are parsed. When I upload ~50 typical PDF documents, everything runs fine, but when I try to upload ~100 at once, the app crashes with a "JavaScript heap out of memory" error. I need to be able to upload 10,000 PDF files at a time.
<--- Last few GCs --->
204991 ms: Mark-sweep 395.7 (494.5) -> 394.5 (494.5) MB, 431.3/0 ms [allocation failure] [GC in old space requested].
205410 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 419.7/0 ms [allocation failure] [GC in old space requested].
205843 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 432.2/0 ms [last resort gc].
206275 ms: Mark-sweep 394.5 (494.5) -> 394.5 (494.5) MB, 432.5/0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x24c97a4c9e31 <JS Object>
1: transform(aka ctxTransform) [0x24c97a404189 <undefined>:~40152] [pc=0xd9a435bd068] (this=0x221695b86801 <a CanvasRenderingContext2D_ with map 0x21159eaffc09>,a=0xd844e0103d1 <Number: 8.5>,b=0,c=0,d=0xd844e0103e1 <Number: 8.5>,e=0xd844e0103f1 <Number: 42.0094>,f=0xd844e010401 <Number: 608.882>)
2: showText(aka CanvasGraphics_showText) [0x24c97a404189 <undefined>:~41068] [pc=0xd9a436c8...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
3: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
4: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
5: v8::internal::LCodeGenBase::PopulateDeoptimizationData(v8::internal::Handle<v8::internal::Code>) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
6: v8::internal::LChunk::Codegen() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
7: v8::internal::OptimizedCompileJob::GenerateCode() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
8: v8::internal::Compiler::GetConcurrentlyOptimizedCode(v8::internal::OptimizedCompileJob*) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
9: v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
10: v8::internal::StackGuard::HandleInterrupts() [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
11: v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) [/Users/mattsears/.nvm/versions/node/v6.3.1/bin/node]
12: 0xd9a4260961b
[1] 31819 abort node --optimize_for_size --max_old_space_size=460 --trace_gc server.js
The app will run on a free Heroku 512 MB dyno, so I can't just increase --max-old-space-size (I don't think). I believe I need a solution that reduces memory usage.

Can anyone spot the memory leak here? Any other suggestions?

Note that I have no affiliation with Multer or pdf2json.
I don't believe this is a leak. Basically, I think you're trying to stuff 10,000 envelopes into a mailbox that holds 5. Are you familiar with designing algorithms around CPU and memory constraints? –

I was thinking that might be the case. How would you solve it? I'm not very familiar with queues, so I don't know whether something like that would help. It doesn't matter how long it takes to process all the files; I just need them processed eventually without exceeding the memory limit. – mattsears18

Looks like you've figured it out. I would say "parse the uploads", but that's what you're doing. Disclaimer: I don't know Node, but I have programmed before. Node is JS, and JS is garbage-collected, so memory leaks should be rare. –
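A minimal sketch of the kind of queue mentioned in the comment above: process the files strictly one at a time, so only one file's data is in memory at any moment. `parseOne` here is a hypothetical async worker standing in for the pdf2json parse plus the Mongoose save — not the actual API:

```javascript
// Sequential queue sketch (assumption: parseOne is a stand-in worker).
// Each item is fully processed before the next one starts, bounding
// peak memory to roughly one file's worth of data.
async function processQueue(items, parseOne) {
  const results = [];
  for (const item of items) {
    results.push(await parseOne(item)); // next item starts only after this one finishes
  }
  return results;
}

// Usage with a stand-in worker:
processQueue([1, 2, 3], async (n) => n * 2).then((r) => console.log(r)); // [ 2, 4, 6 ]
```

A real version would also need per-item error handling so one bad PDF doesn't stall the whole batch; libraries like async.queue offer the same pattern with configurable concurrency.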