The wiki for Google's diff-match-patch project shares a few ideas on this. From http://code.google.com/p/google-diff-match-patch/wiki/Plaintext:
One method is to strip the tags from the HTML using a simple regex or node-walker. Then diff the HTML content against the text content. Don't perform any diff cleanups. This diff enables one to map character positions from one version to the other (see the diff_xIndex function). After this, one can apply all the patches one wants against the plain text, then safely map the changes back to the HTML. The catch with this technique is that although text may be freely edited, HTML tags are immutable.
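To make the position-mapping step concrete, here is a minimal sketch in plain Python. It does not call the diff-match-patch library: `x_index` is my own standalone port of the logic behind `diff_xIndex`, and `text_to_html_diff` is a hypothetical helper that builds the text-to-HTML diff directly from a simple tag regex (which assumes no `>` appears inside attribute values).

```python
import re

# Same operation codes that diff-match-patch uses for its diff tuples.
DELETE, INSERT, EQUAL = -1, 1, 0

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")  # simplistic tag matcher (assumption)


def text_to_html_diff(html):
    """Return (plain_text, diffs), where diffs turns plain_text into html.

    Tags become INSERT entries; everything else is EQUAL.
    """
    diffs, pos = [], 0
    for m in TAG_RE.finditer(html):
        if m.start() > pos:
            diffs.append((EQUAL, html[pos:m.start()]))
        diffs.append((INSERT, m.group()))
        pos = m.end()
    if pos < len(html):
        diffs.append((EQUAL, html[pos:]))
    plain = "".join(text for op, text in diffs if op == EQUAL)
    return plain, diffs


def x_index(diffs, loc):
    """Map a character offset in text1 to the matching offset in text2."""
    chars1 = chars2 = last1 = last2 = 0
    hit = None
    for op, text in diffs:
        if op != INSERT:      # equality or deletion consumes text1
            chars1 += len(text)
        if op != DELETE:      # equality or insertion consumes text2
            chars2 += len(text)
        if chars1 > loc:      # overshot the location
            hit = op
            break
        last1, last2 = chars1, chars2
    if hit == DELETE:         # the location itself was deleted
        return last2
    return last2 + (loc - last1)


html = "Hello <b>world</b>!"
plain, diffs = text_to_html_diff(html)
start = x_index(diffs, plain.index("world"))
print(plain, start, html[start:start + 5])
```

With the real library you would get the diff tuples from its diff function (with cleanups skipped, as the wiki says) and call `diff_xIndex` on them instead.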
Another method is to walk the HTML and replace every opening and closing tag with a Unicode character. Check the Unicode spec for a range that is not in use. During the process, create a hash table of Unicode characters to the original tags. The result is a block of text which can be patched without fear of inserting text inside a tag or breaking the syntax of a tag. One just has to be careful when reconverting the content back to HTML that no closing tags are lost.
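A rough sketch of that placeholder idea, assuming the input contains no Private Use Area characters of its own and that a simple regex is good enough to find the tags. The names `encode_tags` and `decode_tags` are mine, not part of diff-match-patch.

```python
import re

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")  # simplistic tag matcher (assumption)
PUA_START, PUA_END = 0xE000, 0xF8FF         # BMP Private Use Area


def encode_tags(html):
    """Replace each distinct tag with a single Private Use Area character.

    Returns the encoded text and a table mapping placeholders back to tags.
    """
    tag_to_char = {}

    def repl(match):
        tag = match.group()
        if tag not in tag_to_char:
            code = PUA_START + len(tag_to_char)
            if code > PUA_END:
                raise ValueError("too many distinct tags for the PUA range")
            tag_to_char[tag] = chr(code)
        return tag_to_char[tag]

    text = TAG_RE.sub(repl, html)
    char_to_tag = {char: tag for tag, char in tag_to_char.items()}
    return text, char_to_tag


def decode_tags(text, char_to_tag):
    """Expand placeholders back into the original tags."""
    return "".join(char_to_tag.get(ch, ch) for ch in text)


html = "<p>Hello <b>world</b></p>"
encoded, table = encode_tags(html)
# The encoded text can now be diffed and patched like ordinary plain text.
edited = encoded.replace("world", "there")
print(decode_tags(edited, table))
```

This sketch does nothing to check that every opening placeholder still has its closing partner after patching, which is the part the wiki warns about.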
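I have a hunch that the second idea, mapping HTML tags to Unicode placeholders, might work better than one would first expect, especially if your HTML tags come from a reduced set and you can do a little open/close fix-up when displaying interleaved (strikethrough/underline) diff markup.
Another approach that might work for simple styling is to strip the HTML tags but remember the character indices they affected, for example "positions 8-15 are bold". Then perform a plaintext diff. Finally, using the diff_xIndex position-mapping idea from the wiki's first method, intelligently re-insert the HTML tags to reapply the styles to the ranges that survived or were added. (That is, if old positions 8-13 survived but moved to 20-25, insert the B tags around that range as well.)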
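Here is a small sketch of that last approach, under some loud assumptions: Python's standard `difflib.SequenceMatcher` stands in for diff-match-patch, style ranges are half-open `(start, end)` pairs that do not overlap, and only the parts of a range that survive unchanged get their style back. `remap_ranges` and `apply_tag` are hypothetical names.

```python
from difflib import SequenceMatcher


def remap_ranges(old_text, new_text, ranges):
    """Map half-open [start, end) ranges from old_text onto new_text.

    Each range is intersected with the unchanged ("equal") blocks of the
    diff, so deleted characters simply drop out of the result.
    """
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    opcodes = matcher.get_opcodes()
    mapped = []
    for start, end in ranges:
        for tag, i1, i2, j1, j2 in opcodes:
            if tag != "equal":
                continue
            lo, hi = max(start, i1), min(end, i2)
            if lo < hi:  # part of the range lies inside this block
                mapped.append((j1 + lo - i1, j1 + hi - i1))
    return mapped


def apply_tag(text, ranges, tag="b"):
    """Wrap each (non-overlapping) range of text in <tag>...</tag>."""
    out, pos = [], 0
    for start, end in sorted(ranges):
        out.append(text[pos:start])
        out.append(f"<{tag}>{text[start:end]}</{tag}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


old = "plain bold plain"            # positions 6-10 were bold
new = "NEW TEXT. plain bold plain"  # text inserted in front
bold = remap_ranges(old, new, [(6, 10)])
print(apply_tag(new, bold))
```

Deciding whether newly inserted text next to a styled range should inherit the style is a policy question this sketch leaves out.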
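Gamers2000, thanks for your comment. I tried SynchroEdit, but neither the sandbox nor the development version worked. By the way, I also asked a question on your original "OT library question": are you using google-diff-match-patch as well? How do you use it with rich-format HTML strings? Thanks for any input. – Steve 2010-01-27 02:17:10
Hi Steve, I am using diff-match-patch, but I'm using it to synchronize plain text. Also, I'm actually using MobWrite (http://code.google.com/p/google-mobwrite), which is an implementation of diff-match-patch. Sorry I can't be of much help! – gamers2000 2010-01-27 03:38:06
Thanks for the quick comment. – Steve 2010-01-27 05:01:26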