Loop through HTML files, get the filename and insert it into each file

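I am migrating a website to WordPress. The old site used a custom publishing system in which a PHP template pulled in a separate static HTML file for each article. There are a lot of posts to migrate (over 1000).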

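I am using a plugin that imports HTML files and converts each one into a WordPress post, but it is important that each post's original date is set correctly. Conveniently, the plugin lets me pick each post's date from an HTML tag inside each file.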

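My problem is that the dates are all in the filenames, not in the files themselves. The files are all named by date in YY-MM-DD order but without the dashes, so they look like:
"130726.htm" (for July 26, 2013)
"121025.htm" (for October 25, 2012)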

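So basically I need to go through the directory of these files and, for each one, take the filename, add slashes, and insert it right after the file's <body> tag in markup like this: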
<p class="origDate">13/07/26</p>

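I don't know the best way to go about this: a Python script, a Notepad++ macro, a batch file, or anything else. Can anyone offer any help, hints, or suggestions? They would be greatly appreciated!

Thanks in advance!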


2014-10-17 StrangeBiscuit

I misunderstood the question at first, so my first script was wrong.

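This script scans the files in the dates directory (I assume the dates directory contains only HTML files named in the required format), then opens each file and inserts the paragraph right after the opening body tag.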

Example contents of the dates folder:

121214.html 121298.html 121299.html

PHP script (place the script in the same directory as the dates folder):

<?php
// Directory holding the YYMMDD-named HTML files.
$dir = "dates";

// List the directory and drop the "." and ".." entries.
$a = array_diff(scandir($dir), array(".", ".."));

foreach ($a as $value)
{
    $filename = $dir . "/" . $value;
    $string = file_get_contents($filename);

    // Strip the ".html" extension, then split YYMMDD into its three parts.
    $newstring = substr($value,0,-5);
    $newstring1 = substr($newstring,0,2);
    $newstring2 = substr($newstring,2,2);
    $newstring3 = substr($newstring,4,2);
    $para = '<p class="origDate">' . $newstring1 . "/" . $newstring2 . "/" . $newstring3 . '</p>' . "<br>";

    // Match the opening body tag together with any attributes it carries.
    $pattern = '/<body\b[^>]*>/i';
    $replacement = '${0}' . $para;
    // Insert the paragraph after the first match only.
    $newpara = preg_replace($pattern, $replacement, $string, 1);

    // Overwrite the file with the modified markup.
    file_put_contents($filename, $newpara);
}
?>

I have used .html here. To use .htm instead, change this line:

$newstring = substr($value,0,-5);

to

$newstring = substr($value,0,-4);
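
Since the question mentions a Python script as an option, here is a minimal Python sketch of the same filename step that does not depend on the extension length. It is not part of the PHP script above; the function name `orig_date_tag` is made up for illustration, and it assumes every filename stem is exactly six digits in YYMMDD order. It also rejects stems that are not real dates, so a name like 121298.html would be skipped.

```python
import re
from datetime import datetime
from pathlib import Path


def orig_date_tag(filename):
    """Return the origDate paragraph for a YYMMDD-named file, or None."""
    # Path.stem drops the extension whether it is .htm or .html.
    stem = Path(filename).stem
    if not re.fullmatch(r"[0-9]{6}", stem):
        return None
    try:
        # Raises ValueError for impossible dates such as day 98.
        datetime.strptime(stem, "%y%m%d")
    except ValueError:
        return None
    return '<p class="origDate">{}/{}/{}</p>'.format(stem[:2], stem[2:4], stem[4:])


if __name__ == "__main__":
    for name in ("130726.htm", "121214.html", "121298.html"):
        print(name, orig_date_tag(name))
```

Returning None instead of raising keeps the caller free to log and skip odd filenames during a bulk run.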

Sample HTML before:

<!DOCTYPE html> 
<html> 

<body marginwidth=0 style="margin-left: 30px;" onclick="myfunction()"> 

<ul><li>Coffee</li><li>Tea</li></ul> 

</body> 
</html>

Sample HTML after:

<!DOCTYPE html>
<html>

<body marginwidth=0 style="margin-left: 30px;" onclick="myfunction()"><p class="origDate">12/12/14</p><br>

<ul><li>Coffee</li><li>Tea</li></ul>

</body>
</html>
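
For readers who would rather use Python, which the question lists as an option, the whole loop can be sketched roughly as follows. This is an illustrative sketch rather than a port of the answer's script: the names `insert_after_body` and `stamp_directory` are invented here, it leaves out the trailing <br>, and it works on raw bytes so it makes no assumption about the files' text encoding. It assumes an opening body tag contains no ">" inside an attribute value.

```python
import re
from pathlib import Path

# Opening body tag plus whatever attributes it carries, case-insensitive.
BODY_OPEN = re.compile(rb"<body\b[^>]*>", re.IGNORECASE)


def insert_after_body(html, snippet):
    """Insert snippet right after the first opening body tag (bytes in, bytes out)."""
    return BODY_OPEN.sub(lambda m: m.group(0) + snippet, html, count=1)


def stamp_directory(directory):
    """Add an origDate paragraph to each YYMMDD-named .htm/.html file.

    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        stem = path.stem
        if not re.fullmatch(r"[0-9]{6}", stem):
            continue  # filename is not six digits, leave the file alone
        tag = '<p class="origDate">{}/{}/{}</p>'.format(stem[:2], stem[2:4], stem[4:])
        html = path.read_bytes()
        if b'class="origDate"' in html:
            continue  # already stamped on an earlier run
        new_html = insert_after_body(html, tag.encode("ascii"))
        if new_html != html:
            path.write_bytes(new_html)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    print(stamp_directory("dates"))
```

Skipping files that already contain the origDate paragraph makes the script safe to run twice; working on a copy of the directory first is still a sensible precaution.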


2014-10-17 21:33:36 Charles

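Wow, you pretty much nailed it. I didn't expect such a complete answer, thank you very much! The only problem is that some of the <body> tags have odd attributes in them. Is there any simple modification that would make it append 'origDate' after the whole <body ...> tag, rather than trying to replace only <body>? – StrangeBiscuit 2014-10-21 21:53:48

@StrangeBiscuit, yes (or so I think). Let me change the str_replace to a regular expression function. – Charles 2014-10-22 12:14:35

@StrangeBiscuit, I have changed the solution to use a regular expression, which should capture everything inside the body element's opening tag. I have tested this against the expected output, but let me know if you run into any problems. – Charles 2014-10-22 12:33:26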
