How to copy everything after a tag with BeautifulSoup

For a homework assignment, I have a "doc.html" file with the following data:

<span class="descriptor">Title:</span> Automated Scalable Bayesian Inference via Hilbert Coresets 
<span class="descriptor">Title:</span> PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference 
<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes 
<span class="descriptor">Title:</span> Edge-exchangeable graphs and sparsity (NIPS 2016) 
<span class="descriptor">Title:</span> Fast Measurements of Robustness to Changing Priors in Variational Bayes 
<span class="descriptor">Title:</span> Boosting Variational Inference

For each line, I want to get everything after </span>, so the expected output should be:

Automated Scalable Bayesian Inference via Hilbert Coresets 
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference 
Covariances, Robustness, and Variational Bayes 
Edge-exchangeable graphs and sparsity (NIPS 2016) 
Fast Measurements of Robustness to Changing Priors in Variational Bayes 
Boosting Variational Inference

I tried the following code (it does not work).

from bs4 import BeautifulSoup

with open("doc.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for line in soup.find_all('span'):
        # get_text() returns the text inside the span ("Title:"), not what follows it
        print(line.get_text())

What am I missing?

Source

2017-10-29 cpuNram

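You need the span element's nextSibling, not the text inside the span!

Note: use strip() to remove the surrounding whitespace, including the trailing newline.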

>>> from bs4 import BeautifulSoup
>>> with open("doc.html") as fp:
...     soup = BeautifulSoup(fp, 'html.parser')
...     for line in soup.find_all('span'):
...         print(line.nextSibling.strip())
... 
Automated Scalable Bayesian Inference via Hilbert Coresets 
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference 
Covariances, Robustness, and Variational Bayes 
Edge-exchangeable graphs and sparsity (NIPS 2016) 
Fast Measurements of Robustness to Changing Priors in Variational Bayes 
Boosting Variational Inference 
>>>
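
The same idea as a standalone sketch, in case it helps. It makes a few assumptions that are not in the original post: the HTML is inlined as a string instead of being read from doc.html, the helper name titles_after_spans is hypothetical, and each title is assumed to be a bare text node directly after its span. It uses next_sibling, which is how the Beautiful Soup 4 documentation spells nextSibling, and it skips a span that has no text after it instead of raising an error.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Two sample lines in the same shape as doc.html (inlined so the sketch runs standalone).
HTML = (
    '<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes \n'
    '<span class="descriptor">Title:</span> Boosting Variational Inference\n'
)


def titles_after_spans(html):
    """Hypothetical helper: return the text directly following each descriptor span."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for span in soup.find_all("span", class_="descriptor"):
        sibling = span.next_sibling
        # next_sibling can be None or another tag; keep only plain text nodes.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                titles.append(text)
    return titles


titles = titles_after_spans(HTML)
for title in titles:
    print(title)
```

Filtering on class_="descriptor" is optional here; it only matters if the real file contains other span elements whose following text should not be printed.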

Source

2017-10-29 06:13:14
