2012-04-15 86 views
10

How to capture curl's output from a Python script

I want to get information about a web page using curl, but from Python. So far I have this:

os.system("curl --head www.google.com") 

If I run that, it prints:

HTTP/1.1 200 OK 
Date: Sun, 15 Apr 2012 00:50:13 GMT 
Expires: -1 
Cache-Control: private, max-age=0 
Content-Type: text/html; charset=ISO-8859-1 
Set-Cookie: PREF=ID=3e39ad65c9fa03f3:FF=0:TM=1334451013:LM=1334451013:S=IyFnmKZh0Ck4xfJ4; expires=Tue, 15-Apr-2014 00:50:13 GMT; path=/; domain=.google.com 
Set-Cookie: NID=58=Giz8e5-6p4cDNmx9j9QLwCbqhRksc907LDDO6WYeeV-hRbugTLTLvyjswf6Vk1xd6FPAGi8VOPaJVXm14TBm-0Seu1_331zS6gPHfFp4u4rRkXtSR9Un0hg-smEqByZO; expires=Mon, 15-Oct-2012 00:50:13 GMT; path=/; domain=.google.com; HttpOnly 
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info." 
Server: gws 
X-XSS-Protection: 1; mode=block 
X-Frame-Options: SAMEORIGIN 
Transfer-Encoding: chunked 

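What I want to do is match the 200 in it using a regular expression (I don't need help with that part), but I can't find a way to get all of the text above into a string. How do I do that? I tried info = os.system("curl --head www.google.com"), but info was just 0.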
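The reason info ends up as 0 is that os.system returns the command's exit status, while the command's output goes straight to the terminal. A minimal sketch of the difference, assuming Python 3.7+ and using a harmless echo command as a stand-in for the curl call:

```python
import os
import subprocess

# "echo hello" is only a stand-in for the curl command in the question.
command = "echo hello"

# os.system runs the command through the shell and returns its exit status.
# The text "hello" is written to the terminal, not handed back to Python.
exit_status = os.system(command)

# subprocess.check_output returns what the command wrote to stdout.
captured = subprocess.check_output(command, shell=True, text=True)

print("os.system returned:", exit_status)
print("check_output returned:", repr(captured))
```

An exit status of 0 only says that the command succeeded; it carries none of the text the command printed.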

+0

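"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes." - http://docs.python.org/library/os.html#os.system – 2012-04-15 01:02:21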

Answers

2

Try this:

import httplib 
conn = httplib.HTTPConnection("www.python.org") 
conn.request("GET", "/index.html") 
r1 = conn.getresponse() 
print r1.status, r1.reason 
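On Python 3, httplib was renamed http.client, so a rough equivalent might look like the sketch below. It sends HEAD rather than GET to mirror curl --head. The helper name head_status and its parameters are made up for illustration.

```python
import http.client


def head_status(host, path="/", port=80, timeout=10):
    """Send a HEAD request and return (status, reason, headers)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # getheaders() returns a list of (name, value) pairs.
        return response.status, response.reason, dict(response.getheaders())
    finally:
        conn.close()
```

Calling head_status("www.python.org") would return the status code as an integer, so no regular expression is needed to check for 200.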
+8

This doesn't really answer the question of how to capture the output from curl. Often you need curl to send specific cookies and other parameters. – 576i 2014-01-21 10:55:43

17

Try this, using subprocess.Popen():

import subprocess 
proc = subprocess.Popen(["curl", "--head", "www.google.com"], stdout=subprocess.PIPE) 
(out, err) = proc.communicate() 
print out 

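As stated in the documentation:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as: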

os.system 
os.spawn* 
os.popen* 
popen2.* 
commands.* 
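On Python 3.7 or later, the same idea can be sketched with subprocess.run, which also lets the output come back as text rather than bytes. The helper names run_and_capture and status_codes, and the regular expression, are illustrative assumptions and not part of the answer above.

```python
import re
import subprocess

# Matches status lines such as "HTTP/1.1 200 OK" or "HTTP/2 200".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})", re.MULTILINE)


def run_and_capture(command):
    """Run a command given as a list of arguments and return its stdout as text."""
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


def status_codes(header_text):
    """Return every HTTP status code found in the header text, in order."""
    return [int(code) for code in STATUS_LINE.findall(header_text)]
```

With curl installed, usage might look like headers = run_and_capture(["curl", "--silent", "--head", "www.google.com"]) followed by status_codes(headers). A list is returned because curl prints one header block per response when it follows redirects.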
+0

Why? Explain plz – Billjk 2012-04-15 01:03:49

+0

@user1333973: Because 'subprocess' works and 'os.system()' does not. – 2012-04-15 01:04:38

+0

@user1333973 Added a link to the documentation – 2012-04-15 01:06:38

0

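You could use an HTTP library or HTTP client library in Python instead of calling the curl command. In fact, there is a curl library you can install (as long as you have a compiler on your OS).

Other options are httplib2 (recommended), which is a fairly complete HTTP protocol client with caching support, plain httplib, or the library named Requests.

If you really just want to run the curl command and capture its output, you can do that with Popen in the built-in subprocess module, documented here: http://docs.python.org/library/subprocess.html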
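The third-party libraries named above each have their own API. As a standard-library stand-in for the same idea on Python 3, a HEAD request through urllib.request might look like this sketch. The function name head_with_urllib is hypothetical.

```python
import urllib.request


def head_with_urllib(url, timeout=10):
    """Send a HEAD request to a full URL and return (status, headers)."""
    request = urllib.request.Request(url, method="HEAD")
    # urlopen follows redirects and raises urllib.error.HTTPError for
    # 4xx and 5xx responses, so only successful responses reach the return.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, dict(response.headers.items())
```

This takes a full URL such as "http://www.google.com/" rather than a bare host name.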

0

Well, there is an easier-to-read but messier way to do this. Here it is:

import os 
outfile = ''  # put your file path here 
# Append curl's output to the log file (the shell creates it if it doesn't exist). 
os.system("curl --head www.google.com >> {x}".format(x=outfile)) 
readOut = open(outfile, "r")  # Open the file in reading mode. 
for line in readOut: 
    print(line)  # Print each line in the file. 
readOut.close()  # Close the file. 
os.remove(outfile)  # Optional: delete the log file after use (works on any OS, unlike "del"). 

This should work fine for your needs. :)
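A variant of the same log-file idea, sketched for Python 3, lets the tempfile module choose the file name and delete the file afterwards, and avoids the shell redirection. The function name capture_via_tempfile is made up for this example.

```python
import subprocess
import tempfile


def capture_via_tempfile(command):
    """Run a command with stdout redirected to a temporary file, then read it back."""
    # The file is removed automatically when the with-block exits.
    with tempfile.TemporaryFile(mode="w+") as log:
        subprocess.run(command, stdout=log, check=True)
        log.seek(0)  # rewind before reading what the child process wrote
        return log.read()
```

With curl installed, capture_via_tempfile(["curl", "--silent", "--head", "www.google.com"]) would return the headers as one string.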

8

For some reason... I needed to use curl (no pycurl, httplib2, ...), so maybe this can help someone:

import os 
result = os.popen("curl http://google.es").read() 
print result 
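One detail worth knowing about os.popen is that the exit status is available from close(), which returns None when the command exited with status 0. A small Python 3 sketch, with the hypothetical helper name popen_read:

```python
import os


def popen_read(command_line):
    """Run a shell command line and return (output, close_status)."""
    pipe = os.popen(command_line)
    try:
        output = pipe.read()
    finally:
        # close() returns None on success, otherwise the exit status.
        close_status = pipe.close()
    return output, close_status
```

For example, popen_read("curl --silent --head www.google.com") would give the header text plus a way to notice that curl failed.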
+2

Thanks, this is more intuitive than the other answers and handy for quick-and-dirty scripts :) – 2016-09-05 18:49:16

2

import os 
cmd = 'curl https://randomuser.me/api/' 
os.system(cmd) 

Result

{"results":[{"gender":"male","name":{"title":"mr","first":"çetin","last":"nebioğlu"},"location":{"street":"5919 abanoz sk","city":"adana","state":"kayseri","postcode":53537},"email":"çetin.nebioğ[email protected]","login":{"username":"heavyleopard188","password":"forgot","salt":"91TJOXWX","md5":"2b1124732ed2716af7d87ff3b140d178","sha1":"cb13fddef0e2ce14fa08a1731b66f5a603e32abe","sha256":"cbc252db886cc20e13f1fe000af1762be9f05e4f6372c289f993b89f1013a68c"},"dob":"1977-05-10 18:26:56","registered":"2009-09-08 15:57:32","phone":"(518)-816-4122","cell":"(605)-165-1900","id":{"name":"","value":null},"picture":{"large":"https://randomuser.me/api/portraits/men/38.jpg","medium":"https://randomuser.me/api/portraits/med/men/38.jpg","thumbnail":"https://randomuser.me/api/portraits/thumb/men/38.jpg"},"nat":"TR"}],"info":{"seed":"0b38b702ef718e83","results":1,"page":1,"version":"1.1"}}
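Note that os.system only prints this JSON to the terminal and does not return it to the script. To work with the response as data, one option on Python 3 is to capture the output and parse it with the json module. This is a sketch under that assumption, and fetch_json is a made-up helper name.

```python
import json
import subprocess


def fetch_json(command):
    """Run a command that prints JSON on stdout and return the parsed object."""
    raw = subprocess.check_output(command)  # bytes written to stdout
    return json.loads(raw.decode("utf-8"))
```

With curl installed, data = fetch_json(["curl", "--silent", "https://randomuser.me/api/"]) would return a dict, so fields shown in the result above, such as data["info"]["version"], can be read directly.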