2017-05-25

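How do I get the output of a function inside a 'for' loop and build a DataFrame from it?

I am writing a script to do some exploratory analysis. The script references an API to get IDs, and the API returns a response as XML output (with no child objects).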

Script:

import requests 
import xml.etree.ElementTree as et 


xml ='''  
<?xml version="1.0" encoding="UTF-8"?> 
<YM> 
    <Version>xxx</Version> 
    <ApiKey>xxx</ApiKey> 
    <CallID>xxx</CallID> 
    <></> 
    <SaPasscode>xxxx</SaPasscode> 
    <Call Method = "GetIDs"> 

    </Call> 
</YM> 
'''   
headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
r = requests.post('url', data=xml, headers=headers) 

Example output:

<YourMembership_Response> 
<Sa.Members.All.GetIDs> 
<Members> 
<ID>1234</ID> 
<ID>4321</ID> 
</Members> 
</Sa.Members.All.GetIDs> 
</YourMembership_Response> 

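I take these IDs and pass them into a second API call to get more information about each one. In the same script, I loop over the IDs parsed from the response above and hand each one to a function that makes the second call: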

Script:

def xml_event_info(eventID):  
    xml ='''   
    <?xml version="1.0" encoding="UTF-8"?> 
    <YourMembership> 
     <Version>xxx</Version> 
     <ApiKey>xxx</ApiKey> 
     <CallID>xxx</CallID> 
     <></> 
     <SaPasscode>xxx</SaPasscode> 
     <Call Method = "Profile.Get"> 
      <ID>{}</ID> 
     </Call> 
    </YourMembership>   
    '''   
    headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
    r = requests.post('url', 
         data=xml.format(eventID), headers=headers)   
    print(r.text)  




# BUILD XML TREE OBJECT  
dom = et.fromstring(r.text) 

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION 
for i in dom.iterfind('.//ID'): 
    xml_event_info(i.text) 

Example output (there are more XML elements than shown):

<?xml version="1.0" encoding="utf-8" ?> 

<Response> 
<ErrCode>xxx</ErrCode> 
<ExtendedErrorInfo>xxx</ExtendedErrorInfo> 
<Profile.Get> 
<ID>xxxx</ID> 
<WebsiteID>xxxx</WebsiteID> 
<EmailBounced>xxx</EmailBounced> 
<NamePrefix>xxx</NamePrefix> 
<FirstName>xxx</FirstName> 
</Profile.Get> 
</Response> 

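I want to take the output of the second API call, with its many XML elements as in the example above, and map those elements into a pandas DataFrame. The problem I am running into is how to use the output of the function call (xml_event_info(i.text)) made inside the for loop shown here, which holds the second API call's output: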

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION 
for i in dom.iterfind('.//ID'): 
    xml_event_info(i.text) 

I tried to map the XML to a DataFrame and I keep getting the error "TypeError: Parse() argument 1 must be string or read-only buffer, not None".

How can I parse the XML output from multiple API calls into a pandas DataFrame in which each XML tag is a column header?

Example: 

---|ErrCode|ExtendedInfo|ID|FirstName---- 

The script and website I am relying on to do this can be found here (http://www.austintaylor.io/lxml/python/pandas/xml/dataframe/2016/07/08/convert-xml-to-pandas-dataframe/).

Script:

def xml2df(): 
    tree = et.fromstring(xml_event_info(i.text)) 
    root = tree.getroot() 
    all_records = [] 
    headers = [] 
    for i, child in enumerate(root): 
     record = [] 
     for subchild in child: 
      record.append(subchild.text) 
      if subchild.tag not in headers: 
       headers.append(subchild.tag) 
     all_records.append(record) 
    return pd.DataFrame(all_records, columns=headers) 

Full script:

import requests 
import xml.etree.ElementTree as et 
import pandas as pd 
from lxml import etree 

xml ='''  
<?xml version="1.0" encoding="UTF-8"?> 
<YourMembership> 
    <Version>xxx</Version> 
    <ApiKey>xxxx</ApiKey> 
    <CallID>xxx</CallID> 
    <></> 
    <SaPasscode>xxx</SaPasscode> 
    <Call Method = "Events.All.GetIDs"> 
     <StartDate>2017/01/1</StartDate> 
     <EndDate>2017/01/31</EndDate> 
    </Call> 
</YourMembership> 
'''   
headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
r = requests.post('url', data=xml, headers=headers) 


def xml_event_info(eventID):  
    xml ='''   
    <?xml version="1.0" encoding="UTF-8"?> 
    <YourMembership> 
     <Version>xxx</Version> 
     <ApiKey>xxx</ApiKey> 
     <CallID>xxx</CallID> 
     <></> 
     <SaPasscode>xxx</SaPasscode> 
     <Call Method = "Event.Get"> 
      <EventID>{}</EventID> 
     </Call> 
    </YourMembership>   
    '''   
    headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
    r = requests.post('url', 
         data=xml.format(eventID), headers=headers)   
    print(r.text) 
    return r.text  




# BUILD XML TREE OBJECT  
dom = et.fromstring(r.text) 

# PARSE EVENT ID TEXT AND PASS INTO FUNCTION 
for i in dom.iterfind('.//EventID'): 
    y = xml_event_info(i.text) 

    for xml in y: 
     tree = et.fromstring(y) 
     root = tree.getchildren() 
     all_records = [] 
     headers = [] 
     for i , child in enumerate(root): 
      record = [] 
      for subchild in child: 
       record.append(subchild.text) 
       if subchild.tag not in headers: 
        headers.append(subchild.tag) 
       all_records.append(record) 
       #print all_records 
       print pd.DataFrame(all_records, columns=headers) 

Edit:

TLDR:

How can the output of the function below be mapped to a DataFrame, with the XML elements as the DataFrame's column headers?

import requests 
import xml.etree.ElementTree as et 
import pandas as pd 

xml ='''  
<?xml version="1.0" encoding="UTF-8"?> 
<YourMembership> 
    <Version>xxx</Version> 
    <ApiKey>xxxx</ApiKey> 
    <CallID>xxx</CallID> 
    <></> 
    <SaPasscode>xxxx</SaPasscode> 
    <Call Method = "GetIDs"> 

    </Call> 
</YourMembership> 
'''   
headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
r = requests.post('url', data=xml, headers=headers) 

def xml_event_info(eventID):  
    xml ='''   
    <?xml version="1.0" encoding="UTF-8"?> 
    <YourMembership> 
     <Version>xxx</Version> 
     <ApiKey>xxx</ApiKey> 
     <CallID>xxx</CallID> 
     <></> 
     <SaPasscode>xxx</SaPasscode> 
     <Call Method = "Profile.Get"> 
      <ID>{}</ID> 
     </Call> 
    </YourMembership>   
    '''   
    headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
    r = requests.post('url', 
         data=xml.format(eventID), headers=headers)   
    print(r.text)  

Output:

<?xml version="1.0" encoding="utf-8" ?> 

<Response> 
<ErrCode>xxx</ErrCode> 
<ExtendedErrorInfo>xxx</ExtendedErrorInfo> 
<Profile.Get> 
<ID>xxxx</ID> 
<WebsiteID>xxxx</WebsiteID> 
<EmailBounced>xxx</EmailBounced> 
<NamePrefix>xxx</NamePrefix> 
<FirstName>xxx</FirstName> 
</Profile.Get> 
</Response> 
+1

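IMO, your question is very verbose. Could you give a tl;dr version? I am having a hard time understanding what problem you are trying to solve. – EyuelDK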

+0

You are missing an [MCVE](https://stackoverflow.com/help/mcve) –

+0

@EyuelDK Added a tl;dr – RustyShackleford

Answers

1

The xml_event_info(eventID) function does not return anything, so its result is None. Add a return statement at the end and try again.

def xml_event_info(eventID):  
    xml ='''   
    <?xml version="1.0" encoding="UTF-8"?> 
    <YourMembership> 
     <Version>xxx</Version> 
     <ApiKey>xxx</ApiKey> 
     <CallID>xxx</CallID> 
     <></> 
     <SaPasscode>xxx</SaPasscode> 
     <Call Method = "Profile.Get"> 
      <ID>{}</ID> 
     </Call> 
    </YourMembership>   
    '''   
    headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
    r = requests.post('url', 
         data=xml.format(eventID), headers=headers)   
    print(r.text) 
    return r.text 
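
Once the function returns its text, the loop has to keep what comes back; otherwise each response is discarded as soon as it arrives. A minimal sketch, assuming a stand-in function `fetch_profile_xml` (hypothetical: it returns canned XML instead of calling the API) in place of `xml_event_info`:

```python
import xml.etree.ElementTree as et

# Canned first response; in the real script this would be r.text.
ids_xml = "<Members><ID>1234</ID><ID>4321</ID></Members>"

def fetch_profile_xml(member_id):
    """Hypothetical stand-in for xml_event_info: returns XML text, no network call."""
    return (
        "<Response><ErrCode>0</ErrCode>"
        "<Profile.Get><ID>{}</ID><FirstName>xxx</FirstName></Profile.Get>"
        "</Response>"
    ).format(member_id)

dom = et.fromstring(ids_xml)

# Keep every returned string instead of throwing it away.
responses = []
for node in dom.iterfind('.//ID'):
    responses.append(fetch_profile_xml(node.text))

print(len(responses))
```

The list `responses` then holds one XML string per ID and can be parsed after the loop has finished.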
+0

This works. I can now use the function's output, but how do I build a DataFrame from here? – RustyShackleford
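
One way to go from the returned strings to a DataFrame, sketched under the assumption that every response has the flat shape shown in the question (a few top-level fields plus one `Profile.Get` block). The helper names `response_to_record` and `records_to_dataframe` are made up for illustration, and the sample strings are canned rather than real API output:

```python
import xml.etree.ElementTree as et

def response_to_record(xml_text):
    """Flatten one response into a dict mapping tag name -> text."""
    # strip() guards against whitespace before the XML declaration.
    root = et.fromstring(xml_text.strip())
    record = {}
    for child in root:
        if len(child):                      # element with children, e.g. Profile.Get
            for sub in child:
                record[sub.tag] = sub.text
        else:                               # simple field, e.g. ErrCode
            record[child.tag] = child.text
    return record

def records_to_dataframe(records):
    """Build the DataFrame; pandas is imported here so the parsing runs without it."""
    import pandas as pd
    return pd.DataFrame(records)            # dict keys become the column headers

# Canned responses standing in for the strings returned by xml_event_info.
responses = [
    "<Response><ErrCode>0</ErrCode><Profile.Get><ID>1234</ID>"
    "<FirstName>Ann</FirstName></Profile.Get></Response>",
    "<Response><ErrCode>0</ErrCode><Profile.Get><ID>4321</ID>"
    "<FirstName>Bob</FirstName></Profile.Get></Response>",
]

records = [response_to_record(text) for text in responses]
```

Calling `records_to_dataframe(records)` would then give one row per ID with `ErrCode`, `ID` and `FirstName` as columns; a tag missing from one response shows up as NaN in that row.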

+0

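I made an edit. I used your suggestion about returning the value, and created nested for loops to iterate over the final XML output and put it into a DataFrame. The DataFrame gets created, but it keeps hanging on a single XML call and does not get through the whole loop. – RustyShackleford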

+0

Please see the edit in the 'Full script' section. Thanks – RustyShackleford
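
A possible explanation for the loop in the full script appearing to hang: `y` is a string, so `for xml in y:` runs once per character and re-parses the same response on every pass. `all_records` is also reset inside that loop, so rows never accumulate across IDs. A small sketch of the first point, using a canned string rather than real API output:

```python
import xml.etree.ElementTree as et

# Canned response text standing in for y = xml_event_info(i.text).
y = "<Response><ErrCode>0</ErrCode></Response>"

# Iterating over a string yields single characters, not XML documents,
# so the body runs len(y) times and parses the same text on each pass.
passes = 0
for ch in y:
    et.fromstring(y)
    passes += 1

print(passes == len(y))  # one parse per character

# Parsing once per response is enough.
root = et.fromstring(y)
print([child.tag for child in root])
```

Dropping the inner `for xml in y:` loop and creating the list of records once, before the outer loop, should avoid both effects.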
