2014-10-01

There is an error in XML document (155, 23), but the XML is valid, and it always fails on page 13

I am having some trouble parsing XML data in C#.

Method summary:

The method takes a keyword and searches for it on www.clinicaltrials.gov by building a URI for that site. For example:

http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true

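This URI returns the matching clinical studies as XML. Because there is so much clinical data, the site returns only 20 studies per page. To get to the next page you append &pg=2, which takes you to the second page. My code fetches all the pages and deserializes each one into C# objects.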
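The paging arithmetic described above can be sketched roughly as follows. This is only an illustration: the class and method names (PagingSketch, TotalPages, PageUri) are made up for this example, and the page size of 20 is taken from the description above rather than from any documented API contract.

```csharp
using System;

public static class PagingSketch
{
    // Assumption taken from the question: the site returns 20 studies per page.
    public const int StudiesPerPage = 20;

    // Number of pages needed for totalStudies results (integer ceiling division).
    public static int TotalPages(int totalStudies)
    {
        return (totalStudies + StudiesPerPage - 1) / StudiesPerPage;
    }

    // Builds the results URI for one page. The keyword is escaped so that
    // spaces or '&' characters in it cannot corrupt the query string.
    public static string PageUri(string baseUri, string keyword, int page)
    {
        return baseUri + "ct2/results?term=" + Uri.EscapeDataString(keyword)
            + "&Search=Search&displayxml=true&pg=" + page;
    }
}
```

With this arithmetic, 240 results fit on 12 pages, and the 241st result is the first one on page 13.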

Problem:

The problem is that when it reaches page 13, it crashes with the following error:

InvalidOperationException was unhandled: There is an error in XML document (155, 23)

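When I paste the XML for page 13, page 12, or any other page near page 13 into an XML validator, it says the XML is fine. When I look through the XML myself, I cannot find any error. I thought memory might be full, but after only 240 objects? If I search for a keyword that returns fewer than 13 pages of results, everything works.

Here is the code I wrote to retrieve and parse the XML: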

public List<search_resultsClinical_study> SearchStudyByKeyword(string keyword)
{
    int currentPage = 1;
    double numberOfStudiesOnAPage = 20;
    double totalPages = 1; // assume at least one page; if there is none it will crash anyway
    List<search_results> searchResult = new List<search_results>();
    System.Xml.Serialization.XmlSerializer reader = new System.Xml.Serialization.XmlSerializer(typeof(search_results));

    try
    {
        while (totalPages >= currentPage)
        {
            // Crashes if the search is larger than 13 pages... have to figure out why.
            // The keyword is escaped so that spaces or '&' do not break the query string.
            string newUri = URI + "ct2/results?term=" + Uri.EscapeDataString(keyword) + "&Search=Search&displayxml=true&pg=" + currentPage;
            search_results studies;
            using (XmlReader xmlReader = XmlReader.Create(newUri))
            {
                studies = (search_results)reader.Deserialize(xmlReader);
            }
            searchResult.Add(studies);
            totalPages = Math.Ceiling((double)studies.count / numberOfStudiesOnAPage);
            currentPage += 1;
        }

        // Append all studies to one list, which is easier for the caller to handle.
        List<search_resultsClinical_study> result = new List<search_resultsClinical_study>();
        foreach (search_results sr in searchResult)
        {
            foreach (search_resultsClinical_study cs in sr.clinical_study)
            {
                result.Add(cs);
            }
        }
        return result;
    }
    catch (WebException)
    {
        Debug.Write("404 - Might be an invalid search term ");
        return null;
    }
}

The error occurs on the following line:

studies = (search_results)reader.Deserialize(xmlReader); 

The search_results classes:

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
[System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)] 
public partial class search_results 
{ 

    private string queryField; 

    private search_resultsClinical_study[] clinical_studyField; 

    private uint countField; 

    /// <remarks/> 
    public string query 
    { 
     get 
     { 
      return this.queryField; 
     } 
     set 
     { 
      this.queryField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlElementAttribute("clinical_study")] 
    public search_resultsClinical_study[] clinical_study 
    { 
     get 
     { 
      return this.clinical_studyField; 
     } 
     set 
     { 
      this.clinical_studyField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlAttributeAttribute()] 
    public uint count 
    { 
     get 
     { 
      return this.countField; 
     } 
     set 
     { 
      this.countField = value; 
     } 
    } 
} 

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
public partial class search_resultsClinical_study 
{ 

    private byte orderField; 

    private decimal scoreField; 

    private string nct_idField; 

    private string urlField; 

    private string titleField; 

    private search_resultsClinical_studyStatus statusField; 

    private string condition_summaryField; 

    private string last_changedField; 

    /// <remarks/> 
    public byte order 
    { 
     get 
     { 
      return this.orderField; 
     } 
     set 
     { 
      this.orderField = value; 
     } 
    } 

    /// <remarks/> 
    public decimal score 
    { 
     get 
     { 
      return this.scoreField; 
     } 
     set 
     { 
      this.scoreField = value; 
     } 
    } 

    /// <remarks/> 
    public string nct_id 
    { 
     get 
     { 
      return this.nct_idField; 
     } 
     set 
     { 
      this.nct_idField = value; 
     } 
    } 

    /// <remarks/> 
    public string url 
    { 
     get 
     { 
      return this.urlField; 
     } 
     set 
     { 
      this.urlField = value; 
     } 
    } 

    /// <remarks/> 
    public string title 
    { 
     get 
     { 
      return this.titleField; 
     } 
     set 
     { 
      this.titleField = value; 
     } 
    } 

    /// <remarks/> 
    public search_resultsClinical_studyStatus status 
    { 
     get 
     { 
      return this.statusField; 
     } 
     set 
     { 
      this.statusField = value; 
     } 
    } 

    /// <remarks/> 
    public string condition_summary 
    { 
     get 
     { 
      return this.condition_summaryField; 
     } 
     set 
     { 
      this.condition_summaryField = value; 
     } 
    } 

    /// <remarks/> 
    public string last_changed 
    { 
     get 
     { 
      return this.last_changedField; 
     } 
     set 
     { 
      this.last_changedField = value; 
     } 
    } 
} 

/// <remarks/> 
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)] 
public partial class search_resultsClinical_studyStatus 
{ 

    private string openField; 

    private string valueField; 

    /// <remarks/> 
    [System.Xml.Serialization.XmlAttributeAttribute()] 
    public string open 
    { 
     get 
     { 
      return this.openField; 
     } 
     set 
     { 
      this.openField = value; 
     } 
    } 

    /// <remarks/> 
    [System.Xml.Serialization.XmlTextAttribute()] 
    public string Value 
    { 
     get 
     { 
      return this.valueField; 
     } 
     set 
     { 
      this.valueField = value; 
     } 
    } 
} 

The XML that fails:

http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true&pg=13

Does anyone have a clue why this error occurs? I have also attached an XmlSchema, and I tried generating the C# classes from that XmlSchema!

Thanks for your help!

Do this simple test: dump each page to disk before trying to deserialize it. You can do that like this: http://stackoverflow.com/questions/3988832/how-to-create-an-xml-file-from-a-xmlreader After that, try to deserialize the files from disk. – 2014-10-01 09:00:28

Hey, thanks for the response! Even when I dump each page to disk before trying to deserialize it, I still get the same error. – 2014-10-01 09:33:56

Attach the specific XML you are having trouble with and add the structure of search_results. – 2014-10-01 10:02:41

Answer

private byte orderField;

Type   Range      Size                     .NET Framework type
byte   0 to 255   Unsigned 8-bit integer   System.Byte

It will probably crash as soon as it reaches this record:

<clinical_study> 
    <order>256</order> 
    <score>1.00</score> 
    <nct_id>NCT00006461</nct_id> 
    <url>http://ClinicalTrials.gov/show/NCT00006461</url> 
    <title> 
     Combination Chemotherapy Followed by Second-Look Surgery and ... 
    </title> 
    <status open="N">Completed</status> 
    <condition_summary> 
     Untreated Childhood Medulloblastoma; Untreated Childhood.. 
    </condition_summary> 
    <last_changed>August 7, 2013</last_changed> 
</clinical_study> 

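As you can see, a byte cannot hold the value 256. With 20 studies per page, page 13 covers orders 241 to 260, so order 256 is first seen on page 13, which is why searches with fewer pages work. The usual way to catch problems like this is to always validate everything against the schema(s) before deserializing.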
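To see the effect in isolation, here is a minimal sketch that deserializes a one-element version of the record above into two hypothetical classes, one with a byte order and one with an int order. The class names (StudyWithByteOrder, StudyWithIntOrder, OrderOverflowSketch) are invented for this illustration and are not part of the question's generated code; widening the type to int is one possible fix, assuming the order values stay within the int range.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical cut-down version of the generated class, keeping the byte type.
[XmlRoot("clinical_study")]
public class StudyWithByteOrder
{
    public byte order { get; set; }
}

// Same shape, but with a wider integer type for order.
[XmlRoot("clinical_study")]
public class StudyWithIntOrder
{
    public int order { get; set; }
}

public static class OrderOverflowSketch
{
    // Deserializes an XML string into T using XmlSerializer.
    public static T Parse<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }

    // Returns true when deserialization fails with the same exception type
    // that is reported in the question.
    public static bool FailsToParse<T>(string xml)
    {
        try
        {
            Parse<T>(xml);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Calling OrderOverflowSketch.FailsToParse<StudyWithByteOrder> on a document whose order is 256 should report a failure, while the int version reads the same document without complaint. When the exception is caught in real code, looking at its InnerException usually reveals the underlying cause, which is more informative than the line and column numbers alone.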

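P.S. The schema you provided appears to be three years old. It has no elements such as "condition_summary", among others. You are probably better off creating your own schema from scratch, or generating one from the existing XML.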
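One way to generate a schema from existing XML and then validate against it is sketched below, using XmlSchemaInference and a validating XmlReader from System.Xml. The class and method names (SchemaSketch, InferFrom, IsValid) are made up for this example. Keep in mind that an inferred schema is only as good as the sample it was inferred from: a sample containing only small order values would likely yield a numeric type that is too narrow.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaSketch
{
    // Infers a schema set from a sample XML document.
    public static XmlSchemaSet InferFrom(string sampleXml)
    {
        var inference = new XmlSchemaInference();
        using (var reader = XmlReader.Create(new StringReader(sampleXml)))
        {
            return inference.InferSchema(reader);
        }
    }

    // Reads the whole document with schema validation switched on and
    // reports whether any validation error or warning was raised.
    public static bool IsValid(string xml, XmlSchemaSet schemas)
    {
        bool valid = true;
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(schemas);
        settings.ValidationEventHandler += (sender, e) => valid = false;

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }
}
```

Running IsValid on each downloaded page before handing it to the serializer would surface a type mismatch as a validation event instead of an exception thrown from inside Deserialize.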

Thanks! I am marking this as solved, because it makes sense now! My reputation is too low to vote, but I will do so later! Thanks again! – 2014-10-01 11:30:27
