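I am having trouble parsing some XML data in C#: "There is an error in XML document (155, 23)". The XML itself appears to be valid, and the failure always happens on page 13.
Method summary:
The method takes a keyword and searches for it on www.clinicaltrials.gov by building a URI for that site. For example:
http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true
That URI returns the matching clinical studies as XML. Because there is so much clinical data, each page contains only 20 studies. To get the next page you append &pg=2 to the URI, which takes you to the second page. My code walks through all the pages and deserializes each one into C# objects.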
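As a rough illustration of the paging scheme just described, the sketch below builds the URI for a given page and works out how many pages a result count implies. The class and method names are made up for this example, and escaping the keyword with Uri.EscapeDataString is an assumption on my part rather than something the site documents.

```csharp
using System;

public static class ClinicalTrialsPaging
{
    // Base address and page size as described in the question.
    private const string BaseUri = "http://www.clinicaltrials.gov/";
    private const int StudiesPerPage = 20;

    // Builds the search URI for one page (pages are 1-based).
    public static string BuildPageUri(string keyword, int page)
    {
        return BaseUri + "ct2/results?term=" + Uri.EscapeDataString(keyword)
            + "&Search=Search&displayxml=true&pg=" + page;
    }

    // Integer ceiling of studyCount / StudiesPerPage.
    public static int TotalPages(long studyCount)
    {
        return (int)((studyCount + StudiesPerPage - 1) / StudiesPerPage);
    }

    public static void Main()
    {
        Console.WriteLine(BuildPageUri("ALL", 2));
        Console.WriteLine(TotalPages(241)); // 241 studies need 13 pages
    }
}
```

With 20 studies per page, page 13 is the first page that holds studies beyond number 240.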
Problem:
The problem is that when it reaches page 13 it crashes with the following error:
InvalidOperationException was unhandled: There is an error in XML document (155, 23)
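When I copy the XML for page 13, page 12, or any other page close to page 13 into an XML validator, it says the XML is fine. When I read through the XML myself I cannot find any error either. I wondered whether memory might be full, but after only 240 objects? If I search for a keyword that returns fewer than 13 pages of results, everything works.
Here is the code I wrote to retrieve and parse the XML: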
public List<search_resultsClinical_study> SearchStudyByKeyword(string keyword)
{
    int currentPage = 1;
    double numberOfStudiesOnAPage = 20;
    double totalPages = 1; // if not, it will crash anyway
    List<search_results> searchResult = new List<search_results>();
    // One serializer is enough; it can be reused for every page.
    System.Xml.Serialization.XmlSerializer reader = new System.Xml.Serialization.XmlSerializer(typeof(search_results));
    try
    {
        while (totalPages >= currentPage)
        {
            // crashes if the search is larger than 13 pages... have to figure out why...
            string newUri = URI + "ct2/results?term=" + keyword + "&Search=Search&displayxml=true&pg=" + currentPage;
            // Dispose the XmlReader (and its underlying stream) after each page.
            using (XmlReader xmlReader = XmlReader.Create(newUri))
            {
                search_results studies = (search_results)reader.Deserialize(xmlReader);
                searchResult.Add(studies);
                // The count attribute holds the total number of matches across all pages.
                totalPages = Math.Ceiling((double)studies.count / numberOfStudiesOnAPage);
            }
            currentPage += 1;
        }
        // Append all studies to one list, easier to handle for the user
        List<search_resultsClinical_study> result = new List<search_resultsClinical_study>();
        foreach (search_results sr in searchResult)
        {
            foreach (search_resultsClinical_study cs in sr.clinical_study)
            {
                result.Add(cs);
            }
        }
        return result;
    }
    catch (WebException)
    {
        Debug.Write("404 - Might be an invalid search term ");
        return null;
    }
}
The error occurs on the following line:
studies = (search_results)reader.Deserialize(xmlReader);
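The message "There is an error in XML document (155, 23)" is only the outer wrapper: XmlSerializer reports deserialization failures as an InvalidOperationException, and the underlying cause is normally found in InnerException. A minimal sketch of a helper that prints the whole chain follows; the class name, the method name, and the stand-in inner exception are invented for this example.

```csharp
using System;
using System.Text;

public static class ExceptionChain
{
    // Joins the type and message of an exception and all of its inner exceptions.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            if (sb.Length > 0) sb.Append(" -> ");
            sb.Append(e.GetType().Name).Append(": ").Append(e.Message);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Stand-in for what Deserialize throws; the inner exception here is made up.
        var wrapped = new InvalidOperationException(
            "There is an error in XML document (155, 23).",
            new FormatException("stand-in inner failure"));
        Console.WriteLine(Describe(wrapped));
    }
}
```

Wrapping the Deserialize call in a try/catch for InvalidOperationException and logging Describe(ex) should show which value the serializer could not convert.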
The search_results class:
/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)]
[System.Xml.Serialization.XmlRootAttribute(Namespace = "", IsNullable = false)]
public partial class search_results
{
    private string queryField;
    private search_resultsClinical_study[] clinical_studyField;
    private uint countField;

    /// <remarks/>
    public string query
    {
        get
        {
            return this.queryField;
        }
        set
        {
            this.queryField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("clinical_study")]
    public search_resultsClinical_study[] clinical_study
    {
        get
        {
            return this.clinical_studyField;
        }
        set
        {
            this.clinical_studyField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public uint count
    {
        get
        {
            return this.countField;
        }
        set
        {
            this.countField = value;
        }
    }
}
/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)]
public partial class search_resultsClinical_study
{
    private byte orderField;
    private decimal scoreField;
    private string nct_idField;
    private string urlField;
    private string titleField;
    private search_resultsClinical_studyStatus statusField;
    private string condition_summaryField;
    private string last_changedField;

    /// <remarks/>
    public byte order
    {
        get
        {
            return this.orderField;
        }
        set
        {
            this.orderField = value;
        }
    }

    /// <remarks/>
    public decimal score
    {
        get
        {
            return this.scoreField;
        }
        set
        {
            this.scoreField = value;
        }
    }

    /// <remarks/>
    public string nct_id
    {
        get
        {
            return this.nct_idField;
        }
        set
        {
            this.nct_idField = value;
        }
    }

    /// <remarks/>
    public string url
    {
        get
        {
            return this.urlField;
        }
        set
        {
            this.urlField = value;
        }
    }

    /// <remarks/>
    public string title
    {
        get
        {
            return this.titleField;
        }
        set
        {
            this.titleField = value;
        }
    }

    /// <remarks/>
    public search_resultsClinical_studyStatus status
    {
        get
        {
            return this.statusField;
        }
        set
        {
            this.statusField = value;
        }
    }

    /// <remarks/>
    public string condition_summary
    {
        get
        {
            return this.condition_summaryField;
        }
        set
        {
            this.condition_summaryField = value;
        }
    }

    /// <remarks/>
    public string last_changed
    {
        get
        {
            return this.last_changedField;
        }
        set
        {
            this.last_changedField = value;
        }
    }
}
/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType = true)]
public partial class search_resultsClinical_studyStatus
{
    private string openField;
    private string valueField;

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string open
    {
        get
        {
            return this.openField;
        }
        set
        {
            this.openField = value;
        }
    }

    /// <remarks/>
    [System.Xml.Serialization.XmlTextAttribute()]
    public string Value
    {
        get
        {
            return this.valueField;
        }
        set
        {
            this.valueField = value;
        }
    }
}
The XML that fails:
http://www.clinicaltrials.gov/ct2/results?term=ALL&Search=Search&displayxml=true&pg=13
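Does anyone have a clue why this error occurs? I also added an XmlSchema and tried generating the C# classes from that schema!
Thanks for your help!
Do this simple test: before trying to deserialize, dump each page to disk. You can do that like this: http://stackoverflow.com/questions/3988832/how-to-create-an-xml-file-from-a-xmlreader Afterwards, try to deserialize the files from disk. – 2014-10-01 09:00:28
Hey, thanks for the reply! Even when I dump each page to disk before trying to deserialize it, I still get the same error. – 2014-10-01 09:33:56
Attach the specific XML you are having trouble with and add the structure of search_results. – 2014-10-01 10:02:41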
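One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: search_resultsClinical_study declares order as a byte, which cannot hold values above 255. If order is a running rank across pages (an assumption about the feed), page 13 would carry studies 241 to 260, so the first value above 255 would show up on exactly that page, and an XML validator would not complain because the document itself is well formed. The sketch below uses a hypothetical cut-down class, StudyWithByteOrder, to check how XmlSerializer behaves for 255 and 256.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in that keeps only the order element of search_resultsClinical_study.
public class StudyWithByteOrder
{
    public byte order { get; set; }
}

public static class OrderOverflowCheck
{
    // Returns "ok" if the value deserializes, otherwise the inner exception's type name.
    public static string TryParse(int order)
    {
        string xml = "<StudyWithByteOrder><order>" + order + "</order></StudyWithByteOrder>";
        var serializer = new XmlSerializer(typeof(StudyWithByteOrder));
        try
        {
            using (var reader = new StringReader(xml))
            {
                serializer.Deserialize(reader);
            }
            return "ok";
        }
        catch (InvalidOperationException ex)
        {
            return ex.InnerException == null ? "unknown" : ex.InnerException.GetType().Name;
        }
    }

    public static void Main()
    {
        Console.WriteLine("255: " + TryParse(255));
        Console.WriteLine("256: " + TryParse(256));
    }
}
```

If 256 fails here in the same way, widening order to a larger integer type such as int or uint in the generated class would be the obvious thing to try.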