有人可以请教我什么是可能的方式与Python多线程? 我有一个XML文件(163 MB)。我的任务是需要如何使用python多线程从XMl插入数据库?
- 读取XML文件
- 将数据插入到一个DB(多表)
- 记录在日志文件中
我已经有插入的行数读取执行上述1,2和3步骤的xml文件的python代码。其实,我想用多线程来加速这个过程。我不知道如何开始工作。
这是XML结构。
<Content id="359366">
<Title>This title</Title>
<SortTitle>sorting</SortTitle>
<PublisherEntity id="2003">ABC Publishing Group</PublisherEntity>
<Publisher>ABC Publishing Group</Publisher>
<Imprint>Revell</Imprint>
<Language code = "en">English</Language>
<GeoRight>
<GeoCountry code = "WW" model = "Distribution">World</GeoCountry>
</GeoRight>
<Format type = "Adobe EPUB eBook">
<Identifier type = "DRMID">xxx-xxx-xx</Identifier>
<Identifier type = "ISBN">1234567</Identifier>
<SRP currency = "SGD">18.89</SRP>
<WholesaleCost currency = "SGD">11.14</WholesaleCost>
<OnSaleDate>01 Sep 2010</OnSaleDate>
<MinimumSoftwareVersion number="1.x">Adobe Digital Editions</MinimumSoftwareVersion>
<DownloadFileName>HouseonMalcolmStreet9781441213877</DownloadFileName>
<SecurityLevel value="ACS4">Adobe Content Server 4</SecurityLevel>
<ContentFileSize>473923</ContentFileSize>
<DownloadUrl>http://xxx.xx.com/</DownloadUrl>
<DownloadIDType>CRID</DownloadIDType>
<DrmInfo>
<Copy>
<Enabled>1</Enabled>
<Selections>2</Selections>
<Interval type = "Days">7</Interval>
</Copy>
<Print>
<Enabled>1</Enabled>
<Selections>20</Selections>
<Interval type = "Days">7</Interval>
</Print>
<Lend>
<Enabled>0</Enabled>
</Lend>
<ReadAloud>
<Enabled>0</Enabled>
</ReadAloud>
<Expires>
<Enabled>0</Enabled>
<Interval type = "Days">-1</Interval>
</Expires>
</DrmInfo>
</Format>
<Creator rank="1" id="923710">
<Name>name</Name>
<FileAs>Kelly, Leisha</FileAs>
<Role id="aut">Author</Role>
</Creator>
<SubTitle>A Novel</SubTitle>
<Edition></Edition>
<Series></Series>
<Coverage></Coverage>
<AgeGroup></AgeGroup>
<ContentType></ContentType>
<PublicationDate>09/01/2010</PublicationDate>
<ShortDescription>description</ShortDescription>
<FullDescription>full desc</FullDescription>
<Image type = "Cover Image">http://xxx.xx.jpg</Image>
<Image type = "Thumbnail Image">http://xxx.xx.jpg</Image>
<Subject code="FIC000000">Fiction</Subject>
<Subject code="FIC014000">Historical Fiction</Subject>
</Content>
这里是现有的Python代码download。
我建议你介绍一下你的当前代码,并计算出不同方面所花费的时间。我认为多线程不一定会加快任务速度。 – MattH
谢谢。我发布了xml结构和当前代码文件 – hhkhaing