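The XML modules on CPAN fall into three main categories: modules that provide their own interface to XML data (usually concerned with converting between an XML instance and Perl data structures), modules that implement one of the standard XML APIs, and special-purpose modules that simplify some specific XML-related task. This month we look at the first category, the Perl-specific XML interfaces.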
<?xml version="1.0"?>
<camelids>
  <species name="Camelus dromedarius">
    <common-name>Dromedary, or Arabian Camel</common-name>
    <physical-characteristics>
      <mass>300 to 690 kg.</mass>
      <appearance>
        The dromedary camel is characterized by a long-curved
        neck, deep-narrow chest, and a single hump.
        ...
      </appearance>
    </physical-characteristics>
    <natural-history>
      <food-habits>
        The dromedary camel is an herbivore.
        ...
      </food-habits>
      <reproduction>
        The dromedary camel has a lifespan of about 40-50 years
        ...
      </reproduction>
      <behavior>
        With the exception of rutting males, dromedaries show
        very little aggressive behavior.
        ...
      </behavior>
      <habitat>
        The camels prefer desert conditions characterized by a
        long dry season and a short rainy season.
        ...
      </habitat>
    </natural-history>
    <conservation status="no special status">
      <detail>
        Since the dromedary camel is domesticated, the camel has
        no special status in conservation.
      </detail>
    </conservation>
  </species>
  ...
</camelids>
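Now let's assume that the complete document (available in this month's sample code) contains all the information about every member of the camelid family, not just the dromedary shown above. To illustrate how each module extracts a particular subset of data from this file, we will write a very short script that processes camelids.xml and prints to STDOUT the common name, the Latin name (in parentheses), and the current conservation status of each species it finds. After processing the whole document, each script should therefore produce the following output:

Bactrian Camel (Camelus bactrianus) endangered
Dromedary, or Arabian Camel (Camelus dromedarius) no special status
Llama (Lama glama) no special status
Guanaco (Lama guanicoe) special concern
Vicuna (Vicugna vicugna) endangered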
The Perl hash we start from is as follows:
my %camelid_links = (
    one   => { url => 'http://www.online.discovery.com/news/picture/may99/photo20.html',
               description => 'Bactrian Camel in front of Great ' .
                              'Pyramids in Giza, Egypt.'},
    two   => { url => 'http://www.fotos-online.de/english/m/09/9532.htm',
               description => 'Dromedary Camel illustrates the ' .
                              'importance of accessorizing.'},
    three => { url => 'http://www.eskimo.com/~wallama/funny.htm',
               description => 'Charlie - biography of a narcissistic llama.'},
    four  => { url => 'http://arrow.colorado.edu/travels/other/turkey.html',
               description => 'A visual metaphor for the Perl5-porters ' .
                              'list?'},
    five  => { url => 'http://www.galaonline.org/pics.htm',
               description => 'Many cool alpacas.'},
    six   => { url => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm',
               description => 'Wild Vicunas in a scenic landscape.'}
);
And an example of the document we hope to create from the hash is:
<?xml version="1.0"?>
<html>
  <body>
    <a href="http://www.eskimo.com/~wallama/funny.htm">Charlie -
      biography of a narcissistic llama.</a>
    <a href="http://www.online.discovery.com/news/picture/may99/photo20.html">Bactrian
      Camel in front of Great Pyramids in Giza, Egypt.</a>
    <a href="http://www.fotos-online.de/english/m/09/9532.htm">Dromedary
      Camel illustrates the importance of accessorizing.</a>
    <a href="http://www.galaonline.org/pics.htm">Many cool alpacas.</a>
    <a href="http://arrow.colorado.edu/travels/other/turkey.html">A visual
      metaphor for the Perl5-porters list?</a>
    <a href="http://www.thpf.de/suedamerikareise/galerie/vicunas.htm">Wild
      Vicunas in a scenic landscape.</a>
  </body>
</html>
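Nicely indented XML output (like that shown above) matters for human readability, but this kind of whitespace handling is not something our test case requires. What we care about is that the resulting document is well-formed and that it correctly represents the data in the hash.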
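To make those two requirements concrete, here is a minimal sketch in plain Perl with no XML modules at all. It is not one of the module-based solutions this column is about. The helper names escape_xml and links_to_xml are hypothetical, and %sample_links is a shortened stand-in for %camelid_links. The escaping step is what keeps the output well-formed if a URL or description ever contains a character such as & or <.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the five characters that are special in
# XML text and attribute values.
sub escape_xml {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: turn a hash of { url, description } records into
# a small XML document with one <a> element per record.
sub links_to_xml {
    my ($links) = @_;
    my $xml = qq{<?xml version="1.0"?>\n<html>\n<body>\n};
    # Sort the keys so the output order is repeatable between runs;
    # Perl itself gives no guarantee about hash ordering.
    for my $key (sort keys %$links) {
        my $link = $links->{$key};
        $xml .= sprintf qq{<a href="%s">%s</a>\n},
            escape_xml($link->{url}), escape_xml($link->{description});
    }
    $xml .= "</body>\n</html>\n";
    return $xml;
}

# Two records in the same shape as %camelid_links (shortened stand-in).
my %sample_links = (
    five => { url         => 'http://www.galaonline.org/pics.htm',
              description => 'Many cool alpacas.' },
    four => { url         => 'http://arrow.colorado.edu/travels/other/turkey.html',
              description => 'A visual metaphor for the Perl5-porters list?' },
);

print links_to_xml(\%sample_links);
```

No indentation is attempted here, in line with the point above that whitespace is not part of the requirement. String concatenation is used only to keep the sketch free of dependencies; it leaves all responsibility for escaping and tag balancing with the script author.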





