Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
Table of Contents
1. Abstract
2. HTML, Dublin Core, and Non-Dublin Core Metadata
3. The META Tag
4. The LINK Tag
5. Encoding Recommendations
6. Dublin Core in Real Descriptions
7. Encoding Dublin Core Elements
8. Security Considerations
9. Appendix: Perl Scripts that Manipulate HTML Encoded Metadata
10. Author's Address
11. References
12. Full Copyright Statement
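1. Abstract
The Dublin Core [DC1] is a small set of metadata elements for describing information resources. This document explains how these elements are expressed using the META and LINK tags of HTML [HTML4.0]. A sequence of metadata elements embedded in an HTML file is taken to be a description of that file. Examples illustrate conventions that allow interoperation with current software that indexes, displays, and manipulates metadata, such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], and the Perl [PERL] scripts in the appendix.
2. HTML, Dublin Core, and Non-Dublin Core Metadata
The Dublin Core (DC) metadata initiative [DCHOME] has produced a small set of resource description categories [DC1], or elements of metadata (literally, data about data). Metadata elements are typically small relative to the resource they describe and may, if the resource format permits, be embedded in it. Two such formats are the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML). HTML is currently in wide use, but once standardized, XML [XML] in conjunction with the Resource Description Framework [RDF] promises a significantly more expressive means of encoding metadata. The [RDF] specification actually describes a way to use RDF within an HTML document by adhering to an abbreviated syntax.
This document explains how to encode metadata using HTML 4.0 [HTML4.0]. It is not concerned with element semantics, which are defined elsewhere. For illustrative purposes, some element semantics are alluded to, but in no way should semantics appearing here be considered definitive.
The HTML encoding allows elements of DC metadata to be interspersed with non-DC elements (provided such mixing is consistent with the rules governing use of those non-DC elements). A DC element is indicated by the prefix "DC", and a non-DC element by another prefix; for example, the prefix "AC" is used with elements from the A-Core [AC].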
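3. The META Tag
The META tag of HTML is designed to encode a named metadata element. Each element describes a given aspect of a document or other information resource. For example, this tagged metadata element,

      <meta name    = "DC.Creator"
            content = "Simpson, Homer">

says that Homer Simpson is the Creator, where the element named "Creator" is defined in the DC element set. In the more general form,

      <meta name    = "PREFIX.ELEMENT_NAME"
            content = "ELEMENT_VALUE">

the capitalized words are meant to be replaced in actual descriptions; thus in the example, ELEMENT_NAME was Creator, ELEMENT_VALUE was "Simpson, Homer", and PREFIX was DC.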
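Within a META tag the first letter of a Dublin Core element name is capitalized. DC places no restriction on alphabetic case in an element value, and any number of META tagged elements may appear together, in any order. More than one DC element with the same name may appear, and each DC element is optional. The next example is a book description with two authors and two titles:

      <meta name    = "DC.Title"
            content = "The Communist Manifesto">
      <meta name    = "DC.Creator"
            content = "Marx, K.">
      <meta name    = "DC.Creator"
            content = "Engels, F.">
      <meta name    = "DC.Title"
            content = "Capital">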
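The general form lends itself to a small generator. The following Python sketch is not part of the original memo; the function names meta_tag and describe are hypothetical, and the use of html.escape to protect attribute values is an assumption about how a producer might keep the markup well-formed.

```python
from html import escape


def meta_tag(prefix: str, element: str, value: str) -> str:
    """Render one named metadata element as an HTML META tag."""
    # Capitalize only the first letter of the element name;
    # the element value keeps whatever case it was given.
    name = f"{prefix}.{element[:1].upper()}{element[1:]}"
    return f'<meta name="{escape(name)}" content="{escape(value)}">'


def describe(pairs, prefix="DC"):
    """Render a sequence of (element, value) pairs, preserving order and repeats."""
    return "\n".join(meta_tag(prefix, element, value) for element, value in pairs)


# Two titles and two creators: repeated elements are simply repeated tags.
book = [
    ("title", "The Communist Manifesto"),
    ("creator", "Marx, K."),
    ("creator", "Engels, F."),
    ("title", "Capital"),
]
print(describe(book))
```

Because every element is optional and order is unconstrained, the sketch performs no validation beyond building the name; a real producer would likely check element names against the element set it claims to use.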
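The prefix "DC" precedes each Dublin Core element encoded with META, and it is separated by a period (.) from the element name following it. Each non-DC element should be encoded with a prefix that can be used to trace its origin and definition; the linkage between prefix and element definition is made with the LINK tag, as explained in the next section. Non-DC elements, such as Email from the A-Core [AC], may appear together with DC elements, as in

      <meta name    = "DC.Creator"
            content = "Da Costa, Jos&eacute;">
      <meta name    = "AC.Email"
            content = "dacostaj@peoplesmail.org">
      <meta name    = "DC.Title"
            content = "Jesse &#34;The Body&#34; Ventura--A Biography">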
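This example also shows how special characters may be encoded. The author name in the first element contains a diacritic, an accented letter E, encoded as an HTML character entity reference (&eacute;). Similarly, the two double-quote characters in the last line are encoded as numeric character references (&#34;) so that they are not mistaken for the delimiters of the element content.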
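The two encodings described above can be reproduced mechanically. The Python sketch below is an illustration only: encode_value is a hypothetical helper, and the policy it applies (named entities for non-ASCII characters where HTML defines one, numeric references otherwise and for double quotes) is an assumption chosen to match the example, not a rule stated by the memo.

```python
from html import unescape
from html.entities import codepoint2name


def encode_value(value: str) -> str:
    """Encode an element value for use inside a double-quoted content attribute."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")                     # keep literal ampersands unambiguous
        elif ch == '"':
            out.append("&#34;")                     # numeric character reference
        elif cp > 127 and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # character entity reference
        elif cp > 127:
            out.append(f"&#{cp};")                  # no named entity available
        else:
            out.append(ch)
    return "".join(out)


author = "Da Costa, Jos\u00e9"
title = 'Jesse "The Body" Ventura--A Biography'
print(encode_value(author))
print(encode_value(title))

# Decoding restores the original strings.
assert unescape(encode_value(author)) == author
assert unescape(encode_value(title)) == title
```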
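4. The LINK Tag
The LINK tag of HTML may be used to associate an element name prefix with the reference definition of the element set identified by it. A sequence of META tags describing a resource is incomplete without one such LINK tag for each different prefix appearing in the sequence. The previous example could be considered complete with the addition of these two LINK tags:

      <link rel     = "schema.DC"
            href    = "http://purl.org/DC/elements/1.0/">
      <link rel     = "schema.AC"
            href    = "http://metadata.net/ac/2.0/">

In general, the association takes the form

      <link rel     = "schema.PREFIX"
            href    = "LOCATION_OF_DEFINITION">

where, in actual descriptions, PREFIX is to be replaced by the prefix and LOCATION_OF_DEFINITION by the URL or URN of the defining document. When embedded in the HEAD part of an HTML file, a sequence of LINK and META tags describes the information in the surrounding HTML file itself. Here is a complete HTML file with its own embedded description.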
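
      <html>
      <head>
      <title> A Dirge </title>
      <link rel     = "schema.DC"
            href    = "http://purl.org/DC/elements/1.0/">
      <meta name    = "DC.Title"
            content = "A Dirge">
      <meta name    = "DC.Creator"
            content = "Shelley, Percy Bysshe">
      <meta name    = "DC.Type"
            content = "poem">
      <meta name    = "DC.Date"
            content = "1820">
      <meta name    = "DC.Format"
            content = "text/html">
      <meta name    = "DC.Language"
            content = "en">
      </head>
      <body><pre>
              Rough wind, that moanest loud
                Grief too sad for song;
              Wild wind, when sullen cloud
                Knells all the night long;
              Sad storm, whose tears are vain,
              Bare woods, whose branches strain,
              Deep caves and dreary main, -
                Wail, for the world's wrong!
      </pre></body>
      </html>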
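The completeness rule for LINK tags (one schema link for each distinct prefix used in META names) can be checked automatically. The following Python sketch uses the standard html.parser module; the class MetadataScanner, the function missing_schemas, and the example.org URL are hypothetical names introduced for illustration, not anything defined by the memo.

```python
from html.parser import HTMLParser


class MetadataScanner(HTMLParser):
    """Collect prefixed META elements and schema LINK declarations."""

    def __init__(self):
        super().__init__()
        self.elements = []   # (prefix, element name, value) triples
        self.schemas = {}    # prefix -> location of definition

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("name") or ""
        rel = a.get("rel") or ""
        if tag == "meta" and "." in name:
            prefix, element = name.split(".", 1)
            self.elements.append((prefix, element, a.get("content") or ""))
        elif tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):]] = a.get("href")


def missing_schemas(html_text: str):
    """Return the prefixes used in META names that have no schema LINK."""
    scanner = MetadataScanner()
    scanner.feed(html_text)
    scanner.close()
    used = {prefix for prefix, _, _ in scanner.elements}
    return sorted(used - set(scanner.schemas))


# Hypothetical input: the definition URL below is a placeholder.
sample = """<head>
<link rel = "schema.DC" href = "http://example.org/dc-definition">
<meta name = "DC.Creator" content = "Da Costa, Jos&eacute;">
<meta name = "AC.Email" content = "dacostaj@peoplesmail.org">
</head>"""
print(missing_schemas(sample))
```

The parser decodes character references in attribute values, so the collected creator value contains the accented letter itself rather than the entity reference.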
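9. Appendix: Perl Scripts that Manipulate HTML Encoded Metadata

      <html>
      <head>
      <title> (--mbtitle) </title>
      <!--metablock Nutritional Allocation Increase -->
      <meta name    = "DC.Type"
            content = "Memorandum">
      </head>
      <body>
      <p>
      From:  Acting Shift Supervisor
      To:    Plant Control Personnel
      RE:    (--mbtitle)
      Date:  (--mbfilemodtime)
      <p>
      Pursuant to directive DOH:10.2001/405aec of article B-2022,
      subsection 48.2.4.4.1c regarding staff morale and employee
      productivity standards, the current allocation of doughnut
      acquisition funds shall be increased effective immediately.
      </body>
      </html>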
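Because replacement occurs throughout the document, the author need only enter the title once; normally the title has to be entered twice, once in the head and once in the body of the HTML file. After the script runs, the above file is transformed into this: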
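
      <html>
      <head>
      <title> Nutritional Allocation Increase </title>
      <meta name    = "DC.Creator"
            content = "Simpson, Homer">
      <meta name    = "DC.Title"
            content = "Nutritional Allocation Increase">
      <meta name    = "DC.Date"
            content = "1999-03-08">
      <meta name    = "DC.Identifier"
            content = "http://moes.bar.com/doh/homer.html">
      <meta name    = "DC.Format"
            content = "text/html;    1320  bytes">
      <meta name    = "DC.Language"
            content = "en-BUREAUCRATESE">
      <meta name    = "RC.MetadataAuthority"
            content = "Springfield Nuclear">
      <link rel     = "schema.DC"
            href    = "http://purl.org/DC/elements/1.0/">
      <link rel     = "schema.RC"
            href    = "http://nukes.org/ReactorCore/rc">
      <meta name    = "DC.Type"
            content = "Memorandum">
      </head>
      <body>
      <p>
      From:  Acting Shift Supervisor
      To:    Plant Control Personnel
      RE:    Nutritional Allocation Increase
      Date:  1999-03-08
      <p>
      Pursuant to directive DOH:10.2001/405aec of article B-2022,
      subsection 48.2.4.4.1c regarding staff morale and employee
      productivity standards, the current allocation of doughnut
      acquisition funds shall be increased effective immediately.
      </body>
      </html>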
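The core of this transformation is plain variable substitution over references of the form (--mbVARNAME). A minimal Python sketch of that step follows; it is an illustration rather than the memo's script, and the names VAR and substitute are hypothetical. Unknown variables are deliberately left in place, which mirrors the way the file size reference is deferred to a later pass.

```python
import re

# Matches references such as (--mbtitle) or (--mbfilemodtime).
VAR = re.compile(r"\(--mb([A-Za-z]+)\)")


def substitute(text: str, values: dict) -> str:
    """Replace each (--mbNAME) whose NAME is known; leave the others untouched."""
    return VAR.sub(lambda m: values.get(m.group(1), m.group(0)), text)


memo_lines = "RE:    (--mbtitle)\nDate:  (--mbfilemodtime)"
values = {
    "title": "Nutritional Allocation Increase",
    "filemodtime": "1999-03-08",
}
print(substitute(memo_lines, values))

# A reference with no value yet (here the file size) survives unchanged.
print(substitute("text/html; (--mbfilesize)", values))
```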
Here is the script that performs this transformation:

      #!/depot/bin/perl
      #
      # This Perl script processes metadata block declarations of the form
      # <!--metablock TITLE_OF_DOCUMENT --> and variable references of the
      # form (--mbVARNAME), replacing them with full metadata blocks and
      # variable values, respectively.  Requires a "template" file.
      # Outputs an HTML file.
      #
      # Invoke this script with a single filename argument, "foo".  It creates
      # an output file "foo.html" using a temporary working file "foo.work".
      # The size of foo.work is measured after variable replacement, and is
      # later inserted into the file in such a way that the file's size does
      # not change in the process.  Has little or no error checking.

      $infile = shift;
      open(IN, "< $infile")
          or die("Could not open input file \"$infile\"");
      $workfile = "$infile.work";
      unlink($workfile);
      open(WORK, "+> $workfile")
          or die("Could not open work file \"$workfile\"");

      @offsets = ();                  # records locations for late size replacement
      $title = "";                    # gets the title during metablock processing
      $language = "en";               # pre-set language here (not in the template)
      $baseURL = "http://moes.bar.com/doh";   # pre-set base URL here also
      $filename = "$infile.html";     # final output filename
      $filesize = "(--mbfilesize)";   # replaced late (separate pass)
      ($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3];
      $filemodtime = sprintf "%04d-%02d-%02d", 1900 + $year, 1 + $month, $day;

      sub putout {            # outputs current line with variable replacement
          if (! /\(--mb/) {   # no variable reference on this line
              print WORK;
              return;
          }
          if (/\(--mbfilesize\)/)             # remember where it was
              { push @offsets, tell WORK; }   # but don't replace yet
          s/\(--mbtitle\)/$title/g;
          s/\(--mblanguage\)/$language/g;
          s/\(--mbbaseURL\)/$baseURL/g;
          s/\(--mbfilename\)/$filename/g;
          s/\(--mbfilemodtime\)/$filemodtime/g;
          print WORK;
      }

      while (<IN>) {
          if (! /(.*)<!--metablock\s*(.*)/) {
              &putout;
              next;
          }
          $title = $2;
          $_ = $1;
          &putout;
          if ($title =~ s/\s*-->(.*)//) {
              $remainder = $1;
          }
          else {
              $remainder = "";
              while (<IN>) {          # declaration continues on later lines
                  # Captures are only valid inside this block, so save
                  # them here rather than after the loop ends.
                  if (/(.*?)\s*-->(.*)/) {
                      $title .= $1;
                      $remainder = $2;
                      last;
                  }
                  $title .= $_;
              }
          }
          open(TPLATE, "< template")
              or die("Could not open template file");
          while (<TPLATE>)
              { &putout; }
          close(TPLATE);
          $_ = $remainder;
          &putout;
      }
      close(IN);

      # Now replace filesize variables without altering total byte count.

      select( (select(WORK), $| = 1) [0] );   # first flush output so we
      if (($size = -s WORK) < 100000)         # can get final file size
          { $scale = 0; }                     # and set scale factor or
      else {          # compute it, keeping width of size field low
          for ($scale = 0; $size >= 1000; $scale++)
              { $size /= 1024; }
      }
      $filesize = sprintf "%7.7s %sbytes",
          $size, (" ", "K", "M", "G", "T", "P") [$scale];

      foreach $pos (@offsets) {       # loop through saved size locations
          seek WORK, $pos, 0;         # read the line found there
          $_ = <WORK>;
          # $filesize must be exactly as wide as "(--mbfilesize)"
          s/\(--mbfilesize\)/$filesize/g;
          seek WORK, $pos, 0;         # rewrite it with replacement
          print WORK;
      }
      close(WORK);
      rename($workfile, "$filename")
          or die("Could not rename \"$workfile\" to \"$filename\"");
      # ---- end of Perl script ----
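The late file size replacement works because the formatted size occupies exactly as many characters as the "(--mbfilesize)" reference it overwrites, so the byte count measured beforehand stays valid. The Python sketch below illustrates that fixed-width idea under stated assumptions: format_size is a hypothetical function, and rendering scaled sizes with one decimal place is a simplification of how the Perl script prints the number.

```python
PLACEHOLDER = "(--mbfilesize)"
UNITS = [" ", "K", "M", "G", "T", "P"]


def format_size(size_in_bytes: int) -> str:
    """Format a size in a field exactly as wide as the placeholder."""
    size = float(size_in_bytes)
    scale = 0
    if size_in_bytes >= 100000:
        # Scale down by 1024 until the number fits in a short field.
        while size >= 1000 and scale < len(UNITS) - 1:
            size /= 1024
            scale += 1
    number = str(size_in_bytes) if scale == 0 else f"{size:.1f}"
    # ">7.7" right-justifies in 7 columns and truncates anything longer.
    return f"{number:>7.7} {UNITS[scale]}bytes"


line = 'content = "text/html; (--mbfilesize)">\n'
patched = line.replace(PLACEHOLDER, format_size(1320))
print(patched, end="")

# The replacement never changes the length of the line.
assert len(patched) == len(line)
```

Keeping the field width constant is what allows the size to be written in place with a seek, instead of rewriting the rest of the file after each replacement.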
10. Author's Address
John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA 94143-0840, USA
Fax: +1 415-476-4653
EMail: jak@ckm.ucsf.edu
11. References
[AAT] Art and Architecture Thesaurus, Getty Information Institute.
http://shiva.pub.getty.edu/aat_browser/
[AC] The A-Core: Metadata about Content Metadata, (in progress).
http://metadata.net/ac/draft-iannella-admin-01.txt
[DC1] Weibel, S., Kunze, J., Lagoze, C. and M. Wolf, "Dublin Core Metadata for Resource Discovery", RFC 2413, September 1998.
ftp://ftp.isi.edu/in-notes/rfc2413.txt
[DCHOME] Dublin Core Initiative Home Page.
http://purl.org/DC/
[DCPROJECTS] Projects Using Dublin Core Metadata.
http://purl.org/DC/projects/index.htm
[DCT1] Dublin Core Type List 1, DC Type Working Group, March 1999.
http://www.loc.gov/marc/typelist.html
[freeWAIS-sf2.0] The enhanced freeWAIS distribution, February 1999.
http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/
[GLIMPSE] Glimpse Home Page.
http://glimpse.cs.arizona.edu/
[HARVEST] Harvest Web Indexing.
http://www.tardis.ed.ac.uk/harvest/
[HTML4.0] Hypertext Markup Language 4.0 Specification, April 1998.
http://www.w3.org/TR/REC-html40/
[ISEARCH] Isearch Resources Page.
http://www.etymon.com/Isearch/
[ISO639-2] Code for the representation of names of languages, 1996.
http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html
[ISO8601] ISO 8601:1988(E), Data elements and interchange formats -- Information interchange -- Representation of dates and times, International Organization for Standardization, June 1988.
http://www.iso.ch/markete/8601.pdf
[MARC] USMARC Format for Bibliographic Data, US Library of Congress.
http://lcweb.loc.gov/marc/marc.html
[PERL] L. Wall, T. Christiansen, R. Schwartz, Programming Perl, Second Edition, O'Reilly, 1996.
[RDF] Resource Description Framework Model and Syntax Specification, February 1999.
http://www.w3.org/TR/REC-rdf-syntax/
[RFC1766] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1996.
ftp://ftp.isi.edu/in-notes/rfc1766.txt
[SWISH-E] Simple Web Indexing System for Humans - Enhanced.
http://sunsite.Berkeley.EDU/SWISH-E/
[TGN] Thesaurus of Geographic Names, Getty Information Institute.
http://shiva.pub.getty.edu/tgn_browser/
[WTN8601] W3C Technical Note - Profile of ISO 8601 Date and Time Formats.
http://www.w3.org/TR/NOTE-datetime
[XML] Extensible Markup Language (XML).
http://www.w3.org/TR/REC-xml
12. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the Internet Society.