
Content(1)

Author: unknown  Date: 2004-12-01

(4/17/05)  The second edition of the book describes the massive changes in the information world that occurred after the first edition.  Well, now there have been some pretty significant changes just a few years after the publication of the second edition.  These include:
  • The near-complete transition to electronic publishing of biomedical journals - Over 4,500 of the 4,800 journals indexed in MEDLINE now have links from the MEDLINE record to the publisher's site, i.e., the journals are available electronically.  Unfortunately, the rules governing access to these journals vary widely (and monetarily!), but we have reached the point where almost all biomedical journals are now available in electronic form.
  • Standalone content is a rarity now, particularly with textbooks.  Especially with the consolidation of the medical publishing industry to a few major publishers, most textbooks are now part of larger aggregations of content.  While they still can be accessed individually if desired, they are typically part of packages offered by subscription.  In addition, a growing number of search systems search over multiple resources simultaneously.
  • A whole new generation of Web-based content has emerged, such as blogs, Wikis, RSS feeds, and podcasting.  More on these throughout this chapter update.
  • A growing amount of information is available on personal digital assistant (PDA) devices.
A great source for updates on all content produced by the NLM is the NLM Technical Bulletin, available at:
http://www.nlm.nih.gov/pubs/techbull/tb.html

(4/20/03)  The Severe Acute Respiratory Syndrome (SARS) outbreak demonstrated not only how mainstream IR technologies have become, but also how quickly information can be obtained and disseminated.  The Centers for Disease Control quickly established a Web site that provides comprehensive information to researchers, clinicians, and patients (http://www.cdc.gov/ncidod/sars/).  Furthermore, the Internet enabled scientists to collaborate in unprecedented ways, with the prime example being the rapidity with which the genome of the newly discovered coronavirus causing the illness was sequenced (http://www.cdc.gov/ncidod/sars/sequence.htm).

4.1  Classification of health information

4.2  Bibliographic

4.2.1  Literature reference databases

4.2.1.1  MEDLINE

(4/17/05)  MEDLINE update:  It now contains over 13 million records and covers over 4,800 journals.  Over 500,000 new records are now added annually.  Nearly 89% of the citations are published in English, although 29 other languages are represented.  About 76% of the records have English abstracts, including some non-English articles.  The database is updated weekly.  An updated fact sheet about MEDLINE is available at:
http://www.nlm.nih.gov/pubs/factsheets/medline.html

The NLM recently developed a fact sheet concerning notices in MEDLINE records (http://www.nlm.nih.gov/pubs/factsheets/errata.html):
  • Errata - errors that occur anywhere in the publishing process, from simple typographical errors to substantive scientific ones.  If the error is small, the MEDLINE record will have an entry in the "Erratum in" (EIN) field that references the text in the journal correcting the error.  If the error is substantial, however, a new MEDLINE record will be created for the notice in the journal where the error is corrected.  In this case, the record will have the publication type Published Erratum and the reference for the original article will appear in the "Erratum for" (EFR) field.  In both instances, if there is a correction to the title or abstract, the notation "[corrected]" will appear by the corrected text.
  • Retraction - articles that are withdrawn by authors, sponsors, or publishers due to pervasive error or unsubstantiated data.  Retracted articles do not have their citations removed from MEDLINE, but instead, the record is given the publication type Retracted Publication and the retraction information appears in the "Retraction in" (RIN) field.  The latter usually links to a new record for the article in the journal describing the retraction.  This new record has the publication type Retraction of Publication and links back to the record for the retracted publication via the "Retraction of" (ROF) field.
  • Correction and republished articles - articles that have errors or other problems substantial enough to warrant correction with republication.  These records are given the publication type Corrected and Republished Article and have a link back to the original publication in the "Corrected and republished from" (RPF) field.  The original citation has a link to the corrected article in its "Corrected and republished in" (RPI) field.
  • Duplicate publication - articles with substantial amounts of duplicate text that are published in more than one location without acknowledgement.  All duplicate publications are given the publication type Duplicate Publication.
  • Author response to comments - letters to the editor that include a response from the original author.  There is not a separate field, but the page field of the letter will include the text "author reply."
  • Updated articles - articles that update a previous article.  The original article points to the updated article in its "Update in" (UIN) field, while the updated article points back to the original article in its "Update of" (UOF) field.
  • Patient summaries - a growing number of articles include summaries for patients.  These are given their own MEDLINE records with the publication type Patient Education Handout.  The record links to the original record via its "Original report in" (ORI) field, while the original record links to it via its "Summary for patients in" (SPIN) field.
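Most of the linkage fields above come in pairs: one field on the original record points forward to the notice, and one field on the notice record points back.  A minimal Python sketch of that pairing follows.  The field abbreviations are taken from the list above; the dictionary-based record layout, the `describe_notices` helper, and the sample record are hypothetical and are not the actual MEDLINE record format.

```python
# Field abbreviations come from the list above. Representing a record as a
# plain dict keyed by field abbreviation is an assumption for illustration.

# Forward field on the original record -> (label, reverse field on the notice record)
NOTICE_FIELDS = {
    "EIN": ("Erratum in", "EFR"),
    "RIN": ("Retraction in", "ROF"),
    "RPI": ("Corrected and republished in", "RPF"),
    "UIN": ("Update in", "UOF"),
    "SPIN": ("Summary for patients in", "ORI"),
}


def describe_notices(record):
    """Return readable notices for any forward-link fields present in a record."""
    notices = []
    for field, (label, reverse_field) in NOTICE_FIELDS.items():
        if field in record:
            notices.append(f"{label}: {record[field]} (reverse field: {reverse_field})")
    return notices


# Hypothetical record for a retracted article; the citation text is made up.
sample = {"TI": "An example article", "RIN": "Example Journal 2004;1(1):1"}
for line in describe_notices(sample):
    print(line)
```

Keeping the pairs in one table makes it easy to follow a link in either direction, which is the main point of the NLM scheme.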



Some older aspects of MEDLINE have been retired.  One of these is the paper version of the venerable Index Medicus (Anonymous, 2004).  For many older users of the biomedical literature, these huge paper catalogs used to be the entry point into the biomedical literature.  Now, of course, access to MEDLINE is virtually ubiquitous through PubMed, which is freely available anywhere on the Web.  Another feature of MEDLINE recently retired is the old MEDLINE UI (Unique Identifier), which has been replaced by the PMID as the only unique identifier for MEDLINE and OLDMEDLINE citations (Tybaert and Rosov, 2004).

Despite new changes in MEDLINE, the OLDMEDLINE database, representing citations from before the official 1966 "start date" of MEDLINE, continues to grow (Demsey et al., 2003).  OLDMEDLINE now has over 1.7 million references dating back to 1950.  A fact sheet about it is available at:
http://www.nlm.nih.gov/databases/databases_oldmedline.html

The table of MEDLINE subsets in the book is somewhat out of date.  Here is a list of the current subsets:

Subset                                 | Contains citations about or includes
AIDS                                   | AIDS and HIV
Bioethics                              | Bioethics
Cancer                                 | Cancer
Complementary and alternative medicine | Complementary and alternative medicine
Core clinical journals                 | Several hundred clinical journals
Dental journals                        | Dentistry
History of medicine                    | History of medicine
MEDLINE                                | MEDLINE citations only
Nursing journals                       | Nursing
Old MEDLINE                            | MEDLINE citations prior to 1966
PubMed Central                         | Articles in PubMed Central
Space life sciences                    | Space life sciences
Toxicology                             | Toxicology

Anonymous (2004). Index Medicus to Cease as Print Publication. NLM Technical Bulletin. May-June 2004. e2. http://www.nlm.nih.gov/pubs/techbull/mj04/mj04_im.html.
Demsey, A., Nahin, A., et al. (2003). OLDMEDLINE Citations Join PubMed. NLM Technical Bulletin. September-October, 2003. e2. http://www.nlm.nih.gov/pubs/techbull/so03/so03_oldmedline.html.
Tybaert, S. and Rosov, J. (2004). MEDLINE Data Changes - 2004. NLM Technical Bulletin. Bethesda, MD, National Library of Medicine: 335:e6. http://www.nlm.nih.gov/pubs/techbull/nd03/nd03_med_data_changes.html.

(5/6/03)  For an overview of the history of MEDLINE, see:
Zipser, J. (1998). MEDLINE to PubMed and Beyond. National Library of Medicine. http://www.nlm.nih.gov/bsd/historypresentation.html. Accessed: May 1, 2003.



4.2.1.2  Other NLM Bibliographic Resources

(4/17/05)  The LocatorPLUS system has been made accessible under the NLM's Entrez system as the NLM Catalog (Jacobs, 2004).  This allows improved searching functionality and integration with all of the other resources in Entrez.

Jacobs, A. (2004). New Entrez database: NLM Catalog. NLM Technical Bulletin. September-October, 2004. e2. http://www.nlm.nih.gov/pubs/techbull/so04/so04_entrez_cat.html.

4.2.1.3  Non-NLM Bibliographic Databases

(5/6/03)  A database of peer-reviewed journal literature for the complementary and alternative medicine field is the Manual Alternative and Natural Therapy Index System (MANTIS, http://www.healthindex.com/MANTISAbout.html).  MANTIS indexes over 1,000 journals and has over 60,000 records in its database.  Some full-text articles are being added as well.

(4/19/04)  A number of bibliographic resources are especially valuable for the biomedical informatics field.  One of these was mentioned in the book in the Preface and Chapter 10, but really should be mentioned here.  This is CiteSeer (also at one point called ResearchIndex, http://citeseer.ist.psu.edu/cis), which maintains a database of computer science-oriented (including biomedical informatics) scientific literature.  Each record contains bibliographic data, links to the full text (if available), and links to other papers that it cites as well as those that cite it.

(4/18/04)  Other bibliographic databases for computer science include:

4.2.2  Web Catalogs

(4/17/05)  Not really a Web catalog per se, but a growing bibliographic-type resource on the Web is RSS, which is claimed to stand for either Really Simple Syndication or Rich Site Summary (Pilgrim, 2002; Hammersley, 2003; King, 2004).  RSS "feeds" provide short summaries, typically of news or other recent postings on Web sites.  Many news sites, such as CNN (www.cnn.com), BBC (www.bbc.co.uk), and USA Today (www.usatoday.com), provide them.  Users receive RSS feeds through an RSS aggregator, which can typically be configured for the site(s) desired and to filter based on content.  (An RSS aggregator is built into the new Firefox Web browser from Mozilla.org.)
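The content filtering that an aggregator performs can be sketched in a few lines of Python.  This only illustrates the idea and does not describe how any particular aggregator works; the `FeedItem` class, the `filter_items` function, and the sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    """One feed entry reduced to a title, a link, and a short summary (hypothetical)."""
    title: str
    link: str
    description: str


def filter_items(items, keywords):
    """Keep items whose title or description mentions any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    kept = []
    for item in items:
        text = f"{item.title} {item.description}".lower()
        if any(k in text for k in wanted):
            kept.append(item)
    return kept


# Made-up items standing in for entries collected from several feeds.
items = [
    FeedItem("SARS genome sequenced", "http://example.org/1", "Coronavirus sequence released."),
    FeedItem("Local sports results", "http://example.org/2", "Weekend scores."),
]
for item in filter_items(items, ["sars", "medline"]):
    print(item.title)
```

A real aggregator would also fetch and parse each configured feed on a schedule; simple substring matching is used here only to keep the sketch short.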

There are unfortunately a number of different versions of RSS, although each has the fundamental fields and most aggregators can handle all of the different versions.  The various versions can be grouped into two categories.  One category (version 1.0) builds on the Resource Description Framework (RDF) and aims to allow rich metadata, while the other category (version 2.0) uses plain XML and aims to keep things very simple.  The fundamental fields of RSS include:
  • Title - name of item
  • Link - URL of full page
  • Description - brief description of page
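One rough way to tell the two families apart is to look at the root element of the feed: RDF-based feeds are wrapped in an `rdf:RDF` element, while the plain-XML versions use an `rss` root element.  The Python sketch below assumes well-formed input and uses only the standard library; the `rss_family` function name and the tiny sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI of RDF, used by the RDF-based (version 1.0) family.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rss_family(xml_text):
    """Classify a feed as 'rdf' (1.0 style), 'plain' (2.0 style), or 'unknown'."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rdf"
    if root.tag == "rss":
        return "plain"
    return "unknown"


# Minimal stand-in documents, not real feeds.
rdf_feed = f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="http://purl.org/rss/1.0/"></rdf:RDF>'
plain_feed = '<rss version="2.0"><channel></channel></rss>'
print(rss_family(rdf_feed))
print(rss_family(plain_feed))
```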
Here is an example of XML code from an RSS item from the BBC:
<title>Google maps give fresh perspective</title>
<link>http://news.bbc.co.uk/go/rss/-/2/hi/technology/4448807.stm</link>
<description>Search engine Google offers users the chance to see satellite photos of many locations in North America.</description>
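To show how a program might pull these three fields out of an item, here is a small Python sketch using the standard library's `xml.etree.ElementTree`.  It wraps the BBC fragment above in an `<item>` element, which is an assumption about the surrounding markup, and the `parse_item` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The BBC fragment from the text, wrapped in <item> (the wrapper is assumed).
ITEM_XML = """<item>
  <title>Google maps give fresh perspective</title>
  <link>http://news.bbc.co.uk/go/rss/-/2/hi/technology/4448807.stm</link>
  <description>Search engine Google offers users the chance to see satellite photos of many locations in North America.</description>
</item>"""


def parse_item(xml_text):
    """Return the title, link, and description of a single RSS item as a dict."""
    item = ET.fromstring(xml_text)
    fields = {}
    for name in ("title", "link", "description"):
        # findtext returns None when the element is missing.
        value = item.findtext(name)
        fields[name] = value.strip() if value is not None else None
    return fields


for name, value in parse_item(ITEM_XML).items():
    print(f"{name}: {value}")
```

Missing fields come back as `None` rather than raising an error, since feeds in the wild do not always supply every element.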


RSS is not limited to news feeds.  In fact, there are a growing number of innovative uses for it in scientific fields (Hammond et al., 2004).  Certainly it can be used to announce newly published scientific papers as an information notification application, similar to the electronic table of contents most journals already offer.  This is already being done by the Nature Publishing Group as well as some of the journals published by Highwire Press.  Nature also circulates its job advertisements via RSS.

Hammersley, B. (2003). Content Syndication with RSS. Sebastopol, CA. O'Reilly & Associates.
Hammond, T., Hannay, T., et al. (2004). The role of RSS in science publishing:  syndication and annotation on the Web. D-Lib Magazine, 10(12). http://www.dlib.org/dlib/december04/hammond/12hammond.html.
King, A. (2004). Introduction to RSS. WebReference.com. http://www.webreference.com/authoring/languages/xml/rss/intro/. Accessed: April 17, 2005.
Pilgrim, M. (2002). What is RSS? XML.com. http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html.

(4/19/04)  Another Web catalog, one that is limited exclusively to "high-quality," i.e., evidence-based, resources, is the Translating Research into Practice database (TRIP, http://www.tripdatabase.com).  The TRIP database allows searching over the titles and/or full text of over 70 on-line resources, from full-text journals (e.g., British Medical Journal) to electronic textbooks (e.g., eMedicine) to EBM databases (e.g., Bandolier).  There is a basic free version and an enhanced commercial version.

Two Web catalogs mentioned in the book have changed since publication.  Medical Matrix is now a commercial product requiring a paid subscription, while CliniWeb, my own pride and joy from the early days of the Web, is now defunct.