Health and Biomedical Information(1)

Author: unknown   Date: 2004-12-01

2.1  What is information?

2.2  Theories of Information

2.2.1  Shannon and Weaver

2.2.2  Other models of information

(4/6/04)  Another model of information is as "bundles," which consist of small, annotated, and organized units that humans employ as "track" through information space (Gorman et al., 2000).  The methods to observe their use and guide development of digital libraries have been described by Gorman et al. (2002).  Ash et al. (2001) have described how bundles appear in clinical settings, from formal information instruments such as flowsheets to scribbled annotations on the wrappers of gauze pads.

Gorman, P., Ash, J., Lavelle, M., Lyman, J., Delcambre, L. and Maier, D. (2000). Bundles in the wild:  managing information to solve problems and maintain situation awareness. Library Trends, 49: 266-289.
Gorman, P., Lavelle, M., Delcambre, L. and Maier, D. (2002). Following experts at work in their own information spaces: using observational methods to develop tools for the digital library. Journal of the American Society for Information Science & Technology, 53: 1245-1250.
Ash, J., Gorman, P., Lavelle, M., Lyman, J., Delcambre, L., Maier, D., Bowers, S. and Weaver, M. (2001). Bundles:  meeting clinical information needs. Journal of the Medical Library Association, 89: 294-296.

2.3  Properties of scientific information

2.3.1  Growth

(4/5/03)  Another means to assess the growth of medical knowledge has been to measure the weight of the paper volumes of Index Medicus (the paper-based forerunner to MEDLINE).  Durack (1978) found that Index Medicus remained relatively steady in weight from its inception in 1879 through the mid-1940s at approximately 2.0 kg.  After that point, however, it began to increase in weight exponentially, growing nearly sevenfold in size between 1955 and 1978.  Durack noted that this growth approximately followed de Solla Price's observations about the "doubling time" of scientific literature.  He expressed concern that if this trend were to continue, Index Medicus would consist of 200 volumes and weigh 1,000 kg by 1985.  (Interestingly, he did not think about the database in its electronic MEDLINE form and the much lighter weight of electronic data.  In fact, the current version of MEDLINE, even with its 40 gigabytes of size, weighs practically nothing on a set of DVD disks!  But that's not really what he was measuring.)

Madlon-Kay (1989) extended the analysis for another decade.  She found that the growth after 1978 reverted to a linear rate.  She expressed relief that Index Medicus would not exceed 1,000 kg until the year 2027.  (As if the paper version will still be published by then!)

Durack, D. (1978). The weight of medical knowledge. New England Journal of Medicine, 298: 773-775.
Madlon-Kay, D. (1989). The weight of medical knowledge:  still gaining. New England Journal of Medicine, 321: 908.
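
The "doubling time" implied by Durack's figures can be estimated with a short calculation.  The sketch below is only illustrative: it assumes pure exponential growth over the interval and treats "nearly sevenfold" as exactly 7, and the function name is made up for this example rather than taken from either paper.

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Years needed to double, assuming steady exponential growth.

    If size grows by `growth_factor` over `years`, then
    size(t) = size(0) * 2 ** (t / T), so T = years * ln(2) / ln(growth_factor).
    """
    if growth_factor <= 1 or years <= 0:
        raise ValueError("growth_factor must exceed 1 and years must be positive")
    return years * math.log(2) / math.log(growth_factor)


# Assumption: "nearly sevenfold" growth between 1955 and 1978 is taken as exactly 7x.
t_double = doubling_time(7.0, 1978 - 1955)
print(f"Implied doubling time: {t_double:.1f} years")
```

Under these assumptions the weight of Index Medicus was doubling roughly every eight years during that period.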

2.3.2  Obsolescence

(3/1/03)  Another study of information obsolescence looked at the "truth survival" of conclusions in the domain of liver cirrhosis and hepatitis (Poynard et al., 2002).  The goal of the study was to determine whether information generated by the best evidence-based means had a longer survival when obtained in studies of higher methodological quality.  The authors identified 474 conclusions in the published literature from 1945-1999 and found that 285 (60%) were still true in 2000, 91 (19%) were obsolete, and 98 (21%) were false.  The half-life of truth in this domain was 45 years (in stark contrast to the half-life figures for citations presented in the book).  The most interesting results were that survival of conclusions was not higher in studies of better methodological quality than those of lesser quality, and that the 20-year survival of conclusions derived from meta-analysis was lower (57%) than in non-randomized studies (87%) or randomized controlled trials (85%).

Poynard, T., Munteanu, M., et al. (2002). Truth survival in clinical research:  an evidence-based requiem? Annals of Internal Medicine, 136: 888-895.

(4/4/05)  An additional study about the long lead time between publication of scientific results and their adoption into clinical practice comes from Balas and Boren (2000), who noted an average of 17 years between a discovery first being made and its adoption into routine clinical practice.  Many medical tests and treatments do get accepted into practice more quickly than this, but for some, the lag time is probably too long.

Balas, E. and Boren, S. (2000). Managing Clinical Knowledge for Health Care Improvement, 65-70, in vanBemmel, J. and McCray, A., eds. Yearbook of Medical Informatics 2000 - Patient-Centered Systems. Stuttgart, Germany. Schattauer.

2.3.3  Fragmentation

2.3.4  Linkage

(4/3/03)  Another large-scale project of bibliographic linkage is the Erdös Number Project.  This project is part of the "folklore of mathematicians," who measure their distance in co-authorship from the prolific Hungarian mathematician, Paul Erdös.  Erdös published over 1,400 scientific papers and had over 500 co-author collaborators.  The mathematical community has undertaken building a collaboration graph for its community with approximately 337,000 authors of 1.6 million authored items in the Math Review database.  Erdös is at the center of that graph.  An "Erdös number" is thus the smallest number of co-authorship links between an individual and Erdös.  Therefore, someone who co-authored with Erdös has an Erdös number of 1.  Anyone who co-authored with any one of those co-authors has an Erdös number of 2.  I was surprised to find that I have a relatively low Erdös number of 4, thanks to my former postdoc Andrew Turpin (http://goanna.cs.rmit.edu.au/~aht/), who was a graduate student of Alistair Moffat (http://www.cs.mu.oz.au/~alistair/), who has one of the lowest Erdös numbers (2) in the IR community.

The Web site for the Erdös Number Project is at:
http://www.oakland.edu/~grossman/erdoshp.html
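
Because an Erdös number is the length of the shortest chain of co-authorship links, it can be computed with a breadth-first search over the collaboration graph.  The following is a minimal sketch under stated assumptions: the graph is a small hypothetical one with placeholder author names, not data from the Erdös Number Project, and the function name is invented for illustration.

```python
from collections import deque


def collaboration_distance(coauthors, source, target):
    """Smallest number of co-authorship links between two authors.

    `coauthors` maps each author to the set of people they have published with.
    Breadth-first search visits authors in order of increasing distance, so the
    first time `target` is reached the distance is minimal.
    Returns None when no chain of co-authors connects the two.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for other in coauthors.get(author, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Hypothetical toy graph: a single chain leading away from "Erdos",
# plus one author ("E") with no co-authors at all.
toy_graph = {
    "Erdos": {"A"},
    "A": {"Erdos", "B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}

print(collaboration_distance(toy_graph, "D", "Erdos"))
```

In this toy graph, author "D" sits four links from "Erdos", while "E" has no Erdös number at all because no chain of co-authors reaches the center.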

(4/3/03)  A paper analyzing bibliometric issues in the ACM SIGIR community was published in 2002:
Smeaton, A., Keogh, G., et al. (2002). Analysis of papers from twenty-five years of SIGIR conferences:  What have we been doing for the last quarter of a century? SIGIR Forum, 36(2): 39-43. http://www.acm.org/sigir/forum/F2002/smeaton.pdf .

(3/31/04)  Garfield et al. (2003) have developed a tool called HistCite, which provides a historiographic depiction of the citations emanating to and from papers.  They demonstrate that for seminal papers, the important literature of a field can be easily identified.

Garfield, E., Pudovkin, A. and Istomin, V. (2003). Mapping the output of topical searches in the Web of Knowledge and the case of Watson-Crick. Information Technology & Libraries, 22: 183-187.

(4/3/05)  The most recent impact factors (IFs) for general medical journals and medical informatics journals, respectively, are shown in the tables below, taken from the 2003 edition of the ISI Journal Citation Reports.

General medical journals:

Journal                                       Impact Factor
New England Journal of Medicine               34.8
Journal of the American Medical Association   21.5
Lancet                                        18.3
Annals of Internal Medicine                   12.4
Annual Review of Medicine                     11.4
British Medical Journal                       7.2
Archives of Internal Medicine                 6.8
Canadian Medical Association Journal          4.8
Medicine                                      4.5
American Journal of Medicine                  4.4

Medical informatics journals:

Journal                                                      Impact Factor
Journal of the American Medical Informatics Association      2.5
Statistical Methods in Medicine and Research                 1.9
Medical Decision Making                                      1.7
Methods of Information in Medicine                           1.4
IEEE Transactions on Information Technology in Biomedicine   1.3
Artificial Intelligence in Medicine                          1.2
Journal of Evaluation of Clinical Practice                   1.2
International Journal of Medical Informatics                 1.2
Statistics in Medicine                                       1.1
Medical Informatics and the Internet                         0.9

(4/3/05)  The debate over the true value of IFs continues and will probably never end.  Nakayama et al. (2003) assessed the IFs of the citations included in the US government's Guide to Clinical Preventive Services, Second Edition.  This guide reflects the best evidence for clinical preventive services.  Not surprisingly, the largest number of citations came from journals with high IFs.  Of the 1,740 citations in the 25 chapters of the report, the most commonly represented journals were Journal of the American Medical Association (135), American Journal of Preventive Medicine (102), British Medical Journal (77), and Lancet (70).  The IFs of the 56 journals having five or more citations in the report were widely distributed, however.  Six (11%) journals had an IF > 10, but half of the journals (28, or 50%) had an IF < 3, and the median IF was 2.76.  There was a correlation between IFs and number of times cited in these guidelines.  However, this analysis showed that many articles having high-quality clinical evidence were published in low-IF journals.  An editorial in British Medical Journal assessed some of the other social aspects of IFs, such as academic promotion, in light of these findings and advocated the dumping of IFs (Abbasi, 2004).

Abbasi, K. (2004). Let's dump impact factors. British Medical Journal, 329. http://bmj.bmjjournals.com/cgi/content/full/329/7471/0-h.
Anonymous (1996). Guide to Clinical Preventive Services, Second Edition. Washington, DC. Office of Disease Prevention and Health Promotion, Department of Health & Human Services. http://odphp.osophs.dhhs.gov/pubs/GUIDECPS/.
Nakayama, T., Fukuhara, S., et al. (2003). Comparison between impact factors and citations in evidence-based practice guidelines. Journal of the American Medical Association, 290: 755-756.

2.3.4.1  Citations

(3/31/04)  Another type of analysis done in bibliometrics is co-citation analysis, which measures the number of times that pairs of authors are cited together by another paper.  Co-citation analysis can help show authors whose work is similar in scope.  Andrews (2003) performed such an analysis for the field of medical informatics, with a particular focus on members of the American College of Medical Informatics (http://www.amia.org/acmi/acmi.html), a body of elected fellows who have made significant and sustained contributions to the field.  This article shows not only that my work is closest to that of Keith Campbell, Betsy Humphreys, Mark Tuttle, and Christopher Chute, but that I am the 21st most highly cited individual in this group of leaders of the field.

Andrews, J. (2003). An author co-citation analysis of medical informatics. Journal of the Medical Library Association, 91: 47-56.
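
The counting step at the heart of co-citation analysis can be sketched in a few lines.  This is a simplified illustration, not the procedure Andrews used: the input format (one list of cited authors per citing paper), the placeholder author names, and the function name are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many citing papers cite each pair of authors together.

    `reference_lists` holds one list of cited authors per citing paper.
    Each pair is counted at most once per citing paper, even if an author
    appears several times in that paper's reference list.
    """
    counts = Counter()
    for cited_authors in reference_lists:
        # Sorting the de-duplicated authors gives every pair a canonical order.
        for pair in combinations(sorted(set(cited_authors)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers with placeholder author names.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author B", "Author C", "Author B"],
]

counts = cocitation_counts(papers)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for authors whose work is similar in scope; a full analysis would normally go on to cluster or map the resulting matrix.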

2.3.4.2  Author productivity - Lotka's Law

2.3.4.3  Subject dispersion - Bradford's Law

(3/31/04)  Another area where Bradford's Law has been shown to apply is Web site citation analysis.  Cui (1999) analyzed the Web citations (links) for library sites from 19 of the top 25 ranked medical schools in the US.  The distributions of top-level domains (e.g., .com or .edu), first-level domains (i.e., the part of the URL up to the first slash, e.g., www.irbook.info/), and whole URLs were analyzed.  When the first-level domains (1731 in total) were segregated into three groups based on frequency, their absolute counts came close to obeying the 1:n:n² distribution (78:452:1201, or roughly 1:4:4²).  This study also found that 90% of the top-level domains were for the US-based top-level domains (.com, .edu, .gov, .org).

Cui, L. (1999). Rating health web sites using the principles of citation analysis:  a bibliometric approach. Journal of Medical Internet Research, 1: e4. http://jmir.org/1999/1/e4.
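
How closely the three zone sizes reported above (78, 452, and 1201) follow a Bradford-style progression can be checked with simple arithmetic.  The sketch below is an illustration only: estimating the multiplier n from the first and last zones is one simple choice made for this example, not necessarily the method used by Cui, and the function names are hypothetical.

```python
def zone_ratios(zone_sizes):
    """Express each zone size relative to the first (core) zone."""
    first = zone_sizes[0]
    return [size / first for size in zone_sizes]


def estimate_multiplier(zone_sizes):
    """Estimate n, assuming zone sizes follow 1 : n : n**2 : ...

    Uses only the first and last zones: last / first = n ** (zones - 1).
    """
    steps = len(zone_sizes) - 1
    return (zone_sizes[-1] / zone_sizes[0]) ** (1 / steps)


# Zone sizes (numbers of first-level domains) as reported in the text.
zones = [78, 452, 1201]

ratios = zone_ratios(zones)
n = estimate_multiplier(zones)
ideal = [n ** i for i in range(len(zones))]

print("observed ratios:", [round(r, 1) for r in ratios])
print("estimated n:", round(n, 2))
print("ideal 1:n:n^2 :", [round(v, 1) for v in ideal])
```

With these figures the estimated multiplier is a little under 4, while the middle zone is somewhat larger than an ideal progression would predict, which fits the description of the counts as coming "close" to the Bradford distribution.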

2.3.4.4  Journal importance - Impact Factor

(4/1/03)  Another study assessing the validity of impact factor was recently performed (Saha et al., 2003), asking 113 physicians who were predominantly practitioners and 151 physicians who were graduates of advanced training programs in clinical and health services research to rate the quality of nine general medical journals.  The correlation of impact factor and physicians' rating of journal quality was high overall (r² = 0.82), and higher for the group of researchers (r² = 0.83) than the practitioners (r² = 0.62).  Table 2 in this paper provides a fascinating look at the subjects' rating of quality, the impact factors, and the numbers who read and/or subscribe to the journals.

Saha, S., Saint, S., et al. (2003). Impact factor:  a valid measure of journal quality? Journal of the Medical Library Association, 91: 42-46.

(4/5/03)  A footnote to Table 2.1:  In 2001, the impact factor for the Journal of the American Medical Informatics Association fell precipitously to under 1.0.  The reason for this is that the Proceedings of the AMIA Annual Symposium, labeled as the "JAMIA Supplement," were included in the calculation.  This vastly increased the formula's denominator far out of proportion to the numerator.  As such, AMIA has removed the "JAMIA Supplement" moniker from the conference proceedings and the impact factor should return to its previous level.
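
The effect described in this footnote follows directly from how the impact factor is calculated: citations received in a given year to items published in the two preceding years, divided by the number of citable items published in those two years.  The sketch below uses purely hypothetical counts (they are not JAMIA's actual figures) to show how adding a large body of lightly cited proceedings papers can pull the ratio down.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations in year Y to items from years Y-1 and Y-2, divided by the
    number of citable items published in years Y-1 and Y-2."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical counts, chosen only to illustrate the mechanism.
journal_citations, journal_items = 500, 200          # regular journal articles
proceedings_citations, proceedings_items = 100, 600  # conference proceedings papers

journal_only = impact_factor(journal_citations, journal_items)
combined = impact_factor(
    journal_citations + proceedings_citations,
    journal_items + proceedings_items,
)

print(f"journal articles only:   {journal_only:.2f}")
print(f"with proceedings added:  {combined:.2f}")
```

Because the proceedings add many items to the denominator but comparatively few citations to the numerator, the combined figure falls well below the journal-only figure.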

2.3.5  Propagation

(3/31/04)  Although not a section in the original book, interest in the notion of information propagation has been revived with the growth of the Internet and Web, which provide a vast new medium for information spread.  The notion of the propagation of information can be traced back to Dawkins (1976), whose book laid out the idea of memes, which are information patterns that are held in a person's memory but can be copied to another.  The field that studies the replication and evolution of memes is called memetics.  There are many Web sites devoted to memetics, e.g., (Heylighen et al., 1999).

Dawkins gives examples of memes as "tunes, ideas, [and] catch-phrases," that propagate from "brain to brain."  Memes have been likened to genes, but are more appropriately compared to T-phage viruses, which cannot replicate themselves but take over a cell's DNA to cause it to make millions of copies of the virus (Heylighen et al., 1999).  Memes can affect the mind like a parasite, causing an individual to change his or her behavior and/or pass the idea on to others.  Memes are selected or, in genetic terms, have fitness by a variety of properties such as novelty, coherence, and self-reinforcement.  If they do not have the capability to survive, then they may die out.

The Internet is a (relatively) new medium for the wide spread of memes.  The frequent forwarding of emails as well as visiting of Web sites are common means for memes to propagate.  One consequence of such easy spread of information is the propagation of misinformation, instances of which are sometimes called "urban legends" (e.g., http://www.urbanlegends.com, http://www.snopes.com).

Heylighen, F., Joslyn, C. and Turchin, V. (1999). Principia Cybernetica Web. http://pespmc1.vub.ac.be/TOC.html . Accessed: October 20, 2003.
Dawkins, R. (1976). The Selfish Gene. New York. Oxford University Press.

2.4  A classification of textual health information

2.5  Production of health information

2.5.1  The generation of scientific information

(4/3/05)  Another way to think of the scientific literature is via its "life cycle."  The figure below somewhat recapitulates what is described in the book, but also shows some of the additional wrinkles of the modern scientific discovery and publishing process.

[Figure: Life cycle of the scientific literature]

Many also argue that there is much more to scientific data than publications, and that the sheer volume of information is enormous.  This is perhaps described most eloquently by Insel et al. (2003), who note the massive amounts of things like nucleotides in the genome and neurons in the brain.  These authors also note that the scientific publication process is "slow and expensive," and advocate for more data sharing among researchers.

Of course, some knowledge is never obtained because it is "forbidden" to be studied (Kempner et al., 2005).  Knowledge may be forbidden because it can only be obtained through unethical means, e.g., human experiments conducted by Nazis.  But other research is prohibited by what Kempner et al. call "informal constraints."  This may involve fear of results being attacked by political groups across the spectrum, from religious groups to animal rights activists.  Clearly there must be some ethical constraints on the conduct of science, but research should not be constrained merely because it offends the political agenda of a particular group.

Related to forbidden knowledge is the focus of scientific literature on diseases and their treatments pertinent to developed countries.  Raja and Singer (2004) note that little content in all of the "major" journals is relevant to developing countries, but that British journals do a better job than their American counterparts.

Kempner, J., Perlis, C., et al. (2005). Forbidden knowledge. Science, 307: 854.

Insel, T., Volkow, N., et al. (2003). Neuroscience networks:  data-sharing in an information age. PLoS Biology, 1(1): E17. http://biology.plosjournals.org/plosonline/?request=get-document&doi=10.1371/journal.pbio.0000017.
Raja, A. and Singer, P. (2004). Transatlantic divide in publication of content relevant to developing countries. British Medical Journal, 329: 1429-1430.

(4/3/04)  How prevalent is the problem of duplicate publication?  von Elm et al. (2004) analyzed 141 systematic reviews published in anesthesiology and available on the Internet.  Of these reviews, the authors of 56 acknowledged identification of duplicate articles (excluding abstracts, letters, and book chapters), leading them to identify 103 duplicates of 78 articles (60 were published twice and the remainder more than twice).  The duplicates were not mere reproductions, but fell into more complex (one might say covert) patterns (number of article pairs in parentheses):
  • Study samples identical
    • Outcomes identical - one report (21) or more than one report (16)
    • Outcomes different (24)
  • Study samples different
    • Outcomes identical - increasing sample (11) or decreasing sample (11)
    • Outcomes different (20)
All but 5.3% of the papers referenced the earlier duplicates.  Two-thirds differed in authorship partially or completely.  The annual citation rate was about equal for each in the pair.  The median appearance of the duplicate was at about one year.

von Elm, E., Poglia, G., Walder, B. and Tramer, M. (2004). Different patterns of duplicate publication:  an analysis of articles used in systematic reviews. Journal of the American Medical Association, 291: 974-980.

(4/5/04)  Another concern about the production of biomedical literature is that the clinical trials carried out do not meet the needs of "decision makers," in particular, those who develop policy, practice guidelines, and so forth.  Tunis et al. (2003) have called for more effort on pragmatic or practical clinical trials (PCTs).  The characteristics of PCTs they deem most important include the selection of clinically relevant interventions for comparison, diverse populations of study participants, recruitment from heterogeneous practice settings, and data collection from a broad range of clinical outcomes.  They lament that the major funders of clinical research, namely the National Institutes of Health and the medical products industry, do not focus on supporting these types of clinical trials.

Tunis, S., Stryer, D. and Clancy, C. (2003). Practical clinical trials - increasing the value of clinical research for decision making in clinical and health policy. Journal of the American Medical Association, 290: 1624-1632.

2.5.2  Study of the peer review process

(4/3/05)  I agree with my colleague Tefko Saracevic, who has said (personal communication) that the peer review process determines more where an article is published than whether it is published.  Another example of those who are already successful continuing to achieve success can be found in the book, The Jordan Rules (Smith, 1994), in which an analysis of calls by NBA referees found a tendency to give this (and probably other) superstars the benefit of the doubt in foul calls.

Smith, S. (1994). The Jordan Rules. New York, NY. Pocket Books.

(4/3/05)  Another study of problems with the peer review process comes from the social work literature (Epstein, 2004).  In this study, two "stimulus" articles, each written with both a positive and a negative interpretation, were submitted to 31 social work journals.  The difference in acceptance rates between the positive and negative versions was significant for one of the articles but not the other.  The timeliness and quality of the peer reviews were considered inadequate in 73.5% of the reviews.

Epstein, W. (2004). Confirmational response bias and the quality of the editorial processes among American social work journals. Research on Social Work Practice, 14: 450-458.

(4/4/04)  The systematic review of the peer review process by Jefferson et al. (2002) described in the text showed that research has not demonstrated the benefits of this process.  Twenty-one studies of the process were found and led to a variety of conclusions (number supporting each conclusion in parentheses):
  • Concealing identities of peer reviewers or authors does not appear to affect quality of reviews (9)
  • Checklists and other attempts at standardizing the process do not appear to help (2)
  • Training of referees does not improve the quality of reviews (2)
  • Electronic media do not improve quality (2)
  • Peer review does not detect bias against unconventional drugs (1)
  • The process may improve readability and general quality of papers (2)

Jefferson, T., Wager, E. and Davidoff, F. (2002). Measuring the quality of editorial peer review. Journal of the American Medical Association, 287: 2786-2790.

2.5.3  Primary literature and its limitations

(4/3/05)  There continues to be concern about the reporting of clinical trials, both that their results are selectively reported and that they are not reported at all.  Evidence for the former was reported by Chan et al. (2004), who assessed 102 clinical trials and their clinical outcome measures that were approved by ethics committees in Denmark during 1994-1995.  They found that 50% of the efficacy outcomes and 65% of the harm outcomes were incompletely reported.  About 62% of the trials had at least one clinical outcome that had been changed, introduced, or omitted from the original study protocol.  In a survey, trial authors denied the existence of unreported outcomes even though Chan et al. had identified them.  This study validates a concern that reports of clinical trials are incomplete.

A related concern, described in the book, is studies that do not get published at all.  An instance of this came to light with a report that a pharmaceutical manufacturer had concealed negative information about an antidepressant drug (Steinbrook, 2004).  This in turn has prompted calls for a clinical trials registry so that negative results do not get suppressed.  One possibility would be to expand the ClinicalTrials.gov database for this purpose, although it does not currently have all of the fields that would be required (Steinbrook, 2004).  Another possibility would be the use of the new drug application (NDA) database of the US Food and Drug Administration (Turner, 2004).  A recent statement by the ICMJE advocates registration of clinical trials (DeAngelis et al., 2004).

Chan, A., Hrobjartsson, A., et al. (2004). Empirical evidence for selective reporting of outcomes in randomized trials:  comparison of protocols to published articles. Journal of the American Medical Association, 291: 2457-2465.
DeAngelis, C., Drazen, J., et al. (2004). Clinical trial registration: a statement from the International Committee of Medical Journal Editors. Journal of the American Medical Association, 292: 1363-1364.
Steinbrook, R. (2004). Public registration of clinical trials. New England Journal of Medicine, 351: 315-317.
Turner, E. (2004). A taxpayer-funded clinical trials registry and results database. PLoS Medicine, 1: 180-182.

(4/3/05)  Which journals publish the most high-quality (i.e., evidence-based) studies?  McKibbon et al. (2004) looked at publications that provide summaries of "clinically important" articles (e.g., ACP Journal Club, Evidence-Based Medicine), finding that in a given area, a small number of journals publish the lion's share of high-quality clinical studies.  They assessed 60,352 articles in 170 journals and found the following results by field:

  • In internal medicine (ACP Journal Club), four titles provide 56.5% of articles, while 27 supply the rest.
  • In general/family medicine (Evidence-Based Medicine), five titles provide 50.7% of articles, while 40 supply the rest.
  • In nursing (Evidence-Based Nursing), seven titles provide 51.0% of the articles, while 34 supply the rest.
  • In mental health (Evidence-Based Mental Health), nine titles provide 53.2% of the articles, while 34 supply the rest.
For the medical (but not the nursing) fields, there was a correlation between number of clinically important articles and the journal's IF.

McKibbon, K., Wilczynski, N., et al. (2004). What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals. BMC Medicine, 2: 33. http://www.biomedcentral.com/1741-7015/2/33.

(4/7/05)  A whole host of other limitations continue to impact the biomedical literature.  These problems do not necessarily invalidate the literature or imply that the process is fatally flawed.  But they do remind us that we must be aware of the limitations of the scientific process and strive to minimize them.

One problem that was described in the book and continues now is statistical reporting.  An analysis by Garcia-Berthou and Alcaraz (2004) found that one or more incongruences occurred with statistical reporting in the prestigious journals Nature and British Medical Journal in 38% and 25% of papers respectively.  In 12% of these instances, the significance levels (P value) could be incorrect by an order of magnitude or more.  Most errors were presumed to be due to rounding, transcription, or typesetting problems.

Another continuing problem is errors in references.  Aronsky et al. (2005) assessed this problem in the biomedical informatics literature and found problems similar to those described in earlier studies in other scientific publications.  These authors assessed the five biomedical informatics journals with the highest IFs for each journal's first issue of 2004.  They found 311 errors in 225 of the 656 references (34.3%) in 37 articles.  The percentage of articles with errors varied by journal, from 22.1% for Journal of the American Medical Informatics Association to 40.7% for International Journal of Medical Informatics.  The most common element with an error was the author name (31%), followed by the title (17%), page (7.4%), and year (3.5%).

A new wrinkle to the problem of inadequate citations is Web references provided in scientific papers that are inaccessible or incorrect.  Crichlow et al. (2004) assessed URLs in the references of all original research papers in five major medical journals that were published in January, 2004.  In 91 articles analyzed, there were 68 URLs in the references, 8.6% of which were inaccessible.  These authors noted that de Lacey et al. (1985) had found a similar 8% overall rate of errors in citations in the paper-based journal literature in 1985.

Despite the improvement in abstracts with the introduction of structured abstracts in the 1980s, they typically do not mention limitations of studies.  In 2004, the Annals of Internal Medicine introduced a new section to their structured abstract, Limitations.

Anonymous (2004). Addressing the limitations of structured abstracts. Annals of Internal Medicine, 140: 480-481.
Crichlow, R., Winbush, N., et al. (2004). The accessibility and accuracy of Web references in five major medical journals. Journal of the American Medical Association, 292: 2723-2724.
Aronsky, D., Ransom, J., et al. (2005). Accuracy of reference in five biomedical informatics journals. Journal of the American Medical Informatics Association, 12: 225-228.
de Lacey, G., Record, C., et al. (1985). How accurate are quotations and references in medical journals? British Medical Journal, 291: 884-886.
Garcia-Berthou, E. and Alcaraz, C. (2004). Incongruence between test statistics and P values. BMC Medical Research Methodology, 4: 13. http://www.biomedcentral.com/1471-2288/4/13.

(4/3/04)  A further systematic review of the fate of biomedical meeting abstracts verified that only about 46% of abstracts presented at such meetings achieve publication of the full paper in a journal (von Elm et al., 2003).  These authors also reviewed the fate of abstracts originally rejected, finding 27% were eventually published as full papers.  Studies in basic science and those having a positive outcome were more likely to eventually be published as papers.  Abstracts were more likely to be published if they were presented orally, at a small meeting, or a US meeting.

von Elm, E., Costanza, M., Walder, B. and Tramer, M. (2003). More insight into the fate of biomedical meeting abstracts:  a systematic review. BMC Medical Research Methodology, 3: 12. http://www.biomedcentral.com/1471-2288/3/12 .

(4/4/04)  There are additional concerns about compromised validity of information in journals short of outright fraud.  One oft-cited culprit is the pharmaceutical industry, both through its influence on the content as well as advertising in journals.  In a systematic review of studies comparing research sponsored by the pharmaceutical industry with that sponsored by others, Lexchin et al. (2003) analyzed 30 studies and found that the former was more likely to have a positive outcome and less likely to be published.  Their analysis did not, however, find these studies were of poorer quality.  The better outcomes were explained by inappropriate comparator products and publication bias.

Another concern related to pharmaceutical industry influence is selective reporting of results.  One widely known case concerned the drug celecoxib, a non-steroidal antiinflammatory drug (NSAID) claimed to carry a lower risk of gastrointestinal complications than others in its class.  The authors of a clinical trial comparing this drug with others were taken to task for not reporting the same details in a published clinical trial that were reported to the US Food & Drug Administration (FDA) in their application to receive approval for its use (Jüni et al., 2002).  Jüni et al. noted that the paper had already been cited 169 times by the time these issues came to light and that the company ordered 30,000 reprints from the publisher.  Clearly the stakes of publications in journals can be high.

Another area of concern related to misleading drug information concerns advertisements in journals.  In an editorial, Fletcher (2003) notes that advertisements are a major source of revenue for journals and provide resources to support the journal or the organization (often a professional society) that publishes it.  However, he notes that while physicians claim not to have their practices influenced by advertisements, advertisers would be unlikely to spend the thousands of dollars per physician per year that they do if the advertisements had no effect.

Advertisements themselves, which readers encounter alongside the scientific papers in journals, can be misleading in their content.  Wilkes et al. (1992) found that 44% of advertising would lead to improper prescribing if the physician had no other information.  They also noted that 92% of advertisements included at least one area that did not comply with regulations of the FDA.  Villanueva et al. (2003) looked at all advertisements for blood pressure-lowering and lipid-lowering medications in six Spanish medical journals during 1997.  In a sample of references cited in the advertisements, they found that 18% of the references could not be retrieved.  In addition, 44% of the claims made were not completely supported by the reference, usually due to the drug being recommended in a patient group other than that in which it was studied.

Also problematic in advertisements may be the graphics.  Cooper et al. (2003) analyzed all ads in ten US medical journals in 1999.  Half of the ad area consisted of nonscientific figures and images.  About 1.6% of the area contained scientific graphs.  Over a third of the graphs had some numerical distortion that led to overestimation or underestimation of the quantity being graphed, which is specifically prohibited by FDA regulations.

Another area of concern described in the text is conflict of interest.  A recent episode led to the partial retraction of a paper by the Lancet, when it was discovered that the primary author did not disclose funding from a group of lawyers representing alleged victims of autism due to the measles, mumps, and rubella (MMR) vaccine (Horton, 2004).  This demonstrates that conflicts of interest are not necessarily limited to those who stand to gain from the sale of products.

Lexchin, J., Bero, L., Djulbegovic, B. and Clark, O. (2003). Pharmaceutical industry sponsorship and research outcome and quality:  systematic review. British Medical Journal, 326: 1167-1170.
Jüni, P., Rutjes, A. and Dieppe, P. (2002). Are selective COX 2 inhibitors superior to traditional non steroidal anti-inflammatory drugs? British Medical Journal, 324: 1287-1288.
Fletcher, R. (2003). Adverts in medical journals:  caveat lector. Lancet, 361: 10-11.
Wilkes, M., Doblin, B. and Shapiro, M. (1992). Pharmaceutical advertisements in leading medical journals:  experts′ assessments. Annals of Internal Medicine, 116: 912-919.
Villanueva, P., Piero, S., Librero, J. and Pereiro, I. (2003). Accuracy of pharmaceutical advertisements in medical journals. Lancet, 361: 27-32.
Cooper, R., Schriger, D., Wallace, R., Mikulich, V. and Wilkes, M. (2003). The quantity and quality of scientific graphs in pharmaceutical advertisements. Journal of General Internal Medicine, 18: 294-297.
Horton, R. (2004). A statement by the editors of The Lancet. Lancet, 363: 820-821.

2.5.4  Meta-analysis and its limitations

(4/3/05)  This section should really focus first and foremost on systematic reviews, with the view that meta-analysis may or may not be possible in a given systematic review.  A key point about systematic reviews is that they represent a systematic attempt to bring together all the knowledge in a given area.  If it is appropriate to pool the results of experiments, this can be done via meta-analysis.  But even if meta-analysis is not done, the process of a systematic review alone can generate new knowledge by giving insight into the (relatively) complete overall picture of an area.
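To make the idea of pooling concrete, the following is a minimal sketch of one common approach, fixed-effect inverse-variance pooling, in which each study is weighted by the reciprocal of its variance.  It is offered only as an illustration under stated assumptions:  the function name pool_fixed_effect and the study values are hypothetical and do not come from any review cited here, and a real meta-analysis would also need to assess heterogeneity before deciding whether pooling is appropriate at all.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    effects     -- per-study estimates on a common scale (e.g., log odds ratios)
    std_errors  -- the standard error of each estimate
    Returns (pooled_estimate, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each study's weight is the reciprocal of its variance,
    # so more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios from three trials (illustrative values only).
effects = [-0.4, -0.1, -0.3]
std_errors = [0.2, 0.1, 0.4]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect {pooled:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

The fixed-effect model assumes all studies estimate the same underlying effect.  When that assumption is doubtful, a random-effects model or a purely narrative synthesis may be the better choice.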

What are the characteristics of systematic reviews published in the medical literature?  Montori et al. (2003) assessed 170 well-known clinical journals for the year 2000, counting 60,330 articles.  Of these articles, 26,694 were original research reports and 3,193 were review articles.  Of the review articles, 768 (24%) were systematic reviews, defined as articles that clearly stated a clinical topic, how the evidence was retrieved, what sources the evidence was retrieved from, and what the inclusion and exclusion criteria were.  The majority of systematic reviews were about therapy (63%), followed by causation and safety (29%), diagnosis (4.4%), and prognosis (2.1%).  About 80% of all the systematic reviews were published in 11% of the journals.  The impact factor (IF) of these journals was weakly but significantly associated with the publication of systematic reviews.  Systematic reviews were more likely to be cited by other papers in these journals than narrative reviews.
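The proportions in the preceding paragraph can be checked with simple arithmetic.  The short sketch below uses only the counts quoted above from Montori et al.; the helper name pct is an assumption introduced for illustration.

```python
# Counts quoted above from Montori et al. for the year 2000.
total_articles = 60_330
original_reports = 26_694
review_articles = 3_193
systematic_reviews = 768


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of review articles that met the definition of a systematic review.
print(f"systematic / all reviews:  {pct(systematic_reviews, review_articles):.1f}%")
# Share of all counted articles that were reviews of any kind.
print(f"reviews / all articles:    {pct(review_articles, total_articles):.1f}%")
# Share of all counted articles that were systematic reviews.
print(f"systematic / all articles: {pct(systematic_reviews, total_articles):.1f}%")
```

The first figure rounds to the 24% reported in the text.  The other two are derived here from the same counts and put that figure in context:  systematic reviews were a small fraction of everything these journals published.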

One aspect that characterizes systematic reviews is their reporting of explicit search strategies and assessment of their effectiveness.  Patrick et al. (2004) found that while the majority of meta-analyses (71%) reported a search strategy, only a small number (6.7%) reported evidence of the strategy′s effectiveness.

Several references describe the methods for producing systematic reviews in great detail:  the Cochrane Reviewers′ Handbook (Anonymous, 2005) and the book by Glasziou et al. (2001).

Anonymous (2005). Cochrane Reviewers′ Handbook. Cochrane Collaboration. http://www.cochrane.org/resources/handbook. Accessed: March 27, 2005.
Glasziou, P., Irwig, L., et al. (2001). Systematic Reviews in Health Care:  A Practical Guide. Cambridge, UK. Cambridge University Press.
Montori, V., Wilczynski, N., et al. (2003). Systematic reviews:  a cross-sectional study of location and citation counts. BMC Medicine, 1: 2. http://www.biomedcentral.com/1741-7015/1/2.
Patrick, T., Demiris, G., et al. (2004). Evidence-based retrieval in evidence-based medicine. Journal of the Medical Library Association, 92: 196-199.

(4/4/04)  Another analysis of the grey (unpublished) literature was at odds with the analysis of McAuley et al. (2002) cited in the text.  Hopewell et al. (2003) found that published trials demonstrated an overall larger treatment effect.  This analysis also found that published trials were more likely to be larger and of higher methodologic quality.  Grey literature most commonly consisted of either abstracts (49%) or unpublished data (33%).

Hopewell, S., McDonald, S., Clarke, M. and Egger, M. (2003). Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Library. http://www.cochrane.org/cochrane/mrabstr/mr000010.htm.

(4/4/04)  Despite the prevalence of IR systems, handsearching of the literature is still required to identify all trials to include in meta-analyses.  Thirty-four assessments in a variety of topical areas have demonstrated that handsearching yields 92-100% of all reports of RCTs, whereas searching of MEDLINE and other databases yields only 49-67% (Hopewell et al., 2003).

Hopewell, S., Clarke, M., Lefebvre, C. and Scherer, R. (2003). Handsearching versus electronic searching to identify reports of randomized trials. Cochrane Library. http://www.cochrane.org/cochrane/mrabstr/mr000001.htm.
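The yield figures for handsearching and electronic searching are, in IR terms, measures of recall:  the fraction of all known trial reports that a given search method retrieves.  The sketch below shows one way such a comparison might be computed.  The trial identifiers and the function name recall are hypothetical and are not data from the Hopewell et al. review.

```python
def recall(found, gold_standard):
    """Fraction of gold-standard reports that a search method retrieved."""
    if not gold_standard:
        raise ValueError("gold standard must not be empty")
    return len(found & gold_standard) / len(gold_standard)


# Hypothetical gold standard: ten trial reports known to exist in a set of journals.
gold = {f"trial-{i}" for i in range(1, 11)}

# Hypothetical results: handsearching misses one report,
# while electronic searching misses four.
handsearch = gold - {"trial-7"}
electronic = {f"trial-{i}" for i in (1, 2, 3, 5, 7, 9)}

print(f"handsearch recall: {recall(handsearch, gold):.0%}")
print(f"electronic recall: {recall(electronic, gold):.0%}")
print(f"combined recall:   {recall(handsearch | electronic, gold):.0%}")
```

In this made-up example the two methods miss different reports, so combining them recovers more than either alone.  Whether that holds in practice depends on the topic area and the databases searched.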

2.5.5  Secondary literature and its limitations

(4/4/04)  The text notes that new research findings are often slow to appear in textbooks.  However, another problem in textbooks concerns the promulgation of information not very well investigated in the first place.  One area where this has been found to be problematic concerns findings on physical examination in diseases.  Richardson and Wilson (2002) have noted, for example, that frequencies of findings in diseases are often not described in popular internal medicine textbooks.

In this era of continually increasing costs, there is an oft-stated desire for physicians to rely more on the physical exam and less on expensive testing.  Medical textbooks are particularly noteworthy for their description of eponymous physical findings, which are often used by elderly attending physicians to demonstrate their knowledge.  Most of these findings, however, have been less subject to the modern scrutiny of diagnostic test evaluation.  Babu et al. (2003) looked at 12 eponymous signs of aortic regurgitation, a condition in which the aortic valve does not close completely and allows blood to "leak" back into the heart.  This can lead to congestive heart failure and other complications.  While these signs are described in major textbooks, the actual evidence supporting them is for the most part minimal.  The authors call for physical findings to be evaluated using the more modern techniques of evidence-based medicine (see Section 2.8).

Richardson, W. and Wilson, M. (2002). Textbook descriptions of disease - where′s the beef? ACP Journal Club, 137: A11-A12.
Babu, A., Kymes, S. and Carpenter-Fryer, S. (2003). Eponyms and the diagnosis of aortic regurgitation:  what says the evidence? Annals of Internal Medicine, 138: 736-742.

2.6  Electronic publishing

(4/3/05)  Electronic publishing of scientific journals has continued to evolve since publication of the book.  Virtually all biomedical journals are now available in electronic form, and academic medical center libraries subscribe to hundreds if not thousands.  The biggest challenges now are non-technical issues, particularly economic ones.  This has motivated the advocacy of "open access" publishing as described below.

Another major shake-up to the publishing world is the Google Scholar system (scholar.google.com).  Google has entered into an agreement with several large and prestigious universities to digitize their collections (Markoff and Wyatt, 2004).  Non-copyrighted works will be completely available, while only excerpts of copyrighted publications will be accessible.  The process of capturing documents with high-resolution cameras will be very labor-intensive.  Google Scholar also contains scientific publications found while crawling the Web for content in Google′s regular system.  It also has established linkages across publications.  Google′s effort is not the only large-scale one to digitize collections; the Library of Congress is undertaking a similar effort (Markoff and Wyatt, 2004).  Nor is the plan without controversy, as publishers have expressed concern about copyright violations, although most will likely opt in to it (Butler, 2005).

Butler, D. (2005). Publishers irritated by Google′s digital library. Nature, 433: 446.
Markoff, J. and Wyatt, E. (2004). Google is adding major libraries to its database. New York Times. December 14, 2004. http://www.nytimes.com/2004/12/14/technology/14google.html.

(4/1/03)  Controversy and tension over public databases continues and is not limited to biomedicine.  In 2002, the Department of Energy closed down the PubScience database and its associated Web site.  PubScience was a bibliographic database covering a wide spectrum of chemistry and physics literature.  PubScience was to the physics and chemistry communities what PubMed is to the biomedical community.  The closure was felt by some to be in part due to the pressure of commercial interests.  An overview of the issues can be found in a statement published by the American Library Association (Sheketoff et al., 2002).

Sheketoff, E., Baish, M., et al. (2002). PubSCIENCE: A Unique and Needed Scientific Resource. American Library Association. http://www.ala.org/washoff/pubscience.pdf.

(4/1/03)  An overview of electronic journals in medicine was provided by Curran (2002).  He reviewed the historical evolution of electronic medical journals and described how the electronic process improves their production and timeliness.  He also claimed that publication bias against studies with negative results could occur with electronic journals, though he did not really provide any data to show that it would be any different from publication bias with paper journals.

Curran, C. (2002). The medical journal meets the Internet. First Monday, 7: 6. http://firstmonday.org/issues/issue7_6/curran/index.html.