
New CLAIRLIB Release

Author: Internet contribution | Date: 2007-02-11

Introduction

The University of Michigan's CLAIR (Computational Linguistics And Information Retrieval) group is happy to present the first release of the Clair library.

The Clair library is intended to simplify a number of generic tasks in Natural Language Processing (NLP) and Information Retrieval (IR). Its architecture also allows for external software to be plugged in with very little effort.


Functionality

Native: Tokenization, Summarization, LexRank, Biased LexRank, Document Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph Analysis, Bioinformatics Text Analysis, Political Science Text Analysis, Network Building, Power Law Distribution Analysis, Network Analysis and Computation (Watts-Strogatz Clustering Coefficient, Cosines, Random Walks), Tf, Idf

Imported: Stemming, Sentence Segmentation, Web Page Download, Web Crawling, XML Parsing, XML Tree Building, XML Writing

Download

The current version is available for beta testing. Write to radev@umich.edu to get a beta copy.

Prerequisites

You need Perl, some external software, and a number of external modules that you can download from CPAN (see list below).
  • MEAD
  • Adwait Ratnaparkhi's MxTerminator
  • from CPAN: Net::Google, HTML::LinkExtractor, HTML::Parse, Statistics::ChisqIndep, Graph::Directed, BerkeleyDB, Math::MatrixReal, Lingua::Stem, IO::File, POSIX, Math::Random, IO::Handle, IPC::Open2, Carp, IO::Pipe, Getopt::Long
Several of the Clair modules require Perl 5.8.2 to run; with an older Perl you will get errors when running certain processes. The modules requiring Perl 5.8.2 include:
  • Clair::Cluster
  • Clair::Document
  • Clair::Network
  • Clair::NetworkWrapper
  • CIDR::Wrapper
  • MEAD::Wrapper
  • FindBin
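
Before installing Clairlib, it can help to confirm that the local Perl meets these requirements. The following is a minimal sketch, not part of the Clairlib distribution: the function names `perl_is_recent_enough` and `check_modules` are made up for illustration, it assumes `perl` is on your PATH, and it assumes that "requires Perl 5.8.2" means 5.8.2 or newer. The module names are taken from the CPAN list above.

```shell
#!/bin/sh
# Sketch: check the Perl prerequisites listed above.
# Assumption: "requires Perl 5.8.2" is read as "5.8.2 or newer".

# Succeeds if perl is at least 5.8.2 ($] is 5.008002 for that release).
perl_is_recent_enough() {
    perl -e 'exit($] >= 5.008002 ? 0 : 1)' >/dev/null 2>&1
}

# Prints one status line per module; returns 1 if any module fails to load.
check_modules() {
    rc=0
    for module in "$@"; do
        # -M loads the module; "-e 1" is a no-op program.
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            rc=1
        fi
    done
    return "$rc"
}

if perl_is_recent_enough; then
    echo "perl version: ok"
else
    echo "perl version: older than 5.8.2, or perl not found"
fi

check_modules Net::Google HTML::LinkExtractor HTML::Parse Statistics::ChisqIndep \
    Graph::Directed BerkeleyDB Math::MatrixReal Lingua::Stem Math::Random \
    IO::File IO::Handle IO::Pipe IPC::Open2 POSIX Carp Getopt::Long \
    || echo "Install the modules marked MISSING from CPAN."
```

Modules reported as MISSING can usually be installed with the `cpan` client, for example `cpan Lingua::Stem`. MEAD and MxTerminator are external software and are not covered by this check.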

Getting started

The README file contains information about how to set up Clairlib. This file is also included in the Clairlib tar.gz file.

Unit Tests

Here is the content of a number of the tests included in your distribution.
  • test_aleextract.txt

Documentation

Clair lib tutorial

About

The Clair Library is developed by the Clair group at the University of Michigan.
  • Project design: Dragomir R. Radev
  • Main implementers: Anthony Fader, Mark Hodges
  • Additional code by: Adam Winkel, Samuela Pollack, Scott Gifford, Timothy Allison, Gunes Erkan, Patrick Jordan, Aaron Elkiss, Michael Dagitses, Mark Joseph, Joshua Gerrish

Acknowledgments

This work has been supported in part by grants R01 LM008106 and U54 DA021519 from the National Institutes of Health as well as grant IDM 0329043 "Probabilistic and link-based Methods for Exploiting Very Large Textual Repositories" from the National Science Foundation.

 

Modules
  • Tf.pm
  • Idf.pm
  • TFIDFUtils.pm
  • WebSearch.pm
  • MxTerminator.pm
  • Robot2.pm
  • Parse.pm
  • CorpusDownload.pm
  • CIDR/Wrapper.pm
  • Essence/IDF.pm
  • Essence/Centroid.pm
  • Essence/Text.pm
  • MEAD/DocsentConverter.pm
  • MEAD/Wrapper.pm

 
