Optimizing Scoring Functions and Indexes for Proximity Searc

1  Introduction

Information Retrieval (IR) and XML query systems are headed for significant unification. Starting with simple fielded search [2], IR engines are getting more sophisticated at handling more complex annotation tags inlined with source text [25]. Meanwhile, XML engines are adding text search features [13,5,30,14,1]. The Unstructured Information Management Architecture (UIMA, http://www.alphaworks.ibm.com/tech/uima) is moving to standardize annotation and indexing modules and define a query paradigm based on XML fragments.
Ontologies (e.g. http://www.ontoweb.org/), linguistic knowledge bases [26] and processors [8], semantic taggers [15,10], and named entity recognizers are the major sources of annotations added to a corpus. Consider the noun is-a (hypernym) hierarchy in WordNet, and suppose every noun (phrase) in the corpus has been annotated with all its hypernyms. E.g., if the token Einstein appears at some position, we recognize that a special case of types Physicist and person (among others) appears at this position.
Suitable positional indexes of such annotations, integrated with text, are key enablers of the Semantic Web vision, because they let us ask powerful queries that combine syntactic match with type constraints.
Such queries can be used to guide contextual information extraction and aggregation where scanning the whole corpus (e.g. the Web) would be prohibitive. E.g., we may be interested in extracting numeric tokens adjacent to hours and close to the words laptop and battery. We may even use instances of specified types to activate instances of other types, e.g., we may want to instantiate type person near type money amount and words UN and oil.
Apart from soft aggregation, we can also use such queries as initial approximations to natural language questions. E.g., to answer the question "Who discovered the theory of relativity", we can seek instances of person in close proximity to stems discover, theory and relativity, using a suitable function to weight and aggregate the evidence from occurrences of the three keywords. Recent NLP work [21] has enabled mapping questions to target entity types with high accuracy. Here we consider the task of retrieving answer candidates using those types.
In this paper we will work with the is-a relation, but our system works with any partial order on entities, such as is-part-of, is-contained-in, etc. While small type systems may suffice for specific applications, open-domain query systems need very large type systems to be useful. Bootstrapping large type systems and semantic relations from the Web [10,28,11,6] is an active research area, but efficient index and query support are needed to fully realize their potential.

1.1  Our contributions

We identify a broad class of proximity queries that combine text and annotation. Despite its simplicity, our framework is surprisingly general and useful for a number of text mining applications. We describe a system we are building around Lucene [2] and UIMA (http://www.alphaworks.ibm.com/tech/uima) that enables efficient annotation indexing and search. We intend to make the system publicly available. We identify two key technical problems that must be solved: learning proximity scoring functions and workload-adaptive index optimization.
Scoring functions are fairly standardized in traditional IR but largely secret in Web search. Early XML query semantics were set- and path-oriented; later, scoring schemes were adopted from IR and Pagerank [13,5,30,16,14,1], but these are not yet grounded in evaluation as extensive as standard IR. Therefore, we must learn good proximity scoring functions automatically before we can consider index and query optimization.
We design a simple and efficient learning algorithm to fit a proximity-based scoring function to logs of queries and known answers. When it is plugged into the query processor, we get high accuracy: recall at rank 300 increases from 0.8 (standard IR) to 0.9 and typically, a correct answer token appears at rank 2-4. Surprisingly, we find the best-fit scoring function to be non-monotonic wrt to the distance between selectors and candidate tokens.
Large and deep type hierarchies are essential to support open-domain search. Consequently, the index space required for the type annotations becomes very large compared to the standard inverted index, which can be compressed significantly [31]. We devise new algorithms that exploit the skew in the distribution of answer types (atypes) in query logs (i.e., historical workload statistics) and cut down the additional index storage requirement by 85-90%, while sacrificing query processing speed by a modest factor between 1 and 3. Our indexes occupy about the same total space as the corpus in gzipped form and an optimized IR index.
We present large-scale experiments with two 5 GB corpora having a million documents each, some 400 million distinct token positions, an atype hierarchy with 80,000 types (18,000 non-leaf types), and several hundred real-life queries from the TREC (http://trec.nist.gov/data/qa.html) competition. We use TREC data mainly because of the "truthed" collection of questions and answers, but our conclusions appear general. We present data on both the quality of answers and the performance of our system.

1.2  Related work

IR engines can support basic proximity clauses with user-supplied windows. The popular IR package Lucene [2] provides basic fielded search with typically a limited number of fields, and does not natively support proximity clauses across fields. INDRI [25] has a rich and powerful query language that can support hard proximity clauses across tagged fields, but a hard proximity window must be specified by the user; also, we do not know of any workload-sensitive index optimization in INDRI.
The Bindings Engine [4] (BE) is a critical breakthrough toward semantic pattern searches, and is specifically optimized for text, but it depends strongly on immediate juxtaposition of ground constants with patterns to instantiate, as in "cruel ProperNoun said" or "cities such as NounPhraseList". BE is reported as using a handful of types (parts of speech and named entities like person, place, time) which still leads to an index 10 times larger than an IR index. However, thanks to the more restricted query class, BE uses more sequential disk access than our system.
The database and XML communities are steadily adding IR support [20]. COMPASS [14] and XXL [30] are notable systems integrating text and structure search, but to our knowledge do not learn textual proximity. COMPASS supports "concept" searches of the form "concept=value" where the concept may have soft aliases, but not "concept near value" in our sense.
Index space is a significant concern in adding IR support to XML systems. Florescu et al. [12] survey work in the database and XML communities and propose a text extension to XML-QL that allow element-level word matches. They report an index that is 12 times larger than the uncompressed XML source, and do not consider proximity-based scoring.
In principle, we can turn our corpus into a giant XML graph with one token per node and try to apply activation-based query systems such as ObjectRank [3], XRank [16] or TeXQuery [1]. However, given an average English token is 4-5 bytes, and compresses down to 0.5-1 byte in the IR index, even a 32-bit ID per such node would be prohibitive. Another problem is that (as we shall see) good scoring functions in the linear token space do not necessarily decay monotonically with distance.
Elegant data structures have been proposed to deal with "hard" proximity constraints [23,27], but they do not support learnable proximity scores, type containment or self-tuning indexes that exploit the query distribution. Earlier IR systems have exploited query skew in index-pruning [9] and caching [29] to achieve impressive reduction of IR index sizes and query times, but do not consider queries involving a type hierarchy.
fig/fig-system.png
Figure 1: System architecture. In this paper we are concerned with the blocks highlighted with dotted lines: learning a scoring function, and workload-driven index optimization.

2  System architecture

Figure 1 shows our system architecture. The system input consists of a text corpus, an atype taxonomy with associated annotators, and pairs of sample queries with truthed response tokens embedded in their document contexts. From the query-response pairs, we learn a scoring function that rewards and combines proximity between candidate tokens and matched selectors. This module is described in Section 3. The scoring function is plugged into the query processor. We also optimize our indexes using the query logs; this is described in Section 4.

2.1  Atype taxonomy, corpus and annotators

The atype taxonomy is a DAG where nodes are atypes and edges represent the is-a relation. A number of sources can provide atype taxonomy information. Here we use WordNet [26], but we are also integrating data from http://en.wikipedia.org/ to increase recall. Is-a instances can also be bootstrapped from the Web effectively [11]. In this paper, we will refer to nodes using WordNet synset notation, e.g., Einstein#n#2 is the second noun sense of Einstein, meaning genius.
The corpus is a set of documents. Each document is a sequence of tokens. Tokens can be compound, such as New_York. An annotator module (Figure 1) connects some tokens to nodes in the atype taxonomy. E.g. the string token Einstein might be connected to both senses Einstein#n#1 (the specific Physicist) Einstein#n#2 (genius). Now, through is-a connections in WordNet, we know that Einstein is-a person#n#1. Disambiguation can be integrated into the annotator module, but is an extensive research area in NLP [24] and is outside our current scope.

2.2  Query and candidate responses

A query consists of two parts. The first part is an atype from the taxonomy. The second part is a set of indexable predicates on token strings. The simplest predicate is string or stem equality, but we can also use a fixed library of regular expressions, also called surface patterns. (Multiple atypes can be allowed easily, but we will not discuss this here.)
To find the distance between Hamburg and Munich, we might use the atype linear_measure#n#1 and literal matches for Hamburg and Munich. The system and experiments in this paper used only equality predicates on stems; in such cases, we call string literals selectors. The atype may also be specified by a surface pattern, in this example, say, by [:digit:]+, a sequence of digits.
Any token in the corpus that is connected to a descendant of linear_measure#n#1 is a candidate answer token. We admit a candidate only if at least one selector appears within W (typically, 50) tokens of the candidate. Many IR systems use similar pruning policies.
The score of a candidate depends on the selectors that appear nearby, as described in Section 3. For some applications, we simply need to report k candidate tokens in order of decreasing score. In other applications such as question answering, a short context around the candidate is submitted to a more sophisticated NLP module.

2.3  Indexes

The query processor opens a cursor on the posting list for each atype and selector, and does a BE-style [4] multiway merge while pushing candidate tokens with scores into a scoring heap. In Section 4 we will build smaller indexes where most atypes will be "missing"-in that case, the query atype must be generalized to a coarser atype, and tentative result tokens checked using a forward index and a reachability index (described later in this section).

2.3.1  The stem and full atype indexes

As shown in Figure 1, in the first pass we build, apart from the forward and reachability indexes, two ordinary inverted indexes [31]. One maps from stems to posting lists. Each posting is a record containing a document ID where the stem appears, the number of times it appears in the document, and a list of token offsets where it appears. For English corpora, the size of the positional index is typically between 15-40% of size of the original (uncompressed) corpus, depending on the index compression techniques used. An IR index provides efficient sequential access through a posting list, so that multiple postings can be merged efficiently.
While indexing a stem, we also look for any atype attached to the token, and then follow all is-a links in the atype DAG up to the roots. For example, from the token Einstein in some document d, we can reach synsets physicist#n#1, intellectual#n#1, scientist#n#1, person#n#1, organism#n#1, living_thing#n#1, object#n#1, causal_agent#n#1, entity#n#1. We index all these atypes as if they all occurred at the same token offset in d as Einstein, in a second inverted index called the (full) atype index, i.e., in the full atype index, the posting list of each of the above atypes will include d.
A realistic and useful atype taxonomy can be quite deep-many prominent noun classes in WordNet are 6-12 links down from the noun roots. Consequently, as Figure 2 shows, the full atype index is almost the size of the uncompressed original corpus, over thrice the size of the gzipped corpus, and almost five times the size of the stem index. This would be unacceptable in most search applications, definitely large-scale services that need to cache substantial parts of the index in RAM. Luckily the query workload is very skewed, so we can choose only a subset of atypes to index; this is the topic of Section 4.

2.3.2  Search using stem and full atype indexes

The posting list for a term (an atype or a selector) helps us scan through (docid, tokenOffset) pairs in increasing lexicographic order. We use a bounded max-heap (of size k where k is the number of results sought) to keep track of the highest k scores. The memory required is bounded by the sum of the size of the score heap and the space required for holding all occurrences of candidates and selectors within a single document. We scan the postings lock-step, completely processing one document and inserting all its candidate locations into the score heap (evicting low-score locations if needed) and then move on to the next document.
Corpus/index Size (GB)
Original corpus 5.72
Gzipped corpus 1.33
Stem index 0.91
Full atype index 4.30
Reachability index 0.005
Forward index 1.16
Figure 2: Relative sizes of the corpus and various indexes for TREC 2000.

2.3.3  Reachability index

In Section 4 we will also need a reachability index: given two atypes a1 and a2, or an atype a1 and a token w, it can quickly (in O(1) time) tell us if a2 or w is-a a1 i.e. if a1 is an ancestor or generalization of a2 or w in the atype taxonomy. Because a type hierarchy is "largely" a tree, a simple Dewey coding [16] of the atype taxonomy leads to a reachability index small enough (5 MB from 80,000 atypes of which 18,000 are non-leaf, see Figure 2) to fit easily in RAM. Given a token, we can verify if it is a candidate, but the reachability index does not give us a direct index into candidate positions in the corpus. Obtaining compact reachability indices for arbitrary graphs is more complicated [7].

2.3.4  Forward index

Another useful index is the forward index which basically stores the original corpus as close to the gzipped size as possible, while allowing fast (docid,tokenOffset) queries, returning the actual token (which can now be tested using the reachability index). Most search systems need to keep around a forward index to generate query-specific snippets along with the top responses. We will use the reachability and forward indexes in conjunction in Section 4.
The forward index1 is built in two passes over the corpus. In the first pass, we count the frequency of each token in the corpus. Then we sort tokens by frequency and assign them variable-length integer IDs; frequent tokens get shorter IDs. In the second pass, we rescan the corpus and store packed bit-vectors for each document that can be unpacked and mapped back to tokens or token IDs.
Observe in Figure 2 that the forward index is even smaller than the gzipped corpus. Also, the forward index plus the reachability index add up to only about 25% of the full atype index.
Access to the forward index does require one random disk access per probe (a disk block generally holds several complete documents), but, as we shall see in Section 4, these accesses happen only for tokens that are top contenders for answering the query.

2.4  Experimental setup and evaluation

We used the TREC collection because it comes with a set of questions that can be mapped to precise atypes [21], and "truthed" answer passages. We used the TIPSTER and AQUAINT corpora adding up to over two million documents, which led to over 400 million term positions. Approximately 500 TREC questions were annotated with an atype.
Query time is measured as usual using real time and CPU time as appropriate. Our experiments were run on otherwise-unloaded P3/Xeon computers with 2-4GB RAM and Ultra320 SCSI disk. Index size is measured in bytes on disk.
The quality of a ranking of candidate tokens is measured in two standard ways. First, we measure the "recall-at-k": if our system returns k top-scoring tokens, in what fraction of questions do we nail an answer token (in the correct document and context specified by TREC)? Second, we measure the "mean reciprocal rank" or MRR used in the TREC community: if the first answer to query q is at rank rq, award a score of 1/rq, and average over the query set Q to get the MRR 1/|Q|åq Î Q(1/rq).

3  Learning to score candidates

Our goal here is to learn how to score and rank candidate tokens. The score of a candidate token (i.e., one that is a subtype or instance of the query atype) depends on three sets of quantities: statistical properties of matched selectors in the vicinity, the distance between those selectors and the candidate, and the manner in which contributions from different selectors are aggregated at the candidate token.
*  Selector energy Each matched selector s has an associated positive number called its energy, denoted energy(s). A common notion of energy is the inverse document frequency or IDF standard in IR: the number N of documents in the corpus divided by the number Ns of documents containing the selector token s. This is a linear form of IDF, the logarithmic form log(1+N/Ns) is more commonly used. We use the term selector energy in place of the more familiar term weight to highlight that our scoring function has a spreading-activation form.
*  Gap and Decay The gap between a candidate token w and a matched selector s, called gap(w,s), is one plus the number of intervening tokens. We consider each selector as spreading activation or energy to the candidate. We define the energy transmitted by the selector to the candidate to be energy(s) decay(gap(w,s)), where decay(g) is some function of the gap.
In many graph-based scoring systems such as ObjectRank [3], XRank [16] or TeXQuery [1] it is common to exploit a monotone decreasing decay(g)=dg, where 0 < d < 1 is a magic decay factor, for fast query execution. However, as we shall see, this form may not perfectly match data behavior.
fig/fig-score-eg.png
Figure 3: Selector activation example. Stems television and invent are selectors, person#n#1 is the atype, John Baird is an atype candidate.
*  Aggregation A selector s can appear multiple times near a candidate, we call this set {si}. If a is the candidate, our generic scoring function looks like
 
  score
(a)
=
 

数据挖掘论坛


Å
s 
 
Æ
i 
  energy
(si)   decay
( gap
(si,a)),
  (1)
where Æ aggregates over multiple occurrences of s and Å aggregates over different selectors. Sum or max can be used for Æ and Å, although sum behaves poorly as Æ because even a low-IDF selector can make the score less reliable by appearing often near a candidate.
Our query processor does not require any assumptions on the aggregation function, but we can think of other execution strategies that can take advantage of Æ = max and Å = S.

3.1  Data collection and representation

A few thousand TREC questions are available annotated with atypes [21]. We collected questions for which the answer tokens prescribed by TREC included at least one instance or subtype of the atype of the question. We report results on 261 usable questions from TREC 2000. For each question, we need positive (answer) and negative (candidate but not answer) tokens, and, to learn their distinction well, we should collect negative tokens that are "closest" to the positive ones, i.e., strongly activated by selectors.
To achieve this, we used the full atype index to locate all candidate tokens, and made a generous estimate of the activation from (the nearest occurrence of) each selector. This generous estimate used the log IDF as energy and no decay, i.e., energy was accrued unattenuated at the candidate position. For each query, we retained all positive answer tokens and the 300 negative tokens with top scores. Overall, we finished with 169662 positive and negative contexts. 5-fold cross-validation (i.e. 80% training, 20% testing in each fold) was used.
The next job was to turn contexts into feature vectors. Recall that there must be at least one selector match within W tokens of the candidate a. We set up this window with 2W+1 tokens centered at a, and retain only one instance of each selector, the one closest to a2.
As a first-cut, we can represent a context by a W-dimensional vector f, where fj is the energy sourced at position ±j wrt the candidate. If we are not sure which energy(·) function to use (log or linear form), we can always stick both values in a 2W-dimensional feature vector. Clearly, we can extend to more than two forms of the energy function, and can use separate parameters for the left and right context. In experiments, the log form by itself with symmetric decay gave better results, so, for simplicity, we will assume we have W-dimensional feature vectors.

3.2  Learning decay with hinge loss

For every gap value 1 £ j £ W, suppose the decay is bj. Then the score of a feature vector f is åj=1W fjbj, or b·f. If f+ (f-) is a feature vector for a positive (negative) context, b should ensure that b·f+ > b·f-, or b·(f+ - f-) > 0. From the answer contexts collected as above, we construct pairs (f+i,f-i) of positive and negative answer contexts, indexed by i, so that the constraints on b look like 数据挖掘论坛
 
b·(f+i - f-i)
³
0       for all i.
(2)
Herbrich et al. [17] and Joachims [19] suggested elegant max-margin solutions to ordering problems that we generically call RankSVM:
 
min
b,s ³ 0 
  1

2
b¢b + C
å
i 
si  s.t. for all i,    b ·xi + si ³ 1
(3)
where xi is shorthand for the difference vector f+i-f-i and no offset parameter b0 is needed. As with support vector classifiers, C is a tuned parameter that trades off the model complexity ||b|| against violations of the ordering requirements.
From our application perspective, 169662 passage contexts and millions of difference vectors x is just a beginning; the data is still sparse and any additional data helps accuracy. However, RankSVM executed millions of iterations with hundreds of millions of kernel evaluations, and, for some folds, failed to terminate in a day on a 3GHz CPU.

3.3  Learning decay with exponential loss

In equation (3), xi is assigned penalty Cmax{0,1-b·xi}, which can be bounded above by Cexp(-b·xi), giving us the unconstrained optimization that we call RankExp:

数据挖掘研究院


 
min
b 
  1

2
b¢b + C
å 数据挖掘研究院
i 
exp(-b·xi)
(4)
which may be potentially less accurate than a hinge-loss formulation, but allows us to use simpler optimizers such as L-BFGS [22]. This lets RankExp scale much better with increasing training set size, as shown in Figure 4. On identical data sets, for C Î [0.01,0.3] the fraction of orderings satisfied by RankSVM and RankExp, as well as the MRRs were typically within 3% of each other, while RankExp took 14-40 iterations or 10-20 minutes to train and RankSVM took between 2 and 24 hours.
fig/fig-svm-exp-cpu.png
Figure 4: Relative CPU times needed by RankSVM and RankExp as a function of the number of ordering constraints.

3.4  Typical parameters and roughness

Figure 5 shows a typical b vector, where bj gives the relative importance of a selector match at gap j (bjs can be shifted or scaled without changing the ranking). We did not expect the clearly non-monotonic behavior near j=0, and only in hindsight found that this is a property of language (perhaps already appreciated by linguists): selectors are often named entities, and are often connected to the answer token via prepositions and articles that creates a gap. This goes against conventional wisdom that spreading activation should monotonically decay with distance. We believe that a more detailed study of proximity functions in IR, taking linguistic effects into account, is overdue.
fig/fig-score-param.png
Figure 5: bj shows a noisy unimodal pattern.
In spite of the prominent non-monotonicity near j=1¼5, there is much noise at larger gap. To bias the learner to greater smoothness, we can pin a phantom bW+1=0, and penalize deviation of bj from bj+1:

数据挖掘研究院


 
min
b 
  W
å
j=1 
(bj - bj+1)2 + C 数据挖掘论坛
å
i 
exp(-b·xi),
(5)
where C is set by cross-validation. Figure 5 shows the smooth b, which also improves cross-validation accuracy slightly. It also gives us confidence in the non-monotonic nature of the best decay function.

3.5  Accuracy of learnt score function

In a standard IR system [31], the score of a snippet would be decided by a vector space model using selectors alone. We gave the standard score the additional benefit of considering only those snippets centered at an atype candidate, and considering each matched selector only once (use only IDF and not TF). Even so, a basic IR scoring approach was significantly worse than the result of plugging in the scoring function learnt by RankExp, as shown in Figure 6. Both recall and MRR (see Section 2.4) over held-out test data improve substantially.
b from Train Test R300 MRR
IR-IDF - 2000 211 0.16
RankExp 1999 2000 231 0.27
RankExp 2000 2000 235 0.31
RankExp 2001 2000 235 0.29
Figure 6: End-to-end accuracy using RankExp b is significantly better than IR-style ranking. Train and test years are from 1999, 2000, 2001. R300 is recall at k=300 out of 261 test questions. C=0.1, C=1 and C=10 gave almost identical results.

4  Workload-tuned atype index

As we saw in Figure 2, for a 5.72 GB corpus, the stem index occupied only 0.91 GB while the full atype index took 4.30 GB. In this section we explore how to exploit the skew in the distribution of query atypes to achieve a graceful trade-off between index space and query performance.

4.1  Characterizing a skewed workload

Figure 7 shows a sample from TREC query logs. There is much skew, but, as is generally appreciated in the Web search community, the distribution is heavy-tailed and the skew may be somewhat misleading.
100 integer#n#1
78 location#n#1
77 person#n#1
20 city#n
[数据挖掘专家] [数据挖掘研究院] [数据挖掘论坛] [数据挖掘实验室]
上一篇:Time-Dependent Semantic Similarity Measure of Queries Using
下一篇:北大天网用户数据(九、十月份)分析
最新评论共有 0 位网友发表了评论 , 查看所有评论
发表评论( 不能超过250字,需审核,请自觉遵守互联网相关政策法规。 )
匿名?
数据挖掘网站导航 数据挖掘论坛导航
  • 数据挖掘工具
  • 数据挖掘论坛
  • DataCruncher - Cognos
  • MineSet - MathSoft
  • Intelligent Miner - GainSmarts
  • Sqlserver - SAS - Clementine
  • CART - Weka - WizSoft
  • NeuroShell - ModelQuest
  • data mining tools - Darwin
  • 数据挖掘交友
  • 数据挖掘博客
  • 数据挖掘工具
  • 数据挖掘资源
  • 数据挖掘技术算法
  • 数据挖掘相关期刊、会议
  • 研究院联盟合作专区
  • 数据挖掘基础与相关技术
  • 数据挖掘厂商与就业
  • 数据挖掘研究者乐园
  • 知名厂商数据挖掘工具资料
  • 国内数据挖掘实验室
  • Foreign Data Mining Lab
  • 热点关注
  • Web数据挖掘的研究现状及发展
  • Web数据挖掘技术综述
  • 百度申请精确广告专利 欲抑制Google步伐
  • Web数据自动采集及其应用研究
  • 信息安全中的数据挖掘
  • 面向Web的数据挖掘
  • Extended Log File Format
  • 基于XML的Web数据挖掘在数字图书馆中的应用
  • XML与Web数据挖掘
  • Web数据挖掘
  • 论坛最新话题
  • Foundations of Statistical Natural Langu
  • Game Theory meet Data Mining: A Recent P
  • System Building: How does it help or hin
  • 数据挖掘与Clementine培训
  • 新手报到
  • 求 SASEM 客户流失预测分析
  • 数据挖掘工程师/搜索研究院—北京——无线
  • 数据挖掘入门介绍(如何着手数据挖掘)
  • Information Overload Survey Results
  • The INEX 2005 Workshop on Element Retrie
  • 相关资讯
  • Any Extract (AE) 网站在线编辑
  • 信息安全中的数据挖掘
  • 基于XML的Web数据挖掘在数字图书馆中的应用
  • Web数据挖掘技术综述
  • Web数据挖掘
  • 北大计算机所万小军博士接连在国际一流学术
  • Refereed Papers on WWW2007
  • WWW2007 tutorials
  • WWW2007 workshops
  • Why ’08 Matters for the Web
  • 数据挖掘实验室资料
  • 数据挖掘博客地址
  • 数据挖掘实验室网站地址
  • Prepare for Medicare audits by using dat
  • 注册成为SAS用户与爱好者俱乐部会员
  • 水南梅
  • 明日烟
  • 新人报道
  • 下载
  • 厦门服务器托管,450元/月—0592-5177319 高
  • 买空间送域名--0592-5177319 高静