RSS
热门关键字:  数据挖掘  数据仓库  商业智能  人工智能  搜索引擎

Oracle: Separating Numbers and Letters

来源: 作者:unkonwn 时间:2004-11-21 点击:

The time has come to make account "numbers" in your customer database real numbers or real alphanumeric strings across the board. Why would you ever choose to do something like that, especially if you are dealing with millions of records? One reason is that you may be forced into doing so because your company was acquired, and the database integration between your system and your new parent company requires such an undertaking. Another reason is based on the rampant use of Social Security Numbers (SSN) as a means of identification. Chances are you have online accounts, which use all or part of your SSN as the account number, which, for what it′s worth, SSNs were never intended to be used for anything other than taxation and tracking Social Security benefits. Many colleges, as an example, have moved away from using the SSN as the primary means of identification for students.

数据挖掘研究院


Another reason for performing a conversion may be that you have exhausted the original pool of possible account numbers or that your random alphanumeric string generation scheme has been generating duplicates. So whatever the reason, how would you go about salvaging actual numeric strings and identifying those which need to be converted? Here is an example to help clarify the situation.


Suppose you have the following account numbers: A12345, 009431, and 783652. By quick inspection, A12345 obviously needs to be converted because of the letter "A." The third account number is a "real" number, so nothing needs to be done with it. The second account number is a bit tricky. The characters, so to speak, are all numeric, but is 009431 a number you really want to keep given that most systems ignore leading zeroes? Moreover, if the new system has account numbers all the same length (six in this scenario), then 9431 fails to meet the minimum length. The end result is that the salvageable account numbers, which are truly numeric, must also be tested for length (or a minimum value). 数据挖掘研究院


What schemes are at your disposal to identify strings one way or the other? Math, translation, date functions, and regular expressions are but four ways to approach and solve this problem.

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?