Statistically rooted approaches have strengths, such as the ability to work with text in arbitrary human languages, but they also carry risks. According to Vadim Berman of Digital Sonata, developer of the Carabao language kit, problems arise, for instance, when linguistic rules are applied to texts whose characteristics don't match those of the training sets used to generate the rules. Faisal Mushtaq, CTO of media and market intelligence solution provider Biz360, explains, “No single technology or technique works the best. Automated analysis of unstructured text poses unique technology challenges requiring an interdisciplinary approach to text analysis. A good solution is a combination of the 'right' technologies to solve a real/immediate customer problem.”
In addition to the statistical vs. linguistic approaches, two hybrids are worth considering:
- Take into account fielded (usually numeric) information to improve sentiment-analysis accuracy. For instance, the star ratings attached to Internet Movie Database (IMDb) reviews hint at polarity. An Alvin and the Chipmunks reviewer (I refused to take my kids to that one myself) gave the movie 8 stars out of 10, so the sentiments expressed in the text are likely to be generally positive and held with moderate conviction. It's no surprise that a 5/10 review carries the title "A huge disappointment for fans of this memorable series" while a 10/10 review is paired with "I just LOVED IT!" Similarly, a hotel guest who chose a Fair rating in a satisfaction survey is likely to have posted more complaints than praise in the survey's free-text fields.
- Try two analysis passes: the first applies automated classification and extraction tools, and the second is a manual pass to confirm, correct, and augment the automated results. The manual pass can be an ongoing arrangement, or it can be part of a human-assisted machine-learning approach in which manual intervention tails off as accuracy improves.
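The first hybrid above, combining a fielded rating with the text, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any vendor quoted here: the toy lexicon `LEXICON`, the helper names `text_polarity`, `rating_polarity`, and `blended_polarity`, the linear mapping of a 1-10 star scale onto [-1, 1], and the default 50/50 blending weight are all hypothetical choices.

```python
"""Blend a text-derived polarity score with a fielded star rating.

Hypothetical sketch: the tiny lexicon, the 1-10 rating scale, and the
blending weight are illustrative assumptions, not a published method.
"""

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
LEXICON = {
    "loved": 1.0, "love": 1.0, "great": 0.8, "memorable": 0.5,
    "disappointment": -0.8, "boring": -0.6, "awful": -1.0, "complaint": -0.5,
}


def text_polarity(text: str) -> float:
    """Average polarity of lexicon words found in text; 0.0 if none match."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def rating_polarity(stars: float, max_stars: float = 10.0) -> float:
    """Map a rating in 1..max_stars linearly onto [-1, 1]."""
    return 2.0 * (stars - 1.0) / (max_stars - 1.0) - 1.0


def blended_polarity(text: str, stars: float, weight: float = 0.5,
                     max_stars: float = 10.0) -> float:
    """Weighted mix of text and rating polarity; `weight` applies to the rating."""
    return ((1.0 - weight) * text_polarity(text)
            + weight * rating_polarity(stars, max_stars))


if __name__ == "__main__":
    print(blended_polarity("I just LOVED IT!", 10))
    print(blended_polarity(
        "A huge disappointment for fans of this memorable series", 5))
```

Here the 10/10 "I just LOVED IT!" review scores 1.0, since text and rating agree, while the 5/10 "huge disappointment" title comes out mildly negative even though the lexicon also matches the positive word "memorable." In practice the weight would be tuned against labeled data rather than fixed at 0.5.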
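The second hybrid, an automated pass followed by a manual one, can be sketched as a confidence-threshold routing loop. Everything here is hypothetical and for illustration only: `two_pass`, the 0.8 threshold, the keyword-based `toy_classify` (a stand-in for a real classification tool), and `stub_review` (which stands in for a human reviewer by looking answers up in a dict) are assumptions, not any product's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Result:
    text: str
    label: str
    confidence: float
    reviewed: bool = False  # True if the item went through manual review


def two_pass(
    texts: List[str],
    classify: Callable[[str], Tuple[str, float]],
    review: Callable[[str, str], str],
    threshold: float = 0.8,  # hypothetical cutoff for routing to review
) -> Tuple[List[Result], List[Tuple[str, str]]]:
    """Pass 1: automated labels. Pass 2: manual review of low-confidence items.

    Returns all results plus (text, corrected_label) pairs that could be
    fed back as training data in a human-assisted learning loop.
    """
    results: List[Result] = []
    corrections: List[Tuple[str, str]] = []
    for text in texts:
        label, confidence = classify(text)  # first, automated pass
        result = Result(text, label, confidence)
        if confidence < threshold:  # second, manual pass only where unsure
            final = review(text, label)
            result.reviewed = True
            if final != label:
                corrections.append((text, final))
            result.label = final
        results.append(result)
    return results, corrections


# Hypothetical keyword classifier standing in for a real automated tool.
def toy_classify(text: str) -> Tuple[str, float]:
    lowered = text.lower()
    if "loved" in lowered:
        return "positive", 0.95
    if "disappointment" in lowered:
        return "negative", 0.90
    return "neutral", 0.40


# Stub for a human reviewer: returns a known answer or confirms the label.
GOLD = {"The pool was fine but the staff ignored us.": "negative"}


def stub_review(text: str, label: str) -> str:
    return GOLD.get(text, label)


texts = [
    "I just LOVED IT!",
    "A huge disappointment for fans of this memorable series",
    "The pool was fine but the staff ignored us.",
]
results, corrections = two_pass(texts, toy_classify, stub_review)
for r in results:
    print(r.label, r.confidence, "reviewed" if r.reviewed else "auto")
```

Only low-confidence items reach the reviewer, and the returned `corrections` list is what a human-assisted learning loop would feed back into retraining; as the classifier improves, fewer items fall below the threshold and manual effort tails off. Skipping the retraining step gives the ongoing-arrangement variant.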