conspire, catastrophe, and cowardly are negative.
It is dangerous, however, to judge sentiment only by the presence of valence words. Throw in a negator such as not or never and you flip the valence. Intensifiers – for instance, very and most – indicate the strength of the sentiment expressed. Modal operators such as might, could, and should distinguish hypothetical from real situations and weaken intensity, as in Polanyi and Zaenen's example sentence “If Mary were a terrible person, she would be mean to her dogs.” Other, “presuppositional” terms such as barely and even similarly relate what the speaker or writer observes to his or her expectations. They can also help us distinguish subjective statements from objective ones.
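The effect of negators, intensifiers, and modal operators on a valence score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny lexicons, the two-word look-back window, the 1.5 intensifier weight, and the 0.5 damping factor for modal sentences are all hypothetical choices, not values taken from Polanyi and Zaenen or from any product.

```python
import re

# Hypothetical toy lexicons; a real system would use far larger, curated lists.
VALENCE = {"conspire": -1.0, "catastrophe": -1.0, "cowardly": -1.0,
           "terrible": -1.0, "mean": -1.0, "good": 1.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "most": 1.5}   # assumed weights
MODALS = {"might", "could", "should", "would"}


def score(sentence):
    """Sum the valence of known words, adjusted by nearby shifters."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Assumption: a modal anywhere in the sentence marks it as hypothetical
    # and halves the strength of every valence word in it.
    damp = 0.5 if any(t in MODALS for t in tokens) else 1.0
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in VALENCE:
            continue
        value = VALENCE[tok]
        # Look back at up to two preceding tokens for negators and intensifiers.
        for prev in tokens[max(0, i - 2):i]:
            if prev in NEGATORS:
                value = -value          # a negator flips the valence
            elif prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]  # an intensifier scales it
        total += value * damp
    return total


for s in ["She is good", "She is not good", "She is not very good",
          "If Mary were a terrible person, she would be mean to her dogs."]:
    print(s, "->", score(s))
```

A fixed look-back window is the simplest way to attach a shifter to the word it modifies; it will misfire on longer constructions, which is one reason the deeper linguistic analysis discussed here is needed.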
We can start with a lexicon of all these expressive words, and perhaps we'd even build it up and refine it via some form of machine-learning process that starts from a manually annotated training set. A deeper linguistic analysis, based on word-scale to document-scale analysis of text, brings us a long way toward our goal of inferring meaning.
Other information-extraction approaches are more quantitative. They analyze text using Bayesian statistical models for pattern matching that discern relationships among disparate pieces of information – the meaning of texts and the entities they contain – via “interaction analysis.” Autonomy is a proponent of this technique, which its etalk subsidiary applies, for instance, in mining recorded call-center audio. When dealing with spoken language, it is possible to add attributes such as voice volume and pitch, which suggest emotion and emotional intensity, to the mix. Sequence is also important (just as it is for life-sciences researchers who apply text analytics to study protein interactions), providing additional context that supports sentiment analysis.
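To give a rough sense of what a Bayesian statistical approach to text looks like, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is a stand-in chosen for illustration only: the text does not describe the models any vendor actually uses, and the four training snippets, the label names, and the function names train and predict are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in label_counts.items():
        logp = math.log(n / n_docs)  # log prior P(label)
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed likelihood P(word | label).
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical, hand-made training snippets in the spirit of call-center text.
TRAIN = [
    ("the agent was helpful and friendly", "positive"),
    ("great service thank you", "positive"),
    ("this is a catastrophe", "negative"),
    ("terrible service and rude agent", "negative"),
]
MODEL = train(TRAIN)
print(predict(MODEL, "rude and terrible"))
```

A bag-of-words model like this ignores word order entirely, so it cannot capture the sequence information mentioned above; attributes such as voice volume or pitch would have to be added as further features.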