Modelling Language

Post by **phonedata** » Mon Dec 23, 2024 5:26 am

The problem with scoring individual words is that we are missing all the subtlety of language. Scoring “bad” as -8, for example, might make sense overall but there is a big difference between “exceptionally bad”, “plain bad” and “really not bad at all”.

The model can therefore be improved by recognising “qualifier” words such as “very”, “rather”, and “not” which amplify, reduce or even negate the scoring impact of the keyword.

The FastStats text model allows for a pre-scored table of qualifier words. Of course we can’t hope to model all the complexity of language, and we will certainly be defeated by sarcasm. However, within a limited subject area where reasonably simple language is the norm, it is possible to get a reasonable level of automation.
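
To make the qualifier idea concrete, here is a minimal Python sketch. It is not how FastStats implements it; the tables, the multiplier values and the `score_text` function are all made up for illustration (only the -8 for “bad” comes from the example above). It assumes qualifiers are multipliers that apply to the next keyword, and that any other word in between cancels them.

```python
import re

# Hypothetical keyword scores; only "bad" = -8 comes from the post.
KEYWORD_SCORES = {"bad": -8.0, "good": 6.0}

# Hypothetical qualifier multipliers: >1 amplifies, between 0 and 1
# reduces, and a negative value negates the keyword's score.
QUALIFIER_MULTIPLIERS = {
    "exceptionally": 2.0,
    "very": 1.5,
    "really": 1.5,
    "rather": 0.75,
    "not": -0.5,
}


def score_text(text):
    """Sum keyword scores, scaled by the qualifiers directly before each keyword."""
    total = 0.0
    multiplier = 1.0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in QUALIFIER_MULTIPLIERS:
            # Stack consecutive qualifiers, e.g. "really not".
            multiplier *= QUALIFIER_MULTIPLIERS[word]
        elif word in KEYWORD_SCORES:
            total += KEYWORD_SCORES[word] * multiplier
            multiplier = 1.0
        else:
            # An unrelated word breaks the link between qualifier and keyword.
            multiplier = 1.0
    return total


for phrase in ("exceptionally bad", "plain bad", "really not bad at all"):
    print(phrase, "->", score_text(phrase))
```

With these made-up values the three phrases from the first paragraph come out clearly different, and “really not bad at all” ends up mildly positive. Sarcasm would still fool it, as noted above.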

5. Subject Sentiment
A similar word scoring strategy can be used where you are interested in the reviewer’s sentiment around a particular subject. For example, if your company produces bicycles and your latest model has changed from an aluminium to a steel frame, you might want to know whether the adjectives and qualifier words near “frame” express positive or negative sentiment.
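
A rough sketch of the “words near the subject” idea in Python follows. The scores, the window size and the `subject_sentiment` function are assumptions for illustration, not anything taken from FastStats. It simply sums the scores of known words that fall within a few words of the subject.

```python
import re

# Hypothetical sentiment scores for a bicycle review domain.
SENTIMENT_SCORES = {"sturdy": 5.0, "great": 6.0, "heavy": -4.0, "bad": -8.0}


def subject_sentiment(text, subject, window=3):
    """Sum the scores of words within `window` words of each mention of `subject`."""
    words = re.findall(r"[a-z']+", text.lower())
    nearby = set()
    for i, word in enumerate(words):
        if word == subject:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            # Collect positions in a set so overlapping windows
            # do not count the same word twice.
            nearby.update(range(start, end))
    return sum(SENTIMENT_SCORES.get(words[j], 0.0) for j in nearby)


review = "The new steel frame feels heavy but the brakes are great"
print(subject_sentiment(review, "frame", window=2))
```

Here “heavy” counts against the frame, while “great” is too far away and is left for the brakes. A fixed window is crude; splitting on sentences or clauses first would probably give cleaner results.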