With the rapid development of the internet, more and more kinds of online information have become available. These information resources contain an abundance of subjective comments and views: for example, comments on electronic products, cars, and movies, or reviews of events and policies. It is valuable to determine whether comments are positive or negative, whether reviews are supportive or opposed, and whether a product is worth recommending. These requirements gave rise to the research topic of sentiment analysis. In some of the literature, sentiment analysis is also referred to as opinion mining.
Today, sentiment analysis is deployed extensively in internet settings because it can, to some extent, help manage large volumes of information and locate information of interest. In particular, sentiment analysis is very helpful for analyzing consumers' feedback and consumption tendencies. One example is recommender systems, where sentiment analysis helps to automatically classify online feedback on products or services and to select recommendable ones for consumers. Large social networking companies such as Facebook, Twitter, and Google are trying to implement such analysis systems.
We implemented two algorithms for this purpose.
1. Feature Highlighting
Here the document 'd' is considered as a set of features with corresponding weights 'w'. In this algorithm the sentiment is classified based on sentiment words such as "good", "bad", and "happy". The polarity is determined by the sentiment words that have the highest count.
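The counting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the two word lists are tiny hypothetical lexicons, the weight 'w' of a feature is taken to be its raw count, and the function names (`feature_weights`, `classify_by_highlighting`) are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicons (assumption for illustration only).
POSITIVE_WORDS = {"good", "great", "happy", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "sad", "terrible", "hate"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_weights(document):
    """Return {sentiment word: count} for document d; the count plays the role of weight w."""
    counts = Counter(tokenize(document))
    return {
        word: count
        for word, count in counts.items()
        if word in POSITIVE_WORDS or word in NEGATIVE_WORDS
    }


def classify_by_highlighting(document):
    """Pick the polarity whose sentiment words occur most often in the document."""
    weights = feature_weights(document)
    positive = sum(c for w, c in weights.items() if w in POSITIVE_WORDS)
    negative = sum(c for w, c in weights.items() if w in NEGATIVE_WORDS)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # tie or no sentiment words found (an assumption of this sketch)


print(classify_by_highlighting("A good movie with a good cast, but a bad ending"))
```

Each document is scanned once and each token is checked against the lexicon, so the work per document stays small and does not depend on any training data.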
2. Feature Bagging
Here we are presented with bags of words. We analyze the patterns in them to determine the polarity. The classifiers are trained on random examples and then combined into a final classifier that determines the sentiment. The larger the set of examples, the more efficient the algorithm becomes. This algorithm provides efficiency but lacks in performance. It is also called a machine learning technique, as we provide it with a training data set from which it learns the patterns.
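One way to read the description above is as bootstrap aggregation (bagging) over bag-of-words features: several simple classifiers are each trained on a random sample of the labeled examples, and their votes are combined. The sketch below follows that reading and is not the code used in this work. The base learner (a per-word score), the number of classifiers, the seed, and all names such as `train_bagged` and `predict_bagged` are assumptions chosen to keep the example small.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into a bag of word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_base(examples):
    """Train one weak classifier: +1 per positive document containing a word, -1 per negative one."""
    scores = Counter()
    for text, label in examples:
        sign = 1 if label == "positive" else -1
        for word in set(tokenize(text)):
            scores[word] += sign
    return scores


def predict_base(scores, text):
    """Sum the learned word scores; unseen words contribute 0, ties go to positive."""
    total = sum(scores[word] for word in tokenize(text))
    return "positive" if total >= 0 else "negative"


def train_bagged(examples, n_classifiers=15, seed=0):
    """Train each classifier on a bootstrap sample (random examples drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_classifiers):
        sample = [rng.choice(examples) for _ in examples]
        models.append(train_base(sample))
    return models


def predict_bagged(models, text):
    """Combine the individual classifiers into a final decision by majority vote."""
    votes = Counter(predict_base(model, text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("good happy wonderful", "positive"),
    ("good fun", "positive"),
    ("happy great", "positive"),
    ("bad awful", "negative"),
    ("bad boring", "negative"),
    ("sad awful", "negative"),
]

models = train_bagged(TRAIN)
print(predict_bagged(models, "a good and happy story"))
```

Every prediction consults all trained classifiers, and training repeats over every sampled example, which is consistent with this approach costing more comparisons than simple word counting.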
Feature Highlighting gave better performance than Feature Bagging, which is expected as it requires fewer comparisons. We tested Feature Highlighting with data sets of sizes 200, 400, 600, 800, and 1000, and it performed better; Feature Bagging, on the other hand, consumed more resources and was comparatively slower. We tested it with data sets of sizes 100, 200, 300, 400, and 500. The larger number of comparisons used more CPU resources.
The speed depends on the number of iterations made. Since Feature Highlighting is simpler, it requires fewer comparisons than Feature Bagging, which is more complex. The performance of Feature Highlighting is expected to remain constant, whereas the response time of Feature Bagging will grow as the example set is enlarged. Thus, Feature Highlighting is more efficient for larger amounts of data; for smaller amounts of data, that is, when checking a small number of posts at once, the performance difference is negligible.
Feature Highlighting yielded a result of around 85% on the IMDB test data set, while Feature Bagging yielded a result of 81%, which is expected to increase as the training data set grows.
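A response-time comparison of this kind can be reproduced with a small timing harness. The sketch below is a hypothetical setup, not the measurement code used here: `time_classifier` is an invented helper, `dummy_classify` is a stand-in for either algorithm, and the posts are synthetic. The batch sizes are the ones reported for Feature Highlighting.

```python
import time


def time_classifier(classify, posts, batch_sizes):
    """Return {batch size: elapsed seconds} for classifying the first `size` posts."""
    timings = {}
    for size in batch_sizes:
        batch = posts[:size]
        start = time.perf_counter()
        for post in batch:
            classify(post)
        timings[size] = time.perf_counter() - start
    return timings


def dummy_classify(post):
    """Placeholder classifier (assumption); swap in the algorithm being measured."""
    return "positive" if post.count("good") >= post.count("bad") else "negative"


# Synthetic posts, used only to make the sketch runnable.
posts = ["good movie but bad ending"] * 1000

timings = time_classifier(dummy_classify, posts, [200, 400, 600, 800, 1000])
for size, seconds in timings.items():
    print(f"{size} posts: {seconds:.6f} s")
```

Running the same harness with each classifier on identical posts gives directly comparable numbers; repeating each measurement several times and averaging would reduce noise.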