Sentiment analysis is “the process of identifying and categorizing opinions expressed in the text to determine the user’s attitude towards the subject of the document”. Graded levels of sentiment are also used to measure the intensity of the writer’s opinion.
A typical NLP pipeline analyzes many distinct textual structures to get a quick read on the writer’s attitude. Among them, ‘sentiment words’ are the words that convey a positive or negative meaning.
Sentiment analysis involves complex components, including named-entity recognition, anaphora resolution, parsing, sarcasm detection, and many others. Several data pre-processing and classification techniques have been researched for sentiment analysis.
The Sentiment Detection Paradox
Sentiment analysis has traditionally been carried out with a surprisingly simple design. Such an approach struggles with complex language structures; it cannot cope when contextual information is needed to understand a phrase clearly (for example, when the writer is using sarcasm or irony and is actually expressing the opposite of the text’s “meaning” as understood by a machine).
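To make the failure concrete, here is a minimal sketch of one such simple design: a scorer that only counts sentiment words. The article does not say which design it has in mind, so the word lists and the function name `lexicon_score` are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical toy lexicon; real systems use far larger word lists.
POSITIVE = {"great", "love", "wonderful", "good", "best"}
NEGATIVE = {"terrible", "hate", "awful", "bad", "worst"}

def lexicon_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(t in POSITIVE for t in tokens)
    negatives = sum(t in NEGATIVE for t in tokens)
    return positives - negatives

literal = "I love this phone, the camera is great."
sarcastic = "Great, my flight is delayed again. I just love waiting."

print(lexicon_score(literal))    # 2
print(lexicon_score(sarcastic))  # 2
```

Both sentences receive the same positive score, although the second one expresses frustration. A word counter has no access to the context that reverses the meaning.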
Humans can understand spoken sarcasm because they look for context: gestures like rolling eyes or a certain vocal inflection indicate that the meaning differs from what the actual words convey. These clues are not present in textual data, which makes written sarcasm difficult for both machines and humans to process accurately.
Recent advances in natural language processing have drawn attention because they can measure the negative or positive sentiment of a word or phrase. Still, their results are tainted by sarcastic text, which constitutes “noisy data” and needs to be filtered out for accuracy and robustness. Sarcasm detection is therefore essential when preparing training data, which can then be used for better natural language sentence generation.
Text Cannot Suffice for All Classification Tasks
- Some textual features hint at sarcasm, such as punctuation marks that denote sarcasm or irony, e.g. the irony mark (a reversed question mark).
- Paradoxical phrases or oxymorons are hard to classify, e.g. “You look good when your eyes are closed, but you look the best when my eyes are closed” or “the silent music was mesmerizing”.
- Hyperbole (exaggeration) and other figures of speech also present problems, e.g. “The old man said that he could sleep for a hundred years”.
Analyzing the context that precedes a passage sometimes helps. Applying anomaly detection to the sentiments in a text corpus may detect sarcasm with some reliability.
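One way to read the idea of sentiment anomalies is to compare a sentence's polarity with the polarity of the text that precedes it. The sketch below assumes a toy lexicon, a hypothetical helper `is_sentiment_anomaly`, and an arbitrary gap threshold; none of these come from the article.

```python
import re
from statistics import mean

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "love", "wonderful", "perfect"}
NEGATIVE = {"broken", "late", "refund", "worst", "crashed", "lost"}

def score(text: str) -> int:
    """Count positive words minus negative words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def is_sentiment_anomaly(context, target, min_gap=2.0):
    """Flag `target` when its polarity is opposite to the average polarity
    of the preceding sentences and the gap is at least `min_gap`."""
    if not context:
        return False
    context_score = mean(score(s) for s in context)
    target_score = score(target)
    opposite_sign = context_score * target_score < 0
    return opposite_sign and abs(target_score - context_score) >= min_gap

context = [
    "The package arrived late and the screen was broken.",
    "Support lost my ticket and the app crashed twice.",
]

print(is_sentiment_anomaly(context, "Great service, just perfect."))  # True
print(is_sentiment_anomaly(context, "The refund was late too."))      # False
```

A flagged sentence is only a candidate for sarcasm. A sudden polarity flip can also be a genuine change of opinion, which is one reason this kind of check is only partly reliable.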
But such machine learning algorithms still cannot detect sarcasm in text accurately, primarily for the following reasons:
- There are numerous hurdles in interpreting sarcasm, let alone detecting it. Sarcasm classification is difficult for both humans and machine learning methods.
- Sarcasm is expressed through means that differ from other forms of verbal expression and that cannot be directly translated into written text. This limits the training sets available for machine learning models, and models trained on such limited data tend to overfit.
Researchers have applied several data pre-processing techniques, evaluated a range of classifiers and reported their results, and carried out comparative studies to find which classifiers perform best. Techniques like deep learning, tree-based algorithms, and ensemble classifiers have produced better results. These findings may be correct, but even so, they do not provide a robust solution for detecting sarcasm.
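As a rough illustration of the ensemble idea, the sketch below combines three weak, rule-based cue detectors by majority vote. The detectors, word lists, and function names are all hypothetical; published ensembles combine trained classifiers rather than hand-written rules like these.

```python
import re

# Hypothetical word lists, for illustration only.
HYPERBOLE = {"hundred", "thousand", "million", "forever", "always", "never"}
POSITIVE = {"love", "great", "wonderful", "best"}
NEGATIVE = {"stuck", "delayed", "broken", "worst"}

def tokens_of(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def punctuation_cue(text: str) -> bool:
    # Irony mark (U+2E2E) or exaggerated punctuation such as "!!" or "?!".
    return bool(re.search(r"\u2E2E|!{2,}|\?!", text))

def hyperbole_cue(text: str) -> bool:
    return bool(tokens_of(text) & HYPERBOLE)

def mixed_polarity_cue(text: str) -> bool:
    # Positive and negative words appearing in the same sentence.
    words = tokens_of(text)
    return bool(words & POSITIVE) and bool(words & NEGATIVE)

DETECTORS = [punctuation_cue, hyperbole_cue, mixed_polarity_cue]

def majority_vote(text: str, detectors=DETECTORS) -> bool:
    """Flag the text when more than half of the detectors fire."""
    votes = sum(d(text) for d in detectors)
    return votes * 2 > len(detectors)

print(majority_vote("I always love being stuck in traffic for a million hours!!"))  # True
print(majority_vote("The old man said that he could sleep for a hundred years."))   # False
```

Voting keeps a single cue, such as the hyperbole in the second sentence, from triggering a flag on its own. It does not supply the missing context, so the caveat above about robustness still applies.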
Technical articles are published from the Absolutdata Labs group, and hail from The Absolutdata Data Science Center of Excellence. These articles also appear in BrainWave, Absolutdata’s quarterly data science digest.