
We are well aware that computers can easily process numbers if programmed well. 🧑🏻‍💻 However, a large portion of the information we have is in the form of text. 📗 We communicate with each other by talking directly or through text messages, social media posts, phone calls, video calls, etc. To create intelligent systems, we need to make use of this information that we have in abundance.

Natural Language Processing (NLP) is the branch of Artificial Intelligence that allows machines to interpret human language. 👍🏼 However, raw text cannot be used directly by a machine; we need to pre-process it first. Text pre-processing is the process of preparing text data so that machines can use it to perform tasks like analysis, prediction, etc. There are many different steps in text pre-processing, but in this article we will only get familiar with stop words: why we remove them and the different libraries that can be used to remove them.

The words that are generally filtered out before processing natural language are called stop words. They are the most common words in a language (articles, prepositions, pronouns, conjunctions, etc.) and do not add much information to the text. A few examples of English stop words are “the”, “a”, “an”, “so”, and “what”. Stop words occur in abundance in every human language.

By removing these words, we strip low-level information from our text and give more focus to the important information. In other words, for many tasks, removing such words has no noticeable negative effect on the model we train. 🙋‍♀️ Removal of stop words also definitely reduces the dataset size and thus the training time, since fewer tokens are involved in training.

But are stop words always useless for us? Do we always remove them? The removal of stop words is highly dependent on the task we are performing and the goal we want to achieve. For example, if we are training a model for sentiment analysis, we might choose not to remove stop words:

Movie review: “The movie was not good at all.”

Text after removal of stop words: “movie good”

We can clearly see that the review for the movie was negative, but once the stop words (including “not”) are removed, the remaining text “movie good” suggests the opposite.
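The sentiment-analysis pitfall with the movie review can be reproduced with a minimal Python sketch. It assumes a small, hand-picked stop-word set chosen purely for illustration (it is not NLTK's or spaCy's list, although, like many real stop-word lists, it contains “not”); `tokenize`, `remove_stop_words`, `STOP_WORDS`, and `NEGATIONS` are hypothetical names. Plain removal turns the review into “movie good”, while a task-aware variant keeps negations.

```python
import re

# Hypothetical, illustrative stop-word set (an assumption for this sketch,
# not any library's list). Like many real lists, it includes "not".
STOP_WORDS = {"the", "a", "an", "so", "what", "was", "is", "at", "all",
              "not", "of", "to", "and"}

# Hypothetical set of negations that carry sentiment and may be worth keeping.
NEGATIONS = {"not", "no", "nor", "never"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Return the tokens of `text` that are not in `stop_words`."""
    return [tok for tok in tokenize(text) if tok not in stop_words]


review = "The movie was not good at all."

# Plain removal drops "not", so the remaining tokens look positive.
print(remove_stop_words(review))

# Task-aware removal for sentiment analysis: keep negations.
print(remove_stop_words(review, STOP_WORDS - NEGATIONS))
```

The first call yields `['movie', 'good']` and the second `['movie', 'not', 'good']`. Keeping negations is one simple compromise: the review still shrinks from seven tokens to three, but the word that flips the sentiment survives.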
