Salman Hameed
Data Science | Machine Learning | Deep Learning | Generative AI
I built a machine learning project called "Real-Time Abusive Language Detection with Machine Learning" from scratch, and I am excited to share how it works and what it achieved.
To develop this solution, I invested significant time in curating a custom dataset of real-world examples of abusive language. Because data quality matters, I pre-processed the dataset thoroughly: normalizing the text, removing stop words, handling missing values, and filtering out noise and outliers. I also removed duplicate entries to improve the model's accuracy.
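The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `STOP_WORDS` set, the `normalize` and `clean_dataset` helpers, and the `(text, label)` row format are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and"}


def normalize(text):
    """Lowercase the text, keep only word-like tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def clean_dataset(rows):
    """Clean an iterable of (text, label) pairs.

    Skips rows with missing values, rows that are empty after
    normalization, and rows whose normalized text was already seen.
    """
    seen = set()
    cleaned = []
    for text, label in rows:
        if text is None or label is None:
            continue  # missing value
        norm = normalize(text)
        if not norm or norm in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add(norm)
        cleaned.append((norm, label))
    return cleaned
```

Calling `clean_dataset` on a handful of raw rows returns only the unique, non-empty, normalized examples, ready for feature extraction.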
Next, I applied feature extraction methods, namely Term Frequency-Inverse Document Frequency (TF-IDF) and N-Grams, to capture the distinctive linguistic patterns of abusive language. This allowed the model to learn and generalize from the text inputs effectively.
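To make the idea concrete, here is a small sketch of TF-IDF weighting over unigrams and bigrams. It assumes the classic unsmoothed formulation, tf(t, d) * ln(N / df(t)); the project may well have used a library implementation with different smoothing or normalization, and the `ngrams` and `tfidf` function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max, joined by spaces."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs, n_max=2):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    term_lists = [ngrams(doc.split(), n_max) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

Terms that occur in every document get a weight of zero, while terms and word pairs specific to one document receive higher weights, which is why the combination helps separate abusive phrasing from ordinary text.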
To evaluate the performance of different machine learning techniques, I split the data into 70% training and 30% testing sets. I then trained three classifiers (Naïve Bayes, Decision Tree, and Random Forest) and compared their accuracy and effectiveness in identifying abusive language. Using a confusion matrix for each model, I quantified their performance and selected the best model, which achieved an accuracy of 0.95.
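The evaluation procedure can be illustrated with a short sketch of a 70/30 split and a binary confusion matrix. The helper names, the fixed random seed, and the convention that label 1 means "abusive" are assumptions for the example; the classifiers themselves are left out.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means abusive."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1  # abusive text the model missed
        elif pred == 1:
            fp += 1  # clean text flagged as abusive
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of predictions that were correct."""
    total = tn + fp + fn + tp
    return (tp + tn) / total if total else 0.0
```

Computing the four counts for each model on the same held-out 30% makes the comparison fair and also shows which kind of error (missed abuse or false alarms) each model tends to make, which accuracy alone hides.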
To make this project accessible and user-friendly, I developed a responsive web application using Flask, Bootstrap, and AJAX. Users simply type their text, and the application predicts and updates the results dynamically in real time, with no page refresh required. The application also goes beyond detection by counting the profane words in the text, which gives an indication of how severe the abusive language is.
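As a rough sketch of the server-side logic behind such an AJAX request, the function below builds the kind of payload an endpoint could return as JSON. The `PROFANITY` lexicon, the function names, the response keys, and the optional `classify` callback (standing in for the trained model) are all hypothetical; the web framework wiring is omitted.

```python
import re

# Hypothetical placeholder lexicon; a real list would be much larger.
PROFANITY = {"idiot", "stupid", "moron"}


def count_profanity(text):
    """Count tokens in the text that appear in the profanity lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for t in tokens if t in PROFANITY)


def analyze(text, classify=None):
    """Build the result an AJAX endpoint could send back as JSON.

    `classify` is an optional callable wrapping a trained model. When it
    is not supplied, a simple fallback rule is used: any profanity hit
    marks the text as abusive.
    """
    count = count_profanity(text)
    is_abusive = classify(text) if classify else count > 0
    return {"abusive": bool(is_abusive), "profanity_count": count}
```

In a Flask app, a view function could pass the submitted text to `analyze` and return the dictionary with `flask.jsonify`, while the browser updates the page from the response without reloading.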
I created "Real-Time Abusive Language Detection" from scratch, curating a custom dataset and performing thorough preprocessing. Using TF-IDF and N-Grams, I trained Naïve Bayes, Decision Tree, and Random Forest models, achieving up to 93% accuracy. The project features a responsive Flask web app with AJAX for dynamic text analysis and profanity word count. Skills: Data Science, Machine Learning, Flask, AJAX, Python, NLP.