Natural Language Processing

Research domains: Document Similarity, Recommendation systems, Fraud detection, Sentiment Analysis, Authorship attribution.

As textual content grows in every domain, SORAL applies a range of natural language processing techniques to build impactful applications.

Research Publications:

  • “Detect review manipulation by leveraging reviewer historical stylometrics in Amazon, Yelp, Facebook and Google reviews” - 2020 The 6th International Conference on E-Business and Applications (ICEBA 2020) [ January 1, 2020 ]
    Publication URL: https://dl.acm.org/doi/abs/10.1145/3387263.3387272

    Abstract: Consumers now check reviews and recommendations before consuming any services or products. But traders try to shape the reviews and ratings of their merchandise to gain more consumers. Less often, they also attempt to manipulate their competitors’ reviews and recommendations. These manipulations are hard for an everyday consumer to detect through a standard lookup, but customers can identify them by examining the reviews thoroughly. In this paper, we try to mimic how a specialist would look for review manipulation and propose algorithms that are compatible with major, well-known online services. We provide a historical stylometry-based methodology to detect review manipulations and support it with results from Amazon, Yelp, Google, and Facebook.

  • “Stylometry as a Reliable Method for Fallback Authentication” - The 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON 2020) [ April 24, 2020 ]

    Publication URL: https://ieeexplore.ieee.org/document/9158216

    Abstract: Over the decades, with advancement in artificial intelligence systems, stylometry has proven to be crucial in authorship attribution. It is evident that, through natural development, writing styles are unique to individuals and cannot be fabricated. Stylometric analysis research has been done for author identification, and there is significant progress in recognizing an author based on their written texts. This paper aims to evaluate the efficiency of using stylometry as a fallback authentication method. We proposed to detect differences between writings on the same topic provided by a set of users and tested whether these differences are sufficient for an authentication system. We observed 74% accuracy in detecting the actual authors and concluded that with additional features the accuracy can be pushed above 90%. Moreover, we devised a threshold for authenticating a particular user. We observed that the combination of textual features can support the authenticity of the user. We also analyzed the impact of some data cleaning steps, such as removing stop words and punctuation marks, and how they affected the overall detection outcome.

  • “Can NLP techniques be utilized as a reliable tool for medical science?” - Building a NLP Framework to Classify Medical Reports
    – IEEE 11th Annual Information Technology, Electronics and Mobile Communication Conference (IEEE IEMCON 2020) [ Vancouver, September 2020 ]

    Publication URL: https://ieeexplore.ieee.org/document/9284834

    Abstract: Artificial intelligence continues to be a right-hand tool for many branches of biology. From preliminary advice and treatment, such as understanding whether symptoms relate to fever or cold, to critical detection of cancerous cells or classification of X-rays, traditional machine learning and deep learning techniques have achieved remarkable feats. However, total dependency on machine-based prediction is still a far-fetched concept. In this paper, we provide a framework utilizing several Natural Language Processing (NLP) algorithms to construct a comparative analysis. We create an ensemble of top-performing algorithms to perform a classification task on medical reports. We compare both traditional machine learning and deep learning techniques and evaluate how reliable they are at analyzing medical diagnoses. We concluded that an ensemble approach can provide reliable outcomes with accuracy over 92%, and that the current state of the art is unequipped to provide results at the standard needed for the health sector, but an ensemble of these techniques can be a pathway for future research.

  • ADCR: An Adaptive Tool to select “Appropriate Developer for Code Review” based on Code Context – The 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) [ New York, USA, October 2020 ]

    Publication URL: https://ieeexplore.ieee.org/document/9298102

    Abstract: In this research, we propose novel reinforcement learning-based algorithms to recommend users without collecting identifiable data. Using only user activity within a session, our algorithm can model and track user behavior and formulate a recommendation system. We conclude that our algorithms demonstrate positive results in capturing user behavior without collecting private data of any kind from the user.

  • Recommend Speciality Doctor from Health Transcription: Ensemble Machine Learning Approach - 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC 2021), USA [ January 27-30, 2021 ]
    Publication URL: https://ieeexplore.ieee.org/document/9376111

    Abstract: The primary care doctor suggests a medical specialty doctor after careful evaluation of all symptoms and diagnostics reported by the patient. These symptoms and diagnostic reports can now be gathered using several automated patient content management systems. Identifying which specialty doctor a patient needs still depends on the primary care doctor’s opinion. In this research, we proposed a recommendation system that recommends a doctor based on the results of ensemble machine learning (ML) and natural language processing (NLP) techniques. We developed a prototype and obtained promising results, which indicate that this approach can be utilized in the health care industry.

  • Understanding the Pandemic Through Mining Covid News using Natural Language Processing - 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC 2021), USA [ January 27-30, 2021 ]
    Publication URL: https://ieeexplore.ieee.org/document/9376002

    Abstract: Newspaper reports are a daily information tank for the majority of the world. We rely on newspapers as a primary source of information. In this research, we introduce a dataset of 1050 news reports on COVID-19 from two different countries and use Natural Language Processing techniques to extract knowledge about the virus, including the number of COVID cases, trending topics per month, sentiment analysis, etc. Moreover, we compare how the virus spreads in and impacts a developed country and a developing country. Our curated dataset can be used in various socioeconomic studies to understand news media’s effect on public awareness.
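
The stylometry-based publications above compare writing samples and apply a threshold to decide whether two texts plausibly share an author. The following is a minimal sketch of that general idea, not the method from the papers: the feature set, the `FUNCTION_WORDS` list, the cosine-similarity comparison, and the `threshold` value are all illustrative assumptions.

```python
import math
import re
import string

# Hypothetical, tiny function-word list; a real system would use a fuller one.
FUNCTION_WORDS = ("the", "and", "of", "to", "in", "that", "is", "it")


def stylometric_features(text):
    """Return a small, illustrative feature vector for one writing sample."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        # No usable words: return an all-zero vector of the right length.
        return [0.0] * (4 + len(FUNCTION_WORDS))
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    features = [
        sum(len(w) for w in words) / n_words,   # mean word length
        n_words / max(len(sentences), 1),       # mean sentence length in words
        len(set(words)) / n_words,              # type-token ratio
        sum(ch in string.punctuation for ch in text) / len(text),  # punctuation rate
    ]
    # Relative frequency of each function word.
    features += [words.count(fw) / n_words for fw in FUNCTION_WORDS]
    return features


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_same_author(enrolled_text, candidate_text, threshold=0.95):
    """Accept the candidate if its style is close enough to the enrolled sample.

    The default threshold is an arbitrary placeholder, not a value from the papers.
    """
    similarity = cosine_similarity(
        stylometric_features(enrolled_text),
        stylometric_features(candidate_text),
    )
    return similarity >= threshold
```

In this sketch the features are not rescaled, so mean sentence length tends to dominate the similarity score. A practical system would likely normalize the features and calibrate the threshold on labeled writing samples.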

Projects:

  • Reviewer Verification System: A web application based on the paper “Detect review manipulation by leveraging reviewer historical stylometrics in Amazon, Yelp, Facebook and Google reviews”, published in 2020 The 6th International Conference on E-Business and Applications (ICEBA 2020).
    Goal: To estimate, as a percentage, how likely a reviewer profile is to be paid, a bot, or authentic. Website: https://isreviewfake.com/

  • Medical Speciality Detector: A web application based on the paper “Can NLP techniques be utilized as a reliable tool for medical science?” - Building a NLP Framework to Classify Medical Reports, published in IEEE 11th Annual Information Technology, Electronics and Mobile Communication Conference (IEEE IEMCON 2020).
    Goal: To predict and recommend a specialty doctor from text. Website: https://medspecdetector.siliconorchard.com/