
Introduction to Natural Language Processing in Python

Natural Language Processing (NLP) sits at the intersection of computer science, artificial intelligence and computational linguistics. From analysing sentiment to performing machine translation and overcoming language barriers, researchers have focused heavily on ways for computers to work with human language. The goal of natural language processing is thus to process human language (text or speech) and convert it into a structured data format that computers can understand.

The first part of the workshop will present an overview of natural language processing fundamentals, such as text pre-processing, tokenization, sentence segmentation, part-of-speech tagging, dependency parsing, named entity extraction, and topic detection, using Python packages such as spaCy and NLTK. The second part will give an overview of the text/document similarity models used in various NLP applications.
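To give a flavour of the two parts, here is a minimal sketch using only the Python standard library. It is an illustrative stand-in, not the workshop material and not the spaCy or NLTK API: the function names (`split_sentences`, `tokenize`, `cosine_similarity`) are hypothetical, the regular-expression rules are deliberately naive, and bag-of-words cosine similarity is assumed here as just one simple example of a document similarity model.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Naive tokenization: lowercase, then keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the bag-of-words count vectors of two documents."""
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    text = "NLP is fun. Is it hard? Not really!"
    for sentence in split_sentences(text):
        print(sentence, "->", tokenize(sentence))
    print(cosine_similarity("the cat sat on the mat", "the cat lay on the rug"))
```

Libraries such as spaCy and NLTK replace these hand-written rules with trained or curated components, which matters for cases the sketch gets wrong, such as abbreviations ("Dr. Smith") or contractions.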



Pikakshi holds a Ph.D. in Natural Language Processing (Text Analytics) from the University of Milano-Bicocca, Milan, Italy. She is currently a Postdoctoral Research Fellow on the VISTA AR project. Her research has focused on the ‘Adaptation of Named Entity Recognition and Linking Framework’ for social media streams and the different ontologies that come with the task. Before her Ph.D., she obtained her Master's in Information Technology from YMCA University of Science and Technology, India, and then worked with IBM India for over a year. Her research interests include Text Analytics, Information Extraction, Social Media Analysis, Knowledge Discovery and Semantic Web Technologies.