The NLP Landscape from the 1960s to 2020
What is NLP?
NLP (Natural Language Processing) is a subfield of linguistics, computer science & AI concerned with the interactions between computers & human language, in particular how to program computers to process & analyze large amounts of natural language data.
Goal: Making machines understand natural language
Some Real World Applications:
- Contextual Advertisement
- Email Clients — Spam Filtering, Smart Replies
- Social Media — Removing Adult Content, Opinion mining
- Search Engines
- Chat Bots
Common NLP Tasks:
- Text/Document Classification
- Sentiment Analysis
- Information Retrieval
- Part-of-Speech tagging
- Language Detection & Machine Translation
- Conversational Agents
- Knowledge Graphs & Question Answering (QA) systems
- Text Summarization
- Topic Modelling
- Text Generation
- Spell Checking & Grammar correction
- Text Parsing
- Speech to text & Text to Speech
Approaches to NLP:
- Heuristic Methods
- Machine Learning (ML) based models
- Deep Learning (DL) based models
Heuristic Methods:
A heuristic, or a heuristic technique, is any approach to problem-solving that uses a practical method or various shortcuts in order to produce solutions that may not be optimal but are sufficient given a limited timeframe or deadline.
Regular Expressions → Finding text that matches a given pattern
WordNet (Lexical Dictionary) → Unlike a common dictionary, WordNet is a lexical database in which words are organized by their relations to other words, such as synonyms and broader or narrower terms.
Open Mind Common Sense → A knowledge base in which common-sense facts are stored
Advantages:
- Quick to build & accurate on narrow, well-defined patterns
- Still in use today, often alongside ML & DL models
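The regular-expression heuristic listed above can be sketched in Python as follows. This is only a minimal illustration: the pattern `EMAIL_PATTERN`, the helper `find_emails` and the sample sentence are assumptions made up for this example, and the pattern is a simplified e-mail matcher rather than a complete validator.

```python
import re

# Hypothetical pattern for illustration: a simplified e-mail matcher,
# not a complete address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return every substring of `text` that matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.findall(text)


# Made-up sample sentence.
sample = "Contact alice@example.com or bob.smith@mail.example.org for details."
print(find_emails(sample))
# ['alice@example.com', 'bob.smith@mail.example.org']
```

A rule like this needs no training data and runs quickly, but it only finds what the pattern anticipates, which is the typical trade-off of a heuristic.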
ML Methods:
Heuristic methods struggle with open-ended problems, where the rules cannot all be written down in advance. The major advantage of ML models over heuristic methods is that they learn patterns from data, so they can handle such open-ended problems.
Algorithms:
- Naive Bayes
- Logistic Regression
- Support Vector Machine
- LDA (Latent Dirichlet Allocation, for Topic Modelling)
- Hidden Markov Models
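To make the first algorithm in the list more concrete, here is a minimal sketch of a Naive Bayes text classifier written from scratch in plain Python. It assumes whitespace tokenization and add-one (Laplace) smoothing. The function names `train_naive_bayes` and `predict` and the four toy training sentences are invented for this example; in practice a library implementation would normally be used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts used by predict()."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, made up for illustration.
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]
model = train_naive_bayes(training_data)
print(predict(model, "claim your free money"))   # spam
print(predict(model, "monday project meeting"))  # ham
```

Unlike a hand-written rule, nothing here is specific to spam: the same code learns a different classifier when given different labelled examples.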
DL Methods:
One of the main limitations of the classical ML approach is that most of its models treat text as an unordered collection of features, so they cannot read it sequentially. In the Deep Learning approach, text is read sequentially, and unlike the ML approach, feature generation is automated: the network learns its own features instead of relying on hand-crafted ones.
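The difference between reading text sequentially and ignoring word order can be sketched with a tiny recurrent network in plain Python. Everything numeric here is an assumption for illustration: the embeddings in `EMBEDDINGS` and the weight matrices `W_X` and `W_H` are hand-picked, whereas a real model would learn them from data.

```python
import math

# Hypothetical toy embeddings and weights, hand-picked for illustration.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [-1.0, 0.5],
}
W_X = [[0.5, -0.3], [0.2, 0.8]]   # input-to-hidden weights
W_H = [[0.9, 0.1], [-0.4, 0.7]]   # hidden-to-hidden weights


def matvec(matrix, vector):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_encode(words):
    """Read words one at a time, updating the hidden state at each step."""
    hidden = [0.0, 0.0]
    for word in words:
        x = EMBEDDINGS[word]
        pre = [a + b for a, b in zip(matvec(W_X, x), matvec(W_H, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden


def bag_of_words(words):
    """Order-insensitive baseline: sum of the word vectors."""
    return [sum(EMBEDDINGS[w][i] for w in words) for i in range(2)]


a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]
print(bag_of_words(a) == bag_of_words(b))  # True: word order is lost
print(rnn_encode(a) == rnn_encode(b))      # False: word order changes the state
```

The two sentences contain the same words, so the bag-of-words vectors are identical, while the recurrent encoder ends in a different hidden state for each ordering.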
Algorithms :
- RNN
- LSTM (Long Short-Term Memory)
- GRU (Gated Recurrent Unit)
- CNN
- Transformers
- Autoencoders
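The main issue with a plain RNN is that it struggles with long sentences, because information from early words fades as the sequence grows (the vanishing gradient problem). LSTM addresses this with gates that control what is kept and what is forgotten. GRU is a simpler gated variant of LSTM and is commonly used for tasks such as text generation. Transformers revolutionized NLP: their attention mechanism lets the model give more weight to the words that matter most in the current context. Autoencoders are built from two neural networks (often LSTM-based for text), one of which acts as an encoder & the other as a decoder.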
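The idea that a Transformer pays more attention to certain words can be sketched with scaled dot-product attention for a single query, written in plain Python. The word list and the 2-dimensional vectors below are made-up stand-ins; in a real Transformer the queries, keys and values are learned projections of the input, and many attention heads run in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is the attention-weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical 2-d vectors, chosen by hand for illustration.
words = ["bank", "river", "the"]
keys = [[1.0, 0.0], [0.9, 0.4], [0.0, 0.1]]
values = keys
query = [1.0, 0.5]  # stand-in query vector for the word "bank"

weights, output = attention(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these stand-in vectors, "river" receives the largest weight and "the" the smallest, which is the sense in which attention focuses on the words most relevant to the query.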
Challenges in NLP:
- Ambiguity
- Contextual Words
- Colloquialisms & slang
- Synonyms
- Irony, Sarcasm & tonal difference
- Spelling Errors
- Creativity
- Diversity