- January 18, 2021
By now you probably know about the digital assistant that can book your next haircut appointment over the phone, and you may have heard about the AI algorithm that can answer eighth-grade science questions better than humans. You may have even interacted with a chatbot that can answer your simple banking questions. You are possibly carrying a mobile phone that can translate your sentences into 100 different languages in real time. All these technological achievements are partially fueled by recent developments in natural language processing (NLP).
What we are experiencing is so-called ‘artificial narrow intelligence’, in which we can engineer AI systems that match or surpass human-level performance on a single well-defined task. Even at this level, such AI systems can provide immense benefits in improving the quality of our lives and be game-changing for companies, with a great financial impact on the bottom line of many industries, including oil and gas.
The success of any NLP project starts with finding relevant, high-quality data, which is not easy to come by in many industries. The hurdles to finding enough of the right data can stem from regulation, privacy, or copyright, or from the large data debt accumulated over many decades because the data is unstructured and nonuniform.
Digitizing decades-old reports and documents will unlock many NLP applications that automate manual tasks and free up experts' valuable time. It will also lead to new ways of discovering data and relevant information to support the right decisions in exploration and production. Language models are at the heart of any NLP task, and we must train them on data specific to whichever domain we design our NLP solutions for.
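To make the point about domain-specific training concrete, here is a minimal sketch, not a production approach. It assumes a toy count-based bigram language model with add-one smoothing, far simpler than the models used in practice, and two tiny invented corpora; every sentence and function name below is hypothetical. It only illustrates that a model trained on domain text scores a domain sentence as more likely than a model trained on general text does.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_model(sentences):
    """Count unigrams and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, model, vocab_size):
    """Log-probability of a sentence under add-one (Laplace) smoothing."""
    unigrams, bigrams = model
    tokens = tokenize(sentence)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Invented example sentences (assumptions for illustration only).
domain_corpus = [
    "the well was drilled through the reservoir sandstone",
    "core samples show high porosity in the reservoir",
    "the drilling mud weight was increased",
]
general_corpus = [
    "the cat sat on the mat",
    "she booked a haircut appointment by phone",
    "the bank answered my question",
]
query = "high porosity in the reservoir sandstone"

# Use one shared vocabulary so the two scores are comparable.
vocab = {tok for s in domain_corpus + general_corpus + [query] for tok in tokenize(s)}

domain_model = train_bigram_model(domain_corpus)
general_model = train_bigram_model(general_corpus)

domain_score = log_prob(query, domain_model, len(vocab))
general_score = log_prob(query, general_model, len(vocab))

print(f"domain-trained model:  {domain_score:.2f}")
print(f"general-trained model: {general_score:.2f}")
```

The domain-trained model assigns the higher (less negative) log-probability to the query because it has seen word pairs such as "reservoir sandstone" before, while the general model falls back on smoothing for every pair.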
On the algorithmic front, NLP has enjoyed a rich history spanning decades, combining linguistics research with computational methods. Progress made in the past decade is behind the products that are reaching the masses today. First, neural networks were ingeniously introduced into language model training. Later came contextualized language models, which use deep learning to train on large amounts of unlabeled, openly available text and learn context-dependent representations of that text. These developments have provided a step change, achieving human-level performance on certain tasks, such as sentiment analysis, question answering, and machine translation.
NLP solutions that operate with multi-modal data will enable us to capture and produce knowledge in a continual-learning framework.