As human beings, we find understanding our native languages and even foreign languages relatively straightforward - well, after we have learned them. But why is that? The key lies in the learning process we go through. We start by learning our mother tongues as babies, subsequently picking up our first foreign languages as children and so on. This continuous learning process trains our brains, making it easier to acquire new languages with each one we learn. In theory, this is not much different from how a machine learns. Natural Language Processing (NLP) adopts this idea to understand and generate languages, bridging the gap between human language and computational understanding. From ChatGPT to Microsoft Copilot, NLP has become a pivotal topic across numerous industries, including FinTech.
What is Natural Language Processing?
NLP is a subfield of Computer Science and Artificial Intelligence (AI) that uses machine learning to enable computers to understand and communicate with human language. NLP is composed of two primary components: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
NLU focuses on comprehending and interpreting human language, allowing machines to process text, understand context and extract meaningful information. On the other hand, NLG focuses on the generation of a message or text by a machine based on human inputs. Together, these components form the core of NLP, enabling seamless applications such as language translation, sentiment analysis and text summarisation.
How does it Work?
Let’s look at this piece of text:
“In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies. By integrating AI, it enhances transaction security and offers predictive financial advice, making financial services more accessible and secure. This innovation simplifies finance for individuals and businesses alike, empowering them to navigate the modern financial landscape with greater confidence and convenience.”
This paragraph explains how FinTech is using advanced technologies to transform the banking sector in Malaysia. It would be great if a computer could read this text and understand that these advancements are taking place in Malaysia, the applications of AI in the banking industry and the benefits of these technologies. But to get there, we have to first teach our computer the fundamentals of written language and build up from there.
Step 1: Sentence Segmentation
Sentence segmentation is the first step in pre-processing text for NLP. It breaks the paragraph into separate sentences.
Sentence segmentation then produces the following result from the text above:
In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.
By integrating AI, it enhances transaction security and offers predictive financial advice, making financial services more accessible and secure.
This innovation simplifies finance for individuals and businesses alike, empowering them to navigate the modern financial landscape with greater confidence and convenience.
We can assume that each sentence in English is a separate thought or idea. It will be a lot easier to understand a single sentence than to understand an entire paragraph. In practice, segmentation models can be as simple as splitting sentences wherever there is a punctuation mark. However, most modern systems use more sophisticated techniques that work even when documents are not formatted properly.
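The simple punctuation-based approach described above can be sketched in a few lines. This is a minimal illustration, not a production segmenter - it will mishandle abbreviations like "Dr." or decimal numbers, which is exactly why modern systems use more sophisticated models.

```python
import re

def segment_sentences(text):
    # Naive rule: split after '.', '!' or '?' followed by whitespace.
    # Real segmenters also handle abbreviations, decimals, quotes, etc.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

paragraph = (
    "In Malaysia, FinTech is revolutionising the banking sector with "
    "intelligent technologies. By integrating AI, it enhances transaction "
    "security and offers predictive financial advice, making financial "
    "services more accessible and secure."
)
for sentence in segment_sentences(paragraph):
    print(sentence)
```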
Step 2: Tokenization
Using word tokenizers, we can now break our sentences into separate words or tokens. This is called tokenization. Through this technique, machines can use algorithms to easily identify patterns and structures within texts, enabling more efficient analysis and processing of language data.
With the first sentence from our paragraph:
“In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.”
This is the result after tokenization:
“In”, “Malaysia”, “,”, “FinTech”, “is”, “revolutionising”, “the”, “banking”, “sector”, “with”, “intelligent”, “technologies”, “.”
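A basic word tokenizer can be sketched with a regular expression that keeps punctuation as separate tokens. This is only an illustrative rule; production tokenizers (such as those in spaCy or NLTK) handle contractions, hyphenation and other edge cases.

```python
import re

def tokenize(sentence):
    # Match either a run of word characters or a single
    # non-word, non-space character (i.e. punctuation).
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize(
    "In Malaysia, FinTech is revolutionising the banking sector "
    "with intelligent technologies."
)
print(tokens)
# → ['In', 'Malaysia', ',', 'FinTech', 'is', 'revolutionising', 'the',
#    'banking', 'sector', 'with', 'intelligent', 'technologies', '.']
```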
Step 3: Stop Word Removal
Next, we need to consider the important words in this paragraph. English has a lot of filler words such as “and”, “a” and “the”, which need to be filtered out to give more focus to the important information. These are called stop words, and they need to be removed before performing any statistical analysis. However, the removal of stop words is highly dependent on the task we are performing and the goal we want to achieve. For instance, if we want to perform sentiment analysis, then we might not want to remove the stop words.
From the first line of the text, here is the result after removing the stop words:
“In Malaysia, FinTech revolutionising banking sector intelligent technologies.”
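Stop word removal is just a filter over the token list. The tiny stop list below is chosen to reproduce the example result above; real libraries ship curated lists of hundreds of words (which would typically also remove "in"), and as noted, the right list depends on the task.

```python
# Illustrative stop list only - real NLP libraries provide
# much larger, task-appropriate lists.
STOP_WORDS = {"is", "the", "with"}

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["In", "Malaysia", ",", "FinTech", "is", "revolutionising",
          "the", "banking", "sector", "with", "intelligent",
          "technologies", "."]
print(remove_stop_words(tokens))
# → ['In', 'Malaysia', ',', 'FinTech', 'revolutionising', 'banking',
#    'sector', 'intelligent', 'technologies', '.']
```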
Step 4: Lemmatization
Lemmatization is a text pre-processing technique used in NLP models to reduce a word to its root or base form, known as its “lemma”, to identify similarities. For example, the words “intelligent”, “intelligence” and “intelligently” all share the base form “intelligent”. When working with a computer, it is important to know the base form of each word, so that the computer does not treat these as three totally different words.
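The idea can be illustrated with a toy lookup table mapping inflected forms to a shared lemma. This is purely a sketch: real lemmatizers (such as NLTK's WordNetLemmatizer or spaCy's) use full vocabularies and part-of-speech information rather than a hand-built dictionary.

```python
# Toy lemma table - hand-built for illustration only.
LEMMAS = {
    "intelligence": "intelligent",
    "intelligently": "intelligent",
    "revolutionising": "revolutionise",
    "offers": "offer",
}

def lemmatize(token):
    # Fall back to the lowercased token when no lemma is known.
    return LEMMAS.get(token.lower(), token.lower())

print(lemmatize("Intelligence"))   # → intelligent
print(lemmatize("intelligently"))  # → intelligent
```

With this mapping in place, the three related words above all collapse to the single base form "intelligent", so downstream statistics count them together.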
Step 5: Dependency Parsing
After that, we need to find out how all the words in our sentence relate to each other. Dependency parsing helps us understand the grammatical structure by identifying the relationship between words. It involves determining which words are the main words (heads) and which words depend on them (dependents). From the third sentence in the paragraph, dependency parsing identifies “enhances” as the main verb, “it” as the subject and “By integrating AI” as the prepositional phrase. As a result, the process creates a tree-like structure which shows how each word in the sentence is connected to each other, making it easier to analyse the sentence’s meaning.
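The tree-like structure described above can be represented as a simple mapping from each dependent word to its head. The structure and relation labels below are a hand-built illustration of the parse described in the text, not the output of an actual parser (tools like spaCy derive this automatically).

```python
# Toy dependency structure for:
# "By integrating AI, it enhances transaction security ..."
# dependent -> (head, relation). Labels are illustrative only.
PARSE = {
    "it": ("enhances", "subject"),
    "security": ("enhances", "object"),
    "transaction": ("security", "compound"),
    "integrating": ("enhances", "adverbial clause"),
    "By": ("integrating", "preposition"),
    "AI": ("integrating", "object"),
}

def dependents_of(head):
    # Walk the table and collect every word whose head matches.
    return sorted(w for w, (h, _) in PARSE.items() if h == head)

print(dependents_of("enhances"))  # words hanging off the main verb
```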
Step 6: Named Entity Recognition (NER)
Now that we have tackled the tough parts, we can finally move past grammar and actually extract information. NER, also known as entity chunking or entity extraction, is a component of NLP that identifies predefined categories of objects in a body of text.
Here are some of the categories NER is able to recognise:
● People’s names
● Company names
● Geographical locations
● Product names
● Monetary values
The goal of NER is to detect and label these nouns with the real-world concepts that they represent. From the sentence: “In Malaysia, FinTech is revolutionising the banking sector with intelligent technologies.”, NER will be able to detect and tag “Malaysia” as a geographic entity.
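A minimal way to see the idea is a gazetteer-based tagger: entity names are looked up in hand-built lists. This is a sketch only - the entity lists and labels below are illustrative, and production NER systems use statistical or neural models rather than fixed lookups.

```python
# Hand-built gazetteer for illustration; real NER models
# generalise to names they have never seen before.
GAZETTEER = {
    "Malaysia": "GPE",        # geopolitical entity
    "Morgan Stanley": "ORG",  # organisation
    "JP Morgan": "ORG",
}

def tag_entities(text):
    return [(name, label) for name, label in GAZETTEER.items()
            if name in text]

sentence = ("In Malaysia, FinTech is revolutionising the banking "
            "sector with intelligent technologies.")
print(tag_entities(sentence))  # → [('Malaysia', 'GPE')]
```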
While this is a general overview of how NLP works, there is room for different techniques to carry out more complex tasks to fit different purposes. More advanced technologies such as transformers and neural networks can also help leverage capabilities in NLP.
Applications in FinTech?
Since its surge in prominence, NLP has been adopted in various industries with diverse applications. Among the popular fields are FinTech, manufacturing and marketing. In this publication, we will explore how NLP is employed specifically in FinTech.
Business is without a doubt risky. In fact, there’s a well-known saying: “no risk, no reward”. Investment and financial firms utilise NER and NLP systems to extract key information from customer documents. This data is essential for conducting risk analysis, where customer profiles are evaluated to assess loan risks. Using document categorisation and established risk-assessment criteria, NLP models analyse basic application documents such as account history, credit history, employment and education. This streamlined approach speeds up the analysis process and provides a broader understanding of each customer’s circumstances.
Furthermore, in today’s competitive landscape where customer satisfaction is paramount, many companies use NLP to gauge customers’ feelings about their products through sentiment analysis. Using tokenization, NLP breaks down unstructured data such as social media posts, financial reports and news articles into tokens. Algorithms trained on these tokens learn to associate them with different emotions, helping companies understand customers’ needs. Investment bank Morgan Stanley uses NLP tools to detect and collect online criticisms and allegations in real time, providing their investors with key information on public perception and potential impact on their company stock prices.
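The simplest form of sentiment analysis scores tokens against word lists of positive and negative terms. The word lists below are illustrative assumptions, and real systems train classifiers on large labelled datasets rather than counting lexicon hits, but the sketch shows how tokens map to an overall sentiment.

```python
# Illustrative sentiment lexicons - real systems learn these
# associations from labelled training data.
POSITIVE = {"secure", "accessible", "confidence", "convenience", "empowering"}
NEGATIVE = {"fraud", "risk", "criticism", "allegation"}

def sentiment_score(tokens):
    # Positive words add 1, negative words subtract 1.
    score = 0
    for t in tokens:
        t = t.lower()
        if t in POSITIVE:
            score += 1
        elif t in NEGATIVE:
            score -= 1
    return score

tokens = "making financial services more accessible and secure".split()
print(sentiment_score(tokens))  # → 2 (two positive hits, no negative)
```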
Not only that, NLP has become an indispensable tool in the fight against fraud. A notable example is JP Morgan, which uses machine learning algorithms trained on historical data to identify patterns indicative of fraud, such as unusual spending patterns or large transactions from unfamiliar locations. These algorithms allow JP Morgan to quickly identify and respond to potential fraud. Such approaches allow organisations to adapt to cybersecurity threats and maintain robust security measures.
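One simple version of "unusual spending" detection is a statistical outlier test against a customer's transaction history. This is a hypothetical, minimal sketch of the general idea - not JP Morgan's actual method, which draws on many more features and far more sophisticated models.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, threshold=3.0):
    """Flag a transaction that deviates from the customer's
    historical spending by more than `threshold` standard
    deviations. Illustrative only."""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) > threshold * sigma

history = [120.0, 95.0, 110.0, 130.0, 105.0]  # hypothetical past spend
print(is_suspicious(history, 118.0))   # typical amount → False
print(is_suspicious(history, 5000.0))  # large outlier → True
```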
NLP plays a significant role in FinTech by improving customer interactions, managing risk and enhancing data analysis. As NLP evolves with AI advancements, it continues to drive innovation in financial services, promising a future where technology elevates efficiency and meets evolving industry demands.
Written by: Swetha Jayaprasad Rao
About MYFinT
Malaysian Youth FinTech Association (MYFinT) is a non-profit youth organisation dedicated to empowering, motivating and inspiring the young generation across all industries to gain exposure to the latest trends and developments in the FinTech industry.
Sources:
1. “Analysis of Natural Language Processing in the FinTech Models of Mid-21st Century.” ResearchGate, www.researchgate.net/publication/363255877_Analysis_of_Natural_Language_Processing_in_the_FinTech_Models_of_Mid-21st_Century.
2. “How AI Can Bolster Sustainable Investing.” Morgan Stanley, www.morganstanley.com/ideas/ai-sustainable-investing-use-potential#:~:text=NLP%20tools%20can%20be%20used,i mpact%20on%20company%20stock%20prices.
3. Khanna, Chetna. “Text Pre-Processing: Stop Words Removal Using Different Libraries.” Medium, 10 Feb. 2021, towardsdatascience.com/text-pre-processing-stop-words-removal-using-different-libraries-f20bac19929a.
4. “NLP Tutorial - Javatpoint.” Www.javatpoint.com, www.javatpoint.com/nlp.
5. Riti Dass. “The Essential Guide to How NLP Works.” Medium, Medium, 24 Sept. 2018, medium.com/@ritidass29/the-essential-guide-to-how-nlp-works-4d3bb23faf76.
6. UK, ACODS. “How JP Morgan Uses Data Science?” Medium, 18 Jan. 2023, medium.com/@Acods/how-jp-morgan-uses-data-science-2066871b2de8. Accessed 16 July 2024.
7. “What Is Natural Language Processing (NLP) & How Does It Work?” Levity.ai, levity.ai/blog/how-natural-language-processing-works.
8. “What Is Tokenization.” Datacamp, www.datacamp.com/blog/what-is-tokenization#.