Natural language processing (NLP)
At our institute, we have developed expertise in natural language processing (NLP), a dynamic field that integrates computational linguistics, statistical modelling, and machine learning. This technology enables machines to recognise, understand, and generate human language, both in text and speech. Typical applications include voice-operated GPS systems, digital assistants, and speech recognition software. Businesses also rely on NLP to streamline operations and automate tasks such as spam detection, machine translation, chatbot interactions, and text summarisation.
By breaking down language into manageable components, NLP allows machine learning algorithms to interpret and respond to complex linguistic inputs. This process includes tasks such as:
- Speech Recognition (Speech-to-Text): Transforms spoken words into written text.
- Part-of-Speech Tagging (Grammatical Tagging): Identifies the part of speech or syntactic role of each word in a sentence based on its context.
- Semantic Analysis (including word sense disambiguation): Determines which meaning of a word is intended in a given context.
- Named Entity Recognition (NER): Identifies and categorises entities like names, locations, and organisations within a text.
- Coreference Resolution: Discerns when multiple words refer to the same entity.
- Sentiment Analysis: Detects subjective qualities in a text, such as attitudes and emotions, including harder cases like sarcasm or irony.
- Natural Language Generation: Converts structured data into coherent, human-like language.
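As a rough illustration of how such tasks break language into manageable components, the sketch below tokenises a sentence and scores its sentiment against a small word list. It is a minimal example, not the institute's tooling: the function names and the `POSITIVE` and `NEGATIVE` lexicons are hypothetical, and production systems typically learn such signals from data rather than from hand-written lists.

```python
import re

# Hypothetical, hand-picked lexicons used only for this illustration.
POSITIVE = {"good", "great", "excellent", "helpful", "love"}
NEGATIVE = {"bad", "poor", "terrible", "slow", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive tokens minus negative tokens.

    A score above zero suggests positive sentiment, below zero negative,
    and zero means no lexicon word was found or the two sides cancel out.
    """
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


if __name__ == "__main__":
    for review in ["Great product, helpful support", "Terrible and slow"]:
        print(review, "->", sentiment_score(review))
```

A lexicon lookup of this kind cannot handle negation, sarcasm, or irony, which is one reason statistical and neural approaches are preferred in practice.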
Our researchers have developed an algorithm to detect fake news using machine learning and NLP. Our tools support advanced text processing tasks, including syntax analysis, word segmentation, stemming (stripping affixes to obtain a word stem), lemmatisation (reducing words to their dictionary forms), and the segmentation of text into language units. These tools also support the extraction of logical insights from text.
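The difference between stemming and lemmatisation can be shown with a toy example. Stemming strips suffixes by rule and may produce non-words, while lemmatisation maps a word to its dictionary form. The suffix list and the `LEMMAS` table below are hypothetical simplifications chosen for illustration; real tools use much larger rule sets and lexicons.

```python
# Suffixes are tried in this order; the first match is removed.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini-dictionary mapping inflected forms to dictionary forms.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "studies": "study",
    "mice": "mouse",
    "better": "good",
}


def stem(word):
    """Strip one known suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatise(word):
    """Look up the dictionary form; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ["studies", "running", "mice", "jumped"]:
        print(f"{w}: stem={stem(w)} lemma={lemmatise(w)}")
```

For "studies", the rule-based stemmer returns "studi", which is not a word, whereas the dictionary lookup returns "study".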
By combining statistical NLP, machine learning, and deep learning, we have optimised systems that automatically extract, classify, and label text data with exceptional precision and efficiency. The applications of NLP are extensive, ranging from enhancing user experiences to fostering innovation in data analysis and communication technologies.
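To make the idea of statistical text classification concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, applied to spam detection. It is an illustrative stand-in, not the system described above: the `NaiveBayes` class, the `tokenize` helper, and the four training sentences are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all words seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; a real classifier needs far more examples.
TRAIN_TEXTS = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "please review the project report",
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("free prize"))
    print(model.predict("project meeting on monday"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps unseen words from driving a class probability to zero.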
Large Language Models (LLMs)
Large Language Models (LLMs) are systems trained on vast amounts of data to understand and generate natural language; multimodal variants can also handle other types of content, such as images and video. These models have brought generative artificial intelligence into the mainstream, making it accessible to a wider audience. LLMs excel at capturing the context of a query to produce coherent and relevant responses, enabling tasks such as translation, text summarisation, question answering, and content generation, including computer code.
Built on deep learning techniques, LLMs use neural network layers to predict the next word (more precisely, the next token) in a sequence based on the context of the preceding ones. Their outputs are not human conversation; they are the product of sophisticated prediction models driven by mathematical probabilities. Through self-supervised training on extensive corpora, LLMs learn grammar, semantics, and conceptual relationships, which in turn allows them to attempt tasks they were never explicitly trained on (zero-shot learning).
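The principle of next-word prediction can be illustrated with a far simpler model than an LLM. The sketch below counts which word follows which in a tiny corpus and turns those counts into probabilities. It is a bigram model, not a neural network, and the corpus and function names are invented for this example; an LLM conditions on a much longer context and learns its probabilities with neural network layers.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, how often each other word follows it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def next_word_probabilities(model, word):
    """Return a dict mapping candidate next words to their probabilities."""
    counts = model.get(word.lower())
    if not counts:
        return {}
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


def predict_next(model, word):
    """Return the most probable next word, or None for an unseen word."""
    probabilities = next_word_probabilities(model, word)
    return max(probabilities, key=probabilities.get) if probabilities else None


# Invented toy corpus used only for illustration.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

if __name__ == "__main__":
    model = train_bigram_model(CORPUS)
    print(next_word_probabilities(model, "the"))
    print(predict_next(model, "sat"))
```

In this corpus, "the" is followed by "cat" in two of six occurrences, so the model assigns "cat" a probability of one third and predicts it as the most likely continuation.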
Our researchers have expertise in key areas such as prompt engineering, fine-tuning, and reinforcement learning from human feedback (RLHF). These skills enable the refinement of LLM outputs to reduce biases and enhance reliability. Such AI technologies offer tangible benefits for businesses, from optimising chatbots and virtual assistants to synthesising data, creating content, and writing computer code. LLMs also contribute to accessibility through speech synthesis and the development of inclusive content for people with disabilities.
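One common prompt engineering technique is few-shot prompting, where a handful of worked examples are placed before the actual query. The sketch below only assembles such a prompt as a string; it does not call any model. The function name, the "Review:" and "Sentiment:" labels, and the sample reviews are hypothetical choices for illustration.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labelled examples, and an open query.

    `examples` is a list of (text, label) pairs. The prompt ends with an
    unanswered "Sentiment:" line for the model to complete.
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Classify the sentiment of each review as positive or negative.",
        [
            ("The delivery was fast and the staff were friendly.", "positive"),
            ("The device stopped working after two days.", "negative"),
        ],
        "Setup was simple and it works as described.",
    )
    print(prompt)
```

Keeping the examples in a fixed, consistent format makes it more likely that the model's completion follows the same pattern.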
Applications in SMEs and Industry
In SMEs or industrial contexts, LLMs can be used for a variety of tasks, including:
- Creating content: Drafting coherent and relevant emails, blog posts, and similar texts.
- Synthesising information: Summarising long texts or extracting insights from large datasets.
- Improving customer service: Answering customer enquiries via virtual assistants or chatbots.
- Supporting developers: Generating code, identifying errors, and addressing security issues.
- Analysing customer feedback: Understanding customer perceptions to improve reputation management.
- Providing translations: Delivering fluent translations across multiple languages.