2022 was an exciting year for machine learning. The industry saw remarkable innovations, from the release of powerful generative AI tools like ChatGPT and DALL-E to major advancements in AI for drug discovery (see: Google's DeepMind cracks the code on protein structures). Machine learning is quickly moving from theoretical research to real-world applications, and every organization is looking to tap into its potential.
So what exciting ML trends will emerge in 2023 and beyond? To help answer that question, we gathered predictions from our team at Predibase, many of whom are leading minds in AI. Their backgrounds range from AI research at Google to creating popular open-source projects like Ludwig and Horovod. Whether you're a seasoned AI expert or just starting to learn about the field, we hope you find our musings on the future thought-provoking and insightful, and we'd love to hear your predictions as well.
1. ML becomes indispensable for scientific research
The application of ML techniques to scientific discovery has been hindered by the specialized skills needed both to conceptualize valuable uses (mostly machine learning and AI knowledge) and to apply them fruitfully at scale (mostly computer science and software engineering knowledge). A new breed of ML models and technologies has lowered the barrier to entry and demonstrated the disproportionate impact ML can have on scientific research. With the democratization of ML, the time is right for widespread adoption of ML in chemistry, materials science, biology, pharmacology, agronomy, and many other areas.
CEO & Cofounder of Predibase, creator of Ludwig and previously founder of the AI Lab at Uber
2. Multi-modal model architectures see mass adoption
As enterprises attempt to harness the potential of large foundation models, multi-modal model architectures that combine unstructured data with structured data will become the norm in industry. The primary driver will be the need to tailor these pretrained models to supervised learning tasks using structured metadata (about users, products, etc.) associated with the unstructured data.
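To make this concrete, here is a minimal sketch of what such a multi-modal setup can look like in a declarative tool, written as a Ludwig-style configuration in a Python dict. It pairs a text column encoded by a pretrained model with structured metadata columns. The column names (`review_text`, `user_age`, `product_category`, `purchased`) are hypothetical, and the nested `encoder` block assumes a Ludwig 0.7-style schema; exact keys may differ across Ludwig versions.

```python
# Hypothetical dataset: each row has free-form review text plus structured
# metadata about the user and product. All column names are illustrative.
config = {
    "input_features": [
        # Unstructured input, encoded by a pretrained text model
        # (assumed Ludwig 0.7-style nested encoder config).
        {"name": "review_text", "type": "text", "encoder": {"type": "bert", "trainable": True}},
        # Structured metadata about the user and the product.
        {"name": "user_age", "type": "number"},
        {"name": "product_category", "type": "category"},
    ],
    # A concat combiner joins the per-feature encodings before the output head.
    "combiner": {"type": "concat"},
    "output_features": [
        {"name": "purchased", "type": "binary"},
    ],
}


def feature_types(cfg):
    """Return the input feature types, e.g. to check that a config mixes modalities."""
    return [feature["type"] for feature in cfg["input_features"]]


print(feature_types(config))
```

With Ludwig installed, a dict like this can typically be passed to `ludwig.api.LudwigModel(config)` and trained with `model.train(dataset=...)`; treat that as a pointer rather than a pinned, version-checked recipe.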
CTO & Cofounder of Predibase, creator of Horovod and previously Deep Learning Engineering Lead at Uber
3. A large language model provider will be the new AWS
Large language models (LLMs) like ChatGPT are rapidly growing in popularity, but they are limited by the publicly available information they are trained on. To get the most value out of an LLM, enterprises will need to train LLMs on their own documentation. For example, doctors and therapists might need to fine-tune an LLM on their own scripts. The company that is able to provide LLM-as-a-Service will become the new utility provider of the internet.
Sr. Data Scientist at Predibase and host of “The Data Scientist Show” with 200k+ followers on LinkedIn and previously Sr. Data Scientist at AWS
4. Deep learning goes mainstream for the citizen data scientist
Outside the ivory tower of academia and inside the gritty world of practical data science, there’s long been a belief that “XGBoost is all you need”. And those who held it have generally been right: gradient boosted machines dominated machine learning competitions, which tended to involve tabular datasets, and the ease and speed of linear and tree models (vs. neural networks) made them the preferred approach for the large majority of data scientists.
However, two factors will now drive the uptake of deep learning: 1) the practical impact demonstrated by transformer architectures and foundation models, which have put unstructured data modalities like text and images front and center for data scientists (structured data is “boring” now; unstructured data is where the envelope is being pushed), and 2) the rising popularity of higher-level abstractions that make deep learning much easier to use in practice, like fastai, PyTorch Lightning, Ludwig, and others.
CPO & Cofounder of Predibase and previously PM Lead at Kaggle and Google Assistant
5. Standardization will enable deployment of ML models at scale
Industry standards for data management and ML development will take hold in 2023. For example, we’ll see broad adoption of second-generation data lake table formats such as Apache Iceberg and Delta Lake for performant reading and writing of versioned data.
The ecosystem around Apache Arrow, the language-independent columnar memory format for efficient data transfer, will continue to grow as projects like Arrow Database Connectivity (ADBC, an API standard for database drivers) and Substrait (a cross-language specification for data compute operations) continue to mature.
Additionally, we will start to see standards evolve at the intersection of data contracts and model cards, providing a consistent protocol for defining the inputs, outputs, and configuration required for model training.
On the model inference side, the KFServing v2 inference protocol (KFServing has since been renamed KServe), which defines both REST and gRPC APIs, will continue to see broad adoption from both open-source solutions like Seldon Core and cloud ML vendors.
These standards provide consistent interfaces to load data and consume predictions, enabling a customer to start with open-source tooling and migrate to a managed machine learning platform like Predibase without breaking any downstream applications.
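To illustrate what a shared inference protocol buys you, here is a small stdlib-only sketch of a client-side request and response for the v2 REST API. It assumes the v2 JSON layout as we understand it: a `POST /v2/models/{model_name}/infer` endpoint, an `inputs` list of named tensors carrying `name`, `shape`, `datatype`, and row-major `data`, and an `outputs` list in the same form. The base URL, model name, tensor names, and the sample response are all hypothetical and made up for illustration.

```python
import json


def infer_url(base_url, model_name):
    """REST inference endpoint in the v2 protocol: POST /v2/models/{name}/infer."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name, rows, datatype="FP32"):
    """Wrap a 2-D batch of numbers as a single v2 input tensor.

    The tensor data is flattened in row-major order, and `shape` records the
    original [batch_size, num_features] dimensions.
    """
    shape = [len(rows), len(rows[0])]
    data = [value for row in rows for value in row]
    return {"inputs": [{"name": input_name, "shape": shape, "datatype": datatype, "data": data}]}


def parse_infer_response(body):
    """Map each output tensor name to its flat data list."""
    response = json.loads(body)
    return {output["name"]: output["data"] for output in response["outputs"]}


# Hypothetical server, model, and tensor names, for illustration only.
url = infer_url("http://localhost:8080", "churn-model")
request_body = json.dumps(build_infer_request("input-0", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))

# A made-up response in the shape the protocol describes (not from a real server).
sample_response = (
    '{"model_name": "churn-model", "outputs": '
    '[{"name": "output-0", "shape": [2, 1], "datatype": "FP32", "data": [0.87, 0.12]}]}'
)

print(url)
print(parse_infer_response(sample_response))
```

Because the request and response shapes are fixed by the protocol rather than by the serving stack, a client written this way should not need to change when the model moves between compatible servers; only the base URL does.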
Platform Engineering Manager at Predibase and previously Principal Solutions Architect at AWS
6. MLOps expands to support differentiated pre-trained models
As pre-trained models continue to proliferate across the machine learning landscape, there will be a race to apply them to proprietary data sources in pursuit of specialized and differentiated ML.
MLOps companies will move to offer services that support the renewed interest in pre-trained models, such as fine-tuning as a service, enterprise model hubs, model distillation, and synthetic data. As practitioners seek cost-efficient ML, a one-billion-parameter model will be created that performs on par with GPT-3.
ML Engineering Lead at Predibase and previously Sr. Engineer and NLP researcher at Google
7. AI reaches every corner of the corporate office
Every enterprise is investing in ML, and the advent of low-code ML tools like Predibase is making it even easier for data practitioners to build ML applications. That being said, the average knowledge worker probably interacts with ML rarely, if at all. That will rapidly change in 2023.
Generative AI tools like ChatGPT and DALL-E—which may seem like a novelty—are making it easier for a new breed of business personas to experiment with AI and unlock new use cases like marketing content development and product design (see: Hot Wheels and Stitch Fix). These tools still need refinement, and there are open questions around legality, but we're only scratching the surface of what's possible. As more of these tools make it into the mainstream, business teams from HR to Finance will begin experimenting.
Head of Marketing at Predibase and previously Marketing Director at Databricks
8. Open-source foundation models will make their way into the mainstream
Big Tech companies currently hold a disproportionately large amount of influence over how foundation models are ultimately trained and deployed; however, the data they use to train these models is largely publicly available. This means that, with sufficient hardware resources, time, and talent, there is no hard barrier to training an equivalently powerful foundation model.
A concerted effort from the OSS community (such as that spearheaded by the Together Project) to develop its own Foundation Models could accelerate innovation outside of Big Tech research labs and ultimately give enterprises access to powerful generative models that can be fine-tuned and deployed in any environment.
ML Engineer at Predibase and previously ML Researcher at Google and Stanford
9. Decreased corporate spending will put pressure on ML tools to provide demonstrable ROI
Generative AI models like GPT-3 and DALL-E have opened many people's eyes to the true power of ML. While new and increasingly powerful generative models will continue to be released, businesses will focus on how they can leverage many forms of ML, including generative modeling and supervised ML, to provide concrete business value. Success stories will likely be found across multiple departments in an organization, ranging from Marketing to Customer Success. In terms of tooling, horizontal platforms that can demonstrate quick time-to-value and widespread adoption will be well-positioned to take advantage of this opportunity.
Product Manager at Predibase and previously Cofounder and CTO at BlitzIQ
10. Machine learning becomes ubiquitous
In the 1990s, most programs didn't have a network stack. Today, it's hard to find an app without one. Consumers expect every product to communicate over networks and retrieve or store data in the cloud.
In a similar way, every software product in the near future will interact with ML. Features like personalization, natural language interfaces, and considerate systems will be expected in every product.
As with networking, most of us won't need to be machine learning experts, but we will all become familiar with its capabilities and learn how to interact with it.
ML Engineering Lead at Predibase and previously Engineering Lead at Google and cofounder of multiple startups including GameSalad
Get started with one of the biggest ML trends of 2023
Machine learning is finding its way into all areas of the enterprise, but data teams still struggle to build and deploy ML models at scale. Predibase is on a mission to change that. As the first low-code declarative ML platform, Predibase makes it easy for everyone—from developer to experienced data scientist—to train, optimize and deploy state-of-the-art models in minutes.