2023 was a profound year for AI. The launch and rapid rise of OpenAI’s ChatGPT set off a tidal wave of GenAI innovation. Meta soon followed by open-sourcing Llama-2, igniting an LLM arms race. Open-source models dominated much of the conversation last year as weekly releases flooded the leaderboards, and standouts like Zephyr, Mistral/Mixtral, and Phi showed what was possible with a much smaller model footprint.
It wasn’t just models driving innovation. Research in areas like parameter-efficient fine-tuning (e.g., LoRA) and quantization further demonstrated the ability of smaller models to outperform costly commercial alternatives. The open-source community also made strides across the broader LLMOps workflow with new frameworks like LangChain for orchestration and LoRAX for serving many fine-tuned LLMs on a single GPU.
As we turn the page on 2023, the question becomes: what’s in store for 2024? To help answer that question, we gathered predictions from our team of AI experts, whose backgrounds span research at Google and Uber to leading open-source projects like Ludwig, Horovod, and LoRAX. We hope you find our predictions thought-provoking, and we’d love to hear yours too!
1. Small Language Models will drive enterprise AI adoption
Small Language Models (SLMs) will drive a shift in the AI landscape. SLMs fine-tuned on modestly sized, high-quality datasets (e.g., ~10,000 examples) have already demonstrated performance comparable to or better than much larger LLMs, with dramatically smaller resource requirements, faster inference, and lower costs.
The economics of SLMs are so compelling that large enterprises will start adopting them to put into production the prototypes that were initially built with commercial APIs but shelved due to cost concerns. Smaller companies, for which the cost of scaling commercial APIs was unsustainable, will adopt SLMs to make their business models viable.
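To make the mechanics concrete, here is a minimal sketch of fine-tuning a small open-source model on a modest curated dataset with LoRA, assuming the Hugging Face transformers/peft/datasets stack. The base checkpoint, dataset path, and hyperparameters are illustrative placeholders, not a prescribed recipe.

```python
# Minimal sketch: parameter-efficient fine-tuning of a small open-source model
# with LoRA. Checkpoint, data file, and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "mistralai/Mistral-7B-v0.1"  # example checkpoint, swap in your own
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of the full weights.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# A few thousand curated examples, tokenized into fixed-length sequences.
def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

data = load_dataset("json", data_files="train.jsonl")["train"].map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-lora", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=data,
)
trainer.train()
model.save_pretrained("slm-lora")  # only the small adapter weights are saved
```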
Piero Molino
Chief Scientific Officer & Cofounder of Predibase, creator of Ludwig, and previously founded the AI Lab at Uber
2. Mixture of Experts (MoE) + LoRA will enable small language models to outperform 10x larger LLMs
In late 2023, the release of Mixtral-8x7b confirmed a long-held belief in the LLM community: the “mixture of experts” model architecture works, allowing model performance to scale without a proportional increase in active parameters or compute per token. Many are now asking: do these efficient performance gains from sparse activation extend to mixtures of task-specific LoRA adapters?
Early evidence from academia, including DARE and TIES (post-train merging), as well as PESC and MoLoRA (train-time MoE), suggests that yes, sparsely activated LoRAs not only work but can outperform the alternatives. Combined with the rise of franken-merges like Phixtral-2x2b and tools like mergekit and LoRAX, the stage is set for small and cheap mixture-of-LoRAs architectures to emerge and start conquering the LLM leaderboards.
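To illustrate the general idea, here is a toy PyTorch sketch of a sparsely activated mixture of LoRA adapters over a single frozen linear layer. It is not the recipe from any of the papers above; the module, router, and shapes are purely illustrative.

```python
# Toy sketch: one frozen linear layer, several LoRA adapters, and a per-token
# router that activates only the top-k adapters. Illustrative, not a paper's recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfLoRAs(nn.Module):
    def __init__(self, d_in, d_out, num_adapters=4, rank=8, top_k=2):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)            # stands in for a frozen pretrained weight
        self.base.weight.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(num_adapters, d_in, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(num_adapters, rank, d_out))
        self.router = nn.Linear(d_in, num_adapters)   # per-token gating network
        self.top_k = top_k

    def forward(self, x):                             # x: (batch, seq, d_in)
        gates = F.softmax(self.router(x), dim=-1)     # (batch, seq, num_adapters)
        topk_vals, topk_idx = gates.topk(self.top_k, dim=-1)
        mask = torch.zeros_like(gates).scatter_(-1, topk_idx, topk_vals)
        # Each adapter's low-rank update x @ A_i @ B_i, weighted by its gate value.
        delta = torch.einsum("bsd,ndr,nro->bsno", x, self.lora_a, self.lora_b)
        return self.base(x) + (mask.unsqueeze(-1) * delta).sum(dim=-2)

x = torch.randn(2, 16, 64)
print(MixtureOfLoRAs(64, 64)(x).shape)  # torch.Size([2, 16, 64])
```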
Travis Addair
CTO & Cofounder of Predibase, creator of LoRAX, Horovod lead maintainer, and former Deep Learning Engineering Lead at Uber
3. Open-source becomes the de facto way to put LLMs into production
Almost every customer working with GenAI that we talk to today tells a similar story: “We’re building a prototype with OpenAI, but we think we’ll need to move off it when we go live.” The key reasons are clear: APIs for monolithic models get expensive at scale, and if AI is core to the organization’s strategy, then maintaining full agency over their models is a must. As one customer put it, “We don’t want to be API monkeys.”
The path forward is for companies to own their models, which means adopting open-source. OpenAI and other walled-garden APIs will still have a significant role to play in enterprise AI adoption, but predominantly in prototyping and analysis workflows rather than production. And the relative market share of AI models will shift dramatically toward open-source and away from closed models, reversing today’s balance.
Devvret Rishi
CEO & Cofounder of Predibase and former PM Lead at Kaggle and Google Assistant
4. Large language models will adopt modular architectures
Large language models today have remarkable capabilities for many tasks but struggle to reason consistently and often hallucinate in ways that are hard to detect. We are starting to see the promise of MoE (mixture of experts) models such as Mixtral-8x7b, which allow model and dataset size to scale within the same compute budget compared to traditional dense architectures.
I expect to see more fundamental breakthroughs in design patterns, similar to the trend in traditional software development from monolith to microservices. For example, adding new System 2 thinking capabilities that generalize more powerfully using knowledge graphs or other data structures could improve the correctness and latency of model inference while ideally preserving end-to-end training, which has proven to be so powerful.
Making things more modular could also be a step towards better explainability, supporting ongoing efforts in responsible AI. Explainability will be increasingly important as governments worldwide introduce legislation requiring companies to understand and audit their AI systems.
Julian Bright
Platform Engineering Lead at Predibase and former Principal Solutions Architect at AWS
5. The synthetic data revolution arrives as the best models are all trained on some amount of synthetic data
2024 will mark a turning point in AI, driven by a newfound emphasis on high-quality, synthetic data.
This shift was best epitomized in 2023 by Microsoft's Phi models, which demonstrated that models trained on 'textbook-quality' synthetic data can significantly outperform larger counterparts. Magicoder models trained on simple synthetic variations of open-source code samples easily close the gap with top code models. E5 set impressive new benchmarks in text embeddings using entirely synthetic data and minimal training. In evaluation, frameworks like EvalPlus integrate synthetically generated test cases, improving the accuracy of assessments of LLM-synthesized code. In vision, SynCLR’s use of synthetic images and captions for visual learning challenges traditional data sources, demonstrating comparable, if not superior, performance. Success stories like these, built on fully or partially synthetic data, will become more frequent, more widespread, and sit atop the leaderboards.
In 2024, the focus will pivot towards datasets that are smaller in scale but unparalleled in quality. The continued affirmation that models tuned with a few thousand meticulously curated examples can easily surpass those trained on millions of mediocre ones will redefine the standards for data curation in AI. Synthetic data will complement or supplant traditional data sources, marking a significant milestone in the evolution of artificial intelligence.
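As a rough illustration of what this kind of curation loop can look like, here is a minimal sketch that expands a handful of seed examples into synthetic training pairs with an instruction-tuned open model and applies crude quality filters, assuming the Hugging Face transformers pipeline API. The checkpoint, prompt format, and filtering thresholds are placeholders, not any specific paper's method.

```python
# Minimal sketch of hybrid-synthetic data curation: expand seed examples with an
# instruction-tuned open model, then filter. Checkpoint and prompt are placeholders.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

seeds = [
    {"instruction": "Summarize the refund policy in one sentence.",
     "response": "Purchases can be refunded within 30 days with a receipt."},
]

synthetic = []
for seed in seeds:
    prompt = (
        "Write a new instruction/response pair in the same style as this example, "
        "on a different topic.\n"
        f"Instruction: {seed['instruction']}\nResponse: {seed['response']}\n"
        "New pair:"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.8)
    text = out[0]["generated_text"][len(prompt):].strip()
    # Crude quality filters: drop empty, too-short, or duplicate generations.
    if len(text) > 40 and text not in {s["raw"] for s in synthetic}:
        synthetic.append({"raw": text})

print(f"kept {len(synthetic)} synthetic examples")
```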
Justin Zhao
Staff Engineer at Predibase and maintainer of Ludwig, and former Sr. Engineer and NLP researcher at Google
6. LLM hallucinations become a thing of the past as training techniques evolve
One of the biggest challenges of putting LLMs into production today is hallucination, i.e., when models generate inaccurate or made-up outputs that are factually incorrect or don’t conform to real patterns. Current approaches to dealing with hallucinations include prompting techniques like Chain-of-Thought (CoT) and self-consistency; Retrieval Augmented Generation (RAG), which combines embedding models, vector databases, and approximate kNN search; fine-tuning with parameter-efficient methods such as LoRA and IA3; and building guardrails around LLM outputs to validate quality and structure. While these methods work independently or in conjunction to reduce the impact of hallucinations, none of them completely eliminates hallucinations because of a fundamental limitation of the training objective: next-token prediction is a probabilistic task.
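For reference, here is a bare-bones sketch of the RAG ingredients listed above, assuming sentence-transformers for embeddings. Exact cosine-similarity search stands in for an ANN index, the documents are toy examples, and `call_llm` is a hypothetical stand-in for whatever model endpoint you use.

```python
# Bare-bones RAG sketch: embed a document store, retrieve nearest neighbors for a
# query, and prepend them to the prompt so the model grounds its answer.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The warranty covers manufacturing defects for 12 months.",
    "Shipping to EU countries takes 3-5 business days.",
    "Returns require the original packaging and proof of purchase.",
]
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query, k=2):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                      # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(-scores)[:k]]

query = "How long is the warranty?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
# answer = call_llm(prompt)  # hypothetical: any hosted or local LLM
print(prompt)
```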
In 2024, we will see significant improvements in reducing LLM hallucinations. In the short term, I expect a stronger "better together" story for RAG + fine-tuning that involves jointly fine-tuning the embedder and the LLM, building on the original RA-DIT paper and improving knowledge utilization and contextual awareness. There will also be a paradigm shift from model-centric to data-centric pretraining: pretraining data will become highly curated, often synthetically generated and pruned, with domain adaptation applied at the pretraining stage to constrain the model’s outputs, prevent the sleeper-agent phenomenon, and address legal issues that arise from pretraining on web data.
The most aspirational change I expect is a new LLM training objective that moves beyond next-token prediction, perhaps one that infuses symbolic representations into the objective to better guide and improve reasoning capabilities with priors. Alternatively, we may see a better way to dynamically inject symbolic representations at inference time that does not rely on the approximate nearest neighbor search we currently see with RAG.
Arnav Garg
ML Engineer at Predibase and maintainer of Ludwig and former ML Scientist at Atlassian
7. Data (not LLMs) becomes the competitive moat
In 2024, it will become evident that the unique and defensible value of generative AI applications lies in the data used to train the models.
Many companies have realized that commercial LLMs are not cost-effective, low-latency, or scalable enough, and that open-source LLMs are the future. In this new world of task-specific, open-source models, curating a diverse, balanced, and representative dataset for fine-tuning becomes the primary challenge.
This coming year, we will see a rise in data preparation tooling, an increase in the discourse around dataset best practices, and acknowledgment that leveraging proprietary, high-quality, and customer-specific data is the key to unlocking extraordinary model performance.
Abhay Malik
Product Manager at Predibase, former Cofounder and CTO of BlitzIQ, and Y Combinator alum
8. LLM adoption grows as the Transformer architecture matures
Engineering best practices will be established for combining the fine-tuning and efficient serving of open-source LLMs with techniques such as RAG to control hallucinations. At the same time, continued innovation at the Transformer model and architecture levels will significantly improve the reasoning capabilities of LLMs. Together, these advancements will boost confidence in the safety and security of AI-based solutions, leading to broader adoption for business applications, such as customer support, and engendering trust from companies in regulated industries, such as finance and healthcare.
Alex Sherstinsky
Developer Relations at Predibase, maintainer of Ludwig, former staff engineer at GreatExpectations.io, Sr. Data/ML Architect at Directly, and Cofounder of GrowthHackers and Qualaroo
9. The shift from closed to open-source democratizes AI while improving transparency
Closed-source or commercial LLM providers will continue to be replaced by open-source alternatives, thereby democratizing access to AI. This shift is driven by the emergence of open-source LLMs (Llama, Mixtral, etc.) that match and often exceed the quality of their closed-source counterparts. These open-source models are accessible to the public and can be run on free computing platforms such as Google Colab or cheaply through LLM hosting providers.
One of the key advantages of this transition is the transparency it brings to the AI landscape. Users will be able to examine the underlying data used to train these models and fine-tune on higher quality data, where toxic content has been filtered out to mitigate negative biases towards specific individuals or groups without sacrificing accuracy. This adaptability also allows models to be tailored for particular tasks, including addressing challenges in under-resourced languages, further enhancing the inclusivity and ethical considerations in AI applications. As a result, the democratization of AI through open-source models is poised to empower a broader range of users and foster a more transparent and responsible AI ecosystem.
Joppe Geluykens
ML Solutions Engineer at Predibase and former Sr. Data Scientist and Researcher at IBM
10. LLMs for machines, not just people
Today, the most visible use cases for LLMs involve generating natural language text for human consumption. Such use cases are only a tiny portion of what LLMs can accomplish. In 2024, I expect a push toward integrating LLMs with existing software systems, enabling the automation of previously un-automatable tasks involving natural language, reasoning, and more.
Because these new integrations and data pipelines require LLMs to generate output for machine, not human, consumption, the ability to reliably fine-tune and/or constrain LLM outputs to conform to arbitrary formal schemas will become a critical need.
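One simple way to approach this today is to validate model output against a formal schema and retry on failure. Here is a hedged sketch using pydantic; `call_llm` is a hypothetical stand-in for any model endpoint, and the ticket schema is an arbitrary example.

```python
# Sketch: validate machine-readable LLM output against a schema and retry on
# failure. `call_llm` is a hypothetical endpoint; the schema is an example.
import json
from pydantic import BaseModel, ValidationError

class SupportTicket(BaseModel):
    category: str        # e.g. "billing", "shipping", "technical"
    priority: int        # 1 (low) to 5 (critical)
    summary: str

def extract_ticket(message, call_llm, max_retries=3):
    prompt = (
        "Extract a support ticket from the message below and respond with JSON "
        'containing the keys "category" (str), "priority" (int, 1-5), and "summary" (str).\n\n'
        f"Message: {message}"
    )
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            return SupportTicket(**json.loads(raw))   # schema-checked, typed output
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the error back so the model can repair its own output.
            prompt = f"{prompt}\n\nPrevious attempt was invalid ({err}). Return valid JSON only."
    raise RuntimeError("model never produced schema-conformant output")
```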
Jeffrey Tang
Founding Sr. Software Engineer at Predibase and former Sr. Engineer at Google and Cloudflare
11. AI-optimized web browsing experiences take flight
A quiet revolution has been brewing with the aim of running LLMs and Generative AI in your web browser. Current demos make it clear that it’s possible to do this; the only remaining problems are engineering ones — optimizing AI for the browser and building features on top.
Some use cases that I predict we’ll see by year’s end include ubiquitous text generation and summarization as you type, web content that changes based on your physical location or time of day, and imagery tailored to current world events. eCommerce, in particular, will benefit from dynamic product pages that show products in curated, personalized settings, for example, depicting Le Mans for site visitors from France and the Glastonbury Music Festival for site visitors from Great Britain. The dream of recreating an in-store experience online will finally be possible for retailers.
Martin Davis
Product Experience Lead at Predibase and former Front End Manager at Mast Global
Get started with one of the biggest LLM trends: Fine-tuning smaller, faster, open-source models
Ready to fine-tune LLMs for your use case?
Try out the Predibase free trial to train, serve, and prompt popular open-source LLMs—like Llama-2, Zephyr, and Mistral—on scalable serverless infra.