
GenOS Index - The June Edition: 5 Emerging Trends among Generative AI Open Source Projects

Generative AI Open Source (GenOS) Index - June 2023

In early April, we launched the Generative AI Open Source (GenOS) Index to track the top open source projects related to Generative AI and LLMs. Since then, we have published two more editions of the GenOS Index - the April edition that showed frenetic activity in the Generative AI space and the May edition that showed some early signals of stability. This month, we are back with the June edition of the GenOS Index, showcasing the fastest growing open source projects in Generative AI between the beginning of March and end of June, and presenting 5 trends that we are seeing among the projects that made it to the GenOS Index.

First, a quick refresher on the methodology. Every month, we identify the top 30 open source projects in Generative AI as ranked by GitHub star growth (adds) in the preceding 90 days, with 500-star adds being the minimum for a project to be considered. Furthermore, we categorize the projects into three categories - Models, Infrastructure/Tooling and Applications - to provide visibility into how different parts of the Generative AI ecosystem are evolving.
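The ranking rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the actual pipeline behind the GenOS Index: the `Project` class, its field names, the `rank_projects` helper, and the sample star counts are all hypothetical, and the sketch assumes star counts at the start and end of the 90-day window are already available. It also treats the 500-star minimum as inclusive, which is an assumption.

```python
from dataclasses import dataclass

MIN_STAR_ADDS = 500  # minimum star adds in the window, per the methodology
TOP_N = 30           # number of projects that make the index


@dataclass(frozen=True)
class Project:
    """Hypothetical record for one open source project."""
    name: str
    category: str      # "Models", "Infrastructure/Tooling" or "Applications"
    stars_start: int   # GitHub stars at the start of the 90-day window
    stars_end: int     # GitHub stars at the end of the 90-day window

    @property
    def star_adds(self) -> int:
        # Star growth ("adds") over the window.
        return self.stars_end - self.stars_start


def rank_projects(projects, min_adds=MIN_STAR_ADDS, top_n=TOP_N):
    """Keep projects meeting the minimum, then rank by star adds, descending."""
    eligible = [p for p in projects if p.star_adds >= min_adds]
    return sorted(eligible, key=lambda p: p.star_adds, reverse=True)[:top_n]


# Made-up sample data, for illustration only.
sample = [
    Project("project-a", "Applications", 1_000, 9_000),
    Project("project-b", "Models", 200, 600),  # 400 adds: below the minimum
    Project("project-c", "Infrastructure/Tooling", 5_000, 7_500),
]

ranked = rank_projects(sample)
for rank, p in enumerate(ranked, start=1):
    print(f"#{rank} {p.name} ({p.category}): +{p.star_adds} stars")
```

Ranking on star adds rather than total stars is what lets newer, fast-growing projects surface ahead of older repositories with larger absolute star counts.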

The key takeaways from this month’s GenOS Index are as below.

Rise of the agents

In the three most recent editions of the GenOS Index, including the current one, Auto-GPT has remained at the very top of all open source projects in Generative AI. Additionally, multiple projects that help build autonomous agents, such as SuperAGI (a dev-first autonomous agent framework and a new entrant this time), AgentGPT (deploying AI agents in browsers) and BabyAGI, have featured in the GenOS Index, indicating strong user demand for projects that enable automation. While most of these agent frameworks started with hobbyist use cases, many are now being used to automate task-level enterprise use cases, opening up a large market with available budgets. Looking forward, we anticipate that some of these agent frameworks will graduate from “planning” the automation of a task to actually calling underlying services to accomplish the end objective - we are seeing some early signs of that already.

Privacy matters

With enterprise use cases coming to the fore, there has clearly been strong interest in projects that allow one to interact privately with language models, based on proprietary data and in secure environments. No other project captures this trend better than PrivateGPT, which broke into the GenOS Index at #7 in the May edition and sits at #5 this month. This month, we have two new projects in this category breaking into the top 30 - Quivr at #24 and LocalGPT at #26 - further reinforcing the user demand for data privacy and control.

CPU is the new GPU, and so will be the edge (soon)

To really put the power of LLMs into the hands of users, these models will need to run on devices - read CPUs - and at the edge. Two projects in particular, GPT4All and llama.cpp, both of which have consistently shown up in previous editions of the GenOS Index as well as in the current one, are targeting exactly that. We expect that trend to accelerate further as LLMs are deployed in real use cases, many of which will require “inference at the edge.”

Building of LLM Infrastructure/Tooling continues unabated

Building the infrastructure and tools to train and run LLMs at scale shows no sign of slowing down. As in previous editions of the GenOS Index, over a third of the projects in the current edition are in the Infrastructure/Tooling category. LangChain continues to lead the way (remaining unmoved at #4 and being included in every edition of the GenOS Index since its launch in April), followed by three projects from Microsoft - JARVIS (at #11; a collaborative system where multiple AI models can be used to achieve a given task, with ChatGPT acting as the controller), DeepSpeed (at #13; a deep learning optimization library that makes distributed training and inference fast and easy) and Guidance (a new entrant at #18; a guidance language to control LLMs) - and finally Flowise (a new entrant at #21; a drag-and-drop UI to build custom LLM flows).

Generative AI gets a voice

While LLMs originally started with text and then graduated to images, this month’s GenOS Index includes multiple new entrants that are tackling audio, the final frontier of making Generative AI models truly multimodal - AudioCraft (a PyTorch library for deep learning research on audio generation), AudioGPT (a dialogue assistant like ChatGPT, but with audio as input), and Bark (a text-to-audio model that can generate highly realistic, multilingual speech as well as other audio).

The list of all top 30 projects in this month’s GenOS Index is as follows:

The Rising Stars

As we have done in the past, we highlight below a few other interesting projects that, while not on the GenOS Index this month, have gained significant traction and are anticipated to break into a future edition of the GenOS Index:

  • Tabby: This is a self-hosted AI coding assistant, i.e., an open source, and thereby on-prem, alternative to GitHub Copilot. Given that most other coding assistants are offered as SaaS, Tabby is clearly resonating with those who want the same capabilities on-prem and on a proprietary codebase.
  • PandasAI: This project adds Generative AI capabilities to Pandas, the highly popular data analysis and manipulation tool.
  • Semantic Kernel: The SK project from Microsoft is a lightweight SDK, enabling integration of AI Large Language Models (LLMs) with conventional programming languages, and thereby helping developers build AI-first apps faster.
  • FinGPT: This is a really interesting project aiming to democratize internet-scale financial data and help fine-tune financial models at a fraction of the cost of retraining a model from scratch. The goal here is to make sure that the financial language model can be swiftly and cheaply adapted to align with new financial data.
  • DB-GPT: The DB-GPT project uses localized GPT large models to interact with private data and support private domain knowledge base question-answering capability, while ensuring there is no risk of data leakage, and the data is 100% private and secure.

That is it for this edition of the GenOS Index. Going forward, we plan to move to quarterly installments now that the top open source projects in Generative AI have somewhat stabilized. So, stay tuned for the Q3-2023 GenOS Index to be published in early October!