Launching the Generative AI Open Source (GenOS) Index

There is a lot of excitement about Generative AI these days and for good reason. The emergence of this technology feels like a fundamental platform shift - much like the early days of the Internet or Mobile - and opens our minds to what is possible in every layer of the software stack.

Just as with every other major technical stack in the past, communities of builders and users have stepped in and are building some of the most interesting Generative AI projects in open source. Several of the most active open source projects over the past few months, in both contribution and usage, have been in Generative AI. With the number of such projects increasing rapidly, Generative AI now deserves its own category in open source.

Today, we are launching a Generative AI Open Source (GenOS) Index to track the most active open source projects in Generative AI. We plan to update this index every month and identify the top 30 projects in terms of GitHub star growth (adds) in the preceding ninety days, with 500 star adds being the minimum for a project to be included. Furthermore, because there are enough differences in growth characteristics, we categorize the projects into three subcategories: Models, Infrastructure/Tools and Applications.
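The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the index: the `Project` record, the `build_index` function and the sample entries are hypothetical, and the star-add counts are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str        # hypothetical project name
    category: str    # "Models", "Infrastructure/Tools" or "Applications"
    star_adds: int   # GitHub stars added in the preceding ninety days


def build_index(projects, top_n=30, min_star_adds=500):
    """Keep projects with at least `min_star_adds` star adds, then
    return the `top_n` of them ranked by star adds, highest first."""
    eligible = [p for p in projects if p.star_adds >= min_star_adds]
    return sorted(eligible, key=lambda p: p.star_adds, reverse=True)[:top_n]


# Made-up sample data, for illustration only.
sample = [
    Project("example-model", "Models", 4200),
    Project("example-tool", "Infrastructure/Tools", 9100),
    Project("example-app", "Applications", 480),    # below the 500 minimum
    Project("example-app-2", "Applications", 1500),
]

index = build_index(sample, top_n=3)
for rank, project in enumerate(index, start=1):
    print(rank, project.name, project.category, project.star_adds)
```

With this sample, `example-app` is dropped for falling below the 500 star-add minimum, and the remaining three projects are ranked by star adds.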

The GenOS Index - March 2023

In Q1 2023, 5,716 open source projects in total - existing or newly created - added at least 500 GitHub stars. Of those, we identified 187 as Generative AI projects, with 46 (25%) being Models, 83 (44%) Infrastructure/Tools and 58 (31%) Applications. Among the 30 fastest-growing Generative AI projects, the split across the three subcategories changes to 33% Models, 40% Infrastructure/Tools and 27% Applications.
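As a quick check on the arithmetic, the percentages can be recomputed from the counts given above. The per-subcategory counts for the top 30 are not stated in the post; the values derived below are an inference from the rounded percentages and should be read as an assumption.

```python
# Counts stated in the post for Q1 2023.
counts = {"Models": 46, "Infrastructure/Tools": 83, "Applications": 58}
total_genai = sum(counts.values())   # 187 Generative AI projects
total_projects = 5716                # all projects with at least 500 star adds

# Share of each subcategory among the 187 projects, rounded to whole percent.
shares = {name: round(100 * n / total_genai) for name, n in counts.items()}

# Share of Generative AI projects among all qualifying projects.
genai_share = round(100 * total_genai / total_projects, 1)

# Assumption: project counts implied by the rounded top-30 percentages.
top30_pct = {"Models": 33, "Infrastructure/Tools": 40, "Applications": 27}
implied_top30 = {name: round(30 * pct / 100) for name, pct in top30_pct.items()}

print(shares)         # {'Models': 25, 'Infrastructure/Tools': 44, 'Applications': 31}
print(genai_share)    # 3.3
print(implied_top30)  # {'Models': 10, 'Infrastructure/Tools': 12, 'Applications': 8}
```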

The top 30 Generative AI projects, ranked by the number of GitHub stars added during the last ninety days, are the following:

Top 30 fastest-growing Generative AI open source projects with GitHub star adds in Q1 2023

The Rising Stars

Beyond the top 30, there were several other Generative AI projects that we expect to gain adoption and break into the GenOS Index. Here are the five Rising Stars that we liked the most:

  • Alpa: Automate large-scale distributed training and serving with just a few lines of code
  • CarperAI trlX: Distributed training framework to fine-tune large language models with reinforcement learning
  • Modelscope: Model-as-a-Service offering 700+ state-of-the-art models across multiple domains
  • ShellGPT: Command-line tool to generate shell commands, code snippets, comments, and documentation
  • ChatRWKV: Like ChatGPT but powered by RWKV - an RNN with transformer-level LLM performance

Takeaways

In the current GenOS Index, Infrastructure/Tools projects were the most active. They represent 40% of the top 30 and 44% of all 187 Generative AI projects that added at least 500 stars. This reflects the emerging nature of the Generative AI category, which requires building the right infrastructure and toolchain first so that users can train models and build AI applications. Across all 187 projects, Applications were the second most active subcategory in Q1 (31%), followed by Models (25%); within the top 30 the order flips, with Models at 33% and Applications at 27%.

Models: In the Models category, we see a strong representation of “lightweight” GPT models such as nanoGPT, minGPT and PicoGPT, as well as ALBERT, which is a “lite” version of BERT from Google, and Cramming, which enables training a BERT-like LLM with limited compute. This indicates strong demand for OpenAI alternatives that are easier to train, possibly on proprietary domain data, and that run at a fraction of the cost. As more products get powered by AI, we anticipate growing demand for such lightweight GPT models in open source. In addition, we see the emergence of LLMs for languages beyond English: GLM-130B for English and Chinese, PhoBERT for Vietnamese and Rinna for Japanese.

Infrastructure/Tools: Among Infrastructure/Tools projects, ColossalAI, Petals and CarperAI trlX, which focus on making large AI models cheaper and faster to train and more accessible for inference, are gaining adoption. The same is true for projects like LangChain and LlamaIndex (GPT Index) that enable LLMs to connect with external data. Finally, vector databases such as Weaviate, Qdrant and Milvus are doing well; they scale similarity search by storing both vectors and objects in a database and making the data available through GraphQL, REST and other clients.

Applications: In the Applications category, we see a wide variety of applications powered by GPT, as expected from a community that is experimenting with many diverse use cases. Several of the projects, such as lencx/ChatGPT, wechat-ChatGPT and ChatGPT-Mac, make ChatGPT available through other interfaces such as a desktop app, WeChat and a menu bar. As Generative AI technology and infrastructure mature and become accessible to a wider audience of builders and creators, we expect projects in the Applications subcategory to surpass those in Models and Infrastructure/Tools.

What’s next

While Generative AI projects were only 3.3% of the 5,716 open source projects that added at least 500 stars in Q1, we anticipate the share of fast-growing Generative AI projects to increase materially over the next several quarters.

Stay tuned for monthly installments of the Decibel GenOS Index!