Holywood News

No GPU, no problem: Ziroh Labs can run AI models on CPUs

But what if AI could run on a CPU with no loss of speed or quality?

California-based deep-tech startup Ziroh Labs has partnered with the Indian Institute of Technology Madras (IIT Madras) and the IITM Pravartak Technologies Foundation to build the Kompact AI platform. Within a month of its release, Ziroh Labs received more than 200 requests to use the product. “So far, we have only managed about 50 meetings, seven to eight a day,” Hrishikesh Dewan, co-founder of Ziroh Labs, said in an interview with Mint.

CPUs typically have a small number of cores (often 4 to 16) designed to handle complex tasks quickly, while GPUs have hundreds or thousands of simpler cores that run many tasks simultaneously, making them ideal for AI and machine learning. However, GPUs are scarce and far more expensive than CPUs, according to tech industry observer Jon Peddie Research.

In contrast, CPUs are far more plentiful, with at least one in nearly every household, Dewan notes. He believes that if these CPUs can be made to run AI, access can be unlocked immediately. “Startups can call any data center and get CPU-based machines. So accessibility is increased, costs are reduced, and power requirements are lower, making AI more sustainable,” Dewan explained.

How does Kompact AI work?

Dewan explains that AI models are essentially sets of equations that can be computed. This means they can run not only on GPUs but also on CPUs, even on the chips inside a washing machine or refrigerator. The real question, according to him, is not whether you can run AI models on a CPU, but how long it takes to obtain the output, and what the quality of that output is.
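The point that a model is “just equations” can be made concrete with a toy example. The sketch below runs a single neural-network layer using nothing but plain CPU arithmetic; the weights and input are made up for illustration, and nothing in it requires a GPU.

```python
import numpy as np

# A neural-network layer is just matrix arithmetic: y = activation(W @ x + b).
# Any CPU, including an embedded one, can compute this.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # toy weight matrix: 4 outputs, 3 inputs
b = np.zeros(4)                   # bias vector
x = np.array([1.0, 2.0, 3.0])     # input features

y = np.maximum(W @ x + b, 0.0)    # one layer with a ReLU activation
print(y.shape)                    # (4,)
```

A real model is millions of such layers' worth of arithmetic, so the practical questions become speed and output quality, exactly as Dewan frames them.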

For example, companies use techniques such as quantization and distillation to run advanced AI on conventional CPUs, but these techniques can compromise output quality. How? Quantization reduces the precision of the numbers inside the AI model (rather than 32-bit numbers, it may use 8-bit ones). Distillation takes a large model (the “teacher”) and trains a smaller, simpler model (the “student”) to mimic the teacher's behavior. Student models run faster but are usually not as capable as their teachers: the simplification means some knowledge is lost in translation.
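The precision loss from quantization can be seen in a few lines. This is a minimal sketch of symmetric int8 quantization on a made-up weight vector, not Kompact AI's or any particular library's method; it simply shows the round-trip rounding error that the article says such techniques introduce.

```python
import numpy as np

# Symmetric int8 quantization: map float32 weights onto 255 integer levels
# (-127..127) using a single per-tensor scale. The quantize -> dequantize
# round trip introduces the small precision loss discussed above.

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                       # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.80, 0.33, 0.05], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())   # small but nonzero rounding error
```

Across billions of weights, these small per-number errors can accumulate into a measurable drop in output quality, which is why skipping quantization preserves the model exactly as designed.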

According to Dewan, Kompact AI avoids quantization and distillation, keeping the model's full quality intact and producing the output its designers intended. “First, we kept the original model without any quantization or distillation, so it performs as designed. Second, we optimized it for a specific processor, down to its instruction set. That pairing of model and processor is what we call Kompact AI.”

Kompact Menu

Kompact AI provides a library of pre-optimized AI models covering text, voice, image and multimodal applications, all designed to run smoothly on CPUs. Developers worldwide can access these models and integrate them into their applications through Kompact's common AI runtime (called the Common AI Language Runtime, or ICAN), which supports more than 10 programming languages. A runtime takes the code you wrote and ensures it actually executes efficiently while the program is running. Dewan said the company “can run these models on its own CPUs, both on-premises and in the cloud.”
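Ziroh Labs has not published ICAN's API, so the sketch below is purely illustrative: the names (`ModelRegistry`, `register`, `load`) are invented for this example. What it shows is the core idea the article describes, namely a runtime that pairs a pre-optimized model artifact with the host CPU type and exposes one loading interface regardless of which build is behind it.

```python
import platform

class ModelRegistry:
    """Illustrative stand-in for a runtime's model catalog: maps
    (model name, CPU type) pairs to CPU-specific optimized artifacts."""

    def __init__(self):
        self._models = {}

    def register(self, name, cpu, artifact):
        self._models[(name, cpu)] = artifact

    def load(self, name):
        cpu = platform.machine()          # e.g. "x86_64" or "arm64"
        key = (name, cpu)
        if key not in self._models:
            raise KeyError(f"no optimized build of {name} for {cpu}")
        return self._models[key]

registry = ModelRegistry()
# A trivial "model" stands in for a real optimized artifact here.
registry.register("llama-demo", platform.machine(), lambda prompt: prompt.upper())
model = registry.load("llama-demo")
print(model("hello"))                     # HELLO
```

The design point is that the caller never names a processor; the runtime detects the host CPU and serves the matching build, which is how a single API can front many model-processor pairings.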

Ziroh Labs has optimized 17 models across eight processor types, including models from the DeepSeek series, Microsoft's Phi models, Meta's Llama and Alibaba's Qwen series. These models are optimized for processors such as Intel's Xeon Silver, Gold, Platinum and Emerald lines. Take Llama, a 27-billion-parameter model, as an example. “We first optimize it theoretically, and then we fine-tune it (teaching a trained model to perform a specific task better by giving it new, focused examples) for each processor. The product is the optimized model and its matching runtime. The key part is this: we retain the full quality of the model, and that quality was validated through our work with IIT Madras,” Dewan explained.

To be sure, other companies are also developing AI models that run on CPUs to reduce dependence on GPUs. In April, for example, Microsoft Research introduced BitNet b1.58, a “1-bit” AI model that can run on standard CPUs, including Apple's M2. However, to reach the desired level of performance, Microsoft uses its custom framework bitnet.cpp, which currently works only with certain types of hardware.

Intel's models

Intel has also optimized over 500 AI models to run on its Core Ultra processors, but these chips include built-in components such as neural processing units (NPUs) to help with on-device AI processing. Intel's Meteor Lake CPUs also have an integrated NPU. Arm's CPU architecture is tailored for AI inference tasks (where a trained model makes predictions or provides answers based on new inputs), supporting workloads ranging from large language models (LLMs) to domain-specific models, making it suitable for edge devices and energy-efficient deployments. Neural Magic (acquired by IBM unit Red Hat) optimizes deep learning algorithms to run effectively on CPUs by leveraging the chips' large memory and complex cores.

“While others may pursue the challenge of building larger models, we focus on using domain-specific models to solve real-world problems,” Dewan said. Kompact AI plans to start with small models of 50 billion parameters or fewer. Dewan believes Kompact AI will find applications in many areas. “Specific use cases, such as helping farmers, supporting students or assisting frontline workers in rural areas, are best served by smaller, task-specific AI models running on smartphones and low-power devices, ensuring accessibility where it matters most,” he explained.

It can even be used for “green computing”: running in data centers powered by renewable energy. Since most data centers already have CPUs and power in place, unlike the GPUs everyone is chasing, no new infrastructure is needed, making Kompact AI a greener option, Dewan explains. Other use cases include income-tax AI, point-of-sale (POS) systems, kirana stores, and climate and water solutions, he added. “Everyone wants to add intelligence to their workflow, and we make it all possible,” he said.

Uncompromising privacy

Dewan said Kompact AI also helps maintain privacy. “Remember when everyone was uploading photos to OpenAI tools like Sora or DALL·E? Someone asked: where do those photos go? Today, AI is so powerful that a single photo can be used to generate hundreds more. And yes, OpenAI's terms allow it to use that data. So the question becomes: where is your data, and how is it used?

“Now, the model can run on your own device or on your own server (so-called 'edge' computing), so your data stays with you, resolving your privacy concerns because you don't need to send anything out,” he explained.

As for funding, Ziroh Labs has not announced any details yet, but “we have raised funds, mostly from angel investors in the Bay Area, and now from institutional investors,” Dewan said. “We will reveal more when the time is right. However, this has to succeed; there is no room for failure.”

Key Points

  1. Ziroh Labs' Kompact AI allows AI models to run efficiently on CPUs rather than GPUs, thus addressing GPU shortages and high costs.
  2. Kompact AI preserves complete model quality while optimizing performance for specific processors.
  3. Running AI on CPU makes AI deployment cheaper, more sustainable and easier to access.
  4. Kompact AI runs models on personal devices or private servers, enhancing data privacy by eliminating the need to send data to third-party cloud providers.
  5. The platform is designed for practical use cases such as supporting farmers, students, and frontline workers and implementing “green computing” in energy-efficient data centers.
