#4198 IBM Granite 4 Agents

#4198 Tell me about IBM Granite 4 AI agents

IBM Granite 4.0 represents a significant pivot toward agentic AI and edge computing, released in late 2025 and expanded with the Nano models soon after. Unlike traditional transformer-only models, Granite 4.0 uses a novel hybrid Mamba-2/Transformer architecture.

This hybrid approach makes these models particularly "agent-ready": IBM cites roughly 2x faster inference and over 70% lower memory requirements, enough headroom to handle the many tool calls and long-context reasoning that autonomous agents require, even on local hardware like your laptop.

## Key Architecture & Agentic Features

  • Hybrid Architecture: By combining Mamba-2 (for linear scaling and long context) with standard Transformer layers (for high-quality attention), the models excel at instruction following and tool calling without the massive VRAM overhead of larger models.

  • Agent Performance: On the Berkeley Function Calling Leaderboard (BFCLv3), the Granite 4.0 "Small" models (roughly 9B activated parameters) compete directly with much larger models like Llama 3.1 70B and GPT-4o in their ability to accurately select and format API calls.

  • Native Tool-Calling Support: The chat template for Granite 4.0 is specifically designed for agentic loops. It uses specific tags like <|tool_call|> and <|tool_response|> to ensure the model distinguishes between its own reasoning and the output of an external tool.
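In practice, an agent loop has to split the model's raw generation into plain text and structured tool calls. The sketch below assumes the model emits a JSON list after the `<|tool_call|>` tag; the exact serialization depends on the chat-template version, so treat the format (and the `get_weather` example) as an assumption:

```python
import json

TOOL_CALL_TAG = "<|tool_call|>"

def extract_tool_calls(generation: str):
    """Split a raw Granite generation into plain text and parsed tool calls.

    Assumes the model emits JSON after a <|tool_call|> tag, e.g.
    '<|tool_call|>[{"name": "get_weather", "arguments": {"city": "Houston"}}]'.
    """
    if TOOL_CALL_TAG not in generation:
        return generation, []
    text, _, payload = generation.partition(TOOL_CALL_TAG)
    try:
        calls = json.loads(payload.strip())
    except json.JSONDecodeError:
        calls = []  # malformed call: fall back to treating it as plain text
    return text.strip(), calls

# Example: a hypothetical generation from the model
raw = 'Looking that up.<|tool_call|>[{"name": "get_weather", "arguments": {"city": "Houston"}}]'
text, calls = extract_tool_calls(raw)
print(text)              # → Looking that up.
print(calls[0]["name"])  # → get_weather
```

A production loop would feed each parsed call to a dispatcher and return the result to the model inside a `<|tool_response|>` turn.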


## The Granite 4.0 Model Family

The "H" in the model names stands for Hybrid (Mamba-2 + Transformer).

| Model | Size (Total / Activated) | Best Use Case for Agents |
|---|---|---|
| Granite-4.0-H-Small | 32B / 9B (MoE) | The "workhorse" for complex multi-tool orchestration. |
| Granite-4.0-H-Tiny | 7B / 1B (MoE) | Low-latency edge agents; ideal for Raspberry Pi or local servers. |
| Granite-4.0-H-Micro | 3B (Dense) | Ultra-fast sub-agents (e.g., a "checker" or "router" agent). |
| Granite-4.0-Nano | 350M - 1.5B | Browser-based agents; small enough to run locally via WebGPU and interact with web APIs privately. |

## Practical Applications for Developers

IBM has optimized these models for "agentic workflows" where a primary model (like the 32B Small) acts as an orchestrator that delegates tasks to specialized sub-agents.

  • RAG & Long Context: Because Mamba handles long sequences efficiently, Granite 4.0 is excellent for "Knowledge Agents" that need to ingest entire codebases or 100+ page technical manuals to answer questions.

  • Local Deployment: You can run these models locally using Ollama (e.g., `ollama run granite4:micro-h`) or LM Studio. This is a major advantage for privacy-sensitive work where you don't want to send tool-call data to a third-party cloud.

  • ISO 42001 Certified: For enterprise or NASA-related work, these are the first open models certified under the ISO 42001 standard for AI management and safety, making them easier to clear for high-stakes environments.
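Once a model is pulled, the local-deployment bullet above can be exercised through Ollama's REST API at `localhost:11434`, which accepts chat requests with tool schemas. The `get_milestone` tool below is hypothetical; the `/api/chat` endpoint and payload shape follow Ollama's documented API:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_chat_payload(model: str, user_prompt: str, tools: list) -> dict:
    """Assemble a chat request for Ollama's /api/chat endpoint with tool schemas."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
        "stream": False,
    }

def send(payload: dict) -> dict:
    """POST the payload to the local Ollama server (requires `ollama serve`)."""
    req = request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical tool schema; swap in your own function definitions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_milestone",
        "description": "Look up a project milestone date.",
        "parameters": {"type": "object",
                       "properties": {"project": {"type": "string"}},
                       "required": ["project"]},
    },
}]

payload = build_chat_payload("granite4:micro-h", "When is the next PPE milestone?", tools)
# result = send(payload)  # uncomment once `ollama pull granite4:micro-h` has run locally
```

Because nothing leaves `localhost`, the tool-call traffic stays entirely on your machine.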

### Next Step

Given your interest in knowledge graphs and NASA data, would you like me to show you how to set up a local Granite 4.0 agent that can query a knowledge graph or call specific Python functions for data analysis?


Provide a template for defining an agent

For defining an agent around IBM Granite 4.0, a strong starting point is IBM's own BeeAI Framework. It is built by IBM to pair with Granite models, and Granite's hybrid Mamba-2 long-context efficiency helps the model keep its instructions in scope during long tool-calling loops.

Below is a Python template using the beeai-framework to define a high-functioning agent.

## 1. The BeeAI Agent Template

This template uses the RequirementAgent class, which is more robust than a standard ReAct loop because it allows you to define strict execution rules (e.g., "Always verify data before answering").

```python
# NOTE: a sketch based on the beeai-framework docs; exact module paths and
# parameter names may differ between versions — check the current API reference.
import asyncio

from beeai_framework.agents.requirement import RequirementAgent
from beeai_framework.adapters.ollama import ChatModel
from beeai_framework.tools.search import DuckDuckGoSearchTool
# You can import your own custom tools here

async def main():
    # 1. Select the Granite 4.0 model (Micro is great for speed, Small for logic)
    llm = ChatModel.from_name("ollama:granite4:micro-h")

    # 2. Define the agent
    agent = RequirementAgent(
        llm=llm,
        tools=[DuckDuckGoSearchTool()],  # Add your knowledge-graph tools here
        role="Precision Research Assistant",
        instructions=[
            "Use the provided tools to find factual data.",
            "If a tool returns no results, do not hallucinate; state that the info is missing.",
            "Always format technical data in Markdown tables for clarity.",
            "Prioritize NASA-sourced information when available.",
        ],
        # 'backstory' gives the model its persona/context
        backstory="You are an expert software process engineer working on S&MA documentation.",
        # 'requirements' enforce specific behaviors during the reasoning loop
        requirements=[
            # Example: force a "thought" step before every action
            "Always explain your reasoning before calling a tool."
        ],
    )

    # 3. Execute a task
    response = await agent.run(
        prompt="Find the latest status of the NASA Gateway Power and Propulsion Element (PPE).",
        expected_output="A summary table with current milestone dates and technical specs.",
    )

    print(f"Agent Response: {response.result.text}")

if __name__ == "__main__":
    asyncio.run(main())
```
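The `tools=[DuckDuckGoSearchTool()]` line above is where a knowledge-graph connector would plug in. BeeAI ships its own tool base class, but framework aside, a custom tool is little more than a callable plus a schema the model can read. Everything below (`SimpleTool`, `query_graph`, the graph contents) is a hypothetical sketch to adapt to your framework's real tool interface:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SimpleTool:
    """Minimal stand-in for a framework tool: a schema plus a callable."""
    name: str
    description: str
    parameters: dict
    fn: Callable

    def schema(self) -> dict:
        """JSON-style schema the agent framework hands to the model."""
        return {"name": self.name, "description": self.description,
                "parameters": self.parameters}

    def __call__(self, **kwargs):
        return self.fn(**kwargs)

# Hypothetical knowledge-graph lookup backing the tool.
def query_graph(node_id: str) -> dict:
    graph = {"PPE": {"type": "spacecraft element", "program": "Gateway"}}
    return graph.get(node_id, {})

kg_tool = SimpleTool(
    name="get_knowledge_graph_node",
    description="Queries a NASA knowledge graph for a specific entity.",
    parameters={"type": "object",
                "properties": {"node_id": {"type": "string"}}},
    fn=query_graph,
)

print(kg_tool.schema()["name"])  # → get_knowledge_graph_node
print(kg_tool(node_id="PPE"))
```

Keeping the schema next to the callable means one object can both advertise itself to the model and execute the call.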

## 2. Manual "Raw" Prompt Template

If you aren't using a framework and want to prompt Granite 4.0 directly (via API or Ollama), you must use its specific Role Tags to trigger agentic behavior.

System Prompt Template:

```text
<|start_of_role|>system<|end_of_role|>
You are a specialized agent. Use the following tools to answer the user request.
Tools: [
  {
    "name": "get_knowledge_graph_node",
    "description": "Queries a NASA knowledge graph for a specific entity.",
    "parameters": {"type": "object", "properties": {"node_id": {"type": "string"}}}
  }
]
Always output your reasoning in a 'Thought:' block, then a tool call if needed.
<|end_of_text|>
```
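To close the loop manually, a tool call parsed from the model's output has to be executed and its result fed back as a tool-response turn. A minimal dispatcher might look like the sketch below; the `tool_response` role tag and the toy knowledge graph are assumptions, so verify the role name against the model's actual chat template:

```python
import json

# Hypothetical local tool matching the schema in the system prompt above.
def get_knowledge_graph_node(node_id: str) -> dict:
    graph = {"PPE": {"label": "Power and Propulsion Element",
                     "status": "in integration"}}
    return graph.get(node_id, {"error": f"unknown node {node_id}"})

TOOLS = {"get_knowledge_graph_node": get_knowledge_graph_node}

def dispatch(call: dict) -> str:
    """Run one parsed tool call and wrap the result in a tool-response turn."""
    fn = TOOLS[call["name"]]
    result = fn(**call.get("arguments", {}))
    # The wrapped string is appended to the conversation so the model can read it.
    return ("<|start_of_role|>tool_response<|end_of_role|>"
            + json.dumps(result)
            + "<|end_of_text|>")

turn = dispatch({"name": "get_knowledge_graph_node",
                 "arguments": {"node_id": "PPE"}})
print(turn)
```

Appending `turn` to the prompt and generating again gives the model the tool output it needs to compose its final answer.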

## 3. Why This Works for You

  • Memory Efficiency: Since you work with complex requirements and NASA S&MA data, the hybrid Mamba-2 architecture in Granite 4.0 lets these agents maintain a huge "working memory" of your documents without the quadratic attention cost that inflates memory use in pure-Transformer models like Llama or GPT.

  • Local Control: You can run this entire template on your MSI Creator laptop using Ollama, keeping all sensitive NASA-related project data entirely offline.

### Next Step

Would you like me to help you write a custom tool function that connects this agent to one of your specific Python projects or a knowledge graph database?


From <https://gemini.google.com/app/>  Google Gemini 3.0 (Fast)
