Alain Airom

Granite πŸͺ¨ 4.0 3B πŸ‘“ Vision: Compact Multimodal Intelligence for Enterprise Documents is out and Bob built a UI to use it!

Creating a UI for the latest Granite Vision Model (using Docling)!

Introduction

IBM just announced on Hugging Face the availability of the "Granite 4.0 3B Vision" model. This is a vision-language model (VLM) designed for enterprise-grade document data extraction. It focuses on specialized, complex extraction tasks that ultracompact models often struggle with:

  • Chart extraction: Converting charts into structured, machine-readable formats (Chart2CSV, Chart2Summary, and Chart2Code),
  • Table extraction: Accurately extracting tables with complex layouts from document images to JSON, HTML, or OTSL,
  • Semantic Key-Value Pair (KVP) extraction: Extracting values based on key names and descriptions across diverse document layouts.

Supported Tasks

The model supports specialized extraction tasks, each activated by a simple task tag in the user message. The chat template automatically expands tags into the full prompt β€” no need to write verbose instructions.


|                 Tag                 |                  Task                   |                  Output                   |
| :---------------------------------: | :-------------------------------------: | :---------------------------------------: |
|            `<chart2csv>`            |              Chart to CSV               | CSV table with headers and numeric values |
|           `<chart2code>`            |          Chart to Python code           |   Python code that recreates the chart    |
|          `<chart2summary>`          |            Chart to summary             | Natural-language description of the chart |
|           `<tables_json>`           |         Table extraction (JSON)         | Structured JSON with dimensions and cells |
|           `<tables_html>`           |         Table extraction (HTML)         |           HTML `<table>` markup           |
|           `<tables_otsl>`           |         Table extraction (OTSL)         |     OTSL markup with cell/merge tags      |
| KVP (see prompt instructions below) | Schema-based key-value pair extraction  | JSON with nested dictionaries and arrays  |
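The tag mechanism can be illustrated with a tiny sketch (the helper name below is my own, not from the model card): the user message carries only the image placeholder and the bare tag, and the chat template expands the tag into the full task prompt.

```python
def build_tag_message(tag: str) -> list:
    """Single-turn conversation that triggers a specialized extraction task."""
    return [{"role": "user", "content": [
        {"type": "image"},              # the document image is passed separately
        {"type": "text", "text": tag},  # e.g. "<chart2csv>" or "<tables_html>"
    ]}]

conversation = build_tag_message("<chart2csv>")
```

This is exactly the conversation shape the official snippet below builds before calling `processor.apply_chat_template`.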


OK, having said that, let's see how you can implement, test, and use it!


Implementation

Using the code provided on Hugging Face

You can take the code from the model's Hugging Face page (link provided in the Links section) and run it!

  • Prepare your virtual environment and install the required packages
pip install torch==2.10.0 --index-url https://download.pytorch.org/whl/cu128
pip install transformers==4.57.6 peft==0.18.1 tokenizers==0.22.2 pillow==12.1.1
  • Usage with transformers:
import re
from io import StringIO

import pandas as pd
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
from huggingface_hub import hf_hub_download

model_id = "ibm-granite/granite-4.0-3b-vision"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map=device
).eval()

# Optional: merge LoRA adapters into base weights for faster inference.
# Prefer to skip when using text-only tasks, as the LoRA adapters are vision-specific.
model.merge_lora_adapters()

def run_inference(model, processor, images, prompts):
    """Run batched inference on image+prompt pairs (one image per prompt)."""
    conversations = [
        [{"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ]}]
        for prompt in prompts
    ]
    texts = [
        processor.apply_chat_template(conv, tokenize=False, add_generation_prompt=True)
        for conv in conversations
    ]
    inputs = processor(
        text=texts, images=images, return_tensors="pt", padding=True, do_pad=True
    ).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096, 
        use_cache=True
    )
    results = []
    for i in range(len(prompts)):
        gen = outputs[i, inputs["input_ids"].shape[1]:]
        results.append(processor.decode(gen, skip_special_tokens=True))
    return results


def display_table(text):
    """Pretty-print CSV (possibly wrapped in ```csv fences) or HTML table content via pandas."""
    m = re.search(r"```csv\s*(.*?)```", text, re.DOTALL)
    if m:
        df = pd.read_csv(StringIO(m.group(1)))
        print(df.to_string(index=False))
    elif "<table" in text.lower():
        df = pd.read_html(StringIO(text))[0]
        print(df.to_string(index=False))
    else:
        print(text)
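To sanity-check the fence-stripping regex used in `display_table` without loading the model, you can feed it a hand-written response (the CSV values below are made up):

```python
import csv
import re
from io import StringIO

# A response shaped like a chart2csv answer wrapped in ```csv fences.
sample = "```csv\nyear,revenue\n2023,10\n2024,12\n```"

m = re.search(r"```csv\s*(.*?)```", sample, re.DOTALL)
rows = list(csv.reader(StringIO(m.group(1).strip())))
print(rows)  # [['year', 'revenue'], ['2023', '10'], ['2024', '12']]
```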
  • Charts and tables:
chart_path = hf_hub_download(repo_id=model_id, filename="chart.jpg")
table_path = hf_hub_download(repo_id=model_id, filename="table.png")
chart_img = Image.open(chart_path).convert("RGB")
table_img = Image.open(table_path).convert("RGB")

# Batched chart tasks
chart_prompts = ["<chart2csv>", "<chart2summary>", "<chart2code>"]
chart_results = run_inference(model, processor, [chart_img] * len(chart_prompts), chart_prompts)
for prompt, result in zip(chart_prompts, chart_results):
    print(f"{prompt}:")
    display_table(result)
    print()

# Batched table tasks
table_prompts = ["<tables_html>", "<tables_otsl>"]
table_results = run_inference(model, processor, [table_img] * len(table_prompts), table_prompts)
for prompt, result in zip(table_prompts, table_results):
    print(f"{prompt}:")
    display_table(result)
    print()
  • KVP: key-value pair extraction
import json

invoice_path = hf_hub_download(repo_id=model_id, filename="invoice.png")
invoice_img = Image.open(invoice_path).convert("RGB")
schema = {
    "type": "object",
    "properties": {
        "invoice_date": {"type": "string", "description": "The date the invoice was issued"},
        "order_number": {"type": "string", "description": "The unique identifier for the order"},
        "seller_tax_id": {"type": "string", "description": "The tax identification number of the seller"},
    }
}

prompt = f"""Extract structured data from this document.
Return a JSON object matching this schema:

{json.dumps(schema, indent=2)}

Return null for fields you cannot find.
Return ONLY valid JSON.
Return an instance of the JSON with extracted values, not the schema itself."""

result = run_inference(model, processor, [invoice_img], [prompt])[0]
print(result)
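The prompt asks for JSON only, but VLM answers sometimes arrive wrapped in code fences anyway; a small tolerant parser (my own addition, not part of the model card) keeps the downstream code simple:

```python
import json
import re

def parse_kvp_result(text: str) -> dict:
    """Parse a KVP answer, tolerating optional ```json fences around it."""
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = m.group(1) if m else text
    return json.loads(payload)

# Hypothetical answer for the invoice schema above.
answer = '{"invoice_date": "01/04/2026", "order_number": null, "seller_tax_id": "FR123"}'
data = parse_kvp_result(answer)
print(data["invoice_date"])  # 01/04/2026
```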

I made two simple applications around the Granite model: one built with Streamlit, and a second, more user-friendly one with an enhanced UI, built using IBM Bob.


Streamlit version including Docling (no Bob)

Additional packages are needed for this version.

  • Install the requirements (I ran into many version conflicts, so adapt these to your configuration):
beautifulsoup4>=4.12.0
lxml>=4.9.0
markdown>=3.5.0
python-dotenv>=1.0.0

docling>=2.0.0
pdf2image>=1.16.3

# pip index options must sit on their own line in a requirements file
--extra-index-url https://download.pytorch.org/whl/cu128
torch==2.10.0
transformers==4.57.6 
peft==0.18.1 
tokenizers==0.22.2 
pillow==12.1.1 
streamlit 
pandas
  • And the application:


import streamlit as st
import torch
from transformers import AutoProcessor, AutoModel 
from PIL import Image
import os
import traceback
from datetime import datetime

# --- DOCLING IMPORTS ---
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

# --- CONFIGURATION ---
MODEL_ID = "ibm-granite/granite-4.0-3b-vision"
DEVICE = "mps" if torch.backends.mps.is_available() else "cpu"
OUTPUT_DIR = "output"

if not os.path.exists(OUTPUT_DIR):
    os.makedirs(OUTPUT_DIR)

st.set_page_config(page_title="Granite 4.0 Vision + Docling v4", layout="wide")

@st.cache_resource
def load_tools():
    try:
        with st.spinner("Step 1: Loading IBM Granite 4.0 Vision..."):
            processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
            model = AutoModel.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16, device_map=DEVICE, trust_remote_code=True)
            if hasattr(model, "merge_lora_adapters"):
                model.merge_lora_adapters()
        with st.spinner("Step 2: Initializing Docling Engine..."):
            pipeline_options = PdfPipelineOptions()
            pipeline_options.generate_page_images = True
            pipeline_options.images_scale = 2.0 
            pdf_options = PdfFormatOption(pipeline_options=pipeline_options)
            converter = DocumentConverter(format_options={InputFormat.PDF: pdf_options})
        return processor, model, converter
    except Exception as e:
        st.error(f"Initialization Error: {e}")
        return None, None, None

processor, model, converter = load_tools()

def run_vlm(image, task_prompt, _processor, _model):
    prompt = f"<|user|>\n<image>\n{task_prompt}<|assistant|>\n"
    inputs = _processor(images=image, text=prompt, return_tensors="pt").to(DEVICE, torch.bfloat16)
    output_tokens = _model.generate(**inputs, max_new_tokens=1024, do_sample=False, num_beams=1)
    return _processor.batch_decode(output_tokens, skip_special_tokens=True)[0].strip()

st.title("📄 Granite 4.0 Vision + Docling v4")
if processor is None or converter is None:
    st.stop()

# Task tags as documented on the model card (tables_*, chart2csv)
TASK_MAP = {
    "Table (HTML)": {"prompt": "<tables_html>", "ext": "html"},
    "Table (JSON)": {"prompt": "<tables_json>", "ext": "json"},
    "Key-Value Pairs (KVP)": {"prompt": "<kvp>", "ext": "json"},
    "Chart (CSV)": {"prompt": "<chart2csv>", "ext": "csv"}
}

with st.sidebar:
    st.header("βš™οΈ Settings")
    do_markdown = st.checkbox("Export Full Markdown (Docling Native)", value=True)
    selected_tasks = st.multiselect("Granite AI Tasks:", list(TASK_MAP.keys()), default=["Table (HTML)"])
    if st.button("Clear Output Folder"):
        for f in os.listdir(OUTPUT_DIR): os.remove(os.path.join(OUTPUT_DIR, f))

uploaded_file = st.file_uploader("Upload PDF", type=["pdf", "png", "jpg", "jpeg"])

if uploaded_file:
    temp_path = os.path.join(os.getcwd(), f"temp_{uploaded_file.name}")
    with open(temp_path, "wb") as f: f.write(uploaded_file.getbuffer())

    if st.button("πŸš€ Start Extraction"):
        try:
            with st.status("Processing Document...") as status:
                conv_result = converter.convert(temp_path)

                if do_markdown:
                    md_output = conv_result.document.export_to_markdown()
                    with open(os.path.join(OUTPUT_DIR, f"doc_{datetime.now().strftime('%H%M%S')}.md"), "w") as f:
                        f.write(md_output)
                    st.success("Markdown Export Complete")
                    with st.expander("πŸ“ View Markdown"): st.markdown(md_output)

                for page_no, page_obj in conv_result.document.pages.items():
                    st.subheader(f"πŸ“ Page {page_no}")
                    if page_obj.image:
                        # Fix: Use .pil_image for ImageRef compatibility
                        page_img = page_obj.image.pil_image 
                        st.image(page_img, width=450)
                        for task in selected_tasks:
                            with st.spinner(f"Running {task}..."):
                                res = run_vlm(page_img, TASK_MAP[task]["prompt"], processor, model)
                                out_name = f"pg{page_no}_{task.replace(' ', '_')}_{datetime.now().strftime('%H%M%S')}.{TASK_MAP[task]['ext']}"
                                with open(os.path.join(OUTPUT_DIR, out_name), "w") as f: f.write(res)
                                with st.expander(f"βœ… {task}"): st.code(res)
            st.success("All Tasks Finished!")
        except Exception as e:
            st.error(f"Runtime Error: {e}")
            print(traceback.format_exc())
        finally:
            if os.path.exists(temp_path): os.remove(temp_path)     
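The app names each saved result after the page number, the task, and a timestamp; the scheme can be sketched in isolation (the example date is arbitrary):

```python
from datetime import datetime

def output_name(page_no: int, task: str, ext: str, now: datetime) -> str:
    """Build the per-page output filename used in the Streamlit app."""
    return f"pg{page_no}_{task.replace(' ', '_')}_{now.strftime('%H%M%S')}.{ext}"

name = output_name(1, "Table (HTML)", "html", datetime(2026, 4, 1, 12, 30, 45))
print(name)  # pg1_Table_(HTML)_123045.html
```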
  • I used images and real invoices; the output for a real invoice from my medical visit, in French, converted from PDF to Markdown, is quite outstanding!
-Facture nΒ° 1775 du 01/04/2026

Feuille de soins Γ©lectronique

- xxxx Β· EI

Masseur KinΓ©sithΓ©rapeute

xxxxx NΒ° facturation PS : xxxxxx RPPS : 11111111

- BΓ©nΓ©ficiaire des soins

YYYY NΓ© le xx/xx/xxxx NIR : 1234567890 NANTERRE (xxx) RΓ©gime GΓ©nΓ©ral (xx)

-AssurΓ©

YYYY

NΓ© le 01/01/1900

NIR : 1234567890

| Date       | ExonΓ©ration | Acte     | Taux AMO | Part AMO | Part AMC | Montant |
| ---------- | ----------- | -------- | -------- | -------- | -------- | ------- |
| 25/03/2026 | -           | VSM 8.08 | 60,00 %  | 10,72 €  | 0,00 €   | 38,00 € |
| 01/04/2026 | -           | VSM 8.08 | 60,00 %  | 10,72 €  | 0,00 €   | 38,00 € |

PayΓ© le 01/04/2026 par Carte bancaire

| TOTAL        | 76,00 € |
| ------------ | ------- |
| Part patient | 54,56 € |
| Part AMO     | 21,44 € |
| Part AMC     | 0,00 €  |

<!-- image -->

Prescripteur NΒ°12345 Le 09/03/2026

Bob’s version built on Gradio including Docling

To create the version presented below, I used Bob and gave it the Streamlit version as a starting point, asking for a Gradio-based version with an enhanced UI.

This is the UI of the app Bob provided:

And the following is the code:

#!/usr/bin/env python3
"""
Enhanced Gradio UI for Granite 4.0 Vision + Docling
Features:
- Better UI with live previews
- Multi-choice checkboxes for output formats
- Page-by-page processing with previews
- Real-time result display
"""

import gradio as gr
import torch
from transformers import AutoProcessor, AutoModel
from PIL import Image
import os
import traceback
from datetime import datetime
from pathlib import Path
import json

# Docling imports
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

# Configuration
MODEL_ID = "ibm-granite/granite-4.0-3b-vision"
DEVICE = "mps" if torch.backends.mps.is_available() else "cpu"
OUTPUT_DIR = "output"

# Ensure output directory exists
Path(OUTPUT_DIR).mkdir(exist_ok=True)

# Task definitions
# Task tags as documented on the model card (tables_*, chart2csv)
TASK_MAP = {
    "Table (HTML)": {"prompt": "<tables_html>", "ext": "html", "icon": "📊"},
    "Table (JSON)": {"prompt": "<tables_json>", "ext": "json", "icon": "📋"},
    "Key-Value Pairs (KVP)": {"prompt": "<kvp>", "ext": "json", "icon": "🔑"},
    "Chart (CSV)": {"prompt": "<chart2csv>", "ext": "csv", "icon": "📈"},
}

# Global variables for model and converter
processor = None
model = None
converter = None


def initialize_models():
    """Initialize Granite Vision model and Docling converter."""
    global processor, model, converter

    try:
        # Load Granite Vision model
        print(f"Loading Granite 4.0 Vision on {DEVICE}...")
        processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
        model = AutoModel.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.bfloat16,
            device_map=DEVICE,
            trust_remote_code=True
        )

        if hasattr(model, "merge_lora_adapters"):
            model.merge_lora_adapters()

        # Initialize Docling converter
        print("Initializing Docling converter...")
        pipeline_options = PdfPipelineOptions()
        pipeline_options.generate_page_images = True
        pipeline_options.images_scale = 2.0
        pdf_options = PdfFormatOption(pipeline_options=pipeline_options)
        converter = DocumentConverter(format_options={InputFormat.PDF: pdf_options})

        return "βœ… Models initialized successfully!"

    except Exception as e:
        return f"❌ Error initializing models: {str(e)}\n{traceback.format_exc()}"


def run_vlm_task(image, task_prompt):
    """Run a VLM task on an image."""
    if processor is None or model is None:
        return "Error: Models not initialized"

    try:
        prompt = f"<|user|>\n<image>\n{task_prompt}<|assistant|>\n"
        inputs = processor(images=image, text=prompt, return_tensors="pt").to(DEVICE, torch.bfloat16)
        output_tokens = model.generate(**inputs, max_new_tokens=1024, do_sample=False, num_beams=1)
        result = processor.batch_decode(output_tokens, skip_special_tokens=True)[0].strip()
        return result
    except Exception as e:
        return f"Error: {str(e)}"


def format_preview(content, format_type):
    """Format content for preview based on type."""
    if format_type == "html":
        return content
    elif format_type in ["json", "csv"]:
        return f"```{format_type}\n{content}\n```"
    else:
        return content


def process_document(
    file,
    export_markdown,
    task_table_html,
    task_table_json,
    task_kvp,
    task_chart_csv
):
    """Process uploaded document with selected tasks."""

    if file is None:
        return "Please upload a file first.", None, None, None

    if converter is None:
        return "Models not initialized. Please initialize first.", None, None, None

    # Determine which tasks to run
    selected_tasks = []
    if task_table_html:
        selected_tasks.append("Table (HTML)")
    if task_table_json:
        selected_tasks.append("Table (JSON)")
    if task_kvp:
        selected_tasks.append("Key-Value Pairs (KVP)")
    if task_chart_csv:
        selected_tasks.append("Chart (CSV)")

    if not export_markdown and not selected_tasks:
        return "Please select at least one output format.", None, None, None

    try:
        # Save uploaded file temporarily
        # Generate a unique temporary filename
        temp_path = Path(f"temp_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pdf")
        with open(temp_path, "wb") as f:
            # file is already bytes when type="binary"
            if isinstance(file, bytes):
                f.write(file)
            else:
                f.write(file.read())

        # Convert document
        status_text = "πŸ“„ Converting document with Docling...\n"
        conv_result = converter.convert(str(temp_path))

        # Export markdown if requested
        markdown_output = None
        markdown_file = None
        if export_markdown:
            status_text += "βœ… Exporting to Markdown...\n"
            markdown_output = conv_result.document.export_to_markdown()
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            markdown_file = Path(OUTPUT_DIR) / f"doc_{timestamp}.md"
            with open(markdown_file, "w", encoding="utf-8") as f:
                f.write(markdown_output)
            status_text += f"βœ… Markdown saved: {markdown_file.name}\n\n"

        # Process each page
        results_html = "<div style='font-family: sans-serif;'>"
        results_html += f"<h2>πŸ“Š Processing Results</h2>"
        results_html += f"<p><strong>Document:</strong> {temp_path.name}</p>"
        results_html += f"<p><strong>Total Pages:</strong> {len(conv_result.document.pages)}</p>"
        results_html += "<hr>"

        all_outputs = []

        for page_no, page_obj in conv_result.document.pages.items():
            status_text += f"\nπŸ“ Processing Page {page_no}...\n"
            results_html += f"<h3>πŸ“„ Page {page_no}</h3>"

            if page_obj.image:
                page_img = page_obj.image.pil_image

                # Save page image for preview
                img_path = Path(OUTPUT_DIR) / f"page_{page_no}_{datetime.now().strftime('%H%M%S')}.png"
                page_img.save(img_path)

                results_html += f"<img src='file/{img_path}' style='max-width: 400px; border: 1px solid #ddd; margin: 10px 0;'><br>"

                # Run selected tasks
                for task_name in selected_tasks:
                    task_info = TASK_MAP[task_name]
                    status_text += f"  {task_info['icon']} Running {task_name}...\n"

                    result = run_vlm_task(page_img, task_info["prompt"])

                    # Save result
                    timestamp = datetime.now().strftime('%H%M%S')
                    out_name = f"pg{page_no}_{task_name.replace(' ', '_')}_{timestamp}.{task_info['ext']}"
                    out_path = Path(OUTPUT_DIR) / out_name

                    with open(out_path, "w", encoding="utf-8") as f:
                        f.write(result)

                    all_outputs.append(str(out_path))

                    # Add to results HTML
                    results_html += f"<div style='margin: 10px 0; padding: 10px; background: #f5f5f5; border-radius: 5px;'>"
                    results_html += f"<strong>{task_info['icon']} {task_name}</strong> "
                    results_html += f"<span style='color: #666; font-size: 0.9em;'>({out_name})</span><br>"

                    # Preview result
                    preview = result[:500] + "..." if len(result) > 500 else result
                    if task_info['ext'] == 'html':
                        results_html += f"<div style='margin-top: 5px; padding: 5px; background: white; border: 1px solid #ddd; max-height: 200px; overflow: auto;'>{preview}</div>"
                    else:
                        results_html += f"<pre style='margin-top: 5px; padding: 5px; background: white; border: 1px solid #ddd; max-height: 200px; overflow: auto; font-size: 0.85em;'>{preview}</pre>"

                    results_html += "</div>"

                    status_text += f"  βœ… {task_name} complete\n"

            results_html += "<hr>"

        results_html += "</div>"

        status_text += f"\nπŸŽ‰ All tasks completed!\n"
        status_text += f"πŸ“ Output directory: {OUTPUT_DIR}\n"
        status_text += f"πŸ“„ Total files generated: {len(all_outputs) + (1 if markdown_file else 0)}\n"

        # Clean up temp file
        if temp_path.exists():
            temp_path.unlink()

        return status_text, results_html, markdown_output, str(markdown_file) if markdown_file else None

    except Exception as e:
        error_msg = f"❌ Error processing document:\n{str(e)}\n\n{traceback.format_exc()}"
        return error_msg, None, None, None


def clear_output_folder():
    """Clear all files in the output folder."""
    try:
        count = 0
        for file in Path(OUTPUT_DIR).glob("*"):
            if file.is_file():
                file.unlink()
                count += 1
        return f"βœ… Cleared {count} files from output folder"
    except Exception as e:
        return f"❌ Error clearing folder: {str(e)}"


def create_interface():
    """Create the Gradio interface."""

    with gr.Blocks(
        title="Granite 4.0 Vision + Docling Enhanced",
        theme=gr.themes.Soft(),
        css="""
        .output-box { border: 2px solid #e0e0e0; border-radius: 8px; padding: 15px; }
        .status-box { font-family: monospace; font-size: 0.9em; }
        """
    ) as app:

        gr.Markdown("""
        # πŸš€ Granite 4.0 Vision + Docling Enhanced

        **Advanced document processing with IBM Granite 4.0 3B Vision and Docling**

        Upload a PDF or image, select your desired output formats, and let AI extract structured data!
        """)

        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown("### βš™οΈ Configuration")

                init_btn = gr.Button("πŸ”„ Initialize Models", variant="primary", size="lg")
                init_status = gr.Textbox(
                    label="Initialization Status",
                    lines=3,
                    interactive=False
                )

                gr.Markdown("---")
                gr.Markdown("### πŸ“€ Upload Document")

                file_input = gr.File(
                    label="Upload PDF or Image",
                    file_types=[".pdf", ".png", ".jpg", ".jpeg"],
                    type="binary"
                )

                gr.Markdown("### πŸ“‹ Output Formats")

                export_markdown = gr.Checkbox(
                    label="πŸ“ Export Full Markdown (Docling Native)",
                    value=True
                )

                gr.Markdown("**Granite AI Tasks:**")

                task_table_html = gr.Checkbox(
                    label="πŸ“Š Table (HTML)",
                    value=True
                )
                task_table_json = gr.Checkbox(
                    label="πŸ“‹ Table (JSON)",
                    value=False
                )
                task_kvp = gr.Checkbox(
                    label="πŸ”‘ Key-Value Pairs (KVP)",
                    value=False
                )
                task_chart_csv = gr.Checkbox(
                    label="πŸ“ˆ Chart (CSV)",
                    value=False
                )

                gr.Markdown("---")

                process_btn = gr.Button("πŸš€ Start Processing", variant="primary", size="lg")
                clear_btn = gr.Button("πŸ—‘οΈ Clear Output Folder", variant="secondary")

                clear_status = gr.Textbox(
                    label="Clear Status",
                    lines=1,
                    interactive=False
                )

            with gr.Column(scale=2):
                gr.Markdown("### πŸ“Š Processing Status")

                status_output = gr.Textbox(
                    label="Status Log",
                    lines=10,
                    interactive=False,
                    elem_classes=["status-box"]
                )

                gr.Markdown("### πŸ” Results Preview")

                with gr.Tabs():
                    with gr.Tab("πŸ“„ Visual Results"):
                        results_html = gr.HTML(
                            label="Results with Previews",
                            elem_classes=["output-box"]
                        )

                    with gr.Tab("πŸ“ Markdown Output"):
                        markdown_output = gr.Textbox(
                            label="Full Markdown Content",
                            lines=20,
                            interactive=False
                        )
                        markdown_file = gr.Textbox(
                            label="Markdown File Path",
                            interactive=False
                        )

        gr.Markdown("""
        ---
        ### πŸ“š How to Use

        1. **Initialize Models**: Click "Initialize Models" (first time only, takes a few minutes)
        2. **Upload Document**: Choose a PDF or image file
        3. **Select Formats**: Check the output formats you want
        4. **Process**: Click "Start Processing" and wait for results
        5. **Review**: Check the results preview and status log
        6. **Access Files**: All outputs are saved in the `output/` folder

        ### πŸ’‘ Tips

        - **Markdown Export**: Best for full document conversion
        - **Table (HTML)**: Great for web display and styling
        - **Table (JSON)**: Perfect for data processing and APIs
        - **KVP**: Extracts key-value pairs from forms and documents
        - **Chart (CSV)**: Converts charts to structured data

        ### 🎯 Output Location

        All generated files are saved in: `output/` directory with timestamps
        """)

        # Event handlers
        init_btn.click(
            fn=initialize_models,
            outputs=[init_status]
        )

        process_btn.click(
            fn=process_document,
            inputs=[
                file_input,
                export_markdown,
                task_table_html,
                task_table_json,
                task_kvp,
                task_chart_csv
            ],
            outputs=[
                status_output,
                results_html,
                markdown_output,
                markdown_file
            ]
        )

        clear_btn.click(
            fn=clear_output_folder,
            outputs=[clear_status]
        )

    return app


def main():
    """Main entry point."""
    print(f"Starting Granite 4.0 Vision + Docling Enhanced UI")
    print(f"Device: {DEVICE}")
    print(f"Output directory: {OUTPUT_DIR}")

    app = create_interface()
    app.launch(
        server_name="0.0.0.0",
        server_port=7861,
        share=False,
        show_error=True
    )


if __name__ == "__main__":
    main()

# Made with Bob

Conclusion

The integration of Docling and Granite 4.0 3B Vision transforms unstructured documents into a strategic asset. By bridging the gap between raw visual data and structured digital information, this pipeline moves beyond simple text extraction to provide deep contextual understanding.

The samples provided in this post are intended to be used as productivity accelerators to help you build your own solutions!

>>> Thanks for reading <<<

Links
