#4251 Google Gemini File Upload with Prompt

#4251 Provide a Python application to upload a document to Gemini 2.5 with a prompt and do this in a Tkinter GUI

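This application targets Gemini 2.5 through the google-generativeai Python package (imported as google.generativeai). Google also publishes a newer Google Gen AI SDK (the google-genai package), whose API differs from the calls used here. To get this running, you'll need an API key from Google AI Studio.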

The following application allows you to select a PDF or text file, enter a prompt, and see the response in a scrollable window.

Prerequisites

Install the library via terminal:

pip install -U google-generativeai

The Python Application

import tkinter as tk
from tkinter import filedialog, messagebox, scrolledtext
import google.generativeai as genai
import os

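# --- Configuration ---
# Reads the key from the GEMINI_API_KEY environment variable if it is set;
# otherwise falls back to the placeholder, which you must replace with your key.
API_KEY = os.environ.get("GEMINI_API_KEY", "YOUR_GEMINI_API_KEY")
genai.configure(api_key=API_KEY)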

class GeminiApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Gemini 2.5 Document Assistant")
        self.root.geometry("600x700")
        
        self.file_path = None

        # UI Layout
        tk.Label(root, text="Step 1: Select a Document", font=("Arial", 10, "bold")).pack(pady=10)
        self.btn_browse = tk.Button(root, text="Browse File", command=self.browse_file)
        self.btn_browse.pack()
        
        self.lbl_file = tk.Label(root, text="No file selected", fg="blue")
        self.lbl_file.pack(pady=5)

        tk.Label(root, text="Step 2: Enter your Prompt", font=("Arial", 10, "bold")).pack(pady=10)
        self.prompt_entry = tk.Entry(root, width=70)
        self.prompt_entry.pack(pady=5)
        self.prompt_entry.insert(0, "Summarize this document.")

        self.btn_submit = tk.Button(root, text="Upload & Process", bg="#4CAF50", fg="white", 
                                   command=self.process_document, height=2, width=20)
        self.btn_submit.pack(pady=20)

        tk.Label(root, text="Response:", font=("Arial", 10, "bold")).pack()
        self.output_text = scrolledtext.ScrolledText(root, width=70, height=20, wrap=tk.WORD)
        self.output_text.pack(pady=10, padx=10)

    def browse_file(self):
        self.file_path = filedialog.askopenfilename(
            filetypes=[("Documents", "*.pdf *.txt *.docx"), ("All files", "*.*")]
        )
        if self.file_path:
            self.lbl_file.config(text=os.path.basename(self.file_path))

    def process_document(self):
        if not self.file_path:
            messagebox.showwarning("Error", "Please select a file first.")
            return
        
        prompt = self.prompt_entry.get()
        if not prompt:
            messagebox.showwarning("Error", "Please enter a prompt.")
            return

        self.output_text.delete(1.0, tk.END)
        self.output_text.insert(tk.END, "Uploading and thinking... Please wait.")
        self.root.update_idletasks()

        try:
            # Upload the file to the Gemini API
            uploaded_file = genai.upload_file(path=self.file_path)
            
            # Initialize the model (Gemini 2.5 Flash/Pro)
            model = genai.GenerativeModel("gemini-2.5-flash")
            
            # Generate content
            response = model.generate_content([uploaded_file, prompt])
            
            # Display results
            self.output_text.delete(1.0, tk.END)
            self.output_text.insert(tk.END, response.text)
            
        except Exception as e:
            messagebox.showerror("API Error", f"An error occurred: {str(e)}")
            self.output_text.delete(1.0, tk.END)

if __name__ == "__main__":
    root = tk.Tk()
    app = GeminiApp(root)
    root.mainloop()
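
One caveat with the script above: process_document runs the upload and generation calls on Tkinter's main thread, so the window stops responding until the API returns. A minimal sketch of one common workaround follows, assuming two hypothetical helpers, run_in_background and poll_results, that are not part of the original script or of any SDK. The slow work runs in a worker thread, and the result comes back through a queue.Queue that the GUI polls with root.after.

```python
import queue
import threading


def run_in_background(task, results):
    """Run task() in a daemon thread and put ("ok", value) or ("error", exc) on results.

    Hypothetical helper for illustration; `task` is any zero-argument callable,
    for example a function wrapping the upload and generate_content calls.
    """
    def worker():
        try:
            results.put(("ok", task()))
        except Exception as exc:  # report the failure instead of losing it in the thread
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(root, results, on_done, interval_ms=100):
    """Check the queue from the Tk main loop; reschedule until a result arrives."""
    try:
        status, payload = results.get_nowait()
    except queue.Empty:
        root.after(interval_ms, poll_results, root, results, on_done, interval_ms)
        return
    on_done(status, payload)  # runs on the main thread, so it may touch widgets
```

In process_document, the body of the try block would move into the function passed as task, and on_done would either write the text into self.output_text or show the error popup.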

Key Features of this Script:

  • genai.upload_file: Uploads the file to Google's servers, where it is stored temporarily and passed to the model by reference. For large PDFs, this avoids extracting the text locally and lets the model work with the document directly.

  • Error Handling: If your API key is invalid or the file format isn't supported, the try/except block catches the exception and shows its message in a popup rather than letting the app crash.

  • Gemini 2.5 Flash: I've set the model to gemini-2.5-flash for speed, but you can swap it to gemini-2.5-pro if you need deeper reasoning for complex documents.
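
The error handling described above relies on the API to reject unsupported files. A small local check can catch the obvious cases before any upload starts. The sketch below is only an illustration: the is_probably_supported helper and the ALLOWED_MIME_TYPES set are hypothetical, and the set covers just the PDF and plain-text files this application is described as handling, not the API's actual list of supported formats.

```python
import mimetypes

# Assumed allow-list for illustration only; check the API documentation for
# the formats the service actually accepts.
ALLOWED_MIME_TYPES = {"application/pdf", "text/plain"}


def is_probably_supported(path):
    """Guess the MIME type from the file extension and compare it to the allow-list."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type in ALLOWED_MIME_TYPES
```

Because the guess is based on the file extension alone, a renamed file can still slip through, so the popup from the except branch remains the final safety net.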

Note: Ensure your API key is kept private. For a production app, you’d want to load that from an .env file rather than hardcoding it.
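
A minimal sketch of that idea, using only the standard library: read the key from an environment variable and fail with a clear message when it is missing. The variable name GEMINI_API_KEY and the load_api_key helper are assumptions for illustration. A .env file would need an extra loader, such as the python-dotenv package, to populate the environment first.

```python
import os


def load_api_key(var_name="GEMINI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or blank, so the app
    fails early instead of sending an empty key to the API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

With this helper in place, the configuration step could read genai.configure(api_key=load_api_key()).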

Would you like me to add a progress bar or a feature to save the response to a text file?


From <https://gemini.google.com/app/d1f2d6bad0e350f5>  Google Gemini 3.0 (Fast)
