#3780 Semantic Query for Outline of Recommended Content
#3780 I have an outline that I want to use for semantic search of a document. I want to step through each entry in the outline to find the closest matches in the provided document. Some of the entries are very short. How can I use semantic search with this outline: """1. Minimum Recommended Content ... Minimum recommended content for the Software Development - Management Plan. ... Project organizational structure showing ... """
To perform a semantic search using your outline, you can use the outline entries as queries to find the most semantically similar text within your document. The key challenge is that some of your outline entries are very short, which can lead to poor search results. To address this, you should expand the short queries to provide more context.
Here's a step-by-step approach to using your outline for semantic search:
Step 1: Pre-process the document
First, you'll need to prepare your document for semantic search. This involves breaking it down into smaller, searchable chunks, such as paragraphs, sentences, or even sections. The size of the chunks is important; they should be large enough to contain meaningful context but small enough to be distinct.
Split the document: Divide the main document into a list of text snippets. Paragraphs are often a good starting point.
Embed the chunks: Use a pre-trained transformer embedding model (for example, a sentence-embedding model hosted on the Hugging Face Hub) to convert each text snippet into a numerical vector, or embedding. These embeddings capture the semantic meaning of the text. Store each embedding alongside its source chunk, either in a vector database or, for a single document, in a simple in-memory list.
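The splitting part of Step 1 can be sketched as follows. This is a minimal illustration, assuming the document is plain text with blank lines between paragraphs; the function name split_into_paragraphs and the min_chars threshold are made up for this example. Very short pieces such as headings are merged into the paragraph that follows them so that each chunk carries some context.

```python
import re


def split_into_paragraphs(text: str, min_chars: int = 40) -> list[str]:
    """Split text on blank lines and merge very short pieces into the next chunk."""
    # Collapse internal whitespace so each paragraph becomes a single line.
    raw = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    raw = [p for p in raw if p]

    chunks: list[str] = []
    carry = ""  # short text (e.g. a heading) waiting to be attached
    for para in raw:
        candidate = f"{carry} {para}".strip()
        if len(candidate) < min_chars:
            carry = candidate
        else:
            chunks.append(candidate)
            carry = ""

    if carry:
        # A trailing short fragment is appended to the last chunk, not dropped.
        if chunks:
            chunks[-1] = f"{chunks[-1]} {carry}"
        else:
            chunks.append(carry)
    return chunks


# Illustrative sample text (not taken from a real plan).
sample = """Risk Management

Risks are logged weekly and reviewed by the project manager at each milestone.

Security

The project follows the agency security policy for all software systems."""

chunks = split_into_paragraphs(sample)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each resulting chunk would then be passed to whichever embedding model you choose, and the vectors kept in the same order as the chunks so a match can be traced back to its text.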
Step 2: Pre-process the outline
Next, you need to prepare your outline entries to be used as effective search queries.
Expand short queries: For short entries like "Project involvement," "User involvement," or "Risk management," expand them into more descriptive phrases. For example, "Project involvement" could become "Description of how the project will involve external parties and stakeholders." This provides the model with more information to find relevant text.
Create a query list: Organize your outline into a list of these expanded queries.
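One way to build the query list in Step 2 is sketched below. It assumes you write the expansions by hand and keep them in a dictionary; the names expand_query and EXPANSIONS, the six-word threshold, and the fallback template are all illustrative choices, not part of any library.

```python
def expand_query(entry: str, expansions: dict[str, str], min_words: int = 6) -> str:
    """Return a longer query for a short outline entry; leave long entries as-is."""
    text = entry.strip()
    if len(text.split()) >= min_words:
        return text  # already descriptive enough
    # Look up a hand-written expansion, falling back to a generic template.
    key = text.rstrip(".").lower()
    return expansions.get(key, f"Find the section of the plan that describes {key}.")


# Hand-written expansions keyed by the lowercased outline entry.
EXPANSIONS = {
    "project involvement": (
        "Description of how the project will involve external parties and stakeholders."
    ),
}

outline = [
    "Project involvement.",
    "Risk management.",
    "Management of safety, security, privacy, and other critical requirements.",
]

queries = [expand_query(entry, EXPANSIONS) for entry in outline]
for q in queries:
    print(q)
```

The generic template is only a fallback. A hand-written expansion that uses the vocabulary of the target document will usually be the better query.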
Step 3: Perform the semantic search
With your document chunks and expanded queries, you can now perform the search.
Embed each query: One by one, take an expanded outline query and use the same transformer model to convert it into an embedding.
Calculate similarity: Compare the query embedding to all the document chunk embeddings using a similarity metric like cosine similarity. Cosine similarity is the cosine of the angle between two vectors; it ranges from -1 to 1, with a value closer to 1 indicating higher similarity in meaning.
Rank the results: Sort the document chunks in descending order based on their similarity score to the query.
Present the matches: The top-ranked chunks are the most semantically relevant matches for that specific outline entry.
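The similarity and ranking steps can be sketched in plain Python. The vectors here are tiny hand-made lists standing in for real embeddings, and the function names are illustrative. With a real model, the query and chunk vectors would come from the same embedding model, as described in the steps.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunk_vecs, top_k=3):
    """Return (chunk_index, score) pairs sorted by descending similarity."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" for three chunks and one query (made up).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 1.0, 0.0]

ranked = rank_chunks(query_vec, chunk_vecs, top_k=2)
for index, score in ranked:
    print(f"chunk {index}: {score:.3f}")
```

For a single document a linear scan like this is usually fast enough. A vector database becomes worthwhile mainly when the number of chunks grows large.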
Step 4: Refine the process
You may need to iterate to improve the results.
Adjust chunk size: If your initial results are not specific enough, you might try splitting the document into smaller chunks (e.g., sentences). If they lack context, try larger chunks (e.g., paragraphs).
Use a better model: Experiment with different embedding models. Some models are fine-tuned for specific domains, which could improve the accuracy of your results.
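To experiment with chunk size, it helps to make the granularity a parameter. The sketch below splits text into sentences with a deliberately naive rule and then groups them into overlapping windows; split_sentences, window_chunks, size, and overlap are illustrative names, and the splitter will mis-split abbreviations such as "e.g.", so treat it as a starting point only.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def window_chunks(sentences: list[str], size: int = 3, overlap: int = 1) -> list[str]:
    """Group sentences into chunks of `size`, sharing `overlap` sentences between neighbours."""
    if size < 1 or not 0 <= overlap < size:
        raise ValueError("need size >= 1 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reached the end of the text
    return chunks


text = "A one. B two. C three. D four. E five."
sentences = split_sentences(text)

fine = window_chunks(sentences, size=1, overlap=0)    # one sentence per chunk
coarse = window_chunks(sentences, size=3, overlap=1)  # three sentences, one shared
print(fine)
print(coarse)
```

Running the same queries against both the fine and the coarse chunking, then comparing the top matches, is a simple way to see which granularity suits your document.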
By expanding your short queries, you provide a more robust input for the semantic search model, leading to more accurate and meaningful matches between your outline and the document. This method helps you overcome the limitations of short, keyword-based queries and leverages the power of semantic understanding.
Regenerate this outline to provide the best queries
You have a great starting point for your semantic search. To make the outline entries into the best possible queries, we need to address the very short entries that lack context. The goal is to turn them into complete thoughts or questions that a semantic search model can better understand.
Here is the regenerated outline, with each entry expanded into an optimal semantic search query.
Regenerated Outline for Semantic Search Queries
Minimum Recommended Content
Find the minimum recommended content for the Software Development - Management Plan.
[a] Project Organizational Structure
Find the section describing the project organizational structure and the authority and responsibility of each team, including external organizations like Safety and Mission Assurance, IV&V, and Technical Authority.
[b] Safety-Critical Determination
Locate the safety-critical determination and software classification of all systems and subsystems containing software.
[c] Tailored Requirements Mapping
Find the tailored requirements mapping or compliance matrix that requires approval from the designated Engineering Technical Authority, especially if there are any waivers or deviations to the NPR.
[d] Engineering Environment
Locate information about the engineering environment, including the test environment, library, equipment, facilities, standards, procedures, and tools for development, operation, or maintenance.
[e] Work Breakdown Structure
Find details on the work breakdown structure for life cycle processes and activities, including software products, services, non-deliverable items, budgets, staffing, acquisition approach, physical resources, software size, and associated schedules.
[f] Quality Management
Find the section on how the quality characteristics of software products or services will be managed.
[g] Critical Requirements Management
Find information on the management of safety, security, privacy, and other critical requirements of the software products or services.
[h] Subcontractor Management
Locate the process for subcontractor management, including how subcontractors are selected and how they will be involved with the acquirer.
[i.i] Verification Methods and Criteria
Find the identification of selected software verification methods and criteria across the life cycle, such as peer review procedures, re-review criteria, and testing procedures.
[i.ii] Work Products for Verification
Identify the specific work products that are selected for verification.
[i.iii] Verification Environments
Find a description of the software verification environments to be established for the project, such as the software testing environment, system testing environment, and regression testing environment.
[i.iv] Verification Records and Corrective Actions
Locate where actual software verification records and analysis results (e.g., test records, inspection records) and any corrective actions will be documented.
[j.i] Validation Methods and Criteria
Find the identification of selected software validation methods and criteria across the life cycle, such as prototyping, user groups, simulation, and acceptance testing.
[j.ii] Work Products for Validation
Identify the specific work products that are selected for validation.
[j.iii] Validation Environments
Find a description of the software validation environments to be established, such as simulators for the operational environment.
[j.iv] Validation Records and Corrective Actions
Locate where actual software validation records and analysis results (e.g., user group records, prototyping records, acceptance testing records) and any corrective actions will be documented.
[k] Project Involvement
Find information on how external project stakeholders will be involved.
[l] User Involvement
Locate details on how end-users will be involved in the project.
[m] Risk Management
Find a description of the risk management process.
[n] Security Policy
Locate the project's security policy.
[o] Approvals and Certifications
Find details on required approvals, certifications, and proprietary, usage, ownership, warranty, or licensing rights.
[p] Scheduling, Tracking, and Reporting
Locate the process for scheduling, tracking, and reporting project progress.
[q] Personnel Training
Find information on the training of personnel, specifically any project-unique software training needs.
[r] Software Life Cycle Model
Locate a description of the software life cycle model, including software integration, hardware/software integration, delivery, and maintenance processes.
[s] Configuration Management
Find the section detailing the configuration management process.
[t] Software Document Deliverables
Locate the list of software document deliverables.
[u] Peer Review/Inspection Process
Find the process for peer review or inspection of software work products.
[v] Early Testing Requirements
Find the process for the early identification of testing requirements that drive software design decisions, such as special system-level timing requirements.
[w] Software Metrics
Locate the section that describes the software metrics to be used.
[x] Software Documentation Content
Find the content of the software documentation to be developed for the project.
[y] COTS/GOTS/MOTS/Reused/Open Source Software
Find the management, development, and testing approach for any commercial-off-the-shelf (COTS), government-off-the-shelf (GOTS), modified-off-the-shelf (MOTS), reused, or open source software components included in a NASA system or subsystem.
Rewrite the queries as ['query text 1','query text 2',...]
[
"Find the minimum recommended content for the Software Development - Management Plan.",
"Find the section describing the project organizational structure and the authority and responsibility of each team, including external organizations like Safety and Mission Assurance, IV&V, and Technical Authority.",
"Locate the safety-critical determination and software classification of all systems and subsystems containing software.",
"Find the tailored requirements mapping or compliance matrix that requires approval from the designated Engineering Technical Authority, especially if there are any waivers or deviations to the NPR.",
"Locate information about the engineering environment, including the test environment, library, equipment, facilities, standards, procedures, and tools for development, operation, or maintenance.",
"Find details on the work breakdown structure for life cycle processes and activities, including software products, services, non-deliverable items, budgets, staffing, acquisition approach, physical resources, software size, and associated schedules.",
"Find the section on how the quality characteristics of software products or services will be managed.",
"Find information on the management of safety, security, privacy, and other critical requirements of the software products or services.",
"Locate the process for subcontractor management, including how subcontractors are selected and how they will be involved with the acquirer.",
"Find the identification of selected software verification methods and criteria across the life cycle, such as peer review procedures, re-review criteria, and testing procedures.",
"Identify the specific work products that are selected for verification.",
"Find a description of the software verification environments to be established for the project, such as the software testing environment, system testing environment, and regression testing environment.",
"Locate where actual software verification records and analysis results (e.g., test records, inspection records) and any corrective actions will be documented.",
"Find the identification of selected software validation methods and criteria across the life cycle, such as prototyping, user groups, simulation, and acceptance testing.",
"Identify the specific work products that are selected for validation.",
"Find a description of the software validation environments to be established, such as simulators for the operational environment.",
"Locate where actual software validation records and analysis results (e.g., user group records, prototyping records, acceptance testing records) and any corrective actions will be documented.",
"Find information on how external project stakeholders will be involved.",
"Locate details on how end-users will be involved in the project.",
"Find a description of the risk management process.",
"Locate the project's security policy.",
"Find details on required approvals, certifications, and proprietary, usage, ownership, warranty, or licensing rights.",
"Locate the process for scheduling, tracking, and reporting project progress.",
"Find information on the training of personnel, specifically any project-unique software training needs.",
"Locate a description of the software life cycle model, including software integration, hardware/software integration, delivery, and maintenance processes.",
"Find the section detailing the configuration management process.",
"Locate the list of software document deliverables.",
"Find the process for peer review or inspection of software work products.",
"Find the process for the early identification of testing requirements that drive software design decisions, such as special system-level timing requirements.",
"Locate the section that describes the software metrics to be used.",
"Find the content of the software documentation to be developed for the project.",
"Find the management, development, and testing approach for any commercial-off-the-shelf (COTS), government-off-the-shelf (GOTS), modified-off-the-shelf (MOTS), reused, or open source software components included in a NASA system or subsystem."
]
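To step through the queries and report the closest match for each outline entry, it helps to keep the outline label next to its query. The sketch below is one way to do that, with loud assumptions: best_matches and min_score are illustrative names, and word_overlap is a placeholder scorer included only so the example runs without a model. It is not semantic; in practice you would replace it with cosine similarity between embeddings.

```python
def best_matches(queries, chunks, score_fn, min_score=0.2):
    """For each labeled query, return (label, best chunk index or None, best score)."""
    if not chunks:
        raise ValueError("chunks must not be empty")
    results = []
    for label, query in queries.items():
        scores = [score_fn(query, chunk) for chunk in chunks]
        best = max(range(len(chunks)), key=scores.__getitem__)
        # Report None when even the best chunk scores below the threshold,
        # which suggests the document may not cover this outline entry.
        hit = best if scores[best] >= min_score else None
        results.append((label, hit, scores[best]))
    return results


def word_overlap(a: str, b: str) -> float:
    """Placeholder scorer: Jaccard overlap of lowercase words. NOT semantic."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


# Two queries from the list above, keyed by their outline labels.
queries = {
    "m": "Find a description of the risk management process.",
    "n": "Locate the project's security policy.",
}
# Illustrative document chunks (made up for this example).
chunks = [
    "The risk management process is described in section 5 of the plan.",
    "Weekly status meetings are held on Mondays.",
]

results = best_matches(queries, chunks, word_overlap)
for label, hit, score in results:
    status = f"chunk {hit}" if hit is not None else "no clear match"
    print(f"[{label}] {status} (score {score:.2f})")
```

The threshold matters for a checklist like this one: an entry whose best score is low is a candidate for content that is missing from the document, which is often the more useful finding. The right threshold depends on the scorer and model, so it needs tuning on your own data.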
From <https://gemini.google.com/app/> Google Gemini 2.5 Pro