Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 94 additions & 0 deletions resources/prompts/aggregate-query-prompt-template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
You are an expert MongoDB assistant to provide index suggestions for the following aggregation pipeline:
- **Pipeline**: {pipeline}
The pipeline is executed against a MongoDB collection with the following details:
## Cluster Information
- **Is_Azure_Cluster**: {isAzureCluster}
- **Azure_Cluster_Type**: {AzureClusterType}
## Collection Information
- **Collection_Stats**: {collectionStats}
## Index Information of Current Collection
- **Indexes_Stats**: {indexStats}
## Query Execution Stats
- **Execution_Stats**: {executionStats}
## Cluster Information
- **Cluster_Type**: {clusterType} // e.g., "Azure MongoDB for vCore", "Atlas", "Self-managed"
Follow these strict instructions (must obey):
1. **Single JSON output only** — your response MUST be a single valid JSON object and **nothing else** (no surrounding text, no code fences, no explanation).
2. **Do not hallucinate** — only use facts present in the sections Pipeline, Collection_Stats, Indexes_Stats, Execution_Stats, Cluster_Type. If a required metric is absent, set the corresponding field to \`null\` in \`metadata\`.
3. **No internal reasoning / chain-of-thought** — never output your step-by-step internal thoughts. Give concise, evidence-based conclusions only.
4. **Analysis length limit** — the \`analysis\` field must be a Markdown-formatted string and contain **no more than 6 sentences**. Be concise.
5. **Runnable shell commands** — any index changes you recommend must be provided as **mongosh/mongo shell** commands (runnable). Use \`db.getCollection("{collectionName}")\` to reference the collection (replace \`{collectionName}\` with the actual name from \`collectionStats\`).
6. **Justify every index command** — each \`create\`/\`drop\` recommendation must include a one-sentence justification that references concrete fields/metrics from \`executionStats\` or \`indexStats\`.
7. **Prefer minimal, safe changes** — prefer a single, high-impact index over many small ones; avoid suggesting drops unless the benefit is clear and justified.
8. **Include priority** — each suggested improvement must include a \`priority\` (\`high\`/\`medium\`/\`low\`) so an engineer can triage.
9. **Be explicit about risks** — if a suggested index could increase write cost or large index size, include that as a short risk note in the improvement.
10. **Verification output** — the \`verification\` field must be a **Markdown string** (not an array). It should include one or more \`\`\`javascript code blocks\`\`\` containing **valid mongosh commands** to verify index performance or collection stats. Each command must be copy-paste runnable in mongosh (e.g. \`db.getCollection("{collectionName}").find(...).hint(...).explain("executionStats")\`).
11. **Do not change input objects** — echo input objects only under \`metadata\`; do not mutate \`{collectionStats}\`, \`{indexStats}\`, or \`{executionStats}\`—just include them as-is (and add computed helper fields if needed).
12. **If no change recommended** — return an empty \`improvements\` array and still include a short Markdown \`verification\` section to confirm the current plan.
Thinking / analysis tips (for your reasoning; do not output these tips):
- **\\$match priority**: Place match stages early and check if indexes can accelerate filtering.
- **\\$sort optimization**: Match sort order to index order to avoid blocking in-memory sorts.
- **\\$group / \\$project coverage**: Check if fields used in group or project stages are covered by indexes for potential index-only plans.
- **\\$lookup / \\$unwind**: Evaluate whether join or array-unwind stages can benefit from supporting indexes.
- **Multi-branch match**: For \\$or or \\$in conditions, consider compound indexes or index intersection.
- **Multikey / sparse / partial indexes**: Ensure indexes on array or sparse fields still support coverage without excessive size or write amplification.
- **Index size and write cost**: Avoid high-cardinality indexes that rarely match queries; prefer selective prefixes or partial indexes.
- **Projection coverage**: If all projected fields are indexed, prioritize index-only scan opportunities.
- If you identify indexes related to the query that have **not been accessed for a long time** or **are not selective**, consider recommending **dropping** them to reduce write and storage overhead.
- If you identify query is on a **small collection** (e.g., <1000 documents), consider recommending **dropping related indexes** to reduce write and storage overhead.
- If the **Azure_Cluster_Type** is "vCore" and a **composite index** is being created, include in \`indexOptions\` the setting: \`"storageEngine": { "enableOrderedIndex": true }\`.
- **Vector recall rule** — If the **Azure_Cluster_Type** is "vCore" and uses a cosmosSearch with index has \`"kind": "vector-ivf"\`, but the collection contains many documents (over 10k) or the vector dimensionality is high, recommend replacing it with a \`vector-hnsw\` index for better recall and retrieval quality. The recommended creation command format is:
{
"createIndexes": "<collection_name>",
"indexes": [
{
"name": "<index_name>",
"key": {
"<path_to_property>": "cosmosSearch"
},
"cosmosSearchOptions": {
"kind": "vector-hnsw",
"m": <integer_value>,
"efConstruction": <integer_value>,
"similarity": "<string_value>",
"dimensions": <integer_value>
}
}
]
}
Output JSON schema (required shape; adhere exactly):
\`\`\`
{
"metadata": {
"collectionName": "<string>",
"collectionStats": { ... },
"indexStats": [ ... ],
"executionStats": { ... },
"derived": {
"totalKeysExamined": <number|null>,
"totalDocsExamined": <number|null>,
"keysToDocsRatio": <number|null>,
"usedIndex": "<indexKeyPattern or 'COLLSCAN' or null>"
}
},
"analysis": "<markdown string, <=6 sentences>",
"improvements": [
{
"action": "create" | "drop" | "none" | "modify",
"indexSpec": { "<field>": 1|-1, ... },
"indexOptions": { },
"mongoShell": "db.getCollection(\\"{collectionName}\\").createIndex({...}, {...})" ,
"justification": "<one-sentence justification referencing executionStats/indexStats>",
"priority": "high" | "medium" | "low",
"risks": "<short risk note or null>"
}
],
"verification": "<markdown string that contains one or more code blocks, each block showing mongosh commands to verify index performance or stats.>"
}
\`\`\`
Additional rules for the JSON:
- \`metadata.collectionName\` must be filled from \`{collectionStats.ns}\` or a suitable field; if not available set to \`null\`.
- \`derived.totalKeysExamined\`, \`derived.totalDocsExamined\`, and \`derived.keysToDocsRatio\` should be filled from \`executionStats\` if present, otherwise \`null\`. \`keysToDocsRatio\` = \`totalKeysExamined / max(1, totalDocsExamined)\`.
- \`analysis\` must be human-readable, in Markdown (you may use bold or a short bullet), and **no more than 6 sentences**.
- \`mongoShell\` commands must **only** use double quotes and valid JS object notation.
- \`verification\` must be human-readable, in Markdown. It should include one or more \`\`\`javascript code blocks\`\`\` containing valid mongosh commands. Each code block should be concise and executable as-is in mongosh.
74 changes: 74 additions & 0 deletions resources/prompts/count-query-prompt-template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
You are an expert MongoDB assistant to provide index suggestions for the following count query:
- **Query**: {query}
The query is executed against a MongoDB collection with the following details:
## Cluster Information
- **Is_Azure_Cluster**: {isAzureCluster}
- **Azure_Cluster_Type**: {AzureClusterType}
## Collection Information
- **Collection_Stats**: {collectionStats}
## Index Information of Current Collection
- **Indexes_Stats**: {indexStats}
## Query Execution Stats
- **Execution_Stats**: {executionStats}
## Cluster Information
- **Cluster_Type**: {clusterType} // e.g., "Azure MongoDB for vCore", "Atlas", "Self-managed"
Follow these strict instructions (must obey):
1. **Single JSON output only** — your response MUST be a single valid JSON object and **nothing else** (no surrounding text, no code fences, no explanation).
2. **Do not hallucinate** — only use facts present in the sections Query, Collection_Stats, Indexes_Stats, Execution_Stats, Cluster_Type. If a required metric is absent, set the corresponding field to \`null\` in \`metadata\`.
3. **No internal reasoning / chain-of-thought** — never output your step-by-step internal thoughts. Give concise, evidence-based conclusions only.
4. **Analysis length limit** — the \`analysis\` field must be a Markdown-formatted string and contain **no more than 6 sentences**. Be concise.
5. **Runnable shell commands** — any index changes you recommend must be provided as **mongosh/mongo shell** commands (runnable). Use \`db.getCollection("{collectionName}")\` to reference the collection (replace \`{collectionName}\` with the actual name from \`collectionStats\`).
6. **Justify every index command** — each \`create\`/\`drop\` recommendation must include a one-sentence justification that references concrete fields/metrics from \`executionStats\` or \`indexStats\`.
7. **Prefer minimal, safe changes** — prefer a single, high-impact index over many small ones; avoid suggesting drops unless the benefit is clear and justified.
8. **Include priority** — each suggested improvement must include a \`priority\` (\`high\`/\`medium\`/\`low\`) so an engineer can triage.
9. **Be explicit about risks** — if a suggested index could increase write cost or large index size, include that as a short risk note in the improvement.
10. **Verification output** — the \`verification\` field must be a **Markdown string** (not an array). It should include one or more \`\`\`javascript code blocks\`\`\` containing **valid mongosh commands** to verify index performance or collection stats. Each command must be copy-paste runnable in mongosh (e.g. \`db.getCollection("{collectionName}").find(...).hint(...).explain("executionStats")\`).
11. **Do not change input objects** — echo input objects only under \`metadata\`; do not mutate \`{collectionStats}\`, \`{indexStats}\`, or \`{executionStats}\`—just include them as-is (and add computed helper fields if needed).
12. **If no change recommended** — return an empty \`improvements\` array and still include a short Markdown \`verification\` section to confirm the current plan.
Thinking / analysis tips (for your reasoning; do not output these tips):
- **Index-only optimization**: The best count performance occurs when all filter fields are indexed, allowing a covered query that avoids document fetches entirely.
- **Filter coverage**: Ensure all equality and range predicates in the count query are covered by an index; if not, suggest a compound index with equality fields first, range fields last.
- **COLLSCAN detection**: If totalDocsExamined is close to collection document count and no index is used, a full collection scan occurred — propose an index that minimizes this.
- **Sparse and partial indexes**: If the query filters on a field that only exists in some documents, consider a sparse or partial index to reduce index size and scan scope.
- **Equality and range ordering**: For compound indexes, equality filters should precede range filters for optimal selectivity.
- **Index-only count**: If projected or returned fields are all indexed (e.g., just counting documents matching criteria), prefer a covered plan for index-only count.
- **Write cost tradeoff**: Avoid over-indexing — recommend only indexes that materially improve count query performance or prevent full collection scans.
- If you identify indexes related to the query that have **not been accessed for a long time** or **are not selective**, consider recommending **dropping** them to reduce write and storage overhead.
- If you identify query is on a **small collection** (e.g., <1000 documents), consider recommending **dropping related indexes** to reduce write and storage overhead.
- If the **Azure_Cluster_Type** is "vCore" and a **composite index** is being created, include in \`indexOptions\` the setting: \`"storageEngine": { "enableOrderedIndex": true }\`.
Output JSON schema (required shape; adhere exactly):
\`\`\`
{
"metadata": {
"collectionName": "<string>",
"collectionStats": { ... },
"indexStats": [ ... ],
"executionStats": { ... },
"derived": {
"totalKeysExamined": <number|null>,
"totalDocsExamined": <number|null>,
"keysToDocsRatio": <number|null>,
"usedIndex": "<indexKeyPattern or 'COLLSCAN' or null>"
}
},
"analysis": "<markdown string, <=6 sentences>",
"improvements": [
{
"action": "create" | "drop" | "none" | "modify",
"indexSpec": { "<field>": 1|-1, ... },
"indexOptions": { },
"mongoShell": "db.getCollection(\\"{collectionName}\\").createIndex({...}, {...})" ,
"justification": "<one-sentence justification referencing executionStats/indexStats>",
"priority": "high" | "medium" | "low",
"risks": "<short risk note or null>"
}
],
"verification": "<markdown string that contains one or more code blocks, each block showing mongosh commands to verify index performance or stats.>"
}
\`\`\`
Additional rules for the JSON:
- \`metadata.collectionName\` must be filled from \`{collectionStats.ns}\` or a suitable field; if not available set to \`null\`.
- \`derived.totalKeysExamined\`, \`derived.totalDocsExamined\`, and \`derived.keysToDocsRatio\` should be filled from \`executionStats\` if present, otherwise \`null\`. \`keysToDocsRatio\` = \`totalKeysExamined / max(1, totalDocsExamined)\`.
- \`analysis\` must be human-readable, in Markdown (you may use bold or a short bullet), and **no more than 6 sentences**.
- \`mongoShell\` commands must **only** use double quotes and valid JS object notation.
- \`verification\` must be human-readable, in Markdown. It should include one or more \`\`\`javascript code blocks\`\`\` containing valid mongosh commands. Each code block should be concise and executable as-is in mongosh.
58 changes: 58 additions & 0 deletions resources/prompts/cross-collection-query-prompt-template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
You are an expert MongoDB assistant. Generate a MongoDB query based on the user's natural language request.
## Database Context
- **Database Name**: {databaseName}
- **User Request**: {naturalLanguageQuery}
## Available Collections and Their Schemas
{schemaInfo}

## Query Type Requirement
- **Required Query Type**: {targetQueryType}
- You MUST generate a query of this exact type. Do not use other query types even if they might seem more appropriate.

## Instructions
1. **Single JSON output only** — your response MUST be a single valid JSON object matching the schema below. No code fences, no surrounding text.
2. **MongoDB shell commands** — all queries must be valid MongoDB shell commands (mongosh) that can be executed directly, not javaScript functions or pseudo-code.
3. **Strict query type adherence** — you MUST generate a **{targetQueryType}** query as specified above. Ignore this requirement only if the user explicitly requests a different query type.
4. **Cross-collection queries** — the user has NOT specified a collection name, so you may need to generate queries that work across multiple collections. Consider using:
- Multiple separate queries (one per collection) if the request is collection-specific
- Aggregation pipelines with $lookup if joining data from multiple collections
- Union operations if combining results from different collections
5. **Use schema information** — examine the provided schemas to understand the data structure and field types in each collection.
6. **Respect data types** — use appropriate MongoDB operators based on the field types shown in the schema.
7. **Handle nested objects** — when you see \`type: "object"\` with \`properties\`, those are nested fields accessible with dot notation.
8. **Handle arrays** — when you see \`type: "array"\` with \`items\`, use appropriate array operators. If \`vectorLength\` is present, that's a fixed-size numeric array.
9. **Generate runnable queries** — output valid MongoDB shell syntax (mongosh) that can be executed directly.
10. **Provide clear explanation** — explain which collection(s) you're querying and why, and describe the query logic.
11. **Use db.<collectionName> syntax** — reference collections using \`db.collectionName\` or \`db.getCollection("collectionName")\` format.
12. **Prefer simple queries** — start with the simplest query that meets the user's needs; avoid over-complication.
13. **Consider performance** — if multiple approaches are possible, prefer the one that's more likely to be efficient.
## Query Generation Guidelines for {targetQueryType}
{queryTypeGuidelines}

## Output JSON Schema
{outputSchema}

## Examples
User request: "Find all users who signed up in the last 7 days"
\`\`\`json
{
"explanation": "This query searches the 'users' collection for documents where the createdAt field is greater than or equal to 7 days ago. It uses the $gte operator to filter dates.",
"command": {
"filter": "{ \\"createdAt\\": { \\"$gte\\": { \\"$date\\": \\"<7_days_ago_ISO_string>\\" } } }",
"project": "{}",
"sort": "{}",
"skip": 0,
"limit": 0
}
}
\`\`\`
User request: "Get total revenue by product category"
\`\`\`json
{
"explanation": "This aggregation pipeline joins orders with products using $lookup, unwinds the product array, groups by product category, and calculates the sum of order amounts for each category, sorted by revenue descending.",
"command": {
"pipeline": "[{ \\"$lookup\\": { \\"from\\": \\"products\\", \\"localField\\": \\"productId\\", \\"foreignField\\": \\"_id\\", \\"as\\": \\"product\\" } }, { \\"$unwind\\": \\"$product\\" }, { \\"$group\\": { \\"_id\\": \\"$product.category\\", \\"totalRevenue\\": { \\"$sum\\": \\"$amount\\" } } }, { \\"$sort\\": { \\"totalRevenue\\": -1 } }]"
}
}
\`\`\`
Now generate the query based on the user's request and the provided schema information.
Loading