feat(integrations): implement query expansion for semantic search #44
No reviewers
Labels
No Label
Compat/Breaking
Kind/Bug
Kind/Documentation
Kind/Enhancement
Kind/Feature
Kind/Security
Kind/Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Blocks
#33 Searching for the upgrade guide in confluence, the app was not able to retrieve the docuument
sarman/tftsr-devops_investigation
Reference: sarman/tftsr-devops_investigation#44
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "feature/integration-search-expansion"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
This PR implements query expansion across all integration search functions (Confluence, ServiceNow, Azure DevOps, and webview fetch) to search for semantically related content, not just exact user query terms.
Security Audit Results
Following a full security audit, the following issues were addressed:
Critical Fixes
escape_wiql()function to escape single quotes in Azure DevOps WIQL queriesescape_cql()function to escape special characters in Confluence CQL queriesHigh Priority Fixes
Medium Priority Fixes
join_allrecommended for future optimizationLow Priority Improvements
3withMAX_EXPANDED_QUERIESconstantexpanded_queryinstead ofsearch_urlChanges
New module:
query_expansion.rswith:get_product_synonyms()- generates product name variants (VNXT, Vesta NXT, etc.)expand_query()- expands user queries with related termsextract_keywords()- extracts meaningful keywords (fixed infinite loop bug)Updated integrations to use expanded queries:
confluence_search.rsservicenow_search.rsazuredevops_search.rswebview_fetch.rsTesting
All 142 tests pass (including 4 new query expansion tests).
Related
Fixes the issue where searching for 'upgrade VESTA NXT 1.1.9' wasn't finding documents mentioning 'VNXT', 'Vesta NXT', '1.0.12' and other related terms.
🤖 Automated PR Review:\n\nHere's a detailed code review of the PR implementing query expansion for semantic search across integrations:
1. Summary
This PR adds query expansion functionality to improve semantic search coverage across Azure DevOps, Confluence, and (presumably) other integrations. It introduces a new
query_expansionmodule and modifiessearch_wikiandsearch_confluence(andsearch_work_items) to iterate over multiple expanded queries, collect results, deduplicate, and sort.✅ Positives:
query_expansionmodule).⚠️ Concerns:
query_expansion.rsfile is incomplete/cut off in the diff — likely a bug.dedup_by(|a, b| a.url == b.url)may not be robust (e.g., same URL with different fragment/query params).unwrap_or_default()misuse; no rate limiting or backpressure.2. Bugs & Errors
🔴 Critical: Incomplete
query_expansion.rsThe diff shows the file is truncated at line 285 (file ends mid-comment):
➡️ Fix: Ensure
expand_query()is fully implemented and tested. Without it, this PR is non-functional.🟡 Bug:
dedup_byis unsafe forSearchResultdedup_byonly removes adjacent duplicates. Sorting byurlhelps, but:#section) or query params (?page=1), which should be considered duplicates.SearchResulthas other fields (title,content, etc.) — deduplication should preserve richer content (e.g., fullcontentover excerpt-only).➡️ Fix:
🟡 Bug:
unwrap_or_default()onitem["wiki"]["id"]→
unwrap_or_default()returns"", which may causefetch_wiki_pageto fail or return wrong data.➡️ Fix: Use
and_thenor explicitif let:🟡 Bug:
search_work_itemsusestake(3)per expanded query → up to 9 results, but dedup may leave fewer➡️ Suggestion: Limit total results (e.g.,
all_results.truncate(5)) or use a priority queue.3. Security Issues
🔴 Critical: WIQL Injection in
search_work_items➡️ Risk: If
expanded_querycontains'or malicious WIQL fragments (e.g.,'; DROP TABLE ...), it could corrupt the query or leak data.✅ Fix: Sanitize or parameterize. WIQL does not support parameterized queries — so you must escape:
🟡 High: CQL Injection in
search_confluence→
text~"is a Confluence CQL operator for fuzzy search. Ifexpanded_querycontains",), or~, it breaks the query or enables injection.✅ Fix: Escape or use
urlencoding::encode()and sanitize special chars:🟡 Medium: Cookie header reuse
cookiescontain session tokens, andexpand_queryreturns many variants, you’re reusing the same auth header — fine, but:continueis used).➡️ Suggestion: Log when auth fails (e.g., 401/403) more explicitly.
4. Best Practices
🟡 Performance: Sequential API calls
search_wiki, each call does:➡️ Fix: Use
futures::future::join_allfor parallelization:🟡 Logging: Misleading message
→
search_urlis the endpoint, not the expanded query. Should logexpanded_query:🟡 Error Handling: Silent failures
→ This is intentional, but:
➡️ Suggestion: Add a counter or log warning with status code:
🟡 Code Quality: Magic number
3take(3)appears repeatedly. Should be a constant:🟡 Missing: Input validation
queryis empty or too long (e.g.,expand_query("")may return many empty strings).➡️ Suggestion:
🟡 Missing: Unit tests
expand_query()or deduplication logic.✅ Recommended Fixes Summary
query_expansion.rsincompletededup_byunsafejoin_allMAX_EXPANDED_QUERIES)expanded_query, notsearch_url✅ Final Verdict
This PR introduces a valuable feature but has critical gaps:
query_expansion.rsfile is incomplete — do not merge until fully implemented.Recommendation: Request changes for security fixes and completeness before merging.\n\n---\nthis is an automated review from Ollama
🤖 Automated PR Review:\n\n### Code Review: Query Expansion for Semantic Search (Azure DevOps & Confluence)
1. Summary
This PR implements query expansion for semantic search across Azure DevOps (Wiki & Work Items) and Confluence integrations. It leverages a shared
expand_query()function, adds safety escaping for query languages (WIQL, CQL), and improves robustness (e.g., graceful error handling, deduplication). The changes are largely well-structured and follow good patterns.✅ Positives:
.continue()on partial failures).⚠️ Concerns:
expand_query()inazuredevops_search.rs(but used in diff — likely an oversight).escape_wiql()andescape_cql()are incomplete and may be vulnerable to injection.2. Bugs & Errors
🔴 Critical: Missing
expand_queryimport inazuredevops_search.rs✅ Already present in diff — but check if
query_expansion.rsexists and compiles. If not, this will fail at build time.🟠 Incomplete WIQL escaping (
escape_wiql)Problem:
Azure DevOps WIQL uses single quotes for string literals, so escaping
' → ''is correct for quotes, but does not handle:],[,(,),=,<,>,~,%,*,&,|,!,(,)— all have special meaning in WIQL.query = "Title CONTAINS 'x' OR 1=1 --"→ could break or leak data.Fix:
urlencoding::encode()on the entireexpanded_querybefore embedding in WIQL — but noteurlencodingencodes spaces as+, while WIQL expects%20. Usepercent_encodinginstead.🟠 CQL escaping may be insufficient
Problem:
Confluence CQL has many special characters:
+,-,&&,||,!,{,},[,],^,",~,*,?,:,\,/, and whitespace.Example injection:
query = "test) OR project = X"→ breaks out of parentheses.Fix:
cql-escape(if available) or sanitize to alphanumeric + space only.🟡 URL canonicalization may lose meaningful fragments/queries
Problem:
?pageId=123or#headingfor deep linking. Removing them may cause deduplication to merge distinct pages (e.g., same page, different sections).search_wiki/search_work_itemsusesa.url.cmp(&b.url)— but if URLs differ only in fragment/query, they’ll be treated as distinct before canonicalization, butcanonicalize_url()isn’t used in deduplication.Fix:
canonicalize_url()only for deduplication, not for display.🟡 Inconsistent deduplication strategy
search_wikiandsearch_work_itemsdeduplicate byurl, butsearch_confluencedoes not deduplicate (missing in diff).Fix: Add deduplication to
search_confluence:3. Security Issues
🔴 SQL/WIQL/CQL Injection Risk
As discussed above,
escape_wiql()andescape_cql()are incomplete. While they handle quotes, they miss many special characters. An attacker could:OR 1=1in WIQL (ifCONTAINSis misused).Recommendation:
[a-zA-Z0-9\-_ ]+), then fallback to escaping.tracing::info!).🟠 Cookie header reuse without validation
Risk: If
cookiescontain maliciously crafted values (e.g.,;evil=...), they could break the header or leak session data.Fix:
key=valuepairs).reqwest::header::Cookie::new()(butreqwestdoesn’t support this — so manual validation is needed).4. Best Practices
✅ Good: Graceful error handling
continueon non-successful API calls instead of failing the whole search.tracing::warn!for partial failures.⚠️ Missing: Timeout & rate limiting
reqwest::Client(default is none), risking hanging requests.Fix:
⚠️ Missing: Tests
escape_wiql,escape_cql,expand_query, or deduplication logic.Fix: Add tests:
⚠️ Performance: Excessive API calls
MAX_EXPANDED_QUERIES = 3→ up to 3× more API calls.search_wiki, you fetch full page content for every expanded query result → N+1 problem.Fix:
✅ Good: Documentation
/// This function expands...).⚠️ Inconsistent result limit
search_wikiandsearch_work_itemsusetake(MAX_EXPANDED_QUERIES)(3) for inner loop, butsearch_confluenceusestake(3).MAX_EXPANDED_QUERIESconsistently.Fix:
✅ Action Items
escape_wiql/escape_cqlincompletesearch_confluencesort+dedup_by.timeout()toreqwest::ClientMAX_EXPANDED_QUERIESeverywhereFinal Verdict
Ready for merge? ❌ No — not yet.
Fix the critical injection vulnerabilities and deduplication gap before merging. The rest are improvements for future iterations.
Let me know if you'd like a patch for the escaping functions or a test suite!\n\n---\nthis is an automated review from Ollama
🤖 Automated PR Review:\n\n## Code Review: Query Expansion for Semantic Search
1. Summary
This PR implements query expansion for semantic search across Azure DevOps (Wiki & Work Items) and Confluence integrations. It introduces a reusable
expand_query()function, adds safety escaping for query languages (WIQL, CQL), and improves robustness with per-query error handling and deduplication. The changes are well-structured and follow modern Rust patterns.✅ Overall positive: Good engineering, clear intent, and solid safety improvements.
2. Bugs & Critical Errors
🔴 Critical: Missing
MAX_EXPANDED_QUERIESconstant insearch_confluenceloopIn
search_confluence, the loop uses.take(3)while the constantMAX_EXPANDED_QUERIESis defined and used elsewhere (e.g.,azuredevops_search.rs):Risk: Inconsistent behavior — if
MAX_EXPANDED_QUERIESis later changed (e.g., to 5), Confluence will still only return 3 items per query, leading to reduced recall.✅ Fix:
🔴 Security: SQL/WIQL Injection Risk (Mitigated but not eliminated)
The
escape_wiql()function only escapes single quotes:But WIQL uses
CONTAINS '...', and single quotes are not the only dangerous character. While Azure DevOps WIQL is not raw SQL (it’s a DSL), unescaped characters like"or;could still cause malformed queries or unexpected behavior.Example attack (theoretical, but worth hardening):
✅ Recommendation: Use a more robust escaping strategy:
', but also",\,(,),~,*,?,;,=, etc.urlencoding::encodein addition to manual escaping (already done in CQL, but not WIQL).Suggested improvement:
🟡 Minor: Inconsistent deduplication strategy
search_wikiandsearch_work_itemsdeduplicate byurl.search_confluencedoes not deduplicate at all (missing.sort_by+.dedup_by).✅ Fix: Add deduplication to
search_confluence:3. Security Issues
🔴 High: Cookie header reused across retries (session leakage risk)
The
cookie_headeris built once:and reused across all expanded queries. If
expand_query()is maliciously injected (e.g., via poisoned LLM output), and one query fails, the same cookies are sent to all subsequent queries — including potentially malicious endpoints.But wait: The cookies are from
webview_auth, so they’re user session cookies. If thecookiesparameter is compromised (e.g., via XSS), this could leak them across multiple API calls.✅ Mitigation: This is low-risk if
cookiesare properly sanitized and scoped (as intended bywebview_auth). However, consider:cookiesinput is validated."tracing::info!withexpanded_query, which is safe. ✅ Good.🟡 Medium:
canonicalize_urlmay panic on invalid URLs (but is wrapped inunwrap_or_else)✅ This is safe —
unwrap_or_elsehandlesNone. No panic.But:
Url::parsecan fail on malformed URLs (e.g.,"http://"), and falling back to raw URL is acceptable.🟡 Medium: No rate limiting or backpressure
Looping over
MAX_EXPANDED_QUERIES(3) × 3 integrations = up to 9 API calls per user query. While 9 is low, in high-concurrency scenarios, this could hit rate limits (especially Azure DevOps/Confluence).✅ Recommendation: Add a comment about rate limiting:
4. Best Practices
✅ Excellent: Graceful degradation
continueon non-200 responses instead of returningErr(...).tracing::warn!instead oftracing::error!for recoverable failures.✅ Excellent: Code reuse & modularity
expand_queryis cleanly abstracted.escape_wiql/escape_cqlare localized and well-documented.✅ Excellent: Deduplication & sorting
urlbefore deduplication is correct and stable.🟡 Minor: Magic number
3in Confluence loopAs noted in section 2,
take(3)is hardcoded. Fix to useMAX_EXPANDED_QUERIES.🟡 Minor: Inconsistent logging
search_wiki: logsexpanded_querysearch_work_items: logsexpanded_querysearch_confluence: logssearch_url(which includes encoded query) — less readable.✅ Suggestion:
🟡 Minor: Performance: Redundant client creation
reqwest::Client::new()is called inside each function. For high-throughput use, reuse a single client (e.g., viaArc<Client>). But for a Tauri app (low concurrency), this is acceptable.✅ Recommendation: Add a comment if this is intentional:
5. Suggested Fixes Summary
take(3)hardcoded insearch_confluenceMAX_EXPANDED_QUERIESsearch_confluence.sort_by().dedup_by()escape_wiqlincomplete",\,(,)expanded_queryinstead of full URLClient::new()Final Verdict
Approve with minor fixes. The PR significantly improves search quality and robustness. The critical issues (hardcoded
3, missing dedup) must be fixed before merging. The WIQL escaping should be hardened for production.Would you like me to draft the exact patch for the fixes?\n\n---\nthis is an automated review from Ollama
🤖 Automated PR Review:\n\nHere's a detailed review of the PR changes for query expansion for semantic search in Azure DevOps and Confluence integrations:
✅ 1. Summary
This PR implements query expansion (using a new
expand_queryfunction) to improve semantic search coverage across Confluence and Azure DevOps. Key changes:urlcrate dependency (required for URL canonicalization).escape_wiql()andescape_cql()helper functions for safe query escaping.search_wiki()andsearch_work_items()in Azure DevOps, andsearch_confluence()to:canonicalize_url()for robust deduplication.Overall, the implementation is well-structured, modular, and improves search recall. However, there are critical security issues and a few best-practice concerns.
⚠️ 2. Critical Bugs & Errors
🔴 A. Incomplete
escape_cql()— Missing&and|charactersIn
confluence_search.rs, theescape_cql()function does not escape&and|, which are reserved operators in Confluence CQL.Impact:
"security & admin"could be misinterpreted as a logical AND operation.Fix:
🔴 B.
MAX_EXPANDED_QUERIESused inconsistently for result limitingIn both
search_wiki()andsearch_work_items(), the code usesMAX_EXPANDED_QUERIES(set to 3) to limit inner loop iteration over results per expanded query, but the original behavior limited to top 3 results per query — now it's 3 per expanded query, which could return up to3 × 3 = 9results per integration.Problem:
Suggestion:
take(3)after collecting all results).Fix (if per-query limit is intended):
🔴 C. Missing error propagation for
expand_query()The code assumes
expand_query(query)returns a non-empty list, but no handling for empty/failed expansion.Impact:
Fix:
if expanded_queries.is_empty() { return Ok(Vec::new()); }expand_querynever returns empty (e.g., always return at least the original query).🛡️ 3. Security Issues
🔴 A. XSS Risk in Excerpt Display (Low-Medium)
In
search_wiki()andsearch_confluence(), excerpts are extracted as raw strings and may contain HTML:Problem:
<span class="highlight">...</span>), and this is rendered in a frontend without sanitization, XSS is possible.Mitigation:
html5everorhtml_escapecrate).Fix (minimal):
🔴 B. Cookie Header Reused Without Validation
The code passes
cookie_headerdirectly to all requests:Risk:
cookies_to_header()includes CSRF tokens, session cookies, or other sensitive headers, and theexpand_queryfunction is influenced by user input (e.g., via LLM), an attacker could potentially trigger requests with elevated privileges.Mitigation:
cookies_to_header()only includes necessary, sanitized cookies (e.g., auth tokens only).reqwest::Clientwith persistent auth, rather than raw cookie headers.📜 4. Best Practices & Code Quality
✅ Good Practices
continueinstead ofreturn Err(...)in loops → better resilience.tracing::info!andtracing::warn!used appropriately.dedup_by(|a, b| a.url == b.url)+sort_by(|a, b| a.url.cmp(&b.url))is correct.⚠️ Improvement Opportunities
1.
canonicalize_url()may not handle all edge casesIssue:
?version=2), and stripping them may cause false deduplication (e.g., two versions of same page become same URL).Fix:
utm_*,ref,src), or use a smarter canonicalizer (e.g., based on Confluence API docs).page_idif available (e.g.,item["content"]["id"]).2. Hardcoded
MAX_EXPANDED_QUERIES = 33. Missing
#[allow(clippy::redundant_closure)]orclippy::redundant_cloneIn
canonicalize_url():Minor, but worth a lint fix.
4. No tests added
escape_wiql(),escape_cql(),canonicalize_url(), orexpand_query()integration."foo(bar)","a&b","x=y").✅ Final Recommendations
&, `MAX_EXPANDED_QUERIES✅ Add fallback for empty
expand_query()✅ Consider smarter URL deduplication
✅ Make
MAX_EXPANDED_QUERIESconfigurable✅ Approve with Fixes
This PR is functionally solid and well-implemented, but must not be merged until the critical security issues (CQL escaping, HTML sanitization) are addressed.
Once those are fixed, this is a strong addition to the search system.
Let me know if you'd like a patch file with the suggested fixes!\n\n---\nthis is an automated review from Ollama
🤖 Automated PR Review:\n\n### Code Review: Query Expansion for Semantic Search in Azure DevOps & Confluence Integrations
1) Summary
This PR implements query expansion for semantic search across Azure DevOps (Wiki & Work Items) and Confluence integrations. It introduces a new dependency (
url), adds utility functions for WIQL/CQL escaping and HTML stripping, and refactors search logic to iterate over expanded queries, deduplicate results, and improve robustness (e.g.,continueon partial failures instead ofreturn Err).The implementation is generally well-structured, with good attention to security (XSS, injection) and resilience. However, several critical and high-severity issues exist—particularly around incomplete escaping, unsafe string interpolation in WIQL, and missing input validation—that must be addressed before merging.
2) Bugs & Errors
🔴 Critical: Incomplete WIQL Escaping in
escape_wiqlescape_wiqlfunction escapes',",(,), etc., but does not escapeCONTAINS-specific special characters like*,~,?,;,=in the context of WIQL string literals.escape_wiqlescapes'→'', it does not handle the fact thatCONTAINSin WIQL is case-insensitive and supports wildcards (*,?). If the expanded query contains unescaped*or?, they will be interpreted as wildcards (which may be intentional), but~and;can cause parse errors or unexpected behavior.expanded_query = "foo~bar"→CONTAINS 'foo~bar'→ WIQL parser may fail or misinterpret~as proximity operator.✅ Fix:
expand_queryreturns raw terms or pre-sanitized terms.expand_querymay produce terms with*,?,~,;, etc., then escape them as WIQL string literals and disable wildcard interpretation if unintended.urlencoding::encodein addition toescape_wiql, and consider usingCONTAINS(text, 'literal')(but note:CONTAINSin WIQL istext ~ 'query', notCONTAINS(...)).%NNencoding (like URL encoding), but that changes semantics.✅ Fix: Replace
CONTAINSwith~:And ensure
safe_queryis properly escaped for~queries (see above).🔴 High: HTML stripping is naive and incomplete
<img src=x onerror=alert(1)>→onerror=alert(1)remains).&,<,>).<script>,<style>, or event handlers.item["content"]contains malicious HTML (e.g.,<img src=x onerror=eval(atob(...))>), the stripped text may still contain JS or XSS vectors.✅ Fix:
html5everorscraper(already used elsewhere in the project? check).urlencoding::decodeorhtml_escapecrate), and strip<script>,<style>, and event attributes.🔴 High: Missing input validation on
expanded_queriesexpand_query(query)returnsVec<String>, but no validation is done on the length or content of expanded queries.expand_queryreturns very long strings or malicious payloads (e.g., SQL-injection-like strings), they’re passed directly to API calls.✅ Fix:
expanded_query.len()(e.g., ≤ 256 chars).3) Security Issues
🟠 Medium: Cookie header used directly (XSS/CSRF risk)
cookiescontain malicious data (e.g.,;evil=...), it could corrupt the header or leak auth tokens.cookies_to_headerproperly escapes/validates cookie values.✅ Fix: Validate cookie values (e.g., reject
;,\r,\n, or useCookie::new()fromhttpcrate).🟠 Medium: No rate limiting or backoff for multiple API calls
MAX_EXPANDED_QUERIES * 2(Wiki + Work Items) API calls per query.✅ Fix:
backoffcrate).futures::future::join_all+try_join_all, but cap concurrency (e.g.,semaphore).4) Best Practices
✅ Good: Deduplication & sorting
dedup_byafter sorting.✅ Good: Graceful error handling
continueinstead ofreturn Err(...)when a single expanded query fails → improves robustness.✅ Good: Unit tests for escaping
escape_wiqlandstrip_html_tagstests.🟡 Improvement: Magic number
MAX_EXPANDED_QUERIES = 3🟡 Improvement: Missing doc comments
escape_wiql,strip_html_tags,escape_cql,canonicalize_urllack///docs.✅ Fix: Add doc comments explaining purpose, edge cases, and security implications.
🟡 Improvement: Inconsistent quoting
search_confluence, the CQL query usestext~\"{}\"(with double quotes), butescape_cqlescapes"→\". This is correct if the CQL parser expects escaped double quotes inside double-quoted strings.text ~ 'query', while Confluence usestext ~ "query"(double quotes for CQL). Ensure consistency.✅ Fix: Use single quotes for CQL string literals if possible (Confluence supports both), or ensure
escape_cqlhandles double quotes correctly inside double-quoted strings.🔒 Security Checklist
CONTAINSinstead of~)~,*,?,;)✅ Recommended Fixes Before Merge
CONTAINS→~.escape_wiql: Ensure all special characters in~queries are escaped (especially~,*,?,;,=,(,)).scraper::Html::parse_document(...)+text_content()).expanded_query.len() <= 256).MAX_EXPANDED_QUERIESrationale and consider making it configurable.📌 Final Verdict
Do not merge in current state due to critical WIQL syntax error and incomplete escaping. Once fixed, this is a solid, well-tested enhancement that improves search relevance and resilience.
Let me know if you'd like a patch with suggested fixes!\n\n---\nthis is an automated review from Ollama
🤖 Automated PR Review:\n\n### Code Review: Query Expansion for Semantic Search in Azure DevOps & Confluence Integrations
1. Summary
This PR introduces query expansion to enhance semantic search across Azure DevOps (Wiki & Work Items) and Confluence. It adds a dependency on the
urlcrate, implements query expansion logic, and applies it to both integrations. Key improvements include:escape_wiql,escape_cql,strip_html_tags)The implementation is generally well-structured, but several critical and high-severity issues exist—particularly around security (XSS, injection), correctness (buggy escaping, incomplete sanitization), and maintainability.
2. Bugs & Errors
2.1.
escape_wiqlis incomplete and incorrect for WIQLIssue: The WIQL escaping function does not escape the
~operator (used for fuzzy search in WIQL), and does not handle theCONTAINS/~context correctly.~is used for fuzzy matching (e.g.,Title ~ 'query'), and must be escaped when part of the literal string, but the current function escapes~→\~, which breaks the query logic.~in the WIQL query (... ~ '{safe_query}'), but~in WIQL is not a wildcard—it’s for fuzzy matching with a similarity threshold. Ifsafe_querycontains~, the query becomes malformed or behaves unexpectedly.escape_wiql("foo~bar")→"foo\~bar", but WIQL interprets\~as invalid syntax.Fix:
~for literal matching in WIQL. UseCONTAINSwith escaped quotes/backslashes only.',",\,(,),;,=— but not~.CONTAINSis used with single quotes in WIQL, and escape'as''.Corrected
escape_wiql(forCONTAINS):2.2.
escape_cqlis missing critical charactersIssue: Confluence CQL requires escaping of many special characters, but
escape_cqlomits:*(wildcard),?(single-char wildcard),!(negation),^(boost),{,},[,],:(field operators),AND,OR,NOT(keywords if unquoted)."foo*bar"becomestext~"foo*bar"→*is a wildcard, potentially matching unintended items.Fix: Use
urlencodingon the entire CQL string, or properly quote and escape the query inside double quotes:2.3.
strip_html_tagsis naive and unsafeIssue:
<img src=x onerror=alert(1)>→ still returnsonerror=alert(1)if>is missing).<script>becomes<script>, not<script>).<a href="javascript:alert(1)">→href="javascript:alert(1)"remains in output).Fix:
html5everorhtmlparser.html_escape::decode_html_entities(&html)+ a simple tag stripper:text_contentfromhtml5everor a whitelist-based sanitizer.2.4. Deduplication uses
.sort_by(|a, b| a.url.cmp(&b.url))+.dedup_by(|a, b| a.url == b.url)Issue:
a.url.cmp(&b.url)is lexicographic, but URLs may differ only in fragment/query (e.g.,foo.com#section1vsfoo.com#section2). This causes false deduplication (same page, different sections treated as duplicates).canonicalize_url(inconfluence_search.rs) strips fragments/queries, butazuredevops_search.rsdoes not use it for deduplication.Fix:
canonicalize_url(or a URL normalization function) before deduplication:(title, url)or(source, id)if available.3. Security Issues
3.1. XSS in Azure DevOps excerpts
Issue:
strip_html_tagsis used after extractingitem["content"], but:item["content"]contains<script>tags, the current function may still output dangerous content (e.g.,onerror=...if tags are malformed).title,path, orurlfields (though less critical).Fix:
2.3).3.2. URL injection risk in
fetch_wiki_page/fetch_work_item_detailsIssue: The
org_urlandprojectvalues are used directly in URL construction. If either is untrusted (e.g., from user input), this could lead to SSRF or redirect attacks.org_urlis validated (e.g., only allow known domains, blockfile://,ftp://, etc.).Fix:
org_urlagainst a whitelist of allowed domains.Url::parse+Url::host_str()to verify scheme/domain.3.3.
urlcrate dependency added but not used inazuredevops_search.rsIssue:
url = "2.5.8"is added toCargo.tomlandCargo.lock, but only used inconfluence_search.rs. Inazuredevops_search.rs,Url::parseis not used.Fix:
urlusage toazuredevops_search.rsfor URL validation (e.g.,Url::parse(&org_url).map_err(...)?).confluence_search.rsis the only consumer.4. Best Practices
4.1. Error handling is inconsistent
Issue:
search_wikireturns an error on HTTP failure, butsearch_work_itemsreturnsOk(Vec::new())orcontinues on failure.Fix:
continueon non-critical failures (e.g., 404, 503), but log warnings.Ok(Vec::new())only if all queries fail (not just one).4.2. Hardcoded
MAX_EXPANDED_QUERIES = 3Issue: Magic number used in two places. No configuration or documentation.
Fix:
config.rsorconstants.rs:4.3. Tests for
escape_wiqlare incompleteIssue: Tests cover basic escaping but miss:
’(curly quote) vs')."'; DROP TABLE WorkItems--").Fix:
4.4. Missing documentation
Issue:
expand_queryis used but not defined in this PR. Its behavior (e.g., how many expansions, latency, fallbacks) is unknown.Fix:
expand_query(e.g., inquery_expansion.rs):Recommendations Before Merge
escape_wiqlto not escape~, and switch toCONTAINSinstead of~.strip_html_tagswith a robust HTML sanitizer.org_url.MAX_EXPANDED_QUERIESto a config constant.Final Verdict
Do not merge until critical security and correctness issues are fixed. The query expansion feature is valuable, but the current implementation risks:
escape_wiqlis misused),Once fixed, this PR will significantly improve search quality and security.\n\n---\nthis is an automated review from Ollama