// NOTE(review): This source appears to have been collapsed/garbled during extraction —
// many physical lines are fused together, and the JSX element markup in every render
// section has been stripped, leaving only loose text fragments. The TypeScript logic
// (state hooks, event listeners, async handlers) is intact and readable; the returned
// JSX cannot be reconstructed from this view, so it is preserved verbatim below.
// TODO(review): recover the original .tsx from version control before editing any JSX.
//
// File overview (grounded in the visible code only):
//  - EmbeddingsGenerator: top-level "use client" component. Wires useDocuments() context,
//    shift-click multi-select (useShiftSelect), an 'embeddings-settings-changed' window
//    listener, embedding generation (sequential contextGenerateEmbeddings calls), triple
//    extraction (processDocuments with LangChain / chunking options), and POST calls to
//    /api/stop-processing and /api/stop-embeddings.
//  - EmbeddingsContent: per-document embeddings table + status helpers (render garbled).
//  - RadioButton: labeled radio-input wrapper (its JSX body is missing entirely).
//  - TriplesContent: sortable document table with chunk-size/overlap/chunking-method
//    configuration and prompt-configuration loading from localStorage (render garbled).
//  - InfoIcon: inline SVG icon component (SVG body missing).
// // SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. // SPDX-License-Identifier: Apache-2.0 // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. // "use client" import { useState, useEffect } from "react" import { useDocuments } from "@/contexts/document-context" import { Button } from "@/components/ui/button" import { Sparkles, Loader2, CheckCircle, AlertCircle, FileText, Zap, Cpu, X, ChevronUp, ChevronDown } from "lucide-react" import { Switch } from "@/components/ui/switch" import { Label } from "@/components/ui/label" import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip" import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs" import React from "react" import { AdvancedOptions } from "@/components/advanced-options"; import { PromptConfiguration, PromptConfigurations } from "@/components/prompt-configuration"; import { useShiftSelect } from "@/hooks/use-shift-select" interface EmbeddingsGeneratorProps { showTripleExtraction?: boolean; } type Document = { id: string; name: string; status: string; uploadStatus: string; size: string; triples?: any[]; embeddings?: { count: number; generated: Date; status: "New" | "Processing" | "Processed" | "Error"; error?: string; }; }; interface ContentProps { documents: Document[]; selectedDocs: string[]; handleSelectAll: () => void; handleItemClick: (item: Document, event?: React.MouseEvent) => void; isSelected: 
(itemId: string) => boolean; error: string | null; status: string; } interface EmbeddingsContentProps extends ContentProps { generateEmbeddings: () => void; isGenerating: boolean; useLangChain: boolean; setUseLangChain: (value: boolean) => void; useSentenceChunking: boolean; setUseSentenceChunking: (value: boolean) => void; embeddingsProvider: string; handleStopEmbeddings: () => void; } interface TriplesContentProps extends ContentProps { extractTriples: (promptConfigs?: PromptConfigurations) => void; isProcessing: boolean; useLangChain: boolean; setUseLangChain: (value: boolean) => void; useSentenceChunking: boolean; setUseSentenceChunking: (value: boolean) => void; useEntityExtraction: boolean; setUseEntityExtraction: (value: boolean) => void; error: string | null; status: string; handleStopProcessing: () => void; } export function EmbeddingsGenerator({ showTripleExtraction = false }: EmbeddingsGeneratorProps) { const { documents, processDocuments, generateEmbeddings: contextGenerateEmbeddings } = useDocuments() const [isGenerating, setIsGenerating] = useState(false) const [isProcessing, setIsProcessing] = useState(false) const [useLangChain, setUseLangChain] = useState(false) const [useSentenceChunking, setUseSentenceChunking] = useState(true) const [useEntityExtraction, setUseEntityExtraction] = useState(true) const [error, setError] = useState(null) const [status, setStatus] = useState("") const [langChainMethod, setLangChainMethod] = React.useState<'default' | 'graphtransformer'>( 'default' ); const [embeddingsProvider, setEmbeddingsProvider] = useState( typeof window !== 'undefined' ? 
// NOTE(review): SSR guard above — window is undefined on the server, so the provider
// initializer falls back to "local" until the client reads localStorage. The
// 'embeddings-settings-changed' window listener below keeps it in sync afterwards,
// and is removed in the effect's cleanup. useShiftSelect restricts selection to
// documents whose status is "New", "Processed", or "Error".
localStorage.getItem("embeddings_provider") || "local" : "local" ); // Use shift-select hook for document selection const { selectedItems: selectedDocs, setSelectedItems: setSelectedDocs, handleItemClick, handleSelectAll, isSelected } = useShiftSelect({ items: documents, getItemId: (doc) => doc.id, canSelect: (doc) => doc.status === "New" || doc.status === "Processed" || doc.status === "Error", onSelectionChange: (selectedIds) => { // Optional: handle selection change if needed } }) // Listen for embeddings settings changes useEffect(() => { const handleEmbeddingsSettingsChanged = () => { const updatedProvider = localStorage.getItem("embeddings_provider") || "local"; setEmbeddingsProvider(updatedProvider); console.log("Embeddings generator detected embeddings settings change:", updatedProvider); }; window.addEventListener('embeddings-settings-changed', handleEmbeddingsSettingsChanged); return () => { window.removeEventListener('embeddings-settings-changed', handleEmbeddingsSettingsChanged); }; }, []); // Handle tab navigation const handleTabChange = (tab: string) => { const tabElement = document.querySelector(`[data-value="${tab}"]`) if (tabElement && 'click' in tabElement) { (tabElement as HTMLElement).click() } } // When LangChain is toggled off, disable dependent options useEffect(() => { if (!useLangChain) { setUseSentenceChunking(false) setUseEntityExtraction(false) } // Dispatch custom event to update embedding model info in Processing Summary const event = new CustomEvent('langChainToggled', { detail: { useLangChain } }); window.dispatchEvent(event); }, [useLangChain]) // Simulate embedding generation const generateEmbeddings = async () => { if (selectedDocs.length === 0) { setError("Please select at least one document") return } setError(null) setIsGenerating(true) setStatus("Preparing documents for embedding generation...") try { // Process each selected document for (let i = 0; i < selectedDocs.length; i++) { const docId = selectedDocs[i]; const doc = 
// NOTE(review): the loop below awaits contextGenerateEmbeddings(docId) for each
// selected document sequentially (one at a time, not Promise.all), updating the
// status line per document; missing IDs are logged and skipped rather than aborting.
documents.find(d => d.id === docId); if (!doc) { console.error(`Document with ID ${docId} not found`); continue; } setStatus(`Generating embeddings for document ${i+1} of ${selectedDocs.length}: ${doc.name}`); await contextGenerateEmbeddings(docId); } setStatus("Embedding generation complete!"); setTimeout(() => { setIsGenerating(false); setStatus(""); }, 1500); } catch (error) { console.error("Error generating embeddings:", error); setError("Failed to generate embeddings. Please try again."); setIsGenerating(false); } } // Extract triples from documents const extractTriples = async (options?: PromptConfigurations & { chunkSize?: number; overlapSize?: number; chunkingMethod?: 'optimized' | 'pyg' }) => { if (selectedDocs.length === 0) { setError("Please select at least one document") return } setError(null) setIsProcessing(true) setStatus("Preparing documents for triple extraction...") // Set up a listener for the processing-complete event const handleProcessingComplete = () => { console.log("Processing complete event received in embeddings-generator"); setIsProcessing(false); setStatus(""); }; window.addEventListener('processing-complete', handleProcessingComplete); try { // Update the processing status display const docNames = selectedDocs.map(id => documents.find(d => d.id === id)?.name || 'Unknown' ).join(', '); // Determine the processing method based on selected model and options let processingMethod = 'default extractor'; try { const selectedModel = localStorage.getItem("selectedModel"); if (selectedModel) { const model = JSON.parse(selectedModel); if (model.provider === "ollama") { processingMethod = `Ollama ${model.model || 'qwen3:1.7b'}`; } else if (model.id?.startsWith("nvidia-")) { processingMethod = 'NVIDIA Nemotron'; } } } catch (e) { // Fallback to default if parsing fails } if (useLangChain) { processingMethod += langChainMethod === 'graphtransformer' ? 
// NOTE(review): processingMethod is a display-only label for the status line;
// the actual extraction backend is chosen by processDocuments via the
// useLangChain/useGraphTransformer flags passed below.
' with LLMGraphTransformer' : ' with LangChain'; } setStatus(`Processing ${selectedDocs.length} document(s): ${docNames} using ${processingMethod}`); // Call processDocuments with the selected document IDs and processing options const useGraphTransformer = useLangChain && langChainMethod === 'graphtransformer'; await processDocuments(selectedDocs, { useLangChain, useGraphTransformer, promptConfigs: options || undefined, chunkSize: options?.chunkSize, overlapSize: options?.overlapSize, chunkingMethod: options?.chunkingMethod }); // Navigate to the edit tab after processing is complete setTimeout(() => { // Clean up the event listener window.removeEventListener('processing-complete', handleProcessingComplete); // Navigate to the edit tab handleTabChange("edit"); }, 500); } catch (error) { console.error("Error processing documents:", error) setError("Failed to process documents. Please try again.") setIsProcessing(false) setStatus("") // Clean up the event listener window.removeEventListener('processing-complete', handleProcessingComplete); } } // Stop processing function const handleStopProcessing = async () => { try { const response = await fetch('/api/stop-processing', { method: 'POST', headers: { 'Content-Type': 'application/json', }, }); if (response.ok) { setStatus("Processing stopped by user"); setError(null); setIsProcessing(false); setIsGenerating(false); } else { setError("Failed to stop processing. Please try again."); } } catch (error) { console.error("Error stopping processing:", error); setError("Failed to stop processing. Please try again."); } } // Stop embeddings generation function const handleStopEmbeddings = async () => { try { const response = await fetch('/api/stop-embeddings', { method: 'POST', headers: { 'Content-Type': 'application/json', }, }); if (response.ok) { setStatus("Embeddings generation stopped by user"); setError(null); setIsGenerating(false); } else { setError("Failed to stop embeddings generation. 
Please try again."); } } catch (error) { console.error("Error stopping embeddings generation:", error); setError("Failed to stop embeddings generation. Please try again."); } } return (
// NOTE(review): the returned JSX of EmbeddingsGenerator is garbled from here — the
// element tags were stripped by extraction; only the showTripleExtraction ternary and
// tab labels survive. Do not edit the fragments below without the original file.
{showTripleExtraction ? ( Triple Extraction Embeddings ) : ( )}
// NOTE(review): end of EmbeddingsGenerator; EmbeddingsContent begins on the next line.
// Its status helpers (getEmbeddingsStatusIcon / getEmbeddingsStatusText) are intact:
// icon is keyed off doc.embeddings?.status defaulting to "New"; text shows
// "<count> vectors" when Processed, the raw status otherwise, or "Ready" when unset.
// Its returned JSX (the embeddings table) is garbled below.
) } // Embeddings content component function EmbeddingsContent({ documents, selectedDocs, handleSelectAll, handleItemClick, isSelected, generateEmbeddings, isGenerating, useLangChain, setUseLangChain, useSentenceChunking, setUseSentenceChunking, error, status, embeddingsProvider, handleStopEmbeddings }: EmbeddingsContentProps) { // Helper function to get embeddings status icon const getEmbeddingsStatusIcon = (doc: Document) => { // Use embeddings status if available, otherwise show 'New' const embeddingsStatus = doc.embeddings?.status || "New"; switch (embeddingsStatus) { case "New": return ; case "Processing": return ; case "Processed": return ; case "Error": return ; default: return ; } }; // Helper function to get embeddings status text const getEmbeddingsStatusText = (doc: Document) => { if (doc.embeddings?.status === "Processed") { return `${doc.embeddings.count} vectors`; } else if (doc.embeddings?.status) { return doc.embeddings.status; } else { return "Ready"; } }; return ( <>

Generate Embeddings

What are embeddings?

Embeddings are vector representations of your documents that enable semantic search and similarity matching between documents.

Generate vector embeddings for semantic search and document similarity

{/* Current embeddings provider indicator */}
Using: {embeddingsProvider === "nvidia" ? "NVIDIA API" : "Local Sentence Transformer"}

Processing Options

{useLangChain && (

Split documents into sentences for more accurate embeddings

)}
{error && (

{error}

)}
doc.uploadStatus === "Uploaded").length && documents.filter(doc => doc.uploadStatus === "Uploaded").length > 0} onChange={handleSelectAll} disabled={documents.length === 0 || isGenerating} /> {selectedDocs.length > 0 ? ( {selectedDocs.length} selected ) : ( Select all )}
{isGenerating && ( )}
{documents.length === 0 ? ( ) : ( documents.map((doc) => ( handleItemClick(doc, e)}> )) )}
Document Size Triple Status Embeddings Status
No documents available for embedding generation
e.stopPropagation()}> handleItemClick(doc, e)} disabled={isGenerating} /> {doc.name} {doc.size}
{doc.status === "New" && ( )} {doc.status === "Processing" && ( )} {doc.status === "Processed" && ( )} {doc.status === "Error" && ( )} {doc.status}
{getEmbeddingsStatusIcon(doc)} {getEmbeddingsStatusText(doc)} {doc.embeddings?.error && (

{doc.embeddings.error}

)}
{isGenerating && (
{status}
// NOTE(review): end of EmbeddingsContent. RadioButton follows on the next line — its
// entire JSX body is missing (`return ( );`); presumably it rendered an <input
// type="radio"> plus a label for {children} — confirm against the original file.
)} ) } // Add this function near the top of the file function RadioButton({ id, name, value, checked, onChange, disabled = false, children }: { id: string; name: string; value: string; checked: boolean; onChange: (e: React.ChangeEvent) => void; disabled?: boolean; children: React.ReactNode; }) { return (
// NOTE(review): TriplesContent begins on the next line. Intact logic: memoized sort of
// documents by name/size/status (size via parseFloat, asc/desc toggle), chunk
// configuration state (chunkSize=512, overlapSize=0, chunkingMethod='pyg' defaults),
// prompt configurations loaded from localStorage key "promptConfigurations" on mount,
// and handleExtractTriples merging promptConfigs with the chunk options.
); } // Triple extraction content component function TriplesContent({ documents, selectedDocs, handleSelectAll, handleItemClick, isSelected, extractTriples, isProcessing, useLangChain, setUseLangChain, useSentenceChunking, setUseSentenceChunking, useEntityExtraction, setUseEntityExtraction, error, status, handleStopProcessing }: TriplesContentProps) { // Add sorting state const [sortField, setSortField] = useState<'name' | 'size' | 'status'>('name') const [sortDirection, setSortDirection] = useState<'asc' | 'desc'>('asc') // Sort documents based on current sort field and direction const sortedDocuments = React.useMemo(() => { return [...documents].sort((a, b) => { let aValue: string | number let bValue: string | number switch (sortField) { case 'name': aValue = a.name.toLowerCase() bValue = b.name.toLowerCase() break case 'size': aValue = parseFloat(a.size) || 0 bValue = parseFloat(b.size) || 0 break case 'status': aValue = a.status bValue = b.status break default: aValue = a.name.toLowerCase() bValue = b.name.toLowerCase() } if (sortDirection === 'asc') { return aValue < bValue ? -1 : aValue > bValue ? 1 : 0 } else { return aValue > bValue ? -1 : aValue < bValue ? 1 : 0 } }) }, [documents, sortField, sortDirection]) // Handle column header click for sorting const handleSort = (field: 'name' | 'size' | 'status') => { if (sortField === field) { setSortDirection(sortDirection === 'asc' ? 
'desc' : 'asc') } else { setSortField(field) setSortDirection('asc') } } const [langChainMethod, setLangChainMethod] = React.useState<'default' | 'graphtransformer'>( 'default' ); const [promptConfigs, setPromptConfigs] = useState(null); // Chunk configuration state const [chunkSize, setChunkSize] = useState(512); const [overlapSize, setOverlapSize] = useState(0); const [chunkingMethod, setChunkingMethod] = useState<'optimized' | 'pyg'>('pyg'); // Handle radio button changes for LangChain method const handleLangChainMethodChange = (e: React.ChangeEvent) => { setLangChainMethod(e.target.value as 'default' | 'graphtransformer'); }; // Load prompt configurations from localStorage on component mount useEffect(() => { try { const savedConfigs = localStorage.getItem("promptConfigurations"); if (savedConfigs) { setPromptConfigs(JSON.parse(savedConfigs)); } } catch (err) { console.error("Error loading prompt configurations:", err); } }, []); // Handle prompt configuration changes const handlePromptConfigsChange = (configs: PromptConfigurations) => { setPromptConfigs(configs); }; // Update actual flag used by API based on both useLangChain and langChainMethod React.useEffect(() => { // This effect is used to monitor langChainMethod changes // The actual implementation of different methods is handled in the API }, [langChainMethod]); // Handle extract triples button click const handleExtractTriples = () => { const options = { ...(promptConfigs || {}), chunkSize, overlapSize, chunkingMethod }; extractTriples(options); }; return ( <>
// NOTE(review): the returned JSX of TriplesContent is garbled from here to the end of
// the component — element tags stripped; only text content, inline expressions, and
// event-handler fragments remain. Note the `{false && useLangChain && (` fragment
// near "LangChain Method": that subtree was deliberately disabled in the original.

Knowledge Graph Triple Extraction

Extract structured knowledge triples from documents for knowledge graph construction

Processing Options

{/* Hidden: Use LangChain toggle - LangChain is always used for triple extraction */} {/*
*/} {/*

Leverages LangChain for knowledge extraction from documents

*/} {false && useLangChain && (
LangChain Method
Default Extractor

Uses the standard LangChain extraction pipeline

LLMGraphTransformer

Uses LangChain's specialized graph structure transformer

Split documents into sentences for more accurate triple extraction

Automatically detect and extract entities from documents

)}
{/* Chunk Configuration */}
{/* Chunking Method Selection */}
setChunkingMethod(e.target.value as 'optimized' | 'pyg')} disabled={isProcessing} className="w-4 h-4 text-primary border-border focus:ring-primary" />

Large chunks with overlap for modern LLMs like Gemma3:27b. Best for efficiency.

setChunkingMethod(e.target.value as 'optimized' | 'pyg')} disabled={isProcessing} className="w-4 h-4 text-primary border-border focus:ring-primary" />

PyG's txt2kg.py chunking algorithm with configurable chunk size and overlap.

setChunkSize(Number(e.target.value))} disabled={isProcessing} className="w-full px-3 py-2 border border-border rounded-md bg-background text-foreground focus:outline-none focus:ring-2 focus:ring-primary focus:border-transparent" />

Larger chunks provide more context but use more GPU memory and may lose detailed information.

setOverlapSize(Number(e.target.value))} disabled={isProcessing} className="w-full px-3 py-2 border border-border rounded-md bg-background text-foreground focus:outline-none focus:ring-2 focus:ring-primary focus:border-transparent" />

Overlap between chunks to preserve context across boundaries. Set to 0 for original PyG behavior.

Current Configuration
{chunkingMethod === 'pyg' ? ( <>
• Method: PyTorch Geometric (enhanced with overlap)
• Estimated chunks for 64KB document: ~{Math.ceil(64000 / Math.max(1, chunkSize - overlapSize))}
• Chunk size: {chunkSize.toLocaleString()} characters
• Overlap: {overlapSize} characters {overlapSize === 0 ? '(original PyG)' : '(enhanced)'}
• Best for: {overlapSize === 0 ? 'PyG compatibility' : 'Enhanced context preservation'}
) : ( <>
• Method: Optimized for modern LLMs
• Estimated chunks for 64KB document: ~{Math.ceil(64000 / chunkSize)}
• GPU memory per chunk: ~{Math.round(chunkSize / 1000)}MB
• Overlap: {overlapSize} characters
• Processing efficiency: {chunkSize >= 32000 ? 'Optimal' : chunkSize >= 16000 ? 'Good' : 'Basic'}
)}
{/* Advanced Options with Prompt Configuration */}
{error && (

{error}

)}
(doc.status === "New" || doc.status === "Processed" || doc.status === "Error")).length && documents.filter(doc => (doc.status === "New" || doc.status === "Processed" || doc.status === "Error")).length > 0} onChange={handleSelectAll} disabled={documents.filter(doc => (doc.status === "New" || doc.status === "Processed" || doc.status === "Error")).length === 0 || isProcessing} /> {selectedDocs.length > 0 ? ( {selectedDocs.length} selected ) : ( "Select all" )}
{isProcessing && ( )}
{documents.length === 0 ? (

No documents available for processing

) : ( {sortedDocuments.map((doc) => ( (doc.status === "New" || doc.status === "Processed" || doc.status === "Error") && !isProcessing && handleItemClick(doc, e)} > ))}
handleSort('name')} >
Document {sortField === 'name' && ( sortDirection === 'asc' ? : )}
handleSort('size')} >
Size {sortField === 'size' && ( sortDirection === 'asc' ? : )}
handleSort('status')} >
Status {sortField === 'status' && ( sortDirection === 'asc' ? : )}
e.stopPropagation()}> { e.stopPropagation(); handleItemClick(doc, e); }} disabled={(doc.status !== "New" && doc.status !== "Processed" && doc.status !== "Error") || isProcessing} />
{doc.name}
{doc.status === "New" && ( {doc.status} )} {doc.status === "Processing" && ( {doc.status} )} {doc.status === "Processed" && ( {doc.status} )} {doc.status === "Error" && ( {doc.status} )} {doc.size} KB
)}
{isProcessing && status && (
{status}
// NOTE(review): end of TriplesContent; InfoIcon follows on the next line — its SVG
// body was stripped (`return ( )`), so it currently renders nothing as written.
)} ) } function InfoIcon(props: React.SVGProps) { return ( ) }