DHI.Services.MCLite for Documents — Internal Developer Guide

This page explains how to use the Documents domain with the MCLite provider. It’s written for developers wiring up repositories or extending behavior. For the MCLite engine overall, see MCLite Providers.

What this provider does

DHI.Services.Provider.MCLite.DocumentRepository persists documents and their metadata in an MCLite workspace. It implements grouped, path-based IDs (e.g., /Reports/2025/Q1/summary.pdf) and stores the document payload in chunked BLOB rows. It also supports folder/group operations, keyword filtering (including optional XML metadata search), and lightweight thumbnails.

Supported databases:

PostgreSQL (default), SQLite, and SQL Server — selected via dbflavour in the connection string.

ID & grouping model

FullName: A document’s ID is its full, absolute path, e.g. /Models/HEC/notes.docx.
- Name: the last segment (notes.docx)
- Group: everything before the last segment (/Models/HEC)
Root documents have no group (the ID looks like /file.ext).
Folders (aka groups) live in document_folder (aka Names.TableDocumentGroup).
Membership between documents and folders is in document_folder_association.
You can pass tree options in some APIs (e.g., ;nonrecursive for direct children only).
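
As a minimal sketch of the path model above (not the provider's actual code), splitting a full name into Group and Name could look like this. DocumentPath and Split are hypothetical names introduced for illustration only.

```csharp
using System;

public static class DocumentPath
{
    // Hypothetical helper: splits "/Models/HEC/notes.docx" into
    // Group "/Models/HEC" and Name "notes.docx".
    // A root document such as "/file.ext" yields an empty Group.
    public static (string Group, string Name) Split(string fullName)
    {
        if (string.IsNullOrEmpty(fullName) || fullName[0] != '/')
            throw new ArgumentException("Expected an absolute path starting with '/'.", nameof(fullName));

        int lastSlash = fullName.LastIndexOf('/');
        string group = fullName.Substring(0, lastSlash);  // empty when the only slash is the leading one
        string name = fullName.Substring(lastSlash + 1);
        return (group, name);
    }
}
```

Calling DocumentPath.Split("/file.ext") returns an empty Group, which matches the "root documents have no group" rule.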

Tip

Internally, the provider frequently resolves names/paths to GUIDs (document IDs, group IDs), but you always address entities by full paths.

Storage model (tables & fields)

Documents (Names.TableDocument) – one row per document. Important fields the provider uses:
- id (GUID), title (document name), author, summary, description, language, keywords, format (free text), modified_time, version (GUID), thumbnail_id (GUID → Blob), is_public (bool), exchangeable (bool), data_id (GUID → Blob)
Blob (Names.TableBlob) – chunked storage of bytes
- id (GUID), block_no (int), data (bytea/varbinary/blob)
Folders (Names.TableDocumentGroup) – nested folders with parent_id
Folder–Document association (Names.TableDocumentFolderAssociation)
Metadata (Names.TableMetadata) – optional XML payload per entity
EntityDescription / EntityType – generic entity catalog rows for “Document”
Workspace schema: resolved from master.workspace by workspace name on connect

Repository class & key behaviors

DocumentRepository : BaseGroupedDocumentRepository<string>, IDocumentRepository<string>

Add

public override void Add(Stream stream, string id, Parameters parameters, ClaimsPrincipal user = null)

Upsert behavior: if a document with the same id already exists, it’s removed first.
Splits id into Group + Name; inserts a new document row with:
- data_id = new Blob GUID (payload), thumbnail_id = new GUID (optional, see below)
- Metadata taken from parameters (all optional):
  - Author, Summary, Description, Language, Keywords, Format
- is_public is set to true by default.
Folder association is created if Group is non-empty.
Blob is inserted in 8 KB blocks into Names.TableBlob.
Registers an EntityDescription (“Document”).
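
To make the 8 KB blocking concrete, here is a rough sketch of how a stream could be cut into numbered blocks before being written to Names.TableBlob. It is not the provider's implementation: BlobChunker is a hypothetical name, and starting block_no at 0 is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlobChunker
{
    public const int BlockSize = 8 * 1024; // 8 KB per block, as described above

    // Yields (block_no, data) pairs. Every block is BlockSize bytes except
    // possibly the last one. Numbering from 0 is an assumption.
    public static IEnumerable<(int BlockNo, byte[] Data)> Split(Stream stream, int blockSize = BlockSize)
    {
        var buffer = new byte[blockSize];
        int blockNo = 0;
        while (true)
        {
            // Stream.Read may return fewer bytes than requested, so keep
            // reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < blockSize)
            {
                int read = stream.Read(buffer, filled, blockSize - filled);
                if (read == 0) break;
                filled += read;
            }
            if (filled == 0) yield break;

            var data = new byte[filled];
            Array.Copy(buffer, data, filled);
            yield return (blockNo++, data);

            if (filled < blockSize) yield break; // short block means end of stream
        }
    }
}
```

Under these assumptions a 20,000-byte payload becomes three rows: two full 8,192-byte blocks and one 3,616-byte block.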

Thumbnails

Add allocates a thumbnail_id but does not generate a thumbnail. If Blob rows exist for that ID, GetMetadata returns them as the Thumbnail value; populate Names.TableBlob for thumbnail_id in your own pipeline if you want one.

Get

public override (Stream stream, string fileType, string fileName) Get(string id, ClaimsPrincipal user = null)

Looks up the data_id and streams the concatenated Blob.
fileType is the file extension without the dot (e.g., pdf).
fileName is the last path segment (e.g., summary.pdf).
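
The following sketch illustrates the two ideas in this section: concatenating Blob blocks in block_no order, and deriving fileType and fileName from the ID. BlobReader and its methods are hypothetical names; the real provider reads the blocks from the database rather than from an in-memory list.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BlobReader
{
    // Concatenates blocks in ascending block_no order into one stream,
    // rewound to position 0 so the caller can read it from the start.
    public static MemoryStream Concatenate(IEnumerable<(int BlockNo, byte[] Data)> blocks)
    {
        var output = new MemoryStream();
        foreach (var block in blocks.OrderBy(b => b.BlockNo))
        {
            output.Write(block.Data, 0, block.Data.Length);
        }
        output.Position = 0;
        return output;
    }

    // Derives the file name (last path segment) and the file type
    // (extension without the dot) from a full document ID.
    public static (string FileType, string FileName) Describe(string id)
    {
        string fileName = id.Substring(id.LastIndexOf('/') + 1);
        string fileType = Path.GetExtension(fileName).TrimStart('.');
        return (fileType, fileName);
    }
}
```

For an ID without an extension, Describe returns an empty fileType in this sketch; whether the provider behaves the same way is not stated here.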

Remove

Removes the Blob (all blocks), folder association, document row, and entity description.

Contains / Count / GetIds

Contains(id) checks existence by full path.
Count() returns the number of rows in the document table.
GetIds() returns full names for all documents.

Metadata APIs

GetMetadata(id) returns a flat dictionary:
- Title, Author, Summary, Language, Format, IsPublic, Description, Thumbnail (base64 PNG if present), Id (full path)
GetAllMetadata() calls GetMetadataByFilter(string.Empty).
GetMetadataByFilter(filter, parameters):
- Splits filter on spaces → all keywords must match (AND semantics).
- Searches case-insensitively across: Title/Name, Summary, Description, Author, Language.
- Optional parameters:
  - defaultfolder (path): limits results to that folder and all descendants.
  - includexmlmetadata (true|false): if true and a filter is provided, we add a second pass against XML metadata in Names.TableMetadata. A hit requires every keyword to appear somewhere in the text nodes of the XML. Duplicates from the first pass are skipped.
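
One way to picture the AND semantics of the column filter is the sketch below, which builds a parameterized WHERE clause. It rests on assumptions: the column names are illustrative, KeywordFilter is a hypothetical helper, and "all keywords must match" is read as "each keyword must match at least one searched column". The provider's actual SQL may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordFilter
{
    // Illustrative column names; the real schema may use different ones.
    private static readonly string[] Columns =
        { "title", "summary", "description", "author", "language" };

    // Builds "(col1 LIKE @kw0 OR col2 LIKE @kw0 ...) AND (... @kw1 ...)".
    // Keywords are lowercased and compared against LOWER(column).
    public static (string Where, Dictionary<string, object> Parameters) Build(string filter)
    {
        var parameters = new Dictionary<string, object>();
        var clauses = new List<string>();
        var keywords = (filter ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < keywords.Length; i++)
        {
            string name = "@kw" + i;
            parameters[name] = "%" + keywords[i].ToLowerInvariant() + "%";
            var perColumn = Columns.Select(c => $"LOWER({c}) LIKE {name}");
            clauses.Add("(" + string.Join(" OR ", perColumn) + ")");
        }

        // An empty filter yields an empty clause, meaning no restriction.
        return (string.Join(" AND ", clauses), parameters);
    }
}
```

This sketch does not escape LIKE wildcards (% and _) that appear inside keywords.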

Note

The filter builder uses SQL LIKE on LOWER(column). For XML, we stream through text nodes (XmlReader) and confirm all keywords are present.
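
The XML pass described in this note can be approximated as follows. This is a sketch under stated assumptions, not the provider's code: XmlKeywordMatcher is a hypothetical name, and treating CDATA sections as text is a choice made here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

public static class XmlKeywordMatcher
{
    // Returns true only if every keyword in the filter occurs somewhere
    // in the text nodes of the XML (case-insensitive). Element and
    // attribute names are ignored.
    public static bool ContainsAllKeywords(string xml, string filter)
    {
        var keywords = (filter ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(k => k.ToLowerInvariant())
            .ToArray();
        if (keywords.Length == 0) return false; // the XML pass needs a filter

        var text = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA)
                {
                    text.Append(reader.Value).Append(' ');
                }
            }
        }

        string haystack = text.ToString().ToLowerInvariant();
        return keywords.All(k => haystack.Contains(k));
    }
}
```

Keywords may match in different text nodes; only their presence somewhere in the document matters.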

Listing by group

ContainsGroup(group) – returns true if the folder path exists, or if the path is empty.
GetByGroup(group) returns Document<string> objects:
- If group ends with ;nonrecursive, returns only direct members.
- Otherwise, returns members from the folder and all subfolders.

FullName expansion

GetFullNames(group, user) supports TreeOptions via suffixes on the group:
- ;nonrecursive → direct children (files + folders)
- ;groupsonly → only folder full names
- ;nonrecursive;groupsonly → direct child folders only
For recursive full listing, call without ;nonrecursive.
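
The suffix convention above can be sketched as a small parser that separates the folder path from its options. TreeOptionParser is a hypothetical helper, and case-insensitive option matching is an assumption; the provider's own TreeOptions handling may differ.

```csharp
using System;
using System.Linq;

public static class TreeOptionParser
{
    // Splits "/Reports/2025;nonrecursive;groupsonly" into the folder path
    // and two flags. Unknown suffixes are ignored in this sketch.
    public static (string Group, bool NonRecursive, bool GroupsOnly) Parse(string group)
    {
        var parts = (group ?? string.Empty).Split(';');
        var options = parts.Skip(1)
            .Select(p => p.Trim().ToLowerInvariant())
            .ToArray();

        return (parts[0],
                options.Contains("nonrecursive"),
                options.Contains("groupsonly"));
    }
}
```

A plain path such as "/Reports/2025" parses with both flags false, which corresponds to the recursive full listing.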

Parameters reference (when adding or filtering)

On Add (document metadata):

Author, Summary, Description, Language, Keywords, Format
- All are optional strings. If omitted, the DB receives empty strings.
- Format is free text (e.g., pdf, docx) and is not auto-derived.

On GetMetadataByFilter (filtering):

defaultfolder – limit to this folder and descendants
includexmlmetadata – true to also scan XML metadata; every keyword must appear in the XML text for a hit, and XML hits are added to the column-filter results

Connection & environment (MCLite)

The MCLite Db:

Resolves the workspace schema name from master.workspace (workspace={name}).
Chooses DB driver via dbflavour:
- PostgreSQL (default), SQLite, SqlServer
Uses . as the table delimiter (PostgreSQL/SQL Server) or _ (SQLite).
Parameter prefix is @.

Connection string keys (commonly used)

PostgreSQL: database, host, port, username, password, workspace, dbflavour=PostgreSQL
SQLite: database (file path), dbflavour=SQLite
SQL Server: host, port, database, username, password, dbflavour=SqlServer
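
For orientation, here is a sketch of how such a connection string could be parsed and how the table delimiter might be applied. It is illustrative only: McLiteConnectionString, Parse, and QualifiedTable are hypothetical names, and reading the delimiter as "schema + delimiter + table" is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class McLiteConnectionString
{
    // Parses "key=value;key=value" pairs into a case-insensitive dictionary.
    // dbflavour falls back to PostgreSQL, the documented default.
    public static Dictionary<string, string> Parse(string connectionString)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var pairs = (connectionString ?? string.Empty)
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var pair in pairs)
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip fragments without a key
            result[pair.Substring(0, eq).Trim()] = pair.Substring(eq + 1).Trim();
        }

        if (!result.ContainsKey("dbflavour"))
        {
            result["dbflavour"] = "PostgreSQL";
        }
        return result;
    }

    // Assumed reading of the delimiter rule: "_" for SQLite, "." otherwise.
    public static string QualifiedTable(string flavour, string schema, string table)
    {
        bool isSqlite = string.Equals(flavour, "SQLite", StringComparison.OrdinalIgnoreCase);
        return schema + (isSqlite ? "_" : ".") + table;
    }
}
```

With this sketch, a string that omits dbflavour is treated as PostgreSQL.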

Using the repository directly (C#)

using System.IO;                    // File, Stream
using System.Security.Claims;
using DHI.Services;                 // assumed namespace of the Parameters type
using DHI.Services.Documents;
using DHI.Services.Provider.MCLite;

// Build the connection string (PostgreSQL example)
var cs = "database=mc2014.2;host=localhost;port=5432;username=dss_admin;password=secretdss_admin;workspace=workspace1;dbflavour=PostgreSQL";

IDocumentRepository<string> repo = new DocumentRepository(cs);

// 1) Add a document
var id = "/Reports/2025/Q1/summary.pdf";
var meta = new Parameters {
  ["Author"] = "Jane Doe",
  ["Summary"] = "Quarterly summary",
  ["Language"] = "en",
  ["Keywords"] = "finance revenue",
  ["Description"] = "Q1 2025 performance",
  ["Format"] = "pdf"
};
using (var file = File.OpenRead(@"C:\docs\summary.pdf"))
{
    repo.Add(file, id, meta);
}

// 2) Fetch it back (dispose the returned stream when done)
var (stream, fileType, fileName) = repo.Get(id);
using (stream)
using (var fs = File.Create($@"C:\out\{fileName}"))
{
    stream.CopyTo(fs);
}

// 3) Metadata lookups
var md = repo.GetMetadata(id); // Title, Author, Summary, etc.

var filtered = repo.GetMetadataByFilter(
    "revenue 2025",
    new Parameters {
        ["defaultfolder"] = "/Reports",
        ["includexmlmetadata"] = "true"
    }
);

// 4) List by group
var docs = repo.GetByGroup("/Reports/2025");            // recursive
var docsDirect = repo.GetByGroup("/Reports/2025;nonrecursive");

// 5) Remove
repo.Remove(id);

Via the Web API (quick reference)

Your Documents Web API already covers routes & auth. With the Connections entry (below) named mclite, typical calls look like:

GET    /api/documents/mclite/ids
GET    /api/documents/mclite/metadata?filter=revenue%202025&defaultfolder=/Reports&includexmlmetadata=true
GET    /api/documents/mclite/file?path=/Reports/2025/Q1/summary.pdf
POST   /api/documents/mclite/file   (multipart/form-data with metadata fields)
DELETE /api/documents/mclite/file?path=/Reports/2025/Q1/summary.pdf

(See the Documents WebApi — Internal Guide for exact payloads, status codes, and auth.)

Connections module entries

Add these objects to your connections.json (or equivalent configuration) to enable the MCLite Documents provider via Web API:

{
  "type": "DHI.Services.Documents.WebApi.GroupedDocumentServiceConnection, DHI.Services.Documents.WebApi",
  "id": "mclite",
  "name": "MCLite (PostgreSQL)",
  "repositoryType": "DHI.Services.Provider.MCLite.DocumentRepository, DHI.Services.MCLite",
  "connectionString": "database=mc2014.2;host=localhost;port=5432;username=dss_admin;password=secretdss_admin;workspace=workspace1;dbflavour=PostgreSQL"
},
{
  "type": "DHI.Services.Documents.WebApi.GroupedDocumentServiceConnection, DHI.Services.Documents.WebApi",
  "id": "mclite-sqlite",
  "name": "MCLite (SQLite)",
  "repositoryType": "DHI.Services.Provider.MCLite.DocumentRepository, DHI.Services.MCLite",
  "connectionString": "database=[AppData]MCSQLiteTest.sqlite;dbflavour=SQLite"
}

Note

Use id (e.g., mclite) as the provider segment in your Web API URLs.

Performance & operational notes

Chunk size is 8 KB per Blob row. Large files create many rows; indexes on (id, block_no) are recommended.
No explicit transaction wraps Add end-to-end. If you need stricter atomicity (document + blob + association), wrap at a higher layer.
is_public is inserted as true. This provider doesn’t inspect ClaimsPrincipal; authorization is expected to be enforced in the API layer.
Thumbnails are optional. If you populate the Blob for thumbnail_id, the metadata projection will return a base64 PNG with black treated as transparent.

Common recipes

Add from byte[]

using var ms = new MemoryStream(bytes);
repo.Add(ms, "/Inbox/policy.docx", new Parameters { ["Format"] = "docx" });

Search by free text across multiple columns

var hits = repo.GetMetadataByFilter("coastal risk model",
    new Parameters { ["defaultfolder"] = "/Studies/Coastal" });

Search including XML metadata

var hits = repo.GetMetadataByFilter("nitrates 2023",
    new Parameters {
        ["defaultfolder"] = "/WaterQuality",
        ["includexmlmetadata"] = "true"
    });

List only folders under a path (names)

var folderNames = repo.GetFullNames("/Reports/2025;nonrecursive;groupsonly");

Troubleshooting

“Schema … does not exist”: The workspace name must exist in master.workspace; the repository resolves schema_name from there.
No results with defaultfolder: Ensure the folder path is correct (/-prefixed) and actually exists; the filter includes the folder and all descendants.
Thumbnails not appearing: Add doesn’t generate them. Insert Blob blocks for thumbnail_id yourself.
Format blank: Supply it in Parameters (e.g., ["Format"] = "pdf"); it isn’t inferred from the filename.