Designing a Content-Based ETL Process with .NET and SFDC
- by Patrick
As my firm transitions to SFDC as our main operational system, we've spun up a couple of SFDC portals where we can post customer-specific documents to be viewed at will. That has created the need for pseudo-ETL applications that extract metadata from the documents our analysts generate internally (mostly industry-standard PDF, XML, or MS Office formats) and place in networked "queue" folders. From there, our applications scoop up the queued documents and upload them to the appropriate SFDC CRM Content Library along with some select pieces of metadata. I've mostly used DbAmp to broker communication with SFDC (DbAmp is a Linked Server provider that lets you use SQL conventions to interact with your SFDC Org data).
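To give a sense of what I mean by "broker communication," the upload ends up as ordinary parameterized T-SQL against the linked server from the .NET side. This is only a rough sketch; the SALESFORCE linked-server name, the ContentVersion columns, and the connection string are illustrative placeholders rather than my actual code:

using System.Data.SqlClient;

public static class SfdcContentUploader
{
    // Sketch only: the linked-server name, target object, and column names
    // below are placeholders for illustration.
    public static void Upload(string connectionString, string title, string fileName, byte[] contents)
    {
        const string sql =
            "INSERT INTO SALESFORCE...ContentVersion (Title, PathOnClient, VersionData) " +
            "VALUES (@title, @path, @data)";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@title", title);
            command.Parameters.AddWithValue("@path", fileName);
            command.Parameters.AddWithValue("@data", contents);

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}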
I've been able to create [console] applications in C# that work pretty well, and they're usually structured something like this:
static void Main()
{
    // Load parameters from app.config.

    // Get documents from queue.
    var files = someInterface.GetFiles(someFilterOrRegexPattern);

    foreach (var file in files)
    {
        // Extract metadata from the file.

        // Validate some attributes of the file; add any validation errors to an
        // in-memory structure (e.g. List<ValidationErrors>).

        if (isValid)
        {
            // Upload using some wrapper for an ORM or API.
            someInterface.Upload(meta.Param1, meta.Param2, ...);
        }
        else
        {
            // Bounce the file.
        }
    }

    // Report any validation errors (via message bus or SMTP or some such).
}
And that's pretty much it. Most of the time I wrap all these operations in a "Worker" class that takes the needed interfaces as constructor parameters.
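In case it helps, the Worker shape is roughly this (the interface and member names here are made up for illustration; the real ones are more specific):

using System.Collections.Generic;

public class Worker
{
    private readonly IDocumentQueue queue;
    private readonly IContentUploader uploader;
    private readonly INotifier notifier;

    // The interfaces are injected so the queue, SFDC upload, and notification
    // pieces can be swapped out or mocked in tests.
    public Worker(IDocumentQueue queue, IContentUploader uploader, INotifier notifier)
    {
        this.queue = queue;
        this.uploader = uploader;
        this.notifier = notifier;
    }

    public void Run(string filterPattern)
    {
        var errors = new List<ValidationError>();

        foreach (var file in queue.GetFiles(filterPattern))
        {
            // Extract, validate, and either upload or bounce the file,
            // collecting validation errors as we go.
        }

        if (errors.Count > 0)
        {
            notifier.Report(errors);
        }
    }
}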
This approach has worked reasonably well, but my gut tells me there's something awful about it, and I'd love some feedback. Is writing an ETL process as a C# console app a bad idea? I'm also wondering if there are design patterns that would be useful in this scenario that I'm clearly overlooking.
Thanks in advance!