Designing a Content-Based ETL Process with .NET and SFDC

Posted by Patrick on Programmers See other posts from Programmers or by Patrick
Published on 2012-07-04T20:13:51Z Indexed on 2012/07/05 3:22 UTC
Read the original article Hit count: 424

Filed under:
|
|
|
|

As my firm makes the transition to using SFDC as our main operational system, we've spun together a couple of SFDC portals where we can post customer-specific documents to be viewed at will. As such, we've had the need for pseudo-ETL applications to be implemented that are able to extract metadata from the documents our analysts generate internally (most are industry-standard PDFs, XML, or MS Office formats) and place in networked "queue" folders. From there, our applications scoop of the queued documents and upload them to the appropriate SFDC CRM Content Library along with some select pieces of metadata. I've mostly used DbAmp to broker communication with SFDC (DbAmp is a Linked Server provider that allows you to use SQL conventions to interact with your SFDC Org data).

I've been able to create [console] applications in C# that work pretty well, and they're usually structured something like this:

static void Main()
{
    // Load parameters from app.config.
    // Get documents from queue.
    var files = someInterface.GetFiles(someFilterOrRegexPattern);

    foreach (var file in files)
    {
        // Extract metadata from the file.
        // Validate some attributes of the file; add any validation errors to an in-memory          
        // structure (e.g. List<ValidationErrors>).
        if (isValid) 
        {
            var fileData = File.ReadAllBytes(file);
            // Upload using some wrapper for an ORM or DAL
            someInterface.Upload(fileData, meta.Param1, meta.Param2, ...);
        }
        else
        {
            // Bounce the file
        }
    }

    // Report any validation errors (via message bus or SMTP or some such).
}

And that's pretty much it. Most of the time I wrap all these operations in a "Worker" class that takes the needed interfaces as constructor parameters.

This approach has worked reasonably well, but I just get this feeling in my gut that there's something awful about it and would love some feedback. Is writing an ETL process as a C# Console app a bad idea? I'm also wondering if there are some design patterns that would be useful in this scenario that I'm clearly overlooking.

Thanks in advance!

© Programmers or respective owner

Related posts about c#

Related posts about .NET