Improve Efficiency for This Text Processing Code

Posted by johnv on Stack Overflow See other posts from Stack Overflow or by johnv
Published on 2010-12-27T14:53:16Z Indexed on 2010/12/27 15:53 UTC
Read the original article Hit count: 562

Filed under:

c#

|

text

I am writing a program that counts the number of words in a text file which is already in lowercase and separated by spaces. I want to use a dictionary and only count the word IF it's within the dictionary. The problem is the dictionary is quite large (~100,000 words) and each text document has also ~50,000 words. As such, the codes that I wrote below gets very slow (takes about 15 sec to process one document on a quad i7 machine). I'm wondering if there's something wrong with my coding and if the efficiency of the program can be improved. Thanks so much for your help. Code below:

public static string WordCount(string countInput)
        {
            string[] keywords = ReadDic(); /* read dictionary txt file*/

            /*then reads the main text file*/
            Dictionary<string, int> dict = ReadFile(countInput).Split(' ')
                .Select(c => c)
                .Where(c => keywords.Contains(c))
                .GroupBy(c => c)
                .Select(g => new { word = g.Key, count = g.Count() })
                .OrderBy(g => g.word)
                .ToDictionary(d => d.word, d => d.count);

            int s = dict.Sum(e => e.Value);
            string k = s.ToString();
            return k;

        }

© Stack Overflow or respective owner

Related posts about c#

.NET WebRequest.PreAuthenticate not quite what it sounds like

as seen on West-Wind - Search for 'West-Wind'
I’ve run into the problem a few times now: How to pre-authenticate .NET WebRequest calls doing an HTTP call to the server – essentially send authentication credentials on the very first request instead of waiting for a server challenge first? At first glance this sound like it should be easy:… >>> More
HttpWebRequest and Ignoring SSL Certificate Errors

as seen on West-Wind - Search for 'West-Wind'
Man I can't believe this. I'm still mucking around with OFX servers and it drives me absolutely crazy how some these servers are just so unbelievably misconfigured. I've recently hit three different 3 major brokerages which fail HTTP validation with bad or corrupt certificates at least according to… >>> More
The dynamic Type in C# Simplifies COM Member Access from Visual FoxPro

as seen on West-Wind - Search for 'West-Wind'
I’ve written quite a bit about Visual FoxPro interoperating with .NET in the past both for ASP.NET interacting with Visual FoxPro COM objects as well as Visual FoxPro calling into .NET code via COM Interop. COM Interop with Visual FoxPro has a number of problems but one of them at least got a lot… >>> More
Dynamic Type to do away with Reflection

as seen on West-Wind - Search for 'West-Wind'
The dynamic type in C# 4.0 is a welcome addition to the language. One thing I’ve been doing a lot with it is to remove explicit Reflection code that’s often necessary when you ‘dynamically’ need to walk and object hierarchy. In the past I’ve had a number of ReflectionUtils that used string based expressions… >>> More
Finding a Relative Path in .NET

as seen on West-Wind - Search for 'West-Wind'
Here’s a nice and simple path utility that I’ve needed in a number of applications: I need to find a relative path based on a base path. So if I’m working in a folder called c:\temp\templates\ and I want to find a relative path for c:\temp\templates\subdir\test.txt I want to receive back subdir\test… >>> More

Related posts about text

Error in running script [closed]

as seen on Programmers - Search for 'Programmers'
I'm trying to run heathusf_v1.1.0.tar.gz found here I installed tcsh to make build_heathusf work. But, when I run ./build_heathusf, I get the following (I'm running that on a Fedora Linux system from Terminal): $ ./build_heathusf Compiling programs to build a library of image processing functions… >>> More
Coloring even heighten columns

as seen on Stack Overflow - Search for 'Stack Overflow'
I try to set different a background colors for left and right columns and to maintain the same height. So I set a background color for outer wrapper ("container" div) so it will set a color to rightBar. But this didn't work. Online Demo I want it to work on all browsers. Markup: <!DOCTYPE… >>> More
HTML: How to create a DIV with only vertical scroll-bar to show long paragraphs on a webpage?

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to show terms and condition note on my website. I dont want to use text field and also dont want to use my whole page. I just want to display my text in selected area and want to use only vertical scroll-bar to go down and read all text. Currently I am using this code: <div style="width:10;height:10;overflow:scroll"… >>> More
Qt Linking Error.

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I configure qt-x11 with following options ./configure -prefix /iTalk/qtx11 -prefix-install -bindir /iTalk/qtx11-install/bin -libdir /iTalk/qtx11-install/lib -docdir /iTalk/qtx11-install/doc -headerdir /iTalk/qtx11-install/include -datadir /iTalk/qtx11-install/data -examplesdir /iTalk/qtx11-install/examples… >>> More
XSLT Escape Character not working

as seen on Stack Overflow - Search for 'Stack Overflow'
I am trying to use escape charaters in my text output, as i would like too surround the output in emailData tags. I am using <xsl:text><emailData></xsl:text> In the XSLT to esnure that this works however because i am using a tool called Cast Iron for some reason it… >>> More