Search Results

Search found 54055 results on 2163 pages for 'multiple files'.

Page 39/2163 | < Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >

ClearTrace Performance on 170GB of Trace Files

- by Bill Graziano

I’ve always worked to make ClearTrace perform well. That’s probably because I spend so much time watching it work. I’m often going through two or three gigabytes of trace files but I rarely get the chance to run it on a really large set of files. One of my clients wanted to run a full trace for a week and then analyze the results. At the end of that week we had 847 200MB trace files for a total of nearly 170GB. I regularly use 200MB trace files when I monitor production systems. I usually get around 300,000 statements in a file that size if it’s mostly stored procedures. So those 847 trace files contained roughly 250 million statements. (That’s 730 bytes per statement if you’re keeping track. Newer trace files have some compression in them but I’m not exactly sure what they’re doing.) On a system running 1,000 statements per second I get a new file every five minutes or so. It took 27 hours to process these files on an older development box. That works out to 1.77MB/second. That means ClearTrace processed about 2,654 statements per second. You can query the data while you’re loading it but I’ve found it works better to use a second instance of ClearTrace to do this. I’m not sure why yet but I think there’s still some dependency between the two processes. ClearTrace is almost always CPU bound. It’s really just a huge, ugly collection of regular expressions. It only writes a summary to its database at the end of each trace file so that usually isn’t a bottleneck. At the end of this process, the executable was using roughly 435MB of RAM. Certainly more than when it started but I think that’s acceptable. The database where all this is stored started out at 100MB. After processing 170GB of trace files the database had grown to 203MB. The space savings are due to the “datawarehouse-ish” design and only storing a summary of each trace file. You can download ClearTrace for SQL Server 2008 or test out the beta version for SQL Server 2012. Happy Tuning!

Read the article
A music player that can handle multiple artist tags

- by Keidax

The mp3 format can handle multiple artists per track (in the form of "artist1\artist2"), and as far as I know other modern music formats can do the same thing. However, Rhythmbox (my default music player) seems to be capable of only reading the first artist. Are there any music players that can read and sort songs with multiple artists, or a plugin for Rhythmbox that can provide this functionality?

Read the article
How to remove configuration files completely

- by Jasper Loy

Recently I uninstalled some software using sudo apt-get --purge autoremove, thinking that this would remove all traces of it including unused dependencies and configuration files. However I discovered that a configuration file was left behind in my home folder. Is there a more powerful command which would remove even that? As a related question about keeping things clean , is is safe to delete the hidden files and folders under home, if they are merely configuration files, or are there other kinds of files?

Read the article
Multiple readers on FIFO

- by poly

I've asked a question here before about multiple writers on a FIFO, and I know now that the write is thread safe as long as I write less than the PIPE_BIF, here is the link for that limit. What about read? what if have two(or more) readers in multiple threads for reading from the same fifo, do I need locks here? or all I need is to read less than the PIPE_BUF? BTW, I'm talking about Linux FIFO, And I'm using C.

Read the article
How do I not show files on the main screen

- by ChuckMcM

Ok, so I upgraded to 12.10 and have been trying out Unity, my screen has become a complete mess of folders and files. "Back in the day" the folders that were on my screen were the ones shown in the .Desktop directory now it seems like all the files in my login directory are there (that is a lot of files) Is there some way to set the files being diplayed to come from a specific directory? if so how? I think I've gone through every panel of the system settings application.

Read the article
More than 5 custom variables across multiple websites using Google Analytics

- by brakes

We have multiple websites using the same Google Analytics account number so we can track visitors across multiple websites. One of these websites has set 5 custom variables. We want to introduce a new custom variable to track logged in users for our single sign-on (SSO) system to find out what parts of which website they are accessing. Is this possible or is it a case that all the custom variables have been used up by 1 of the sites?

Read the article
3 Benefits of Multiple C Class Hosting

Multiple C Class hosting has become an essential tool for marketers striving to have their websites rank highly in the search engines. The ability to interlink websites while having search engines actually count rather than discount the links is invaluable. What are the benefits of Multiple C Class hosting? Read on to find out.

Read the article
rsync --remove-source-files but only those that match a pattern

- by Daniel

Is this possible with rsync? Transfer everything from src:path/to/dir to dest:/path/to/other/dir and delete some of the source files in src:path/to/dir that match a pattern (or size limit) but keep all other files. I couldn't find a way to limit --remove-source-files with a regexp or size limit. Update1 (clarification): I'd like all files in src:path/to/dir to be copied to dest:/path/to/other/dir. Once this is done, I'd like to have some files (those that match a regexp or size limit) in src:path/to/dir deleted but don't want to have anything deleted in dest:/path/to/other/dir. Update2 (more clarification): Unfortunately, I can't simply rsync everything and then manually delete the files matching my regexp from src:. The files to be deleted are continuously created. So let's say there are N files of the type I'd like to delete after the transfer in src: when rsync starts. By the time rsync finishes there will be N+M such files there. If I now delete them manually, I'll lose the M files that were created while rsync was running. Hence I'd like to have a solution that guarantees that the only files deleted from src: are those known to be successfully copied over to dest:. I could fetch a file list from dest: after the rsync is complete, and compare that list of files with what I have in src:, and then do the removal manually. But I was wondering if rsync can do this by itself.

Read the article
rsync --remove-source-files but only those that match a pattern

- by user28146

Is this possible with rsync? Transfer everything from src:path/to/dir to dest:/path/to/other/dir and delete some of the source files in src:path/to/dir that match a pattern (or size limit) but keep all other files. I couldn't find a way to limit --remove-source-files with a regexp or size limit. Update1 (clarification): I'd like all files in src:path/to/dir to be copied to dest:/path/to/other/dir. Once this is done, I'd like to have some files (those that match a regexp or size limit) in src:path/to/dir deleted but don't want to have anything deleted in dest:/path/to/other/dir. Update2 (more clarification): Unfortunately, I can't simply rsync everything and then manually delete the files matching my regexp from src:. The files to be deleted are continuously created. So let's say there are N files of the type I'd like to delete after the transfer in src: when rsync starts. By the time rsync finishes there will be N+M such files there. If I now delete them manually, I'll lose the M files that were created while rsync was running. Hence I'd like to have a solution that guarantees that the only files deleted from src: are those known to be successfully copied over to dest:. I could fetch a file list from dest: after the rsync is complete, and compare that list of files with what I have in src:, and then do the removal manually. But I was wondering if rsync can do this by itself.

Read the article
Installing Multiple OWB Patches

- by [email protected]

When an OUBI bug requires a fix to the Oracle Warehouse Builder (OWB) code, the fix is delivered as an MDL export file that will need to be imported and deployed in OWB. If more than one bug is being patched, then a recent question came in that asked if it would be possible to import one after the next, and then do the deploy steps once? The answer is Yes, all of the imports can be done before any of objects are deployed. Once all of the objects have been deployed, then the TCL scripts that need to be rerun can be run, and then the objects that were changed can be deployed. The order that the MDL files are loaded does not matter unless the same object is in two or more MDL files. In that case, the latest MDL file should be loaded last. For example, if two MDL files both contain changes to the SPLMAP_F_RECENT_CREW mapping, and one was created on January 2, 2009 and the second one was created on March 14, 2009, then the January 2 file should be loaded first and the March 14 file should be loaded second. Note that if the MDL files are always loaded in the order that they were created by Release Services, then this will work correctly.

Read the article
All in a Day's Work: Unblocking Multiple Downloaded Files with a Single Command

- by Sam Abraham

Files downloaded using Internet Explorer retain Internet Zone permission level and hence are “Blocked” by default on Windows 7 machines. Honestly, while an added overhead for developers; I really appreciate this feature as it provides a good protection layer for casual web users. My workaround is to simply unblock the downloaded zip file (if download was a zip file) which, in turn, unblocks the files stored within. Today however, I was left with a situation where I had to “Open” and “Copy” the content rather than “Save” a zip file. That of course left me with a few dozen files I have to manually unblock. A few minutes of internet search lead me to the link below which worked like a charm: 1-Download streams.exe from SystInternals - http://technet.microsoft.com/en-us/sysinternals/bb897440.aspx 2-Go to command prompt (cmd.exe) 3-Navigate to where you have streams.exe installed 4-Use command line switches: streams.exe –s –d “<folder path>” This removed the Internet Zone restrictions from all files under “<folder path>” and its subfolders as well. [Deleted :Zone.Identifier:$DATA] References: http://social.technet.microsoft.com/Forums/en-US/itproxpsp/thread/806f0104-1caa-4a66-b504-7a681d1ccb33/

Read the article
How to put Multiple emails into the Input object with style (Windows Live and Facebook Like)?

- by CuSS

How to put Multiple emails into the Input object with style (Windows Live and Facebook Like)? Like the image above:

Read the article
How to indicate to user that a command affects a subset of a multiple selection?

- by Zamboni

Here is an example that illustrates my question. I have a program that lists 1000 items. I select 10 of 1000 items. The program enables a button indicating that a command is available for my selection. I click the button, and a window appears. I make some change in the window and click OK. The command changes 5 of the 10 items in my multiple selection, and those 5 changed items now reflect a modified state in my list. My question is: How do I indicate to user that the command affects a subset of a multiple selection before clicking OK? Can anyone cite examples of existing products that handle this scenario well?

Read the article
What is the difference between Multiple Inheritance and Polymorphism?

- by Parth

What is the difference between Multiple Inheritance and Polymorphism?

Read the article
jquery - how is multiple selection working in this example?

- by hatorade

The relevant snippet of HTML: <span class="a"> <div class="fieldname">Question 1</div> <input type="text" value="" name="q1" /> </span> The relevant jQuery: $.each($('.a'), function(){ $thisField = $('.fieldname', $(this)); }); What exactly is being set to $thisField? If my understanding of multiple selectors in jQuery is correct, it should be grabbing the outer <span> element AND the inner <div> element. But for some reason, if I use $thisField.prepend("hi"); it ends up putting hi right before the text Question 1, but not before <div>. I thought multiple selectors would grab both elements, and that prepend() adds hi to the beginning of BOTH elements, not just the <div>

Read the article
Is Multiple Inheritance allowed at class level in PHP?

- by Parth

Is Multiple Inheritance allowed at class level in PHP?

Read the article
SharePoint OCR image files indexing

Introduction This article describes how to setup indexing of the image files (including TIFF, PDF, JPEG, BMP...) using OCR technology. The indexing described below utilizes Microsoft IFilter technology and as such is not specific to SharePoint, but can be used with any product that uses Microsoft indexing: Microsoft Search, Desktop search, SQL Server search, and through the plug-ins with Google desktop search. I however use it with Microsoft Windows SharePoint Services 2003. For those other products, the registration may need to be slightly different. Background One of the projects I was working on required a storage of old documents scanned into PDF files. Then there was a separate team of people responsible for providing a tags for a search engine so those image documents could be found. The whole process was clumsy, labor intensive, and error prone. That was what started me on my exploration path. OCR The first search I fired was for the Open Source OCR products. Pretty quickly, I narrowed it down to TESSERACT (http://code.google.com/p/tesseract-ocr/). Tesseract is an orphaned brain child of HP that worked on it from 1985 to 1995. Then it was moved to the Open Source, and now if I understand it correctly, Google is working on it. With credentials like that, it's no wonder that Tesseract scores one of the highest marks on OCR recognition and accuracy. After downloading and struggling just a bit, I got Tesseract to work. The struggling part was that the home page claims that its base input format is a TIFF file. May be my TIFFs were bad, but I was able to get it to work only for BMP files. Image files conversion So now that I have an OCR that can convert BMP files into text, how do I get text out of the image PDF files? One more search, and I settled down on ImageMagic (http://www.imagemagick.org/). This is another wonderful Open Source utility that can convert any file into image. It did work out of the box, converting any TIFF files into bitmaps, but to get PDF files converted, it requires a GhostScript (http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/gs864/gs864w32.exe). Dealing with text PDFs With that utility installed, I was cooking - I can convert any file (in particular PDF and TIFF) into bitmap, and then I can extract the text out of the bitmap. The only consideration was to somehow treat PDF files containing text differently - after all, OCR is very computation intensive and somewhat error prone even with perfect image quality and resolution. So another quick search, and I have a PDFTOTEXT (ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip) - thank God for Open Source! With these guys, I can pull text out of PDF in an eye blink. However, I would get nothing for pure image PDFs, but I already have a solution for that! Batch process It took another 15 minutes to setup a batch script to automate the process: Check the file extension If file is a PDF file try to extract text out of it if there is more than certain amount of text in the file - done! if there is no text, convert first page into bitmap run OCR on the bitmap For any other file type, convert file into bitmap Run OCR on the bitmap Once you unzip the attached project, check out the bin\OCR.BAT file. It will create a temporary file in the directory where your source file is with the same name + the '.txt' extension.Continue span.fullpost {display:none;}

Read the article
Convert .3GP and .3G2 Files to AVI / MPEG for Free

- by DigitalGeekery

3GP and .3G2 are common video capture formats used on many mobile phones, but they may not be supported by your favorite media player. Today we’ll show you a quick and easy way to convert those files to AVI or MPG format with the free Windows application, Pazera Free 3GP to AVI Converter. Download the Pazera Free 3GP to AVI Converter. You’ll have to unzip the download folder, but there is no need to install the application. Just double-click the 3gptoavi.exe file to run the application. To add your 3GP or 3G2 files to the queue to be converted, click on the Add files button at the top left. Browse for your file, and click Open. Your video will be added to the Queue. You can add multiple files to the queue and convert them all at one time. Most users will find it preferable to use one of the pre-configured profiles for their conversion settings. To load a profile, choose one from the Profile drop down list and then click the Load button. You will see the profile update the settings in the panels at the bottom of the application. We tested Pazera Free 3GP to AVI Converter with 3GP files recorded on a Motorola Droid, and found the AVI H.264 Very High Q. profile to return the best results for AVI output, and the MPG – DVD NTSC: MPEG-2 the best results for MPG output. Other profiles produced smaller file sizes, but at a cost of reduced quality video output. More advanced users may tweak video and audio settings to their liking in the lower panels. Click on the AVI button under Output file format / Video settings to adjust settings AVI… Or the MPG button to adjust the settings for MPG output. By default, the converted file will be output to the same location as the input directory. You can change it by clicking the text box input radio button and browsing for a different folder. When you’ve chosen your settings, click Convert to begin the conversion process. A conversion output box will open and display the progress. When finished, click Close. Now you’re ready to enjoy your video in your favorite media player. Pazera Free 3GP to AVI Converter isn’t the most robust media conversion tool, but it does what it is intended to do. It handles the task of 3GP to AVI / MPG conversion very well. It’s easy enough for the beginner to manage without much trouble, but also has enough options to please more experienced users. Download Pazera Free 3GP to AVI Converter Similar Articles Productive Geek Tips How To Convert Video Files to MP3 with VLCEasily Change Audio File Formats with XRECODEConvert PDF Files to Word Documents and Other FormatsConvert Video and Remove Commercials in Windows 7 Media Center with MCEBuddy 1.1Compress Large Video Files with DivX / Xvid and AutoGK TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips DVDFab 6 Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 Install, Remove and HIDE Fonts in Windows 7 Need Help with Your Home Network? Awesome Lyrics Finder for Winamp & Windows Media Player Download Videos from Hulu Pixels invade Manhattan Convert PDF files to ePub to read on your iPad

Read the article
Keep it Professional – Multiple Environments

- by AjarnMark

I have certainly been reading blogs a whole lot more than writing them the last several weeks, and it’s about time I got back to writing. I have been collecting several topics and references for blog posts…some of which will probably just never get written as the timeliness of the topics fade over time. Nonetheless, I’m back, and I think it is time to revive my Doing Business Right series, this time coming from the slant of managing a development team rather than the previous angle of being self-employed. First up: separating Dev, Test, and Prod. A few months ago, Colin Stasiuk (@BenchmarkIT) wrote a great post about separating your Dev, Test/UAT, and Prod environments. This post covers all the important points such as removing Developer access from both PROD and UAT, and the importance of proper deployment (a.k.a. promotion) procedures. I won’t repeat it all here, go read the original! But what I do want to address is what I believe to be the #1 excuse people use for not having separate environments: Money. I discussed this briefly in my comment on Colin’s post at the time, but let me repeat it here and expand on it a bit. Don’t let the size of your company or the size of its budget dictate whether you do things professionally or not. I am convinced that most developers and development teams would agree that it is a best practice to have separate environments for development, testing, and production (a.k.a. Live). So why don’t they? Because they think that it means separate servers which means more money. While having separate physical servers for the different environments would be ideal, it is not an absolute requirement in order to make this work. Here are a few ideas: Use multiple instances of SQL Server and multiple Web Sites with Headers or Ports. For no additional fees* you can install multiple instances of SQL Server on the same machine. This gives you a nice separation, allowing you to even use the same database names as will appear in PROD, yet isolating the data and security access. And in IIS, you can create multiple Web Sites on the same server just by using Host Headers or different port numbers to separate them. This approach does still pose the risk of non-Prod environments impacting performance on Prod, but when your application is busy enough for that to be a concern, you can probably afford one of the other options. Use desktop PCs instead of servers. Instead of investing in full server-grade hardware, you can mimic the separate environments on old desktop PCs and at least get functional equivalency, if not performance matching. The last I checked, Microsoft did not require separate licensing for SQL Server if that installation was used exclusively for dev or test purposes*. There may be some version or performance differences between this approach and what you have in Prod, but you have isolated test from impacting Prod resources this way. Virtualization. This is of course one of the hot topics of the day, and I would be remiss if I did not suggest this. It is quite easy these days to setup virtual machines so that, again, your environments are fairly isolated from one another, and you retain all the security and procedural benefits of having separate environments. So the point is, keep your high professional standards intact. You don’t need to compromise on using proper procedure just because you work in a small company with a small budget. Keep doing things the right way! By the way, where I work, our DEV environment is not on a server. All development is done on the developer’s individual workstation where it can be isolated from other developers’ work for the duration of writing the code, but also where the developers have to reconcile (merge) differences in code under concurrent development. This usually means that each change is executed multiple times (once per developer to update their environments with the latest changes from others) giving us an extra, informal. test deployment before even going to the Test/UAT server. It also means that if the network goes down, the developers can continue to hum along because they are not dependent on networked resources. In fact, they will likely be even more productive because they aren’t being interrupted by email…but that’s another post I need to write. * I am not a lawyer, nor a licensing specialist, but it appeared to be so the last time I checked. When in doubt, consult an expert on the topic.

Read the article
Copy New Files Only in .NET

- by psheriff

Recently I had a client that had a need to copy files from one folder to another. However, there was a process that was running that would dump new files into the original folder every minute or so. So, we needed to be able to copy over all the files one time, then also be able to go back a little later and grab just the new files. After looking into the System.IO namespace, none of the classes within here met my needs exactly. Of course I could build it out of the various File and Directory classes, but then I remembered back to my old DOS days (yes, I am that old!). The XCopy command in DOS (or the command prompt for you pure Windows people) is very powerful. One of the options you can pass to this command is to grab only newer files when copying from one folder to another. So instead of writing a ton of code I decided to simply call the XCopy command using the Process class in .NET. The command I needed to run at the command prompt looked like this: XCopy C:\Original\*.* D:\Backup\*.* /q /d /y What this command does is to copy all files from the Original folder on the C drive to the Backup folder on the D drive. The /q option says to do it quitely without repeating all the file names as it copies them. The /d option says to get any newer files it finds in the Original folder that are not in the Backup folder, or any files that have a newer date/time stamp. The /y option will automatically overwrite any existing files without prompting the user to press the "Y" key to overwrite the file. To translate this into code that we can call from our .NET programs, you can write the CopyFiles method presented below. C# using System.Diagnostics public void CopyFiles(string source, string destination){ ProcessStartInfo si = new ProcessStartInfo(); string args = @"{0}\*.* {1}\*.* /q /d /y"; args = string.Format(args, source, destination); si.FileName = "xcopy"; si.Arguments = args; Process.Start(si);} VB.NET Imports System.Diagnostics Public Sub CopyFiles(source As String, destination As String) Dim si As New ProcessStartInfo() Dim args As String = "{0}\*.* {1}\*.* /q /d /y" args = String.Format(args, source, destination) si.FileName = "xcopy" si.Arguments = args Process.Start(si)End Sub The CopyFiles method first creates a ProcessStartInfo object. This object is where you fill in name of the command you wish to run and also the arguments that you wish to pass to the command. I created a string with the arguments then filled in the source and destination folders using the string.Format() method. Finally you call the Start method of the Process class passing in the ProcessStartInfo object. That's all there is to calling any command in the operating system. Very simple, and much less code than it would have taken had I coded it using the various File and Directory classes. Good Luck with your Coding,Paul Sheriff ** SPECIAL OFFER FOR MY BLOG READERS **Visit http://www.pdsa.com/Event/Blog for a free video on Silverlight entitled Silverlight XAML for the Complete Novice - Part 1.

Read the article
Kill all the project files!

- by jamiet

Like many folks I’m a keen podcast listener and yesterday my commute was filled by listening to Scott Hunter being interviewed on .Net Rocks about the next version of ASP.Net. One thing Scott said really struck a chord with me. I don’t remember the full quote but he was talking about how the ASP.Net project file (i.e. the .csproj file) is going away. The rationale being that the main purpose of that file is to list all the other files in the project, and that’s something that the file system is pretty good at. In Scott’s own words (that someone helpfully put in the comments): A file that lists files is really redundant when the OS already does this Romeliz Valenciano correctly pointed out on Twitter that there will still be a project.json file however no longer will there be a need to keep a list of files in a project file. I suspect project.json will simply contain a list of exclusions where necessary rather than the current approach where the project file is a list of inclusions. On the face of it this seems like a pretty good idea. I’ve long been a fan of convention over configuration and this is a great example of that. Instead of listing all the files in a separate file, just treat all the files in the directory as being part of the project. Ostensibly the approach is if its in the directory, its part of the project. Simple. Now I’m not an ASP.net developer, far from it, but it did occur to me that the same approach could be applied to the two Visual Studio project types that I am most familiar with, SSIS & SSDT. Like many people I’ve long been irritated by SSIS projects that display a faux file system inside Solution Explorer. As you can see in the screenshot below the project has Miscellaneous and Connection Managers folders but no such folders exist on the file system: This may seem like a minor thing but it means useful Solution Explorer features like Show All Files and Open Folder in Windows Explorer don’t work and quite frankly it makes me feel like a second class citizen in the Microsoft ecosystem. I’m a developer, treat me like one. Don’t try and hide the detail of how a project works under the covers, show it to me. I’m a big boy, I can handle it! Would it not be preferable to simply treat all the .dtsx files in a directory as being part of a project? I think it would, that’s pretty much all the .dtproj file does anyway (that, and present things in a non-alphabetic order – something else that wildly irritates me), so why not just get rid of the .dtproj file? In the case of SSDT the .sqlproj actually does a whole lot more than simply list files because it also states the BuildAction of each file (Build, NotInBuild, Post-Deployment, etc…) but I see no reason why the convention over configuration approach can’t help us there either. Want to know which is the Post-deployment script? Well, its the one called Post-DeploymentScript.sql! Simple! So that’s my new crusade. Let’s kill all the project files (well, the .dtproj & .sqlproj ones anyway). Are you with me? @Jamiet

Read the article
Homebrew large data cluster access for 2 user levels?

- by Yegor

The title probably makes little sense, so here is an example. I have a file hosting site, that serves a large amount of semi-randomly accessed files. The setup is as follows: High horsepower front-end +DB server that also does encoding for files that need encoding Fresh file server, which stores newly uploaded content, thats probably (and usually) rapidly accessible, which has 500GB of raided SSD storage, that can push over 3GBit of traffic. 3 cheap node servers, containing 2 x 750GB SATA drives in raid1, where files older than 2 weeks are archived, from the SSD server (mentioned above). Files on each server are accessed via subdomains (via modsec) in a straight forward fashion (server1.domain.com, server2.domain.com, etc) Where I have the problem is this. I introduced a "premium" service where people pay a small fee every month, and get ad-free, quick accesses to stuff on the site. Once they are logged in, they access same files via premium.server1.domain.com via a different modsec script, with a different pass phrase. That all works fine and dandy.... except the cheap node servers are all IO bound, so accessing the files on them via a different, unsaturated network makes no difference, since it cannot read off the drive fast enough. What would be a good way to make files on the site be accessible via 2 different network routes, 1 of which will be saturated (the "free network") while all other files are on an un-saturated "premium" network?

Read the article
Azure, don't give me multiple VMs, give me one elastic VM

- by FransBouma

Yesterday, Microsoft revealed new major features for Windows Azure (see ScottGu's post). It all looks shiny and great, but after reading most of the material describing the new features, I still find the overall idea behind all of it flawed: why should I care on how much VMs my web app runs? Isn't that a problem to solve for the Windows Azure engineers / software? And what if I need the file system, why can't I simply get a virtual filesystem ? To illustrate my point, let's use a real example: a product website with a customer system/database and next to it a support site with accompanying database. Both are written in .NET, using ASP.NET and use a SQL Server database each. The product website offers files to download by customers, very simple. You have a couple of options to host these websites: Buy a server, place it in a rack at an ISP and run the sites on that server Use 'shared hosting' with an ISP, which means your sites' appdomains are running on the same machine, as well as the files stored, and the databases are hosted in the same server as the other shared databases. Hire a VM, install your OS of choice at an ISP, and host the sites on that VM, basically the same as the first option, except you don't have a physical server At some cloud-vendor, either host the sites 'shared' or in a VM. See above. With all of those options, scalability is a problem, even the cloud-based ones, though not due to the same reasons: The physical server solution has the obvious problem that if you need more power, you need to buy a bigger server or more servers which requires you to add replication and other overhead Shared hosting solutions are almost always capped on memory usage / traffic and database size: if your sites get too big, you have to move out of the shared hosting environment and start over with one of the other solutions The VM solution, be it a VM at an ISP or 'in the cloud' at e.g. Windows Azure or Amazon, in theory allows scaling out by simply instantiating more VMs, however that too introduces the same overhead problems as with the physical servers: suddenly more than 1 instance runs your sites. If a cloud vendor offers its services in the form of VMs, you won't gain much over having a VM at some ISP: the main problems you have to work around are still there: when you spin up more than one VM, your application must be completely stateless at any moment, including the DB sub system, because what's in memory in instance 1 might not be in memory in instance 2. This might sounds trivial but it's not. A lot of the websites out there started rather small: they were perfectly runnable on a single machine with normal memory and CPU power. After all, you don't need a big machine to run a website with even thousands of users a day. Moving these sites to a multi-VM environment will cause a problem: all the in-memory state they use, all the multi-page transitions they use while keeping state across the transition, they can't do that anymore like they did that on a single machine: state is something of the past, you have to store every byte of state in either a DB or in a viewstate or in a cookie somewhere so with the next request, all state information is available through the request, as nothing is kept in-memory. Our example uses a bunch of files in a file system. Using multiple VMs will require that these files move to a cloud storage system which is mounted in each VM so we don't have to store the files on each VM. This might require different file paths, but this change should be minor. What's perhaps less minor is the maintenance procedure in place on the new type of cloud storage used: instead of ftp-ing into a VM, you might have to update the files using different ways / tools. All in all this makes moving an existing website which was written for an environment that's based around a VM (namely .NET with its CLR) overly cumbersome and problematic: it forces you to refactor your website system to be able to be used 'in the cloud', which is caused by the limited way how e.g. Windows Azure offers its cloud services: in blocks of VMs. Offer a scalable, flexible VM which extends with my needs Instead, cloud vendors should offer simply one VM to me. On that VM I run the websites, store my DB and my files. As it's a virtual machine, how this machine is actually ran on physical hardware (e.g. partitioned), I don't care, as that's the problem for the cloud vendor to solve. If I need more resources, e.g. I have more traffic to my server, way more visitors per day, the VM stretches, like I bought a bigger box. This frees me from the problem which comes with multiple VMs: I don't have any refactoring to do at all: I can simply build my website as if it runs on my local hardware server, upload it to the VM offered by the cloud vendor, install it on the VM and I'm done. "But that might require changes to windows!" Yes, but Microsoft is Windows. Windows Azure is their service, they can make whatever change to what they offer to make it look like it's windows. Yet, they're stuck, like Amazon, in thinking in VMs, which forces developers to 'think ahead' and gamble whether they would need to migrate to a cloud with multiple VMs in the future or not. Which comes down to: gamble whether they should invest time in code / architecture which they might never need. (YAGNI anyone?) So the VM we're talking about, is that a low-level VM which runs a guest OS, or is that VM a different kind of VM? The flexible VM: .NET's CLR ? My example websites are ASP.NET based, which means they run inside a .NET appdomain, on the .NET CLR, which is a VM. The only physical OS resource the sites need is the file system, however this too is accessed through .NET. In short: all the websites see is what .NET allows the websites to see, the world as the websites know it is what .NET shows them and lets them access. How the .NET appdomain is run physically, that's the concern of .NET, not mine. This begs the question why Windows Azure doesn't offer virtual appdomains? Or better: .NET environments which look like one machine but could be physically multiple machines. In such an environment, no change has to be made to the websites to migrate them from a local machine or own server to the cloud to get proper scaling: the .NET VM will simply scale with the need: more memory needed, more CPU power needed, it stretches. What it offers to the application running inside the appdomain is simply increasing, but not fragmented: all resources are available to the application: this means that the problem of how to scale is back to where it should be: with the cloud vendor. "Yeah, great, but what about the databases?" The .NET application communicates with the database server through a .NET ADO.NET provider. Where the database is located is not a problem of the appdomain: the ADO.NET provider has to solve that. I.o.w.: we can host the databases in an environment which offers itself as a single resource and is accessible through one connection string without replication overhead on the outside, and use that environment inside the .NET VM as if it was a single DB. But what about memory replication and other problems? This environment isn't simple, at least not for the cloud vendor. But it is simple for the customer who wants to run his sites in that cloud: no work needed. No refactoring needed of existing code. Upload it, run it. Perhaps I'm dreaming and what I described above isn't possible. Yet, I think if cloud vendors don't move into that direction, what they're offering isn't interesting: it doesn't solve a problem at all, it simply offers a way to instantiate more VMs with the guest OS of choice at the cost of me needing to refactor my website code so it can run in the straight jacket form factor dictated by the cloud vendor. Let's not kid ourselves here: most of us developers will never build a website which needs a truck load of VMs to run it: almost all websites created by developers can run on just a few VMs at most. Yet, the most expensive change is right at the start: moving from one to two VMs. As soon as you have refactored your website code to run across multiple VMs, adding another one is just as easy as clicking a mouse button. But that first step, that's the problem here and as it's right there at the beginning of scaling the website, it's particularly strange that cloud vendors refuse to solve that problem and leave it to the developers to solve that. Which makes migrating 'to the cloud' particularly expensive.

Read the article
How to simulate inner join on very large files in java (without running out of memory)

- by Constantin

I am trying to simulate SQL joins using java and very large text files (INNER, RIGHT OUTER and LEFT OUTER). The files have already been sorted using an external sort routine. The issue I have is I am trying to find the most efficient way to deal with the INNER join part of the algorithm. Right now I am using two Lists to store the lines that have the same key and iterate through the set of lines in the right file once for every line in the left file (provided the keys still match). In other words, the join key is not unique in each file so would need to account for the Cartesian product situations ... left_01, 1 left_02, 1 right_01, 1 right_02, 1 right_03, 1 left_01 joins to right_01 using key 1 left_01 joins to right_02 using key 1 left_01 joins to right_03 using key 1 left_02 joins to right_01 using key 1 left_02 joins to right_02 using key 1 left_02 joins to right_03 using key 1 My concern is one of memory. I will run out of memory if i use the approach below but still want the inner join part to work fairly quickly. What is the best approach to deal with the INNER join part keeping in mind that these files may potentially be huge public class Joiner { private void join(BufferedReader left, BufferedReader right, BufferedWriter output) throws Throwable { BufferedReader _left = left; BufferedReader _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _rightRecord = read(_right); } else { List<Record> leftList = new ArrayList<Record>(); List<Record> rightList = new ArrayList<Record>(); _leftRecord = readRecords(leftList, _leftRecord, _left); _rightRecord = readRecords(rightList, _rightRecord, _right); for( Record equalKeyLeftRecord : leftList ){ for( Record equalKeyRightRecord : rightList ){ write(_output, equalKeyLeftRecord, equalKeyRightRecord); } } } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } private Record read(BufferedReader reader) throws Throwable { Record record = null; String data = reader.readLine(); if( data != null ) { record = new Record(data.split("\t")); } return record; } private Record readRecords(List<Record> list, Record record, BufferedReader reader) throws Throwable { int key = record.getKey(); list.add(record); record = read(reader); while( record != null && record.getKey() == key) { list.add(record); record = read(reader); } return record; } private void write(BufferedWriter writer, Record left, Record right) throws Throwable { String leftKey = (left == null ? "null" : Integer.toString(left.getKey())); String leftData = (left == null ? "null" : left.getData()); String rightKey = (right == null ? "null" : Integer.toString(right.getKey())); String rightData = (right == null ? "null" : right.getData()); writer.write("[" + leftKey + "][" + leftData + "][" + rightKey + "][" + rightData + "]\n"); } public static void main(String[] args) { try { BufferedReader leftReader = new BufferedReader(new FileReader("LEFT.DAT")); BufferedReader rightReader = new BufferedReader(new FileReader("RIGHT.DAT")); BufferedWriter output = new BufferedWriter(new FileWriter("OUTPUT.DAT")); Joiner joiner = new Joiner(); joiner.join(leftReader, rightReader, output); } catch (Throwable e) { e.printStackTrace(); } } } After applying the ideas from the proposed answer, I changed the loop to this private void join(RandomAccessFile left, RandomAccessFile right, BufferedWriter output) throws Throwable { long _pointer = 0; RandomAccessFile _left = left; RandomAccessFile _right = right; BufferedWriter _output = output; Record _leftRecord; Record _rightRecord; _leftRecord = read(_left); _rightRecord = read(_right); while( _leftRecord != null && _rightRecord != null ) { if( _leftRecord.getKey() < _rightRecord.getKey() ) { write(_output, _leftRecord, null); _leftRecord = read(_left); } else if( _leftRecord.getKey() > _rightRecord.getKey() ) { write(_output, null, _rightRecord); _pointer = _right.getFilePointer(); _rightRecord = read(_right); } else { long _tempPointer = 0; int key = _leftRecord.getKey(); while( _leftRecord != null && _leftRecord.getKey() == key ) { _right.seek(_pointer); _rightRecord = read(_right); while( _rightRecord != null && _rightRecord.getKey() == key ) { write(_output, _leftRecord, _rightRecord ); _tempPointer = _right.getFilePointer(); _rightRecord = read(_right); } _leftRecord = read(_left); } _pointer = _tempPointer; } } if( _leftRecord != null ) { write(_output, _leftRecord, null); _leftRecord = read(_left); while(_leftRecord != null) { write(_output, _leftRecord, null); _leftRecord = read(_left); } } else { if( _rightRecord != null ) { write(_output, null, _rightRecord); _rightRecord = read(_right); while(_rightRecord != null) { write(_output, null, _rightRecord); _rightRecord = read(_right); } } } _left.close(); _right.close(); _output.flush(); _output.close(); } UPDATE While this approach worked, it was terribly slow and so I have modified this to create files as buffers and this works very well. Here is the update ... private long getMaxBufferedLines(File file) throws Throwable { long freeBytes = Runtime.getRuntime().freeMemory() / 2; return (freeBytes / (file.length() / getLineCount(file))); } private void join(File left, File right, File output, JoinType joinType) throws Throwable { BufferedReader leftFile = new BufferedReader(new FileReader(left)); BufferedReader rightFile = new BufferedReader(new FileReader(right)); BufferedWriter outputFile = new BufferedWriter(new FileWriter(output)); long maxBufferedLines = getMaxBufferedLines(right); Record leftRecord; Record rightRecord; leftRecord = read(leftFile); rightRecord = read(rightFile); while( leftRecord != null && rightRecord != null ) { if( leftRecord.getKey().compareTo(rightRecord.getKey()) < 0) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } else if( leftRecord.getKey().compareTo(rightRecord.getKey()) > 0 ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } else if( leftRecord.getKey().compareTo(rightRecord.getKey()) == 0 ) { String key = leftRecord.getKey(); List<File> rightRecordFileList = new ArrayList<File>(); List<Record> rightRecordList = new ArrayList<Record>(); rightRecordList.add(rightRecord); rightRecord = consume(key, rightFile, rightRecordList, rightRecordFileList, maxBufferedLines); while( leftRecord != null && leftRecord.getKey().compareTo(key) == 0 ) { processRightRecords(outputFile, leftRecord, rightRecordFileList, rightRecordList, joinType); leftRecord = read(leftFile); } // need a dispose for deleting files in list } else { throw new Exception("DATA IS NOT SORTED"); } } if( leftRecord != null ) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); while(leftRecord != null) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.LeftExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, leftRecord, null); } leftRecord = read(leftFile); } } else { if( rightRecord != null ) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); while(rightRecord != null) { if( joinType == JoinType.RightOuterJoin || joinType == JoinType.RightExclusiveJoin || joinType == JoinType.FullExclusiveJoin || joinType == JoinType.FullOuterJoin ) { write(outputFile, null, rightRecord); } rightRecord = read(rightFile); } } } leftFile.close(); rightFile.close(); outputFile.flush(); outputFile.close(); } public void processRightRecords(BufferedWriter outputFile, Record leftRecord, List<File> rightFiles, List<Record> rightRecords, JoinType joinType) throws Throwable { for(File rightFile : rightFiles) { BufferedReader rightReader = new BufferedReader(new FileReader(rightFile)); Record rightRecord = read(rightReader); while(rightRecord != null){ if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } rightRecord = read(rightReader); } rightReader.close(); } for(Record rightRecord : rightRecords) { if( joinType == JoinType.LeftOuterJoin || joinType == JoinType.RightOuterJoin || joinType == JoinType.FullOuterJoin || joinType == JoinType.InnerJoin ) { write(outputFile, leftRecord, rightRecord); } } } /** * consume all records having key (either to a single list or multiple files) each file will * store a buffer full of data. The right record returned represents the outside flow (key is * already positioned to next one or null) so we can't use this record in below while loop or * within this block in general when comparing current key. The trick is to keep consuming * from a List. When it becomes empty, re-fill it from the next file until all files have * been consumed (and the last node in the list is read). The next outside iteration will be * ready to be processed (either it will be null or it points to the next biggest key * @throws Throwable * */ private Record consume(String key, BufferedReader reader, List<Record> records, List<File> files, long bufferMaxRecordLines ) throws Throwable { boolean processComplete = false; Record record = records.get(records.size() - 1); while(!processComplete){ long recordCount = records.size(); if( record.getKey().compareTo(key) == 0 ){ record = read(reader); while( record != null && record.getKey().compareTo(key) == 0 && recordCount < bufferMaxRecordLines ) { records.add(record); recordCount++; record = read(reader); } } processComplete = true; // if record is null, we are done if( record != null ) { // if the key has changed, we are done if( record.getKey().compareTo(key) == 0 ) { // Same key means we have exhausted the buffer. // Dump entire buffer into a file. The list of file // pointers will keep track of the files ... processComplete = false; dumpBufferToFile(records, files); records.clear(); records.add(record); } } } return record; } /** * Dump all records in List of Record objects to a file. Then, add that * file to List of File objects * * NEED TO PLACE A LIMIT ON NUMBER OF FILE POINTERS (check size of file list) * * @param records * @param files * @throws Throwable */ private void dumpBufferToFile(List<Record> records, List<File> files) throws Throwable { String prefix = "joiner_" + files.size() + 1; String suffix = ".dat"; File file = File.createTempFile(prefix, suffix, new File("cache")); BufferedWriter writer = new BufferedWriter(new FileWriter(file)); for( Record record : records ) { writer.write( record.dump() ); } files.add(file); writer.flush(); writer.close(); }

Read the article
How can I split my conkeror-rc config over multiple files?

- by Ryan Thompson

Short version: can you help me fill in this code? var conkeror_settings_dir = ".conkeror.mozdev.org/settings"; function load_all_js_files_in_dir (dir) { var full_path = get_home_directory().appendRelativePath(dir); // YOUR CODE HERE } load_all_js_files_in_dir(conkeror_settings_dir); Background I'm trying out Conkeror for web browsing. It's an emacs-like browser running on Mozilla's rendering engine, using javascript as configuration language (filling the role that elisp plays for emacs). In my emacs config, I have split my customizations into a series of files, where each file is a single unit of related options (for example, all my perl-related settings might be in perl-settings.el. All these settings files are loaded automatically by a function in my .emacs that simply loads every elisp file under my "settings" directory. I am looking to structure my Conkeror config in the same way, with my main conkeror-rc file basically being a stub that loads all the js files under a certain directory relative to my home directory. Unfortunately, I am much less literate in javascript than I am in elisp, so I don't even know how to "source" a file.

Read the article

< Previous Page | 35 36 37 38 39 40 41 42 43 44 45 46 | Next Page >