sparse files - Page 112

Deleting files associated with model - django

- by alexBrand

I have the following code in one of my models class PostImage(models.Model): post = models.ForeignKey(Post, related_name="images") # @@@@ figure out a way to have image folders per user... image = models.ImageField(upload_to='images') image_infowindow = models.ImageField(upload_to='images') image_thumb = models.ImageField(upload_to='images') image_web = models.ImageField(upload_to='images') description = models.CharField(max_length=100) order = models.IntegerField(null=True) IMAGE_SIZES = { 'image_infowindow':(70,70), 'image_thumb':(100,100), 'image_web':(640,480), } def delete(self, *args, **kwargs): # delete files.. self.image.delete(save=False) self.image_thumb.delete(save=False) self.image_web.delete(save=False) self.image_infowindow.delete(save=False) super(PostImage, self).delete(*args, **kwargs) I am trying to delete the files when the delete() method is called on PostImage. However, the files are not being removed. As you can see, I am overriding the delete() method, and deleting each ImageField. For some reason however, the files are not being removed.

Read the article

Member classes versus #includes

- by ShallowThoughts

I've recently discovered that it is bad form to have #includes in your header files because anyone who uses your code gets all those extra includes they won't necessarily want. However, for classes that have member variables defined as a type of another class, what's the alternative? For example, I was doing things the following way for the longest time: /* Header file for class myGrades */ #include <vector> //bad #include "classResult.h" //bad class myGrades { vector<classResult> grades; int average; int bestScore; } (Please excuse the fact that this is a highly artificial example) So, if I want to get rid of the #include lines, is there any way I can keep the vector or do I have to approach programming my code in an entirely different way?

Read the article

How to find files older than N days from a given timestamp

- by JGeZau

I want to find files older than N days from a given timestamp in format YYYYMMDDHH I can find file older than 2 days with the below command, but this finds files with present time find /path/to/dir -mtime -2 -type f -ls Lets say I give the input timeSamp=2011093009 so I want to find files older than 2 days from 2011093009 Been doing my research, but can't seem to figure it out. ========================================== Found the solution...see below for my Answer.. Thanks

Read the article

.net File.Copy very slow when copying many small files (not over network)

- by Guavaman

I'm making a simple folder sync backup tool for myself and ran into quite a roadblock using File.Copy. Doing tests copying a folder of ~44,000 small files (Windows mail folders) to another drive in my system, I found that using File.Copy was over 3x slower than using a command line and running xcopy to copy the same files/folders. My C# version takes over 16+ minutes to copy the files, whereas xcopy takes only 5 minutes. I've tried searching for help on this topic, but all I find is people complaining about slow file copying of large files over a network. This is neither a large file problem nor a network copying problem. I found an interesting article about a better File.Copy replacement, but the code as posted has some errors which causes problems with the stack and I am nowhere near knowledgeable enough to fix the problems in his code. Are there any common or easy ways to replace File.Copy with something more speedy?

Read the article

BASH statements execute alone but return "no such file" in for loop.

- by reve_etrange

Another one I can't find an answer for, and it feels like I've gone mad. I have a BASH script using a for loop to run a complex command (many protein sequence alignments) on a lot of files (~5000). The loop produces statements that will execute when given alone (i.e. copy-pasted from the error message to the command prompt), but which return "no such file or directory" inside the loop. Script below; there are actually several more arguments but this includes some representative ones and the file arguments. #!/bin/bash # Pass directory with targets as FASTA sequences as argument. # Arguments to psiblast # Common db=local/db/nr/nr outfile="/mnt/scratch/psi-blast" e=0.001 threads=8 itnum=5 pssm="/mnt/scratch/psi-blast/pssm." pssm_txt="/mnt/scratch/psi-blast/pssm." pseudo=0 pwa_inclusion=0.002 for i in ${1}/* do filename=$(basename $i) "local/ncbi-blast-2.2.23+/bin/psiblast\ -query ${i}\ -db $db\ -out ${outfile}/${filename}.out\ -evalue $e\ -num_threads $threads\ -num_iterations $itnum\ -out_pssm ${pssm}$filename\ -out_ascii_pssm ${pssm_txt}${filename}.txt\ -pseudocount $pseudo\ -inclusion_ethresh $pwa_inclusion" done Running this scripts gives "<scriptname> line <last line before 'done'>: <attempted command> : No such file or directory. If I then paste the attempted command onto the prompt it will run. Each of these commands takes a couple of minutes to run.

Read the article

Declaring struct in header file

- by wrongusername

I've been trying to include a structure called "student" in a student.h file, but I'm not quite sure how to do it. My student.h file code consists of entirely: #include<string> using namespace std; struct Student; while the student.cpp file consists of entirely: #include<string> using namespace std; struct Student { string lastName, firstName; //long list of other strings... just strings though }; Unfortunately, files that use #include "student.h" come up with numerous errors like error C2027: use of undefined type 'Student', error C2079: 'newStudent' uses undefined struct 'Student' (where newStudent is a function with a Student parameter), and error C2228: left of '.lastName' must have class/struct/union. It appears the compiler (VC++) does not recognize struct Student from "student.h" or something? I have tried declaring the whole struct in "student.h", but it didn't help either. How can I declare struct Student in "student.h" so that I can just #include "student.h" and start using the struct? BTW, it seems there are no compiler errors in student.h...

Read the article

C# File IO with Streams - Best Memory Buffer Size

- by AJ

Hi, I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance). Thanks in advance for any ideas, Adam

Read the article

Binary search in a sorted (memory-mapped ?) file in Java

- by sds

I am struggling to port a Perl program to Java, and learning Java as I go. A central component of the original program is a Perl module that does string prefix lookups in a +500 GB sorted text file using binary search (essentially, "seek" to a byte offset in the middle of the file, backtrack to nearest newline, compare line prefix with the search string, "seek" to half/double that byte offset, repeat until found...) I have experimented with several database solutions but found that nothing beats this in sheer lookup speed with data sets of this size. Do you know of any existing Java library that implements such functionality? Failing that, could you point me to some idiomatic example code that does random access reads in text files? Alternatively, I am not familiar with the new (?) Java I/O libraries but would it be an option to memory-map the 500 GB text file (I'm on a 64-bit machine with memory to spare) and do binary search on the memory-mapped byte array? I would be very interested to hear any experiences you have to share about this and similar problems.

Read the article

Alternatives to using web.config to store settings (for complex solutions)

- by Brian MacKay

In our web applications, we seperate our Data Access Layers out into their own projects. This creates some problems related to settings. Because the DAL will eventually need to be consumed from perhaps more than one application, web.config does not seem like a good place to keep the connection strings and some of the other DAL-related settings. To solve this, on some of our recent projects we introduced a third project just for settings. We put the setting in a system of .Setting files... With a simple wrapper, the ability to have different settings for various enviroments (Dev, QA, Staging, Production, etc) was easy to achieve. The only problem there is that the settings project (including the .Settings class) compiles into an assembly, so you can't change it without doing a build/deployment, and some of our customers want to be able to configure their projects without Visual Studio. So, is there a best practice for this? I have that sense that I'm reinventing the wheel. Some solutions such as storing settings in a fixed directory on the server in, say, our own XML format occurred to us. But again, I would rather avoid having to re-create encryption for sensitive values and so on. And I would rather keep the solution self-contained if possible. EDIT: The original question did not contain the really penetrating reason that we can't (I think) use web.config ... That puts a few (very good) answers out of context, my bad.

Read the article

Control pdb file output from build defintion file

- by Urvi

Hello, I am trying to generate a release build with no pdb files generated. I have seen numerous posts that suggest right-clicking on the project, selecting Properties, going to the Build tab and then to the Advanced... butoon and changing Debug Info to none. This works and all, but I need to do this for a build of ~50 solutions which contain ~25 projects each! Other posts mention editing the appropriate .csproj file, but again, with so many projects, this would take a long time. Is there any way to achieve this via the TFSBuild.proj file? I have tried adding the following to the TFSBuild.proj file, with no luck. <PropertyGroup> <Configuration>Release</Configuration> <Platform>AnyCPU</Platform> </PropertyGroup> <PropertyGroup> <DebugSymbols>false</DebugSymbols> <DebugType>none</DebugType> <Optimize>true</Optimize> </PropertyGroup> The following line prints out Release|AnyCPU, none, and false, but I still see .pdb file in the $(OutputDir) folder. <Message Text="$Configuration|Platform): $(Configuration)|$(Platform)" /> <Message Text="DebugType is: $(DebugType)"/> <Message Text="DebugSymbols is: $(DebugSymbols)"/> Thanks in advance, Urvi

Read the article

Opening a Unicode file with Perl

- by Jaco Pretorius

I'm using osql to run several sql scripts against a database and then I need to look at the results file to check if any errors occurred. The problem is that perl doesn't seem to like the fact that the results files are unicode. I wrote a little test script to test it and the output comes out all warbled. $file = shift; open OUTPUT, $file or die "Can't open $file: $!\n"; while (<OUTPUT>) { print $_; if (/Invalid|invalid|Cannot|cannot/) { push(@invalids, $file); print "invalid file - $inputfile - schedule for retry\n"; last; } } Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file. I think the problem might be that osql puts the result file in UTF-16 format, but I'm not sure. When I open the file in textpad it just tells me 'Unicode'. Edit: Using perl v5.8.8

Read the article

Does SetFileBandwidthReservation affect memory-mapped file performance?

- by Ghostrider

Does this function affect Memory-mapped file performance? Here's the problem I need to solve: I have two applications competing for disk access: "reader" and "updater". Whole system runs on Windows Server 2008 R2 x64 "Updater" constantly accesses disk in a linear manner, updating data. They system is set up in such a way that updater always has infinite data to update. Consider that it is constantly approximating a solution of a huge set of equations that takes up entire 2TB disk drive. Updater uses ReadFile and WriteFile to process data in a linear fashion. "Reader" is occasionally invoked by user to get some pieces of data. Usually user would read several 4kb blocks from the drive and stop. Occasionally user needs to read up to 100mb sequentially. In exceptional cases up to several gigabytes. Reader maps files to memory to get data it needs. What I would like to achieve is for "reader" to have absolute priority so that "updater" would completely stop if needed so that "reader" could get the data user needs ASAP. Is this problem solvable by using SetPriorityClass and SetFileBandwidthReservation calls? I would really hate to put synchronization login in "reader" and "updater" and rather have the OS take care of priorities.

Read the article

Changing where a resource is pulled during runtime?

- by Brandon

I have a website that goes out to multiple clients. Sometimes a client will insist on minor changes. For reasons beyond my control, I have to comply no matter how minor the request. Usually this isn't a problem, I would just create a client specific version of the user control or page and overwrite the default one during build time or make a configuration setting to handle it. Now that I am localizing the site, I'm curious about the best way to go about making minor wording changes. Lets say I have a resource file called Resources.resx that has 300 resources in it. It has a resource called Continue. English value is "Continue", the French value is "Continuez". Now one client, for whatever reason, wants it to say "Next" and "Après" and the others want to keep it the same. What is the best way to accomodate a request like this? (This is just a simple example). The only two ways I can think of is to Create another Resources.resx specific to the client, and replace the .dll during build time. Since I'd be completely replacing the dll, the new resource file would have to contain all 300 strings. The obvious problem being that I now have 2 resource files, each with 300 strings to maintain. Create a custom user control/page and change it to use a custom resource file. e.g. SignIn.ascx would be replaced during the build and it would pull its resources from ClientName.resx instead of Resources.resx. Are there any other things I could try? Is there any way to change it so that the application will always look in a ClientResources.resx file for the overridden values before actually look at the specified resource file?

Read the article

Improving File Read Performance (single file, C++, Windows)

- by david

I have large (hundreds of MB or more) files that I need to read blocks from using C++ on Windows. Currently the relevant functions are: errorType LargeFile::read( void* data_out, __int64 start_position, __int64 size_bytes ) const { if( !m_open ) { // return error } else { seekPosition( start_position ); DWORD bytes_read; BOOL result = ReadFile( m_file, data_out, DWORD( size_bytes ), &bytes_read, NULL ); if( size_bytes != bytes_read || result != TRUE ) { // return error } } // return no error } void LargeFile::seekPosition( __int64 position ) const { LARGE_INTEGER target; target.QuadPart = LONGLONG( position ); SetFilePointerEx( m_file, target, NULL, FILE_BEGIN ); } The performance of the above does not seem to be very good. Reads are on 4K blocks of the file. Some reads are coherent, most are not. A couple questions: Is there a good way to profile the reads? What things might improve the performance? For example, would sector-aligning the data be useful? I'm relatively new to file i/o optimization, so suggestions or pointers to articles/tutorials would be helpful.

Read the article

File.Replace throwing IOException

- by WebDevHobo

I have an app that can make modify images. In some cases, this makes the filesize smaller, in some cases bigger. The program doesn't have an option to "not replace the file if result has a bigger filesize". So I wrote a little C# app to try and solve this. Instead of overwriting the files, I make the app write the result to a folder under the current one and name that folder Test. The C# app I wrote compares grabs the contents of both folders and puts the full path to the file(s) in two List objects. I then compare and replace. The replacing isn't working however. I get the following IOException: Unable to remove the file to be replaced The location is on an external hard-drive, on which I have full rights. Now, I know I can just do File.Delete and File.Move in that order, but this exception has gotten me interested in why this particular setup wont work. Here's the source code: http://pastebin.com/4Vq82Umu And yes, the file specified as last argument of the Replace function does exist.

Read the article

File IO with Streams - Best Memory Buffer Size

- by AJ

I am writing a small IO library to assist with a larger (hobby) project. A part of this library performs various functions on a file, which is read / written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is vaired, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...). My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2k, which would cover a CD sector size and is a nice multiple of a 512 byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise with today's PC's, I could go for a more memory hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive app. As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance).

Read the article

Problem with unlink() in php!

- by Holicreature

I'm creating a temp image always named 1.png under specific folder and once i read the image_contents and process, i use unlink() to delete that specific image from that folder. But sometimes the image file is not deleted and the same image is file is read and processed. That script is working otherwise fine... There is no permission related issues , as the files are deleted sometimes... Will there be any issue when the script is repeatedly called and the image with the name is already present and not deleted etc.. ??? Please suggest me what would be the problem extension_loaded('ffmpeg'); $max_width = 120; $max_height = 72; $path ="/home/fff99/public_html/temp/"; ..... ..... $nname = "/home/friend99/public_html/temp/".$imgname; $fileo = fopen($nname,"rb"); if($fileo) { $imgData = addslashes(file_get_contents($nname)); .... ... .. } unlink('$nname');

Read the article

Any board-like platform but only for files instead of text posts? [on hold]

- by Janwillhaus

I am looking for a (best open-source or free but also commercially available) CMS platform that is built similarly to bulletin boards (for example phpBB) but instead of text posts, registered users can upload files that can be rated, commented etc.) I am aware of the number of board CMSes that have file-database plugins, but that is not what I want. I want to have a system that focuses on the files rather than on the postings. Or do you have any alternative ideas on solutions to the problem? I need the following functions CMS focused on file management Files that can be categorized (in a tree-like view for example) Comments and ratings can be added to files Users that can be provided various rights around the platform (moderation, commenting, exclusion of non-registered users, etc.) Users can upload files themselves (for further moderation

Read the article

Is it possible for a web-server to send more files than requested for, and have the browser accept them?

- by Osiris

I've created a basic web server for a school project, and it serves static content without a problem. I thought of having the server parse all htm/html files for links to .js/.css/image files, and send these files to the client without these files being requested by the client later. eg. The browser requests: index.htm The server responds with intex.htm and image.jpg I modified the server to send two distinct http responses for a "GET /index.html HTTP1.1" (one for the html page and one for the image), but the browser ended up requesting the image when it was good and ready. Is there any way to bypass this? (use a multipart response, perhaps) Will these files be accepted by most browsers, or will they be rejected for security reasons?

Read the article

How do I migrate web files from a Plesk 8 installation (on a slaved HDD) to a Plesk 10.4.4 installation?

- by Ranger

Due to Plesk 8 being at End of Support our host setup a new installation of RHEL and Plesk 10 on a new hard drive. They then slaved the old drive to the new so that we could migrate all our files using SSH. I am having challenges correctly migrating the sub domain files. The path to subdomain root folder in Plesk 10.4.4 is confusing as I don't know where to copy the files to. The path to the files on the slaved drive is: "/mnt/old-drive/var/www/vhosts/domain_name.com/subdomains/SUBDOMAIN_NAME/", meanwhile on the new installation I have "/var/www/vhosts/SUBDOMAIN_NAME.domain_name.com". There is an httpdocs folder in the '/var/www/vhosts/domain_name' folder but none in the '/var/www/vhosts/SUBDOMAIN_NAME.domain_name.com' folder. Where do I copy my subdomain files to? Please help.

Read the article

rsyslogd not monitoring all files

- by Tom O'Connor

So.. I've installed Logstash, and instead of using the logstash shipper (because it needs the JVM and is generally massive), I'm using rsyslogd with the following configuration. # Use traditional timestamp format $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat $IncludeConfig /etc/rsyslog.d/*.conf # Provides kernel logging support (previously done by rklogd) $ModLoad imklog # Provides support for local system logging (e.g. via logger command) $ModLoad imuxsock # Log all kernel messages to the console. # Logging much else clutters up the screen. #kern.* /dev/console # Log anything (except mail) of level info or higher. # Don't log private authentication messages! *.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages # The authpriv file has restricted access. authpriv.* /var/log/secure # Log all the mail messages in one place. mail.* -/var/log/maillog # Log cron stuff cron.* /var/log/cron # Everybody gets emergency messages *.emerg * # Save news errors of level crit and higher in a special file. uucp,news.crit /var/log/spooler # Save boot messages also to boot.log local7.* /var/log/boot.log In /etc/rsyslog.d/logstash.conf there are 28 file monitor blocks using imfile $ModLoad imfile # Load the imfile input module $ModLoad imklog # for reading kernel log messages $ModLoad imuxsock # for reading local syslog messages $InputFileName /var/log/rabbitmq/startup_err $InputFileTag rmq-err: $InputFileStateFile state-rmq-err $InputFileFacility local6 $InputRunFileMonitor .... $InputFileName /var/log/some.other.custom.log $InputFileTag cust-log: $InputFileStateFile state-cust-log $InputFileFacility local6 $InputRunFileMonitor .... *.* @@10.90.0.110:5514 There are 28 InputFileMonitor blocks, each monitoring a different custom application logfile.. If I run [root@secret-gm02 ~]# lsof|grep rsyslog rsyslogd 5380 root cwd DIR 253,0 4096 2 / rsyslogd 5380 root rtd DIR 253,0 4096 2 / rsyslogd 5380 root txt REG 253,0 278976 1015955 /sbin/rsyslogd rsyslogd 5380 root mem REG 253,0 58400 1868123 /lib64/libgcc_s-4.1.2-20080825.so.1 rsyslogd 5380 root mem REG 253,0 144776 1867778 /lib64/ld-2.5.so rsyslogd 5380 root mem REG 253,0 1718232 1867780 /lib64/libc-2.5.so rsyslogd 5380 root mem REG 253,0 23360 1867787 /lib64/libdl-2.5.so rsyslogd 5380 root mem REG 253,0 145872 1867797 /lib64/libpthread-2.5.so rsyslogd 5380 root mem REG 253,0 85544 1867815 /lib64/libz.so.1.2.3 rsyslogd 5380 root mem REG 253,0 53448 1867801 /lib64/librt-2.5.so rsyslogd 5380 root mem REG 253,0 92816 1868016 /lib64/libresolv-2.5.so rsyslogd 5380 root mem REG 253,0 20384 1867990 /lib64/rsyslog/lmnsd_ptcp.so rsyslogd 5380 root mem REG 253,0 53880 1867802 /lib64/libnss_files-2.5.so rsyslogd 5380 root mem REG 253,0 23736 1867800 /lib64/libnss_dns-2.5.so rsyslogd 5380 root mem REG 253,0 20768 1867988 /lib64/rsyslog/lmnet.so rsyslogd 5380 root mem REG 253,0 11488 1867982 /lib64/rsyslog/imfile.so rsyslogd 5380 root mem REG 253,0 24040 1867983 /lib64/rsyslog/imklog.so rsyslogd 5380 root mem REG 253,0 11536 1867987 /lib64/rsyslog/imuxsock.so rsyslogd 5380 root mem REG 253,0 13152 1867989 /lib64/rsyslog/lmnetstrms.so rsyslogd 5380 root mem REG 253,0 8400 1867992 /lib64/rsyslog/lmtcpclt.so rsyslogd 5380 root 0r REG 0,3 0 4026531848 /proc/kmsg rsyslogd 5380 root 1u IPv4 1200589517 0t0 TCP 10.10.10.90 t:40629->10.10.10.90:5514 (ESTABLISHED) rsyslogd 5380 root 2u IPv4 1200589527 0t0 UDP *:45801 rsyslogd 5380 root 3w REG 253,3 17999744 2621483 /var/log/messages rsyslogd 5380 root 4w REG 253,3 13383 2621484 /var/log/secure rsyslogd 5380 root 5w REG 253,3 7180 2621493 /var/log/maillog rsyslogd 5380 root 6w REG 253,3 43321 2621529 /var/log/cron rsyslogd 5380 root 7w REG 253,3 0 2621494 /var/log/spooler rsyslogd 5380 root 8w REG 253,3 0 2621495 /var/log/boot.log rsyslogd 5380 root 9r REG 253,3 1064271998 2621464 /var/log/custom-application.monolog.log rsyslogd 5380 root 10u unix 0xffff81081fad2e40 0t0 1200589511 /dev/log You can see that there are nowhere near 28 logfiles actually being read. I really had to get one file monitored, so I moved it to the top, and it picked it up, but I'd like to be able to monitor all 28+ files, and not have to worry. OS is Centos 5.5 Kernel 2.6.18-308.el5 rsyslogd 3.22.1, compiled with: FEATURE_REGEXP: Yes FEATURE_LARGEFILE: Yes FEATURE_NETZIP (message compression): Yes GSSAPI Kerberos 5 support: Yes FEATURE_DEBUG (debug build, slow code): No Atomic operations supported: Yes Runtime Instrumentation (slow code): No Questions: Why is rsyslogd only monitoring a very small subset of the files? How can I fix this so that all the files are monitored?

Read the article

XML: Process large data

- by Atmocreations

Hello What XML-parser do you recommend for the following purpose: The XML-file (formatted, containing whitespaces) is around 800 MB. It mostly contains three types of tag (let's call them n, w and r). They have an attribute called id which i'd have to search for, as fast as possible. Removing attributes I don't need could save around 30%, maybe a bit more. First part for optimizing the second part: Is there any good tool (command line linux and windows if possible) to easily remove unused attributes in certain tags? I know that XSLT could be used. Or are there any easy alternatives? Also, I could split it into three files, one for each tag to gain speed for later parsing... Speed is not too important for this preparation of the data, of course it would be nice when it took rather minutes than hours. Second part: Once I have the data prepared, be it shortened or not, I should be able to search for the ID-attribute I was mentioning, this being time-critical. Estimations using wc -l tell me that there are around 3M N-tags and around 418K W-tags. The latter ones can contain up to approximately 20 subtags each. W-Tags also contain some, but they would be stripped away. "All I have to do" is navigating between tags containing certain id-attributes. Some tags have references to other id's, therefore giving me a tree, maybe even a graph. The original data is big (as mentioned), but the resultset shouldn't be too big as I only have to pick out certain elements. Now the question: What XML parsing library should I use for this kind of processing? I would use Java 6 in a first instance, with having in mind to be porting it to BlackBerry. Might it be useful to just create a flat file indexing the id's and pointing to an offset in the file? Is it even necessary to do the optimizations mentioned in the upper part? Or are there parser known to be quite as fast with the original data? Little note: To test, I took the id being on the very last line on the file and searching for the id using grep. This took around a minute on a Core 2 Duo. What happens if the file grows even bigger, let's say 5 GB? I appreciate any notice or recommendation. Thank you all very much in advance and regards

Read the article

gcc/g++: error when compiling large file

- by Alexander

Hi, I have a auto-generated C++ source file, around 40 MB in size. It largely consists of push_back commands for some vectors and string constants that shall be pushed. When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches --param ggc-min-expand=0 --param ggc-min-heapsize=4096 may solve the problem. They, however, only seem to work when optimization is turned on. 1) Is this really the solution that I am looking for? 2) Or is there a faster, better (compiling takes ages with these options acitvated) way to do this? Best wishes, Alexander Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but as the file that I was trying to compile was so big, it still crashed, only later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does the GCC do behind the scenes that costs so much memory? (I compiled with deactivating all optimizations as well and got the same results) The solution that I switched to now is reading in the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do this in C++. However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format, and it seems that some of the problems I had disappeared when I manually set the output format to pe-i386. The symbols in the object file are by standard named after the file name, e.g. converting the file inbuilt_training_data.bin would result in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web which claim that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.

Read the article

Cpp some basic problems

- by DevAno1

Hello. My task was as follows : Create class Person with char*name and int age. Implement contructor using dynamic allocation of memory for variables, destructor, function init and friend function show. Then transform this class to header and cpp file and implement in other program. Ok so I've almost finished my Person class, but I get error after destructor. First question is how to write this properly ? #include <iostream> using namespace std; class Person { char* name; int age; public: int * take_age(); Person(){ int size=0; cout << "Give length of char*" << endl; cin >> size; name = new char[size]; age = 0; } ~Person(){ cout << "Destroying resources" << endl; delete *[] name; delete * take_age(); } friend void(Person &p); int * Person::take_age(){ return age; } void init(char* n, int a) { name = n; age = a; } void show(Person &p){ cout << "Name: " << p.name << "," << "age: " << p.age << endl; } }; int main(void) { Person *p = new Person; p->init("Mary", 25); p.show(); system("PAUSE"); return 0; } And now with header/implementation part : - do I need to introduce constructor in header/implementation files ? If yes - how? - my show() function is a friendly function. Should I take it into account somehow ? I already failed to return this task on my exam, but still I'd like to know how to implement it.

Read the article

Optimal storage of data structure for fast lookup and persistence

- by Mikael Svenson

Scenario I have the following methods: public void AddItemSecurity(int itemId, int[] userIds) public int[] GetValidItemIds(int userId) Initially I'm thinking storage on the form: itemId -> userId, userId, userId and userId -> itemId, itemId, itemId AddItemSecurity is based on how I get data from a third party API, GetValidItemIds is how I want to use it at runtime. There are potentially 2000 users and 10 million items. Item id's are on the form: 2007123456, 2010001234 (10 digits where first four represent the year). AddItemSecurity does not have to perform super fast, but GetValidIds needs to be subsecond. Also, if there is an update on an existing itemId I need to remove that itemId for users no longer in the list. I'm trying to think about how I should store this in an optimal fashion. Preferably on disk (with caching), but I want the code maintainable and clean. If the item id's had started at 0, I thought about creating a byte array the length of MaxItemId / 8 for each user, and set a true/false bit if the item was present or not. That would limit the array length to little over 1mb per user and give fast lookups as well as an easy way to update the list per user. By persisting this as Memory Mapped Files with the .Net 4 framework I think I would get decent caching as well (if the machine has enough RAM) without implementing caching logic myself. Parsing the id, stripping out the year, and store an array per year could be a solution. The ItemId - UserId[] list can be serialized directly to disk and read/write with a normal FileStream in order to persist the list and diff it when there are changes. Each time a new user is added all the lists have to updated as well, but this can be done nightly. Question Should I continue to try out this approach, or are there other paths which should be explored as well? I'm thinking SQL server will not perform fast enough, and it would give an overhead (at least if it's hosted on a different server), but my assumptions might be wrong. Any thought or insights on the matter is appreciated. And I want to try to solve it without adding too much hardware :) [Update 2010-03-31] I have now tested with SQL server 2008 under the following conditions. Table with two columns (userid,itemid) both are Int Clustered index on the two columns Added ~800.000 items for 180 users - Total of 144 million rows Allocated 4gb ram for SQL server Dual Core 2.66ghz laptop SSD disk Use a SqlDataReader to read all itemid's into a List Loop over all users If I run one thread it averages on 0.2 seconds. When I add a second thread it goes up to 0.4 seconds, which is still ok. From there on the results are decreasing. Adding a third thread brings alot of the queries up to 2 seonds. A forth thread, up to 4 seconds, a fifth spikes some of the queries up to 50 seconds. The CPU is roofing while this is going on, even on one thread. My test app takes some due to the speedy loop, and sql the rest. Which leads me to the conclusion that it won't scale very well. At least not on my tested hardware. Are there ways to optimize the database, say storing an array of int's per user instead of one record per item. But this makes it harder to remove items.

Search Results

Search found 37883 results on 1516 pages for 'sparse files'.

Page 112/1516 | < Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119 | Next Page >

- by alexBrand

- by ShallowThoughts

- by JGeZau

- by Guavaman

- by reve_etrange

- by wrongusername

- by AJ

- by sds

- by Brian MacKay

- by Urvi

- by Jaco Pretorius

- by Ghostrider

- by Brandon

- by david

- by WebDevHobo

- by AJ

- by Holicreature

- by Janwillhaus

- by Osiris

- by Ranger

- by Tom O'Connor

- by Atmocreations

- by Alexander

- by DevAno1

- by Mikael Svenson

< Previous Page | 108 109 110 111 112 113 114 115 116 117 118 119 | Next Page >