Search Results

Search found 18409 results on 737 pages for 'large projects'.


how to find maximum frequent item sets from large transactional data file

- by ANIL MANE

Hi, I have the input file contains large amount of transactions like Transaction ID Items T1 Bread, milk, coffee, juice T2 Juice, milk, coffee T3 Bread, juice T4 Coffee, milk T5 Bread, Milk T6 Coffee, Bread T7 Coffee, Bread, Juice T8 Bread, Milk, Juice T9 Milk, Bread, Coffee, T10 Bread T11 Milk T12 Milk, Coffee, Bread, Juice i want the occurrence of every unique item like Item Name Count Bread 9 Milk 8 Coffee 7 Juice 6 and from that i want an a fp-tree now by traversing this tree i want the maximal frequent itemsets as follows The basic idea of method is to dispose nodes in each “layer” from bottom to up. The concept of “layer” is different to the common concept of layer in a tree. Nodes in a “layer” mean the nodes correspond to the same item and be in a linked list from the “Head Table”. For nodes in a “layer” NBN method will be used to dispose the nodes from left to right along the linked list. To use NBN method, two extra fields will be added to each node in the ordered FP-Tree. The field tag of node N stores the information of whether N is maximal frequent itemset, and the field count’ stores the support count information in the nodes at left. In Figure, the first node to be disposed is “juice: 2”. If the min_sup is equal to or less than 2 then “bread, milk, coffee, juice” is a maximal frequent itemset. Firstly output juice:2 and set the field tag of “coffee:3” as “false” (the field tag of each node is “true” initially ). Next check whether the right four itemsets juice:1 be the subset of juice:2. If the itemset one node “juice:1” corresponding to is the subset of juice:2 set the field tag of the node “false”. In the following process when the field tag of the disposed node is FALSE we can omit the node after the same tagging. If the min_sup is more than 2 then check whether the right four juice:1 is the subset of juice:2. 
If the itemset corresponding to a node “juice:1” is a subset of juice:2, then set the field count’ of that node to the sum of its former count’ and 2. After all the “juice” nodes have been disposed, begin to dispose the node “coffee:3”. Any suggestions or available source code are welcome. Thanks in advance.
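The counting step and the definition of a maximal frequent itemset can be sketched as follows. This is not the FP-tree/NBN traversal described above; it is a brute-force check that is only practical for a handful of distinct items, and the function names (`item_counts`, `support`, `maximal_frequent_itemsets`) are made up for illustration. It may still be useful as a reference to validate an FP-tree implementation against.

```python
from collections import Counter
from itertools import combinations

# Transactions copied from the question (item names lower-cased).
TRANSACTIONS = [
    {"bread", "milk", "coffee", "juice"},  # T1
    {"juice", "milk", "coffee"},           # T2
    {"bread", "juice"},                    # T3
    {"coffee", "milk"},                    # T4
    {"bread", "milk"},                     # T5
    {"coffee", "bread"},                   # T6
    {"coffee", "bread", "juice"},          # T7
    {"bread", "milk", "juice"},            # T8
    {"milk", "bread", "coffee"},           # T9
    {"bread"},                             # T10
    {"milk"},                              # T11
    {"milk", "coffee", "bread", "juice"},  # T12
]


def item_counts(transactions):
    """Count how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(transaction)
    return counts


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in transactions if itemset <= transaction)


def maximal_frequent_itemsets(transactions, min_sup):
    """Brute force (not FP-tree): walk candidates from largest to smallest
    and keep a frequent itemset only if no kept itemset is a superset."""
    items = sorted(set().union(*transactions))
    maximal = []
    for size in range(len(items), 0, -1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if any(candidate < kept for kept in maximal):
                continue  # covered by a larger frequent itemset
            if support(candidate, transactions) >= min_sup:
                maximal.append(candidate)
    return maximal


counts = item_counts(TRANSACTIONS)
maximal_at_2 = maximal_frequent_itemsets(TRANSACTIONS, min_sup=2)
```

With `min_sup=2` the only maximal frequent itemset is the full set of four items, which agrees with the description in the question.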

Suggestions for designing large-scale Java webapp from the ground up

- by Chris Thompson

Hi all, I'm about to start developing a large-scale system and I'm struggling with which direction to proceed. I've done plenty of Java web apps before and I have plenty of experience with servlet containers and GWT and some experience with Spring. The problem is most of my webapps have been thrown together just to be a proof of concept and what I'm struggling with is what set of frameworks to use. I need to have both a browser based application as well as a web service designed to support access from mobile devices (Android and iPhone for now). Ideally, I'd like to design this system in such a way that I don't end up rewriting all of my servlets for each client (browser and phone) although I don't mind having some small checks in there to properly format the data. In addition, although I'm the only developer now, that won't necessarily be the case down the road and I'd like to design something that scales well both with regards to traffic and number of developers (isn't just a nightmare to maintain). So where I am now is planning on using GWT to design the browser-based interface but I'm struggling with how to reuse that code with to present the interface (most likely xml) for the mobile devices. Using GWT RPC would, I think, make it relatively easy to do all of the AJAX in the browser, but might make generating xml for the mobile phones difficult. In addition, I like the idea of using something like Hibernate for persistence and Spring Security to secure the whole thing. Again, I'm not sure how well those will cooperate with GWT (I think Hibernate should be fine...) There's obviously a lot more to this than I've presented here, but I've tried to give you the 5-minute overview. I'm a bit stumped and was wondering if anybody in the community had any experience starting from this place. Does what I'm trying to do make sense? Is it realistic? 
I have no doubt I can make all of these frameworks speak the same language, I'm just wondering if it's worth my time to fight with them. Also, am I missing a framework that would be really beneficial? Thanks in advance and sorry for the relatively broad question... Chris

Performance issue when querying a large xml file through php/ajax on Apache Server

- by Niall

Hey, I have a simple "live search" (results displayed while typing) web site. This make up is Ajax to PHP querying a pretty large XML document (10,000+ lines). This is all been hosted on a local Apache server (xamp). The scale of the xml document seems to be causing huge performance issue with results taking 10ish seconds to give the results. I'm very new to PHP (this actually being my first play about) so there below is a snippet of code in case there is something obvious for($i=0; $i<($foodListXML->length); $i++){ $type=$foodListXML->item($i)->getElementsByTagName('type'); $foodnote=$foodListXML->item($i)->getElementsByTagName('foodnote'); $style=$foodListXML->item($i)->getElementsByTagName('style'); if ($type->item(0)->nodeType==1) { //find a link matching the search text if (stristr($type->item(0)->childNodes->item(0)->nodeValue,$q)){ $currentFoodName = $type->item(0)->childNodes->item(0)->nodeValue; $currentFoodStyle = $style->item(0)->childNodes->item(0)->nodeValue; $currentFoodNote = $foodnote->item(0)->childNodes->item(0)->nodeValue; if ($hint==""){ $hint= $currentFoodName . " , " . $currentFoodNote . " , " . $currentFoodStyle. "" . " " ; } else{ $hint=$hint . $currentFoodName . " , " . $currentFoodNote . " , " . $currentFoodStyle. "" . " " ; } } } } } Also if having the data in a DB and accessing that is faster, then I'm open to that.. All ideas really!! Thanks.
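One idea that might help regardless of language is to pull the three fields out of each record once and then run a plain case-insensitive substring test, rather than making repeated DOM lookups inside the loop. The sketch below shows that idea in Python. The `<food>` wrapper element and the sample data are assumptions (the question only shows the `type`, `foodnote` and `style` tags), and `load_index` and `search` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the <food> wrapper element is an assumption.
SAMPLE_XML = """<foods>
  <food><type>Apple pie</type><foodnote>dessert</foodnote><style>baked</style></food>
  <food><type>Pineapple</type><foodnote>fruit</foodnote><style>raw</style></food>
  <food><type>Bread</type><foodnote>staple</foodnote><style>baked</style></food>
</foods>"""


def load_index(xml_text):
    """Parse the document once and keep only the fields the search needs."""
    root = ET.fromstring(xml_text)
    index = []
    for food in root.iter("food"):
        name = food.findtext("type", default="")
        note = food.findtext("foodnote", default="")
        style = food.findtext("style", default="")
        # Store a lower-cased copy so each query is a cheap substring test.
        index.append((name.lower(), name, note, style))
    return index


def search(index, query):
    """Return 'name , note , style' strings for names containing the query."""
    q = query.lower()
    return [
        f"{name} , {note} , {style}"
        for lowered, name, note, style in index
        if q in lowered
    ]


index = load_index(SAMPLE_XML)
hits = search(index, "APPLE")
```

Whether this alone explains a 10-second delay is not something the sketch can show; moving the data into a database table with an index on the searched column, as the question suggests, is the other obvious route.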

WPF performance for large number of elements on the screen

- by Mark

Im currently trying to create a Scene in WPF where I have around 250 controls on my screen and the user can Pan and Zoom in and out of these controls using the mouse. I have run the WPF Performance Suite tools on the application when there are a large number of these controls on the screen (i.e. when the user has zoomed right out) the FPS drops down to around 15 which is not very good. Here is the basic outline of the XAML: <Window> <Window.Resources> <ControlTemplate x:Key="LandTemplate" TargetType="{x:Type local:LandControl}"> <Canvas> <Path Fill="White" Stretch="Fill" Stroke="Black" StrokeThickness="1" Width="55.5" Height="74.687" Data="M0.5,0.5 L55,0.5 L55,74.187 L0.5,74.187 z"/> <Canvas x:Name="DetailLevelCanvas" Width="24.5" Height="21" Canvas.Left="15.306" Canvas.Top="23.972"> <TextBlock Width="21" Height="14" Text="712" TextWrapping="Wrap" Foreground="Black"/> <TextBlock Width="17.5" Height="7" Canvas.Left="7" Canvas.Top="14" Text="614m2" TextWrapping="Wrap" FontSize="5.333" Foreground="Black"/> </Canvas> </Canvas> </ControlTemplate> </Window.Resources> ... <local:LandControl Width="55.5" Height="74.552" Canvas.Top="xxx" Template=" {StaticResource LandTemplate}" RenderTransformOrigin="0.5,0.5" Canvas.Left="xxx"> <local:LandControl Width="55.5" Height="74.552" Canvas.Top="xxx" Template=" {StaticResource LandTemplate}" RenderTransformOrigin="0.5,0.5" Canvas.Left="xxx"> <local:LandControl Width="55.5" Height="74.552" Canvas.Top="xxx" Template=" {StaticResource LandTemplate}" RenderTransformOrigin="0.5,0.5" Canvas.Left="xxx"> <local:LandControl Width="55.5" Height="74.552" Canvas.Top="xxx" Template=" {StaticResource LandTemplate}" RenderTransformOrigin="0.5,0.5" Canvas.Left="xxx"> ... and so on... </Window> Ive tried to minimise the details in the control template and I even did a massive find and replace of the controls to just put their raw elements inline instead of using a template, but with no noticeable performance improvements. 
I have seen other SO questions about this, and people say to do custom drawing, but I don't really see how that makes sense when you have to zoom and pan like I do. If anyone can help out here, that would be great! Mark

Importing a large delimited file to a MySQL table

- by Tom

I have this large (and oddly formatted txt file) from the USDA's website. It is the NUT_DATA.txt file. But the problem is that it is almost 27mb! I was successful in importing the a few other smaller files, but my method was using file_get_contents which it makes sense why an error would be thrown if I try to snag 27+ mb of RAM. So how can I import this massive file to my MySQL DB without running into a timeout and RAM issue? I've tried just getting one line at a time from the file, but this ran into timeout issue. Using PHP 5.2.0. Here is the old script (the fields in the DB are just numbers because I could not figure out what number represented what nutrient, I found this data very poorly document. Sorry about the ugliness of the code): <? $file = "NUT_DATA.txt"; $data = split("\n", file_get_contents($file)); // split each line $link = mysql_connect("localhost", "username", "password"); mysql_select_db("database", $link); for($i = 0, $e = sizeof($data); $i < $e; $i++) { $sql = "INSERT INTO `USDA` (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17) VALUES("; $row = split("\^", trim($data[$i])); // split each line by carrot for ($j = 0, $k = sizeof($row); $j < $k; $j++) { $val = trim($row[$j], '~'); $val = (empty($val)) ? 0 : $val; $sql .= ((empty($val)) ? 0 : $val) . ','; // this gets rid of those tildas and replaces empty strings with 0s } $sql = rtrim($sql, ',') . ");"; mysql_query($sql) or die(mysql_error()); // query the db } echo "Finished inserting data into database.\n"; mysql_close($link); ?>
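The memory problem comes from loading the whole file before splitting it. A minimal sketch of the alternative, reading one line at a time and grouping rows into batches so each batch can go into a single multi-row INSERT, is shown below in Python. The sample lines are invented, `parse_line` and `read_batches` are hypothetical names, and the database call is deliberately left out. It may also be worth checking whether MySQL's LOAD DATA INFILE (with `^` as the field terminator and `~` as the enclosing character) can read the file directly.

```python
import io
from itertools import islice


def parse_line(line):
    """Split one caret-delimited record and strip the ~ text qualifiers.
    Empty fields become "0", mirroring the original script."""
    fields = line.rstrip("\r\n").split("^")
    return [field.strip("~") or "0" for field in fields]


def read_batches(lines, batch_size=1000):
    """Yield lists of parsed rows, at most batch_size rows per list.
    `lines` can be an open file object, so only one batch is in memory."""
    rows = (parse_line(line) for line in lines if line.strip())
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Invented sample standing in for NUT_DATA.txt.
sample = io.StringIO("~01001~^~203~^0.85^^\n~01001~^~204~^81.11^~1~^\n~01002~^~203~^^^\n")
batches = list(read_batches(sample, batch_size=2))
```

In a real import, `sample` would be replaced by the opened data file and each batch would be sent to the database as one parameterised multi-row insert.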

How to maximise the largest contiguous block of memory in the Large Object Heap

- by Unsliced

The situation is that I am making a WCF call to a remote server which is returns an XML document as a string. Most of the time this return value is a few K, sometimes a few dozen K, very occasionally a few hundred K, but very rarely it could be several megabytes (first problem is that there is no way for me to know). It's these rare occasions that are causing grief. I get a stack trace that starts: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown. at System.Xml.BufferBuilder.AddBuffer() at System.Xml.BufferBuilder.AppendHelper(Char* pSource, Int32 count) at System.Xml.BufferBuilder.Append(Char[] value, Int32 start, Int32 count) at System.Xml.XmlTextReaderImpl.ParseText() at System.Xml.XmlTextReaderImpl.ParseElementContent() at System.Xml.XmlTextReaderImpl.Read() at System.Xml.XmlTextReader.Read() at System.Xml.XmlReader.ReadElementString() at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderMDRQuery.Read2_getMarketDataResponse() at Microsoft.Xml.Serialization.GeneratedAssembly.ArrayOfObjectSerializer2.Deserialize(XmlSerializationReader reader) at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events) at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle) at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall) at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters) I've read around and it is because the Large Object Heap is just getting too fragmented, so even preceding the call with a quick check to StringBuilder.EnsureCapacity just causes the OutOfMemoryException to be thrown earlier (and because I'm guessing at what's needed, it might not actually need that much so my check is causing more problems than it is solving). 
Some opinions are that there's not much I can do about it. Some of the questions I've asked myself: Use less memory - have you checked for leaks? Yes. The memory usage goes up and down, but there's no fundamental growth that guarantees this to happen. Some of the times it fails, it succeeded at that stage previously. Transfer smaller amounts Not an option, this is a third party web service over which I have no control (or at least it would take a long time to resolve, in the meantime I still have a problem) Can you do something to the LOH to make it less likely to fail? ... now this is most fruitful course. It's a 32-bit process (it has to be for various political, technical and boring reasons) but there's normally hundreds of meg free (multiples of the largest amount for which we've seen failures). Can we monitor the LOH? Using perfmon I can track the size of the heaps, but I don't think there's a way to monitor the largest available contiguous block of memory. Question is: any advice or suggestions for things to try?

Large number of UPDATE queries slowing down page

- by Bryan Lewis

I am reading and validating large fixed-width text files (range from 10-50K lines) that are submitted via our ASP.net website (coded in VB.Net). I do an initial scan of the file to check for basic issues (line length, etc). Then I import each row into a MS SQL table. Each DB rows basically consists of a record_ID (Primary, auto-incrementing) and about 50 varchar fields. After the insert is done, I run a validation function on the file that checks each field in each row based on a bunch of criteria (trimmed length, isnumeric, range checks, etc). If it finds an error in any field, it inserts a record into the Errors table, which has an error_ID, the record_ID and an error message. In addition, if the field fails in a particular way, I have to do a "reset" on that field. A reset might consist of blanking the entire field, or simply replacing the value with another value (e.g. replacing the string with a new one that has all illegals chars taken out). I have a 5,000 line test file. The upload, initial check, and import takes about 5-6 seconds. The detailed error check and insert into the Errors table takes about 5-8 seconds (this file has about 1200 errors in it). However, the "resets" part takes about 40-45 seconds for 750 fields that need to be reset. When I comment out the resets function (returning immediately without actually calling the UPDATE stored proc), the process is very fast. With the resets turned on, the pages take 50 seconds to return. My UPDATE stored proc is using some recommended code from http://sommarskog.se/dynamic_sql.html, whereby it uses CASE instead of dynamic SQL: UPDATE dbo.Records SET dbo.Records.file_ID = CASE @field_name WHEN 'file_ID' THEN @field_value ELSE file_ID END, . . (all 50 varchar field CASE statements here) . WHERE dbo.Records.record_ID = @record_ID Is there any way I can help my performance here. Can I somehow group all of these UPDATE calls into a single transaction? Should I be reworking the UPDATE query somehow? 
Or is it just sheer quantity of 750+ UPDATEs and things are just slow (it's a quad proc server with 8GB ram). Any suggestions appreciated.
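The "single transaction" idea can be sketched as follows. This uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for MS SQL, so it shows the shape of the approach rather than its effect on the server in the question. The column names `field_a` and `field_b` and the function `apply_resets` are hypothetical; only `Records`, `record_ID` and `file_ID` come from the question.

```python
import sqlite3
from collections import defaultdict

# Hypothetical whitelist; the real table has about 50 varchar columns.
ALLOWED_FIELDS = {"file_ID", "field_a", "field_b"}


def apply_resets(conn, resets):
    """Apply all resets in one transaction.

    resets: iterable of (record_id, field_name, new_value) tuples.
    Resets are grouped by column so each column needs one simple
    parameterised UPDATE, executed once per affected row.
    """
    by_field = defaultdict(list)
    for record_id, field_name, new_value in resets:
        if field_name not in ALLOWED_FIELDS:
            raise ValueError(f"unexpected column: {field_name}")
        by_field[field_name].append((new_value, record_id))

    with conn:  # commits once on success, rolls back on error
        for field_name, params in by_field.items():
            # The column name is interpolated only after the whitelist check.
            conn.executemany(
                f"UPDATE Records SET {field_name} = ? WHERE record_ID = ?",
                params,
            )
```

Two things change compared with the stored procedure in the question: all updates share one commit, and each statement touches only the column being reset instead of rewriting all 50 columns through CASE expressions. Which of the two matters more on the actual server would need to be measured.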

How to remove the explicit dependencies to other projects' libraries in Eclipse launch configuration

- by euluis

In Eclipse it is possible to create launch configurations in a project, specifying the runtime dependencies from another project. A problem I found was that if you have a multiple project workspace, being possible that each project has its own libraries, it is easy to add explicit dependencies in a secondary project to libraries that are of another project and therefore subject to change. An example of this problem follows: proj1 +-- src +-- lib +-- jar1-v1.0.jar +-- jar2-v1.0.jar proj2 +-- src +-- proj2-tests.launch I don't have a dependency from the code in proj2/src to the libraries in proj1/lib. Nevertheless, I do have a dependency from proj2/src to proj1/src, although since there is an internal dependency in the code in proj1/src to its libraries jar1-v1.0.jar and jar2.v1.0.jar, I have to add a dependency in proj2-tests.launch to the libraries in proj1/lib. This translates to the following ugly lines in proj2-tests.launch: <listEntry value="<?xml version="1.0" encoding="UTF-8" standalone="no"?> <runtimeClasspathEntry path="3" projectName="proj1" type="1"/> "/> <listEntry value="<?xml version="1.0" encoding="UTF-8" standalone="no"?> <runtimeClasspathEntry internalArchive="/proj1/lib/jar1-v1.0.jar" path="3" type="2"/> "/> <listEntry value="<?xml version="1.0" encoding="UTF-8" standalone="no"?> <runtimeClasspathEntry internalArchive="/proj1/lib/jar2-v1.0.jar" path="3" type="2"/> "/> This wouldn't be a big problem if there wasn't the need from time to time to evolve the software, upgrade the libraries and etc. Consider the common need to upgrade the libraries jar1-v1.0.jar and jar2-v1.0.jar to their versions v1.1. Consider that you have about 10 projects in one workspace, having about 5 libraries each and about 4 launch configurations. You get a maintenance overhead in doing a simple upgrade of a library, which normally must imply changes in files for which there wasn't the need for. Or maybe I'm doing something wrong... 
What I would like to state is simply that proj2 depends on proj1 and on its libraries, and to have just that dependency recorded in the *.launch files. Is that possible?

Best practices for displaying large number of images as thumbnails in c#

- by andySF

I got to a point where it's very difficult to get answers by debugging and tracing objects, so I need some help.

What I'm trying to do: A history form for my screen capture pet project. The history must list all images as thumbnails (e.g. like Picasa).

What I've done: I created a HistoryItem:UserControl. This history item has a few buttons, a check box, a label and a picture box. The buttons are for delete/edit/copy image. The check box is used for selecting one or more images and the label is for some info text. The picture box gets its image from a public property that holds a path, and a method creates a proportional thumbnail to display once the control has loaded. This user control has two public events: one for deleting the image and one for bubbling the mouse enter and mouse leave events through all controls. For this I use EventBroadcastProvider. The bubbling is useful because wherever I move the mouse over the control, the buttons appear. The Dispose method has been extended and I manually remove the events there. All images are loaded by looping over an XML file that contains the paths of all images. For each image in this XML I create a new HistoryItem that is added (after a little coding to sort and limit the number of images loaded) to a flow layout panel.

The problem: When I launch the history form and the flow layout panel is populated with my HistoryItem custom controls, my memory usage increases drastically, from 14 MB to around 100 MB with 100 images loaded. After closing the history form, disposing whatever I could dispose and even trying to call GC.Collect(), the memory increase remains. I searched for any object that might not be disposed properly, like an image or event, but wherever I use them they are disposed. The problem seems to come from multiple sources. One is that the events for bubbling are not being disposed properly, and the other is the picture box itself. I could see all of this by commenting the code down to a limited version where only the custom control is loaded, without any image processing or even events. Without the events the memory consumption is reduced by approximately 20%.

So my real question is whether this design (flow layout panels and custom controls with picture boxes) is the best solution for displaying large numbers of images as thumbnails. Thank you!

Large memory chunk not garbage collected

- by Niels

In a hunt for a memory-leak in my app I chased down a behaviour I can't understand. I allocate a large memory block, but it doesn't get garbage-collected resulting in a OOM, unless I explicit null the reference in onDestroy. In this example I have two almost identical activities that switch between each others. Both have a single button. On pressing the button MainActivity starts OOMActivity and OOMActivity returns by calling finish(). After pressing the buttons a few times, Android throws a OOMException. If i add the the onDestroy to OOMActivity and explicit null the reference to the memory chunk, I can see in the log that the memory is correctly freed. Why doesn't the memory get freed automatically without the nulling? MainActivity: package com.example.oom; import android.app.Activity; import android.content.Intent; import android.os.Bundle; import android.view.View; import android.view.View.OnClickListener; import android.widget.Button; public class MainActivity extends Activity implements OnClickListener { private int buttonId; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); System.gc(); Button OOMButton = new Button(this); OOMButton.setText("OOM"); buttonId = OOMButton.getId(); setContentView(OOMButton); OOMButton.setOnClickListener(this); } @Override public void onClick(View v) { if (v.getId() == buttonId) { Intent leakIntent = new Intent(this, OOMActivity.class); startActivity(leakIntent); } } } OOMActivity: public class OOMActivity extends Activity implements OnClickListener { private static final int WASTE_SIZE = 20000000; private byte[] waste; private int buttonId; protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); Button BackButton = new Button(this); BackButton.setText("Back"); buttonId = BackButton.getId(); setContentView(BackButton); BackButton.setOnClickListener(this); waste = new byte[WASTE_SIZE]; } public void onClick(View view) { if (view.getId() == buttonId) 
{ finish(); } } }

Is it possible to download a large database using mysql query

- by Rose

i am downloading files from server using WinSCP.Is it possible to write a query to download a large database using mysql query? Or using any other method i have tried with this code but i am not able to get the whole database structure <?php if(file_exists('backup_sql/my_backup.zip')) { unlink('backup_sql/my_backup.zip'); } $tables='*'; $host='MY HOST NAME'; $user='MY_USERNAME'; $pass='MYPASSWORD'; $name='MY_DB_NAME'; $link = mysql_connect($host,$user,$pass); mysql_select_db($name,$link); //get all of the tables if($tables == '*') { $tables = array(); $result = mysql_query('SHOW TABLES'); while($row = mysql_fetch_row($result)) { $tables[] = $row[0]; } } else { $tables = is_array($tables) ? $tables : explode(',',$tables); } $return=''; //cycle through foreach($tables as $table) { $result = mysql_query('SELECT * FROM '.$table); $num_fields = mysql_num_fields($result); //$return.= 'DROP TABLE '.$table.';'; $row2 = mysql_fetch_row(mysql_query('SHOW CREATE TABLE '.$table)); $return.= "\n\n".$row2[1].";\n\n"; for ($i = 0; $i < $num_fields; $i++) { while($row = mysql_fetch_row($result)) { $return.= 'INSERT INTO '.$table.' VALUES('; for($j=0; $j<$num_fields; $j++) { $row[$j] = addslashes($row[$j]); //$row[$j] = ereg_replace("\n","\\n",$row[$j]); if (isset($row[$j])) { $return.= '"'.$row[$j].'"' ; } else { $return.= '""'; } if ($j<($num_fields-1)) { $return.= ','; } } $return.= ");\n"; } } $return.="\n\n\n"; } $rand_var=time(); $files_to_zip = array( "'backup_sql/db-backup-'.$rand_var.'.sql'", ); $name = 'db-backup-'.$rand_var.'.sql'; $data = $return; ?> any one please help me... thank you

Fast, Unicode-capable, cross-platform programmer's text editor that shows invisibles like ZWSP?

- by Roger_S

Our publishing workflow includes Windows and Linux machines (there are some Macs too, but not in the critical-path workflow). Many texts include both English and Khmer and are marked-up in XML. XML Copy Editor is the best cross-platform open-source XML editor I've discovered. It utilizes the Scintilla editing component, which is generally good with Unicode but which does not enable non-printing or invisible characters like U+200B (zero-width space) and U+200C (zero-width non-joiner) to be displayed. Khmer does not separate words with a space character as Western languages do, so ZWSP is used in electronic texts to enable applications to break lines easily. Ideally I'd edit the markup and the content in a single editor, but XML awareness is less important at times than being able to display invisibles. (OpenOffice.org Writer and Microsoft Word are the only two apps I know that will display ZWSP. They are not suitable for the markup and text manipulations that need to be done to prepare manuscripts for publication, unfortunately, although I guess they're fine for authoring.) I tried out a promising editor last week, but a search-and-replace regex operation that took under a second in TextPad 4.7.3 lasted over twenty seconds. So I want to mention that speed and the ability to handle large (up to 150mb) files is also a concern. Is there a good, fast, free or not too expensive text editor, with versions on Windows and Linux and maybe mac too, Unicode-aware and capable of displaying invisibles like ZWSP? That has syntax highlighting, can handle large files and is customizable enough that I won't tear my hair out in frustration? Thanks, Roger_S

Reading and writing in parallel

- by Malfist

I want to be able to read and write a large file in parallel, or if not in parallel, at least in blocks so that I don't use up so much memory. This is my current code: // Define memory stream which will be used to hold encrypted data. MemoryStream memoryStream = new MemoryStream(); // Define cryptographic stream (always use Write mode for encryption). CryptoStream cryptoStream = new CryptoStream(memoryStream, encryptor, CryptoStreamMode.Write); //start encrypting using (BinaryReader reader = new BinaryReader(File.Open(fileIn, FileMode.Open))) { byte[] buffer = new byte[1024 * 1024]; int read = 0; do { read = reader.Read(buffer, 0, buffer.Length); cryptoStream.Write(buffer, 0, read); } while (read == buffer.Length); } // Finish encrypting. cryptoStream.FlushFinalBlock(); // Convert our encrypted data from a memory stream into a byte array. //byte[] cipherTextBytes = memoryStream.ToArray(); //write our memory stream to a file memoryStream.Position = 0; using (BinaryWriter writer = new BinaryWriter(File.Open(fileOut, FileMode.Create))) { byte[] buffer = new byte[1024 * 1024]; int read = 0; do { read = memoryStream.Read(buffer, 0, buffer.Length); writer.Write(buffer, 0, read); } while (read == buffer.Length); } // Close both streams. memoryStream.Close(); cryptoStream.Close(); As you can see, it reads the entire file into memory, encrypts it, then writes it out. If I happen to be encrypting files that are very large (2GB+) it tends not to work, or at the very least, consumes ~97% of my memory. How could I do it in a more effective manner?
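The block-wise approach asked about can be sketched like this: read a chunk, transform it, and write it straight to the output, so that only one chunk is in memory at a time and no intermediate in-memory stream is needed. The sketch is in Python, and `placeholder_transform` is a stand-in that is explicitly not encryption; `transform_stream` is a hypothetical name. In the C# code from the question, the analogous change would presumably be to wrap the output file stream in the CryptoStream instead of a MemoryStream. The loop below also runs until a read returns nothing, rather than stopping at the first short read.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the same buffer size as in the question


def transform_stream(src, dst, transform, chunk_size=CHUNK_SIZE):
    """Copy src to dst chunk by chunk, passing each chunk through transform.
    Only one chunk is held in memory at a time. Returns bytes read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(transform(chunk))
        total += len(chunk)
    return total


def placeholder_transform(chunk):
    """Stand-in for a cipher's update step. NOT encryption: it only flips bits."""
    return bytes(b ^ 0xFF for b in chunk)


# In-memory streams stand in for the input and output files.
source = io.BytesIO(b"abc" * 1000)
sink = io.BytesIO()
bytes_read = transform_stream(source, sink, placeholder_transform, chunk_size=256)
```

With real files, `source` and `sink` would be opened with `open(path, "rb")` and `open(path, "wb")`, and the transform would be replaced by a real streaming cipher from a cryptography library.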

24TB RAID 6 configuration

- by Phil

I am in charge of a new website in a niche industry that stores lots of data (10+ TB per client, growing to 2 or 3 clients soon). We are considering ordering about $5000 worth of 3TB drives (10 in a RAID 6 configuration and 10 for backup), which will give us approximately 24 TB of production storage. The data will be written once and remain unmodified for the lifetime of the website, so we only need to do a backup one time. I understand basic RAID theory, however I am not experienced with it. My question is, does this sound like a good configuration? What potential problems could this setup cause? Also, what is the best way to do a one-time backup? Have two RAID 6 arrays, one for offsite backup and one for production? Or should I backup the RAID 6 production array to a JBOD? EDIT: The data server is running Windows 2008 Server x64. EDIT 2: To reduce rebuild time, what would you think about using two RAID 5's instead of one RAID 6?
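The capacity figures in the question can be checked with a little arithmetic. The sketch below assumes the usual rule that RAID 5 gives up one drive's worth of space to parity and RAID 6 gives up two; `usable_tb` is a hypothetical helper, and the numbers are raw decimal terabytes, so formatted capacity will come out lower.

```python
def usable_tb(drives, drive_tb, parity_drives):
    """Raw usable capacity of a single parity array, in TB."""
    return (drives - parity_drives) * drive_tb


# One RAID 6 array of ten 3 TB drives (two drives' worth of parity).
raid6_tb = usable_tb(drives=10, drive_tb=3, parity_drives=2)

# Two RAID 5 arrays of five 3 TB drives each (one drive's worth of parity per array).
two_raid5_tb = 2 * usable_tb(drives=5, drive_tb=3, parity_drives=1)

print(f"RAID 6:       {raid6_tb} TB usable, survives any 2 drive failures")
print(f"2 x RAID 5:   {two_raid5_tb} TB usable, survives 1 failure per array")
```

Both layouts give the same raw capacity, so the choice comes down to fault tolerance and rebuild behaviour: the single RAID 6 array survives any two simultaneous drive failures, while the two RAID 5 arrays each survive only one.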

rm on a directory with millions of files

- by BMDan

Background: physical server, about two years old, 7200-RPM SATA drives connected to a 3Ware RAID card, ext3 FS mounted noatime and data=ordered, not under crazy load, kernel 2.6.18-92.1.22.el5, uptime 545 days. Directory doesn't contain any subdirectories, just millions of small (~100 byte) files, with some larger (a few KB) ones.

We have a server that has gone a bit cuckoo over the course of the last few months, but we only noticed it the other day when it started being unable to write to a directory due to it containing too many files. Specifically, it started throwing this error in /var/log/messages:

ext3_dx_add_entry: Directory index full!

The disk in question has plenty of inodes remaining:

Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda3 60719104 3465660 57253444 6% /

So I'm guessing that means we hit the limit of how many entries can be in the directory file itself. No idea how many files that would be, but it can't be more, as you can see, than three million or so. Not that that's good, mind you! But that's part one of my question: exactly what is that upper limit? Is it tunable? Before I get yelled at: I want to tune it down; this enormous directory caused all sorts of issues. Anyway, we tracked down the issue in the code that was generating all of those files, and we've corrected it. Now I'm stuck with deleting the directory.

A few options here:

1. rm -rf (dir): I tried this first. I gave up and killed it after it had run for a day and a half without any discernible impact.

2. unlink(2) on the directory: Definitely worth consideration, but the question is whether it'd be faster to delete the files inside the directory via fsck than to delete via unlink(2). That is, one way or another, I've got to mark those inodes as unused. This assumes, of course, that I can tell fsck not to drop entries to the files in /lost+found; otherwise, I've just moved my problem. In addition to all the other concerns, after reading about this a bit more, it turns out I'd probably have to call some internal FS functions, as none of the unlink(2) variants I can find would allow me to just blithely delete a directory with entries in it. Pooh.

3. while [ true ]; do ls -Uf | head -n 10000 | xargs rm -f 2>/dev/null; done

This is actually the shortened version; the real one I'm running, which just adds some progress-reporting and a clean stop when we run out of files to delete, is:

export i=0; time ( while [ true ]; do ls -Uf | head -n 3 | grep -qF '.png' || break; ls -Uf | head -n 10000 | xargs rm -f 2>/dev/null; export i=$(($i+10000)); echo "$i..."; done )

This seems to be working rather well. As I write this, it's deleted 260,000 files in the past thirty minutes or so.

Now, for the questions:

1. As mentioned above, is the per-directory entry limit tunable?

2. Why did it take "real 7m9.561s / user 0m0.001s / sys 0m0.001s" to delete a single file which was the first one in the list returned by "ls -U", and it took perhaps ten minutes to delete the first 10,000 entries with the command in #3, but now it's hauling along quite happily? For that matter, it deleted 260,000 in about thirty minutes, but it's now taken another fifteen minutes to delete 60,000 more. Why the huge swings in speed?

3. Is there a better way to do this sort of thing? Not store millions of files in a directory; I know that's silly, and it wouldn't have happened on my watch. Googling the problem and looking through SF and SO offers a lot of variations on "find" that obviously have the wrong idea; it's not going to be faster than my approach for several self-evident reasons. But does the delete-via-fsck idea have any legs? Or something else entirely? I'm eager to hear out-of-the-box (or inside-the-not-well-known-box) thinking.

Thanks for reading the small novel; feel free to ask questions and I'll be sure to respond.
I'll also update the question with the final number of files and how long the delete script ran once I have that. Final script output!: 2970000... 2980000... 2990000... 3000000... 3010000... real 253m59.331s user 0m6.061s sys 5m4.019s So, three million files deleted in a bit over four hours.
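
For comparison, the same batch-and-report idea can be sketched in Python. This is not the script the poster ran; it is a minimal illustration that assumes Python 3.6+ is available on the box, and the function name `delete_files_in_batches` and its parameters are made up for this example. Like `ls -Uf | head`, `os.scandir` yields entries in directory order without sorting them, so each pass only has to collect the first batch.

```python
import os


def delete_files_in_batches(directory, batch_size=10000, report=print):
    """Unlink regular files in `directory` one batch at a time.

    Hypothetical helper: collects up to `batch_size` unsorted entries per
    pass, deletes them, and reports a running total after each batch.
    Returns the number of files actually deleted.
    """
    deleted = 0
    while True:
        batch = []
        # Stop reading the directory as soon as the batch is full,
        # mirroring `ls -Uf | head -n 10000`.
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    batch.append(entry.path)
                    if len(batch) >= batch_size:
                        break
        if not batch:
            return deleted  # nothing left to delete: clean stop
        for path in batch:
            try:
                os.unlink(path)
                deleted += 1
            except FileNotFoundError:
                pass  # something else removed it first
        report(f"{deleted}...")
```

Calling `delete_files_in_batches("/path/to/dir")` would print progress in the same "10000... 20000..." style as the shell loop. Whether it would be any faster than the shell version on this filesystem is an open question; the time is likely dominated by the filesystem's unlink work rather than by the caller.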

Read the article
undelete big files - mission impossible?

- by johnrembo

Hi, I've accidentally deleted an outlook.pst file (6.7GB) while there was only 400MB of free space left on the primary NTFS partition (WinXP). I've tried several recovery tools to get this file back. "Ontrack Easy Recovery Pro" found 0 pst files (complete scan mode), while "Recover My Files" in sector scan mode found 5 pst files: 4 of them between 3 and 28 KB in size, and a 5th of 1GB. I managed to successfully recover the 1GB pst file, which was a one-year-old copy (the one used after the latest Windows reinstall). Now I'm frustrated and confused. Why was a one-year-old file successfully recovered if there were only 400MB left on the primary partition? Where has the 6.7GB file gone? I did some reading (i.e. here), and it seems that there's almost no chance of retrieving the file I'm looking for. But wait: none of the recovery tools I've used found a zero-sized pst file. Moreover, if the file is corrupted due to fragmentation, we could use scanpst.exe to fix some errors and survive with 10 or 100 emails missing, whatever. Could you please recommend some more sophisticated recovery tools for this particular task? I appreciate your help. Thanks in advance.

Read the article
Cannot copy files from external hard drive to desktop hard drive in Windows 7

- by Mohammad Reza Selim

I'm trying to copy some old files from one of my external hard drives to the hard drive of my desktop PC. Some files cannot be copied and give an error like 'Cannot read from source file or disk'. Those files are video files (.DAT, .VOB, .MPG), and I watched them all the way through with no issues, so the files aren't corrupted. I'm running Windows 7 with admin permissions. Could anyone let me know the reason and a solution?

Read the article
How to uncompress a 9GB file in Windows FAT32

- by Kashif

I have a 2GB RAR file that contains a 9GB video file. I'm using a FAT32 file system. I want to extract that file, but after 4GB I get an error due to the FAT32 file size limit. How can I extract that video? I know that one way is to convert my partition to NTFS, but I don't want to go that way. I've also tried 7-Zip, but that again gives an error after 4GB. Another way would be to split the file, but I don't know how to split a video file that is still compressed. Any ideas, please? How can I get around this problem?
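
To make the "split that file" idea concrete, here is a minimal Python sketch that copies any binary stream into numbered parts, each no larger than the FAT32 per-file maximum of 2^32 - 1 bytes. It assumes the extractor can write the archived file to a pipe or some other stream rather than straight to disk, which the question does not confirm; the function name `split_stream` and the `.001`, `.002` part naming are hypothetical. Whether a player can use the resulting parts directly is a separate matter not covered here.

```python
import os

# Largest file FAT32 can store: 4 GiB minus one byte.
FAT32_MAX_FILE = 2**32 - 1


def split_stream(src, dest_prefix, part_limit=FAT32_MAX_FILE, buf_size=1 << 20):
    """Copy binary stream `src` into files dest_prefix.001, .002, ...

    Hypothetical helper: each part holds at most `part_limit` bytes.
    Returns the list of part paths in order.
    """
    paths = []
    out = None
    room = 0  # bytes still allowed in the currently open part
    try:
        while True:
            if out is not None and room == 0:
                out.close()  # current part is full
                out = None
            want = min(buf_size, room) if out is not None else min(buf_size, part_limit)
            chunk = src.read(want)
            if not chunk:
                break  # end of input
            if out is None:
                path = f"{dest_prefix}.{len(paths) + 1:03d}"
                out = open(path, "wb")
                paths.append(path)
                room = part_limit
            out.write(chunk)
            room -= len(chunk)
    finally:
        if out is not None:
            out.close()
    return paths
```

With the default limit, a 9GB stream would end up in three parts. Reading the parts back in order and concatenating them reproduces the original bytes.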

Read the article
Advice on a 240,000 sqft outdoor wireless network

- by whlspacedude

I would be very appreciative of some advice on purchasing equipment to provide a wireless network that covers the entire area of an outdoor arena. The area is roughly rectangular, 400ft wide and 600ft long. It has 6 light towers, 1 on each of the 400-foot ends and 2 on each of the 600-foot ends. I can mount on anything and spend as much money as needed. The network needs to provide access for up to 15 wireless HD cameras with audio, plus a public Wi-Fi network. Can someone point me in the right direction as far as equipment and antennas? I can provide any additional information that you may need.

Read the article
4TB HGST SATA drive only shows 1.62 TB in Windows Server 2012

- by user136085

I'm using a Supermicro X9SRE-3F motherboard with the latest BIOS and 2x 4TB drives connected to the on-board SATA controller. If I set the BIOS to RAID and create a RAID 1 array, the array shows up in the BIOS as 3.6TB. However when I boot Windows (on a separate RAID 1 array), the 4TB drives show up individually in disk manager as 2x 1.62TB drives. I could use Windows 2012 to set up software RAID 1, but when I set the BIOS back to 2x individual drives, they still show up in Windows as 2x 1.62TB drives. How do I access the full capacity of these drives? Thanks, Brian Bulaw
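
One way to sanity-check the numbers in this question is some quick arithmetic. The sketch below is only a hypothesis, not a diagnosis: it assumes 512-byte sectors and a storage driver that keeps just the low 32 bits of the sector count, neither of which is stated in the post, and the function name is made up for illustration. Under those assumptions a nominal 4TB drive (4 x 10^12 bytes) is about 3.64 TiB, in line with the 3.6TB the BIOS shows, and wraps around to about 1.64 TiB, which is close to, but not exactly, the 1.62TB that Windows reports.

```python
TIB = 2**40          # bytes in one tebibyte
SECTOR_BYTES = 512   # assumed logical sector size


def visible_after_32bit_wrap(drive_bytes, sector_bytes=SECTOR_BYTES):
    """Capacity seen if the sector count is truncated to 32 bits (hypothetical)."""
    sectors = drive_bytes // sector_bytes
    return (sectors % 2**32) * sector_bytes


nominal = 4 * 10**12  # "4TB" as drive vendors count it
full_tib = round(nominal / TIB, 2)
wrapped_tib = round(visible_after_32bit_wrap(nominal) / TIB, 2)
print(full_tib)     # 3.64
print(wrapped_tib)  # 1.64
```

If the mismatch does come from this kind of truncation, the fact that it persists with the BIOS set to individual drives would point at the controller driver in Windows rather than the RAID setting, but that remains a guess.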

Read the article
Delete files from directory: memory exhausted

- by codeholic

This question is a logical continuation of http://serverfault.com/questions/45245/how-can-i-delete-all-files-from-a-directory-when-it-reports-argument-list-too-lo

I have

drwxr-xr-x 2 doreshkin doreshkin 198291456 Apr 6 21:35 session_data

I tried

find session_data -type f -delete
find session_data -type f | xargs rm -f
find session_data -maxdepth 1 -type f -print0 | xargs -r0 rm -f

The result is the same:

find: memory exhausted

What can I do to remove this directory?

Read the article
Does NTFS performance degrade significantly in volumes larger than five or six TB?

- by Josh Yeager

One of my customers is planning to set up a new document store, which will probably grow by 1-2TB per year. One of my co-workers says that Windows performance is extremely bad if it has a single NTFS volume that is bigger than five or six TB. He thinks that we need to set up their system with multiple volumes so that no single volume will exceed that limit. Is this a real problem? Does Windows or NTFS slow down when the volume size reaches several terabytes? Or is it possible to create a single volume of 10 or more TB?

Read the article
How to burn an 8.5GB ISO?

- by Vilx-

I have an ISO image of a DVD which is 8.5GB in size. I find this strange, because that is about 500MB more than a standard DL DVD can hold. I tried overburning with Nero, but it failed. Is it possible to somehow burn such an image? Are there some special DVD blanks that allow you to write more? Or was this ISO simply made by some tool without any regard for whether it can be burned or not?

Read the article
Easily Plotting Multiple Data Series in Excel

- by John

I really need help figuring out how to speed up graphing multiple series on a graph. I have separate devices that give monthly readings for several variables like pressure, temperature, and salinity. Each of these variables is going to be its own graph, with the devices being the series. My x-axis is going to be the dates that these values were taken. The problem is that it takes ages to do this for each spreadsheet, since I have monthly dates from 1950 up to the present and about 50 devices in each spreadsheet. I also have graphs for calculated values that are in columns next to them. Each of these devices is going to become a data series in the graph. E.g. in one of my graphs I have all the pressures from the devices, and each data series is named after its device. I want a fast way to do this. Doing this manually is taking a very long time. Please help! Is there any easier way to do this? The data is consistent and the dates all line up. I am just repeating the same clicks over and over again. Thank you!

Read the article
Software Engineering Practices – Different Projects should have different maturity levels

- by Dylan Smith

I’ve had a lot of discussions at the office lately about the drastically different sets of software engineering practices used on our various projects, if what we are doing is appropriate, and what factors should you be considering when determining what practices are most appropriate in a given context. I wanted to write up my thoughts in a little more detail on this subject, so here we go: If you compare any two software projects (specifically comparing their codebases) you’ll often see very different levels of maturity in the software engineering practices employed. By software engineering practices, I’m specifically referring to the quality of the code and the amount of technical debt present in the project. Things such as Test Driven Development, Domain Driven Design, Behavior Driven Development, proper adherence to the SOLID principles, etc. are all practices that you would expect at the mature end of the spectrum. At the other end of the spectrum would be the quick-and-dirty solutions that are done using something like an Access Database, Excel Spreadsheet, or maybe some quick “drag-and-drop coding”. For this blog post I’m going to refer to this as the Software Engineering Maturity Spectrum (SEMS). I believe there is a time and a place for projects at every part of that SEMS. The risks and costs associated with under-engineering solutions have been written about a million times over so I won’t bother going into them again here, but there are also (unnecessary) costs with over-engineering a solution. Sometimes putting multiple layers, and IoC containers, and abstracting out the persistence, etc is complete overkill if a one-time use Access database could solve the problem perfectly well. A lot of software developers I talk to seem to automatically jump to the very right-hand side of this SEMS in everything they do. 
A common rationalization I hear is that it may seem like a small trivial application today, but these things always grow and stick around for many years, then you’re stuck maintaining a big ball of mud. I think this is a cop-out. Sure you can’t always anticipate how an application will be used or grow over its lifetime (can you ever??), but that doesn’t mean you can’t manage it and evolve the underlying software architecture as necessary (even if that means having to toss the code out and re-write it at some point…maybe even multiple times). My thoughts are that we should be making a conscious decision around the start of each project approximately where on the SEMS we want the project to exist. I believe this decision should be based on 3 factors:

1. Importance - How important to the business is this application? What is the impact if the application were to suddenly stop working?
2. Complexity - How complex is the application functionality?
3. Life-Expectancy - How long is this application expected to be in use? Is this a one-time use application, does it fill a short-term need, or is it more strategic and is expected to be in-use for many years to come?

Of course this isn’t an exact science. You can’t say that Project X should be at the 73% mark on the SEMS and expect that to be helpful. My point is not that you need to precisely figure out what point on the SEMS the project should be at then translate that into some prescriptive set of practices and techniques you should be using. Rather my point is that we need to be aware that there is a spectrum, and that not everything is going to be (or should be) at the edges of that spectrum, indeed a large number of projects should probably fall somewhere within the middle; and different projects should adopt a different level of software engineering practices and maturity levels based on the needs of that project. 
To give an example of this way of thinking from my day job: Every couple of years my company plans and hosts a large event where ~400 of our customers all fly in to one location for a multi-day event with various activities. We have some staff whose job it is to organize the logistics of this event, which includes tracking which flights everybody is booked on, arranging for transportation to/from airports, arranging for hotel rooms, name tags, etc. The last time we arranged this event all these various pieces of data were tracked in separate spreadsheets and reconciliation and cross-referencing of all the data was literally done by hand using printed copies of the spreadsheets and several people sitting around a table going down each list row by row. Obviously there is some room for improvement in how we are using software to manage the event’s logistics. The next time this event occurs we plan to provide the event planning staff with a more intelligent tool (either an Excel spreadsheet or probably an Access database) that can track all the information in one location and make sure that the various pieces of data are properly linked together (so for example if a person cancels you only need to delete them from one place, and not a dozen separate lists). This solution would fall at or near the very left end of the SEMS meaning that we will just quickly create something with very little attention paid to using mature software engineering practices. If we examine this project against the 3 criteria I listed above for determining its place within the SEMS we can see why:

Importance – If this application were to stop working the business doesn’t grind to a halt, revenue doesn’t stop, and in fact our customers wouldn’t even notice since it isn’t a customer facing application. The impact would simply be more work for our event planning staff as they revert back to the previous way of doing things (assuming we don’t have any data loss). 
Complexity – The use cases for this project are pretty straightforward. It simply needs to manage several lists of data, and link them together appropriately. Precisely the task that Access (and/or Excel) can do with minimal custom development required.

Life-Expectancy – For this specific project we’re only planning to create something to be used for the one event (we only hold these events every 2 years). If it works well this may change (see below).

Let’s assume we hack something out quickly and it works great when we plan the next event. We may decide that we want to make some tweaks to the tool and adopt it for planning all future events of this nature. In that case we should examine where the current application is on the SEMS, and make a conscious decision whether something needs to be done to move it further to the right based on the new objectives and goals for this application. This may mean scrapping the Access database and re-writing it as an actual web or Windows application. In this case, the life-expectancy changed, but let’s assume the importance and complexity didn’t change all that much. We can still probably get away with not adopting a lot of the so-called “best practices”. For example, we can probably still use some of the RAD tooling available and might have an Autonomous View style design that connects directly to the database and binds to typed datasets (we might even choose to simply leave it as an Access database and continue using it; this is a decision that needs to be made on a case-by-case basis). At Anvil Digital we have aspirations to become a primarily product-based company. So let’s say we use this tool to plan a handful of events internally, and everybody loves it. Maybe a couple years down the road we decide we want to package the tool up and sell it as a product to some of our customers. In this case the project objectives/goals change quite drastically. 
Now the tool becomes a source of revenue, and the impact of it suddenly stopping working is significantly less acceptable. Also as we hold focus groups, and gather feedback from customers and potential customers there’s a pretty good chance the feature-set and complexity will have to grow considerably from when we were using it only internally for planning a small handful of events for one company. In this fictional scenario I would expect the target on the SEMS to jump to the far right. Depending on how we implemented the previous release we may be able to refactor and evolve the existing codebase to introduce a more layered architecture, a robust set of automated tests, introduce a proper ORM and IoC container, etc. More likely in this example the jump along the SEMS would be so large we’d probably end up scrapping the current code and re-writing. Although, if it was a slow phased roll-out to only a handful of customers, where we collected feedback, made some tweaks, and then rolled out to a couple more customers, we may be able to slowly refactor and evolve the code over time rather than tossing it out and starting from scratch. The key point I’m trying to get across is not that you should be throwing out your code and starting from scratch all the time. But rather that you should be aware of when and how the context and objectives around a project change and periodically re-assess where the project currently falls on the SEMS and whether that needs to be adjusted based on changing needs. Note: There is also the idea of “spectrum decay”. Since our industry is rapidly evolving, what we currently accept as mature software engineering practices (the right end of the SEMS) probably won’t be the same 3 years from now. 
If you have a project that you were to assess at somewhere around the 80% mark on the SEMS today, but don’t touch the code for 3 years and come back and re-assess its position, it will almost certainly have changed since the right end of the SEMS will have moved farther out (maybe the project is now only around 60% due to decay).

Developer Skills

Another important aspect to this whole discussion is around the skill sets of your architects and lead developers. When talking about the progression of a developer’s skills from junior->intermediate->senior->… they generally start by only being able to write code that belongs on the left side of the SEMS and as they gain more knowledge and skill they become capable of working at a higher and higher level along the SEMS. We all realize that the learning never stops, but eventually you’ll get to the point where you can comfortably develop at the right-end of the SEMS (the exact practices and techniques that translates to is constantly changing, but that’s not the point here). A critical skill that I’d love to see more evidence of in our industry is the most senior guys not only being able to work at the right-end of the SEMS, but more importantly be able to consciously work at any point along the SEMS as project needs dictate. An even more valuable skill would be if you could make the conscious decision to move a project’s code further right on the SEMS (based on changing needs) and do so in an incremental manner without having to start from scratch. An exercise that I’m planning to go through with all of our projects here at Anvil in the near future is to map out where I believe each project currently falls within this SEMS, where I believe the project *should* be on the SEMS based on the business needs, and for those that don’t match up (i.e. most of them) come up with a plan to improve the situation.

Read the article
