Search Results

Search found 59041 results on 2362 pages for 'data replication'.

Page 430/2362 | < Previous Page | 426 427 428 429 430 431 432 433 434 435 436 437  | Next Page >

  • Is a blob more efficient than a varchar for data that can be ANY size?

    - by BillyNair
    When setting up a database I want to use the most efficient data type for potentially fairly long data. Currently my project is to store song titles and thoughts pertaining to that song. Some titles might be 5 characters or longer than 100 characters and the thoughts could run pretty long. Is it more efficient to use a varchar set to 8000 or to use a blob? Is using a blob the same as a varchar, in that there is a set size it is allocated regardless of what it holds? or is it just a pointer and it doesn't really use much space on the table? Is there a certain set size of a blob in KB or is it expandable?

    Read the article

  • Zelda-style top-down RPG. How to store tile and collision data?

    - by Delerat
    I'm looking to build a Zelda: LTTP style top-down RPG. I've read a lot on the subject and am currently going back and forth on a few solutions. I'm using C#, MonoGame, and Tiled. For my tile maps, these are the choices I can see in front of me: Store each tile as its own array. Each one having 3-4 layers, texture/animation, depth, flags, and maybe collision(depending on how I do it). I've read warning about memory issues going this route, and my biggest map will probably be 160x120 tiles. My average map however will be about 40x30. The number of tiles might cut in half if I decide to double my tile size, which is currently 16x16. This is the most appealing approach for me, as I feel like I would know how to save maps, make changes, and separate it into chunks for collision checks. Store the static parts of my tile map in multiple arrays acting as the different layers. Then I would just use entities for anything that wasn't static. All of the other tile data such as collisions, depth, etc., would be stored in their own layers as well I guess? This way just seems messy to me though. Regardless of which one I choose, I'm also unsure how to plan all of that other tile data. I could write a bunch of code that would know which integer represents what tile and it's data, but if I changed a tileset in Tiled and exported it again, all of those integers could potentially change and I'd have to adjust a whole bunch of code. My other issue is about how I could do collision. I want to at least support angled collision that slides you around the corners of objects like LTTP does, if not more oddball shapes as well. So do I: Store collision as a flag for binary collision. Could I get this to support angles? Would it be fine to store collision as an integer and have each number represent a certain angle of collision? Store a list of rectangles or other shapes and do collision that way? Sorry for the large two-part(three-part?) question. I felt like these needed to be asked together as I believe each choice influences the other.

    Read the article

  • How can I fix a very broken Ubuntu installation without losing data?

    - by jredkai
    Okay guys, I was installing a program (I do not remember the name). When I did sudo apt-get update I was given missing dependencies. It told me to sudo apt-get install -f which deleted just about every dependency needed for Ubuntu, now I cannot log in or anything, now in GRUB it actually says Debian instead of Ubuntu. I have tons of important data in that partition. Can I some how use the live cd to fix this problem??? I mean like re-install without losing data. Any help is greatly appreciated!!!

    Read the article

  • Does the direction of storage make us bad data citizens?

    - by simonsabin
      My career started at a company where we hardly had email, the network was a 10base2 affair with cables running all around the office. You used floppy disks and the thought of a GB of data was absurd. You had to look after every byte and only keep what you really needed. Whilst the cost of the spinning disks gradually falls the cost and size of flash storage continues to plummet. The new Crucial SSD is £380 for 1TB I can now keep 128GB of data on a SD card the size of my finger. It only costs...(read more)

    Read the article

  • How to visualize real time data on Android? [closed]

    - by matarsak
    I want to build and android app that visualizes real time data (2D animation). I set up a UDP channel that get the data, now I want to visualize it. I know that I can use OpenGL ES, but after a few weeks, I dont think that I'm able to learn that. What about Android Processing? Could it be used for an extensive visualization task like this? or is it limited in some way? I've heard it's not hard learn. Any other options?

    Read the article

  • How to export SSIS to Microsoft Excel without additional software?

    - by Dr. Zim
    This question is long winded because I have been updating the question over a very long time trying to get SSIS to properly export Excel data. I managed to solve this issue, although not correctly. Aside from someone providing a correct answer, the solution listed in this question is not terrible. The only answer I found was to create a single row named range wide enough for my columns. In the named range put sample data and hide it. SSIS appends the data and reads metadata from the single row (that is close enough for it to drop stuff in it). The data takes the format of the hidden single row. This allows headers, etc. WOW what a pain in the butt. It will take over 450 days of exports to recover the time lost. However, I still love SSIS and will continue to use it because it is still way better than Filemaker LOL. My next attempt will be doing the same thing in the report server. Original question notes: If you are in Sql Server Integrations Services designer and want to export data to an Excel file starting on something other than the first line, lets say the forth line, how do you specify this? I tried going in to the Excel Destination of the Data Flow, changed the AccessMode to OpenRowSet from Variable, then set the variable to "YPlatters$A4:I20000" This fails saying it cannot find the sheet. The sheet is called YPlatters. I thought you could specify (Sheet$)(Starting Cell):(Ending Cell)? Update Apparently in Excel you can select a set of cells and name them with the name box. This allows you to select the name instead of the sheet without the $ dollar sign. Oddly enough, whatever the range you specify, it appends the data to the next row after the range. Oddly, as you add data, it increases the named selection's row count. Another odd thing is the data takes the format of the last line of the range specified. My header rows are bold. If I specify a range that ends with the header row, the data appends to the row below, and makes all the entries bold. if you specify one row lower, it puts a blank line between the header row and the data, but the data is not bold. Another update No matter what I try, SSIS samples the "first row" of the file and sets the metadata according to what it finds. However, if you have sample data that has a value of zero but is formatted as the first row, it treats that column as text and inserts numeric values with a single quote in front ('123.34). I also tried headers that do not reflect the data types of the columns. I tried changing the metadata of the Excel destination, but it always changes it back when I run the project, then fails saying it will truncate data. If I tell it to ignore errors, it imports everything except that column. Several days of several hours a piece later... Another update I tried every combination. A mostly working example is to create the named range starting with the column headers. Format your column headers as you want the data to look as the data takes on this format. In my example, these exist from A4 to E4, which is my defined range. SSIS appends to the row after the defined range, so defining A4 to E68 appends the rows starting at A69. You define the Connection as having the first row contains the field names. It takes on the metadata of the header row, oddly, not the second row, and it guesses at the data type, not the formatted data type of the column, i.e., headers are text, so all my metadata is text. If your headers are bold, so is all of your data. I even tried making a sample data row without success... I don't think anyone actually uses Excel with the default MS SSIS export. If you could define the "insert range" (A5 to E5) with no header row and format those columns (currency, not bold, etc.) without it skipping a row in Excel, this would be very helpful. From what I gather, noone uses SSIS to export Excel without a third party connection manager. Any ideas on how to set this up properly so that data is formatted correctly, i.e., the metadata read from Excel is proper to the real data, and formatting inherits from the first row of data, not the headers in Excel? One last update (July 17, 2009) I got this to work very well. One thing I added to Excel was the IMEX=1 in the Excel connection string: "Excel 8.0;HDR=Yes;IMEX=1". This forces Excel (I think) to look at all rows to see what kind of data is in it. Generally, this does not drop information, say for instance if you have a zip code then about 9 rows down you have a zip+4, Excel without this blanks that field entirely without error. With IMEX=1, it recognizes that Zip is actually a character field instead of numeric. And of course, one more update (August 27, 2009) The IMEX=1 will succeed importing data with missing contents in the first 8 rows, but it will fail exporting data where no data exists. So, have it on your import connection string, but not your export Excel connection string. I have to say, after so much fiddling, it works pretty well.

    Read the article

  • SSRS 2005: How do I make available varbinary data for download in a report?

    - by Angelo
    Hi, SSRS newbie question here... I have a table where one column is varbinary(max) data. I would like to make a report that makes this data available for download as a hyperlink so the user can just click on the item and get a file download dialog for the binary data. In this particular case, the binary data happens to be the content of old pdf files, but that shouldn't matter. I tried searching around but I can't find any pointers on how to do this. It seems to me that it should be possible. There are ways to display images in a report using varbinary data, so it makes sense that one should be able to make arbitrary binary data downloadable on a report, right?

    Read the article

  • Is there a format or service for resume/CV data?

    - by Ben Dauphinee
    I have noticed through the process of signing up for various freelance and job seeking or professional network sites that they all want your resume/CV data. And I am really getting tired of copy/pasting this data, especially since I have a website. Is there a standard format or service somewhere that I do not know about for this data? If not, does anyone want to help me build something like this out? I'm thinking a service similar to OpenID that allows you to maintain a central resume to have your data pulled from. No more filling in the same data over and over, and having to maintain the copies on any of the plethora of websites that have that data. Takers?

    Read the article

  • How best to convert CakePHP date picker form data to a PHP DateTime object?

    - by Daren Thomas
    I'm doing this in app/views/mymodel/add.ctp: <?php echo $form->input('Mymodel.mydatefield'); ?> And then, in app/controllers/mymodel_controller.php: function add() { # ... (if we have some submitted data) $datestring = $this->data['Mymodel']['mydatefield']['year'] . '-' . $this->data['Mymodel']['mydatefield']['month'] . '-' . $this->data['Mymodel']['mydatefield']['day']; $mydatefield = DateTime::createFromFormat('Y-m-d', $datestring); } There absolutly has to be a better way to do this - I just haven't found the CakePHP way yet... What I would like to do is: function add() { # ... (if we have some submitted data) $mydatefield = $this->data['Mymodel']['mydatefiled']; # obviously doesn't work }

    Read the article

  • Offline backup synchronization

    - by Pavan Kumar
    There is a Central Server running Windows Server 2003 and SQL Server 2005 and there are 7 client machines situated in various places and has XP Pro & SQL Server 2005 installed in all of them. They are not interconnected so they are physically seperate. One person goes to each of these centers maybe twice a month and takes the backup (Full database consisting of mdf and ldf files) with a pen drive and brings it to the Central server which contains the central database holding same schema as all the other client databases. I need to synchronize each backup database (belonging to different centers) one by one to update the existing data or inserting new data in the central database . The solution i got was Replication. The pendrive is brought to central server consisting of 7 instances of the databases and then the databases is attached to the central server one by one to the same SQL Server where the central database exists. Then my idea was to replicate the backup database one by one i.e using single subscription (Central Database) and multiple publication ( i.e 7 instances of databases in my case) toplogy by performing replication locally (i.e in the same machine). So i tried to develop a UI in C# .Net to programatically run the Transactional Replication with push subscription using RMO Programming (which is incomplete as of now because there is no point in developing when you already know it is not the solution). Transactional Replication can either be set to initialize with a snapshot or without a snapshot. If i go for the first option i.e with a snapshot , the data whatever is present in Central Database is overwritten by the new data . So the data present initially in the central database is lost. If i try to initialize without snapshot , no data (the data already has the updated and new data) will be sent from the backup database to server. The replication will work in a scenario where any incremental changes is done only after you set the replication . So the initial data whatever was present in the backup database when setting up the replication will not be replicated when running the snapshot agent for the first time to synchronize. Only changes in the backup database thereafter will be reflected to the central database .(Remember I am not going to insert new data or make any changes to the backup database after i attach it to the Central Server. ) So this solution is not feasible. I want a solution for synchronizing from one client database to central database present in the same machine using C#.NET. If you can provide me small example maybe with two databases(with same schema) DB1(Client) to DB2(Server) consisting of one or two tables it will be very helpful. The synchronization is not bidirectional.I want to only update existing data or insert new data from DB1 to DB2 (DB2 may contain some data initially). Thanks and Regards Pavan

    Read the article

  • How to think in data stores instead of databases?

    - by Jim
    As an example, Google App Engine uses data stores, not a database, to store data. Does anybody have any tips for using data stores instead of databases? It seems I've trained my mind to think 100% in object relationships that map directly to table structures, and now it's hard to see anything differently. I can understand some of the benefits of data stores (e.g. performance and the ability to distribute data), but some good database functionality is sacrificed (e.g. joins). Does anybody who has worked with data stores like BigTable have any good advice to working with them?

    Read the article

  • Showing a loading spinner only if the data has not been cached.

    - by Aaron Mc Adam
    Hi guys, Currently, my code shows a loading spinner gif, returns the data and caches it. However, once the data has been cached, there is a flicker of the loading gif for a split second before the data gets loaded in. It's distracting and I'd like to get rid of it. I think I'm using the wrong method in the beforeSend function here: $.ajax({ type : "GET", cache : false, url : "book_data.php", data : { keywords : keywords, page : page }, beforeSend : function() { $('.jPag-pages li:not(.cached)').each(function (i) { $('#searchResults').html('<p id="loader">Loading...<img src="../assets/images/ajax-loader.gif" alt="Loading..." /></p>'); }); }, success : function(data) { $('.jPag-current').parent().addClass('cached'); $('#searchResults').replaceWith($(data).find('#searchResults')).find('table.sortable tbody tr:odd').addClass('odd'); detailPage(); selectForm(); } });

    Read the article

  • Is there a way to only backup a SQL 2005 database structure fully, but only the data in a certain se

    - by TheSoftwareJedi
    I have several schemas in my database, and the largest one ("large" meaning disk space consumed) is my "web" schema which is a denormalized copy of data in the operational schemas. This denormalized data is able to be reconstructed at anytime, and is merely there for extremely fast read purposes. Since the data is redundant, and VERY large - I'd like to exclude it from being backed up. I already have stored procedures that can regenerate all of the data in that schema in a couple of hours - for use in the event of a failure. I assume I can split the tables in this schema out to another data file or such (ideally even on another drive for faster reads), but is there a way to never have that data file backup, yet still in the event of a failure its structure could be restored (and other DDL stuff like procs, views, etc)? Somewhat related, can I also have these tables not do transaction logging, if I go to "Full" backup mode for the rest of the database?

    Read the article

  • How can I structure and recode messy categorical data in R?

    - by briandk
    I'm struggling with how to best structure categorical data that's messy, and comes from a dataset I'll need to clean. The Coding Scheme I'm analyzing data from a university science course exam. We're looking at patterns in student responses, and we developed a coding scheme to represent the kinds of things students are doing in their answers. A subset of the coding scheme is shown below. Note that within each major code (1, 2, 3) are nested non-unique sub-codes (a, b, ...). What the Raw Data Looks Like I've created an anonymized, raw subset of my actual data which you can view here. Part of my problem is that those who coded the data noticed that some students displayed multiple patterns. The coders' solution was to create enough columns (reason1, reason2, ...) to hold students with multiple patterns. That becomes important because the order (reason1, reason2) is arbitrary--two students (like student 41 and student 42 in my dataset) who correctly applied "dependency" should both register in an analysis, regardless of whether 3a appears in the reason column or the reason2 column. How Can I Best Structure Student Data? Part of my problem is that in the raw data, not all students display the same patterns, or the same number of them, in the same order. Some students may do just one thing, others may do several. So, an abstracted representation of example students might look like this: Note in the example above that student002 and student003 both are coded as "1b", although I've deliberately shown the order as different to reflect the reality of my data. My (Practical) Questions Should I concatenate reason1, reason2, ... into one column? How can I (re)code the reasons in R to reflect the multiplicity for some students? Thanks I realize this question is as much about good data conceptualization as it is about specific features of R, but I thought it would be appropriate to ask it here. If you feel it's inappropriate for me to ask the question, please let me know in the comments, and stackoverflow will automatically flood my inbox with sadface emoticons. If I haven't been specific enough, please let me know and I'll do my best to be clearer.

    Read the article

  • Assigning a variable of a struct that contains an instance of a class to another variable

    - by xport
    In my understanding, assigning a variable of a struct to another variable of the same type will make a copy. But this rule seems broken as shown on the following figure. Could you explain why this happened? using System; namespace ReferenceInValue { class Inner { public int data; public Inner(int data) { this.data = data; } } struct Outer { public Inner inner; public Outer(int data) { this.inner = new Inner(data); } } class Program { static void Main(string[] args) { Outer p1 = new Outer(1); Outer p2 = p1; Console.WriteLine("p1:{0}, p2:{1}", p1.inner.data, p2.inner.data); p1.inner.data = 2; Console.WriteLine("p1:{0}, p2:{1}", p1.inner.data, p2.inner.data); p2.inner.data = 3; Console.WriteLine("p1:{0}, p2:{1}", p1.inner.data, p2.inner.data); Console.ReadKey(); } } }

    Read the article

  • How do I write raw binary data in Python?

    - by Chris B.
    I've got a Python program that stores and writes data to a file. The data is raw binary data, stored internally as str. I'm writing it out through a utf-8 codec. However, I get UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 25: character maps to <undefined> in the cp1252.py file. This looks to me like Python is trying to interpret the data using the default code page. But it doesn't have a default code page. That's why I'm using str, not unicode. I guess my questions are: How do I represent raw binary data in memory, in Python? When I'm writing raw binary data out through a codec, how do I encode/unencode it?

    Read the article

  • What are the repercussions of not checking existing data when adding a foreign key?

    - by scottm
    I've inherited a database that doesn't exactly strive for data integrity. I am trying to add some foreign keys to change that, but there is data in some tables that doesn't fit the constraints. Most likely, the data won't be used again so I want to know what problems I might face by leaving it there. The other option I see is to move it into some kind of table without referential constraints, just for historical purposes. So, what are the repercussions of not checking existing data? If I create a foreign key constraint on a table and don't check existing data, will all new data inserted into the table be enforced?

    Read the article

  • Want to save data field from form into two columns of two models.

    - by vette982
    I have a Profile model with a hasOne relationship to a Detail model. I have a registration form that saves data into both model's tables, but I want the username field from the profile model to be copied over to the usernamefield in the details model so that each has the same username. function new_account() { if(!empty($this->data)) { $this->Profile->modified = date("Y-m-d H:i:s"); if($this->Profile->save($this->data)) { $this->data['Detail']['profile_id'] = $this->Profile->id; $this->data['Detail']['username'] = $this->Profile->username; $this->Profile->Detail->save($this->data); $this->Session->setFlash('Your registration was successful.'); $this->redirect(array('action'=>'index')); } } } This code in my Profile controller gives me the error: Undefined property: Profile::$username Any ideas?

    Read the article

  • How the existing data to be if entity structure modified or deleted on GAE?

    - by Eonil
    GAE recommends using JDO/JPA. But I have serious question about using OODB like them. JDO based on user's class structure. And data structure should be modified continually as service advances. So, If data(entity) class property being removed, what happened to existing data on the property? If data(entity) class renamed for refactoring reason, how the JDO know those renaming? Or all data loss? Major point is "How JDO/GAE/BigTable applies modification of class into existing entity structure and data?".

    Read the article

  • best way to store php data on a page for use with javascript/jquery?

    - by Haroldo
    Ok, so im trying to work out the fastest way of storing data on my page without slowing the page load: I need to store information in the page to be later used by jquery. My page is an events page and i want to attach data to each event anchor. there are 100+ events to attach data to. The events anchors are created with a php loop, so i could create the data elements within this loop using either use un-semantic tags ie *rel="some_data"* create a jquery.data() for each iteration of the loop or i could run the loop again, separately, this time inside script tags with jquery.data(); would really appreciate any thoughts on this!

    Read the article

  • Offsite data storage for simple app, or a similar supported persistence mechanism?

    - by jdk
    Question Is there a usable facebook entry point to the Data Storage API that facebook lists on their app admin page for developers, or should I consider an alternate mechanism? What alternative mechanisms exist to simply persist my information offsite (away from my server app) without stuffing it into a cookie that's prone to expire? ... Background The facebook Data Store Admin tool is made available in a facebook App's Settings as seen here: (continue reading below) However when I visit the DataStoreAdmin link nothing works (i.e. clicking the buttons to define the data store types and objects does nothing - I have tried different browsers). The Wiki page for Data Store API hasn't been updated recently and the second last update says the beta Data Store was taken offline. It seems odd the link would be readily available and highly visible at the top of the App configuration area if indeed it's defunct. I was hoping some kind of key/value pair solution to remove the data calls from my own server.

    Read the article

  • How to convert searchTwitter results (from library(twitteR)) into a data.frame?

    - by analyticsPierce
    I am working on saving twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR. If I execute: library(twitteR) puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100)) I get an error of: Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class structure("status", package = "twitteR") into a data.frame This is important because in order to use RODBC to add this to a table using sqlSave it needs to be a data.frame. At least that's the error message I got: Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging", : should be a data frame So does anyone have any suggestions on how to coerce the list to a data.frame or how I can load the list through RODBC?

    Read the article

  • What's the most simple way to retrieve all data from a table and save it back in .NET 3.5?

    - by zoman
    I have a number of tables containing some basic (business related) mapping data. What's the most simple way to load the data from those tables, then save the modified values back. (all data should be replaced in the tables) An ORM is out of question as I would like to avoid creating domain objects for each table. The actual editing of the data is not an issue. (it is exported into Excel where the data is edited, then the file is uploaded with the modified data) The technology is .NET 3.5 (ASP.NET MVC) and SQL Server 2005. Thanks.

    Read the article

  • RabbitMQ as a proxy between a data store and a producer ?

    - by hyperboreean
    I have some code that produces lots of data that should be stored in the database. The problem is that the database can't keep with the data that it gets produced. So I am wondering whether some kind of queuing mechanism would help in this situation - I am thinking in particular at RabiitMQ and whether is feasible to have the data stored in its queues until some consumer gets the data out of it and pushes it to the database. Also, I am not particular interested whether that data made it to the database or not because pretty soon, the same data will be updated.

    Read the article

  • External File Upload Optimizations for Windows Azure

    - by rgillen
    [Cross posted from here: http://rob.gillenfamily.net/post/External-File-Upload-Optimizations-for-Windows-Azure.aspx] I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting. Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (eg, uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards). Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central as it is physically closest to us) and do not represent intra-cloud results… we have performed intra-cloud tests and the overall results are similar in notion but the data rates are significantly different as well as the tipping points for the various block sizes… this will be detailed separately). We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here. We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here. The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially and thereby spreading the affects of periodic Internet delays across the collection of results.  We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post. For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results. Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see linked source for specific implementation details in Program.cs, line 173 and following… less than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent ensuring that the bits that arrived matched was was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file committal). A diagram of the process is as follows: We then tested the affects of blocking & parallelizing the transfers by running the updated application against the same source set and did a parameter sweep on the block size including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble and 4MB is the maximum size of a block supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post. This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here. Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) is more apparent and can tend towards negative optimizations. For this reason (and is supported in the raw data provided in the linked worksheet) the charts and dialog below ignore source file sizes less than 1MB. (click chart for full size image) The chart above illustrates some interesting points about the results: When the block size is smaller than the source file, performance increases but as the block size approaches and then passes the source file size, you see decreasing benefit to the point of negative gains (see the values for the 1MB file size) For some of the moderately-sized source files, small blocks (256KB) are best As the size of the source file gets larger (see values for 50MB and up), the smallest block size is not the most efficient (presumably due, at least in part, to the increased number of blocks, increased number of individual transfer requests, and reassembly/committal costs). Once you pass the 250MB source file size, the difference in rate for 1MB to 4MB blocks is more-or-less constant The 1MB block size gives the best average improvement (~16x) but the optimal approach would be to vary the block size based on the size of the source file.    (click chart for full size image) The above is another view of the same data as the prior chart just with the axis changed (x-axis represents file size and plotted data shows improvement by block size). It again highlights the fact that the 1MB block size is probably the best overall size but highlights the benefits of some of the other block sizes at different source file sizes. This last chart shows the change in total duration of the file uploads based on different block sizes for the source file sizes. Nothing really new here other than this view of the data highlights the negative affects of poorly choosing a block size for smaller files.   Summary What we have found so far is that blocking your file uploads and uploading them in parallel results in significant performance improvements. Further, utilizing extension methods and the Task Parallel Library (.NET 4.0) make short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.   Related Resources Source code for upload test application Source code for random file generator ODatas feed of raw data from non-optimized transfer tests Experiment Metadata Experiment Datasets 2KB Uploads 32KB Uploads 64KB Uploads 128KB Uploads 256KB Uploads 512KB Uploads 1MB Uploads 5MB Uploads 10MB Uploads 25MB Uploads 50MB Uploads 100MB Uploads 250MB Uploads 500MB Uploads 750MB Uploads 1GB Uploads Raw Data OData feeds of raw data from blocked/parallelized transfer tests Experiment Metadata Experiment Datasets Raw Data 256KB Blocks 512KB Blocks 1MB Blocks 2MB Blocks 4MB Blocks Excel worksheet showing summarizations and comparisons

    Read the article

< Previous Page | 426 427 428 429 430 431 432 433 434 435 436 437  | Next Page >