Search Results

Search found 11409 results on 457 pages for 'large teams'.

Page 46/457 | < Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >

Best way to get distinct values from large table

- by derivation

I have a db table with about 10 or so columns, two of which are month and year. The table has about 250k rows now, and we expect it to grow by about 100-150k records a month. A lot of queries involve the month and year column (ex, all records from march 2010), and so we frequently need to get the available month and year combinations (ie do we have records for april 2010?). A coworker thinks that we should have a separate table from our main one that only contains the months and years we have data for. We only add records to our main table once a month, so it would just be a small update on the end of our scripts to add the new entry to this second table. This second table would be queried whenever we need to find the available month/year entries on the first table. This solution feels kludgy to me and a violation of DRY. What do you think is the correct way of solving this problem? Is there a better way than having two tables?

Read the article
Using large transparent pictures as texture atlases

- by azlisum

i'm new to android programming and i'm trying to create a relatively big 2D game. I have to use lots of images and objects in my game so I decided to use OpenGL ES. I have several texture atlases, all of them saved as png's because of the transparency. I also know, but i'm not sure why, that I have to use images, which height and width is multiple of two. I test my game on an old HTC Hero running Android 2.3.3. When my picture atlases are 512x512 each, my game has a frame rate of between 50 to 60 fps. When I use 1024x1024 non transparent png, there is no problem - the FPS is again between 50 to 60 fps. But when i decide to use a 1024x1024 transperent PNG's my frame rate drops to 4,5 fps. Could this be a problem related to the age of the device i'm using for testing? These are the OpenGL functions I use each loop to draw batches: gl.glEnable(GL10.GL_TEXTURE_2D); gl.glEnable(GL10.GL_BLEND); gl.glBlendFunc(GL10.GL_SRC_ALPHA, GL10.GL_ONE_MINUS_SRC_ALPHA); //drawing happens here gl.glDisable(GL10.GL_BLEND); Thanks in advance :)

Read the article
handling large number

- by klw

Hello, this is actually an problem question from project euler site http://projecteuler.net/index.php?section=problems&id=3 Anyway I'm not out after the solution, but I probably guess you will know what my approach is. To my question now, how do I handle numbers exceeding unsigned int? Is there a mathematical approach for this, if so where can I read about it?

Read the article
Large Web Font (Yes or No?)

- by Ruth Rettigo

Hi, I want to use special font on my website, but the file is over 9 MB. Is it possible to reduce font's size? Thanks folks!

Read the article
Dealing with large directories in a checkout

- by Eric

I am trying to come up with a version control process for a web app that I work on. Currently, my major stumbling blocks are two directories that are huge (both over 4GB). Only a few people need to work on things within the huge directories; most people don't even need to see what's in them. Our directory structure looks something like: / --file.aspx --anotherFile.aspx --/coolThings ----coolThing.aspx --/bigFolder ----someHugeMovie.mov ----someHugeSound.mp3 --/anotherBigFolder ----... I'm sure you get the picture. It's hard to justify a checkout that has to pull down 8GB of data that's likely useless to a developer. I know, it's only once, but even once could be really frustrating for someone (and will make it harder for me to convince everyone to use source control). (Plus, clean checkouts will be painfully slow.) These folders do have to be available in the web application. What can I do? I've thought about separate repositories for the big folders. That way, you only download if you need it; but then how do I manage checking these out onto our development server? I've also thought about not trying to version control those folders: just update them directly on the web server... but I am not enamored of this idea. Is there some magic way to simply exclude directories from a checkout that I haven't found? (Pretty sure there is not.) Of course, there's always the option to just give up, bite the bullet, and accept downloading 8 useless GB. What say you? Have you encountered this problem before? How did you solve it?

Read the article
Longest substring in a large set of strings

- by user1516492

I have a huge fixed library of text strings, and a frequently changing input string s. I need to find the longest matching substring from any string in the library to s, starting from the beginning of string s, in minimal time. In a perfect world, I would also return the next longest match from the library, and the next best, and so on. This is not the longest common string problem - I'm not looking for the longest common string for all the strings in the library... I just need a pairwise best substring between s and each string in the vast library as fast as possible.

Read the article
How to define large list of strings in Visual Basic

- by Jenny_Winters

I'm writing a macro in Visual Basic for PowerPoint 2010. I'd like to initialize a really big list of strings like: big_ol_array = Array( _ "string1", _ "string2", _ "string3", _ "string4" , _ ..... "string9999" _ ) ...but I get the "Too many line continuations" error in the editor. When I try to just initialize the big array with no line breaks, the VB editor can't handle such a long line (1000+) characters. Does anyone know a good way to initialize a huge list of strings in VB? Thanks in advance!

Read the article
SQL Server 05, which is optimal, LIKE %<term>% or CONTAINS() for searching large column

- by Spud1

I've got a function written by another developer which I am trying to modify for a slightly different use. It is used by a SP to check if a certain phrase exists in a text document stored in the DB, and returns 1 if the value is found or 0 if its not. This is the query: SELECT @mres=1 from documents where id=@DocumentID and contains(text, @search_term) The document contains mostly XML, and the search_term is a GUID formatted as an nvarchar(40). This seems to run quite slowly to me (taking 5-6 seconds to execute this part of the process), but in the same script file there is also this version of the above, commented out. SELECT @mres=1 from documents where id=@DocumentID and textlike '%' + @search_term + '%' This version runs MUCH quicker, taking 4ms compared to 15ms for the first example. So, my question is why use the first over the second? I assume this developer (who is no longer working with me) had a good reason, but at the moment I am struggling to find it.. Is it possibly something to do with the full text indexing? (this is a dev DB I am working with, so the production version may have better indexing..) I am not that clued up on FTI really so not quite sure at the moment. Thoughts/ideas?

Read the article
Scaling Scrum within a group of 100s of programmers

- by blunders

Most Scrum teams lean toward 7-15 people **, though it's not clear how to scale Scrum among 100s of people, or how the effectiveness of a given team might be compared to another team within the group; meaning beyond just breaking the group into Scrum teams of 7-15 people, it's unclear how efforts between the teams are managed, compared, etc. Any suggestions related to either of these topics, or additional related topics that might be of more importance to account for in planning a large scale SCRUM grouping? ** In reviewing research related to the suggested size of software development teams, which appears to be the basis for the suggested Scrum team size, I found what appears to be an error in the research which oddly appears to show that bigger teams (15+ ppl), not smaller teams (7 ppl) are better. UPDATE, "Re: Scrum doesn't scale": Made huge amounts of progress on personally researching the topic, but thought I'd respond to the general belief of some that Scrum doesn't scale by citing a quote from Succeeding with Agile by Mike Cohn : Scrum Does Scale: You have to admire the intellectual honesty of the earliest agile authors. They were all very careful to say that agile methodolgies like Scrum were for small projects. This conservatism wasn’t because agile or Scrum turned out to be unsuited for large projects but because they hadn’t used these processes on large projects and so were reluctant to advise their readers to do so. But, in the years since the Agile Manifesto and the books that came shortly before and after it, we have learned that the principles and practices of agile development can be scaled up and applied on large projects, albeit it with a considerable amount of overhead. Fortunately, if large organizations use the techniques described regarding the role of the product owner, working with a shared product backlog, being mindful of dependencies, coordinating work among teams, and cultivating communities of practice, they can successfully scale a Scrum project. SOURCE: (ran across the book thanks to Ladislav Mrnka answer)

Read the article
WCF Streaming not working at server

- by Radhi

hi, i have used WCF service to transfer large files in chunks to the server for that i have reference this article http://kjellsj.blogspot.com/2007/02/wcf-streaming-upload-files-over-http.html i have configured my application on IIS on my machine. its work fine here. it allows upto 64mb file upload but when we have published the site. it allows only maximum 30Mb file if i try to upload more than that i got error 404 - resource not found. here is the binding config i have used. <basicHttpBinding>  <binding name="FileTransferServicesBinding" closeTimeout="00:01:00" openTimeout="00:01:00" receiveTimeout="00:10:00" sendTimeout="00:01:00" transferMode="Streamed" messageEncoding="Mtom" maxBufferSize="65536" maxReceivedMessageSize="67108864"> <security mode="None"> <transport clientCredentialType="None"/> </security> </binding> </basicHttpBinding> please suggest me where i am missing anything. and if required more code please let me know -thanks in advance

Read the article
XML streaming with XProc.

- by Pierre

Hi all, I'm playing with xproc, the XML pipeline language and http://xmlcalabash.com/. I'd like to find an example for streaming large xml documents. for example, given the following huge xml document: <Books> <Book> <title>Book-1</title> </Book> <Book> <title>Book-2</title> </Book> <Book> <title>Book-3</title> </Book>  <Book> <title>Book-N</title> </Book> </Books> How should I proceed to loop (streaming) over x-N documents like <Books> <Book> <title>Book-x</title> </Book> </Books> and treat each document with a xslt ? is it possible with xproc ?

Read the article
Unable to load huge XML document (incorrectly suppose it's due to the XSLT processing)

- by krisvandenbergh

I'm trying to match certain elements using XSLT. My input document is very large and the source XML fails to load after processing the following code (consider especially the first line). <xsl:template match="XMI/XMI.content/Model_Management.Model/Foundation.Core.Namespace.ownedElement/Model_Management.Package/Foundation.Core.Namespace.ownedElement"> <rdf:RDF> <rdf:Description rdf:about=""> <xsl:for-each select="Foundation.Core.Class"> <xsl:for-each select="Foundation.Core.ModelElement.name"> <owl:Class rdf:ID="@Foundation.Core.ModelElement.name" /> </xsl:for-each> </xsl:for-each> </rdf:Description> </rdf:RDF> </xsl:template> Apparently the XSLT fails to load after "Model_Management.Model". The PHP code is as follows: if ($xml->loadXML($source_xml) == false) { die('Failed to load source XML: ' . $http_file); } It then fails to perform loadXML and immediately dies. I think there are two options now. 1) I should set a maximum executing time. Frankly, I don't know how that I do this for the built-in PHP 5 XSLT processor. 2) Think about another way to match. What would be the best way to deal with this? The input document can be found at http://krisvandenbergh.be/uml_pricing.xml Any help would be appreciated! Thanks.

Read the article
Getting started with massive data

- by Max

I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred of megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I need to know and what are some good resources to learn from? Hadoop/MapReduce is one obvious start. Is there a particular programming language I should pick up? (I primarily work now in Python, Ruby, R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis?) I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with? Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale. Any other suggestions on things to learn would be great!

Read the article
Copying a foreign Subversion repository to keep under dependencies

- by Jonathan Sternberg

I want to keep dependencies for my project in our own repository, that way we have consistent libraries for the entire team to work with. For example, I want our project to use the Boost libraries. I've seen this done in the past with putting dependencies under a "vendor" or "dependencies" folder. But I still want to be able to update these dependencies. If a new feature appears in a library and we need it, I want to just be able to update that repository within our own repository. I don't want to have to recopy it and put it under version control again. I'd also like for us to have the ability to change dependencies if a small change is needed without stopping us from ever updating the library. I want the ability to do something like 'svn cp', then be able to 'svn merge' in the future. I just tried this with the boost trunk, but I'm not able to get any history using 'svn log' on the copy I made. How do I do this? What is usually done for large projects with dependencies?

Read the article
Database structure for ecommerce site

- by imanc

Hey Guys, I have been tasked with designing an ecommerce solution. The aspect that is causing me the most problems is the database. Currently the site consists of 10+ country based shops each with their own database (all residing on the same mysql instance). For the new site I'd rather all these shop databases be merged into one database so that all tables (products, orders, customers etc.) have a shop_id field. From a programming perspective this seems to make the most sense as we won't have to manage data across multiple databases. Currently the entire site generates about 120k orders a year, but is experiencing fairly heavy growth and we need to design a solution that will scale. In 5 years there may be more than a million orders per year and a database that contains 5 years order history (archiving maybe a solution here). The question is - do we use a single database, or do we keep the database-per-shop structure? I am currently trying to find supporting evidence for either avenue. The company I am designing the solution for prefer the per-shop database structure because they believe it will allow the sites to scale. But my argument is that the shop's database probably won't get that busy over the next few years that they exceed the capacity of a mysql database and a "no expenses spared" hardware set-up. I am wondering if anyone has any advice either way? Does anyone have experience with websites / ecommerce sites that have tables containing millions of records? I know there is probably not a clear answer here, but at what stage do we have too many records or too large table files to have a fast loading site? Also, if anyone has any advice on sources of information - books, websites, etc. where I can do further research, it would be highly appreciated! Cheers, imanc

Read the article
Improving File Read Performance (single file, C++, Windows)

- by david

I have large (hundreds of MB or more) files that I need to read blocks from using C++ on Windows. Currently the relevant functions are: errorType LargeFile::read( void* data_out, __int64 start_position, __int64 size_bytes ) const { if( !m_open ) { // return error } else { seekPosition( start_position ); DWORD bytes_read; BOOL result = ReadFile( m_file, data_out, DWORD( size_bytes ), &bytes_read, NULL ); if( size_bytes != bytes_read || result != TRUE ) { // return error } } // return no error } void LargeFile::seekPosition( __int64 position ) const { LARGE_INTEGER target; target.QuadPart = LONGLONG( position ); SetFilePointerEx( m_file, target, NULL, FILE_BEGIN ); } The performance of the above does not seem to be very good. Reads are on 4K blocks of the file. Some reads are coherent, most are not. A couple questions: Is there a good way to profile the reads? What things might improve the performance? For example, would sector-aligning the data be useful? I'm relatively new to file i/o optimization, so suggestions or pointers to articles/tutorials would be helpful.

Read the article
Read/Write/Find/Replace huge csv file

- by notapipe

I have a huge (4,5 GB) csv file.. I need to perform basic cut and paste, replace operations for some columns.. the data is pretty well organized.. the only problem is I cannot play with it with Excel because of the size (2000 rows, 550000 columns). here is some part of the data: ID,Affection,Sex,DRB1_1,DRB1_2,SENum,SEStatus,AntiCCP,RFUW,rs3094315,rs12562034,rs3934834,rs9442372,rs3737728 D0024949,0,F,0101,0401,SS,yes,?,?,A_A,A_A,G_G,G_G D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_? D0023151,0,F,0101,11,SN,yes,?,?,A_A,G_G,G_G,G_G I need to remove 4th, 5th, 6th, 7th, 8th and 9th columns; I need to find every _ character from column 10 onwards and replace it with a space ( ) character; I need to replace every ? with zero (0); I need to replace every comma with a tab; I need to remove first row (that has column names; I need to replace every 0 with 1, every 1 with 2 and every ? with 0 in 2nd column; I need to replace F with 2, M with 1 and ? with 0 in 3rd column; so that in the resulting file the output reads: D0024949 1 2 A A A A G G G G D0024302 1 2 A A G G A G 0 0 D0023151 1 2 A A G G G G G G (both input and output should read one line per row, ne extra blank row) Is there a memory efficient way of doing that with java(and I need a code to do that) or a usable tool for playing with this large data so that I can easily apply Excel functionality..

Read the article
MySQL - What is wrong with this query or my database? Terrible performance.

- by Moss

SELECT * from `employees` a LEFT JOIN (SELECT phone1 p1, count(*) c, FROM `employees` GROUP BY phone1) b ON a.phone1 = b.p1; I'm not sure if it is this query in particular that has the problem. I have been getting terrible performance in general with this database. The table in question has 120,000 rows. I have tried this particular query remotely and locally with the MyISAM and InnoDB engines, with different types of joins, and with and without an index on phone1. I can get this to complete in about 4 minutes on a 10,000 row table successfully but performance drops exponentially with larger tables. Remotely it will lose connection to the server and locally it brings my system to its knees and seems to go on forever. This query is only a smaller step I was trying to do when a larger query couldn't complete. Maybe I should explain the whole scenario. I have one big flat ugly table that lists a bunch of people and their contact info and the info of the companies they work for. I'm trying to normalize the database and intelligently determine which phone numbers apply to individual people and which apply to an office location. My reasoning is that if a phone number occurs multiple times and the number of occurrence equals the number of times that the street address it is attached to occurs then it must be an office number. So the first step is to count each phone number grouping by phone number. Normally if you just use COUNT()...GROUP BY it will only list the first record it finds in that group so I figured I have to join the full table to the count table where the phone number matches. This does work but as I said I can't successfully complete it on any table much larger than 10,000 rows. This seems pathetic and this doesn't seem like a crazy query to do. Is there a better way to achieve what I want or do I have to break my large table into 12 pieces or is there something wrong with the table or db?

Read the article
Given a trace of packets, how would you group them into flows?

- by zxcvbnm

I've tried it these ways so far: 1) Make a hash with the source IP/port and destination IP/port as keys. Each position in the hash is a list of packets. The hash is then saved in a file, with each flow separated by some special characters/line. Problem: Not enough memory for large traces. 2) Make a hash with the same key as above, but only keep in memory the file handles. Each packet is then put into the hash[key] that points to the right file. Problems: Too many flows/files (~200k) and it might run out of memory as well. 3) Hash the source IP/port and destination IP/port, then put the info inside a file. The difference between 2 and 3 is that here the files are opened and closed for each operation, so I don't have to worry about running out of memory because I opened too many at the same time. Problems: WAY too slow, same number of files as 2 so also impractical. 4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution. 5) Split them by IP source/destination and then create a single file for each one, separating the flows by special characters. Still too many files (+50k). Right now I'm using Ruby to do it, which might've been a bad idea, I guess. Currently I've filtered the traces with tshark so that they only have relevant info, so I can't really make them any smaller. I thought about loading everything in memory as described in 1) using C#/Java/C++, but I was wondering if there wouldn't be a better approach here, especially since I might also run out of memory later on even with a more efficient language if I have to use larger traces. In summary, the problem I'm facing is that I either have too many files or that I run out of memory. I've also tried searching for some tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and wouldn't scan for every flow as I need.

Read the article
Windows Azure Service Bus Splitter and Aggregator

- by Alan Smith

This article will cover basic implementations of the Splitter and Aggregator patterns using the Windows Azure Service Bus. The content will be included in the next release of the “Windows Azure Service Bus Developer Guide”, along with some other patterns I am working on. I’ve taken the pattern descriptions from the book “Enterprise Integration Patterns” by Gregor Hohpe. I bought a copy of the book in 2004, and recently dusted it off when I started to look at implementing the patterns on the Windows Azure Service Bus. Gregor has also presented an session in 2011 “Enterprise Integration Patterns: Past, Present and Future” which is well worth a look. I’ll be covering more patterns in the coming weeks, I’m currently working on Wire-Tap and Scatter-Gather. There will no doubt be a section on implementing these patterns in my “SOA, Connectivity and Integration using the Windows Azure Service Bus” course. There are a number of scenarios where a message needs to be divided into a number of sub messages, and also where a number of sub messages need to be combined to form one message. The splitter and aggregator patterns provide a definition of how this can be achieved. This section will focus on the implementation of basic splitter and aggregator patens using the Windows Azure Service Bus direct programming model. In BizTalk Server receive pipelines are typically used to implement the splitter patterns, with sequential convoy orchestrations often used to aggregate messages. In the current release of the Service Bus, there is no functionality in the direct programming model that implements these patterns, so it is up to the developer to implement them in the applications that send and receive messages. Splitter A message splitter takes a message and spits the message into a number of sub messages. As there are different scenarios for how a message can be split into sub messages, message splitters are implemented using different algorithms. The Enterprise Integration Patterns book describes the splatter pattern as follows: How can we process a message if it contains multiple elements, each of which may have to be processed in a different way? Use a Splitter to break out the composite message into a series of individual messages, each containing data related to one item. The Enterprise Integration Patterns website provides a description of the Splitter pattern here. In some scenarios a batch message could be split into the sub messages that are contained in the batch. The splitting of a message could be based on the message type of sub-message, or the trading partner that the sub message is to be sent to. Aggregator An aggregator takes a stream or related messages and combines them together to form one message. The Enterprise Integration Patterns book describes the aggregator pattern as follows: How do we combine the results of individual, but related messages so that they can be processed as a whole? Use a stateful filter, an Aggregator, to collect and store individual messages until a complete set of related messages has been received. Then, the Aggregator publishes a single message distilled from the individual messages. The Enterprise Integration Patterns website provides a description of the Aggregator pattern here. A common example of the need for an aggregator is in scenarios where a stream of messages needs to be combined into a daily batch to be sent to a legacy line-of-business application. The BizTalk Server EDI functionality provides support for batching messages in this way using a sequential convoy orchestration. Scenario The scenario for this implementation of the splitter and aggregator patterns is the sending and receiving of large messages using a Service Bus queue. In the current release, the Windows Azure Service Bus currently supports a maximum message size of 256 KB, with a maximum header size of 64 KB. This leaves a safe maximum body size of 192 KB. The BrokeredMessage class will support messages larger than 256 KB; in fact the Size property is of type long, implying that very large messages may be supported at some point in the future. The 256 KB size restriction is set in the service bus components that are deployed in the Windows Azure data centers. One of the ways of working around this size restriction is to split large messages into a sequence of smaller sub messages in the sending application, send them via a queue, and then reassemble them in the receiving application. This scenario will be used to demonstrate the pattern implementations. Implementation The splitter and aggregator will be used to provide functionality to send and receive large messages over the Windows Azure Service Bus. In order to make the implementations generic and reusable they will be implemented as a class library. The splitter will be implemented in the LargeMessageSender class and the aggregator in the LargeMessageReceiver class. A class diagram showing the two classes is shown below. Implementing the Splitter The splitter will take a large brokered message, and split the messages into a sequence of smaller sub-messages that can be transmitted over the service bus messaging entities. The LargeMessageSender class provides a Send method that takes a large brokered message as a parameter. The implementation of the class is shown below; console output has been added to provide details of the splitting operation. public class LargeMessageSender { private static int SubMessageBodySize = 192 * 1024; private QueueClient m_QueueClient; public LargeMessageSender(QueueClient queueClient) { m_QueueClient = queueClient; } public void Send(BrokeredMessage message) { // Calculate the number of sub messages required. long messageBodySize = message.Size; int nrSubMessages = (int)(messageBodySize / SubMessageBodySize); if (messageBodySize % SubMessageBodySize != 0) { nrSubMessages++; } // Create a unique session Id. string sessionId = Guid.NewGuid().ToString(); Console.WriteLine("Message session Id: " + sessionId); Console.Write("Sending {0} sub-messages", nrSubMessages); Stream bodyStream = message.GetBody<Stream>(); for (int streamOffest = 0; streamOffest < messageBodySize; streamOffest += SubMessageBodySize) { // Get the stream chunk from the large message long arraySize = (messageBodySize - streamOffest) > SubMessageBodySize ? SubMessageBodySize : messageBodySize - streamOffest; byte[] subMessageBytes = new byte[arraySize]; int result = bodyStream.Read(subMessageBytes, 0, (int)arraySize); MemoryStream subMessageStream = new MemoryStream(subMessageBytes); // Create a new message BrokeredMessage subMessage = new BrokeredMessage(subMessageStream, true); subMessage.SessionId = sessionId; // Send the message m_QueueClient.Send(subMessage); Console.Write("."); } Console.WriteLine("Done!"); }} The LargeMessageSender class is initialized with a QueueClient that is created by the sending application. When the large message is sent, the number of sub messages is calculated based on the size of the body of the large message. A unique session Id is created to allow the sub messages to be sent as a message session, this session Id will be used for correlation in the aggregator. A for loop in then used to create the sequence of sub messages by creating chunks of data from the stream of the large message. The sub messages are then sent to the queue using the QueueClient. As sessions are used to correlate the messages, the queue used for message exchange must be created with the RequiresSession property set to true. Implementing the Aggregator The aggregator will receive the sub messages in the message session that was created by the splitter, and combine them to form a single, large message. The aggregator is implemented in the LargeMessageReceiver class, with a Receive method that returns a BrokeredMessage. The implementation of the class is shown below; console output has been added to provide details of the splitting operation. public class LargeMessageReceiver { private QueueClient m_QueueClient; public LargeMessageReceiver(QueueClient queueClient) { m_QueueClient = queueClient; } public BrokeredMessage Receive() { // Create a memory stream to store the large message body. MemoryStream largeMessageStream = new MemoryStream(); // Accept a message session from the queue. MessageSession session = m_QueueClient.AcceptMessageSession(); Console.WriteLine("Message session Id: " + session.SessionId); Console.Write("Receiving sub messages"); while (true) { // Receive a sub message BrokeredMessage subMessage = session.Receive(TimeSpan.FromSeconds(5)); if (subMessage != null) { // Copy the sub message body to the large message stream. Stream subMessageStream = subMessage.GetBody<Stream>(); subMessageStream.CopyTo(largeMessageStream); // Mark the message as complete. subMessage.Complete(); Console.Write("."); } else { // The last message in the sequence is our completeness criteria. Console.WriteLine("Done!"); break; } } // Create an aggregated message from the large message stream. BrokeredMessage largeMessage = new BrokeredMessage(largeMessageStream, true); return largeMessage; } } The LargeMessageReceiver initialized using a QueueClient that is created by the receiving application. The receive method creates a memory stream that will be used to aggregate the large message body. The AcceptMessageSession method on the QueueClient is then called, which will wait for the first message in a message session to become available on the queue. As the AcceptMessageSession can throw a timeout exception if no message is available on the queue after 60 seconds, a real-world implementation should handle this accordingly. Once the message session as accepted, the sub messages in the session are received, and their message body streams copied to the memory stream. Once all the messages have been received, the memory stream is used to create a large message, that is then returned to the receiving application. Testing the Implementation The splitter and aggregator are tested by creating a message sender and message receiver application. The payload for the large message will be one of the webcast video files from http://www.cloudcasts.net/, the file size is 9,697 KB, well over the 256 KB threshold imposed by the Service Bus. As the splitter and aggregator are implemented in a separate class library, the code used in the sender and receiver console is fairly basic. The implementation of the main method of the sending application is shown below. static void Main(string[] args) { // Create a token provider with the relevant credentials. TokenProvider credentials = TokenProvider.CreateSharedSecretTokenProvider (AccountDetails.Name, AccountDetails.Key); // Create a URI for the serivce bus. Uri serviceBusUri = ServiceBusEnvironment.CreateServiceUri ("sb", AccountDetails.Namespace, string.Empty); // Create the MessagingFactory MessagingFactory factory = MessagingFactory.Create(serviceBusUri, credentials); // Use the MessagingFactory to create a queue client QueueClient queueClient = factory.CreateQueueClient(AccountDetails.QueueName); // Open the input file. FileStream fileStream = new FileStream(AccountDetails.TestFile, FileMode.Open); // Create a BrokeredMessage for the file. BrokeredMessage largeMessage = new BrokeredMessage(fileStream, true); Console.WriteLine("Sending: " + AccountDetails.TestFile); Console.WriteLine("Message body size: " + largeMessage.Size); Console.WriteLine(); // Send the message with a LargeMessageSender LargeMessageSender sender = new LargeMessageSender(queueClient); sender.Send(largeMessage); // Close the messaging facory. factory.Close(); } The implementation of the main method of the receiving application is shown below. static void Main(string[] args) { // Create a token provider with the relevant credentials. TokenProvider credentials = TokenProvider.CreateSharedSecretTokenProvider (AccountDetails.Name, AccountDetails.Key); // Create a URI for the serivce bus. Uri serviceBusUri = ServiceBusEnvironment.CreateServiceUri ("sb", AccountDetails.Namespace, string.Empty); // Create the MessagingFactory MessagingFactory factory = MessagingFactory.Create(serviceBusUri, credentials); // Use the MessagingFactory to create a queue client QueueClient queueClient = factory.CreateQueueClient(AccountDetails.QueueName); // Create a LargeMessageReceiver and receive the message. LargeMessageReceiver receiver = new LargeMessageReceiver(queueClient); BrokeredMessage largeMessage = receiver.Receive(); Console.WriteLine("Received message"); Console.WriteLine("Message body size: " + largeMessage.Size); string testFile = AccountDetails.TestFile.Replace(@"\In\", @"\Out\"); Console.WriteLine("Saving file: " + testFile); // Save the message body as a file. Stream largeMessageStream = largeMessage.GetBody<Stream>(); largeMessageStream.Seek(0, SeekOrigin.Begin); FileStream fileOut = new FileStream(testFile, FileMode.Create); largeMessageStream.CopyTo(fileOut); fileOut.Close(); Console.WriteLine("Done!"); } In order to test the application, the sending application is executed, which will use the LargeMessageSender class to split the message and place it on the queue. The output of the sender console is shown below. The console shows that the body size of the large message was 9,929,365 bytes, and the message was sent as a sequence of 51 sub messages. When the receiving application is executed the results are shown below. The console application shows that the aggregator has received the 51 messages from the message sequence that was creating in the sending application. The messages have been aggregated to form a massage with a body of 9,929,365 bytes, which is the same as the original large message. The message body is then saved as a file. Improvements to the Implementation The splitter and aggregator patterns in this implementation were created in order to show the usage of the patterns in a demo, which they do quite well. When implementing these patterns in a real-world scenario there are a number of improvements that could be made to the design. Copying Message Header Properties When sending a large message using these classes, it would be great if the message header properties in the message that was received were copied from the message that was sent. The sending application may well add information to the message context that will be required in the receiving application. When the sub messages are created in the splitter, the header properties in the first message could be set to the values in the original large message. The aggregator could then used the values from this first sub message to set the properties in the message header of the large message during the aggregation process. Using Asynchronous Methods The current implementation uses the synchronous send and receive methods of the QueueClient class. It would be much more performant to use the asynchronous methods, however doing so may well affect the sequence in which the sub messages are enqueued, which would require the implementation of a resequencer in the aggregator to restore the correct message sequence. Handling Exceptions In order to keep the code readable no exception handling was added to the implementations. In a real-world scenario exceptions should be handled accordingly.

Read the article
Scrum and Team Consolidation

- by John K. Hines

I’m still working my way through one of the more painful team consolidations of my career. One thing that’s made it hard was my assumption that the use of Agile methods and Scrum would make everything easy. Take three teams, make all work visible, track it, and presto: An efficient, functioning software development team. What I’ve come to realize is that the primary benefit of Scrum is that Scrum brings teams closer to their customers. Frequent meetings, short iterations, and phased deployments are all meant to keep the customer in the loop. It’s true that as teams become proficient with Scrum they tend to become more efficient. But I don’t think it’s true that Scrum automatically helps people work together. Instead, Scrum can point out when teams aren’t good at working together. And it really illustrates when teams, especially teams in sustaining mode, are reacting to their customers instead of innovating with them. At the moment we’ve inherited a huge backlog of tools, processes, and personalities. It’s up to us to sort them all out. Unfortunately, after 7 ½ months we’re still sorting. What I’d recommend for any blended team is to look at your current product lifecycles and work on a single lifecycle for all work. If you can’t objectively come up with one process, that’s a good indication that the new team might not be a good fit for being a single unit (which happens all the time in bigger companies). Go ahead & self-organize into sub-teams. Then repeat the process. If you can come up with a single process, tackle each piece and standardize all of them. Do this as soon as possible, as it can be uncomfortable. Standardize your requirements gathering and tracking, your exploration and technical analysis, your project planning, development standards, validation and sustaining processes. Standardize all of it. Make this your top priority, get it out of the way, and get back to work. Lastly, managers of blended teams should realize what I’m suggesting is a disruptive process. But you’ve just reorganized the team is already disrupted. Don’t pull the bandage off slowly and force the team through a prolonged transition phase, lowering their productivity over the long term. You can role model leadership to your team and drive a true consolidation. Destroy roadblocks, reassure those on your team who are afraid of change, and push forward to create something efficient and beautiful. Then use Scrum to reengage your customers in a way that they’ll love. Technorati tags: Scrum Scrum Process

Read the article
How does I/O work for large graph databases?

- by tjb1982

I should preface this by saying that I'm mostly a front end web developer, trained as a musician, but over the past few years I've been getting more and more into computer science. So one idea I have as a fun toy project to learn about data structures and C programming was to design and implement my own very simple database that would manage an adjacency list of posts. I don't want SQL (maybe I'll do my own query language? I'm just having fun). It should support ACID. It should be capable of storing 1TB let's say. So with that, I was trying to think of how a database even stores data, without regard to data structures necessarily. I'm working on linux, and I've read that in that world "everything is a file," including hardware (like /dev/*), so I think that that obviously has to apply to a database, too, and it clearly does--whether it's MySQL or PostgreSQL or Neo4j, the database itself is a collection of files you can see in the filesystem. That said, there would come a point in scale where loading the entire database into primary memory just wouldn't work, so it doesn't make sense to design it with that mindset (I assume). However, reading from secondary memory would be much slower and regardless some portion of the database has to be in primary memory in order for you to be able to do anything with it. I read this post: Why use a database instead of just saving your data to disk? And I found it difficult to understand how other databases, like SQLite or Neo4j, read and write from secondary memory and are still very fast (faster, it would seem, than simply writing files to the filesystem as the above question suggests). It seems the key is indexing. But even indexes need to be stored in secondary memory. They are inherently smaller than the database itself, but indexes in a very large database might be prohibitively large, too. So my question is how is I/O generally done with large databases like the one I described above that would be at least 1TB storing a big adjacency list? If indexing is more or less the answer, how exactly does indexing work--what data structures should be involved?

Read the article
Opinion: IT Strangled by Overspecialization

What happened to the old "sysadmin" of just a few years ago? We've split what used to be the sysadmin into application teams, server teams, storage teams, and network teams. Now look at what we've done -- knowledge is so decentralized we must invent new roles to act as liaisons between all the IT groups.

Read the article
Why do large IT projects tend to fail or have big cost/schedule overruns?

- by Pratik

I always read about large scale transformation or integration project that are total or almost total disaster. Even if they somehow manage to succeed the cost and schedule blow out is enormous. What is the real reason behind large projects being more prone to failure. Can agile be used in these sort of projects or traditional approach is still the best. One example from Australia is the Queensland Payroll project where they changed test success criteria to deliver the project. See some more failed projects in this SO question Have you got any personal experience to share?

Read the article
How can I create multiple identical AWS EC2 server instances with large amounts of persistent data?

- by mojones

I have a CPU-intensive data-processing application that I want to run across many (~100,000) input files. The application needs a large (~20GB) data file in order to run. What I would like to do is create an EC2 machine image that has my application and associated data files installed boot up a large number (e.g. 100) of instances of this image split my input files up into 100 batches and send one batch to be processed on each instance I am having trouble figuring out the best way to ensure that each instance has access to the large data file. The data file is too big to fit on the root filesystem of an AMI. I could use Block Storage, but a given Block Storage volume can only be attached to a single instance, so I would need 100 clones. Is there some way to create a custom image that has more space on the root filsystem so that I can include my large data file? Or is there a better way to tackle this problem?

Read the article

< Previous Page | 42 43 44 45 46 47 48 49 50 51 52 53 | Next Page >