Search Results

Search found 11675 results on 467 pages for 'parallel testing'.

Page 140/467 | < Previous Page | 136 137 138 139 140 141 142 143 144 145 146 147  | Next Page >

  • Microsoft Technical Computing

    - by Daniel Moth
    In the past I have described the team I belong to here at Microsoft (Parallel Computing Platform) in terms of contributing to Visual Studio and related products, e.g. .NET Framework. To be more precise, our team is part of the Technical Computing group, which is still part of the Developer Division. This was officially announced externally earlier this month in an exec email (from Bob Muglia, the president of STB, to which DevDiv belongs). Here is an extract: "… As we build the Technical Computing initiative, we will invest in three core areas: 1. Technical computing to the cloud: Microsoft will play a leading role in bringing technical computing power to scientists, engineers and analysts through the cloud. Existing high-performance computing users will benefit from the ability to augment their on-premises systems with cloud resources that enable ‘just-in-time’ processing. This platform will help ensure processing resources are available whenever they are needed—reliably, consistently and quickly. 2. Simplify parallel development: Today, computers are shipping with more processing power than ever, including multiple cores, but most modern software only uses a small amount of the available processing power. Parallel programs are extremely difficult to write, test and trouble shoot. However, a consistent model for parallel programming can help more developers unlock the tremendous power in today’s modern computers and enable a new generation of technical computing. We are delivering new tools to automate and simplify writing software through parallel processing from the desktop… to the cluster… to the cloud. 3. Develop powerful new technical computing tools and applications: We know scientists, engineers and analysts are pushing common tools (i.e., spreadsheets and databases) to the limits with complex, data-intensive models. They need easy access to more computing power and simplified tools to increase the speed of their work. 
We are building a platform to do this. Our development efforts will yield new, easy-to-use tools and applications that automate data acquisition, modeling, simulation, visualization, workflow and collaboration. This will allow them to spend more time on their work and less time wrestling with complicated technology. …" Our Parallel Computing Platform team is directly responsible for item #2, and we work very closely with the teams delivering items #1 and #3. At the same time as the exec email, our marketing team unveiled a website with interviews that I invite you to check out: Modeling the World. Comments about this post welcome at the original blog.

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end-to-end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. 
/*+ parallel(my_oracle_table,8) */ /*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other words, it doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). 
In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when you are first seeding tables in a new Oracle database, or when there is a time at which normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) 
Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). 
It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts it in some other location file, and so on). The tool's ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into 160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB in size. 
The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. 
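
The balancing behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the OSCH External Table tool itself: the post says the tool is greedy and takes the biggest file first, while the choice of "give it to the currently lightest location file" is an assumption, and the names balance, skew, and waves are hypothetical.

```python
import heapq
from math import ceil


def balance(file_sizes_gb, n_location_files):
    """Largest-first greedy: hand each file to the lightest location file so far.

    The "lightest so far" rule is an assumption; the post only says "greedy".
    """
    heap = [(0, i) for i in range(n_location_files)]  # (total size, index)
    buckets = [[] for _ in range(n_location_files)]
    for size in sorted(file_sizes_gb, reverse=True):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(size)
        heapq.heappush(heap, (total + size, idx))
    return buckets


def skew(buckets):
    """Ratio of the heaviest to the lightest location-file workload."""
    totals = [sum(b) for b in buckets]
    return float("inf") if min(totals) == 0 else max(totals) / min(totals)


def waves(n_location_files, dop):
    """Rounds of work needed when each PQ slave takes one location file at a time."""
    return ceil(n_location_files / dop)


lumpy = balance([900] + [100] * 7, 8)   # one 900GB file, seven 100GB files
even = balance([10] * 160, 8)           # the same 1600GB as 160 files of 10GB
print(skew(lumpy))                # 9.0
print(skew(even))                 # 1.0
print(waves(9, 8), waves(16, 8))  # 2 2
```

The last line mirrors Rule 3: with a DOP of 8, nine location files need as many rounds as sixteen, but in the second round only one slave has work.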

    Read the article

  • Is it possible to do A/B testing by page rather than by individual?

    - by mojones
    Let's say I have a simple ecommerce site that sells 100 different t-shirt designs. I want to do some A/B testing to optimise my sales. Let's say I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in session, cookies etc). Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)? This approach would seem to have some advantages: I would not have to worry about keeping the user experience consistent, since a given page (e.g. www.example.com/viewdesign?id=6) would always return the same html. If I were to test different prices, it would be far less distressing to the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO; my suspicion is that Google would "prefer" that it always sees the same html when crawling a page. Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?
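
A minimal sketch of the per-page assignment described in the question, assuming designs are identified by an integer id. The function names and the experiment label are hypothetical, and pooling sales and pageviews per variant is one possible way to aggregate the per-design rates, not something the question specifies.

```python
import hashlib


def variant_for_design(design_id, experiment="buy-button"):
    """Deterministically map a design id to 'A' or 'B'.

    Hashing means the same page always renders the same button, with no
    session or cookie state needed.
    """
    digest = hashlib.sha256(f"{experiment}:{design_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def conversion_rates(stats):
    """stats maps design_id -> (sales, pageviews); returns pooled rate per variant."""
    totals = {"A": [0, 0], "B": [0, 0]}
    for design_id, (sales, views) in stats.items():
        bucket = totals[variant_for_design(design_id)]
        bucket[0] += sales
        bucket[1] += views
    return {v: (s / n if n else 0.0) for v, (s, n) in totals.items()}


# Hypothetical numbers, for illustration only.
print(conversion_rates({6: (3, 120), 7: (5, 100), 8: (1, 90)}))
```

With only 100 designs, the unit of randomisation is the design rather than the visitor, so differences between the designs themselves feed directly into the comparison.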

    Read the article

  • Unit testing an MVC action method with a Cache dependency?

    - by Steve
    I’m relatively new to testing and MVC and came across a sticking point today. I’m attempting to test an action method that has a dependency on HttpContext.Current.Cache and wanted to know the best practice for achieving the “low coupling” to allow for easy testing. Here's what I've got so far... public class CacheHandler : ICacheHandler { public IList<Section3ListItem> StateList { get { return (List<Section3ListItem>)HttpContext.Current.Cache["StateList"]; } set { HttpContext.Current.Cache["StateList"] = value; } } ... I then access it like such... I'm using Castle for my IoC. public class ProfileController : ControllerBase { private readonly ISection3Repository _repository; private readonly ICacheHandler _cache; public ProfileController(ISection3Repository repository, ICacheHandler cacheHandler) { _repository = repository; _cache = cacheHandler; } [UserIdFilter] public ActionResult PersonalInfo(Guid userId) { if (_cache.StateList == null) _cache.StateList = _repository.GetLookupValues((int)ELookupKey.States).ToList(); ... Then in my unit tests I am able to mock up ICacheHandler. Would this be considered a 'best practice' and does anyone have any suggestions for other approaches? Thanks in advance. Cheers
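
The pattern in the question (inject a cache abstraction, then mock it in tests) can be sketched as follows. This is a Python rendering for brevity, not the asker's C# code: the class and method names are hypothetical stand-ins for ProfileController, ICacheHandler, and ISection3Repository.

```python
from unittest.mock import Mock


class ProfileController:
    """Depends on an injected cache and repository instead of a global cache."""

    def __init__(self, repository, cache):
        self._repository = repository
        self._cache = cache

    def personal_info(self):
        # Populate the cache on a miss, then serve from it.
        if self._cache.state_list is None:
            self._cache.state_list = self._repository.get_states()
        return self._cache.state_list


def test_populates_cache_on_miss():
    cache = Mock(state_list=None)
    repository = Mock()
    repository.get_states.return_value = ["AL", "AK"]
    controller = ProfileController(repository, cache)
    assert controller.personal_info() == ["AL", "AK"]
    repository.get_states.assert_called_once()


def test_uses_cache_on_hit():
    cache = Mock(state_list=["AL"])
    repository = Mock()
    controller = ProfileController(repository, cache)
    assert controller.personal_info() == ["AL"]
    repository.get_states.assert_not_called()


test_populates_cache_on_miss()
test_uses_cache_on_hit()
```

Because the controller only sees the injected object, neither test needs a running web context.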

    Read the article

  • Does "for" in .Net Framework 4.0 execute loops in parallel? Or why is the total not the sum of the p

    - by Shiraz Bhaiji
    I am writing code to performance test a web site. I have the following code: string url = "http://xxxxxx"; System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch(); System.Diagnostics.Stopwatch totalTime = new System.Diagnostics.Stopwatch(); totalTime.Start(); for (int i = 0; i < 10; i++) { stopwatch.Start(); WebRequest request = HttpWebRequest.Create(url); WebResponse webResponse = request.GetResponse(); webResponse.Close(); stopwatch.Stop(); textBox1.Text += "Time Taken " + i.ToString() + " = " + stopwatch.Elapsed.Milliseconds.ToString() + Environment.NewLine; stopwatch.Reset(); } totalTime.Stop(); textBox1.Text += "Total Time Taken = " + totalTime.Elapsed.Milliseconds.ToString() + Environment.NewLine; Which is giving the following result: Time Taken 0 = 88 Time Taken 1 = 161 Time Taken 2 = 218 Time Taken 3 = 417 Time Taken 4 = 236 Time Taken 5 = 217 Time Taken 6 = 217 Time Taken 7 = 218 Time Taken 8 = 409 Time Taken 9 = 48 Total Time Taken = 257 I had expected the total time to be the sum of the individual times. Can anybody see why it is not?
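
One possible explanation, offered as a sketch rather than a confirmed diagnosis: in .NET, TimeSpan.Milliseconds is only the milliseconds component of the interval (0 to 999), whereas TimeSpan.TotalMilliseconds is the whole duration. The per-request readings above sum to 2229, so a printed total of 257 would be consistent with a real total of roughly 2257 ms, assuming none of the individual readings wrapped as well. Python's timedelta has the same component-versus-total distinction, which the snippet below uses to illustrate it; the 2257 figure is a hypothetical total chosen to match the output.

```python
from datetime import timedelta

per_request_ms = [88, 161, 218, 417, 236, 217, 217, 218, 409, 48]


def ms_component(elapsed):
    """Milliseconds part only (0 to 999), like a component property."""
    return elapsed.microseconds // 1000


def total_ms(elapsed):
    """Whole duration expressed in milliseconds."""
    return elapsed / timedelta(milliseconds=1)


total = timedelta(milliseconds=2257)  # hypothetical real total
print(sum(per_request_ms))   # 2229
print(ms_component(total))   # 257
print(total_ms(total))       # 2257.0
```

Under that reading the loop is sequential after all, and the displayed total has simply dropped the whole seconds.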

    Read the article

  • How to execute a program on PostBuild event in parallel?

    - by John
    I managed to set the compiler to execute another program when the project is built/run with the following directive in project options: call program.exe param1 param2 The problem is that the compiler executes "program.exe" and waits for it to terminate, and only THEN is the project executable run. What I ask: How do I set the compiler to run both executables in parallel, without waiting for the one in the PostBuild event to terminate? Thanks in advance

    Read the article

  • SSIS - Parallel Execution of Tasks - How efficient is it?

    - by Randy Minder
    I am building an SSIS package that will contain dozens of Sequence tasks. Each Sequence task will contain three tasks: one to truncate a destination table and remove indexes on the table, another to import data from a source table, and a third to add back indexes to the destination table. My question is this. I currently have nine of these Sequence tasks built, and none are dependent on any of the others. When I execute the package, SSIS seems to do a pretty good job of determining which tasks in which Sequence to execute, and the order, by the way, appears to be quite random. As I continue adding more Sequences, should I attempt to be smarter about how SSIS should execute these Sequences, or is SSIS smart enough to do it itself? Thanks.

    Read the article

  • SQLAlchemy - SQLite for testing and Postgresql for development - How to port?

    - by StackUnderflow
    I want to use an SQLite in-memory database for all my testing and Postgresql for my development/production server. But the SQL syntax is not the same in both databases. For example, SQLite has autoincrement and Postgresql has serial. Is it easy to port the SQL script from SQLite to Postgresql? What are your solutions? If you want me to use standard SQL, how should I go about generating primary keys in both databases?

    Read the article

  • iPhone:How to prepare app build to my client for testing?

    - by user187532
    Hi iPhone experts, I am a registered Apple developer. I am developing an iPhone application for my client. If I want to give my app build to my client for testing, do I need to create a build using my ad hoc provisioning profile and send both the build and the ad hoc provisioning profile to my client? If yes, how can they install my build on their iPhone devices? Could someone guide me please? Thanks in advance.

    Read the article

  • In Perl, how can I wait for threads to end in parallel?

    - by Pmarcoen
    I have a Perl script that launches 2 threads, one for each processor. I need it to wait for a thread to end; when one thread ends, a new one should be spawned. It seems that the join method blocks the rest of the program, so the second thread can't end until everything the first thread does is done, which sort of defeats its purpose. I tried the is_joinable method but that doesn't seem to do it either. Here is some of my code: use threads; use threads::shared; @file_list = @ARGV; #Our file list $nofiles = $#file_list + 1; #Real number of files $currfile = 1; #Current number of file to process my %MSG : shared; #shared hash $thr0 = threads->new(\&process, shift(@file_list)); $currfile++; $thr1 = threads->new(\&process, shift(@file_list)); $currfile++; while(1){ if ($thr0->is_joinable()) { $thr0->join; #check if there are files left to process if($currfile <= $nofiles){ $thr0 = threads->new(\&process, shift(@file_list)); $currfile++; } } if ($thr1->is_joinable()) { $thr1->join; #check if there are files left to process if($currfile <= $nofiles){ $thr1 = threads->new(\&process, shift(@file_list)); $currfile++; } } } sub process{ print "Opening $currfile of $nofiles\n"; #do some stuff if(some condition){ lock(%MSG); #write stuff to hash } print "Closing $currfile of $nofiles\n"; } The output of this is: Opening 1 of 4 Opening 2 of 4 Closing 1 of 4 Opening 3 of 4 Closing 3 of 4 Opening 4 of 4 Closing 2 of 4 Closing 4 of 4
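
The goal described in the question (keep two workers busy and start the next file as soon as either one finishes) is the shape of a fixed-size worker pool. The sketch below shows that shape in Python rather than Perl, so it illustrates the scheduling pattern and is not a fix for the script above. The names process, run_pool, and messages are hypothetical, and the body of process is placeholder work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

messages = {}                    # stands in for the shared %MSG hash
messages_lock = threading.Lock()


def process(index, total, path):
    """Placeholder for the per-file work; records a result under a lock."""
    with messages_lock:
        messages[path] = f"processed {index} of {total}"
    return path


def run_pool(file_list, workers=2):
    """Run process() over every file with at most `workers` threads at once."""
    total = len(file_list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process, i, total, path)
            for i, path in enumerate(file_list, start=1)
        ]
        # as_completed yields each future when it finishes, in any order.
        return [f.result() for f in as_completed(futures)]


done = run_pool(["a.txt", "b.txt", "c.txt", "d.txt"])
print(sorted(done))
```

Each task receives its own index as an argument, so the "n of total" label does not depend on a counter shared with the main thread.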

    Read the article

  • SSIS: Way to handle hot folder items in parallel?

    - by Dr. Zim
    We have eight Xeon (i7) cores and 16 gig of RAM on our SSIS box. We have about 200 image files we want to convert using a command line utility every day. Currently the process uses Adobe Photoshop and droplets (very manual, taking upwards of two hours a day). Using SSIS hot folders, is there a way to execute up to eight conversions at once? Is there any way to tell that a process has completed, or to execute code upon its completion?

    Read the article

  • How (and if) to write a single-consumer queue using the task parallel library?

    - by Eric
    I've heard a bunch of podcasts recently about the TPL in .NET 4.0. Most of them describe background activities like downloading images or doing a computation, using tasks so that the work doesn't interfere with a GUI thread. Most of the code I work on has more of a multiple-producer / single-consumer flavor, where work items from multiple sources must be queued and then processed in order. One example would be logging, where log lines from multiple threads are sequentialized into a single queue for eventual writing to a file or database. All the records from any single source must remain in order, and records from the same moment in time should be "close" to each other in the eventual output. So multiple threads or tasks or whatever are all invoking a queuer: lock( _queue ) // or use a lock-free queue! { _queue.enqueue( some_work ); _queueSemaphore.Release(); } And a dedicated worker thread processes the queue: while( _queueSemaphore.WaitOne() ) { lock( _queue ) { some_work = _queue.dequeue(); } deal_with( some_work ); } It's always seemed reasonable to dedicate a worker thread for the consumer side of these tasks. Should I write future programs using some construct from the TPL instead? Which one? Why?
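
The multiple-producer / single-consumer arrangement in the question can be sketched with a thread-safe blocking queue. The example is in Python and does not use the TPL, so it shows the pattern rather than answering which TPL construct to choose; in .NET 4 the closest counterpart is probably BlockingCollection<T>, which is a suggestion and not something the question mentions. All names here are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # tells the consumer that no more work is coming


def consumer(work_queue, sink):
    """Single worker: drains the queue in arrival order."""
    while True:
        item = work_queue.get()   # blocks until an item is available
        if item is SENTINEL:
            break
        sink.append(item)         # stands in for deal_with(some_work)


def producer(name, count, work_queue):
    for i in range(count):
        work_queue.put((name, i))


def run_pipeline(producer_names, count):
    work_queue = queue.Queue()
    sink = []
    worker = threading.Thread(target=consumer, args=(work_queue, sink))
    worker.start()
    producers = [
        threading.Thread(target=producer, args=(name, count, work_queue))
        for name in producer_names
    ]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    work_queue.put(SENTINEL)      # enqueue only after every producer is done
    worker.join()
    return sink


log = run_pipeline(["ui", "net", "db"], 5)
print(len(log))  # 15
```

The queue takes over the roles of both the lock and the semaphore in the hand-written version, and records from any one producer stay in the order that producer enqueued them.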

    Read the article

  • Are parallel calls to send/recv on the same socket valid?

    - by Jay
    Can we call send from one thread and recv from another on the same socket? Can we call send multiple times in parallel from different threads on the same socket? I know that a good design should avoid this, but I am not clear on how these system APIs will behave, and I am unable to find good documentation on it. Any pointers in this direction will be helpful.

    Read the article
