Search Results

Search found 9667 results on 387 pages for 'parallel loop'.

Page 82/387 | < Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89  | Next Page >

  • Parallelism in .NET – Part 8, PLINQ’s ForAll Method

    - by Reed
    Parallel LINQ extends LINQ to Objects, and is typically very similar.  However, as I previously discussed, there are some differences.  Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits.  Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data.  LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar.  Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation.  However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>.  Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects.  It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea.  For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized.  The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles.  Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method.  Without ForAll, we would take a fairly serious performance hit in many situations.  Often, we need to perform some filtering or grouping, then perform an action using the results of our filter.  Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized.  However, there is still a large bottleneck in place here.  The problem lies with my comment “perform an action once the filter completes”.  Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes.  Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element.  By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play.  By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll.  This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task.  Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

    Read the article

  • Microsoft Technical Computing

    - by Daniel Moth
    In the past I have described the team I belong to here at Microsoft (Parallel Computing Platform) in terms of contributing to Visual Studio and related products, e.g. .NET Framework. To be more precise, our team is part of the Technical Computing group, which is still part of the Developer Division. This was officially announced externally earlier this month in an exec email (from Bob Muglia, the president of STB, to which DevDiv belongs). Here is an extract: "… As we build the Technical Computing initiative, we will invest in three core areas: 1. Technical computing to the cloud: Microsoft will play a leading role in bringing technical computing power to scientists, engineers and analysts through the cloud. Existing high- performance computing users will benefit from the ability to augment their on-premises systems with cloud resources that enable ‘just-in-time’ processing. This platform will help ensure processing resources are available whenever they are needed—reliably, consistently and quickly. 2. Simplify parallel development: Today, computers are shipping with more processing power than ever, including multiple cores, but most modern software only uses a small amount of the available processing power. Parallel programs are extremely difficult to write, test and trouble shoot. However, a consistent model for parallel programming can help more developers unlock the tremendous power in today’s modern computers and enable a new generation of technical computing. We are delivering new tools to automate and simplify writing software through parallel processing from the desktop… to the cluster… to the cloud. 3. Develop powerful new technical computing tools and applications: We know scientists, engineers and analysts are pushing common tools (i.e., spreadsheets and databases) to the limits with complex, data-intensive models. They need easy access to more computing power and simplified tools to increase the speed of their work. We are building a platform to do this. Our development efforts will yield new, easy-to-use tools and applications that automate data acquisition, modeling, simulation, visualization, workflow and collaboration. This will allow them to spend more time on their work and less time wrestling with complicated technology. …" Our Parallel Computing Platform team is directly responsible for item #2, and we work very closely with the teams delivering items #1 and #3. At the same time as the exec email, our marketing team unveiled a website with interviews that I invite you to check out: Modeling the World. Comments about this post welcome at the original blog.

    Read the article

  • Has the hardware in my modem gone bad?

    - by Tyler Scott
    I contacted CenturyLink about my modem recently and received useless and unrelated information. The problem seems to be that the modem will no longer save settings, the web interface is unusable except in internet explorer for some reason, and the modem keeps resetting. CenturyLink claimed it had to do with signal strength but I checked and it is currently between good and outstanding according to this. All of the lights remain green even when it starts acting up and I lose internet and shortly before it crashes and reboots. Does anyone have any idea what is going on or what I can do to fix it? (Asking CenturyLink again is obviously not going to help.) Update 1: Accessing the syslog from the web interface causes a crash. After it reboots, the log looks like as follows: 01/01/1970 12:01:29 AM Ethernet Ethernet client connected ,ip(192.168.0.2), mac(1c:6f:65:4c:6d:3b) 01/01/1970 12:01:38 AM Wireless 802.11 client connected ,ip(192.168.0.18), mac(d0:df:c7:c2:73:ca) 01/01/1970 12:01:41 AM System Event Line 0: VDSL2 link up, Bearer 0, us=20128, ds=40127 01/01/1970 12:01:43 AM dhcp6s[2028] dhcp6_ctl_authinit: failed to open /etc/dhcp6sctlkey: No such file or directory 01/01/1970 12:01:50 AM dhcp6s[2469] dhcp6_ctl_authinit: failed to open /etc/dhcp6sctlkey: No such file or directory 01/01/1970 12:01:52 AM radvd[2306] poll error: Interrupted system call 01/01/1970 12:01:56 AM PPP Link PPP server detected. 01/01/1970 12:01:56 AM PPP Link PPP session established. 01/01/1970 12:01:56 AM PPP Link PPP LCP UP. 01/01/1970 12:01:56 AM System Event Received valid IP address from server. Connection UP. 06/05/2014 08:16:01 AM radvd[2511] poll error: Interrupted system call 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:03 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:04 AM System Event Dead loop on virtual device tun6rd, fix it urgently! 06/05/2014 08:16:04 AM dhcp6s[3236] dhcp6_ctl_authinit: failed to open /etc/dhcp6sctlkey: No such file or directory 06/05/2014 08:16:08 AM Wireless 802.11 client connected ,ip(192.168.0.7), mac(44:6d:57:c4:d7:08) I also get it to crash on various other pages. I am guessing the web server is unstable.

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

    Read the article

  • Loop through all cells in Xceed DataGrid for WPF?

    - by ewall
    I am changing the background color of the cells when the user has made an edit. I would like to return all cells to normal colors when the changes are saved (or reverted). It's easy enough to set the cell's original background color (as stored in the parent row). But I can't figure out how to loop through all the cells in the table to reset them. I found an article in the Xceed Knowledge Base called "How to iterate through the grid's rows"... which you would think would be perfect, right? Wrong; the properties (or methods) like .DataRows, .FixedHeaderRows, etc. mentioned in the article are from an older/defunct Xceed product. This forum thread recommends using the DataGrid's .Items property, which in my case returns a collection of System.Data.DataRowViews... but I can't find any way to cast that (or any of its related elements) up to the Xceed.Wpf.DataGrid.DataCells I need to change the background color. In short, how do I loop through the rows and cells so I can reset the background property?

    Read the article

  • What is the most platform- and Python-version-independent way to make a fast loop for use in Python?

    - by Statto
    I'm writing a scientific application in Python with a very processor-intensive loop at its core. I would like to optimise this as far as possible, at minimum inconvenience to end users, who will probably use it as an uncompiled collection of Python scripts, and will be using Windows, Mac, and (mainly Ubuntu) Linux. It is currently written in Python with a dash of NumPy, and I've included the code below. Is there a solution which would be reasonably fast which would not require compilation? This would seem to be the easiest way to maintain platform-independence. If using something like Pyrex, which does require compilation, is there an easy way to bundle many modules and have Python choose between them depending on detected OS and Python version? Is there an easy way to build the collection of modules without needing access to every system with every version of Python? Does one method lend itself particularly to multi-processor optimisation? (If you're interested, the loop is to calculate the magnetic field at a given point inside a crystal by adding together the contributions of a large number of nearby magnetic ions, treated as tiny bar magnets. Basically, a massive sum of these.) # calculate_dipole # ------------------------- # calculate_dipole works out the dipole field at a given point within the crystal unit cell # --- # INPUT # mu = position at which to calculate the dipole field # r_i = array of atomic positions # mom_i = corresponding array of magnetic moments # --- # OUTPUT # B = the B-field at this point def calculate_dipole(mu, r_i, mom_i): relative = mu - r_i r_unit = unit_vectors(relative) #4pi / mu0 (at the front of the dipole eqn) A = 1e-7 #initalise dipole field B = zeros(3,float) for i in range(len(relative)): #work out the dipole field and add it to the estimate so far B += A*(3*dot(mom_i[i],r_unit[i])*r_unit[i] - mom_i[i]) / sqrt(dot(relative[i],relative[i]))**3 return B

    Read the article

  • How does loop address alignment affect the speed on Intel x86_64?

    - by Alexander Gololobov
    I'm seeing 15% performance degradation of the same C++ code compiled to exactly same machine instructions but located on differently aligned addresses. When my tiny main loop starts at 0x415220 it's faster then when it is at 0x415250. I'm running this on Intel Core2 Duo. I use gcc 4.4.5 on x86_64 Ubuntu. Can anybody explain the cause of slowdown and how I can force gcc to optimally align the loop? Here is the disassembly for both cases with profiler annotation: 415220 576 12.56% |XXXXXXXXXXXXXX 48 c1 eb 08 shr $0x8,%rbx 415224 110 2.40% |XX 0f b6 c3 movzbl %bl,%eax 415227 0.00% | 41 0f b6 04 00 movzbl (%r8,%rax,1),%eax 41522c 40 0.87% | 48 8b 04 c1 mov (%rcx,%rax,8),%rax 415230 806 17.58% |XXXXXXXXXXXXXXXXXXX 4c 63 f8 movslq %eax,%r15 415233 186 4.06% |XXXX 48 c1 e8 20 shr $0x20,%rax 415237 102 2.22% |XX 4c 01 f9 add %r15,%rcx 41523a 414 9.03% |XXXXXXXXXX a8 0f test $0xf,%al 41523c 680 14.83% |XXXXXXXXXXXXXXXX 74 45 je 415283 ::Run(char const*, char const*)+0x4b3 41523e 0.00% | 41 89 c7 mov %eax,%r15d 415241 0.00% | 41 83 e7 01 and $0x1,%r15d 415245 0.00% | 41 83 ff 01 cmp $0x1,%r15d 415249 0.00% | 41 89 c7 mov %eax,%r15d 415250 679 13.05% |XXXXXXXXXXXXXXXX 48 c1 eb 08 shr $0x8,%rbx 415254 124 2.38% |XX 0f b6 c3 movzbl %bl,%eax 415257 0.00% | 41 0f b6 04 00 movzbl (%r8,%rax,1),%eax 41525c 43 0.83% |X 48 8b 04 c1 mov (%rcx,%rax,8),%rax 415260 828 15.91% |XXXXXXXXXXXXXXXXXXX 4c 63 f8 movslq %eax,%r15 415263 388 7.46% |XXXXXXXXX 48 c1 e8 20 shr $0x20,%rax 415267 141 2.71% |XXX 4c 01 f9 add %r15,%rcx 41526a 634 12.18% |XXXXXXXXXXXXXXX a8 0f test $0xf,%al 41526c 749 14.39% |XXXXXXXXXXXXXXXXXX 74 45 je 4152b3 ::Run(char const*, char const*)+0x4c3 41526e 0.00% | 41 89 c7 mov %eax,%r15d 415271 0.00% | 41 83 e7 01 and $0x1,%r15d 415275 0.00% | 41 83 ff 01 cmp $0x1,%r15d 415279 0.00% | 41 89 c7 mov %eax,%r15d

    Read the article

  • How to use a loop to download HTML with paging?

    - by Nai
    I want to loop through this URL and download the HTML. https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" +"&start=" + i start and num controls the paging of the URL. So if &start=2, and &num=10, it will scrape 10 results from page 2. Given that Google has a max limit of num = 10, how can I write a loop that loops through the HTML and scrape the results for the first 10 pages? This is what I have so far which just scrapes the first page. //input search term Console.WriteLine("What is your search query?:"); string searchTerm = Console.ReadLine(); //concantenate the strings using + symbol to make it URL friendly for google string searchTermFormat = searchTerm.Replace(" ", "+"); //create a new instance of Webclient and use DownloadString method from the Webclient class to extract download html WebClient client = new WebClient(); int i = 1; string Json = client.DownloadString("https://www.googleapis.com/customsearch/v1?key=AIzaSyAAoPQprb6aAV-AfuVjoCdErKTiJHn-4uI&cx=017576662512468239146:omuauf_lfve&q=" + searchTermFormat + "&num=10" + "&start=" + i); //create a new instance of JavaScriptSerializer and deserialise the desired content JavaScriptSerializer js = new JavaScriptSerializer(); GoogleSearchResults results = js.Deserialize<GoogleSearchResults>(Json); //output results to console Console.WriteLine(js.Serialize(results)); Console.ReadLine();

    Read the article

  • Counting entries in a list of dictionaries: for loop vs. list comprehension with map(itemgetter)

    - by Dennis Williamson
    In a Python program I'm writing I've compared using a for loop and increment variables versus list comprehension with map(itemgetter) and len() when counting entries in dictionaries which are in a list. It takes the same time using a each method. Am I doing something wrong or is there a better approach? Here is a greatly simplified and shortened data structure: list = [ {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'}, {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'}, {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'}, {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'}, {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'} ] Here is the for loop version: #!/usr/bin/env python # Python 2.6 # count the entries where key1 is True # keep a separate count for the subset that also have key2 True key1 = key2 = 0 for dictionary in list: if dictionary["key1"]: key1 += 1 if dictionary["key2"]: key2 += 1 print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2) Output for the data above: Counts: key1: 3, subset key2: 2 Here is the other, perhaps more Pythonic, version: #!/usr/bin/env python # Python 2.6 # count the entries where key1 is True # keep a separate count for the subset that also have key2 True from operator import itemgetter KEY1 = 0 KEY2 = 1 getentries = itemgetter("key1", "key2") entries = map(getentries, list) key1 = len([x for x in entries if x[KEY1]]) key2 = len([x for x in entries if x[KEY1] and x[KEY2]]) print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2) Output for the data above (same as before): Counts: key1: 3, subset key2: 2 I'm a tiny bit surprised these take the same amount of time. I wonder if there's something faster. I'm sure I'm overlooking something simple. One alternative I've considered is loading the data into a database and doing SQL queries, but the data doesn't need to persist and I'd have to profile the overhead of the data transfer, etc., and a database may not always be available. I have no control over the original form of the data. The code above is not going for style points.

    Read the article

  • Which is faster for large "for" loop: function call or inline coding?

    - by zaplec
    Hi, I have programmed an embedded software (using C of course) and now I'm considering ways to improve the running time of the system. The most important single module in my system is one very large nested for loop module. That module consists of two nested for loops that loops max 122500 times. That's not very much yet, but the problem is that inside that nested for loop I have a function call to a function that is in another source file. That specific function consists mostly of two another nested for loops which loops always 22500 times. So now I have to make a function call 122500 times. I have made that function that is to be called a lot lighter and shorter (yet still works as it should) and now I started to think that would it be faster to rip off that function call and write that process directly inside those first two for loops? The processor in that system is ARM7TDMI and its frequency is 55MHz. The system itself isn't very time critical so it doesn't have to be real time capable. However the faster it can process its duties the better. Also would it be also faster to use while loops instead of fors? And any piece of advice about how to improve the running time is appreciated. -zaplec

    Read the article

  • Why Enumerable.Range is faster than a direct yield loop?

    - by Morgan Cheng
    Below code is checking performance of three different ways to do same solution. public static void Main(string[] args) { // for loop { Stopwatch sw = Stopwatch.StartNew(); int accumulator = 0; for (int i = 1; i <= 100000000; ++i) { accumulator += i; } sw.Stop(); Console.WriteLine("time = {0}; result = {1}", sw.ElapsedMilliseconds, accumulator); } //Enumerable.Range { Stopwatch sw = Stopwatch.StartNew(); var ret = Enumerable.Range(1, 100000000).Aggregate(0, (accumulator, n) => accumulator + n); sw.Stop(); Console.WriteLine("time = {0}; result = {1}", sw.ElapsedMilliseconds, ret); } //self-made IEnumerable<int> { Stopwatch sw = Stopwatch.StartNew(); var ret = GetIntRange(1, 100000000).Aggregate(0, (accumulator, n) => accumulator + n); sw.Stop(); Console.WriteLine("time = {0}; result = {1}", sw.ElapsedMilliseconds, ret); } } private static IEnumerable<int> GetIntRange(int start, int count) { int end = start + count; for (int i = start; i < end; ++i) { yield return i; } } } The result is like this: time = 306; result = 987459712 time = 1301; result = 987459712 time = 2860; result = 987459712 It is not surprising that "for loop" is faster than the other two solutions, because Enumerable.Aggregate takes more method invocations. However, it really surprises that "Enumerable.Range" is faster than the "self-made IEnumerable". I thought that Enumerable.Range will take more overhead than the simple GetIntRange method. What is the possible reason for this?

    Read the article

  • How do I get a less than in a javascript for loop in XSL to work?

    - by Kyle
    I am using CDATA to escape the script but in IE8's debugger I still get this message: "Expected ')'" in the for loop conditions. I am assuming it still thinks that the ; in the &lt; generated by CDATA is ending the loop conditions. Original script in my XSL template: <script type="text/javascript" language="javascript"> <![CDATA[ function submitform(form){ var oErrorArray = new Array(); for (i=0;i<form.length;i++) eval("oErrorArray["+i+"]=oError"+i); var goForm = true; for(i=0;i<form.length;i++) { oErrorArray[i].innerHTML = ""; if(form[i].value="")){ oErrorArray[i].innerHTML = "Error - input field is blank"; goForm = false; } } if(goForm == true) form.submit(); } function resetform(form){ form.reset(); } ]]> </script> Code generated after transformation (from IE8 debugger): <script type="text/javascript" language="javascript"> function submitform(form){ var oErrorArray = new Array(); for (i=0;i&lt;form.length;i++) eval("oErrorArray["+i+"]=oError"+i); goForm = true; for(i=0;i&lt;form.length;i++) { oErrorArray[i].innerHTML = ""; if(form[i].value="")){ oErrorArray[i].innerHTML = "Error - input field is blank"; goForm = false; } } if(goForm == true) form.submit(); } function resetform(form){ form.reset(); } </script> Error reported by IE8 debugger: Expected ')' login.xml, line 29 character 30 (which is right after the first "form.length")

    Read the article

  • Mulitple variable containing "for loop" is not working in my code...

    - by OM The Eternity
    Below is the code I am working with and the last for loop i am iterating is not working... I think i am doing something wrong using multiple variable in a for loop,also I m aware of the fact that it can be done. $updaterbk = "SELECT j1. * FROM jos_audittrail j1 LEFT OUTER JOIN jos_audittrail j2 ON ( j1.trackid != j2.trackid AND j1.field != j2.field AND j1.changedone < j2.changedone ) WHERE j1.operation = 'UPDATE' AND j2.id IS NULL "; $selectupdrbk = mysql_query($updaterbk); while($row1 = mysql_fetch_array($selectupdrbk)) { $updrbk[] = $row1; } foreach($updrbk as $upfield) { if(!in_array($upfield['field'],$exist)) { $exist[] = $upfield['field']; $existval[] = $upfield['oldvalue']; //echo $updqueryrbk = "UPDATE `".$upfield['table_name']."` SET "; } } print_r($existval); for($i=0;$j=0;$i<count($exist);$j<count($existval);$i++;$j++) { $updqueryrbk.= $exist[$i]['field']."=".$existval[$j]['oldvalue'].","; $updqueryrbk.="="; $updqueryrbk.= $existval[$j]['oldvalue']; $updqueryrbk.=","; }

    Read the article

  • How can I use multi-threading with a "for" or "foreach" loop?

    - by saafh
    I am trying to run the for loop in a separate thread so that the UI should be responsive and the progress bar is visible. The problem is that I don't know how to do that :). In this code, the process starts in a separate thread, but the next part of the code is executed at the same time. The messageBox is displayed and the results are never returned (e.g. the listbox's selected index property is never set). It doesn't work even if I use, "taskEx.delay()". TaskEx.Run(() => { for (int i = 0; i < sResults.Count(); i++) { if (sResults.ElementAt(i).DisplayIndexForSearchListBox.Trim().Contains(ayaStr)) { lstGoto.SelectedIndex = i; lstGoto_SelectionChanged(lstReadingSearchResults, null); IsIndexMatched = true; break; } } }); //TaskEx.delay(1000); if (IsIndexMatched == true) stkPanelGoto.Visibility = Visibility.Collapsed; else //the index didn't match { MessagePrompt.ShowMessage("The test'" + ayaStr + "' does not exist.", "Warning!"); } Could anyone please tell me how can I use multi-threading with a "for" or "foreach" loop?

    Read the article

  • Deleted array value still showing up on foreach loop in AS3 (bug in flash?)

    - by nexus
    It took me many hours to narrow down a problem in some code to this reproducible error, which seems to me like a bug in AVM2. Can anyone shed light on why this is occurring or how to fix it? When the value at index 1 is deleted and a value is subsequently set at index 0, the non-existent (undefined) value at index 1 will now show up in a foreach loop. I have only been able to produce this outcome with index 1 and 0 (not any other n and n-1). Run this code: package { import flash.display.Sprite; public class Main extends Sprite { public function Main():void { var bar : Array = new Array(6); out(bar); //proper behavior trace("bar[1] = 1", bar[1] = 1); out(bar); //proper behavior trace("delete bar[1]", delete bar[1]); out(bar); //proper behavior trace("bar[4] = 4", bar[4] = 4); out(bar); //for each loop will now iterate over the undefined position at index 1 trace("bar[0] = 0", bar[0] = 0); out(bar); trace("bar[3] = 3", bar[3] = 3); out(bar); } private function out(bar:Array):void { trace(bar); for each(var i : * in bar) { trace(i); } } } } It will give this output: ,,,,, bar[1] = 1 1 ,1,,,, 1 delete bar[1] true ,,,,, bar[4] = 4 4 ,,,,4, 4 bar[0] = 0 0 0,,,,4, 0 undefined 4 bar[3] = 3 3 0,,,3,4, 0 undefined 4 3

    Read the article

  • What's an effective way to reuse ArrayLists in a for loop?

    - by Patrick
    hi, I'm reusing the same ArrayList in a for loop, and I use for loop results = new ArrayList<Integer>(); experts = new ArrayList<Integer>(); output = new ArrayList<String>(); .... to create new ones. I guess this is wrong, because I'm allocating new memory. Is this correct ? If yes, how can I empty them ? Added: another example I'm creating new variables each time I call this method. Is this good practice ? I mean to create new precision, relevantFound.. etc ? Or should I declare them in my class, outside the method to not allocate more and more memory ? public static void computeMAP(ArrayList<Integer> results, ArrayList<Integer> experts) { //compute MAP double precision = 0; int relevantFound = 0; double sumprecision = 0; thanks

    Read the article

  • Watching setTimeout loops so that only one is running at a time.

    - by DA
    I'm creating a content rotator in jQuery. 5 items total. Item 1 fades in, pauses 10 seconds, fades out, then item 2 fades in. Repeat. Simple enough. Using setTimeout I can call a set of functions that create a loop and will repeat the process indefinitely. I now want to add the ability to interrupt this rotator at any time by clicking on a navigation element to jump directly to one of the content items. I originally started going down the path of pinging a variable constantly (say every half second) that would check to see if a navigation element was clicked and, if so, abandon the loop, then restart the loop based on the item that was clicked. The challenge I ran into was how to actually ping a variable via a timer. The solution is to dive into JavaScript closures...which are a little over my head but definitely something I need to delve into more. However, in the process of that, I came up with an alternative option that actually seems to be better performance-wise (theoretically, at least). I have a sample running here: http://jsbin.com/uxupi/14 (It's using console.log so have fireBug running) Sample script: $(document).ready(function(){ var loopCount = 0; $('p#hello').click(function(){ loopCount++; doThatThing(loopCount); }) function doThatOtherThing(currentLoopCount) { console.log('doThatOtherThing-'+currentLoopCount); if(currentLoopCount==loopCount){ setTimeout(function(){doThatThing(currentLoopCount)},5000) } } function doThatThing(currentLoopCount) { console.log('doThatThing-'+currentLoopCount); if(currentLoopCount==loopCount){ setTimeout(function(){doThatOtherThing(currentLoopCount)},5000); } } }) The logic being that every click of the trigger element will kick off the loop passing into itself a variable equal to the current value of the global variable. That variable gets passed back and forth between the functions in the loop. Each click of the trigger also increments the global variable so that subsequent calls of the loop have a unique local variable. Then, within the loop, before the next step of each loop is called, it checks to see if the variable it has still matches the global variable. If not, it knows that a new loop has already been activated so it just ends the existing loop. Thoughts on this? Valid solution? Better options? Caveats? Dangers? UPDATE: I'm using John's suggestion below via the clearTimeout option. However, I can't quite get it to work. The logic is as such: var slideNumber = 0; var timeout = null; function startLoop(slideNumber) { ...do stuff here to set up the slide based on slideNumber... slideFadeIn() } function continueCheck(){ if (timeout != null) { // cancel the scheduled task. clearTimeout(timeout); timeout = null; return false; }else{ return true; } }; function slideFadeIn() { if (continueCheck){ // a new loop hasn't been called yet so proceed... // fade in the LI $currentListItem.fadeIn(fade, function() { if(multipleFeatures){ timeout = setTimeout(slideFadeOut,display); } }); }; function slideFadeOut() { if (continueLoop){ // a new loop hasn't been called yet so proceed... slideNumber=slideNumber+1; if(slideNumber==features.length) { slideNumber = 0; }; timeout = setTimeout(function(){startLoop(slideNumber)},100); }; startLoop(slideNumber); The above kicks of the looping. I then have navigation items that, when clicked, I want the above loop to stop, then restart with a new beginning slide: $(myNav).click(function(){ clearTimeout(timeout); timeout = null; startLoop(thisItem); }) If I comment out 'startLoop...' from the click event, it, indeed, stops the initial loop. However, if I leave that last line in, it doesn't actually stop the initial loop. Why? What happens is that both loops seem to run in parallel for a period. So, when I click my navigation, clearTimeout is called, which clears it.

    Read the article

  • Replacing innertext of XML node using PHP DOMDocument

    - by Rohan Kumar
    I want to replace innertext of a XML node my XML file named test.xml is <?xml version="1.0" encoding="utf-8"?> <ads> <loop>no</loop> <item> <description>Description 1</description> </item> <item> <description>Text in item2</description> </item> <item> <description>Let play with this XML</description> </item> </ads> I want to change the value of loop and description tag both, and it should be saved in test.xml like: <?xml version="1.0" encoding="utf-8"?> <ads> <loop>yes</loop> <item> <description>Description Changing Here</description> </item> <item> <description>Changing text in item2</description> </item> <item> <description>We will play later</description> </item> </ads> I tried code in PHP: <? $file = "test.xml"; $fp = fopen($file, "rb") or die("cannot open file"); $str = fread($fp, filesize($file)); $dom=new DOMDocument(); $dom->formatOutput = true; $dom->preserveWhiteSpace = false; $dom->loadXML($str) or die("Error"); //$dom->load("items.xml"); $root=$dom->documentElement; // This can differ (I am not sure, it can be only documentElement or documentElement->firstChild or only firstChild) $loop=$root->getElementsByTagName('loop')->item(0);//->textContent; //echo $loop; if(trim($loop->textContent)=='no') { echo 'ok'; $root->getElementsByTagName('loop')->item(0)->nodeValue ='yes'; } echo "<xmp>NEW:\n". $dom->saveXML() ."</xmp>"; ?> I tried only for loop tag.I don't know how to replace nodevalue in description tag. When I run this page it shows output like: ok NEW: <?xml version="1.0" encoding="utf-8"?> <ads> <loop>yes</loop> <item> <description>Description 1</description> </item> <item> <description>Changing text in item2</description> </item> <item> <description>Let play with this XML</description> </item> </ads> It gives the value yes in browser but don't save it in test.xml any reason?

    Read the article

  • Is this a good way to do a game loop for an iPhone game?

    - by Danny Tuppeny
    Hi all, I'm new to iPhone dev, but trying to build a 2D game. I was following a book, but the game loop it created basically said: function gameLoop update() render() sleep(1/30th second) gameLoop The reasoning was that this would run at 30fps. However, this seemed a little mental, because if my frame took 1/30th second, then it would run at 15fps (since it'll spend as much time sleeping as updating). So, I did some digging and found the CADisplayLink class which would sync calls to my gameLoop function to the refresh rate (or a fraction of it). I can't find many samples of it, so I'm posting here for a code review :-) It seems to work as expected, and it includes passing the elapsed (frame) time into the Update method so my logic can be framerate-independant (however I can't actually find in the docs what CADisplayLink would do if my frame took more than its allowed time to run - I'm hoping it just does its best to catch up, and doesn't crash!). // // GameAppDelegate.m // // Created by Danny Tuppeny on 10/03/2010. // Copyright Danny Tuppeny 2010. All rights reserved. // #import "GameAppDelegate.h" #import "GameViewController.h" #import "GameStates/gsSplash.h" @implementation GameAppDelegate @synthesize window; @synthesize viewController; - (void) applicationDidFinishLaunching:(UIApplication *)application { // Create an instance of the first GameState (Splash Screen) [self doStateChange:[gsSplash class]]; // Set up the game loop displayLink = [CADisplayLink displayLinkWithTarget:self selector:@selector(gameLoop)]; [displayLink setFrameInterval:2]; [displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode]; } - (void) gameLoop { // Calculate how long has passed since the previous frame CFTimeInterval currentFrameTime = [displayLink timestamp]; CFTimeInterval elapsed = 0; // For the first frame, we want to pass 0 (since we haven't elapsed any time), so only // calculate this in the case where we're not the first frame if (lastFrameTime != 0) { elapsed = currentFrameTime - lastFrameTime; } // Keep track of this frames time (so we can calculate this next time) lastFrameTime = currentFrameTime; NSLog([NSString stringWithFormat:@"%f", elapsed]); // Call update, passing the elapsed time in [((GameState*)viewController.view) Update:elapsed]; } - (void) doStateChange:(Class)state { // Remove the previous GameState if (viewController.view != nil) { [viewController.view removeFromSuperview]; [viewController.view release]; } // Create the new GameState viewController.view = [[state alloc] initWithFrame:CGRectMake(0, 0, IPHONE_WIDTH, IPHONE_HEIGHT) andManager:self]; // Now set as visible [window addSubview:viewController.view]; [window makeKeyAndVisible]; } - (void) dealloc { [viewController release]; [window release]; [super dealloc]; } @end Any feedback would be appreciated :-) PS. Bonus points if you can tell me why all the books use "viewController.view" but for everything else seem to use "[object name]" format. Why not [viewController view]?

    Read the article

< Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89  | Next Page >