Search Results

Search found 3641 results on 146 pages for 'threads'.

Page 41/146 | < Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >

How to read from multiple queues in real-world?

- by Leon Cullens

Here's a theoretical question: When I'm building an application using message queueing, I'm going to need multiple queues support different data types for different purposes. Let's assume I have 20 queues (e.g. one to create new users, one to process new orders, one to edit user settings, etc.). I'm going to deploy this to Windows Azure using the 'minimum' of 1 web role and 1 worker role. How does one read from all those 20 queues in a proper way? This is what I had in mind, but I have little or no real-world practical experience with this: Create a class that spawns 20 threads in the worker role 'main' class. Let each of these threads execute a method to poll a different queue, and let all those threads sleep between each poll (of course with a back-off mechanism that increases the sleep time). This leads to have 20 threads (or 21?), and 20 queues that are being actively polled, resulting in a lot of wasted messages (each time you poll an empty queue it's being billed as a message). How do you solve this problem?

Read the article
How do I configure encodings (UTF-8) for code executed by Quartz scheduled Jobs in Spring framework

- by Martin

I wonder how to configure Quartz scheduled job threads to reflect proper encoding. Code which otherwise executes fine within Springframework injection loaded webapps (java) will get encoding issues when run in threads scheduled by quartz. Is there anyone who can help me out? All source is compiled using maven2 with source and file encodings configured as UTF-8. In the quartz threads any string will have encoding errors if outside ISO 8859-1 characters: Example config <bean name="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean"> <property name="jobClass" value="example.ExampleJob" /> </bean> <bean id="jobTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerBean"> <property name="jobDetail" ref="jobDetail" /> <property name="startDelay" value="1000" /> <property name="repeatCount" value="0" /> <property name="repeatInterval" value="1" /> </bean> <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean"> <property name="triggers"> <list> <ref bean="jobTrigger"/> </list> </property> </bean> Example implementation public class ExampleJob extends QuartzJobBean { private Log log = LogFactory.getLog(ExampleJob.class); protected void executeInternal(JobExecutionContext ctx) throws JobExecutionException { log.info("ÅÄÖ"); log.info(Charset.defaultCharset()); } } Example output 2010-05-20 17:04:38,285 1342 INFO [QuartzScheduler_Worker-9] ExampleJob - vÖvÑvñ 2010-05-20 17:04:38,286 1343 INFO [QuartzScheduler_Worker-9] ExampleJob - UTF-8 The same lines of code executed within spring injected beans referenced by servlets in the web-container will output proper encoding. What is it that make Quartz threads encoding dependent?

Read the article
Design for fastest page download

- by mexxican

I have a file with millions of URLs/IPs and have to write a program to download the pages really fast. The connection rate should be at least 6000/s and file download speed at least 2000 with avg. 15kb file size. The network bandwidth is 1 Gbps. My approach so far has been: Creating 600 socket threads with each having 60 sockets and using WSAEventSelect to wait for data to read. As soon as a file download is complete, add that memory address(of the downloaded file) to a pipeline( a simple vector ) and fire another request. When the total download is more than 50Mb among all socket threads, write all the files downloaded to the disk and free the memory. So far, this approach has been not very successful with the rate at which I could hit not shooting beyond 2900 connections/s and downloaded data rate even less. Can somebody suggest an alternative approach which could give me better stats. Also I am working windows server 2008 machine with 8 Gig of memory. Also, do we need to hack the kernel so as we could use more threads and memory. Currently I can create a max. of 1500 threads and memory usage not going beyond 2 gigs [ which technically should be much more as this is a 64-bit machine ]. And IOCP is out of question as I have no experience in that so far and have to fix this application today. Thanks Guys!

Read the article
The cross-thread usage of "HttpContext.Current" property and related things

- by smwikipedia

I read from < Essential ASP.NET with Examples in C# the following statement: Another useful property to know about is the static Current property of the HttpContext class. This property always points to the current instance of the HttpContext class for the request being serviced. This can be convenient if you are writing helper classes that will be used from pages or other pipeline classes and may need to access the context for whatever reason. By using the static Current property to retrieve the context, you can avoid passing a reference to it to helper classes. For example, the class shown in Listing 4-1 uses the Current property of the context to access the QueryString and print something to the current response buffer. Note that for this static property to be correctly initialized, the caller must be executing on the original request thread, so if you have spawned additional threads to perform work during a request, you must take care to provide access to the context class yourself. I am wondering about the root cause of the bold part, and one thing leads to another, here is my thoughts: We know that a process can have multiple threads. Each of these threads have their own stacks, respectively. These threads also have access to a shared memory area, the heap. The stack then, as I understand it, is kind of where all the context for that thread is stored. For a thread to access something in the heap it must use a pointer, and the pointer is stored on its stack. So when we make some cross-thread calls, we must make sure that all the necessary context info is passed from the caller thread's stack to the callee thread's stack. But I am not quite sure if I made any mistake. Any comments will be deeply appreciated. Thanks. ADD Here the stack is limited to user stack.

Read the article
How should a multi-threaded C application handle a failed malloc()?

- by user294463

A part of an application I'm working on is a simple pthread-based server that communicates over a TCP/IP socket. I am writing it in C because it's going to be running in a memory constrained environment. My question is: what should the program do if one of the threads encounters a malloc() that returns NULL? Possibilities I've come up with so far: No special handling. Let malloc() return NULL and let it be dereferenced so that the whole thing segfaults. Exit immediately on a failed malloc(), by calling abort() or exit(-1). Assume that the environment will clean everything up. Jump out of the main event loop and attempt to pthread_join() all the threads, then shut down. The first option is obviously the easiest, but seems very wrong. The second one also seems wrong since I don't know exactly what will happen. The third option seems tempting except for two issues: first, all of the threads need not be joined back to the main thread under normal circumstances and second, in order to complete the thread execution, most of the remaining threads will have to call malloc() again anyway. What shall I do?

Read the article
SPARC T4-4 Beats 8-CPU IBM POWER7 on TPC-H @3000GB Benchmark

- by Brian

Oracle's SPARC T4-4 server delivered a world record TPC-H @3000GB benchmark result for systems with four processors. This result beats eight processor results from IBM (POWER7) and HP (x86). The SPARC T4-4 server also delivered better performance per core than these eight processor systems from IBM and HP. Comparisons below are based upon system to system comparisons, highlighting Oracle's complete software and hardware solution. This database world record result used Oracle's Sun Storage 2540-M2 arrays (rotating disk) connected to a SPARC T4-4 server running Oracle Solaris 11 and Oracle Database 11g Release 2 demonstrating the power of Oracle's integrated hardware and software solution. The SPARC T4-4 server based configuration achieved a TPC-H scale factor 3000 world record for four processor systems of 205,792 QphH@3000GB with price/performance of $4.10/QphH@3000GB. The SPARC T4-4 server with four SPARC T4 processors (total of 32 cores) is 7% faster than the IBM Power 780 server with eight POWER7 processors (total of 32 cores) on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 36% better in price performance compared to the IBM Power 780 server on the TPC-H @3000GB Benchmark. The SPARC T4-4 server is 29% faster than the IBM Power 780 for data loading. The SPARC T4-4 server is up to 3.4 times faster than the IBM Power 780 server for the Refresh Function. The SPARC T4-4 server with four SPARC T4 processors is 27% faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark. The SPARC T4-4 server is 52% faster than the HP ProLiant DL980 G7 server for data loading. The SPARC T4-4 server is up to 3.2 times faster than the HP ProLiant DL980 G7 for the Refresh Function. The SPARC T4-4 server achieved a peak IO rate from the Oracle database of 17 GB/sec. This rate was independent of the storage used, as demonstrated by the TPC-H @3000TB benchmark which used twelve Sun Storage 2540-M2 arrays (rotating disk) and the TPC-H @1000TB benchmark which used four Sun Storage F5100 Flash Array devices (flash storage). [*] The SPARC T4-4 server showed linear scaling from TPC-H @1000GB to TPC-H @3000GB. This demonstrates that the SPARC T4-4 server can handle the increasingly larger databases required of DSS systems. [*] The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase. [*] The TPC believes that comparisons of results published with different scale factors are misleading and discourages such comparisons. Performance Landscape The table lists the leading TPC-H @3000GB results for non-clustered systems. TPC-H @3000GB, Non-Clustered Systems System Processor P/C/T – Memory Composite(QphH) $/perf($/QphH) Power(QppH) Throughput(QthH) Database Available SPARC Enterprise M9000 3.0 GHz SPARC64 VII+ 64/256/256 – 1024 GB 386,478.3 $18.19 316,835.8 471,428.6 Oracle 11g R2 09/22/11 SPARC T4-4 3.0 GHz SPARC T4 4/32/256 – 1024 GB 205,792.0 $4.10 190,325.1 222,515.9 Oracle 11g R2 05/31/12 SPARC Enterprise M9000 2.88 GHz SPARC64 VII 32/128/256 – 512 GB 198,907.5 $15.27 182,350.7 216,967.7 Oracle 11g R2 12/09/10 IBM Power 780 4.1 GHz POWER7 8/32/128 – 1024 GB 192,001.1 $6.37 210,368.4 175,237.4 Sybase 15.4 11/30/11 HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 8/64/128 – 512 GB 162,601.7 $2.68 185,297.7 142,685.6 SQL Server 2008 10/13/10 P/C/T = Processors, Cores, Threads QphH = the Composite Metric (bigger is better) $/QphH = the Price/Performance metric in USD (smaller is better) QppH = the Power Numerical Quantity QthH = the Throughput Numerical Quantity The following table lists data load times and refresh function times during the power run. TPC-H @3000GB, Non-Clustered Systems Database Load & Database Refresh System Processor Data Loading(h:m:s) T4Advan RF1(sec) T4Advan RF2(sec) T4Advan SPARC T4-4 3.0 GHz SPARC T4 04:08:29 1.0x 67.1 1.0x 39.5 1.0x IBM Power 780 4.1 GHz POWER7 05:51:50 1.5x 147.3 2.2x 133.2 3.4x HP ProLiant DL980 G7 2.27 GHz Intel Xeon X7560 08:35:17 2.1x 173.0 2.6x 126.3 3.2x Data Loading = database load time RF1 = power test first refresh transaction RF2 = power test second refresh transaction T4 Advan = the ratio of time to T4 time Complete benchmark results found at the TPC benchmark website http://www.tpc.org. Configuration Summary and Results Hardware Configuration: SPARC T4-4 server 4 x SPARC T4 3.0 GHz processors (total of 32 cores, 128 threads) 1024 GB memory 8 x internal SAS (8 x 300 GB) disk drives External Storage: 12 x Sun Storage 2540-M2 array storage, each with 12 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache Software Configuration: Oracle Solaris 11 11/11 Oracle Database 11g Release 2 Enterprise Edition Audited Results: Database Size: 3000 GB (Scale Factor 3000) TPC-H Composite: 205,792.0 QphH@3000GB Price/performance: $4.10/QphH@3000GB Available: 05/31/2012 Total 3 year Cost: $843,656 TPC-H Power: 190,325.1 TPC-H Throughput: 222,515.9 Database Load Time: 4:08:29 Benchmark Description The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC. TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system. The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor. Key Points and Best Practices Twelve Sun Storage 2540-M2 arrays were used for the benchmark. Each Sun Storage 2540-M2 array contains 12 15K RPM drives and is connected to a single dual port 8Gb FC HBA using 2 ports. Each Sun Storage 2540-M2 array showed 1.5 GB/sec for sequential read operations and showed linear scaling, achieving 18 GB/sec with twelve Sun Storage 2540-M2 arrays. These were stand alone IO tests. The peak IO rate measured from the Oracle database was 17 GB/sec. Oracle Solaris 11 11/11 required very little system tuning. Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric in which to compare systems. The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle Database parallel processes. Six Sun Storage 2540-M2 arrays were mirrored to another six Sun Storage 2540-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays. The TPC-H Refresh Function (RF) simulates periodical refresh portion of Data Warehouse by adding new sales and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are a key for this function and the SPARC T4-4 server outperformed both the IBM POWER7 server and HP ProLiant DL980 G7 server. (See the RF columns above.) See Also Transaction Processing Performance Council (TPC) Home Page Ideas International Benchmark Page SPARC T4-4 Server oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 Enterprise Edition oracle.com OTN Sun Storage 2540-M2 Array oracle.com OTN Disclosure Statement TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 QphH@3000GB, 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB available 10/13/10, 8 processors, 64 cores, 128 threads.

Read the article
Improved Performance on PeopleSoft Combined Benchmark using SPARC T4-4

- by Brian

Oracle's SPARC T4-4 server running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved a world record 18,000 concurrent users experiencing subsecond response time while executing a PeopleSoft Payroll batch job of 500,000 employees in 32.4 minutes. This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier. The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment. The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier was 47% (online and batch) leaving significant headroom for additional processing across the three tiers. The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices. Performance Landscape Results are presented for the PeopleSoft HRMS Self-Service and Payroll combined benchmark. The new result with 128 streams shows significant improvement in the payroll batch processing time with little impact on the self-service component response time. PeopleSoft HRMS Self-Service and Payroll Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.988 0.539 32.4 128 SPARC T4-2 (web) SPARC T4-4 (app) SPARC T4-4 (db) 18,000 0.944 0.503 43.3 64 The following results are for the PeopleSoft HRMS Self-Service benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the payroll component. PeopleSoft HRMS Self-Service 9.1 Benchmark Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-2 (web) SPARC T4-4 (app) 2x SPARC T4-2 (db) 18,000 1.048 0.742 N/A N/A The following results are for the PeopleSoft Payroll benchmark that was previous run. The results are not directly comparable with the combined results because they do not include the self-service component. PeopleSoft Payroll (N.A.) 9.1 - 500K Employees (7 Million SQL PayCalc, Unicode) Systems Users Ave Response Search (sec) Ave Response Save (sec) Batch Time (min) Streams SPARC T4-4 (db) N/A N/A N/A 30.84 96 Configuration Summary Application Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 512 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 PeopleSoft HCM 9.1 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Java Platform, Standard Edition Development Kit 6 Update 32 Database Configuration: 1 x SPARC T4-4 server with 4 x SPARC T4 processors, 3.0 GHz 256 GB memory Oracle Solaris 11 11/11 Oracle Database 11g Release 2 PeopleTools 8.52 Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031 Micro Focus Server Express (COBOL v 5.1.00) Web Tier Configuration: 1 x SPARC T4-2 server with 2 x SPARC T4 processors, 2.85 GHz 256 GB memory Oracle Solaris 11 11/11 PeopleTools 8.52 Oracle WebLogic Server 10.3.4 Java Platform, Standard Edition Development Kit 6 Update 32 Storage Configuration: 1 x Sun Server X2-4 as a COMSTAR head for data 4 x Intel Xeon X7550, 2.0 GHz 128 GB memory 1 x Sun Storage F5100 Flash Array (80 flash modules) 1 x Sun Storage F5100 Flash Array (40 flash modules) 1 x Sun Fire X4275 as a COMSTAR head for redo logs 12 x 2 TB SAS disks with Niwot Raid controller Benchmark Description This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2. The PeopleSoft HRSS benchmark kit is a Oracle standard benchmark kit run by all platform vendors to measure the performance. It's an OLTP benchmark where DB SQLs are moderately complex. The results are certified by Oracle and a white paper is published. PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consist of 14 scenarios which emulate users performing typical HCM transactions such as viewing paycheck, promoting and hiring employees, updating employee profile and other typical HCM application transactions. All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. This benchmark metric is the weighted average response search/save time for all the transactions. The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of a ERP environment during a mass update. The benchmark measures five application business process run times for a database representing large organization. They are Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes. The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state. Key Points and Best Practices Two PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning. Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (total 120 threads). The default set (1 core from first and third processor socket, total 16 threads) was used for network and disk interrupt handling. This was done to improve performance by reducing memory access latency by using the physical memory closest to the processors and offload I/O interrupt handling to default set threads, freeing up cpu resources for Application Servers threads and balancing application workload across 240 threads. A total of 128 PeopleSoft streams server processes where used on the database node to complete payroll batch job of 500,000 employees in 32.4 minutes. See Also Oracle PeopleSoft Benchmark White Papers oracle.com SPARC T4-2 Server oracle.com OTN SPARC T4-4 Server oracle.com OTN PeopleSoft Enterprise Human Capital Managementoracle.com OTN PeopleSoft Enterprise Human Capital Management (Payroll) oracle.com OTN Oracle Solaris oracle.com OTN Oracle Database 11g Release 2 oracle.com OTN Disclosure Statement Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 8 November 2012.

Read the article
NUMA-aware placement of communication variables

- by Dave

For classic NUMA-aware programming I'm typically most concerned about simple cold, capacity and compulsory misses and whether we can satisfy the miss by locally connected memory or whether we have to pull the line from its home node over the coherent interconnect -- we'd like to minimize channel contention and conserve interconnect bandwidth. That is, for this style of programming we're quite aware of where memory is homed relative to the threads that will be accessing it. Ideally, a page is collocated on the node with the thread that's expected to most frequently access the page, as simple misses on the page can be satisfied without resorting to transferring the line over the interconnect. The default "first touch" NUMA page placement policy tends to work reasonable well in this regard. When a virtual page is first accessed, the operating system will attempt to provision and map that virtual page to a physical page allocated from the node where the accessing thread is running. It's worth noting that the node-level memory interleaving granularity is usually a multiple of the page size, so we can say that a given page P resides on some node N. That is, the memory underlying a page resides on just one node. But when thinking about accesses to heavily-written communication variables we normally consider what caches the lines underlying such variables might be resident in, and in what states. We want to minimize coherence misses and cache probe activity and interconnect traffic in general. I don't usually give much thought to the location of the home NUMA node underlying such highly shared variables. On a SPARC T5440, for instance, which consists of 4 T2+ processors connected by a central coherence hub, the home node and placement of heavily accessed communication variables has very little impact on performance. The variables are frequently accessed so likely in M-state in some cache, and the location of the home node is of little consequence because a requester can use cache-to-cache transfers to get the line. Or at least that's what I thought. Recently, though, I was exploring a simple shared memory point-to-point communication model where a client writes a request into a request mailbox and then busy-waits on a response variable. It's a simple example of delegation based on message passing. The server polls the request mailbox, and having fetched a new request value, performs some operation and then writes a reply value into the response variable. As noted above, on a T5440 performance is insensitive to the placement of the communication variables -- the request and response mailbox words. But on a Sun/Oracle X4800 I noticed that was not the case and that NUMA placement of the communication variables was actually quite important. For background an X4800 system consists of 8 Intel X7560 Xeons . Each package (socket) has 8 cores with 2 contexts per core, so the system is 8x8x2. Each package is also a NUMA node and has locally attached memory. Every package has 3 point-to-point QPI links for cache coherence, and the system is configured with a twisted ladder "mobius" topology. The cache coherence fabric is glueless -- there's not central arbiter or coherence hub. The maximum distance between any two nodes is just 2 hops over the QPI links. For any given node, 3 other nodes are 1 hop distant and the remaining 4 nodes are 2 hops distant. Using a single request (client) thread and a single response (server) thread, a benchmark harness explored all permutations of NUMA placement for the two threads and the two communication variables, measuring the average round-trip-time and throughput rate between the client and server. In this benchmark the server simply acts as a simple transponder, writing the request value plus 1 back into the reply field, so there's no particular computation phase and we're only measuring communication overheads. In addition to varying the placement of communication variables over pairs of nodes, we also explored variations where both variables were placed on one page (and thus on one node) -- either on the same cache line or different cache lines -- while varying the node where the variables reside along with the placement of the threads. The key observation was that if the client and server threads were on different nodes, then the best placement of variables was to have the request variable (written by the client and read by the server) reside on the same node as the client thread, and to place the response variable (written by the server and read by the client) on the same node as the server. That is, if you have a variable that's to be written by one thread and read by another, it should be homed with the writer thread. For our simple client-server model that means using split request and response communication variables with unidirectional message flow on a given page. This can yield up to twice the throughput of less favorable placement strategies. Our X4800 uses the QPI 1.0 protocol with source-based snooping. Briefly, when node A needs to probe a cache line it fires off snoop requests to all the nodes in the system. Those recipients then forward their response not to the original requester, but to the home node H of the cache line. H waits for and collects the responses, adjudicates and resolves conflicts and ensures memory-model ordering, and then sends a definitive reply back to the original requester A. If some node B needed to transfer the line to A, it will do so by cache-to-cache transfer and let H know about the disposition of the cache line. A needs to wait for the authoritative response from H. So if a thread on node A wants to write a value to be read by a thread on node B, the latency is dependent on the distances between A, B, and H. We observe the best performance when the written-to variable is co-homed with the writer A. That is, we want H and A to be the same node, as the writer doesn't need the home to respond over the QPI link, as the writer and the home reside on the very same node. With architecturally informed placement of communication variables we eliminate at least one QPI hop from the critical path. Newer Intel processors use the QPI 1.1 coherence protocol with home-based snooping. As noted above, under source-snooping a requester broadcasts snoop requests to all nodes. Those nodes send their response to the home node of the location, which provides memory ordering, reconciles conflicts, etc., and then posts a definitive reply to the requester. In home-based snooping the snoop probe goes directly to the home node and are not broadcast. The home node can consult snoop filters -- if present -- and send out requests to retrieve the line if necessary. The 3rd party owner of the line, if any, can respond either to the home or the original requester (or even to both) according to the protocol policies. There are myriad variations that have been implemented, and unfortunately vendor terminology doesn't always agree between vendors or with the academic taxonomy papers. The key is that home-snooping enables the use of a snoop filter to reduce interconnect traffic. And while home-snooping might have a longer critical path (latency) than source-based snooping, it also may require fewer messages and less overall bandwidth. It'll be interesting to reprise these experiments on a platform with home-based snooping. While collecting data I also noticed that there are placement concerns even in the seemingly trivial case when both threads and both variables reside on a single node. Internally, the cores on each X7560 package are connected by an internal ring. (Actually there are multiple contra-rotating rings). And the last-level on-chip cache (LLC) is partitioned in banks or slices, which with each slice being associated with a core on the ring topology. A hardware hash function associates each physical address with a specific home bank. Thus we face distance and topology concerns even for intra-package communications, although the latencies are not nearly the magnitude we see inter-package. I've not seen such communication distance artifacts on the T2+, where the cache banks are connected to the cores via a high-speed crossbar instead of a ring -- communication latencies seem more regular.

Read the article
How to search and delete individual messages in Gmail

- by xiechao

How to search and delete individual messages (NOT threads) in Gmail? When I use simple steps like: search select all delete then, all messages in the threads are deleted, but I want to delete the individual messages matching my search criteria only.

Read the article
Outlook 2003: How to display my own messages in conversation view?

- by Godsmith

When I select View-Arrange By-Conversation in Outlook 2003, the messages I sent myself are not shown in the message threads (unlike the conversation view in say, Gmail). To show my own messages I have to go to the Sent Items folder, if not someone has replied to one of my messages and included my original text. Is there a way to make my own messages visible in the conversation threads? Thank you! /Filip

Read the article
"Can't Connect to Server" from 2nd virtual host on VPS

- by chaoskreator

I'm using Debian 7 Wheezy and Apache 2.2.22, and I'm setting up Virtual Hosts for a number of websites on my VPS. I've successfully configured the VirtualHost directives for one of the sites, but the second one continually gives "Problem Loading Page" in Firefox. I've run configtest and it has verified all my syntax is correct, and I've checked all the permissions. Everything on the 2nd domain is pretty much copy/pasted from the first, so I'm not sure what the issue is, as there are no entries into /var/log/apache2/error.log other than where I have reloaded the configurations: /# cat /var/log/apache2/error.log [Thu May 29 01:19:00 2014] [notice] Graceful restart requested, doing restart [Thu May 29 01:19:00 2014] [info] Init: Seeding PRNG with 656 bytes of entropy [Thu May 29 01:19:00 2014] [info] Init: Generating temporary RSA private keys (512/1024 bits) [Thu May 29 01:19:00 2014] [info] Init: Generating temporary DH parameters (512/1024 bits) [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(253): shmcb_init allocated 512000 bytes of shared memory [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(272): for 511920 bytes (512000 including header), recommending 32 subcaches, 133 indexes each [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(306): shmcb_init_memory choices follow [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(308): subcache_num = 32 [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(310): subcache_size = 15992 [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(312): subcache_data_offset = 3208 [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(314): subcache_data_size = 12784 [Thu May 29 01:19:00 2014] [debug] ssl_scache_shmcb.c(316): index_num = 133 [Thu May 29 01:19:00 2014] [info] Shared memory session cache initialised [Thu May 29 01:19:00 2014] [info] Init: Initializing (virtual) servers for SSL [Thu May 29 01:19:00 2014] [info] mod_ssl/2.2.22 compiled against Server: Apache/2.2.22, Library: OpenSSL/1.0.1e [Thu May 29 01:19:00 2014] [notice] Apache/2.2.22 (Debian) PHP/5.4.4-14+deb7u9 mod_ssl/2.2.22 OpenSSL/1.0.1e mod_perl/2.0.7 Perl/v5.14.2 configured -- resuming normal operations [Thu May 29 01:19:00 2014] [info] Server built: Mar 4 2013 22:05:16 [Thu May 29 01:19:00 2014] [debug] prefork.c(1023): AcceptMutex: sysvsem (default: sysvsem) I've ensured to enable each vhost with a2ensite {sitename.conf} with no errors there, either. Below are the contents of the configuration files... /etc/apache2/apache2.conf # Global configuration # LockFile ${APACHE_LOCK_DIR}/accept.lock PidFile ${APACHE_PID_FILE} Timeout 300 KeepAlive On MaxKeepAliveRequests 100 KeepAliveTimeout 5 ## ## Server-Pool Size Regulation (MPM specific) ## # prefork MPM # StartServers: number of server processes to start # MinSpareServers: minimum number of server processes which are kept spare # MaxSpareServers: maximum number of server processes which are kept spare # MaxClients: maximum number of server processes allowed to start # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_prefork_module> StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 150 MaxRequestsPerChild 0 </IfModule> # worker MPM # StartServers: initial number of server processes to start # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadLimit: ThreadsPerChild can be changed to this maximum value during a # graceful restart. ThreadLimit can only be changed by stopping # and starting Apache. # ThreadsPerChild: constant number of worker threads in each server process # MaxClients: maximum number of simultaneous client connections # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_worker_module> StartServers 2 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxClients 150 MaxRequestsPerChild 0 </IfModule> # event MPM # StartServers: initial number of server processes to start # MinSpareThreads: minimum number of worker threads which are kept spare # MaxSpareThreads: maximum number of worker threads which are kept spare # ThreadsPerChild: constant number of worker threads in each server process # MaxClients: maximum number of simultaneous client connections # MaxRequestsPerChild: maximum number of requests a server process serves <IfModule mpm_event_module> StartServers 2 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxClients 150 MaxRequestsPerChild 0 </IfModule> # These need to be set in /etc/apache2/envvars User ${APACHE_RUN_USER} Group ${APACHE_RUN_GROUP} # # AccessFileName: The name of the file to look for in each directory # for additional configuration directives. See also the AllowOverride # directive. # AccessFileName .htaccess # # The following lines prevent .htaccess and .htpasswd files from being # viewed by Web clients. # <Files ~ "^\.ht"> Order allow,deny Deny from all Satisfy all </Files> DefaultType None HostnameLookups Off ErrorLog ${APACHE_LOG_DIR}/error.log LogLevel debug # Include module configuration: Include mods-enabled/*.load Include mods-enabled/*.conf # Include list of ports to listen on and which to use for name based vhosts Include ports.conf # # The following directives define some format nicknames for use with # a CustomLog directive (see below). # If you are behind a reverse proxy, you might want to change %h into %{X-Forwarded-For}i # # LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined LogFormat "%h %l %u %t \"%r\" %>s %O" common LogFormat "%{Referer}i -> %U" referer LogFormat "%{User-agent}i" agent <Directory "/var/www"> Order allow,deny Allow from all Require all granted </Directory> # Include generic snippets of statements Include conf.d/ # Include the virtual host configurations: Include sites-enabled/*.conf NameVirtualHost *:80 /etc/apache2/sites-available/site1.net.conf <VirtualHost *:80> ServerName site1.net ServerAlias site1.net *.site1.net DocumentRoot "/var/www/site1" ErrorLog "/var/www/site1/logs/error.log" CustomLog "/var/www/site1/logs/access.log" vhost_combined <Directory "/var/www/site1"> Options None AllowOverride All Order allow,deny Allow from all Satisfy Any </Directory> </VirtualHost> /etc/apache2/sites-available/site2.com.conf <VirtualHost *:80> ServerName site2.com ServerAlias site2.com *.site2.com DocumentRoot "/var/www/site2" ErrorLog "/var/www/site2/logs/error.log" CustomLog "/var/www/site2/logs/access.log" vhost_combined <Directory "/var/www/site2"> Options None AllowOverride All Order allow,deny Allow from all Satisfy Any </Directory> </VirtualHost> I've also tried setting NameVirtualHost like: Listen 80 NameVirtualHost 23.88.121.82:80 NameVirtualHost 127.0.0.1:80 and the VirtualHost Directives: <VirtualHost 23.88.121.82:80> ... </VirtualHost> for both sites, but that causes the first site to fail, as well. I'm wondering if I need to set up individual IPs for each site, possibly? I have 2 more IPv4 and 3 IPv6 addresses available, if that would make a difference. Also, in the grand scheme of things, I will need to enable SSL for the first site. I've been reading that I'll need to basically just mimic the directives for listening on port 80, only on port 443, and make sure mod_ssl is enabled? EDIT: I just ran apache2 -t to test the config files that way, and got the error: apache2: bad user name ${APACHE_RUN_USER}. However, apachectl configtest returns Syntax OK. There are no other mentions of errors with the mutex anywhere else, however. I was pretty sure if there was an error with the user apache was supposed to run under, the server wouldn't start at all... EDIT 2: Restarting apache fixed the bad user name error.

Read the article
How to optimally configure memcache running on 16 cores 144G ram server?

- by Ivko Maksimovic

Memcache is the only important app running on the server Server has 16 cores and 144G RAM Memcache is given 135G Memcache runs at 32 threads Gigabit network, test shows at least 300Mbit/s availability on network port 600 connections 3000 requests per second Say that memcache (memory) usage is at 50% - it's definitely not full As we increase number of requests towards server, requests slow down (from 8ms to 100ms per request) but server load remains 0.00. We suspect this can be solved by adjusting configuration but we don't understand many of the configuration parameters (besides, maybe, the number of threads). Any ideas?

Read the article
Difference between performSelectorInBackground and NSOperation Subclass

- by AmitSri

I have created one testing app for running deep counter loop. I run the loop fuction in background thread using performSelectorInBackground and also NSOperation subclass separately. I am also using performSelectorOnMainThread to notify main thread within backgroundthread method and [NSNotificationCenter defaultCenter] postNotificationName within NSOperation subclass to notify main thread for updating UI. Initially both the implementation giving me same result and i am able to update UI without having any problem. The only difference i found is the Thread count between two implementations. The performSelectorInBackground implementation created one thread and got terminated after loop finished and my app thread count again goes to 1. The NSOperation subclass implementation created two new threads and keep exists in the application and i can see 3 threads after loop got finished in main() function. So, my question is why two threads created by NSOperation and why it didn't get terminated just like the first background thread implementation? I am little bit confuse and unable to decide which implementation is best in-terms of performance and memory management. Thanks

Read the article
are posix pipes lightweight?

- by Nils Pipenbrinck

In a linux application I'm using pipes to pass information between threads. The idea behind using pipes is that I can wait for multiple pipes at once using poll(2). That works well in practice, and my threads are sleeping most of the time and only wake up if there is something to do for them. On the user-space the pipes look just like two file-handles. Now I wonder wonder how much resources such a pipes use on the OS side. Btw: In my application I only send single bytes every now and then. Think about my pipes as simple message queues that allow me to wake-up receiving threads, tell them to send some status-data or to terminate.

Read the article
Using SqlBulkCopy in a multithread scenario with ThreadPool issue

- by Ruben F.

Hi. I'm facing a dilemma (!). In a first scenario, i implemented a solution that replicates data from one data base to another using SQLBulkCopy synchronously and i had no problem at all. Now, using ThreadPool, i implemented the same in a assynchronously scenario, a thread per table, and all works fine, but past some time (usualy 1hour because the operations of copy takes about the same time), the operations sended to the ThreadPool stop being executed. There are one diferent SQLBulkCopy using one diferent SQLConnection per thread. I already see the number of free threads, and they are all free at the begining of the invocation...I have one AutoResetEvent to wait that the threads finish their job before launching again, and a Semaphore FIFO that hold the counter of active threads. There are some issue that I have forgotten or that I should avaliate when using SqlBulkCopy? I apreciate some help, because my ideas are over;) Thanks

Read the article
Using ftplib for multithread uploads

- by Arty

I'm trying to do multithread uploads, but get errors. I guessed that maybe it's impossible to use multithreads with ftplib? Here comes my code: class myThread (threading.Thread): def __init__(self, threadID, src, counter, image_name): self.threadID = threadID self.src = src self.counter = counter self.image_name = image_name threading.Thread.__init__(self) def run(self): uploadFile(self.src, self.image_name) def uploadFile(src, image_name): f = open(src, "rb") ftp.storbinary('STOR ' + image_name, f) f.close() ftp = FTP('host') # connect to host, default port ftp.login() # user anonymous, passwd anonymous@ dirname = "/home/folder/" i = 1 threads = [] for image in os.listdir(dirname): if os.path.isfile(dirname + image): thread = myThread(i , dirname + image, i, image ) thread.start() threads.append( thread ) i += 1 for t in threads: t.join() Get bunch of ftplib errors like raise error_reply, resp error_reply: 200 Type set to I If I try to upload one by one, everything works fine

Read the article
deciding between subprocess, multiprocesser and thread in Python?

- by user248237

I'd like to parallelize my Python program so that it can make use of multiple processors on the machine that it runs on. My parallelization is very simple, in that all the parallel "threads" of the program are independent and write their output to separate files. I don't need the threads to exchange information but it is imperative that I know when the threads finish since some steps of my pipeline depend on their output. Portability is important, in that I'd like this to run on any Python version on Mac, Linux and Windows. Given these constraints, which is the most appropriate Python module for implementing this? I am tryign to decide between thread, subprocess and multiprocessing, which all seem to provide related functionality. Any thoughts on this? I'd like the simplest solution that's portable. Thanks.

Read the article
JavaScript multithreading

- by Krzysztof Hasinski

I'm working on comparison for several different methods of implementing (real or fake) multithreading in JavaScript. As far as I know only webworkers and Google Gears WorkerPool can give you real threads (ie. spread across multiple processors with real parallel execution). I've found the following methods: switch between tasks using yield() use setInterval() (or other non-blocking function) with threads waiting one for another use Google Gears WorkerPool threads (with plugin) use html5 web workers I read related questions and found several variations of the above methods, but most of those questions are old, so there might be a few new ideas. I'm wondering - how else can you achieve multithreading in JavaScript? Any other important methods? UPDATE: As pointed out in comments what I really meant was concurrency. UPDATE 2: I found information that Silverlight + JScript supports multithreading, but I'm unable to verify this. UPDATE 3: Google deprecated Gears: http://code.google.com/apis/gears/api_workerpool.html

Read the article
C builder RAD 2010 RTL/VCL Application->Terminate() Function NOT TERMINATING THE APPLICATION

- by ergey

Hello, I have problem descriebed also here: http://www.delphigroups.info/3/9/106748.html I have tryed almost all forms of placing Application-Terminate() func everywhere in the code, following and not 'return 0', 'ExitProcess(0)', 'ExitThread(0)', exit(0). None working variant closes the app. Instead the code after Application-Terminate() statement is running. I have two or more threads in the app. I tryed calling terminate func in created after execution threads and in main thread. Also this is not related (as i can imagine) with CodeGuard / madExcept (i have turned it off and on, no effect). CodeGuard turning also did not do success. What i should do to terminate all the threads in c builder 2010 application and then terminate the process? Thank you.

Read the article
Can I partition the C# System.Threading.ThreadPool?

- by Drew Shafer

I love ThreadPool. It makes my life better. However, my love may have quietly turned into an abusive relationship that I need to escape from, so I need some advice from my SO brothers (and presumably sisters, although I haven't seen any actual evidence of that yet). My basic problem is that I have several different libraries that are all using the threadpool in an uncoordinated way, and running out of threads is a possibility. I was hoping there was some way I could partition the ThreadPool up so I could give a certain class 1 thread, another 20 threads, another 5 threads, and so on. I know I could write my own ThreadPool implementation. I don't want to do that, because I'm lazy. So, is there a simple solution already out there? Currently I'm constrained to using the 3.5 CLR. I know a lot of this stuff becomes easier in 4.0.

Read the article
Image processing in a multhithreaded mode using Java

- by jadaaih

Hi Folks, I am supposed to process images in a multithreaded mode using Java. I may having varying number of images where as my number of threads are fixed. I have to process all the images using the fixed set of threads. I am just stuck up on how to do it, I had a look ThreadExecutor and BlockingQueues etc...I am still not clear. What I am doing is, - Get the images and add them in a LinkedBlockingQueue which has runnable code of the image processor. - Create a threadpoolexecutor for which one of the arguements is the LinkedBlockingQueue earlier. - Iterate through a for loop till the queue size and do a threadpoolexecutor.execute(linkedblockingqueue.poll). - all i see is it processes only 100 images which is the minimum thread size passed in LinkedBlockingQueue size. I see I am seriously wrong in my understanding somewhere, how do I process all the images in sets of 100(threads) until they are all done? Any examples or psuedocodes would be highly helpful Thanks! J

Read the article
Why Eventmachine Defer slower than Ruby Thread?

- by allenwei

I have two scripts which using mechanize to fetch google index page. I assuming Eventmachine will faster than ruby thread, but not. Eventmachine code cost "0.24s user 0.08s system 2% cpu 12.682 total" Ruby Thread code cost "0.22s user 0.08s system 5% cpu 5.167 total " Am I use eventmachine in wrong way? Who can explain it to me, thanks! 1 Eventmachine require 'rubygems' require 'mechanize' require 'eventmachine' trap("INT") {EM.stop} EM.run do num = 0 operation = proc { agent = Mechanize.new sleep 1 agent.get("http://google.com").body.to_s.size } callback = proc { |result| sleep 1 puts result num+=1 EM.stop if num == 9 } 10.times do EventMachine.defer operation, callback end end 2 Ruby Thread require 'rubygems' require 'mechanize' threads = [] 10.times do threads << Thread.new do agent = Mechanize.new sleep 1 puts agent.get("http://google.com").body.to_s.size sleep 1 end end threads.each do |aThread| aThread.join end

Read the article
[Java] - Problem having my main thread sleeping

- by Chris

I'm in a Java class and our assignment is to let us explore threads in Java. So far so good except for this one this one problem. And I believe that could be because of my lack of understanding how Java threads work at the moment. I have the main thread of execution which spawns new threads. In the main thread of execution in main() I am calling Thread.sleep(). When I do I get an Unhandled exception type InterruptedException. I am unsure of why I am getting this? I thought this was because I needed a reference to the main thread so I went ahead and made a reference to it via Thread.currentThread(). Is this not the way to have the thread sleep? What I need to do is have the main thread wait/sleep/delay till it does it required work again. Any help would be much appreciated.

Read the article
BackgroundWorker and WebBrowser Control

- by James Jeffery

Is it possible/recommended to use background worker threads with the web browser control? I am creating a bot that searches google for keywords, then checks for sites in the first 10 pages to see if a site is ranked. The user can provide a maximum of 20 sites to check, and can use proxies. So ideally I'd like to have 5 threads working at once. Is it possible? I might have heard somewhere that there are problems with WebBrowser control and threads.

Read the article
[C#] Async threaded tcp server

- by mark_dj

I want to create a high performance server in C# which could take about ~10k clients. Now i started writing a TcpServer with C# and for each client-connection i open a new thread. I also use one thread to accept the connections. So far so good, works fine. The server has to deserialize AMF incoming objects do some logic ( like saving the position of a player ) and send some object back ( serializing objects ). I am not worried about the serializing/deserializing part atm. My main concern is that I will have a lot of threads with 10k clients and i've read somewhere that an OS can only hold like a few hunderd threads. Are there any sources/articles available on writing a decent async threaded server ? Are there other possibilties or will 10k threads work fine ? I've looked on google, but i couldn't find much info about design patterns or ways which explain it clearly

Read the article

< Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >