Search Results

Search found 9311 results on 373 pages for 'loop counter'.

Page 199/373 | < Previous Page | 195 196 197 198 199 200 201 202 203 204 205 206  | Next Page >

  • Performance issues when using SSD for a developer notebook (WAMP/LAMP stack)?

    - by András Szepesházi
    I'm a web application developer using my notebook as a standalone development environment (WAMP stack). I just switched from a Core2-duo Vista 32 bit notebook with 2Gb RAM and SATA HDD, to an i5-2520M Win7 64 bit with 4Gb RAM and 128 GB SDD (Corsair P3 128). My initial experience was what I expected, fast boot, quick load of all the applications (Eclipse takes now 5 seconds as opposed to 30s on my old notebook), overall great experience. Then I started to build up my development stack, both as LAMP (using VirtualBox with a debian guest) and WAMP (windows native apache + mysql + php). I wanted to compare those two. This still all worked great out, then I started to pull in my projects to these stacks. And here came the nasty surprise, one of those projects produced a lot worse response times than on my old notebook (that was true for both the VirtualBox and WAMP stack). Apache, php and mysql configurations were practically identical in all environments. I started to do a lot of benchmarking and profiling, and here is what I've found: All general benchmarks (Performance Test 7.0, HDTune Pro, wPrime2 and some more) gave a big advantage to the new notebook. Nothing surprising here. Disc specific tests showed that read/write operations peaked around 380M/160M for the SSD, and all the different sized block operations also performed very well. Started apache performance benchmarking with Apache Benchmark for a small static html file (10 concurrent threads, 500 iterations). Old notebook: min 47ms, median 111ms, max 156ms New WAMP stack: min 71ms, median 135ms, max 296ms New LAMP stack (in VirtualBox): min 6ms, median 46ms, max 175ms Right here I don't get why the native WAMP stack performed so bad, but at least the LAMP environment brought the expected speed. Apache performance measurement for non-cached php content. The php runs a loop of 1000 and generates sha1(uniqid()) inisde. Again, 10 concurrent threads, 500 iterations were used for the benchmark. Old notebook: min 0ms, median 39ms, max 218ms New WAMP stack: min 20ms, median 61ms, max 186ms New LAMP stack (in VirtualBox): min 124ms, median 704ms, max 2463ms What the hell? The new LAMP performed miserably, and even the new native WAMP was outperformed by the old notebook. php + mysql test. The test consists of connecting to a database and reading a single record form a table using INNER JOIN on 3 more (indexed) tables, repeated 100 times within a loop. Databases were identical. 10 concurrent threads, 100 iterations were used for the benchmark. Old notebook: min 1201ms, median 1734ms, max 3728ms New WAMP stack: min 367ms, median 675ms, max 1893ms New LAMP stack (in VirtualBox): min 1410ms, median 3659ms, max 5045ms And the same test with concurrency set to 1 (instead of 10): Old notebook: min 1201ms, median 1261ms, max 1357ms New WAMP stack: min 399ms, median 483ms, max 539ms New LAMP stack (in VirtualBox): min 285ms, median 348ms, max 444ms Strictly for my purposes, as I'm using a self contained development environment (= low concurrency) I could be satisfied with the second test's result. Though I have no idea why the VirtualBox environment performed so bad with higher concurrency. Finally I performed a test of including many php files. The application that I mentioned at the beginning, the one that was performing so bad, has a heavy bootstrap, loads hundreds of small library and configuration files while initializing. So this test does nothing else just includes about 100 files. Concurrency set to 1, 100 iterations: Old notebook: min 140ms, median 168ms, max 406ms New WAMP stack: min 434ms, median 488ms, max 604ms New LAMP stack (in VirtualBox): min 413ms, median 1040ms, max 1921ms Even if I consider that VirtualBox reached those files via shared folders, and that slows things down a bit, I still don't see how could the old notebook outperform so heavily both new configurations. And I think this is the real root of the slow performance, as the application uses even more includes, and the whole bootstrap will occur several times within a page request (for each ajax call, for example). To sum it up, here I am with a brand new high-performance notebook that loads the same page in 20 seconds, that my old notebook can do in 5-7 seconds. Needless to say, I'm not a very happy person right now. Why do you think I experience these poor performance values? What are my options to remedy this situation?

    Read the article

  • php-fpm + persistent sockets = 502 bad gateway

    - by leeoniya
    Put on your reading glasses - this will be a long-ish one. First, what I'm doing. I'm building a web-app interface for some particularly slow tcp devices. Opening a socket to them takes 200ms and an fwrite/fread cycle takes another 300ms. To reduce the need for both of these actions on each request, I'm opening a persistent tcp socket which reduces the response time by the aforementioned 200ms. I was hoping PHP-FPM would share the persistent connections between requests from different clients (and indeed it does!), but there are some issues which I havent been able to resolve after 2 days of interneting, reading logs and modifying settings. I have somewhat narrowed it down though. Setup: Ubuntu 13.04 x64 Server (fully updated) on Linode PHP 5.5.0-6~raring+1 (fpm-fcgi) nginx/1.5.2 Relevent config: nginx worker_processes 4; php-fpm/pool.d pm = dynamic pm.max_children = 2 pm.start_servers = 2 pm.min_spare_servers = 2 Let's go from coarse to fine detail of what happens. After a fresh start I have 4x nginx processes and 2x php5-fpm processes waiting to handle requests. Then I send requests every couple seconds to the script. The first take a while to open the socket connection and returns with the data in about 500ms, the second returns data in 300ms (yay it's re-using the socket), the third also succeeds in about 300ms, the fourth request = 502 Bad Gateway, same with the 5th. Sixth request once again returns data, except now it took 500ms again. The process repeats for several cycles after which every 4 requests result in 2x 502 Bad Gateways and 2x 500ms Data responses. If I double all the fpm pool values and have 4x php-fpm processes running, the cycles settles in with 4x successful 500ms responses followed by 4x Bad Gateway errors. If I don't use persistent sockets, this issue goes away but then every request is 500ms. What I suspect is happening is the persistent socket keeps each php-fpm process from idling and ties it up, so the next one gets chosen until none are left and as they error out, maybe they are restarted and become available on the next round-robin loop ut the socket dies with the process. I haven't yet checked the 'slowlog', but the nginx error log shows lots of this: *188 recv() failed (104: Connection reset by peer) while reading response header from upstream, client:... All the suggestions on the internet regarding fixing nginx/php-fpm/502 bad gateway relate to high load or fcgi_pass misconfiguration. This is not the case here. Increasing buffers/sizes, changing timeouts, switching from unix socket to tcp socket for fcgi_pass, upping connection limits on the system....none of this stuff applies here. I've had some other success with setting pm = ondemand rather than dynamic, but as soon as the initial fpm-process gets killed off after idling, the persistent socket is gone for all subsequent php-fpm spawns. For the php script, I'm using stream_socket_client() with a STREAM_CLIENT_PERSISTENT flag. A while/stream_select() loop to detect socket data and fread($sock, 4096) to grab the data. I don't call fclose() obviously. If anyone has some additional questions or advice on how to get a persistent socket without tying up the php-fpm processes beyond the request completion, or maybe some other things to try, I'd appreciate it. some useful links: Nginx + php-fpm - recv() error Nginx + php-fpm "504 Gateway Time-out" error with almost zero load (on a test-server) Nginx + PHP-FPM "error 104 Connection reset by peer" causes occasional duplicate posts http://www.linuxquestions.org/questions/programming-9/php-pfsockopen-552084/ http://stackoverflow.com/questions/14268018/concurrent-use-of-a-persistent-php-socket http://devzone.zend.com/303/extension-writing-part-i-introduction-to-php-and-zend/#Heading3 http://stackoverflow.com/questions/242316/how-to-keep-a-php-stream-socket-alive http://php.net/manual/en/install.fpm.configuration.php https://www.google.com/search?q=recv%28%29+failed+%28104:+Connection+reset+by+peer%29+while+reading+response+header+from+upstream+%22502%22&ei=mC1XUrm7F4WQyAHbv4H4AQ&start=10&sa=N&biw=1920&bih=953&dpr=1

    Read the article

  • Parallel processing slower than sequential?

    - by zebediah49
    EDIT: For anyone who stumbles upon this in the future: Imagemagick uses a MP library. It's faster to use available cores if they're around, but if you have parallel jobs, it's unhelpful. Do one of the following: do your jobs serially (with Imagemagick in parallel mode) set MAGICK_THREAD_LIMIT=1 for your invocation of the imagemagick binary in question. By making Imagemagick use only one thread, it slows down by 20-30% in my test cases, but meant I could run one job per core without issues, for a significant net increase in performance. Original question: While converting some images using ImageMagick, I noticed a somewhat strange effect. Using xargs was significantly slower than a standard for loop. Since xargs limited to a single process should act like a for loop, I tested that, and found it to be about the same. Thus, we have this demonstration. Quad core (AMD Athalon X4, 2.6GHz) Working entirely on a tempfs (16g ram total; no swap) No other major loads Results: /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 0m3.784s user 0m2.240s sys 0m0.230s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 0m9.097s user 0m28.020s sys 0m0.910s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 0m9.844s user 0m33.200s sys 0m1.270s Can anyone think of a reason why running two instances of this program takes more than twice as long in real time, and more than ten times as long in processor time to complete the same task? After that initial hit, more processes do not seem to have as significant of an effect. I thought it might have to do with disk seeking, so I did that test entirely in ram. Could it have something to do with how Convert works, and having more than one copy at once means it cannot use processor cache as efficiently or something? EDIT: When done with 1000x 769KB files, performance is as expected. Interesting. /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.679s user 5m6.980s sys 0m6.340s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.152s user 5m6.140s sys 0m6.530s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 2m7.578s user 5m35.410s sys 0m6.050s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 4 convert -auto-level real 1m36.959s user 5m48.900s sys 0m6.350s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 1m36.392s user 5m54.840s sys 0m5.650s

    Read the article

  • Specifying a Postfix Instance to send outbound email

    - by Catherine Jefferson
    I have a CentOS 6.5 server running Postfix 2.6x (the default distribution) with five public IPv4 IPs bound to it. Each IP has DNS and rDNS set separately. Each uses a different hostname at a different domain. I have five Postfix instances, one bound to each IP, like this example: 192.168.34.104 red.example.com /etc/postfix 192.168.36.48 green.example.net /etc/postfix-green 192.168.36.49 pink.example.org /etc/postfix-pink 192.168.36.50 orange.example.info /etc/postfix-orange 192.168.36.51 blue.example.us /etc/postfix-blue I've tested each IP by telneting to port 25. Postfix answers and banners properly with the correct hostname. Email is received on all of these instances with no problems and is routed to the correct place. This setup, minus the final instance, has existed for a couple of years and works. I never bothered to set up outbound email to go through any but the main instance, however; there was no need. Now I need to send email from blue.example.us that actually leaves from that interface and IP, such that the Received headers show blue.example.us as the sending mailhost, so that SPF and DKIM validate, etc etc. The email that will be sent from blue.example.com is a feedback loop sent by a single shell account on the server (account5), an account that is dedicated to sending this email. The account receives the feedback loop emails from servers on other networks, saves the bodies of those emails, and then generates a new outbound email header, appends the saved body, and sends the email. It's sending by piping each email to sendmail -oi -t. We're doing it this way to mask the identities of the initial servers. The procmail script that processes these emails works correctly. However, I cannot configure this account to send email through the proper Postfix instance/IP/interface. The exact same account and script sends email through the main Postfix instance /etc/postfix without any issues. When I change MAIL_CONFIG to point to /etc/postfix-blue in either .bash_profile or the Procmail script that handles this email, though, I get this error: sendmail: fatal: User account5(###) is not allowed to submit mail I've read the manuals on Postfix.org, searched Google, and tried the suggestions in three previous answers here on ServerFault.com: Postfix - specify interface to deliver outbound mail on Postfix user is not allowed to submit mail Postfix rejects php mails I have been careful to stop and restart Postfix after each configuration change, and tested the results. Nothing has worked. The main postfix instance happily accepts outbound email from account5. The postfix-blue instance continues to reject email from account5 with the sendmail error above. As tempting as it is to blame machine hostility, I know that I must be missing something or doing something wrong. Does anybody have any suggestions as to what it might be? Please feel free to ask for further information about my setup if you need it. =-=-=-=-=-=-=-=-=-= At the request of the responder, here are main.cf and master.cf for a) the main postfix instance ("red.example.com") and b) the FBL instance ("blue.example.us") [NOTE: All parameters not specified below were left at the default Postfix 2.6 settings] MAIN: master.cf smtp inet n - n - - smtpd main.cf myhostname = red.example.com mydomain = example.com inet_interfaces = $myhostname, localhost inet_protocols = all lmtp_host_lookup = native smtp_host_lookup = native ignore_mx_lookup_error = yes mydestination = $myhostname, localhost.$mydomain, localhost local_recipient_maps = mynetworks = 192.168.34.104/32 relay_domains = example.com, example.info, example.net, example.org, example.us relayhost = [192.168.34.102] # Separate physical server, main mailserver. relay_recipient_maps = hash:/etc/postfix/relay_recipients alias_maps = hash:/etc/aliases alias_database = hash:/etc/aliases smtpd_banner = $myhostname ESMTP $mail_name multi_instance_wrapper = ${command_directory}/postmulti -p -- multi_instance_enable = yes multi_instance_directories = /etc/postfix-green /etc/postfix-pink /etc/postfix-orange /etc/postfix-blue FBL: master.cf 184.173.119.103:25 inet n - n - - smtpd main.cf myhostname = blue.example.us mydomain = blue.example.us <= Deliberately set to subdomain only. myorigin = $mydomain inet_interfaces = $myhostname lmtp_host_lookup = native smtp_host_lookup = native ignore_mx_lookup_error = yes mydestination = $myhostname local_recipient_maps = unix:passwd.byname $alias_maps $virtual_alias_maps mynetworks = 192.168.36.51/32, 192.168.35.20/31 <= Second IP is backup MX servers relay_domains = $mydestination recipient_canonical_maps = hash:/etc/postfix-blue/canonical virtual_alias_maps = hash:/etc/postfix-fbl/virtual alias_maps = hash:/etc/aliases, hash:/etc/postfix-blue/canonical alias_maps = hash:/etc/aliases, hash:/etc/postfix-blue/canonical mailbox_command = /usr/bin/procmail -a "$EXTENSION" DEFAULT=$HOME/Mail/ MAILDIR=$HOME/Mail smtpd_banner = $myhostname ESMTP $mail_name authorized_submit_users = multi_instance_name = postfix-blue multi_instance_enable = yes

    Read the article

  • Apache SSL reverse proxy to a Embed Tomcat

    - by ggarcia24
    I'm trying to put in place a reverse proxy for an application that is running a tomcat embed server over SSL. The application needs to run over SSL on the port 9002 so I have no way of "disabling SSL" for this app. The current setup schema looks like this: [192.168.0.10:443 - Apache with mod_proxy] --> [192.168.0.10:9002 - Tomcat App] After googling on how to make such a setup (and testing) I came across this: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/861137 Which lead to make my current configuration (to try to emulate the --secure-protocol=sslv3 option of wget) /etc/apache2/sites/enabled/default-ssl: <VirtualHost _default_:443> SSLEngine On SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key SSLProxyEngine On SSLProxyProtocol SSLv3 SSLProxyCipherSuite SSLv3 ProxyPass /test/ https://192.168.0.10:9002/ ProxyPassReverse /test/ https://192.168.0.10:9002/ LogLevel debug ErrorLog /var/log/apache2/error-ssl.log CustomLog /var/log/apache2/access-ssl.log combined </VirtualHost> The thing is that the error log is showing error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol Complete request log: [Wed Mar 13 20:05:57 2013] [debug] mod_proxy.c(1020): Running scheme https handler (attempt 0) [Wed Mar 13 20:05:57 2013] [debug] mod_proxy_http.c(1973): proxy: HTTP: serving URL https://192.168.0.10:9002/ [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2011): proxy: HTTPS: has acquired connection for (192.168.0.10) [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2067): proxy: connecting https://192.168.0.10:9002/ to 192.168.0.10:9002 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2193): proxy: connected / to 192.168.0.10:9002 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2444): proxy: HTTPS: fam 2 socket created to connect to 192.168.0.10 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2576): proxy: HTTPS: connection complete to 192.168.0.10:9002 (192.168.0.10) [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection to child 0 established (server demo1agrubu01.demo.lab:443) [Wed Mar 13 20:05:57 2013] [info] Seeding PRNG with 656 bytes of entropy [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1866): OpenSSL: Handshake: start [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1874): OpenSSL: Loop: before/connect initialization [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1874): OpenSSL: Loop: unknown state [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1897): OpenSSL: read 7/7 bytes from BIO#7f122800a100 [mem: 7f1230018f60] (BIO dump follows) [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1830): +-------------------------------------------------------------------------+ [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1869): | 0000: 15 03 01 00 02 02 50 ......P | [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1875): +-------------------------------------------------------------------------+ [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1903): OpenSSL: Exit: error in unknown state [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] SSL Proxy connect failed [Wed Mar 13 20:05:57 2013] [info] SSL Library Error: 336032002 error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection closed to child 0 with abortive shutdown (server example1.domain.tld:443) [Wed Mar 13 20:05:57 2013] [error] (502)Unknown error 502: proxy: pass request body failed to 172.31.4.13:9002 (192.168.0.10) [Wed Mar 13 20:05:57 2013] [error] [client 192.168.0.10] proxy: Error during SSL Handshake with remote server returned by /dsfe/ [Wed Mar 13 20:05:57 2013] [error] proxy: pass request body failed to 192.168.0.10:9002 (172.31.4.13) from 172.31.4.13 () [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2029): proxy: HTTPS: has released connection for (172.31.4.13) [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1884): OpenSSL: Write: SSL negotiation finished successfully [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection closed to child 6 with standard shutdown (server example1.domain.tld:443) If I do a wget --secure-protocol=sslv3 --no-check-certificate https://192.168.0.10:9002/ it works perfectly, but from apache is not working. I'm on an Ubuntu Server with the latest updates running apache2 with mod_proxy and mod_ssl enabled: ~$ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS" ~# dpkg -s apache2 ... Version: 2.2.22-1ubuntu1.2 ... ~# dpkg -s openssl ... Version: 1.0.1-4ubuntu5.7 ... Hope that anyone may help

    Read the article

  • Parallelism in .NET – Part 5, Partitioning of Work

    - by Reed
    When parallelizing any routine, we start by decomposing the problem.  Once the problem is understood, we need to break our work into separate tasks, so each task can be run on a different processing element.  This process is called partitioning. Partitioning our tasks is a challenging feat.  There are opposing forces at work here: too many partitions adds overhead, too few partitions leaves processors idle.  Trying to work the perfect balance between the two extremes is the goal for which we should aim.  Luckily, the Task Parallel Library automatically handles much of this process.  However, there are situations where the default partitioning may not be appropriate, and knowledge of our routines may allow us to guide the framework to making better decisions. First off, I’d like to say that this is a more advanced topic.  It is perfectly acceptable to use the parallel constructs in the framework without considering the partitioning taking place.  The default behavior in the Task Parallel Library is very well-behaved, even for unusual work loads, and should rarely be adjusted.  I have found few situations where the default partitioning behavior in the TPL is not as good or better than my own hand-written partitioning routines, and recommend using the defaults unless there is a strong, measured, and profiled reason to avoid using them.  However, understanding partitioning, and how the TPL partitions your data, helps in understanding the proper usage of the TPL. I indirectly mentioned partitioning while discussing aggregation.  Typically, our systems will have a limited number of Processing Elements (PE), which is the terminology used for hardware capable of processing a stream of instructions.  For example, in a standard Intel i7 system, there are four processor cores, each of which has two potential hardware threads due to Hyperthreading.  This gives us a total of 8 PEs – theoretically, we can have up to eight operations occurring concurrently within our system. In order to fully exploit this power, we need to partition our work into Tasks.  A task is a simple set of instructions that can be run on a PE.  Ideally, we want to have at least one task per PE in the system, since fewer tasks means that some of our processing power will be sitting idle.  A naive implementation would be to just take our data, and partition it with one element in our collection being treated as one task.  When we loop through our collection in parallel, using this approach, we’d just process one item at a time, then reuse that thread to process the next, etc.  There’s a flaw in this approach, however.  It will tend to be slower than necessary, often slower than processing the data serially. The problem is that there is overhead associated with each task.  When we take a simple foreach loop body and implement it using the TPL, we add overhead.  First, we change the body from a simple statement to a delegate, which must be invoked.  In order to invoke the delegate on a separate thread, the delegate gets added to the ThreadPool’s current work queue, and the ThreadPool must pull this off the queue, assign it to a free thread, then execute it.  If our collection had one million elements, the overhead of trying to spawn one million tasks would destroy our performance. The answer, here, is to partition our collection into groups, and have each group of elements treated as a single task.  By adding a partitioning step, we can break our total work into small enough tasks to keep our processors busy, but large enough tasks to avoid overburdening the ThreadPool.  There are two clear, opposing goals here: Always try to keep each processor working, but also try to keep the individual partitions as large as possible. When using Parallel.For, the partitioning is always handled automatically.  At first, partitioning here seems simple.  A naive implementation would merely split the total element count up by the number of PEs in the system, and assign a chunk of data to each processor.  Many hand-written partitioning schemes work in this exactly manner.  This perfectly balanced, static partitioning scheme works very well if the amount of work is constant for each element.  However, this is rarely the case.  Often, the length of time required to process an element grows as we progress through the collection, especially if we’re doing numerical computations.  In this case, the first PEs will finish early, and sit idle waiting on the last chunks to finish.  Sometimes, work can decrease as we progress, since previous computations may be used to speed up later computations.  In this situation, the first chunks will be working far longer than the last chunks.  In order to balance the workload, many implementations create many small chunks, and reuse threads.  This adds overhead, but does provide better load balancing, which in turn improves performance. The Task Parallel Library handles this more elaborately.  Chunks are determined at runtime, and start small.  They grow slowly over time, getting larger and larger.  This tends to lead to a near optimum load balancing, even in odd cases such as increasing or decreasing workloads.  Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable<T>, the number of items required for processing is not known in advance, and must be discovered at runtime.  In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it.  Since IEnumerable<T> is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.  By default, it uses a partitioning method similar to the one described above.  We can see this directly by looking at the Visual Partitioning sample shipped by the Task Parallel Library team, and available as part of the Samples for Parallel Programming.  When we run the sample, with four cores and the default, Load Balancing partitioning scheme, we see this: The colored bands represent each processing core.  You can see that, when we started (at the top), we begin with very small bands of color.  As the routine progresses through the Parallel.ForEach, the chunks get larger and larger (seen by larger and larger stripes). Most of the time, this is fantastic behavior, and most likely will out perform any custom written partitioning.  However, if your routine is not scaling well, it may be due to a failure in the default partitioning to handle your specific case.  With prior knowledge about your work, it may be possible to partition data more meaningfully than the default Partitioner. There is the option to use an overload of Parallel.ForEach which takes a Partitioner<T> instance.  The Partitioner<T> class is an abstract class which allows for both static and dynamic partitioning.  By overriding Partitioner<T>.SupportsDynamicPartitions, you can specify whether a dynamic approach is available.  If not, your custom Partitioner<T> subclass would override GetPartitions(int), which returns a list of IEnumerator<T> instances.  These are then used by the Parallel class to split work up amongst processors.  When dynamic partitioning is available, GetDynamicPartitions() is used, which returns an IEnumerable<T> for each partition.  If you do decide to implement your own Partitioner<T>, keep in mind the goals and tradeoffs of different partitioning strategies, and design appropriately. The Samples for Parallel Programming project includes a ChunkPartitioner class in the ParallelExtensionsExtras project.  This provides example code for implementing your own, custom allocation strategies, including a static allocator of a given chunk size.  Although implementing your own Partitioner<T> is possible, as I mentioned above, this is rarely required or useful in practice.  The default behavior of the TPL is very good, often better than any hand written partitioning strategy.

    Read the article

  • Parallelism in .NET – Part 6, Declarative Data Parallelism

    - by Reed
    When working with a problem that can be decomposed by data, we have a collection, and some operation being performed upon the collection.  I’ve demonstrated how this can be parallelized using the Task Parallel Library and imperative programming using imperative data parallelism via the Parallel class.  While this provides a huge step forward in terms of power and capabilities, in many cases, special care must still be given for relative common scenarios. C# 3.0 and Visual Basic 9.0 introduced a new, declarative programming model to .NET via the LINQ Project.  When working with collections, we can now write software that describes what we want to occur without having to explicitly state how the program should accomplish the task.  By taking advantage of LINQ, many operations become much shorter, more elegant, and easier to understand and maintain.  Version 4.0 of the .NET framework extends this concept into the parallel computation space by introducing Parallel LINQ. Before we delve into PLINQ, let’s begin with a short discussion of LINQ.  LINQ, the extensions to the .NET Framework which implement language integrated query, set, and transform operations, is implemented in many flavors.  For our purposes, we are interested in LINQ to Objects.  When dealing with parallelizing a routine, we typically are dealing with in-memory data storage.  More data-access oriented LINQ variants, such as LINQ to SQL and LINQ to Entities in the Entity Framework fall outside of our concern, since the parallelism there is the concern of the data base engine processing the query itself. LINQ (LINQ to Objects in particular) works by implementing a series of extension methods, most of which work on IEnumerable<T>.  The language enhancements use these extension methods to create a very concise, readable alternative to using traditional foreach statement.  For example, let’s revisit our minimum aggregation routine we wrote in Part 4: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re doing a very simple computation, but writing this in an imperative style.  This can be loosely translated to English as: Create a very large number, and save it in min Loop through each item in the collection. For every item: Perform some computation, and save the result If the computation is less than min, set min to the computation Although this is fairly easy to follow, it’s quite a few lines of code, and it requires us to read through the code, step by step, line by line, in order to understand the intention of the developer. We can rework this same statement, using LINQ: double min = collection.Min(item => item.PerformComputation()); Here, we’re after the same information.  However, this is written using a declarative programming style.  When we see this code, we’d naturally translate this to English as: Save the Min value of collection, determined via calling item.PerformComputation() That’s it – instead of multiple logical steps, we have one single, declarative request.  This makes the developer’s intentions very clear, and very easy to follow.  The system is free to implement this using whatever method required. Parallel LINQ (PLINQ) extends LINQ to Objects to support parallel operations.  This is a perfect fit in many cases when you have a problem that can be decomposed by data.  To show this, let’s again refer to our minimum aggregation routine from Part 4, but this time, let’s review our final, parallelized version: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Here, we’re doing the same computation as above, but fully parallelized.  Describing this in English becomes quite a feat: Create a very large number, and save it in min Create a temporary object we can use for locking Call Parallel.ForEach, specifying three delegates For the first delegate: Initialize a local variable to hold the local state to a very large number For the second delegate: For each item in the collection, perform some computation, save the result If the result is less than our local state, save the result in local state For the final delegate: Take a lock on our temporary object to protect our min variable Save the min of our min and local state variables Although this solves our problem, and does it in a very efficient way, we’ve created a set of code that is quite a bit more difficult to understand and maintain. PLINQ provides us with a very nice alternative.  In order to use PLINQ, we need to learn one new extension method that works on IEnumerable<T> – ParallelEnumerable.AsParallel(). That’s all we need to learn in order to use PLINQ: one single method.  We can write our minimum aggregation in PLINQ very simply: double min = collection.AsParallel().Min(item => item.PerformComputation()); By simply adding “.AsParallel()” to our LINQ to Objects query, we converted this to using PLINQ and running this computation in parallel!  This can be loosely translated into English easily, as well: Process the collection in parallel Get the Minimum value, determined by calling PerformComputation on each item Here, our intention is very clear and easy to understand.  We just want to perform the same operation we did in serial, but run it “as parallel”.  PLINQ completely extends LINQ to Objects: the entire functionality of LINQ to Objects is available.  By simply adding a call to AsParallel(), we can specify that a collection should be processed in parallel.  This is simple, safe, and incredibly useful.

    Read the article

  • SSIS Lookup component tuning tips

    - by jamiet
    Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final  thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • How can I get penetration depth from Minkowski Portal Refinement / Xenocollide?

    - by Raven Dreamer
    I recently got an implementation of Minkowski Portal Refinement (MPR) successfully detecting collision. Even better, my implementation returns a good estimate (local minimum) direction for the minimum penetration depth. So I took a stab at adjusting the algorithm to return the penetration depth in an arbitrary direction, and was modestly successful - my altered method works splendidly for face-edge collision resolution! What it doesn't currently do, is correctly provide the minimum penetration depth for edge-edge scenarios, such as the case on the right: What I perceive to be happening, is that my current method returns the minimum penetration depth to the nearest vertex - which works fine when the collision is actually occurring on the plane of that vertex, but not when the collision happens along an edge. Is there a way I can alter my method to return the penetration depth to the point of collision, rather than the nearest vertex? Here's the method that's supposed to return the minimum penetration distance along a specific direction: public static Vector3 CalcMinDistance(List<Vector3> shape1, List<Vector3> shape2, Vector3 dir) { //holding variables Vector3 n = Vector3.zero; Vector3 swap = Vector3.zero; // v0 = center of Minkowski sum v0 = Vector3.zero; // Avoid case where centers overlap -- any direction is fine in this case //if (v0 == Vector3.zero) return Vector3.zero; //always pass in a valid direction. // v1 = support in direction of origin n = -dir; //get the differnce of the minkowski sum Vector3 v11 = GetSupport(shape1, -n); Vector3 v12 = GetSupport(shape2, n); v1 = v12 - v11; //if the support point is not in the direction of the origin if (v1.Dot(n) <= 0) { //Debug.Log("Could find no points this direction"); return Vector3.zero; } // v2 - support perpendicular to v1,v0 n = v1.Cross(v0); if (n == Vector3.zero) { //v1 and v0 are parallel, which means //the direction leads directly to an endpoint n = v1 - v0; //shortest distance is just n //Debug.Log("2 point return"); return n; } //get the new support point Vector3 v21 = GetSupport(shape1, -n); Vector3 v22 = GetSupport(shape2, n); v2 = v22 - v21; if (v2.Dot(n) <= 0) { //can't reach the origin in this direction, ergo, no collision //Debug.Log("Could not reach edge?"); return Vector2.zero; } // Determine whether origin is on + or - side of plane (v1,v0,v2) //tests linesegments v0v1 and v0v2 n = (v1 - v0).Cross(v2 - v0); float dist = n.Dot(v0); // If the origin is on the - side of the plane, reverse the direction of the plane if (dist > 0) { //swap the winding order of v1 and v2 swap = v1; v1 = v2; v2 = swap; //swap the winding order of v11 and v12 swap = v12; v12 = v11; v11 = swap; //swap the winding order of v11 and v12 swap = v22; v22 = v21; v21 = swap; //and swap the plane normal n = -n; } /// // Phase One: Identify a portal while (true) { // Obtain the support point in a direction perpendicular to the existing plane // Note: This point is guaranteed to lie off the plane Vector3 v31 = GetSupport(shape1, -n); Vector3 v32 = GetSupport(shape2, n); v3 = v32 - v31; if (v3.Dot(n) <= 0) { //can't enclose the origin within our tetrahedron //Debug.Log("Could not reach edge after portal?"); return Vector3.zero; } // If origin is outside (v1,v0,v3), then eliminate v2 and loop if (v1.Cross(v3).Dot(v0) < 0) { //failed to enclose the origin, adjust points; v2 = v3; v21 = v31; v22 = v32; n = (v1 - v0).Cross(v3 - v0); continue; } // If origin is outside (v3,v0,v2), then eliminate v1 and loop if (v3.Cross(v2).Dot(v0) < 0) { //failed to enclose the origin, adjust points; v1 = v3; v11 = v31; v12 = v32; n = (v3 - v0).Cross(v2 - v0); continue; } bool hit = false; /// // Phase Two: Refine the portal int phase2 = 0; // We are now inside of a wedge... while (phase2 < 20) { phase2++; // Compute normal of the wedge face n = (v2 - v1).Cross(v3 - v1); n.Normalize(); // Compute distance from origin to wedge face float d = n.Dot(v1); // If the origin is inside the wedge, we have a hit if (d > 0 ) { //Debug.Log("Do plane test here"); float T = n.Dot(v2) / n.Dot(dir); Vector3 pointInPlane = (dir * T); return pointInPlane; } // Find the support point in the direction of the wedge face Vector3 v41 = GetSupport(shape1, -n); Vector3 v42 = GetSupport(shape2, n); v4 = v42 - v41; float delta = (v4 - v3).Dot(n); float separation = -(v4.Dot(n)); if (delta <= kCollideEpsilon || separation >= 0) { //Debug.Log("Non-convergance detected"); //Debug.Log("Do plane test here"); return Vector3.zero; } // Compute the tetrahedron dividing face (v4,v0,v1) float d1 = v4.Cross(v1).Dot(v0); // Compute the tetrahedron dividing face (v4,v0,v2) float d2 = v4.Cross(v2).Dot(v0); // Compute the tetrahedron dividing face (v4,v0,v3) float d3 = v4.Cross(v3).Dot(v0); if (d1 < 0) { if (d2 < 0) { // Inside d1 & inside d2 ==> eliminate v1 v1 = v4; v11 = v41; v12 = v42; } else { // Inside d1 & outside d2 ==> eliminate v3 v3 = v4; v31 = v41; v32 = v42; } } else { if (d3 < 0) { // Outside d1 & inside d3 ==> eliminate v2 v2 = v4; v21 = v41; v22 = v42; } else { // Outside d1 & outside d3 ==> eliminate v1 v1 = v4; v11 = v41; v12 = v42; } } } return Vector3.zero; } }

    Read the article

  • ODI 12c - Parallel Table Load

    - by David Allan
    In this post we will look at the ODI 12c capability of parallel table load from the aspect of the mapping developer and the knowledge module developer - two quite different viewpoints. This is about parallel table loading which isn't to be confused with loading multiple targets per se. It supports the ability for ODI mappings to be executed concurrently especially if there is an overlap of the datastores that they access, so any temporary resources created may be uniquely constructed by ODI. Temporary objects can be anything basically - common examples are staging tables, indexes, views, directories - anything in the ETL to help the data integration flow do its job. In ODI 11g users found a few workarounds (such as changing the technology prefixes - see here) to build unique temporary names but it was more of a challenge in error cases. ODI 12c mappings by default operate exactly as they did in ODI 11g with respect to these temporary names (this is also true for upgraded interfaces and scenarios) but can be configured to support the uniqueness capabilities. We will look at this feature from two aspects; that of a mapping developer and that of a developer (of procedures or KMs). 1. Firstly as a Mapping Developer..... 1.1 Control when uniqueness is enabled A new property is available to set unique name generation on/off. When unique names have been enabled for a mapping, all temporary names used by the collection and integration objects will be generated using unique names. This property is presented as a check-box in the Property Inspector for a deployment specification. 1.2 Handle cleanup after successful execution Provided that all temporary objects that are created have a corresponding drop statement then all of the temporary objects should be removed during a successful execution. This should be the case with the KMs developed by Oracle. 1.3 Handle cleanup after unsuccessful execution If an execution failed in ODI 11g then temporary tables would have been left around and cleaned up in the subsequent run. In ODI 12c, KM tasks can now have a cleanup-type task which is executed even after a failure in the main tasks. These cleanup tasks will be executed even on failure if the property 'Remove Temporary Objects on Error' is set. If the agent was to crash and not be able to execute this task, then there is an ODI tool (OdiRemoveTemporaryObjects here) you can invoke to cleanup the tables - it supports date ranges and the like. That's all there is to it from the aspect of the mapping developer it's much, much simpler and straightforward. You can now execute the same mapping concurrently or execute many mappings using the same resource concurrently without worrying about conflict.  2. Secondly as a Procedure or KM Developer..... In the ODI Operator the executed code shows the actual name that is generated - you can also see the runtime code prior to execution (introduced in 11.1.1.7), for example below in the code type I selected 'Pre-executed Code' this lets you see the code about to be processed and you can also see the executed code (which is the default view). References to the collection (C$) and integration (I$) names will be automatically made unique by using the odiRef APIs - these objects will have unique names whenever concurrency has been enabled for a particular mapping deployment specification. It's also possible to use name uniqueness functions in procedures and your own KMs. 2.1 New uniqueness tags  You can also make your own temporary objects have unique names by explicitly including either %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG in the name passed to calls to the odiRef APIs. Such names would always include the unique tag regardless of the concurrency setting. To illustrate, let's look at the getObjectName() method. At <% expansion time, this API will append %UNIQUE_STEP_TAG to the object name for collection and integration tables. The name parameter passed to this API may contain  %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. This API always generates to the <? version of getObjectName() At execution time this API will replace the unique tag macros with a string that is unique to the current execution scope. The returned name will conform to the name-length restriction for the target technology, and its pattern for the unique tag. Any necessary truncation will be performed against the initial name for the object and any other fixed text that may have been specified. Examples are:- <?=odiRef.getObjectName("L", "%COL_PRFEMP%UNIQUE_STEP_TAG", "D")?> SCOTT.C$_EABH7QI1BR1EQI3M76PG9SIMBQQ <?=odiRef.getObjectName("L", "EMP%UNIQUE_STEP_TAG_AE", "D")?> SCOTT.EMPAO96Q2JEKO0FTHQP77TMSAIOSR_ Methods which have this kind of support include getFrom, getTableName, getTable, getObjectShortName and getTemporaryIndex. There are APIs for retrieving this tag info also, the getInfo API has been extended with the following properties (the UNIQUE* properties can also be used in ODI procedures); UNIQUE_STEP_TAG - Returns the unique value for the current step scope, e.g. 5rvmd8hOIy7OU2o1FhsF61 Note that this will be a different value for each loop-iteration when the step is in a loop. UNIQUE_SESSION_TAG - Returns the unique value for the current session scope, e.g. 6N38vXLrgjwUwT5MseHHY9 IS_CONCURRENT - Returns info about the current mapping, will return 0 or 1 (only in % phase) GUID_SRC_SET - Returns the UUID for the current source set/execution unit (only in % phase) The getPop API has been extended with the IS_CONCURRENT property which returns info about an mapping, will return 0 or 1.  2.2 Additional APIs Some new APIs are provided including getFormattedName which will allow KM developers to construct a name from fixed-text or ODI symbols that can be optionally truncate to a max length and use a specific encoding for the unique tag. It has syntax getFormattedName(String pName[, String pTechnologyCode]) This API is available at both the % and the ? phase.  The format string can contain the ODI prefixes that are available for getObjectName(), e.g. %INT_PRF, %COL_PRF, %ERR_PRF, %IDX_PRF alongwith %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. The latter tags will be expanded into a unique string according to the specified technology. Calls to this API within the same execution context are guaranteed to return the same unique name provided that the same parameters are passed to the call. e.g. <%=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")%> <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")?> C$_MY_TAB7wDiBe80vBog1auacS1xB_AE <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG.log", "FILE")?> C2_MY_TAB7wDiBe80vBog1auacS1xB.log 2.3 Name length generation  As part of name generation, the length of the generated name will be compared with the maximum length for the target technology and truncation may need to be applied. When a unique tag is included in the generated string it is important that uniqueness is not compromised by truncation of the unique tag. When a unique tag is NOT part of the generated name, the name will be truncated by removing characters from the end - this is the existing 11g algorithm. When a unique tag is included, the algorithm will first truncate the <postfix> and if necessary  the <prefix>. It is recommended that users will ensure there is sufficient uniqueness in the <prefix> section to ensure uniqueness of the final resultant name. SUMMARY To summarize, ODI 12c make it much simpler to utilize mappings in concurrent cases and provides APIs for helping developing any procedures or custom knowledge modules in such a way they can be used in highly concurrent, parallel scenarios. 

    Read the article

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

  • June 2013 Release of the Ajax Control Toolkit

    - by Stephen.Walther
    I’m happy to announce the June 2013 release of the Ajax Control Toolkit. For this release, we enhanced the AjaxFileUpload control to support uploading files directly to Windows Azure. We also improved the SlideShow control by adding support for CSS3 animations. You can get the latest release of the Ajax Control Toolkit by visiting the project page at CodePlex (http://AjaxControlToolkit.CodePlex.com). Alternatively, you can execute the following NuGet command from the Visual Studio Library Package Manager window: Uploading Files to Azure The AjaxFileUpload control enables you to efficiently upload large files and display progress while uploading. With this release, we’ve added support for uploading large files directly to Windows Azure Blob Storage (You can continue to upload to your server hard drive if you prefer). Imagine, for example, that you have created an Azure Blob Storage container named pictures. In that case, you can use the following AjaxFileUpload control to upload to the container: <toolkit:ToolkitScriptManager runat="server" /> <toolkit:AjaxFileUpload ID="AjaxFileUpload1" StoreToAzure="true" AzureContainerName="pictures" runat="server" /> Notice that the AjaxFileUpload control is declared with two properties related to Azure. The StoreToAzure property causes the AjaxFileUpload control to upload a file to Azure instead of the local computer. The AzureContainerName property points to the blob container where the file is uploaded. .int3{position:absolute;clip:rect(487px,auto,auto,444px);}SMALL cash advance VERY CHEAP To use the AjaxFileUpload control, you need to modify your web.config file so it contains some additional settings. You need to configure the AjaxFileUpload handler and you need to point your Windows Azure connection string to your Blob Storage account. <configuration> <appSettings> <!--<add key="AjaxFileUploadAzureConnectionString" value="UseDevelopmentStorage=true"/>--> <add key="AjaxFileUploadAzureConnectionString" value="DefaultEndpointsProtocol=https;AccountName=testact;AccountKey=RvqL89Iw4npvPlAAtpOIPzrinHkhkb6rtRZmD0+ojZupUWuuAVJRyyF/LIVzzkoN38I4LSr8qvvl68sZtA152A=="/> </appSettings> <system.web> <compilation debug="true" targetFramework="4.5" /> <httpRuntime targetFramework="4.5" /> <httpHandlers> <add verb="*" path="AjaxFileUploadHandler.axd" type="AjaxControlToolkit.AjaxFileUploadHandler, AjaxControlToolkit"/> </httpHandlers> </system.web> <system.webServer> <validation validateIntegratedModeConfiguration="false" /> <handlers> <add name="AjaxFileUploadHandler" verb="*" path="AjaxFileUploadHandler.axd" type="AjaxControlToolkit.AjaxFileUploadHandler, AjaxControlToolkit"/> </handlers> <security> <requestFiltering> <requestLimits maxAllowedContentLength="4294967295"/> </requestFiltering> </security> </system.webServer> </configuration> You supply the connection string for your Azure Blob Storage account with the AjaxFileUploadAzureConnectionString property. If you set the value “UseDevelopmentStorage=true” then the AjaxFileUpload will upload to the simulated Blob Storage on your local machine. After you create the necessary configuration settings, you can use the AjaxFileUpload control to upload files directly to Azure (even very large files). Here’s a screen capture of how the AjaxFileUpload control appears in Google Chrome: After the files are uploaded, you can view the uploaded files in the Windows Azure Portal. You can see that all 5 files were uploaded successfully: New AjaxFileUpload Events In response to user feedback, we added two new events to the AjaxFileUpload control (on both the server and the client): · UploadStart – Raised on the server before any files have been uploaded. · UploadCompleteAll – Raised on the server when all files have been uploaded. · OnClientUploadStart – The name of a function on the client which is called before any files have been uploaded. · OnClientUploadCompleteAll – The name of a function on the client which is called after all files have been uploaded. These new events are most useful when uploading multiple files at a time. The updated AjaxFileUpload sample page demonstrates how to use these events to show the total amount of time required to upload multiple files (see the AjaxFileUpload.aspx file in the Ajax Control Toolkit sample site). SlideShow Animated Slide Transitions With this release of the Ajax Control Toolkit, we also added support for CSS3 animations to the SlideShow control. The animation is used when transitioning from one slide to another. Here’s the complete list of animations: · FadeInFadeOut · ScaleX · ScaleY · ZoomInOut · Rotate · SlideLeft · SlideDown You specify the animation which you want to use by setting the SlideShowAnimationType property. For example, here is how you would use the Rotate animation when displaying a set of slides: <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ShowSlideShow.aspx.cs" Inherits="TestACTJune2013.ShowSlideShow" %> <%@ Register TagPrefix="toolkit" Namespace="AjaxControlToolkit" Assembly="AjaxControlToolkit" %> <script runat="Server" type="text/C#"> [System.Web.Services.WebMethod] [System.Web.Script.Services.ScriptMethod] public static AjaxControlToolkit.Slide[] GetSlides() { return new AjaxControlToolkit.Slide[] { new AjaxControlToolkit.Slide("slides/Blue hills.jpg", "Blue Hills", "Go Blue"), new AjaxControlToolkit.Slide("slides/Sunset.jpg", "Sunset", "Setting sun"), new AjaxControlToolkit.Slide("slides/Winter.jpg", "Winter", "Wintery..."), new AjaxControlToolkit.Slide("slides/Water lilies.jpg", "Water lillies", "Lillies in the water"), new AjaxControlToolkit.Slide("slides/VerticalPicture.jpg", "Sedona", "Portrait style picture") }; } </script> <!DOCTYPE html> <html > <head runat="server"> <title></title> </head> <body> <form id="form1" runat="server"> <div> <toolkit:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server" /> <asp:Image ID="Image1" Height="300" Runat="server" /> <toolkit:SlideShowExtender ID="SlideShowExtender1" TargetControlID="Image1" SlideShowServiceMethod="GetSlides" AutoPlay="true" Loop="true" SlideShowAnimationType="Rotate" runat="server" /> </div> </form> </body> </html> In the code above, the set of slides is exposed by a page method named GetSlides(). The SlideShowAnimationType property is set to the value Rotate. The following animated GIF gives you an idea of the resulting slideshow: If you want to use either the SlideDown or SlideRight animations, then you must supply both an explicit width and height for the Image control which is the target of the SlideShow extender. For example, here is how you would declare an Image and SlideShow control to use a SlideRight animation: <toolkit:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server" /> <asp:Image ID="Image1" Height="300" Width="300" Runat="server" /> <toolkit:SlideShowExtender ID="SlideShowExtender1" TargetControlID="Image1" SlideShowServiceMethod="GetSlides" AutoPlay="true" Loop="true" SlideShowAnimationType="SlideRight" runat="server" /> Notice that the Image control includes both a Height and Width property. Here’s an approximation of this animation using an animated GIF: Summary The Superexpert team worked hard on this release. We hope you like the new improvements to both the AjaxFileUpload and the SlideShow controls. We’d love to hear your feedback in the comments. On to the next sprint!

    Read the article

  • Combination of Operating Mode and Commit Strategy

    - by Kevin Yang
    If you want to populate a source into multiple targets, you may also want to ensure that every row from the source affects all targets uniformly (or separately). Let’s consider the Example Mapping below. If a row from SOURCE causes different changes in multiple targets (TARGET_1, TARGET_2 and TARGET_3), for example, it can be successfully inserted into TARGET_1 and TARGET_3, but failed to be inserted into TARGET_2, and the current Mapping Property TLO (target load order) is “TARGET_1 -> TARGET_2 -> TARGET_3”. What should Oracle Warehouse Builder do, in order to commit the appropriate data to all affected targets at the same time? If it doesn’t behave as you intended, the data could become inaccurate and possibly unusable.                                               Example Mapping In OWB, we can use Mapping Configuration Commit Strategies and Operating Modes together to achieve this kind of requirements. Below we will explore the combination of these two features and how they affect the results in the target tables Before going to the example, let’s review some of the terms we will be using (Details can be found in white paper Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide11g Release 2): Operating Modes: Set-Based Mode: Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Row-Based Mode: Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. Row-Based (Target Only) Mode: Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. Commit Strategies: Automatic: Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant. Automatic correlated: It is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source affects all affected targets uniformly. Manual: select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. Combination of the commit strategy and operating mode To understand the effects of each combination of operating mode and commit strategy, I’ll illustrate using the following example Mapping. Firstly we insert 100 rows into the SOURCE table and make sure that the 99th row and 100th row have the same ID value. And then we create a unique key constraint on ID column for TARGET_2 table. So while running the example mapping, OWB tries to load all 100 rows to each of the targets. But the mapping should fail to load the 100th row to TARGET_2, because it will violate the unique key constraint of table TARGET_2. With different combinations of Commit Strategy and Operating Mode, here are the results ¦ Set-based/ Correlated Commit: Configuration of Example mapping:                                                     Result:                                                      What’s happening: A single error anywhere in the mapping triggers the rollback of all data. OWB encounters the error inserting into Target_2, it reports an error for the table and does not load the row. OWB rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. ¦ Row-based/ Correlated Commit: Configuration of Example mapping:                                                   Result:                                                  What’s happening: OWB evaluates each row separately and loads it to all three targets. Loading continues in this way until OWB encounters an error loading row 100th to Target_2. OWB reports the error and does not load the row. It rolls back the row 100th previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Then, if there are remaining rows, OWB will continue loading them, resuming with loading rows to Target_1. The mapping completes with 99 rows inserted into each target. ¦ Set-based/ Automatic Commit: Configuration of Example mapping: Result: What’s happening: When OWB encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows previously inserted into Target_1. The mapping completes with one error message for Target_2, no rows inserted into Target_2, and 100 rows inserted into Target_1 and Target_3 separately. ¦ Row-based/Automatic Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately for loading into the targets. Loading continues in this way until OWB encounters an error loading row 100 to Target_2 and reports the error. OWB does not roll back row 100th from Target_1, does insert it into Target_3. If there are remaining rows, it will continue to load them. The mapping completes with 99 rows inserted into Target_2 and 100 rows inserted into each of the other targets. Note: Automatic Correlated commit is not applicable for row-based (target only). If you design a mapping with the row-based (target only) and correlated commit combination, OWB runs the mapping but does not perform the correlated commit. In set-based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert). Correlated commit operates transparently with PL/SQL bulk processing code. The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or that include a Queue, Match Merge, or Table Function operator. If you want to practice in your own environment, you can follow the steps: 1. Import the MDL file: commit_operating_mode.mdl 2. Fix the location for oracle module ORCL and deploy all tables under it. 3. Insert sample records into SOURCE table, using below plsql code: begin     for i in 1..99     loop         insert into source values(i, 'col_'||i);     end loop;     insert into source values(99, 'col_99'); end; 4. Configure MAPPING_1 to any combinations of operating mode and commit strategy you want to test. And make sure feature TLO of mapping is open. 5. Deploy Mapping “MAPPING_1”. 6. Run the mapping and check the result.

    Read the article

  • At most how many customized P3 attributes could be added into Agile?

    - by Jie Chen
    I have one customer/Oracle Partner Consultant asking me such question: how many customized attributes can be allowed to add to Agile's subclass Page Three? I never did research against this because Agile User Guide never says this and theoretically Agile supports unlimited amount of customized attributes, unless the browser itself cannot handle them in allocated memory. However my customers says when to add almost 1000 attributes, the browser (Web Client) will not show any Page Three attributes, including all the out-of-box attributes. Let's see why. Analysis It is horrible to add 1000 attributes manually. Let's do it by a batch SQL like below to add them to Item's subclass Page Three tab. Do not execute below SQL because it will not take effect due to your different node id. CREATE OR REPLACE PROCEDURE createP3Text(v_name IN VARCHAR2) IS v_nid NUMBER; v_pid NUMBER; BEGIN select SEQNODETABLE.nextval into v_nid from dual; Insert Into nodeTable ( id,parentID,description,objType,inherit,helpID,version,name ) values ( v_nid,2473003, v_name ,1,0,0,0, v_name); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,925, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,1,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,2,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,2,0,1,3,'50'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,5, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,2,0,1,6,'50'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,2,0,0,7,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,8,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,9,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,1,10,v_name); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,0,0,0,0,11,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,11743,1,14,'2'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,30, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,2,1,0,1,38, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,59,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,60,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,724,0,61, null); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,0,232,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,233,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,12239,1,415,'13307'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,2,1,0,0,605,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,610,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,1,4,1,451,0,716,'1'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,795,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,2000008821,1,864,'2'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,1,923,'0'); Insert Into propertyTable ( ID,parentID,readOnly,attType,dataType,selection,visible,propertyID,value ) values ( SEQPROPERTYTABLE.nextval,v_nid,0,4,1,451,0,719,'0'); Insert Into tableInfo ( tabID,tableID,classID,att,ordering ) values ( 2473005,1501,2473002,v_nid,9999); commit; END createP3Text; / BEGIN FOR i in 1..1000 LOOP createP3Text('MyText' || i); END LOOP; END; / DROP PROCEDURE createP3Text; COMMIT; Now restart Agile Server and check the Server's log, we noticed below: ***** Node Created : 85625 ***** Property Created : 184579 +++++++++++++++++++++++++++++++++++++ + Agile PLM Server Starting Up... + +++++++++++++++++++++++++++++++++++++ However the previously log before batch SQL is ***** Node Created : 84625 ***** Property Created : 157579 +++++++++++++++++++++++++++++++++++++ + Agile PLM Server Starting Up... + +++++++++++++++++++++++++++++++++++++ Obviously we successfully imported 1000 (85625-84625) attributes. Now go to JavaClient and confirm if we have them or not. Theoretically we are able to open such item object and see all these 1000 attributes and their values, but we get below error. We have no error tips in server log. But never mind we have the Java Console for JavaClient. If to open the same item in JavaClient we get a clear error and detailed trace in Java Console. ORA-01795: maximum number of expressions in a list is 1000 java.sql.SQLException: ORA-01795: maximum number of expressions in a list is 1000 at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:125) ... ... at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:128) at com.agile.pc.cmserver.base.AgileFlexUtil.setFlexValuesForOneRowTable(AgileFlexUtil.java:1104) at com.agile.pc.cmserver.base.BaseFlexTableDAO.loadExtraFlexAttValues(BaseFlexTableDAO.java:111) at com.agile.pc.cmserver.base.BasePageThreeDAO.loadTable(BasePageThreeDAO.java:108) If you are interested in the background of the problem, you may de-compile the class com.agile.pc.cmserver.base.AgileFlexUtil.setFlexValuesForOneRowTable and find the root cause that Agile happens to hit Oracle Database's limitation that more than 1000 values in the "IN" clause. Check here http://ora-01795.ora-code.com If you need Oracle Agile's final solution, please contact Oracle Agile Support. Performance Below two screenshot are jvm heap usage from before-SQL and after-SQL. We can see there is no big memory gap between two cases. So definitely there is no performance impact to Agile Application Server unless you have more than 1000 attributes for EACH of your dozens of  subclasses. And for client, 1000 attributes should not impact the browser's performance because in HTML we only use dt and dd for each attribute's pair: label and value. It is quite lightweight.

    Read the article

  • More Fun With Math

    - by PointsToShare
    More Fun with Math   The runaway student – three different ways of solving one problem Here is a problem I read in a Russian site: A student is running away. He is moving at 1 mph. Pursuing him are a lion, a tiger and his math teacher. The lion is 40 miles behind and moving at 6 mph. The tiger is 28 miles behind and moving at 4 mph. His math teacher is 30 miles behind and moving at 5 mph. Who will catch him first? Analysis Obviously we have a set of three problems. They are all basically the same, but the details are different. The problems are of the same class. Here is a little excursion into computer science. One of the things we strive to do is to create solutions for classes of problems rather than individual problems. In your daily routine, you call it re-usability. Not all classes of problems have such solutions. If a class has a general (re-usable) solution, it is called computable. Otherwise it is unsolvable. Within unsolvable classes, we may still solve individual (some but not all) problems, albeit with different approaches to each. Luckily the vast majority of our daily problems are computable, and the 3 problems of our runaway student belong to a computable class. So, let’s solve for the catch-up time by the math teacher, after all she is the most frightening. She might even make the poor runaway solve this very problem – perish the thought! Method 1 – numerical analysis. At 30 miles and 5 mph, it’ll take her 6 hours to come to where the student was to begin with. But by then the student has advanced by 6 miles. 6 miles require 6/5 hours, but by then the student advanced by another 6/5 of a mile as well. And so on and so forth. So what are we to do? One way is to write code and iterate it until we have solved it. But this is an infinite process so we’ll end up with an infinite loop. So what to do? We’ll use the principles of numerical analysis. Any calculator – your computer included – has a limited number of digits. A double floating point number is good for about 14 digits. Nothing can be computed at a greater accuracy than that. This means that we will not iterate ad infinidum, but rather to the point where 2 consecutive iterations yield the same result. When we do financial computations, we don’t even have to go that far. We stop at the 10th of a penny.  It behooves us here to stop at a 10th of a second (100 milliseconds) and this will how we will avoid an infinite loop. Interestingly this alludes to the Zeno paradoxes of motion – in particular “Achilles and the Tortoise”. Zeno says exactly the same. To catch the tortoise, Achilles must always first come to where the tortoise was, but the tortoise keeps moving – hence Achilles will never catch the tortoise and our math teacher (or lion, or tiger) will never catch the student, or the policeman the thief. Here is my resolution to the paradox. The distance and time in each step are smaller and smaller, so the student will be caught. The only thing that is infinite is the iterative solution. The race is a convergent geometric process so the steps are diminishing, but each step in the solution takes the same amount of effort and time so with an infinite number of steps, we’ll spend an eternity solving it.  This BTW is an original thought that I have never seen before. But I digress. Let’s simply write the code to solve the problem. To make sure that it runs everywhere, I’ll do it in JavaScript. function LongCatchUpTime(D, PV, FV) // D is Distance; PV is Pursuers Velocity; FV is Fugitive’ Velocity {     var t = 0;     var T = 0;     var d = parseFloat(D);     var pv = parseFloat (PV);     var fv = parseFloat (FV);     t = d / pv;     while (t > 0.000001) //a 10th of a second is 1/36,000 of an hour, I used 1/100,000     {         T = T + t;         d = t * fv;         t = d / pv;     }     return T;     } By and large, the higher the Pursuer’s velocity relative to the fugitive, the faster the calculation. Solving this with the 10th of a second limit yields: 7.499999232000001 Method 2 – Geometric Series. Each step in the iteration above is smaller than the next. As you saw, we stopped iterating when the last step was small enough, small enough not to really matter.  When we have a sequence of numbers in which the ratio of each number to its predecessor is fixed we call the sequence geometric. When we are looking at the sum of sequence, we call the sequence of sums series.  Now let’s look at our student and teacher. The teacher runs 5 times faster than the student, so with each iteration the distance between them shrinks to a fifth of what it was before. This is a fixed ratio so we deal with a geometric series.  We normally designate this ratio as q and when q is less than 1 (0 < q < 1) the sum of  + … +  is  – 1) / (q – 1). When q is less than 1, it is easier to use ) / (1 - q). Now, the steps are 6 hours then 6/5 hours then 6/5*5 and so on, so q = 1/5. And the whole series is multiplied by 6. Also because q is less than 1 , 1/  diminishes to 0. So the sum is just  / (1 - q). or 1/ (1 – 1/5) = 1 / (4/5) = 5/4. This times 6 yields 7.5 hours. We can now continue with some algebra and take it back to a simpler formula. This is arduous and I am not going to do it here. Instead let’s do some simpler algebra. Method 3 – Simple Algebra. If the time to capture the fugitive is T and the fugitive travels at 1 mph, then by the time the pursuer catches him he travelled additional T miles. Time is distance divided by speed, so…. (D + T)/V = T  thus D + T = VT  and D = VT – T = (V – 1)T  and T = D/(V – 1) This “strangely” coincides with the solution we just got from the geometric sequence. This is simpler ad faster. Here is the corresponding code. function ShortCatchUpTime(D, PV, FV) {     var d = parseFloat(D);     var pv = parseFloat (PV);     var fv = parseFloat (FV);     return d / (pv - fv); } The code above, for both the iterative solution and the algebraic solution are actually for a larger class of problems.  In our original problem the student’s velocity (speed) is 1 mph. In the code it may be anything as long as it is less than the pursuer’s velocity. As long as PV > FV, the pursuer will catch up. Here is the really general formula: T = D / (PV – FV) Finally, let’s run the program for each of the pursuers.  It could not be worse. I know he’d rather be eaten alive than suffering through yet another math lesson. See the code run? Select  “Catch Up Time” in www.mgsltns.com/games.htm The host is running on Unix, so the link is case sensitive. That’s All Folks

    Read the article

  • More Fun with C# Iterators and Generators

    - by James Michael Hare
    In my last post, I talked quite a bit about iterators and how they can be really powerful tools for filtering a list of items down to a subset of items.  This had both pros and cons over returning a full collection, which, in summary, were:   Pros: If traversal is only partial, does not have to visit rest of collection. If evaluation method is costly, only incurs that cost on elements visited. Adds little to no garbage collection pressure.    Cons: Very slight performance impact if you know caller will always consume all items in collection. And as we saw in the last post, that con for the cost was very, very small and only really became evident on very tight loops consuming very large lists completely.    One of the key items to note, though, is the garbage!  In the traditional (return a new collection) method, if you have a 1,000,000 element collection, and wish to transform or filter it in some way, you have to allocate space for that copy of the collection.  That is, say you have a collection of 1,000,000 items and you want to double every item in the collection.  Well, that means you have to allocate a collection to hold those 1,000,000 items to return, which is a lot especially if you are just going to use it once and toss it.   Iterators, though, don't have this problem.  Each time you visit the node, it would return the doubled value of the node (in this example) and not allocate a second collection of 1,000,000 doubled items.  Do you see the distinction?  In both cases, we're consuming 1,000,000 items.  But in one case we pass back each doubled item which is just an int (for example's sake) on the stack and in the other case, we allocate a list containing 1,000,000 items which then must be garbage collected.   So iterators in C# are pretty cool, eh?  Well, here's one more thing a C# iterator can do that a traditional "return a new collection" transformation can't!   It can return **unbounded** collections!   I know, I know, that smells a lot like an infinite loop, eh?  Yes and no.  Basically, you're relying on the caller to put the bounds on the list, and as long as the caller doesn't you keep going.  Consider this example:   public static class Fibonacci {     // returns the infinite fibonacci sequence     public static IEnumerable<int> Sequence()     {         int iteration = 0;         int first = 1;         int second = 1;         int current = 0;         while (true)         {             if (iteration++ < 2)             {                 current = 1;             }             else             {                 current = first + second;                 second = first;                 first = current;             }             yield return current;         }     } }   Whoa, you say!  Yes, that's an infinite loop!  What the heck is going on there?  Yes, that was intentional.  Would it be better to have a fibonacci sequence that returns only a specific number of items?  Perhaps, but that wouldn't give you the power to defer the execution to the caller.   The beauty of this function is it is as infinite as the sequence itself!  The fibonacci sequence is unbounded, and so is this method.  It will continue to return fibonacci numbers for as long as you ask for them.  Now that's not something you can do with a traditional method that would return a collection of ints representing each number.  In that case you would eventually run out of memory as you got to higher and higher numbers.  This method, though, never runs out of memory.   Now, that said, you do have to know when you use it that it is an infinite collection and bound it appropriately.  Fortunately, Linq provides a lot of these extension methods for you!   Let's say you only want the first 10 fibonacci numbers:       foreach(var fib in Fibonacci.Sequence().Take(10))     {         Console.WriteLine(fib);     }   Or let's say you only want the fibonacci numbers that are less than 100:       foreach(var fib in Fibonacci.Sequence().TakeWhile(f => f < 100))     {         Console.WriteLine(fib);     }   So, you see, one of the nice things about iterators is their power to work with virtually any size (even infinite) collections without adding the garbage collection overhead of making new collections.    You can also do fun things like this to make a more "fluent" interface for for loops:   // A set of integer generator extension methods public static class IntExtensions {     // Begins counting to inifity, use To() to range this.     public static IEnumerable<int> Every(this int start)     {         // deliberately avoiding condition because keeps going         // to infinity for as long as values are pulled.         for (var i = start; ; ++i)         {             yield return i;         }     }     // Begins counting to infinity by the given step value, use To() to     public static IEnumerable<int> Every(this int start, int byEvery)     {         // deliberately avoiding condition because keeps going         // to infinity for as long as values are pulled.         for (var i = start; ; i += byEvery)         {             yield return i;         }     }     // Begins counting to inifity, use To() to range this.     public static IEnumerable<int> To(this int start, int end)     {         for (var i = start; i <= end; ++i)         {             yield return i;         }     }     // Ranges the count by specifying the upper range of the count.     public static IEnumerable<int> To(this IEnumerable<int> collection, int end)     {         return collection.TakeWhile(item => item <= end);     } }   Note that there are two versions of each method.  One that starts with an int and one that starts with an IEnumerable<int>.  This is to allow more power in chaining from either an existing collection or from an int.  This lets you do things like:   // count from 1 to 30 foreach(var i in 1.To(30)) {     Console.WriteLine(i); }     // count from 1 to 10 by 2s foreach(var i in 0.Every(2).To(10)) {     Console.WriteLine(i); }     // or, if you want an infinite sequence counting by 5s until something inside breaks you out... foreach(var i in 0.Every(5)) {     if (someCondition)     {         break;     }     ... }     Yes, those are kinda play functions and not particularly useful, but they show some of the power of generators and extension methods to form a fluid interface.   So what do you think?  What are some of your favorite generators and iterators?

    Read the article

  • SQL University: What and why of database testing

    - by Mladen Prajdic
    This is a post for a great idea called SQL University started by Jorge Segarra also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database related topics contributed by several smart people all over the world. So this week is mine and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover: SQLU part 1 - What and why of database testing SQLU part 2 - What and why of database refactoring SQLU part 2 – Tools of the trade With that out of the way let us sharpen our pencils and get going. Why test a database The sad state of the industry today is that there is very little emphasis on testing in general. Test driven development is still a small niche of the programming world while refactoring is even smaller. The cause of this is the inability of developers to convince themselves and their managers that writing tests is beneficial. At the moment they are mostly viewed as waste of time. This is because the average person (let’s not fool ourselves, we’re all average) is unable to think about lower future costs in relation to little more current work. It’s orders of magnitude easier to know about the current costs in relation to current amount of work. That’s why programmers convince themselves testing is a waste of time. However we have to ask ourselves what tests are really about? Maybe finding bugs? No, not really. If we introduce bugs, we’re likely to write test around those bugs too. But yes we can find some bugs with tests. The main point of tests is to have reproducible repeatability in our systems. By having a code base largely covered by tests we can know with better certainty what a small code change can break in other parts of the system. By having repeatability we can make code changes with confidence, since we know we’ll see what breaks in other tests. And here comes the inability to estimate future costs. By spending just a few more hours writing those tests we’d know instantly what broke where. Imagine we fix a reported bug. We check-in the code, deploy it and the users are happy. Until we get a call 2 weeks later about a certain monthly process has stopped working. What we don’t know is that this process was developed by a long gone coworker and for some reason it relied on that same bug we’ve happily fixed. There’s no way we could’ve known that. We say OK and go in and fix the monthly process. But what we have no clue about is that there’s this ETL job that relied on data from that monthly process. Now that we’ve fixed the process it’s giving unexpected (yet correct since we fixed it) data to the ETL job. So we have to fix that too. But there’s this part of the app we coded that relies on data from that exact ETL job. And just like that we enter the “Loop of maintenance horror”. With the loop eventually comes blame. Here’s a nice tip for all developers and DBAs out there: If you make a mistake man up and admit to it. All of the above is valid for any kind of software development. Keeping this in mind the database is nothing other than just a part of the application. But a big part! One reason why testing a database is even more important than testing an application is that one database is usually accessed from multiple applications and processes. This makes it the central and vital part of the enterprise software infrastructure. Knowing all this can we really afford not to have tests? What to test in a database Now that we’ve decided we’ll dive into this testing thing we have to ask ourselves what needs to be tested? The short answer is: everything. The long answer is: read on! There are 2 main ways of doing tests: Black box and White box testing. Black box testing means we have no idea how the system internals are built and we only have access to it’s inputs and outputs. With it we test that the internal changes to the system haven’t caused the input/output behavior of the system to change. The most important thing to test here are the edge conditions. It’s where most programs break. Having good edge condition tests we can be more confident that the systems changes won’t break. White box testing has the full knowledge of the system internals. With it we test the internal system changes, different states of the application, etc… White and Black box tests should be complementary to each other as they are very much interconnected. Testing database routines includes testing stored procedures, views, user defined functions and anything you use to access the data with. Database routines are your input/output interface to the database system. They count as black box testing. We test then for 2 things: Data and schema. When testing schema we only care about the columns and the data types they’re returning. After all the schema is the contract to the out side systems. If it changes we usually have to change the applications accessing it. One helpful T-SQL command when doing schema tests is SET FMTONLY ON. It tells the SQL Server to return only empty results sets. This speeds up tests because it doesn’t return any data to the client. After we’ve validated the schema we have to test the returned data. There no other way to do this but to have expected data known before the tests executes and comparing that data to the database routine output. Testing Authentication and Authorization helps us validate who has access to the SQL Server box (Authentication) and who has access to certain database objects (Authorization). For desktop applications and windows authentication this works well. But the biggest problem here are web apps. They usually connect to the database as a single user. Please ensure that that user is not SA or an account with admin privileges. That is just bad. Load testing ensures us that our database can handle peak loads. One often overlooked tool for load testing is Microsoft’s OSTRESS tool. It’s part of RML utilities (x86, x64) for SQL Server and can help determine if our database server can handle loads like 100 simultaneous users each doing 10 requests per second. SQL Profiler can also help us here by looking at why certain queries are slow and what to do to fix them.   One particular problem to think about is how to begin testing existing databases. First thing we have to do is to get to know those databases. We can’t test something when we don’t know how it works. To do this we have to talk to the users of the applications accessing the database, run SQL Profiler to see what queries are being run, use existing documentation to decipher all the object relationships, etc… The way to approach this is to choose one part of the database (say a logical grouping of tables that go together) and filter our traces accordingly. Once we’ve done that we move on to the next grouping and so on until we’ve covered the whole database. Then we move on to the next one. Database Testing is a topic that we can spent many hours discussing but let this be a nice intro to the world of database testing. See you in the next post.

    Read the article

  • Full-text Indexing Books Online

    - by Most Valuable Yak (Rob Volk)
    While preparing for a recent SQL Saturday presentation, I was struck by a crazy idea (shocking, I know): Could someone import the content of SQL Server Books Online into a database and apply full-text indexing to it?  The answer is yes, and it's really quite easy to do. The first step is finding the installed help files.  If you have SQL Server 2012, BOL is installed under the Microsoft Help Library.  You can find the install location by opening SQL Server Books Online and clicking the gear icon for the Help Library Manager.  When the new window pops up click the Settings link, you'll get the following: You'll see the path under Library Location. Once you navigate to that path you'll have to drill down a little further, to C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store.  This is where the help file content is kept if you downloaded it for offline use. Depending on which products you've downloaded help for, you may see a few hundred files.  Fortunately they're named well and you can easily find the "SQL_Server_Denali_Books_Online_" files.  We are interested in the .MSHC files only, and can skip the Installation and Developer Reference files. Despite the .MHSC extension, these files are compressed with the standard Zip format, so your favorite archive utility (WinZip, 7Zip, WinRar, etc.) can open them.  When you do, you'll see a few thousand files in the archive.  We are only interested in the .htm files, but there's no harm in extracting all of them to a folder.  7zip provides a command-line utility and the following will extract to a D:\SQLHelp folder previously created: 7z e –oD:\SQLHelp "C:\ProgramData\Microsoft\HelpLibrary\content\Microsoft\store\SQL_Server_Denali_Books_Online_B780_SQL_110_en-us_1.2.mshc" *.htm Well that's great Rob, but how do I put all those files into a full-text index? I'll tell you in a second, but first we have to set up a few things on the database side.  I'll be using a database named Explore (you can certainly change that) and the following setup is a fragment of the script I used in my presentation: USE Explore; GO CREATE SCHEMA help AUTHORIZATION dbo; GO -- Create default fulltext catalog for later FT indexes CREATE FULLTEXT CATALOG FTC AS DEFAULT; GO CREATE TABLE help.files(file_id int not null IDENTITY(1,1) CONSTRAINT PK_help_files PRIMARY KEY, path varchar(256) not null CONSTRAINT UNQ_help_files_path UNIQUE, doc_type varchar(6) DEFAULT('.xml'), content varbinary(max) not null); CREATE FULLTEXT INDEX ON help.files(content TYPE COLUMN doc_type LANGUAGE 1033) KEY INDEX PK_help_files; This will give you a table, default full-text catalog, and full-text index on that table for the content you're going to insert.  I'll be using the command line again for this, it's the easiest method I know: for %a in (D:\SQLHelp\*.htm) do sqlcmd -S. -E -d Explore -Q"set nocount on;insert help.files(path,content) select '%a', cast(c as varbinary(max)) from openrowset(bulk '%a', SINGLE_CLOB) as c(c)" You'll need to copy and run that as one line in a command prompt.  I'll explain what this does while you run it and watch several thousand files get imported: The "for" command allows you to loop over a collection of items.  In this case we want all the .htm files in the D:\SQLHelp folder.  For each file it finds, it will assign the full path and file name to the %a variable.  In the "do" clause, we'll specify another command to be run for each iteration of the loop.  I make a call to "sqlcmd" in order to run a SQL statement.  I pass in the name of the server (-S.), where "." represents the local default instance. I specify -d Explore as the database, and -E for trusted connection.  I then use -Q to run a query that I enclose in double quotes. The query uses OPENROWSET(BULK…SINGLE_CLOB) to open the file as a data source, and to treat it as a single character large object.  In order for full-text indexing to work properly, I have to convert the text content to varbinary. I then INSERT these contents along with the full path of the file into the help.files table created earlier.  This process continues for each file in the folder, creating one new row in the table. And that's it! 5 SQL Statements and 2 command line statements to unzip and import SQL Server Books Online!  In case you're wondering why I didn't use FILESTREAM or FILETABLE, it's simply because I haven't learned them…yet. I may return to this blog after I figure that out and update it with the steps to do so.  I believe that will make it even easier. In the spirit of exploration, I'll leave you to work on some fulltext queries of this content.  I also recommend playing around with the sys.dm_fts_xxxx DMVs (I particularly like sys.dm_fts_index_keywords, it's pretty interesting).  There are additional example queries in the download material for my presentation linked above. Many thanks to Kevin Boles (t) for his advice on (re)checking the content of the help files.  Don't let that .htm extension fool you! The 2012 help files are actually XML, and you'd need to specify '.xml' in your document type column in order to extract the full-text keywords.  (You probably noticed this in the default definition for the doc_type column.)  You can query sys.fulltext_document_types to get a complete list of the types that can be full-text indexed. I also need to thank Hilary Cotter for giving me the original idea. I believe he used MSDN content in a full-text index for an article from waaaaaaaaaaay back, that I can't find now, and had forgotten about until just a few days ago.  He is also co-author of Pro Full-Text Search in SQL Server 2008, which I highly recommend.  He also has some FTS articles on Simple Talk: http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/ http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features,-part-2/

    Read the article

  • How-to delete a tree node using the context menu

    - by frank.nimphius
    Hierarchical trees in Oracle ADF make use of View Accessors, which means that only the top level node needs to be exposed as a View Object instance on the ADF Business Components Data Model. This also means that only the top level node has a representation in the PageDef file as a tree binding and iterator binding reference. Detail nodes are accessed through tree rule definitions that use the accessor mentioned above (or nested collections in the case of POJO or EJB business services). The tree component is configured for single node selection, which however can be declaratively changed for users to press the ctrl key and selecting multiple nodes. In the following, I explain how to create a context menu on the tree for users to delete the selected tree nodes. For this, the context menu item will access a managed bean, which then determines the selected node(s), the internal ADF node bindings and the rows they represent. As mentioned, the ADF Business Components Data Model only needs to expose the top level node data sources, which in this example is an instance of the Locations View Object. For the tree to work, you need to have associations defined between entities, which usually is done for you by Oracle JDeveloper if the database tables have foreign keys defined Note: As a general hint of best practices and to simplify your life: Make sure your database schema is well defined and designed before starting your development project. Don't treat the database as something organic that grows and changes with the requirements as you proceed in your project. Business service refactoring in response to database changes is possible, but should be treated as an exception, not the rule. Good database design is a necessity – even for application developers – and nothing evil. To create the tree component, expand the Data Controls panel and drag the View Object collection to the view. From the context menu, select the tree component entry and continue with defining the tree rules that make up the hierarchical structure. As you see, when pressing the green plus icon  in the Edit Tree Binding  dialog, the data structure, Locations -  Departments – Employees in my sample, shows without you having created a View Object instance for each of the nodes in the ADF Business Components Data Model. After you configured the tree structure in the Edit Tree Binding dialog, you press OK and the tree is created. Select the tree in the page editor and open the Structure Window (ctrl+shift+S). In the Structure window, expand the tree node to access the conextMenu facet. Use the right mouse button to insert a Popup  into the facet. Repeat the same steps to insert a Menu and a Menu Item into the Popup you created. The Menu item text should be changed to something meaningful like "Delete". Note that the custom menu item later is added to the context menu together with the default context menu options like expand and expand all. To define the action that is executed when the menu item is clicked on, you select the Action Listener property in the Property Inspector and click the arrow icon followed by the Edit menu option. Create or select a managed bean and define a method name for the action handler. Next, select the tree component and browse to its binding property in the Property Inspector. Again, use the arrow icon | Edit option to create a component binding in the same managed bean that has the action listener defined. The tree handle is used in the action listener code, which is shown below: public void onTreeNodeDelete(ActionEvent actionEvent) {   //access the tree from the JSF component reference created   //using the af:tree "binding" property. The "binding" property   //creates a pair of set/get methods to access the RichTree instance   RichTree tree = this.getTreeHandler();   //get the list of selected row keys   RowKeySet rks = tree.getSelectedRowKeys();   //access the iterator to loop over selected nodes   Iterator rksIterator = rks.iterator();          //The CollectionModel represents the tree model and is   //accessed from the tree "value" property   CollectionModel model = (CollectionModel) tree.getValue();   //The CollectionModel is a wrapper for the ADF tree binding   //class, which is JUCtrlHierBinding   JUCtrlHierBinding treeBinding =                  (JUCtrlHierBinding) model.getWrappedData();          //loop over the selected nodes and delete the rows they   //represent   while(rksIterator.hasNext()){     List nodeKey = (List) rksIterator.next();     //find the ADF node binding using the node key     JUCtrlHierNodeBinding node =                       treeBinding.findNodeByKeyPath(nodeKey);     //delete the row.     Row rw = node.getRow();       rw.remove();   }          //only refresh the tree if tree nodes have been selected   if(rks.size() > 0){     AdfFacesContext adfFacesContext =                          AdfFacesContext.getCurrentInstance();     adfFacesContext.addPartialTarget(tree);   } } Note: To enable multi node selection for a tree, select the tree and change the row selection setting from "single" to "multiple". Note: a fully pictured version of this post will become available at the end of the month in a PDF summary on ADF Code Corner : http://www.oracle.com/technetwork/developer-tools/adf/learnmore/index-101235.html 

    Read the article

  • Agile Testing Days 2012 – Day 3 – Agile or agile?

    - by Chris George
    Another early start for my last Lean Coffee of the conference, and again it was not wasted. We had some really interesting discussions around how to determine what test automation is useful, if agile is not faster, why do it? and a rather existential discussion on whether unicorns exist! First keynote of the day was entitled “Fast Feedback Teams” by Ola Ellnestam. Again this relates nicely to the releasing faster talk on day 2, and something that we are looking at and some teams are actively trying. Introducing the notion of feedback, Ola describes a game he wrote for his eldest child. It was a simple game where every time he clicked a button, it displayed “You’ve Won!”. He then changed it to be a Win-Lose-Win-Lose pattern and watched the feedback from his son who then twigged the pattern and got his younger brother to play, alternating turns… genius! (must do that with my children). The idea behind this was that you need that feedback loop to learn and progress. If you are not getting the feedback you need to close that loop. An interesting point Ola made was to solve problems BEFORE writing software. It may be that you don’t have to write anything at all, perhaps it’s a communication/training issue? Perhaps the problem can be solved another way. Writing software, although it’s the business we are in, is expensive, and this should be taken into account. He again mentions frequent releases, and how they should be made as soon as stuff is ready to be released, don’t leave stuff on the shelf cause it’s not earning you anything, money or data. I totally agree with this and it’s something that we will be aiming for moving forwards. “Exceptions, Assumptions and Ambiguity: Finding the truth behind the story” by David Evans started off very promising by making references to ‘Grim up North’ referring to the north of England. Not sure it was appreciated by most of the audience, but it made me laugh! David explained how there are always risks associated with exceptions, giving the example of a one-way road near where he lives, with an exception sign giving rights to coaches to go the wrong way. Therefore you could merrily swing around the corner of the one way road straight into a coach! David showed the danger in making assumptions with lyrical quotes from Lola by The Kinks “I’m glad I’m a man, and so is Lola” and with a picture of a toilet flush that needed instructions to operate the full and half flush. With this particular flush, you pulled the handle all the way down to half flush, and half way down to full flush! hmmm, a bit of a crappy user experience methinks! Then through a clever use of a passage from the Jabberwocky, David then went onto show how mis-translation/ambiguity is the can completely distort the original meaning of something, and this is a real enemy of software development. This was all helping to demonstrate that the term Story is often heavily overloaded in the Agile world, and should really be stripped back to what it is really for, stating a business problem, and offering a technical solution. Therefore a story could be worded as “In order to {make some improvement}, we will { do something}”. The first ‘in order to’ statement is stakeholder neutral, and states the problem through requesting an improvement to the software/process etc. The second part of the story is the verb, the doing bit. So to achieve the ‘improvement’ which is not currently true, we will do something to make this true in the future. My PM is very interested in this, and he’s observed some of the problems of overloading stories so I’m hoping between us we can use some of David’s suggestions to help clarify our stories better. The second keynote of the day (and our last) proved to be the most entertaining and exhausting of the conference for me. “The ongoing evolution of testing in agile development” by Scott Barber. I’ve never had the pleasure of seeing Scott before… OMG I would love to have even half of the energy he has! What struck me during this presentation was Scott’s explanation of how testing has become the role/job that it is (largely) today, and how this has led to the need for ‘methodologies’ to make dev and test work! The argument that we should be trying to converge the roles again is a very valid one, and one that a couple of the teams at work are actively doing with great results. Making developers as responsible for quality as testers is something that has been lost over the years, but something that we are now striving to achieve. The idea that we (testers) should be testing experts/specialists, not testing ‘union members’, supports this idea so the entire team works on all aspects of a feature/product, with the ‘specialists’ taking the lead and advising/coaching the others. This leads to better propagation of information around the team, a greater holistic understanding of the project and it allows the team to continue functioning if some of it’s members are off sick, for example. Feeling somewhat drained from Scott’s keynote (but at the same time excited that alot of the points he raised supported actions we are taking at work), I headed into my last presentation for Agile Testing Days 2012 before having to make my way to Tegel to catch the flight home. “Thinking and working agile in an unbending world” with Pete Walen was a talk I was not going to miss! Having spoken to Pete several times during the past few days, I was looking forward to hearing what he was going to say, and I was not disappointed. Pete started off by trying to separate the definitions of ‘Agile’ as in the methodology, and ‘agile’ as in the adjective by pronouncing them the ‘english’ and ‘american’ ways. So Agile pronounced (Ajyle) and agile pronounced (ajul). There was much confusion around what the hell he was talking about, although I thought it was quite clear. Agile – Software development methodology agile – Marked by ready ability to move with quick easy grace; Having a quick resourceful and adaptable character. Anyway, that aside (although it provided a few laughs during the presentation), the point was that many teams that claim to be ‘Agile’ but are not, in fact, ‘agile’ by nature. Implementing ‘Agile’ methodologies that are so prescriptive actually goes against the very nature of Agile development where a team should anticipate, adapt and explore. Pete made a valid point that very few companies intentionally put up roadblocks to impede work, so if work is being blocked/delayed, why? This is where being agile as a team pays off because the team can inspect what’s going on, explore options and adapt their processes. It is through experimentation (and that means trying and failing as well as trying and succeeding) that a team will improve and grow leading to focussing on what really needs to be done to achieve X. So, that was it, the last talk of our conference. I was gutted that we had to miss the closing keynote from Matt Heusser, as Matt was another person I had spoken too a few times during the conference, but the flight would not wait, and just as well we left when we did because the traffic was a nightmare! My Takeaway Triple from Day 3: Release often and release small – don’t leave stuff on the shelf Keep the meaning of the word ‘agile’ in mind when working in ‘Agile Look at testing as more of a skill than a role  

    Read the article

  • External File Upload Optimizations for Windows Azure

    - by rgillen
    [Cross posted from here: http://rob.gillenfamily.net/post/External-File-Upload-Optimizations-for-Windows-Azure.aspx] I’m wrapping up a bit of the work we’ve been doing on data movement optimizations for cloud computing and the latest set of data yielded some interesting points I thought I’d share. The work done here is not really rocket science but may, in some ways, be slightly counter-intuitive and therefore seemed worthy of posting. Summary: for those who don’t like to read detailed posts or don’t have time, the synopsis is that if you are uploading data to Azure, block your data (even down to 1MB) and upload in parallel. Set your block size based on your source file size, but if you must choose a fixed value, use 1MB. Following the above will result in significant performance gains… upwards of 10x-24x and a reduction in overall file transfer time of upwards of 90% (eg, uploading a 1GB file averaged 46.37 minutes prior to optimizations and averaged 1.86 minutes afterwards). Detail: For those of you who want more detail, or think that the claims at the end of the preceding paragraph are over-reaching, what follows is information and code supporting these claims. As the title would indicate, these tests were run from our research facility pointing to the Azure cloud (specifically US North Central as it is physically closest to us) and do not represent intra-cloud results… we have performed intra-cloud tests and the overall results are similar in notion but the data rates are significantly different as well as the tipping points for the various block sizes… this will be detailed separately). We started by building a very simple console application that would loop through a directory and upload each file to Azure storage. This application used the shipping storage client library from the 1.1 version of the azure tools. The only real variation from the client library is that we added code to collect and record the duration (in ms) and size (in bytes) for each file transferred. The code is available here. We then created a directory that had a collection of files for the following sizes: 2KB, 32KB, 64KB, 128KB, 512KB, 1MB, 5MB, 10MB, 25MB, 50MB, 100MB, 250MB, 500MB, 750MB, and 1GB (50 files for each size listed). These files contained randomly-generated binary data and do not benefit from compression (a separate discussion topic). Our file generation tool is available here. The baseline was established by running the application described above against the directory containing all of the data files. This application uploads the files in a random order so as to avoid transferring all of the files of a given size sequentially and thereby spreading the affects of periodic Internet delays across the collection of results.  We then ran some scripts to split the resulting data and generate some reports. The raw data collected for our non-optimized tests is available via the links in the Related Resources section at the bottom of this post. For each file size, we calculated the average upload time (and standard deviation) and the average transfer rate (and standard deviation). As you likely are aware, transferring data across the Internet is susceptible to many transient delays which can cause anomalies in the resulting data. It is for this reason that we randomized the order of source file processing as well as executed the tests 50x for each file size. We expect that these steps will yield a sufficiently balanced set of results. Once the baseline was collected and analyzed, we updated the test harness application with some methods to split the source file into user-defined block sizes and then to upload those blocks in parallel (using the PutBlock() method of Azure storage). The parallelization was handled by simply relying on the Parallel Extensions to .NET to provide a Parallel.For loop (see linked source for specific implementation details in Program.cs, line 173 and following… less than 100 lines total). Once all of the blocks were uploaded, we called PutBlockList() to assemble/commit the file in Azure storage. For each block transferred, the MD5 was calculated and sent ensuring that the bits that arrived matched was was intended. The timer for the blocked/parallelized transfer method wraps the entire process (source file splitting, block transfer, MD5 validation, file committal). A diagram of the process is as follows: We then tested the affects of blocking & parallelizing the transfers by running the updated application against the same source set and did a parameter sweep on the block size including 256KB, 512KB, 1MB, 2MB, and 4MB (our assumption was that anything lower than 256KB wasn’t worth the trouble and 4MB is the maximum size of a block supported by Azure). The raw data for the parallel tests is available via the links in the Related Resources section at the bottom of this post. This data was processed and then compared against the single-threaded / non-optimized transfer numbers and the results were encouraging. The Excel version of the results is available here. Two semi-obvious points need to be made prior to reviewing the data. The first is that if the block size is larger than the source file size you will end up with a “negative optimization” due to the overhead of attempting to block and parallelize. The second is that as the files get smaller, the clock-time cost of blocking and parallelizing (overhead) is more apparent and can tend towards negative optimizations. For this reason (and is supported in the raw data provided in the linked worksheet) the charts and dialog below ignore source file sizes less than 1MB. (click chart for full size image) The chart above illustrates some interesting points about the results: When the block size is smaller than the source file, performance increases but as the block size approaches and then passes the source file size, you see decreasing benefit to the point of negative gains (see the values for the 1MB file size) For some of the moderately-sized source files, small blocks (256KB) are best As the size of the source file gets larger (see values for 50MB and up), the smallest block size is not the most efficient (presumably due, at least in part, to the increased number of blocks, increased number of individual transfer requests, and reassembly/committal costs). Once you pass the 250MB source file size, the difference in rate for 1MB to 4MB blocks is more-or-less constant The 1MB block size gives the best average improvement (~16x) but the optimal approach would be to vary the block size based on the size of the source file.    (click chart for full size image) The above is another view of the same data as the prior chart just with the axis changed (x-axis represents file size and plotted data shows improvement by block size). It again highlights the fact that the 1MB block size is probably the best overall size but highlights the benefits of some of the other block sizes at different source file sizes. This last chart shows the change in total duration of the file uploads based on different block sizes for the source file sizes. Nothing really new here other than this view of the data highlights the negative affects of poorly choosing a block size for smaller files.   Summary What we have found so far is that blocking your file uploads and uploading them in parallel results in significant performance improvements. Further, utilizing extension methods and the Task Parallel Library (.NET 4.0) make short work of altering the shipping client library to provide this functionality while minimizing the amount of change to existing applications that might be using the client library for other interactions.   Related Resources Source code for upload test application Source code for random file generator ODatas feed of raw data from non-optimized transfer tests Experiment Metadata Experiment Datasets 2KB Uploads 32KB Uploads 64KB Uploads 128KB Uploads 256KB Uploads 512KB Uploads 1MB Uploads 5MB Uploads 10MB Uploads 25MB Uploads 50MB Uploads 100MB Uploads 250MB Uploads 500MB Uploads 750MB Uploads 1GB Uploads Raw Data OData feeds of raw data from blocked/parallelized transfer tests Experiment Metadata Experiment Datasets Raw Data 256KB Blocks 512KB Blocks 1MB Blocks 2MB Blocks 4MB Blocks Excel worksheet showing summarizations and comparisons

    Read the article

  • Partner Blog Series: PwC Perspectives Part 2 - Jumpstarting your IAM program with R2

    - by Tanu Sood
    Identity and access management (IAM) isn’t a new concept. Over the past decade, companies have begun to address identity management through a variety of solutions that have primarily focused on provisioning. . The new age workforce is converging at a rapid pace with ever increasing demand to use diverse portfolio of applications and systems to interact and interface with their peers in the industry and customers alike. Oracle has taken a significant leap with their release of Identity and Access Management 11gR2 towards enabling this global workforce to conduct their business in a secure, efficient and effective manner. As companies deal with IAM business drivers, it becomes immediately apparent that holistic, rather than piecemeal, approaches better address their needs. When planning an enterprise-wide IAM solution, the first step is to create a common framework that serves as the foundation on which to build the cost, compliance and business process efficiencies. As a leading industry practice, IAM should be established on a foundation of accurate data for identity management, making this data available in a uniform manner to downstream applications and processes. Mature organizations are looking beyond IAM’s basic benefits to harness more advanced capabilities in user lifecycle management. For any organization looking to embark on an IAM initiative, consider the following use cases in managing and administering user access. Expanding the Enterprise Provisioning Footprint Almost all organizations have some helpdesk resources tied up in handling access requests from users, a distraction from their core job of handling problem tickets. This dependency has mushroomed from the traditional acceptance of provisioning solutions integrating and addressing only a portion of applications in the heterogeneous landscape Oracle Identity Manager (OIM) 11gR2 solves this problem by offering integration with third party ticketing systems as “disconnected applications”. It allows for the existing business processes to be seamlessly integrated into the system and tracked throughout its lifecycle. With minimal effort and analysis, an organization can begin integrating OIM with groups or applications that are involved with manually intensive access provisioning and de-provisioning activities. This aspect of OIM allows organizations to on-board applications and associated business processes quickly using out of box templates and frameworks. This is especially important for organizations looking to fold in users and resources from mergers and acquisitions. Simplifying Access Requests Organizations looking to implement access request solutions often find it challenging to get their users to accept and adopt the new processes.. So, how do we improve the user experience, make it intuitive and personalized and yet simplify the user access process? With R2, OIM helps organizations alleviate the challenge by placing the most used functionality front and centre in the new user request interface. Roles, application accounts, and entitlements can all be found in the same interface as catalog items, giving business users a single location to go to whenever they need to initiate, approve or track a request. Furthermore, if a particular item is not relevant to a user’s job function or area inside the organization, it can be hidden so as to not overwhelm or confuse the user with superfluous options. The ability to customize the user interface to suit your needs helps in exercising the business rules effectively and avoiding access proliferation within the organization. Saving Time with Templates A typical use case that is most beneficial to business users is flexibility to place, edit, and withdraw requests based on changing circumstances and business needs. With OIM R2, multiple catalog items can now be added and removed from the shopping cart, an ecommerce paradigm that many users are already familiar with. This feature can be especially useful when setting up a large number of new employees or granting existing department or group access to a newly integrated application. Additionally, users can create their own shopping cart templates in order to complete subsequent requests more quickly. This feature saves the user from having to search for and select items all over again if a request is similar to a previous one. Advanced Delegated Administration A key feature of any provisioning solution should be to empower each business unit in managing their own access requests. By bringing administration closer to the user, you improve user productivity, enable efficiency and alleviate the administration overhead. To do so requires a federated services model so that the business units capable of shouldering the onus of user life cycle management of their business users can be enabled to do so. OIM 11gR2 offers advanced administrative options for creating, managing and controlling business logic and workflows through easy to use administrative interface and tools that can be exposed to delegated business administrators. For example, these business administrators can establish or modify how certain requests and operations should be handled within their business unit based on a number of attributes ranging from the type of request or the risk level of the individual items requested. Closed-Loop Remediation Security continues to be a major concern for most organizations. Identity management solutions bolster security by ensuring only the right users have the right access to the right resources. To prevent unauthorized access and where it already exists, the ability to detect and remediate it, are key requirements of an enterprise-grade proven solution. But the challenge with most solutions today is that some of this information still exists in silos. And when changes are made to systems directly, not all information is captured. With R2, oracle is offering a comprehensive Identity Governance solution that our customer organizations are leveraging for closed loop remediation that allows for an automated way for administrators to revoke unauthorized access. The change is automatically captured and the action noted for continued management. Conclusion While implementing provisioning solutions, it is important to keep the near term and the long term goals in mind. The provisioning solution should always be a part of a larger security and identity management program but with the ability to seamlessly integrate not only with the company’s infrastructure but also have the ability to leverage the information, business models compiled and used by the other identity management solutions. This allows organizations to reduce the cost of ownership, close security gaps and leverage the existing infrastructure. And having done so a multiple clients’ sites, this is the approach we recommend. In our next post, we will take a journey through our experiences of advising clients looking to upgrade to R2 from a previous version or migrating from a different solution. Meet the Writers:   Praveen Krishna is a Manager in the Advisory Security practice within PwC.  Over the last decade Praveen has helped clients plan, architect and implement Oracle identity solutions across diverse industries.  His experience includes delivering security across diverse topics like network, infrastructure, application and data where he brings a holistic point of view to problem solving. Dharma Padala is a Director in the Advisory Security practice within PwC.  He has been implementing medium to large scale Identity Management solutions across multiple industries including utility, health care, entertainment, retail and financial sectors.   Dharma has 14 years of experience in delivering IT solutions out of which he has been implementing Identity Management solutions for the past 8 years. Scott MacDonald is a Director in the Advisory Security practice within PwC.  He has consulted for several clients across multiple industries including financial services, health care, automotive and retail.   Scott has 10 years of experience in delivering Identity Management solutions. John Misczak is a member of the Advisory Security practice within PwC.  He has experience implementing multiple Identity and Access Management solutions, specializing in Oracle Identity Manager and Business Process Engineering Language (BPEL). Jenny (Xiao) Zhang is a member of the Advisory Security practice within PwC.  She has consulted across multiple industries including financial services, entertainment and retail. Jenny has three years of experience in delivering IT solutions out of which she has been implementing Identity Management solutions for the past one and a half years.

    Read the article

  • Recursion in the form of a Recursive Func&lt;T, T&gt;

    - by ToStringTheory
    I gotta admit, I am kind of surprised that I didn’t realize I could do this sooner.  I recently had a problem which required a recursive function call to come up with the answer.  After some time messing around with a recursive method, and creating an API that I was not happy with, I was able to create an API that I enjoy, and seems intuitive. Introduction To bring it to a simple example, consider the summation to n: A mathematically identical formula is: In a .NET function, this can be represented by a function: Func<int, int> summation = x => x*(x+1)/2 Calling summation with an input integer will yield the summation to that number: var sum10 = summation(4); //sum10 would be equal to 10 But what if I wanted to get a second level summation…  First some to n, and then use that argument as the input to the same function, to find the second level summation: So as an easy example, calculate the summation to 3, which yields 6.  Then calculate the summation to 6 which yields 21. Represented as a mathematical formula - So what if I wanted to represent this as .NET functions.  I can always do: //using the summation formula from above var sum3 = summation(3); //sets sum3 to 6 var sum3_2 = summation(sum3); //sets sum3 to 21 I could always create a while loop to perform the calculations too: Func<int, int> summation = x => x*(x+1)/2; //for the interests of a smaller example, using shorthand int sumResultTo = 3; int level = 2; while(level-- > 0) { sumResultTo = summation(sumResultTo); } //sumResultTo is equal to 21 now. Or express it as a for-loop, method calls, etc…  I really didn’t like any of the options that I tried.  Then it dawned on me – since I was using a Func<T, T> anyways, why not use the Func’s output from one call as the input as another directly. Some Code So, I decided that I wanted a recursion class.  Something that I would be generic and reusable in case I ever wanted to do something like this again. It is limited to only the Func<T1, T2> level of Func, and T1 must be the same as T2. The first thing in this class is a private field for the function: private readonly Func<T, T> _functionToRecurse; So, I since I want the function to be unchangeable, I have defined it as readonly.  Therefore my constructor looks like: public Recursion(Func<T, T> functionToRecurse) { if (functionToRecurse == null) { throw new ArgumentNullException("functionToRecurse", "The function to recurse can not be null"); } _functionToRecurse = functionToRecurse; } Simple enough.  If you have any questions, feel free to post them in the comments, and I will be sure to answer them. Next, I want enough. If be able to get the result of a function dependent on how many levels of recursion: private Func<T, T> GetXLevel(int level) { if (level < 1) { throw new ArgumentOutOfRangeException("level", level, "The level of recursion must be greater than 0"); } if (level == 1) return _functionToRecurse; return _GetXLevel(level - 1, _functionToRecurse); } So, if you pass in 1 for the level, you get just the Func<T,T> back.  If you say that you want to go deeper down the rabbit hole, it calls a method which accepts the level it is at, and the function which it needs to use to recurse further: private Func<T, T> _GetXLevel(int level, Func<T, T> prevFunc) { if (level == 1) return y => prevFunc(_functionToRecurse(y)); return _GetXLevel(level - 1, y => prevFunc(_functionToRecurse(y))); } That is really all that is needed for this class. If I exposed the GetXLevel function publicly, I could use that to get the function for a level, and pass in the argument..  But I wanted something better.  So, I used the ‘this’ array operator for the class: public Func<T,T> this[int level] { get { if (level < 1) { throw new ArgumentOutOfRangeException("level", level, "The level of recursion must be greater than 0"); } return this.GetXLevel(level); } } So, using the same example above of finding the second recursion of the summation of 3: var summator = new Recursion<int>(x => (x * (x + 1)) / 2); var sum_3_level2 = summator[2](3); //yields 21 You can even find just store the delegate to the second level summation, and use it multiple times: var summator = new Recursion<int>(x => (x * (x + 1)) / 2); var sum_level2 = summator[2]; var sum_3_level2 = sum_level2(3); //yields 21 var sum_4_level2 = sum_level2(4); //yields 55 var sum_5_level2 = sum_level2(5); //yields 120 Full Code Don’t think I was just going to hold off on the full file together and make you do the hard work…  Copy this into a new class file: public class Recursion<T> { private readonly Func<T, T> _functionToRecurse; public Recursion(Func<T, T> functionToRecurse) { if (functionToRecurse == null) { throw new ArgumentNullException("functionToRecurse", "The function to recurse can not be null"); } _functionToRecurse = functionToRecurse; } public Func<T,T> this[int level] { get { if (level < 1) { throw new ArgumentOutOfRangeException("level", level, "The level of recursion must be greater than 0"); } return this.GetXLevel(level); } } private Func<T, T> GetXLevel(int level) { if (level < 1) { throw new ArgumentOutOfRangeException("level", level, "The level of recursion must be greater than 0"); } if (level == 1) return _functionToRecurse; return _GetXLevel(level - 1, _functionToRecurse); } private Func<T, T> _GetXLevel(int level, Func<T, T> prevFunc) { if (level == 1) return y => prevFunc(_functionToRecurse(y)); return _GetXLevel(level - 1, y => prevFunc(_functionToRecurse(y))); } } Conclusion The great thing about this class, is that it can be used with any function with same input/output parameters.  I strived to find an implementation that I found clean and useful, and I finally settled on this.  If you have feedback – good or bad, I would love to hear it!

    Read the article

  • Struct Method for Loops Problem

    - by Annalyne
    I have tried numerous times how to make a do-while loop using the float constructor for my code but it seems it does not work properly as I wanted. For summary, I am making a TBRPG in C++ and I encountered few problems. But before that, let me post my code. #include <iostream> #include <string> #include <ctime> #include <cstdlib> using namespace std; int char_level = 1; //the starting level of the character. string town; //town string town_name; //the name of the town the character is in. string charname; //holds the character's name upon the start of the game int gems = 0; //holds the value of the games the character has. const int MAX_ITEMS = 15; //max items the character can carry string inventory [MAX_ITEMS]; //the inventory of the character in game int itemnum = 0; //number of items that the character has. bool GameOver = false; //boolean intended for the game over scr. string monsterTroop [] = {"Slime", "Zombie", "Imp", "Sahaguin, Hounds, Vampire"}; //monster name float monsterTroopHealth [] = {5.0f, 10.0f, 15.0f, 20.0f, 25.0f}; // the health of the monsters int monLifeBox; //life carrier of the game's enemy troops int enemNumber; //enemy number //inventory[itemnum++] = "Sword"; class RPG_Game_Enemy { public: void enemyAppear () { srand(time(0)); enemNumber = 1+(rand()%3); if (enemNumber == 1) cout << monsterTroop[1]; //monster troop 1 else if (enemNumber == 2) cout << monsterTroop[2]; //monster troop 2 else if (enemNumber == 3) cout << monsterTroop[3]; //monster troop 3 else if (enemNumber == 4) cout << monsterTroop[4]; //monster troop 4 } void enemDefeat () { cout << "The foe has been defeated. You are victorious." << endl; } void enemyDies() { //if the enemy dies: //collapse declaration cout << "The foe vanished and you are victorious!" << endl; } }; class RPG_Scene_Battle { public: RPG_Scene_Battle(float ini_health) : health (ini_health){}; float getHealth() { return health; } void setHealth(float rpg_val){ health = rpg_val;}; private: float health; }; //---------------------------------------------------------------// // Conduct Damage for the Scene Battle's Damage //---------------------------------------------------------------// float conductDamage(RPG_Scene_Battle rpg_tr, float damage) { rpg_tr.setHealth(rpg_tr.getHealth() - damage); return rpg_tr.getHealth(); }; // ------------------------------------------------------------- // void RPG_Scene_DisplayItem () { cout << "Items: \n"; for (int i=0; i < itemnum; ++i) cout << inventory[i] <<endl; }; In this code I have so far, the problem I have is the battle scene. For example, the player battles a Ghost with 10 HP, when I use a do while loop to subtract the HP of the character and the enemy, it only deducts once in the do while. Some people said I should use a struct, but I have no idea how to make it. Is there a way someone can display a code how to implement it on my game? Edit: I made the do-while by far like this: do RPG_Scene_Battle (player, 20.0f); RPG_Scene_Battle (enemy, 10.0f); cout << "Battle starts!" <<endl; cout << "You used a blade skill and deducted 2 hit points to the enemy!" conductDamage (enemy, 2.0f); while (enemy!=0) also, I made something like this: #include <iostream> using namespace std; int gems = 0; class Entity { public: Entity(float startingHealth) : health(startingHealth){}; // initialize health float getHealth(){return health;} void setHealth(float value){ health = value;}; private: float health; }; float subtractHealthFrom(Entity& ent, float damage) { ent.setHealth(ent.getHealth() - damage); return ent.getHealth(); }; int main () { Entity character(10.0f); Entity enemy(10.0f); cout << "Hero Life: "; cout << subtractHealthFrom(character, 2.0f) <<endl; cout << "Monster Life: "; cout << subtractHealthFrom(enemy, 2.0f) <<endl; cout << "Hero Life: "; cout << subtractHealthFrom(character, 2.0f) <<endl; cout << "Monster Life: "; cout << subtractHealthFrom(enemy, 2.0f) <<endl; }; Struct method, they say, should solve this problem. How can I continously deduct hp from the enemy? Whenever I deduct something, it would return to its original value -_-

    Read the article

< Previous Page | 195 196 197 198 199 200 201 202 203 204 205 206  | Next Page >