loop - Page 156 - Developer IT

HTML Response compared with old html

- by inetech

I have a respone returned as 'response' I need to compare it with old HTML. If it is different it will just do some css. The problem is it detects a different change everytime even when there is no difference and there is a continueous loop which makes my animation on css repeat itself. Anyone help?

Read the article

Changing video on mouseclick?

- by user330561

I have two videos, and I want to loop the first video until the user clicks the mouse, at which point I want to switch to playing the second video. Does anyone know what the best way to do this would be?

Read the article

Show progress without using an image

- by Jon1

I would like to design a progress bar, without using an image (eg animated gif...). Can this be done with just html css and jquery? trying to be creative here :) Update: the progress percentage cannot be determined, so it has to be a loop

Read the article

java source code doubt

- by abson

Why is it that class swi22 { public static void main(String[] args) { int a=98; switch(a) { default:{ System.out.println("default");continue;} case 'b':{ System.out.println(a); continue;} case 'a':{ System.out.println(a);} } System.out.println("Switch Completed"); } } gives error as: continue outside of loop

Read the article

How to select textboxes that have a length > 0 with jquery?

- by chobo2

Hi How would I write a selector that would look through say a div of textboxes and find which ones have a value in the textbox(so length would be greater than zero). Can I do this all in one selector or do I have to get all textboxes then loop through them?

Read the article

How do I access the array submit() calls in the dom?

- by Alex

I want to grab all the inputs inside a form, in order to submit them. The point of this is to use jquery's ajax to submit a dynamically sized form. Surely there must be an array inside the dom somewhere which i can just do something like... docment.forms['form'].elements which only lists inputs for that form, meaning I can loop through them and grab their values to play around with?

Read the article

Not initializing ints in C++

- by guest

If I have a block of code in a for loop that looks like total1 += i * i;, I get a weird, long value in total1 even if i increments from 1 to 10. However, this only happens if I don't do int total1 = 0;. This makes me wonder, what happens if I don't do that and why do I get a number in the 2 billions instead? (Though it's not the max value in an int either).

Read the article

Not initilizaing ints in C++

- by guest

If I have a block of code in a for loop that looks like total1+= i*i;, I get a weird, long value in total1 even if i increments from 1 to 10. However, this only happens if I don't do int total1=0;. This makes me wonder, what happens if I don't do that and why do I get a number in the 2 billions instead? (Though it's not the max value in an int either).

Read the article

What is the difference between i++ and ++i [closed]

- by user299233

Possible Duplicate: Is there a performance difference between i++ and ++i in C++? What is the pratical difference between these 2 statements on, for example, a while loop?

Read the article

Suspend orientation change

- by OkyDokyman

Documentation says: "a configuration change (such as a change in screen orientation, language, input devices, etc) will cause your current activity to be destroyed, going through the normal activity lifecycle process of onPause(), onStop(), and onDestroy()." I would like to suspend the orientation change, since it crashes my app if it was done in the middle of a a loop (of reading a file). How can I do this? Also - looking for some kind of "onOrientationChnage" function :)

Read the article

how should i get ONE row from mysql? ( fetch_array etc? )

- by Haroldo

I'm pretty sure i'm doing an extra loop here: $q = "SELECT * FROM genres WHERE genre.g_url = '$genre_url' LIMIT 1"; $res = mysql_query($q); while ($r = mysql_fetch_array($res, MYSQL_ASSOC)){ foreach( array_key_exists($r) as $k ){ $g[$k] = $r[$k]; } } return $g;

Read the article

Forming vectors from the same assigned value (in Matlab)

- by imbibi

I've got a bunch of values that are all assigned the same variable due to running through a for loop several times, so for example: d = 3.44434 d = 2.4444 d = 2.7777 How do I put them all into a vector?

Read the article

Performance issues when using SSD for a developer notebook (WAMP/LAMP stack)?

- by András Szepesházi

I'm a web application developer using my notebook as a standalone development environment (WAMP stack). I just switched from a Core2-duo Vista 32 bit notebook with 2Gb RAM and SATA HDD, to an i5-2520M Win7 64 bit with 4Gb RAM and 128 GB SDD (Corsair P3 128). My initial experience was what I expected, fast boot, quick load of all the applications (Eclipse takes now 5 seconds as opposed to 30s on my old notebook), overall great experience. Then I started to build up my development stack, both as LAMP (using VirtualBox with a debian guest) and WAMP (windows native apache + mysql + php). I wanted to compare those two. This still all worked great out, then I started to pull in my projects to these stacks. And here came the nasty surprise, one of those projects produced a lot worse response times than on my old notebook (that was true for both the VirtualBox and WAMP stack). Apache, php and mysql configurations were practically identical in all environments. I started to do a lot of benchmarking and profiling, and here is what I've found: All general benchmarks (Performance Test 7.0, HDTune Pro, wPrime2 and some more) gave a big advantage to the new notebook. Nothing surprising here. Disc specific tests showed that read/write operations peaked around 380M/160M for the SSD, and all the different sized block operations also performed very well. Started apache performance benchmarking with Apache Benchmark for a small static html file (10 concurrent threads, 500 iterations). Old notebook: min 47ms, median 111ms, max 156ms New WAMP stack: min 71ms, median 135ms, max 296ms New LAMP stack (in VirtualBox): min 6ms, median 46ms, max 175ms Right here I don't get why the native WAMP stack performed so bad, but at least the LAMP environment brought the expected speed. Apache performance measurement for non-cached php content. The php runs a loop of 1000 and generates sha1(uniqid()) inisde. Again, 10 concurrent threads, 500 iterations were used for the benchmark. Old notebook: min 0ms, median 39ms, max 218ms New WAMP stack: min 20ms, median 61ms, max 186ms New LAMP stack (in VirtualBox): min 124ms, median 704ms, max 2463ms What the hell? The new LAMP performed miserably, and even the new native WAMP was outperformed by the old notebook. php + mysql test. The test consists of connecting to a database and reading a single record form a table using INNER JOIN on 3 more (indexed) tables, repeated 100 times within a loop. Databases were identical. 10 concurrent threads, 100 iterations were used for the benchmark. Old notebook: min 1201ms, median 1734ms, max 3728ms New WAMP stack: min 367ms, median 675ms, max 1893ms New LAMP stack (in VirtualBox): min 1410ms, median 3659ms, max 5045ms And the same test with concurrency set to 1 (instead of 10): Old notebook: min 1201ms, median 1261ms, max 1357ms New WAMP stack: min 399ms, median 483ms, max 539ms New LAMP stack (in VirtualBox): min 285ms, median 348ms, max 444ms Strictly for my purposes, as I'm using a self contained development environment (= low concurrency) I could be satisfied with the second test's result. Though I have no idea why the VirtualBox environment performed so bad with higher concurrency. Finally I performed a test of including many php files. The application that I mentioned at the beginning, the one that was performing so bad, has a heavy bootstrap, loads hundreds of small library and configuration files while initializing. So this test does nothing else just includes about 100 files. Concurrency set to 1, 100 iterations: Old notebook: min 140ms, median 168ms, max 406ms New WAMP stack: min 434ms, median 488ms, max 604ms New LAMP stack (in VirtualBox): min 413ms, median 1040ms, max 1921ms Even if I consider that VirtualBox reached those files via shared folders, and that slows things down a bit, I still don't see how could the old notebook outperform so heavily both new configurations. And I think this is the real root of the slow performance, as the application uses even more includes, and the whole bootstrap will occur several times within a page request (for each ajax call, for example). To sum it up, here I am with a brand new high-performance notebook that loads the same page in 20 seconds, that my old notebook can do in 5-7 seconds. Needless to say, I'm not a very happy person right now. Why do you think I experience these poor performance values? What are my options to remedy this situation?

Read the article

php-fpm + persistent sockets = 502 bad gateway

- by leeoniya

Put on your reading glasses - this will be a long-ish one. First, what I'm doing. I'm building a web-app interface for some particularly slow tcp devices. Opening a socket to them takes 200ms and an fwrite/fread cycle takes another 300ms. To reduce the need for both of these actions on each request, I'm opening a persistent tcp socket which reduces the response time by the aforementioned 200ms. I was hoping PHP-FPM would share the persistent connections between requests from different clients (and indeed it does!), but there are some issues which I havent been able to resolve after 2 days of interneting, reading logs and modifying settings. I have somewhat narrowed it down though. Setup: Ubuntu 13.04 x64 Server (fully updated) on Linode PHP 5.5.0-6~raring+1 (fpm-fcgi) nginx/1.5.2 Relevent config: nginx worker_processes 4; php-fpm/pool.d pm = dynamic pm.max_children = 2 pm.start_servers = 2 pm.min_spare_servers = 2 Let's go from coarse to fine detail of what happens. After a fresh start I have 4x nginx processes and 2x php5-fpm processes waiting to handle requests. Then I send requests every couple seconds to the script. The first take a while to open the socket connection and returns with the data in about 500ms, the second returns data in 300ms (yay it's re-using the socket), the third also succeeds in about 300ms, the fourth request = 502 Bad Gateway, same with the 5th. Sixth request once again returns data, except now it took 500ms again. The process repeats for several cycles after which every 4 requests result in 2x 502 Bad Gateways and 2x 500ms Data responses. If I double all the fpm pool values and have 4x php-fpm processes running, the cycles settles in with 4x successful 500ms responses followed by 4x Bad Gateway errors. If I don't use persistent sockets, this issue goes away but then every request is 500ms. What I suspect is happening is the persistent socket keeps each php-fpm process from idling and ties it up, so the next one gets chosen until none are left and as they error out, maybe they are restarted and become available on the next round-robin loop ut the socket dies with the process. I haven't yet checked the 'slowlog', but the nginx error log shows lots of this: *188 recv() failed (104: Connection reset by peer) while reading response header from upstream, client:... All the suggestions on the internet regarding fixing nginx/php-fpm/502 bad gateway relate to high load or fcgi_pass misconfiguration. This is not the case here. Increasing buffers/sizes, changing timeouts, switching from unix socket to tcp socket for fcgi_pass, upping connection limits on the system....none of this stuff applies here. I've had some other success with setting pm = ondemand rather than dynamic, but as soon as the initial fpm-process gets killed off after idling, the persistent socket is gone for all subsequent php-fpm spawns. For the php script, I'm using stream_socket_client() with a STREAM_CLIENT_PERSISTENT flag. A while/stream_select() loop to detect socket data and fread($sock, 4096) to grab the data. I don't call fclose() obviously. If anyone has some additional questions or advice on how to get a persistent socket without tying up the php-fpm processes beyond the request completion, or maybe some other things to try, I'd appreciate it. some useful links: Nginx + php-fpm - recv() error Nginx + php-fpm "504 Gateway Time-out" error with almost zero load (on a test-server) Nginx + PHP-FPM "error 104 Connection reset by peer" causes occasional duplicate posts http://www.linuxquestions.org/questions/programming-9/php-pfsockopen-552084/ http://stackoverflow.com/questions/14268018/concurrent-use-of-a-persistent-php-socket http://devzone.zend.com/303/extension-writing-part-i-introduction-to-php-and-zend/#Heading3 http://stackoverflow.com/questions/242316/how-to-keep-a-php-stream-socket-alive http://php.net/manual/en/install.fpm.configuration.php https://www.google.com/search?q=recv%28%29+failed+%28104:+Connection+reset+by+peer%29+while+reading+response+header+from+upstream+%22502%22&ei=mC1XUrm7F4WQyAHbv4H4AQ&start=10&sa=N&biw=1920&bih=953&dpr=1

Read the article

Parallel processing slower than sequential?

- by zebediah49

EDIT: For anyone who stumbles upon this in the future: Imagemagick uses a MP library. It's faster to use available cores if they're around, but if you have parallel jobs, it's unhelpful. Do one of the following: do your jobs serially (with Imagemagick in parallel mode) set MAGICK_THREAD_LIMIT=1 for your invocation of the imagemagick binary in question. By making Imagemagick use only one thread, it slows down by 20-30% in my test cases, but meant I could run one job per core without issues, for a significant net increase in performance. Original question: While converting some images using ImageMagick, I noticed a somewhat strange effect. Using xargs was significantly slower than a standard for loop. Since xargs limited to a single process should act like a for loop, I tested that, and found it to be about the same. Thus, we have this demonstration. Quad core (AMD Athalon X4, 2.6GHz) Working entirely on a tempfs (16g ram total; no swap) No other major loads Results: /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 0m3.784s user 0m2.240s sys 0m0.230s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 0m9.097s user 0m28.020s sys 0m0.910s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 0m9.844s user 0m33.200s sys 0m1.270s Can anyone think of a reason why running two instances of this program takes more than twice as long in real time, and more than ten times as long in processor time to complete the same task? After that initial hit, more processes do not seem to have as significant of an effect. I thought it might have to do with disk seeking, so I did that test entirely in ram. Could it have something to do with how Convert works, and having more than one copy at once means it cannot use processor cache as efficiently or something? EDIT: When done with 1000x 769KB files, performance is as expected. Interesting. /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.679s user 5m6.980s sys 0m6.340s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 1 convert -auto-level real 3m37.152s user 5m6.140s sys 0m6.530s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 2 convert -auto-level real 2m7.578s user 5m35.410s sys 0m6.050s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 4 convert -auto-level real 1m36.959s user 5m48.900s sys 0m6.350s /media/ramdisk/img$ time for f in *.bmp; do echo $f ${f%bmp}png; done | xargs -n 2 -P 10 convert -auto-level real 1m36.392s user 5m54.840s sys 0m5.650s

Read the article

Specifying a Postfix Instance to send outbound email

- by Catherine Jefferson

I have a CentOS 6.5 server running Postfix 2.6x (the default distribution) with five public IPv4 IPs bound to it. Each IP has DNS and rDNS set separately. Each uses a different hostname at a different domain. I have five Postfix instances, one bound to each IP, like this example: 192.168.34.104 red.example.com /etc/postfix 192.168.36.48 green.example.net /etc/postfix-green 192.168.36.49 pink.example.org /etc/postfix-pink 192.168.36.50 orange.example.info /etc/postfix-orange 192.168.36.51 blue.example.us /etc/postfix-blue I've tested each IP by telneting to port 25. Postfix answers and banners properly with the correct hostname. Email is received on all of these instances with no problems and is routed to the correct place. This setup, minus the final instance, has existed for a couple of years and works. I never bothered to set up outbound email to go through any but the main instance, however; there was no need. Now I need to send email from blue.example.us that actually leaves from that interface and IP, such that the Received headers show blue.example.us as the sending mailhost, so that SPF and DKIM validate, etc etc. The email that will be sent from blue.example.com is a feedback loop sent by a single shell account on the server (account5), an account that is dedicated to sending this email. The account receives the feedback loop emails from servers on other networks, saves the bodies of those emails, and then generates a new outbound email header, appends the saved body, and sends the email. It's sending by piping each email to sendmail -oi -t. We're doing it this way to mask the identities of the initial servers. The procmail script that processes these emails works correctly. However, I cannot configure this account to send email through the proper Postfix instance/IP/interface. The exact same account and script sends email through the main Postfix instance /etc/postfix without any issues. When I change MAIL_CONFIG to point to /etc/postfix-blue in either .bash_profile or the Procmail script that handles this email, though, I get this error: sendmail: fatal: User account5(###) is not allowed to submit mail I've read the manuals on Postfix.org, searched Google, and tried the suggestions in three previous answers here on ServerFault.com: Postfix - specify interface to deliver outbound mail on Postfix user is not allowed to submit mail Postfix rejects php mails I have been careful to stop and restart Postfix after each configuration change, and tested the results. Nothing has worked. The main postfix instance happily accepts outbound email from account5. The postfix-blue instance continues to reject email from account5 with the sendmail error above. As tempting as it is to blame machine hostility, I know that I must be missing something or doing something wrong. Does anybody have any suggestions as to what it might be? Please feel free to ask for further information about my setup if you need it. =-=-=-=-=-=-=-=-=-= At the request of the responder, here are main.cf and master.cf for a) the main postfix instance ("red.example.com") and b) the FBL instance ("blue.example.us") [NOTE: All parameters not specified below were left at the default Postfix 2.6 settings] MAIN: master.cf smtp inet n - n - - smtpd main.cf myhostname = red.example.com mydomain = example.com inet_interfaces = $myhostname, localhost inet_protocols = all lmtp_host_lookup = native smtp_host_lookup = native ignore_mx_lookup_error = yes mydestination = $myhostname, localhost.$mydomain, localhost local_recipient_maps = mynetworks = 192.168.34.104/32 relay_domains = example.com, example.info, example.net, example.org, example.us relayhost = [192.168.34.102] # Separate physical server, main mailserver. relay_recipient_maps = hash:/etc/postfix/relay_recipients alias_maps = hash:/etc/aliases alias_database = hash:/etc/aliases smtpd_banner = $myhostname ESMTP $mail_name multi_instance_wrapper = ${command_directory}/postmulti -p -- multi_instance_enable = yes multi_instance_directories = /etc/postfix-green /etc/postfix-pink /etc/postfix-orange /etc/postfix-blue FBL: master.cf 184.173.119.103:25 inet n - n - - smtpd main.cf myhostname = blue.example.us mydomain = blue.example.us <= Deliberately set to subdomain only. myorigin = $mydomain inet_interfaces = $myhostname lmtp_host_lookup = native smtp_host_lookup = native ignore_mx_lookup_error = yes mydestination = $myhostname local_recipient_maps = unix:passwd.byname $alias_maps $virtual_alias_maps mynetworks = 192.168.36.51/32, 192.168.35.20/31 <= Second IP is backup MX servers relay_domains = $mydestination recipient_canonical_maps = hash:/etc/postfix-blue/canonical virtual_alias_maps = hash:/etc/postfix-fbl/virtual alias_maps = hash:/etc/aliases, hash:/etc/postfix-blue/canonical alias_maps = hash:/etc/aliases, hash:/etc/postfix-blue/canonical mailbox_command = /usr/bin/procmail -a "$EXTENSION" DEFAULT=$HOME/Mail/ MAILDIR=$HOME/Mail smtpd_banner = $myhostname ESMTP $mail_name authorized_submit_users = multi_instance_name = postfix-blue multi_instance_enable = yes

Read the article

Apache SSL reverse proxy to a Embed Tomcat

- by ggarcia24

I'm trying to put in place a reverse proxy for an application that is running a tomcat embed server over SSL. The application needs to run over SSL on the port 9002 so I have no way of "disabling SSL" for this app. The current setup schema looks like this: [192.168.0.10:443 - Apache with mod_proxy] --> [192.168.0.10:9002 - Tomcat App] After googling on how to make such a setup (and testing) I came across this: https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/861137 Which lead to make my current configuration (to try to emulate the --secure-protocol=sslv3 option of wget) /etc/apache2/sites/enabled/default-ssl: <VirtualHost _default_:443> SSLEngine On SSLCertificateFile /etc/ssl/certs/ssl-cert-snakeoil.pem SSLCertificateKeyFile /etc/ssl/private/ssl-cert-snakeoil.key SSLProxyEngine On SSLProxyProtocol SSLv3 SSLProxyCipherSuite SSLv3 ProxyPass /test/ https://192.168.0.10:9002/ ProxyPassReverse /test/ https://192.168.0.10:9002/ LogLevel debug ErrorLog /var/log/apache2/error-ssl.log CustomLog /var/log/apache2/access-ssl.log combined </VirtualHost> The thing is that the error log is showing error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol Complete request log: [Wed Mar 13 20:05:57 2013] [debug] mod_proxy.c(1020): Running scheme https handler (attempt 0) [Wed Mar 13 20:05:57 2013] [debug] mod_proxy_http.c(1973): proxy: HTTP: serving URL https://192.168.0.10:9002/ [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2011): proxy: HTTPS: has acquired connection for (192.168.0.10) [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2067): proxy: connecting https://192.168.0.10:9002/ to 192.168.0.10:9002 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2193): proxy: connected / to 192.168.0.10:9002 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2444): proxy: HTTPS: fam 2 socket created to connect to 192.168.0.10 [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2576): proxy: HTTPS: connection complete to 192.168.0.10:9002 (192.168.0.10) [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection to child 0 established (server demo1agrubu01.demo.lab:443) [Wed Mar 13 20:05:57 2013] [info] Seeding PRNG with 656 bytes of entropy [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1866): OpenSSL: Handshake: start [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1874): OpenSSL: Loop: before/connect initialization [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1874): OpenSSL: Loop: unknown state [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1897): OpenSSL: read 7/7 bytes from BIO#7f122800a100 [mem: 7f1230018f60] (BIO dump follows) [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1830): +-------------------------------------------------------------------------+ [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1869): | 0000: 15 03 01 00 02 02 50 ......P | [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_io.c(1875): +-------------------------------------------------------------------------+ [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1903): OpenSSL: Exit: error in unknown state [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] SSL Proxy connect failed [Wed Mar 13 20:05:57 2013] [info] SSL Library Error: 336032002 error:14077102:SSL routines:SSL23_GET_SERVER_HELLO:unsupported protocol [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection closed to child 0 with abortive shutdown (server example1.domain.tld:443) [Wed Mar 13 20:05:57 2013] [error] (502)Unknown error 502: proxy: pass request body failed to 172.31.4.13:9002 (192.168.0.10) [Wed Mar 13 20:05:57 2013] [error] [client 192.168.0.10] proxy: Error during SSL Handshake with remote server returned by /dsfe/ [Wed Mar 13 20:05:57 2013] [error] proxy: pass request body failed to 192.168.0.10:9002 (172.31.4.13) from 172.31.4.13 () [Wed Mar 13 20:05:57 2013] [debug] proxy_util.c(2029): proxy: HTTPS: has released connection for (172.31.4.13) [Wed Mar 13 20:05:57 2013] [debug] ssl_engine_kernel.c(1884): OpenSSL: Write: SSL negotiation finished successfully [Wed Mar 13 20:05:57 2013] [info] [client 192.168.0.10] Connection closed to child 6 with standard shutdown (server example1.domain.tld:443) If I do a wget --secure-protocol=sslv3 --no-check-certificate https://192.168.0.10:9002/ it works perfectly, but from apache is not working. I'm on an Ubuntu Server with the latest updates running apache2 with mod_proxy and mod_ssl enabled: ~$ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION="Ubuntu 12.04.2 LTS" ~# dpkg -s apache2 ... Version: 2.2.22-1ubuntu1.2 ... ~# dpkg -s openssl ... Version: 1.0.1-4ubuntu5.7 ... Hope that anyone may help

Read the article

Parallelism in .NET – Part 5, Partitioning of Work

- by Reed

When parallelizing any routine, we start by decomposing the problem. Once the problem is understood, we need to break our work into separate tasks, so each task can be run on a different processing element. This process is called partitioning. Partitioning our tasks is a challenging feat. There are opposing forces at work here: too many partitions adds overhead, too few partitions leaves processors idle. Trying to work the perfect balance between the two extremes is the goal for which we should aim. Luckily, the Task Parallel Library automatically handles much of this process. However, there are situations where the default partitioning may not be appropriate, and knowledge of our routines may allow us to guide the framework to making better decisions. First off, I’d like to say that this is a more advanced topic. It is perfectly acceptable to use the parallel constructs in the framework without considering the partitioning taking place. The default behavior in the Task Parallel Library is very well-behaved, even for unusual work loads, and should rarely be adjusted. I have found few situations where the default partitioning behavior in the TPL is not as good or better than my own hand-written partitioning routines, and recommend using the defaults unless there is a strong, measured, and profiled reason to avoid using them. However, understanding partitioning, and how the TPL partitions your data, helps in understanding the proper usage of the TPL. I indirectly mentioned partitioning while discussing aggregation. Typically, our systems will have a limited number of Processing Elements (PE), which is the terminology used for hardware capable of processing a stream of instructions. For example, in a standard Intel i7 system, there are four processor cores, each of which has two potential hardware threads due to Hyperthreading. This gives us a total of 8 PEs – theoretically, we can have up to eight operations occurring concurrently within our system. In order to fully exploit this power, we need to partition our work into Tasks. A task is a simple set of instructions that can be run on a PE. Ideally, we want to have at least one task per PE in the system, since fewer tasks means that some of our processing power will be sitting idle. A naive implementation would be to just take our data, and partition it with one element in our collection being treated as one task. When we loop through our collection in parallel, using this approach, we’d just process one item at a time, then reuse that thread to process the next, etc. There’s a flaw in this approach, however. It will tend to be slower than necessary, often slower than processing the data serially. The problem is that there is overhead associated with each task. When we take a simple foreach loop body and implement it using the TPL, we add overhead. First, we change the body from a simple statement to a delegate, which must be invoked. In order to invoke the delegate on a separate thread, the delegate gets added to the ThreadPool’s current work queue, and the ThreadPool must pull this off the queue, assign it to a free thread, then execute it. If our collection had one million elements, the overhead of trying to spawn one million tasks would destroy our performance. The answer, here, is to partition our collection into groups, and have each group of elements treated as a single task. By adding a partitioning step, we can break our total work into small enough tasks to keep our processors busy, but large enough tasks to avoid overburdening the ThreadPool. There are two clear, opposing goals here: Always try to keep each processor working, but also try to keep the individual partitions as large as possible. When using Parallel.For, the partitioning is always handled automatically. At first, partitioning here seems simple. A naive implementation would merely split the total element count up by the number of PEs in the system, and assign a chunk of data to each processor. Many hand-written partitioning schemes work in this exactly manner. This perfectly balanced, static partitioning scheme works very well if the amount of work is constant for each element. However, this is rarely the case. Often, the length of time required to process an element grows as we progress through the collection, especially if we’re doing numerical computations. In this case, the first PEs will finish early, and sit idle waiting on the last chunks to finish. Sometimes, work can decrease as we progress, since previous computations may be used to speed up later computations. In this situation, the first chunks will be working far longer than the last chunks. In order to balance the workload, many implementations create many small chunks, and reuse threads. This adds overhead, but does provide better load balancing, which in turn improves performance. The Task Parallel Library handles this more elaborately. Chunks are determined at runtime, and start small. They grow slowly over time, getting larger and larger. This tends to lead to a near optimum load balancing, even in odd cases such as increasing or decreasing workloads. Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable<T>, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable<T> is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out. By default, it uses a partitioning method similar to the one described above. We can see this directly by looking at the Visual Partitioning sample shipped by the Task Parallel Library team, and available as part of the Samples for Parallel Programming. When we run the sample, with four cores and the default, Load Balancing partitioning scheme, we see this: The colored bands represent each processing core. You can see that, when we started (at the top), we begin with very small bands of color. As the routine progresses through the Parallel.ForEach, the chunks get larger and larger (seen by larger and larger stripes). Most of the time, this is fantastic behavior, and most likely will out perform any custom written partitioning. However, if your routine is not scaling well, it may be due to a failure in the default partitioning to handle your specific case. With prior knowledge about your work, it may be possible to partition data more meaningfully than the default Partitioner. There is the option to use an overload of Parallel.ForEach which takes a Partitioner<T> instance. The Partitioner<T> class is an abstract class which allows for both static and dynamic partitioning. By overriding Partitioner<T>.SupportsDynamicPartitions, you can specify whether a dynamic approach is available. If not, your custom Partitioner<T> subclass would override GetPartitions(int), which returns a list of IEnumerator<T> instances. These are then used by the Parallel class to split work up amongst processors. When dynamic partitioning is available, GetDynamicPartitions() is used, which returns an IEnumerable<T> for each partition. If you do decide to implement your own Partitioner<T>, keep in mind the goals and tradeoffs of different partitioning strategies, and design appropriately. The Samples for Parallel Programming project includes a ChunkPartitioner class in the ParallelExtensionsExtras project. This provides example code for implementing your own, custom allocation strategies, including a static allocator of a given chunk size. Although implementing your own Partitioner<T> is possible, as I mentioned above, this is rarely required or useful in practice. The default behavior of the TPL is very good, often better than any hand written partitioning strategy.

Read the article

Parallelism in .NET – Part 6, Declarative Data Parallelism

- by Reed

When working with a problem that can be decomposed by data, we have a collection, and some operation being performed upon the collection. I’ve demonstrated how this can be parallelized using the Task Parallel Library and imperative programming using imperative data parallelism via the Parallel class. While this provides a huge step forward in terms of power and capabilities, in many cases, special care must still be given for relative common scenarios. C# 3.0 and Visual Basic 9.0 introduced a new, declarative programming model to .NET via the LINQ Project. When working with collections, we can now write software that describes what we want to occur without having to explicitly state how the program should accomplish the task. By taking advantage of LINQ, many operations become much shorter, more elegant, and easier to understand and maintain. Version 4.0 of the .NET framework extends this concept into the parallel computation space by introducing Parallel LINQ. Before we delve into PLINQ, let’s begin with a short discussion of LINQ. LINQ, the extensions to the .NET Framework which implement language integrated query, set, and transform operations, is implemented in many flavors. For our purposes, we are interested in LINQ to Objects. When dealing with parallelizing a routine, we typically are dealing with in-memory data storage. More data-access oriented LINQ variants, such as LINQ to SQL and LINQ to Entities in the Entity Framework fall outside of our concern, since the parallelism there is the concern of the data base engine processing the query itself. LINQ (LINQ to Objects in particular) works by implementing a series of extension methods, most of which work on IEnumerable<T>. The language enhancements use these extension methods to create a very concise, readable alternative to using traditional foreach statement. For example, let’s revisit our minimum aggregation routine we wrote in Part 4: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re doing a very simple computation, but writing this in an imperative style. This can be loosely translated to English as: Create a very large number, and save it in min Loop through each item in the collection. For every item: Perform some computation, and save the result If the computation is less than min, set min to the computation Although this is fairly easy to follow, it’s quite a few lines of code, and it requires us to read through the code, step by step, line by line, in order to understand the intention of the developer. We can rework this same statement, using LINQ: double min = collection.Min(item => item.PerformComputation()); Here, we’re after the same information. However, this is written using a declarative programming style. When we see this code, we’d naturally translate this to English as: Save the Min value of collection, determined via calling item.PerformComputation() That’s it – instead of multiple logical steps, we have one single, declarative request. This makes the developer’s intentions very clear, and very easy to follow. The system is free to implement this using whatever method required. Parallel LINQ (PLINQ) extends LINQ to Objects to support parallel operations. This is a perfect fit in many cases when you have a problem that can be decomposed by data. To show this, let’s again refer to our minimum aggregation routine from Part 4, but this time, let’s review our final, parallelized version: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Here, we’re doing the same computation as above, but fully parallelized. Describing this in English becomes quite a feat: Create a very large number, and save it in min Create a temporary object we can use for locking Call Parallel.ForEach, specifying three delegates For the first delegate: Initialize a local variable to hold the local state to a very large number For the second delegate: For each item in the collection, perform some computation, save the result If the result is less than our local state, save the result in local state For the final delegate: Take a lock on our temporary object to protect our min variable Save the min of our min and local state variables Although this solves our problem, and does it in a very efficient way, we’ve created a set of code that is quite a bit more difficult to understand and maintain. PLINQ provides us with a very nice alternative. In order to use PLINQ, we need to learn one new extension method that works on IEnumerable<T> – ParallelEnumerable.AsParallel(). That’s all we need to learn in order to use PLINQ: one single method. We can write our minimum aggregation in PLINQ very simply: double min = collection.AsParallel().Min(item => item.PerformComputation()); By simply adding “.AsParallel()” to our LINQ to Objects query, we converted this to using PLINQ and running this computation in parallel! This can be loosely translated into English easily, as well: Process the collection in parallel Get the Minimum value, determined by calling PerformComputation on each item Here, our intention is very clear and easy to understand. We just want to perform the same operation we did in serial, but run it “as parallel”. PLINQ completely extends LINQ to Objects: the entire functionality of LINQ to Objects is available. By simply adding a call to AsParallel(), we can specify that a collection should be processed in parallel. This is simple, safe, and incredibly useful.

Read the article

SSIS Lookup component tuning tips

- by jamiet

Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

How can I get penetration depth from Minkowski Portal Refinement / Xenocollide?

- by Raven Dreamer

I recently got an implementation of Minkowski Portal Refinement (MPR) successfully detecting collision. Even better, my implementation returns a good estimate (local minimum) direction for the minimum penetration depth. So I took a stab at adjusting the algorithm to return the penetration depth in an arbitrary direction, and was modestly successful - my altered method works splendidly for face-edge collision resolution! What it doesn't currently do, is correctly provide the minimum penetration depth for edge-edge scenarios, such as the case on the right: What I perceive to be happening, is that my current method returns the minimum penetration depth to the nearest vertex - which works fine when the collision is actually occurring on the plane of that vertex, but not when the collision happens along an edge. Is there a way I can alter my method to return the penetration depth to the point of collision, rather than the nearest vertex? Here's the method that's supposed to return the minimum penetration distance along a specific direction: public static Vector3 CalcMinDistance(List<Vector3> shape1, List<Vector3> shape2, Vector3 dir) { //holding variables Vector3 n = Vector3.zero; Vector3 swap = Vector3.zero; // v0 = center of Minkowski sum v0 = Vector3.zero; // Avoid case where centers overlap -- any direction is fine in this case //if (v0 == Vector3.zero) return Vector3.zero; //always pass in a valid direction. // v1 = support in direction of origin n = -dir; //get the differnce of the minkowski sum Vector3 v11 = GetSupport(shape1, -n); Vector3 v12 = GetSupport(shape2, n); v1 = v12 - v11; //if the support point is not in the direction of the origin if (v1.Dot(n) <= 0) { //Debug.Log("Could find no points this direction"); return Vector3.zero; } // v2 - support perpendicular to v1,v0 n = v1.Cross(v0); if (n == Vector3.zero) { //v1 and v0 are parallel, which means //the direction leads directly to an endpoint n = v1 - v0; //shortest distance is just n //Debug.Log("2 point return"); return n; } //get the new support point Vector3 v21 = GetSupport(shape1, -n); Vector3 v22 = GetSupport(shape2, n); v2 = v22 - v21; if (v2.Dot(n) <= 0) { //can't reach the origin in this direction, ergo, no collision //Debug.Log("Could not reach edge?"); return Vector2.zero; } // Determine whether origin is on + or - side of plane (v1,v0,v2) //tests linesegments v0v1 and v0v2 n = (v1 - v0).Cross(v2 - v0); float dist = n.Dot(v0); // If the origin is on the - side of the plane, reverse the direction of the plane if (dist > 0) { //swap the winding order of v1 and v2 swap = v1; v1 = v2; v2 = swap; //swap the winding order of v11 and v12 swap = v12; v12 = v11; v11 = swap; //swap the winding order of v11 and v12 swap = v22; v22 = v21; v21 = swap; //and swap the plane normal n = -n; } /// // Phase One: Identify a portal while (true) { // Obtain the support point in a direction perpendicular to the existing plane // Note: This point is guaranteed to lie off the plane Vector3 v31 = GetSupport(shape1, -n); Vector3 v32 = GetSupport(shape2, n); v3 = v32 - v31; if (v3.Dot(n) <= 0) { //can't enclose the origin within our tetrahedron //Debug.Log("Could not reach edge after portal?"); return Vector3.zero; } // If origin is outside (v1,v0,v3), then eliminate v2 and loop if (v1.Cross(v3).Dot(v0) < 0) { //failed to enclose the origin, adjust points; v2 = v3; v21 = v31; v22 = v32; n = (v1 - v0).Cross(v3 - v0); continue; } // If origin is outside (v3,v0,v2), then eliminate v1 and loop if (v3.Cross(v2).Dot(v0) < 0) { //failed to enclose the origin, adjust points; v1 = v3; v11 = v31; v12 = v32; n = (v3 - v0).Cross(v2 - v0); continue; } bool hit = false; /// // Phase Two: Refine the portal int phase2 = 0; // We are now inside of a wedge... while (phase2 < 20) { phase2++; // Compute normal of the wedge face n = (v2 - v1).Cross(v3 - v1); n.Normalize(); // Compute distance from origin to wedge face float d = n.Dot(v1); // If the origin is inside the wedge, we have a hit if (d > 0 ) { //Debug.Log("Do plane test here"); float T = n.Dot(v2) / n.Dot(dir); Vector3 pointInPlane = (dir * T); return pointInPlane; } // Find the support point in the direction of the wedge face Vector3 v41 = GetSupport(shape1, -n); Vector3 v42 = GetSupport(shape2, n); v4 = v42 - v41; float delta = (v4 - v3).Dot(n); float separation = -(v4.Dot(n)); if (delta <= kCollideEpsilon || separation >= 0) { //Debug.Log("Non-convergance detected"); //Debug.Log("Do plane test here"); return Vector3.zero; } // Compute the tetrahedron dividing face (v4,v0,v1) float d1 = v4.Cross(v1).Dot(v0); // Compute the tetrahedron dividing face (v4,v0,v2) float d2 = v4.Cross(v2).Dot(v0); // Compute the tetrahedron dividing face (v4,v0,v3) float d3 = v4.Cross(v3).Dot(v0); if (d1 < 0) { if (d2 < 0) { // Inside d1 & inside d2 ==> eliminate v1 v1 = v4; v11 = v41; v12 = v42; } else { // Inside d1 & outside d2 ==> eliminate v3 v3 = v4; v31 = v41; v32 = v42; } } else { if (d3 < 0) { // Outside d1 & inside d3 ==> eliminate v2 v2 = v4; v21 = v41; v22 = v42; } else { // Outside d1 & outside d3 ==> eliminate v1 v1 = v4; v11 = v41; v12 = v42; } } } return Vector3.zero; } }

Read the article

ODI 12c - Parallel Table Load

- by David Allan

In this post we will look at the ODI 12c capability of parallel table load from the aspect of the mapping developer and the knowledge module developer - two quite different viewpoints. This is about parallel table loading which isn't to be confused with loading multiple targets per se. It supports the ability for ODI mappings to be executed concurrently especially if there is an overlap of the datastores that they access, so any temporary resources created may be uniquely constructed by ODI. Temporary objects can be anything basically - common examples are staging tables, indexes, views, directories - anything in the ETL to help the data integration flow do its job. In ODI 11g users found a few workarounds (such as changing the technology prefixes - see here) to build unique temporary names but it was more of a challenge in error cases. ODI 12c mappings by default operate exactly as they did in ODI 11g with respect to these temporary names (this is also true for upgraded interfaces and scenarios) but can be configured to support the uniqueness capabilities. We will look at this feature from two aspects; that of a mapping developer and that of a developer (of procedures or KMs). 1. Firstly as a Mapping Developer..... 1.1 Control when uniqueness is enabled A new property is available to set unique name generation on/off. When unique names have been enabled for a mapping, all temporary names used by the collection and integration objects will be generated using unique names. This property is presented as a check-box in the Property Inspector for a deployment specification. 1.2 Handle cleanup after successful execution Provided that all temporary objects that are created have a corresponding drop statement then all of the temporary objects should be removed during a successful execution. This should be the case with the KMs developed by Oracle. 1.3 Handle cleanup after unsuccessful execution If an execution failed in ODI 11g then temporary tables would have been left around and cleaned up in the subsequent run. In ODI 12c, KM tasks can now have a cleanup-type task which is executed even after a failure in the main tasks. These cleanup tasks will be executed even on failure if the property 'Remove Temporary Objects on Error' is set. If the agent was to crash and not be able to execute this task, then there is an ODI tool (OdiRemoveTemporaryObjects here) you can invoke to cleanup the tables - it supports date ranges and the like. That's all there is to it from the aspect of the mapping developer it's much, much simpler and straightforward. You can now execute the same mapping concurrently or execute many mappings using the same resource concurrently without worrying about conflict. 2. Secondly as a Procedure or KM Developer..... In the ODI Operator the executed code shows the actual name that is generated - you can also see the runtime code prior to execution (introduced in 11.1.1.7), for example below in the code type I selected 'Pre-executed Code' this lets you see the code about to be processed and you can also see the executed code (which is the default view). References to the collection (C$) and integration (I$) names will be automatically made unique by using the odiRef APIs - these objects will have unique names whenever concurrency has been enabled for a particular mapping deployment specification. It's also possible to use name uniqueness functions in procedures and your own KMs. 2.1 New uniqueness tags You can also make your own temporary objects have unique names by explicitly including either %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG in the name passed to calls to the odiRef APIs. Such names would always include the unique tag regardless of the concurrency setting. To illustrate, let's look at the getObjectName() method. At <% expansion time, this API will append %UNIQUE_STEP_TAG to the object name for collection and integration tables. The name parameter passed to this API may contain %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. This API always generates to the <? version of getObjectName() At execution time this API will replace the unique tag macros with a string that is unique to the current execution scope. The returned name will conform to the name-length restriction for the target technology, and its pattern for the unique tag. Any necessary truncation will be performed against the initial name for the object and any other fixed text that may have been specified. Examples are:- <?=odiRef.getObjectName("L", "%COL_PRFEMP%UNIQUE_STEP_TAG", "D")?> SCOTT.C$_EABH7QI1BR1EQI3M76PG9SIMBQQ <?=odiRef.getObjectName("L", "EMP%UNIQUE_STEP_TAG_AE", "D")?> SCOTT.EMPAO96Q2JEKO0FTHQP77TMSAIOSR_ Methods which have this kind of support include getFrom, getTableName, getTable, getObjectShortName and getTemporaryIndex. There are APIs for retrieving this tag info also, the getInfo API has been extended with the following properties (the UNIQUE* properties can also be used in ODI procedures); UNIQUE_STEP_TAG - Returns the unique value for the current step scope, e.g. 5rvmd8hOIy7OU2o1FhsF61 Note that this will be a different value for each loop-iteration when the step is in a loop. UNIQUE_SESSION_TAG - Returns the unique value for the current session scope, e.g. 6N38vXLrgjwUwT5MseHHY9 IS_CONCURRENT - Returns info about the current mapping, will return 0 or 1 (only in % phase) GUID_SRC_SET - Returns the UUID for the current source set/execution unit (only in % phase) The getPop API has been extended with the IS_CONCURRENT property which returns info about an mapping, will return 0 or 1. 2.2 Additional APIs Some new APIs are provided including getFormattedName which will allow KM developers to construct a name from fixed-text or ODI symbols that can be optionally truncate to a max length and use a specific encoding for the unique tag. It has syntax getFormattedName(String pName[, String pTechnologyCode]) This API is available at both the % and the ? phase. The format string can contain the ODI prefixes that are available for getObjectName(), e.g. %INT_PRF, %COL_PRF, %ERR_PRF, %IDX_PRF alongwith %UNIQUE_STEP_TAG or %UNIQUE_SESSION_TAG. The latter tags will be expanded into a unique string according to the specified technology. Calls to this API within the same execution context are guaranteed to return the same unique name provided that the same parameters are passed to the call. e.g. <%=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")%> <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG_AE", "ORACLE")?> C$_MY_TAB7wDiBe80vBog1auacS1xB_AE <?=odiRef.getFormattedName("%COL_PRFMY_TABLE%UNIQUE_STEP_TAG.log", "FILE")?> C2_MY_TAB7wDiBe80vBog1auacS1xB.log 2.3 Name length generation As part of name generation, the length of the generated name will be compared with the maximum length for the target technology and truncation may need to be applied. When a unique tag is included in the generated string it is important that uniqueness is not compromised by truncation of the unique tag. When a unique tag is NOT part of the generated name, the name will be truncated by removing characters from the end - this is the existing 11g algorithm. When a unique tag is included, the algorithm will first truncate the <postfix> and if necessary the <prefix>. It is recommended that users will ensure there is sufficient uniqueness in the <prefix> section to ensure uniqueness of the final resultant name. SUMMARY To summarize, ODI 12c make it much simpler to utilize mappings in concurrent cases and provides APIs for helping developing any procedures or custom knowledge modules in such a way they can be used in highly concurrent, parallel scenarios.

Read the article

SSIS - XML Source Script

- by simonsabin

The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml"); foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer") from order in customer.Elements("Orders").Elements("Order") select new { Account = customer.Attribute("AccountNumber").Value , OrderDate = order.Attribute("OrderDate").Value } )) { Output0Buffer.AddRow(); Output0Buffer.AccountNumber = xdata.Account; Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer") from order in customer.Elements("Orders").Elements("Order") select new { Account = customer.Attribute("AccountNumber").Value , OrderDate = order.Attribute("OrderDate").Value } )) { Output0Buffer.AddRow(); Output0Buffer.AccountNumber = xdata.Account; Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) { using (XmlReader xr = XmlReader.Create(filename)) { xr.MoveToContent(); while (xr.Read()) //Reads the first element { while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName) { XElement node = (XElement)XElement.ReadFrom(xr); yield return node; } } xr.Close(); } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.

Read the article

Using a "white list" for extracting terms for Text Mining

- by [email protected]

In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM TOKEN DATA MINING DATAMINING ORACLE DATA MINING ORACLEDATAMINING 11G ORACLE11G JAVA JAVA CRM CRM CUSTOMER RELATIONSHIP MANAGEMENT CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term VARCHAR2(100), query_string VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE cursor c2 is select token, term from my_term_token_map; BEGIN for r_c2 in c2 loop CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term); EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')' using r_c2.token; end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example: 'ORACLEDATAMINING' SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on my_thesaurus_rules(query_string) indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

Read the article

June 2013 Release of the Ajax Control Toolkit

- by Stephen.Walther

I’m happy to announce the June 2013 release of the Ajax Control Toolkit. For this release, we enhanced the AjaxFileUpload control to support uploading files directly to Windows Azure. We also improved the SlideShow control by adding support for CSS3 animations. You can get the latest release of the Ajax Control Toolkit by visiting the project page at CodePlex (http://AjaxControlToolkit.CodePlex.com). Alternatively, you can execute the following NuGet command from the Visual Studio Library Package Manager window: Uploading Files to Azure The AjaxFileUpload control enables you to efficiently upload large files and display progress while uploading. With this release, we’ve added support for uploading large files directly to Windows Azure Blob Storage (You can continue to upload to your server hard drive if you prefer). Imagine, for example, that you have created an Azure Blob Storage container named pictures. In that case, you can use the following AjaxFileUpload control to upload to the container: <toolkit:ToolkitScriptManager runat="server" /> <toolkit:AjaxFileUpload ID="AjaxFileUpload1" StoreToAzure="true" AzureContainerName="pictures" runat="server" /> Notice that the AjaxFileUpload control is declared with two properties related to Azure. The StoreToAzure property causes the AjaxFileUpload control to upload a file to Azure instead of the local computer. The AzureContainerName property points to the blob container where the file is uploaded. .int3{position:absolute;clip:rect(487px,auto,auto,444px);}SMALL cash advance VERY CHEAP To use the AjaxFileUpload control, you need to modify your web.config file so it contains some additional settings. You need to configure the AjaxFileUpload handler and you need to point your Windows Azure connection string to your Blob Storage account. <configuration> <appSettings>  <add key="AjaxFileUploadAzureConnectionString" value="DefaultEndpointsProtocol=https;AccountName=testact;AccountKey=RvqL89Iw4npvPlAAtpOIPzrinHkhkb6rtRZmD0+ojZupUWuuAVJRyyF/LIVzzkoN38I4LSr8qvvl68sZtA152A=="/> </appSettings> <system.web> <compilation debug="true" targetFramework="4.5" /> <httpRuntime targetFramework="4.5" /> <httpHandlers> <add verb="*" path="AjaxFileUploadHandler.axd" type="AjaxControlToolkit.AjaxFileUploadHandler, AjaxControlToolkit"/> </httpHandlers> </system.web> <system.webServer> <validation validateIntegratedModeConfiguration="false" /> <handlers> <add name="AjaxFileUploadHandler" verb="*" path="AjaxFileUploadHandler.axd" type="AjaxControlToolkit.AjaxFileUploadHandler, AjaxControlToolkit"/> </handlers> <security> <requestFiltering> <requestLimits maxAllowedContentLength="4294967295"/> </requestFiltering> </security> </system.webServer> </configuration> You supply the connection string for your Azure Blob Storage account with the AjaxFileUploadAzureConnectionString property. If you set the value “UseDevelopmentStorage=true” then the AjaxFileUpload will upload to the simulated Blob Storage on your local machine. After you create the necessary configuration settings, you can use the AjaxFileUpload control to upload files directly to Azure (even very large files). Here’s a screen capture of how the AjaxFileUpload control appears in Google Chrome: After the files are uploaded, you can view the uploaded files in the Windows Azure Portal. You can see that all 5 files were uploaded successfully: New AjaxFileUpload Events In response to user feedback, we added two new events to the AjaxFileUpload control (on both the server and the client): · UploadStart – Raised on the server before any files have been uploaded. · UploadCompleteAll – Raised on the server when all files have been uploaded. · OnClientUploadStart – The name of a function on the client which is called before any files have been uploaded. · OnClientUploadCompleteAll – The name of a function on the client which is called after all files have been uploaded. These new events are most useful when uploading multiple files at a time. The updated AjaxFileUpload sample page demonstrates how to use these events to show the total amount of time required to upload multiple files (see the AjaxFileUpload.aspx file in the Ajax Control Toolkit sample site). SlideShow Animated Slide Transitions With this release of the Ajax Control Toolkit, we also added support for CSS3 animations to the SlideShow control. The animation is used when transitioning from one slide to another. Here’s the complete list of animations: · FadeInFadeOut · ScaleX · ScaleY · ZoomInOut · Rotate · SlideLeft · SlideDown You specify the animation which you want to use by setting the SlideShowAnimationType property. For example, here is how you would use the Rotate animation when displaying a set of slides: <%@ Page Language="C#" AutoEventWireup="true" CodeBehind="ShowSlideShow.aspx.cs" Inherits="TestACTJune2013.ShowSlideShow" %> <%@ Register TagPrefix="toolkit" Namespace="AjaxControlToolkit" Assembly="AjaxControlToolkit" %> <script runat="Server" type="text/C#"> [System.Web.Services.WebMethod] [System.Web.Script.Services.ScriptMethod] public static AjaxControlToolkit.Slide[] GetSlides() { return new AjaxControlToolkit.Slide[] { new AjaxControlToolkit.Slide("slides/Blue hills.jpg", "Blue Hills", "Go Blue"), new AjaxControlToolkit.Slide("slides/Sunset.jpg", "Sunset", "Setting sun"), new AjaxControlToolkit.Slide("slides/Winter.jpg", "Winter", "Wintery..."), new AjaxControlToolkit.Slide("slides/Water lilies.jpg", "Water lillies", "Lillies in the water"), new AjaxControlToolkit.Slide("slides/VerticalPicture.jpg", "Sedona", "Portrait style picture") }; } </script> <!DOCTYPE html> <html > <head runat="server"> <title></title> </head> <body> <form id="form1" runat="server"> <div> <toolkit:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server" /> <asp:Image ID="Image1" Height="300" Runat="server" /> <toolkit:SlideShowExtender ID="SlideShowExtender1" TargetControlID="Image1" SlideShowServiceMethod="GetSlides" AutoPlay="true" Loop="true" SlideShowAnimationType="Rotate" runat="server" /> </div> </form> </body> </html> In the code above, the set of slides is exposed by a page method named GetSlides(). The SlideShowAnimationType property is set to the value Rotate. The following animated GIF gives you an idea of the resulting slideshow: If you want to use either the SlideDown or SlideRight animations, then you must supply both an explicit width and height for the Image control which is the target of the SlideShow extender. For example, here is how you would declare an Image and SlideShow control to use a SlideRight animation: <toolkit:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server" /> <asp:Image ID="Image1" Height="300" Width="300" Runat="server" /> <toolkit:SlideShowExtender ID="SlideShowExtender1" TargetControlID="Image1" SlideShowServiceMethod="GetSlides" AutoPlay="true" Loop="true" SlideShowAnimationType="SlideRight" runat="server" /> Notice that the Image control includes both a Height and Width property. Here’s an approximation of this animation using an animated GIF: Summary The Superexpert team worked hard on this release. We hope you like the new improvements to both the AjaxFileUpload and the SlideShow controls. We’d love to hear your feedback in the comments. On to the next sprint!

Search Results

Search found 8019 results on 321 pages for 'loop'.

Page 156/321 | < Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >

- by inetech

- by user330561

- by Jon1

- by abson

- by chobo2

- by Alex

- by guest

- by guest

- by user299233

- by OkyDokyman

- by Haroldo

- by imbibi

- by András Szepesházi

- by leeoniya

- by zebediah49

- by Catherine Jefferson

- by ggarcia24

- by Reed

- by Reed

- by jamiet

- by Raven Dreamer

- by David Allan

- by simonsabin

- by [email protected]

- by Stephen.Walther

< Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >