Search Results

Search found 4176 results on 168 pages for 'graph algorithms'.

Page 147/168 | < Previous Page | 143 144 145 146 147 148 149 150 151 152 153 154  | Next Page >

  • Question about permute-by-sorting

    - by davit-datuashvili
    In the book "Introduction to Algorithms", second edition, there is the following problem: Suppose we have some array: int a[] = {1,2,3,4} and some random priorities array: P = {36,3,97,19} and the goal is to permute the array a randomly using this priorities array. This is the pseudo code: PERMUTE-BY-SORTING (A) 1 n ? length[A] 2 for i ? 1 to n 3 do P[i] = RANDOM (1, n 3) 4 sort A, using P as sort keys 5 return A The result should be the permuted array: B={2, 4, 1, 3}; I have written this code: import java.util.*; public class Permute { public static void main (String[] args) { Random r = new Random(); int a[] = new int[] {1,2,3,4}; int n = a.length; int b[] = new int[a.length]; int p[] = new int[a.length]; for (int i=0; i<p.length; i++) { p[i] = r.nextInt(n*n*n) + 1; } // for (int i=0;i<p.length;i++){ // System.out.println(p[i]); //} } } How do I continue?

    Read the article

  • Java: multi-threaded maps: how do the implementations compare?

    - by user346629
    I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice. The options I know are: HashMap. Disastrously un-thread safe. ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance. Collections.sychronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives. Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe. Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of? Thanks in advance for your input!

    Read the article

  • How can I be a Guru? Is it possible? [closed]

    - by Kev
    Before 1999, I heard about computer. But, I don't know what it look like. TV? Maybe! Before 2001, I only saw it in book. It looks like a TV. Before 2005, I touched it in reality. It still looks like a TV + Black Box. In 2005, I entered university. I had a cource about Mathematica.I loved programming since then. In 2006, I owned a computer. I was coding C every day. if...else..., for..., while..., switch... entered my life. Since 2007, I have learned Data Structures, Algorithms...Then, C#, Java, Python, HTML/CSS/JavaScript, F#... A lot of languages. I'm still learning new lang. Unfortunately, I only know syntax! I'm always a novice(??)! I know some guru start programming at age of 8 or 12. I admire these gurus who are compiler writers, language designers, architecture designers, Linux hackers... Is it possible to become a guru like me. If possible, how? what should I do now? Any advice? Thank you very much.

    Read the article

  • What db fits me?

    - by afvasd
    Dear Everyone I am currently using mysql. I am finding that my schema is getting incredibly complicated. I seek to find a new db that will suit my needs: Let's assume I am building a news aggregrator (which collects news from multiple website). I then run algorithms to determine if two news from different sites are actually referring to the same topic. I run this algorithm to cluster news together. The relationship is depicted below: cluster \--news1 \--word1 \--word2 \--news2 \--word3 \--news3 \--word1 \--word3 And then I will apply some magic and determine the importance of each word. Summing all the importance of each word gives me the importance of a news article. Summing the importance of each news article gives me the importance of a cluster. Note that above cluster there are also subgroups( like split by region etc), and categories (like sports, etc) which I have to determine the importance of that in a particular day per se. I have used views in the past to do so, but I realized that views are very slow. So i will normally do an insert into an actual table and index them for better performance. As you can see this leads to multiple tables derived like (cluster, importance), (news, importance), (words, importance) etc which can get pretty messy. Also the "importance" metric will change. It has become increasingly difficult to alter tables, update data (which I am using TRUNCATE TABLE) and then inserting from null. I am currently looking into something schemaless like Mongodb. I do not need distributedness. I would very much want something that is reasonably fast (which can be indexed) and something that is a lot more flexible that traditional RDMBS. Also, I need something that has some kind of ORM because I personally like ORM a lot. I am currently using sqlalchemy Please help!

    Read the article

  • Strange Puzzle - Invalid memory access of location

    - by Rob Graeber
    The error message I'm getting consistently is: Invalid memory access of location 0x8 rip=0x10cf4ab28 What I'm doing is making a basic stock backtesting system, that is iterating huge arrays of stocks/historical data across various algorithms, using java + eclipse on the latest Mac Os X. I tracked down the code that seems to be causing it. A method that is used to get the massive arrays of data and is called thousands of times. Nothing is retained so I don't think there is a memory leak. However there seems to be a set limit of around 7000 times I can iterate over it before I get the memory error. The weird thing is that it works perfectly in debug mode. Does anyone know what debug mode does differently in Eclipse? Giving the jvm more memory doesn't help, and it appears to work fine using -xint. And again it works perfectly in debug mode. public static List<Stock> getStockArray(ExchangeType e){ List<Stock> stockArray = new ArrayList<Stock>(); if(e == ExchangeType.ALL){ stockArray.addAll(getStockArray(ExchangeType.NYSE)); stockArray.addAll(getStockArray(ExchangeType.NASDAQ)); }else if(e == ExchangeType.ETF){ stockArray.addAll(etfStockArray); }else if(e == ExchangeType.NYSE){ stockArray.addAll(nyseStockArray); }else if(e == ExchangeType.NASDAQ){ stockArray.addAll(nasdaqStockArray); } return stockArray; } A simple loop like this, iterated over 1000s of times, will cause the memory error. But not in debug mode. for (Stock stock : StockDatabase.getStockArray(ExchangeType.ETF)) { System.out.println(stock.symbol); }

    Read the article

  • How can I create an array of random numbers in C++

    - by Nick
    Instead of The ELEMENTS being 25 is there a way to randomly generate a large array of elements....10000, 100000, or even 1000000 elements and then use my insertion sort algorithms. I am trying to have a large array of elements and use insertion sort to put them in order and then also in reverse order. Next I used clock() in the time.h file to figure out the run time of each algorithm. I am trying to test with a large amount of numbers. #define ELEMENTS 25 void insertion_sort(int x[],int length); void insertion_sort_reverse(int x[],int length); int main() { clock_t tStart = clock(); int B[ELEMENTS]={4,2,5,6,1,3,17,14,67,45,32,66,88, 78,69,92,93,21,25,23,71,61,59,60,30}; int x; cout<<"Not Sorted: "<<endl; for(x=0;x<ELEMENTS;x++) cout<<B[x]<<endl; insertion_sort(B,ELEMENTS); cout <<"Sorted Normal: "<<endl; for(x=0;x<ELEMENTS;x++) cout<< B[x] <<endl; insertion_sort_reverse(B,ELEMENTS); cout <<"Sorted Reverse: "<<endl; for(x=0;x<ELEMENTS;x++) cout<< B[x] <<endl; double seconds = clock() / double(CLK_TCK); cout << "This program has been running for " << seconds << " seconds." << endl; system("pause"); return 0; }

    Read the article

  • question about permut-by-sorting

    - by davit-datuashvili
    hi i have following question from book introduction in algorithms second edition there is such problem suppose we have some array A int a[]={1,2,3,4} and we have some random priorities array P={36,3,97,19} we shoud permut array a randomly using this priorities array here is pseudo code P ERMUTE -B Y-S ORTING ( A) 1 n ? length[A] 2 for i ? 1 to n do P[i] = R ANDOM(1, n 3 ) 3 4 sort A, using P as sort keys 5 return A and result will be permuted array B={2, 4, 1, 3}; please help any ideas i have done this code and need aideas how continue import java.util.*; public class Permut { public static void main(String[]args){ Random r=new Random(); int a[]=new int[]{1,2,3,4}; int n=a.length; int b[]=new int[a.length]; int p[]=new int[a.length]; for (int i=0;i<p.length;i++){ p[i]=r.nextInt(n*n*n)+1; } // for (int i=0;i<p.length;i++){ // System.out.println(p[i]); //} } } please help

    Read the article

  • JAVA - How to code Node neighbours in a Grid ?

    - by ke3pup
    Hi guys I'm new to programming and as a School task i need to implement BFS,DFS and A* search algorithms in java to search for a given Goal from a given start position in a Grid of given size, 4x4,8x8..etc to begin with i don't know how to code the neighbors of all the nodes. For example tile 1 in grid as 2 and 9 as neighbors and Tile 12 has ,141,13,20 as its neighbours but i'm struggling to code that. I need the neighbours part so that i can move from start position to other parts of gird legally by moving horizontally or vertically through the neighbours. my node class is: class node { int value; LinkedList neighbors; bool expanded; } let's say i'm given a 8x8 grid right, So if i start the program with a grid of size 8x8 right : 1 - my main will func will create an arrayList of nodes for example node ArrayList test = new ArrayList(); and then using a for loop assign value to all the nodes in arrayList from 1 to 64 (if the grid size was 8x8). BUT somehow i need t on coding that, if anyone can give me some details i would really appreciate it.

    Read the article

  • [Wireless LAN]hostapd is giving error whwn running in target board

    - by Renjith G
    hi, I got the following error when i tried to run the hostapd command in my target board. Any idea about this? /etc # hostapd -dd hostapd.conf Configuration file: hostapd.conf madwifi_set_iface_flags: dev_up=0 madwifi_set_privacy: enabled=0 BSS count 1, BSSID mask ff:ff:ff:ff:ff:ff (0 bits) Flushing old station entries madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=3 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 3) Could not connect to kernel driver. Deauthenticate all stations madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=2 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 2) madwifi_set_privacy: enabled=0 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=0 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=1 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=2 madwifi_del_key: addr=00:00:00:00:00:00 key_idx=3 Using interface ath0 with hwaddr 00:0b:6b:33:8c:30 and ssid '"RG_WLAN Testing Renjith G"' SSID - hexdump_ascii(len=27): 22 52 47 5f 57 4c 41 4e 20 54 65 73 74 69 6e 67 "RG_WLAN Testing 20 52 65 6e 6a 69 74 68 20 47 22 Renjith G" PSK (ASCII passphrase) - hexdump_ascii(len=12): 6d 79 70 61 73 73 70 68 72 61 73 65 mypassphrase PSK (from passphrase) - hexdump(len=32): 70 6f a6 92 da 9c a8 3b ff 36 85 76 f3 11 9c 5e 5d 4a 4b 79 f4 4e 18 f6 b1 b8 09 af 6c 9c 6c 21 madwifi_set_ieee8021x: enabled=1 madwifi_configure_wpa: group key cipher=1 madwifi_configure_wpa: pairwise key ciphers=0xa madwifi_configure_wpa: key management algorithms=0x2 madwifi_configure_wpa: rsn capabilities=0x0 madwifi_configure_wpa: enable WPA=0x1 WPA: group state machine entering state GTK_INIT (VLAN-ID 0) GMK - hexdump(len=32): [REMOVED] GTK - hexdump(len=32): [REMOVED] WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0) madwifi_set_key: alg=TKIP addr=00:00:00:00:00:00 key_idx=1 madwifi_set_privacy: enabled=1 madwifi_set_iface_flags: dev_up=1 ath0: Setup of interface done. l2_packet_receive - recvfrom: Network is down Wireless event: cmd=0x8b1a len=40 Register Fail Register Fail WPA: group state machine entering state SETKEYS (VLAN-ID 0) GMK - hexdump(len=32): [REMOVED] GTK - hexdump(len=32): [REMOVED] wpa_group_setkeys: GKeyDoneStations=0 WPA: group state machine entering state SETKEYSDONE (VLAN-ID 0) madwifi_set_key: alg=TKIP addr=00:00:00:00:00:00 key_idx=2 Signal 2 received - terminating Flushing old station entries madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=3 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 3) Could not connect to kernel driver. Deauthenticate all stations madwifi_sta_deauth: addr=ff:ff:ff:ff:ff:ff reason_code=2 ioctl[IEEE80211_IOCTL_SETMLME]: Invalid argument madwifi_sta_deauth: Failed to deauth STA (addr ff:ff:ff:ff:ff:ff reason 2) madwifi_set_privacy: enabled=0 madwifi_set_ieee8021x: enabled=0 madwifi_set_iface_flags: dev_up=0

    Read the article

  • TPROXY Not working with HAProxy, Ubuntu 14.04

    - by Nyxynyx
    I'm trying to use HAProxy as a fully transparent proxy using TPROXY in Ubuntu 14.04. HAProxy will be setup on the first server with eth1 111.111.250.250 and eth0 10.111.128.134. The single balanced server has eth1 and eth0 as well. eth1 is the public facing network interface while eth0 is for the private network which both servers are in. Problem: I'm able to connect to the balanced server's port 1234 directly (via eth1) but am not able to reach the balanced server via Haproxy port 1234 (which redirects to 1234 via eth0). Am I missing out something in this configuration? On the HAProxy server The current kernel is: Linux extremehash-lb2 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux The kernel appears to have TPROXY support: # grep TPROXY /boot/config-3.13.0-24-generic CONFIG_NETFILTER_XT_TARGET_TPROXY=m HAProxy was compiled with TPROXY support: haproxy -vv HA-Proxy version 1.5.3 2014/07/25 Copyright 2000-2014 Willy Tarreau <[email protected]> Build options : TARGET = linux26 CPU = x86_64 CC = gcc CFLAGS = -g -fno-strict-aliasing OPTIONS = USE_LINUX_TPROXY=1 USE_LIBCRYPT=1 USE_STATIC_PCRE=1 Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200 Encrypted password support via crypt(3): yes Built without zlib support (USE_ZLIB not set) Compression algorithms supported : identity Built without OpenSSL support (USE_OPENSSL not set) Built with PCRE version : 8.31 2012-07-06 PCRE library supports JIT : no (USE_PCRE_JIT not set) Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. In /etc/haproxy/haproxy.cfg, I've configured a port to have the following options: listen test1235 :1234 mode tcp option tcplog balance leastconn source 0.0.0.0 usesrc clientip server balanced1 10.111.163.76:1234 check inter 5s rise 2 fall 4 weight 4 On the balanced server In /etc/networking/interfaces I've set the gateway for eth0 to be the HAProxy box 10.111.128.134 and restarted networking. auto eth0 eth1 iface eth0 inet static address 111.111.250.250 netmask 255.255.224.0 gateway 111.131.224.1 dns-nameservers 8.8.4.4 8.8.8.8 209.244.0.3 iface eth1 inet static address 10.111.163.76 netmask 255.255.0.0 gateway 10.111.128.134 ip route gives: default via 111.111.224.1 dev eth0 10.111.0.0/16 dev eth1 proto kernel scope link src 10.111.163.76 111.111.224.0/19 dev eth0 proto kernel scope link src 111.111.250.250

    Read the article

  • Is there a way to replicate a very large file shares in real-time?

    - by fsckin
    I have an hourly cron job that copies about 40GB of data from a source folder into a new folder with the hour appended on the end. When it's done, the job prunes anything older than 24 hours. This data changes very often during work hours and is on a samba file share. Here's how the folder structure looks: \server\Version.1 \server\Version.2 \server\Version.3 ... \server\Version.24 The contents of each new folder compared to the last one usually doesn't change very much, since this is a hourly job. Now you might be thinking that I'm an idiot for setting dreaming this up. Truth is, I just found out. It's actually been used for years and is so incredibly simple, anyone could delete the ENTIRE 40GB share (imagine that dialog spooling up... deleting thousands and thousands of files) and it would actually be faster to restore by moving the latest copy back to the source than it took to delete. Brilliant! Now to top this off, I need to efficiently replicate this 960GB of "mostly similar" data to a remote server over WAN link, with the replication happening as close to real-time as possible -- think hot spare, disaster recovery, etc. My first thought was rsync. Total failure. Rsync sees it sees a deletion of the folder that is 24 hours old and the addition of a new folder with 30GB of data to sync! I also looked at rdiff-backup and unison, they both appear to use similar algorithms and do not keep enough meta-data to do this intelligently. Best thing that I can find "out of the box" to do this is Windows Server "Distributed Filesystem Replication" which uses "Remote Differential Compression" -- After reading the background information on how this works, it actually looks like exactly what I need. Problem: Both servers are running Linux. D'oh! One approach to this I'm looking at is this, say it's 5AM and the cron job finishes: New Version.5 folder arrives at on local server SSH to remote server and copy Version.4 to Version.5 Run rsync on the local server pushing changes to the remote server. Rsync finally knows to do a differential copy between Version.4 and Version.5 Is there a smarter way to replicate Samba shares as close to real-time as possible? Anything out there that does "Remote Differential Compression" on Linux?

    Read the article

  • NDepend tool – Why every developer working with Visual Studio.NET must try it!

    - by hajan
    In the past two months, I have had a chance to test the capabilities and features of the amazing NDepend tool designed to help you make your .NET code better, more beautiful and achieve high code quality. In other words, this tool will definitely help you harmonize your code. I mean, you’ve probably heard about Chaos Theory. Experienced developers and architects are already advocates of the programming chaos that happens when working with complex project architecture, the matrix of relationships between objects which simply even if you are the one who have written all that code, you know how hard is to visualize everything what does the code do. When the application get more and more complex, you will start missing a lot of details in your code… NDepend will help you visualize all the details on a clever way that will help you make smart moves to make your code better. The NDepend tool supports many features, such as: Code Query Language – which will help you write custom rules and query your own code! Imagine, you want to find all your methods which have more than 100 lines of code :)! That’s something simple! However, I will dig much deeper in one of my next blogs which I’m going to dedicate to the NDepend’s CQL (Code Query Language) Architecture Visualization – You are an architect and want to visualize your application’s architecture? I’m thinking how many architects will be really surprised from their architectures since NDepend shows your whole architecture showing each piece of it. NDepend will show you how your code is structured. It shows the architecture in graphs, but if you have very complex architecture, you can see it in Dependency Matrix which is more suited to display large architecture Code Metrics – Using NDepend’s panel, you can see the code base according to Code Metrics. You can do some additional filtering, like selecting the top code elements ordered by their current code metric value. You can use the CQL language for this purpose too. Smart Search – NDepend has great searching ability, which is again based on the CQL (Code Query Language). However, you have some options to search using dropdown lists and text boxes and it will generate the appropriate CQL code on fly. Moreover, you can modify the CQL code if you want it to fit some more advanced searching tasks. Compare Builds and Code Difference – NDepend will also help you compare previous versions of your code with the current one at one of the most clever ways I’ve seen till now. Create Custom Rules – using CQL you can create custom rules and let NDepend warn you on each build if you break a rule Reporting – NDepend can automatically generate reports with detailed stats, graph representation, dependency matrixes and some additional advanced reporting features that will simply explain you everything related to your application’s code, architecture and what you’ve done. And that’s not all. As I’ve seen, there are many other features that NDepend supports. I will dig more in the upcoming days and will blog more about it. The team who built the NDepend have also created good documentation, which you can find on the NDepend website. On their website, you can also find some good videos that will help you get started quite fast. It’s easy to install and what is very important it is fully integrated with Visual Studio. To get you started, you can watch the following Getting Started Online Demo and Tutorial with explanations and screenshots. If you are interested to know more about how to use the features of this tool, either visit their website or wait for my next blogs where I will show some real examples of using the tool and how it helps make your code better. And the last thing for this blog, I would like to copy one sentence from the NDepend’s home page which says: ‘Hence the software design becomes concrete, code reviews are effective, large refactoring are easy and evolution is mastered.’ Website: www.ndepend.com Getting Started: http://www.ndepend.com/GettingStarted.aspx Features: http://www.ndepend.com/Features.aspx Download: http://www.ndepend.com/NDependDownload.aspx Hope you like it! Please do let me know your feedback by providing comments to my blog post. Kind Regards, Hajan

    Read the article

  • Form, function and complexity in rule processing

    - by Charles Young
    Tim Bass posted on ‘Orwellian Event Processing’. I was involved in a heated exchange in the comments, and he has more recently published a post entitled ‘Disadvantages of Rule-Based Systems (Part 1)’. Whatever the rights and wrongs of our exchange, it clearly failed to generate any agreement or understanding of our different positions. I don't particularly want to promote further argument of that kind, but I do want to take the opportunity of offering a different perspective on rule-processing and an explanation of my comments. For me, the ‘red rag’ lay in Tim’s claim that “...rules alone are highly inefficient for most classes of (not simple) problems” and a later paragraph that appears to equate the simplicity of form (‘IF-THEN-ELSE’) with simplicity of function.   It is not the first time Tim has expressed these views and not the first time I have responded to his assertions.   Indeed, Tim has a long history of commenting on the subject of complex event processing (CEP) and, less often, rule processing in ‘robust’ terms, often asserting that very many other people’s opinions on this subject are mistaken.   In turn, I am of the opinion that, certainly in terms of rule processing, which is an area in which I have a specific interest and knowledge, he is often mistaken. There is no simple answer to the fundamental question ‘what is a rule?’ We use the word in a very fluid fashion in English. Likewise, the term ‘rule processing’, as used widely in IT, is equally difficult to define simplistically. The best way to envisage the term is as a ‘centre of gravity’ within a wider domain. That domain contains many other ‘centres of gravity’, including CEP, statistical analytics, neural networks, natural language processing and so much more. Whole communities tend to gravitate towards and build themselves around some of these centres. The term 'rule processing' is associated with many different technology types, various software products, different architectural patterns, the functional capability of many applications and services, etc. There is considerable variation amongst these different technologies, techniques and products. Very broadly, a common theme is their ability to manage certain types of processing and problem solving through declarative, or semi-declarative, statements of propositional logic bound to action-based consequences. It is generally important to be able to decouple these statements from other parts of an overall system or architecture so that they can be managed and deployed independently.  As a centre of gravity, ‘rule processing’ is no island. It exists in the context of a domain of discourse that is, itself, highly interconnected and continuous.   Rule processing does not, for example, exist in splendid isolation to natural language processing.   On the contrary, an on-going theme of rule processing is to find better ways to express rules in natural language and map these to executable forms.   Rule processing does not exist in splendid isolation to CEP.   On the contrary, an event processing agent can reasonably be considered as a rule engine (a theme in ‘Power of Events’ by David Luckham).   Rule processing does not live in splendid isolation to statistical approaches such as Bayesian analytics. On the contrary, rule processing and statistical analytics are highly synergistic.   Rule processing does not even live in splendid isolation to neural networks. For example, significant research has centred on finding ways to translate trained nets into explicit rule sets in order to support forms of validation and facilitate insight into the knowledge stored in those nets. What about simplicity of form?   Many rule processing technologies do indeed use a very simple form (‘If...Then’, ‘When...Do’, etc.)   However, it is a fundamental mistake to equate simplicity of form with simplicity of function.   It is absolutely mistaken to suggest that simplicity of form is a barrier to the efficient handling of complexity.   There are countless real-world examples which serve to disprove that notion.   Indeed, simplicity of form is often the key to handling complexity. Does rule processing offer a ‘one size fits all’. No, of course not.   No serious commentator suggests it does.   Does the design and management of large knowledge bases, expressed as rules, become difficult?   Yes, it can do, but that is true of any large knowledge base, regardless of the form in which knowledge is expressed.   The measure of complexity is not a function of rule set size or rule form.  It tends to be correlated more strongly with the size of the ‘problem space’ (‘search space’) which is something quite different.   Analysis of the problem space and the algorithms we use to search through that space are, of course, the very things we use to derive objective measures of the complexity of a given problem. This is basic computer science and common practice. Sailing a Dreadnaught through the sea of information technology and lobbing shells at some of the islands we encounter along the way does no one any good.   Building bridges and causeways between islands so that the inhabitants can collaborate in open discourse offers hope of real progress.

    Read the article

  • WebLogic Server Performance and Tuning: Part I - Tuning JVM

    - by Gokhan Gungor
    Each WebLogic Server instance runs in its own dedicated Java Virtual Machine (JVM) which is their runtime environment. Every Admin Server in any domain executes within a JVM. The same also applies for Managed Servers. WebLogic Server can be used for a wide variety of applications and services which uses the same runtime environment and resources. Oracle WebLogic ships with 2 different JVM, HotSpot and JRocket but you can choose which JVM you want to use. JVM is designed to optimize itself however it also provides some startup options to make small changes. There are default values for its memory and garbage collection. In real world, you will not want to stick with the default values provided by the JVM rather want to customize these values based on your applications which can produce large gains in performance by making small changes with the JVM parameters. We can tell the garbage collector how to delete garbage and we can also tell JVM how much space to allocate for each generation (of java Objects) or for heap. Remember during the garbage collection no other process is executed within the JVM or runtime, which is called STOP THE WORLD which can affect the overall throughput. Each JVM has its own memory segment called Heap Memory which is the storage for java Objects. These objects can be grouped based on their age like young generation (recently created objects) or old generation (surviving objects that have lived to some extent), etc. A java object is considered garbage when it can no longer be reached from anywhere in the running program. Each generation has its own memory segment within the heap. When this segment gets full, garbage collector deletes all the objects that are marked as garbage to create space. When the old generation space gets full, the JVM performs a major collection to remove the unused objects and reclaim their space. A major garbage collect takes a significant amount of time and can affect system performance. When we create a managed server either on the same machine or on remote machine it gets its initial startup parameters from $DOMAIN_HOME/bin/setDomainEnv.sh/cmd file. By default two parameters are set:     Xms: The initial heapsize     Xmx: The max heapsize Try to set equal initial and max heapsize. The startup time can be a little longer but for long running applications it will provide a better performance. When we set -Xms512m -Xmx1024m, the physical heap size will be 512m. This means that there are pages of memory (in the state of the 512m) that the JVM does not explicitly control. It will be controlled by OS which could be reserve for the other tasks. In this case, it is an advantage if the JVM claims the entire memory at once and try not to spend time to extend when more memory is needed. Also you can use -XX:MaxPermSize (Maximum size of the permanent generation) option for Sun JVM. You should adjust the size accordingly if your application dynamically load and unload a lot of classes in order to optimize the performance. You can set the JVM options/heap size from the following places:     Through the Admin console, in the Server start tab     In the startManagedWeblogic script for the managed servers     $DOMAIN_HOME/bin/startManagedWebLogic.sh/cmd     JAVA_OPTIONS="-Xms1024m -Xmx1024m" ${JAVA_OPTIONS}     In the setDomainEnv script for the managed servers and admin server (domain wide)     USER_MEM_ARGS="-Xms1024m -Xmx1024m" When there is free memory available in the heap but it is too fragmented and not contiguously located to store the object or when there is actually insufficient memory we can get java.lang.OutOfMemoryError. We should create Thread Dump and analyze if that is possible in case of such error. The second option we can use to produce higher throughput is to garbage collection. We can roughly divide GC algorithms into 2 categories: parallel and concurrent. Parallel GC stops the execution of all the application and performs the full GC, this generally provides better throughput but also high latency using all the CPU resources during GC. Concurrent GC on the other hand, produces low latency but also low throughput since it performs GC while application executes. The JRockit JVM provides some useful command-line parameters that to control of its GC scheme like -XgcPrio command-line parameter which takes the following options; XgcPrio:pausetime (To minimize latency, parallel GC) XgcPrio:throughput (To minimize throughput, concurrent GC ) XgcPrio:deterministic (To guarantee maximum pause time, for real time systems) Sun JVM has similar parameters (like  -XX:UseParallelGC or -XX:+UseConcMarkSweepGC) to control its GC scheme. We can add -verbosegc -XX:+PrintGCDetails to monitor indications of a problem with garbage collection. Try configuring JVM’s of all managed servers to execute in -server mode to ensure that it is optimized for a server-side production environment.

    Read the article

  • Big Data: Size isn’t everything

    - by Simon Elliston Ball
    Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

    Read the article

  • The New Social Developer Community: a Q&A

    - by Mike Stiles
    In our last blog, we introduced the opportunities that lie ahead for social developers as social applications reach across every aspect and function of the enterprise. Leading the upcoming JavaOne Social Developer Program October 2 at the San Francisco Hilton is Roland Smart, VP of Social Marketing at Oracle. I got to ask Roland a few of the questions an existing or budding social developer might want to know as social extends beyond interacting with friends and marketing and into the enterprise. Why is it smart for developers to specialize as social developers? What opportunities lie in the immediate future that’s making this a critical, in-demand position? Social has changed the way we interact with brands and with each other across the web. As we acclimate to a new social paradigm we also look to extend its benefits into new areas of our lives. The workplace is a logical next step, and we're starting to see social interactions more and more in this context. But unlocking the value of social interactions requires technical expertise and knowledge of developing social apps that tap into the social graph. Developers focused on integrating social experiences into enterprise applications must be familiar with popular social APIs and must understand how to build enterprise social graphs of their own. These developers are part of an emerging community of social developers and are key to socially enabling the enterprise. Facebook rebranded their Preferred Developer Consultant Group (PDC) and the Preferred Marketing Developers (PMD) to underscore the fact developers are required inside marketing organizations to unlock the full potential of their platform. While this trend is starting on the marketing side with marketing developers, this is just an extension of the social developer concept that will ultimately drive social across the enterprise. What are some of the various ways social will be making its way into every area of enterprise organizations? How will it be utilized and what kinds of applications are going to be needed to facilitate and maximize these changes? Check out Oracle’s vision for the social-enabled enterprise. It’s a high-level overview of how social will impact across the enterprise. For example: HR can leverage social in recruiting and retentionSales can leverage social as a prospecting toolMarketing can use social to gain market insightCustomer support can use social to leverage community support to improve customer satisfaction while reducing service costOperations can leverage social improve systems That’s only the beginning. Once sleeves get rolled up and social developers and innovators get to work, still more social functions will no doubt emerge. What makes Java one of, if not the most viable platform on which to build these new enterprise social applications? Java is certainly one of the best platforms on which to build social experiences because there’s such a large existing community of Java developers. This means you can affordably recruit talent, and it's possible to effectively solicit advice from the community through various means, including our new Social Developer Community. Beyond that, there are already some great proof points Java is the best platform for creating social experiences at scale. Consider LinkedIn and Twitter. Tell us more about the benefits of collaboration and more about what the Oracle Social Developer Community is. What opportunities does that offer up and what are some of the ways developers can actively participate in and benefit from that community? Much has been written about the overall benefits of collaborating with other developers. Those include an opportunity to introduce yourself to the community of social developers, foster a reputation, establish an expertise, contribute to the advancement of the space, get feedback, experiment with the latest concepts, and gain inspiration. In short, collaboration is a tool that must be applied properly within a framework to get the most value out of it. The OSDC is a place where social developers can congregate to discuss the opportunities/challenges of building social integrations into their applications. What “needs” will this community have? We don't know yet. But we wanted to create a forum where we can engage and understand what social developers are thinking about, excited about, struggling with, etc. The OSDL can then step in if we can help remove barriers and add value in a serious and committed way so Oracle can help drive practice development.

    Read the article

  • Why lock-free data structures just aren't lock-free enough

    - by Alex.Davies
    Today's post will explore why the current ways to communicate between threads don't scale, and show you a possible way to build scalable parallel programming on top of shared memory. The problem with shared memory Soon, we will have dozens, hundreds and then millions of cores in our computers. It's inevitable, because individual cores just can't get much faster. At some point, that's going to mean that we have to rethink our architecture entirely, as millions of cores can't all access a shared memory space efficiently. But millions of cores are still a long way off, and in the meantime we'll see machines with dozens of cores, struggling with shared memory. Alex's tip: The best way for an application to make use of that increasing parallel power is to use a concurrency model like actors, that deals with synchronisation issues for you. Then, the maintainer of the actors framework can find the most efficient way to coordinate access to shared memory to allow your actors to pass messages to each other efficiently. At the moment, NAct uses the .NET thread pool and a few locks to marshal messages. It works well on dual and quad core machines, but it won't scale to more cores. Every time we use a lock, our core performs an atomic memory operation (eg. CAS) on a cell of memory representing the lock, so it's sure that no other core can possibly have that lock. This is very fast when the lock isn't contended, but we need to notify all the other cores, in case they held the cell of memory in a cache. As the number of cores increases, the total cost of a lock increases linearly. A lot of work has been done on "lock-free" data structures, which avoid locks by using atomic memory operations directly. These give fairly dramatic performance improvements, particularly on systems with a few (2 to 4) cores. The .NET 4 concurrent collections in System.Collections.Concurrent are mostly lock-free. However, lock-free data structures still don't scale indefinitely, because any use of an atomic memory operation still involves every core in the system. A sync-free data structure Some concurrent data structures are possible to write in a completely synchronization-free way, without using any atomic memory operations. One useful example is a single producer, single consumer (SPSC) queue. It's easy to write a sync-free fixed size SPSC queue using a circular buffer*. Slightly trickier is a queue that grows as needed. You can use a linked list to represent the queue, but if you leave the nodes to be garbage collected once you're done with them, the GC will need to involve all the cores in collecting the finished nodes. Instead, I've implemented a proof of concept inspired by this intel article which reuses the nodes by putting them in a second queue to send back to the producer. * In all these cases, you need to use memory barriers correctly, but these are local to a core, so don't have the same scalability problems as atomic memory operations. Performance tests I tried benchmarking my SPSC queue against the .NET ConcurrentQueue, and against a standard Queue protected by locks. In some ways, this isn't a fair comparison, because both of these support multiple producers and multiple consumers, but I'll come to that later. I started on my dual-core laptop, running a simple test that had one thread producing 64 bit integers, and another consuming them, to measure the pure overhead of the queue. So, nothing very interesting here. Both concurrent collections perform better than the lock-based one as expected, but there's not a lot to choose between the ConcurrentQueue and my SPSC queue. I was a little disappointed, but then, the .NET Framework team spent a lot longer optimising it than I did. So I dug out a more powerful machine that Red Gate's DBA tools team had been using for testing. It is a 6 core Intel i7 machine with hyperthreading, adding up to 12 logical cores. Now the results get more interesting. As I increased the number of producer-consumer pairs to 6 (to saturate all 12 logical cores), the locking approach was slow, and got even slower, as you'd expect. What I didn't expect to be so clear was the drop-off in performance of the lock-free ConcurrentQueue. I could see the machine only using about 20% of available CPU cycles when it should have been saturated. My interpretation is that as all the cores used atomic memory operations to safely access the queue, they ended up spending most of the time notifying each other about cache lines that need invalidating. The sync-free approach scaled perfectly, despite still working via shared memory, which after all, should still be a bottleneck. I can't quite believe that the results are so clear, so if you can think of any other effects that might cause them, please comment! Obviously, this benchmark isn't realistic because we're only measuring the overhead of the queue. Any real workload, even on a machine with 12 cores, would dwarf the overhead, and there'd be no point worrying about this effect. But would that be true on a machine with 100 cores? Still to be solved. The trouble is, you can't build many concurrent algorithms using only an SPSC queue to communicate. In particular, I can't see a way to build something as general purpose as actors on top of just SPSC queues. Fundamentally, an actor needs to be able to receive messages from multiple other actors, which seems to need an MPSC queue. I've been thinking about ways to build a sync-free MPSC queue out of multiple SPSC queues and some kind of sign-up mechanism. Hopefully I'll have something to tell you about soon, but leave a comment if you have any ideas.

    Read the article

  • Towards an F# .NET Reflector add-in

    - by CliveT
    When I had the opportunity to spent some time during Red Gate's recent "down tools" week on a project of my choice, the obvious project was an F# add-in for Reflector . To be honest, this was a bit of a misnomer as the amount of time in the designated week for coding was really less than three days, so it was always unlikely that very much progress would be made in such a small amount of time (and that certainly proved to be the case), but I did learn some things from the experiment. Like lots of problems, one useful technique is to take examples, get them to work, and then generalise to get something that works across the board. Unfortunately, I didn't have enough time to do the last stage. The obvious first step is to take a few function definitions, starting with the obvious hello world, moving on to a non-recursive function and finishing with the ubiquitous recursive Fibonacci function. let rec printMessage message  =     printfn  message let foo x  =    (x + 1) let rec fib x  =     if (x >= 2) then (fib (x - 1) + fib (x - 2)) else 1 The major problem in decompiling these simple functions is that Reflector has an in-memory object model that is designed to support object-oriented languages. In particular it has a return statement that allows function bodies to finish early. I used some of the in-built functionality to take the IL and produce an in-memory object model for the language, but then needed to write a transformer to push the return statements to the top of the tree to make it easy to render the code into a functional language. This tree transform works in some scenarios, but not in others where we simply regenerate code that looks more like CPS style. The next thing to get working was library level bindings of values where these values are calculated at runtime. let x = [1 ; 2 ; 3 ; 4] let y = List.map  (fun x -> foo x) x The way that this is translated into a set of classes for the underlying platform means that the code needs to follow references around, from the property exposing the calculated value to the class in which the code for generating the value is embedded. One of the strongest selling points of functional languages is the algebraic datatypes, which allow definitions via standard mathematical-style inductive definitions across the union cases. type Foo =     | Something of int     | Nothing type 'a Foo2 =     | Something2 of 'a     | Nothing2 Such a definition is compiled into a number of classes for the cases of the union, which all inherit from a class representing the type itself. It wasn't too hard to get such a de-compilation happening in the cases I tried. What did I learn from this? Firstly, that there are various bits of functionality inside Reflector that it would be useful for us to allow add-in writers to access. In particular, there are various implementations of the Visitor pattern which implement algorithms such as calculating the number of references for particular variables, and which perform various substitutions which could be more generally useful to add-in writers. I hope to do something about this at some point in the future. Secondly, when you transform a functional language into something that runs on top of an object-based platform, you lose some fidelity in the representation. The F# compiler leaves attributes in place so that tools can tell which classes represent classes from the source program and which are there for purposes of the implementation, allowing the decompiler to regenerate these constructs again. However, decompilation technology is a long way from being able to take unannotated IL and transform it into a program in a different language. For a simple function definition, like Fibonacci, I could write a simple static function and have it come out in F# as the same function, but it would be practically impossible to take a mass of class definitions and have a decompiler translate it automatically into an F# algebraic data type. What have we got out of this? Some data on the feasibility of implementing an F# decompiler inside Reflector, though it's hard at the moment to say how long this would take to do. The work we did is included the 6.5 EAP for Reflector that you can get from the EAP forum. All things considered though, it was a useful way to gain more familiarity with the process of writing an add-in and understand difficulties other add-in authors might experience. If you'd like to check out a video of Down Tools Week, click here.

    Read the article

  • Oracle Products Reflect Key Trends Shaping Enterprise 2.0

    - by kellsey.ruppel(at)oracle.com
    Following up on his predictions for 2011, we asked Enterprise 2.0 veteran Andy MacMillan to map out the ways Oracle solutions are at the forefront of industry trends--and how Oracle customers can benefit in the coming year. 1. Increase organizational awareness | Oracle WebCenter Suite Oracle WebCenter Suite provides a unique set of capabilities to drive organizational awareness. In particular, the expansive activity graph connects users directly to key enterprise applications, activities, and interests. In this way, applicable and critical business information is automatically and immediately visible--in the context of key tasks--via real-time dashboards and comprehensive reporting. Oracle WebCenter Suite also integrates key E2.0 services, such as blogs, wikis, and RSS feeds, into critical business processes, including back-office systems of records such as ERP and CRM systems. 2. Drive online customer engagement | Oracle Real-Time Decisions With more and more business being conducted on the Web, driving increased online customer engagement becomes a critical key to success. This effort is usually spearheaded by an increasingly important executive role, the Head of Online, who usually reports directly to the CMO. To help manage the Web experience online, Oracle solutions are driving a new kind of intelligent social commerce by combining Oracle Universal Content Management, Oracle WebCenter Services, and Oracle Real-Time Decisions with leading e-commerce and product recommendations. Oracle Real-Time Decisions provides multichannel recommendations for content, products, and services--including seamless integration across Web, mobile, and social channels. The result: happier customers, increased customer acquisition and retention, and improved critical success metrics such as shopping cart abandonment. 3. Easily build composite applications | Oracle Application Development Framework Thanks to the shared user experience strategy across Oracle Fusion Middleware, Oracle Fusion Applications and many other Oracle Applications, customers can easily create real, customer-specific composite applications using Oracle WebCenter Suite and Oracle Application Development Framework. Oracle Application Development Framework components provide modular user interface components that can build rich, social composite applications. In addition, a broad set of components spanning BPM, SOA, ECM, and beyond can be quickly and easily incorporated into composite applications. 4. Integrate records management into a global content platform | Oracle Enterprise Content Management 11g Oracle Enterprise Content Management 11g provides leading records management capabilities as part of a unified ECM platform for managing records, documents, Web content, digital assets, enterprise imaging, and application imaging. This unique strategy provides comprehensive records management in a consistent, cost-effective way, and enables organizations to consolidate ECM repositories and connect ECM to critical business applications. 5. Achieve ECM at extreme scale | Oracle WebLogic Server and Oracle Exadata To support the high-performance demands of a unified and rationalized content platform, Oracle has pioneered highly scalable and high-performing ECM infrastructures. Two innovations in particular helped make this happen. The core ECM platform itself moved to an Enterprise Java architecture, so organizations can now use Oracle WebLogic Server for enhanced scalability and manageability. Oracle Enterprise Content Management 11g can leverage Oracle Exadata for extreme performance and scale. Likewise, Oracle Exalogic--Oracle's foundation for cloud computing--enables extreme performance for processor-intensive capabilities such as content conversion or dynamic Web page delivery. Learn more about Oracle's Enterprise 2.0 solutions.

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

  • Getting started with Oracle Database In-Memory Part III - Querying The IM Column Store

    - by Maria Colgan
    In my previous blog posts, I described how to install, enable, and populate the In-Memory column store (IM column store). This weeks post focuses on how data is accessed within the IM column store. Let’s take a simple query “What is the most expensive air-mail order we have received to date?” SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE  lo_shipmode = 5; The LINEORDER table has been populated into the IM column store and since we have no alternative access paths (indexes or views) the execution plan for this query is a full table scan of the LINEORDER table. You will notice that the execution plan has a new set of keywords “IN MEMORY" in the access method description in the Operation column. These keywords indicate that the LINEORDER table has been marked for INMEMORY and we may use the IM column store in this query. What do I mean by “may use”? There are a small number of cases were we won’t use the IM column store even though the object has been marked INMEMORY. This is similar to how the keyword STORAGE is used on Exadata environments. You can confirm that the IM column store was actually used by examining the session level statistics, but more on that later. For now let's focus on how the data is accessed in the IM column store and why it’s faster to access the data in the new column format, for analytical queries, rather than the buffer cache. There are four main reasons why accessing the data in the IM column store is more efficient. 1. Access only the column data needed The IM column store only has to scan two columns – lo_shipmode and lo_ordtotalprice – to execute this query while the traditional row store or buffer cache has to scan all of the columns in each row of the LINEORDER table until it reaches both the lo_shipmode and the lo_ordtotalprice column. 2. Scan and filter data in it's compressed format When data is populated into the IM column it is automatically compressed using a new set of compression algorithms that allow WHERE clause predicates to be applied against the compressed formats. This means the volume of data scanned in the IM column store for our query will be far less than the same query in the buffer cache where it will scan the data in its uncompressed form, which could be 20X larger. 3. Prune out any unnecessary data within each column The fastest read you can execute is the read you don’t do. In the IM column store a further reduction in the amount of data accessed is possible due to the In-Memory Storage Indexes(IM storage indexes) that are automatically created and maintained on each of the columns in the IM column store. IM storage indexes allow data pruning to occur based on the filter predicates supplied in a SQL statement. An IM storage index keeps track of minimum and maximum values for each column in each of the In-Memory Compression Unit (IMCU). In our query the WHERE clause predicate is on the lo_shipmode column. The IM storage index on the lo_shipdate column is examined to determine if our specified column value 5 exist in any IMCU by comparing the value 5 to the minimum and maximum values maintained in the Storage Index. If the value 5 is outside the minimum and maximum range for an IMCU, the scan of that IMCU is avoided. For the IMCUs where the value 5 does fall within the min, max range, an additional level of data pruning is possible via the metadata dictionary created when dictionary-based compression is used on IMCU. The dictionary contains a list of the unique column values within the IMCU. Since we have an equality predicate we can easily determine if 5 is one of the distinct column values or not. The combination of the IM storage index and dictionary based pruning, enables us to only scan the necessary IMCUs. 4. Use SIMD to apply filter predicates For the IMCU that need to be scanned Oracle takes advantage of SIMD vector processing (Single Instruction processing Multiple Data values). Instead of evaluating each entry in the column one at a time, SIMD vector processing allows a set of column values to be evaluated together in a single CPU instruction. The column format used in the IM column store has been specifically designed to maximize the number of column entries that can be loaded into the vector registers on the CPU and evaluated in a single CPU instruction. SIMD vector processing enables the Oracle Database In-Memory to scan billion of rows per second per core versus the millions of rows per second per core scan rate that can be achieved in the buffer cache. I mentioned earlier in this post that in order to confirm the IM column store was used; we need to examine the session level statistics. You can monitor the session level statistics by querying the performance views v$mystat and v$statname. All of the statistics related to the In-Memory Column Store begin with IM. You can see the full list of these statistics by typing: display_name format a30 SELECT display_name FROM v$statname WHERE  display_name LIKE 'IM%'; If we check the session statistics after we execute our query the results would be as follow; SELECT Max(lo_ordtotalprice) most_expensive_order FROM lineorderWHERE lo_shipmode = 5; SELECT display_name FROM v$statname WHERE  display_name IN ('IM scan CUs columns accessed',                        'IM scan segments minmax eligible',                        'IM scan CUs pruned'); As you can see, only 2 IMCUs were accessed during the scan as the majority of the IMCUs (44) in the LINEORDER table were pruned out thanks to the storage index on the lo_shipmode column. In next weeks post I will describe how you can control which queries use the IM column store and which don't. +Maria Colgan

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Key-Value Pair Databases and Document Databases – Day 13 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Relational Database and NoSQL database in the Big Data Story. In this article we will understand the role of Key-Value Pair Databases and Document Databases Supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (Yesterday’s post) NoSQL Databases (Yesterday’s post) Key-Value Pair Databases (This post) Document Databases (This post) Columnar Databases (Tomorrow’s post) Graph Databases (Tomorrow’s post) Spatial Databases (Tomorrow’s post) Key Value Pair Databases Key Value Pair Databases are also known as KVP databases. A key is a field name and attribute, an identifier. The content of that field is its value, the data that is being identified and stored. They have a very simple implementation of NoSQL database concepts. They do not have schema hence they are very flexible as well as scalable. The disadvantages of Key Value Pair (KVP) database are that they do not follow ACID (Atomicity, Consistency, Isolation, Durability) properties. Additionally, it will require data architects to plan for data placement, replication as well as high availability. In KVP databases the data is stored as strings. Here is a simple example of how Key Value Database will look like: Key Value Name Pinal Dave Color Blue Twitter @pinaldave Name Nupur Dave Movie The Hero As the number of users grow in Key Value Pair databases it starts getting difficult to manage the entire database. As there is no specific schema or rules associated with the database, there are chances that database grows exponentially as well. It is very crucial to select the right Key Value Pair Database which offers an additional set of tools to manage the data and provides finer control over various business aspects of the same. Riak Rick is one of the most popular Key Value Database. It is known for its scalability and performance in high volume and velocity database. Additionally, it implements a mechanism for collection key and values which further helps to build manageable system. We will further discuss Riak in future blog posts. Key Value Databases are a good choice for social media, communities, caching layers for connecting other databases. In simpler words, whenever we required flexibility of the data storage keeping scalability in mind – KVP databases are good options to consider. Document Database There are two different kinds of document databases. 1) Full document Content (web pages, word docs etc) and 2) Storing Document Components for storage. The second types of the document database we are talking about over here. They use Javascript Object Notation (JSON) and Binary JSON for the structure of the documents. JSON is very easy to understand language and it is very easy to write for applications. There are two major structures of JSON used for Document Database – 1) Name Value Pairs and 2) Ordered List. MongoDB and CouchDB are two of the most popular Open Source NonRelational Document Database. MongoDB MongoDB databases are called collections. Each collection is build of documents and each document is composed of fields. MongoDB collections can be indexed for optimal performance. MongoDB ecosystem is highly available, supports query services as well as MapReduce. It is often used in high volume content management system. CouchDB CouchDB databases are composed of documents which consists fields and attachments (known as description). It supports ACID properties. The main attraction points of CouchDB are that it will continue to operate even though network connectivity is sketchy. Due to this nature CouchDB prefers local data storage. Document Database is a good choice of the database when users have to generate dynamic reports from elements which are changing very frequently. A good example of document usages is in real time analytics in social networking or content management system. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Countdown of Top 10 Reasons to Never Ever Use a Pie Chart

    - by Tony Wolfram
      Pie charts are evil. They represent much of what is wrong with the poor design of many websites and software applications. They're also innefective, misleading, and innacurate. Using a pie chart as your graph of choice to visually display important statistics and information demonstrates either a lack of knowledge, laziness, or poor design skills. Figure 1: A floating, tilted, 3D pie chart with shadow trying (poorly)to show usage statistics within a graphics application.   Of course, pie charts in and of themselves are not evil. This blog is really about designers making poor decisions for all the wrong reasons. In order for a pie chart to appear on a web page, somebody chose it over the other alternatives, and probably thought they were doing the right thing. They weren't. Using a pie chart is almost always a bad design decision. Figure 2: Pie Chart from an Oracle Reports User Guide   A pie chart does not do the job of effectively displaying information in an elegant visual form.  Being circular, they use up too much space while not allowing their labels to line up. Bar charts, line charts, and tables do a much better job. Expert designers, statisticians, and business analysts have documented their many failings, and strongly urge software and report designers not to use them. It's obvious to them that the pie chart has too many inherent defects to ever be used effectively. Figure 3: Demonstration of how comparing data between multiple pie charts is difficult.   Yet pie charts are still used frequently in today's software applications, financial reports, and websites, often on the opening page as a symbol of how the data inside is represented. In an attempt to get a flashy colorful graphic to break up boring text, designers will often settle for a pie chart that looks like pac man, a colored spinning wheel, or a 3D floating alien space ship.     Figure 4: Best use of a pie chart I've found yet.   Why is the pie chart so popular? Through its constant use and iconic representation as the classic chart, the idea persists that it must be a good choice, since everyone else is still using it. Like a virus or an urban legend, no amount of vaccine or debunking will slow down the use of pie charts, which seem to be resistant to logic and common sense. Even the new iPad from Apple showcases the pie chart as one of its options.     Figure 5: Screen shot of new iPad showcasing pie charts. Regardless of the futility in trying to rid the planet of this often used poor design choice, I now present to you my top 10 reasons why you should never, ever user a pie chart again.    Number 10 - Pie Charts Just Don't Work When Comparing Data Number 9 - You Have A Better Option: The Sorted Horizontal Bar Chart Number 8 - The Pie Chart is Always Round Number 7 - Some Genius Will Make It 3D Number 6 - Legends and Labels are Hard to Align and Read Number 5 - Nobody Has Ever Made a Critical Decision Using a Pie Chart Number 4 - It Doesn't Scale Well to More Than 2 Items Number 3 - A Pie Chart Causes Distortions and Errors Number 2 - Everyone Else Uses Them: Debunking the "Urban Legend" of Pie Charts Number 1 - Pie Charts Make You Look Stupid and Lazy  

    Read the article

  • Content Challenge: You Can Only Get it Here

    - by Mike Stiles
    Part of the content conundrum for brands is figuring out what kind of content customers would find cool, desirable, and relevant. The mere fact many brands have no idea what this content might be is, in itself, pretty alarming. You’d have to have a pretty thorough lack of involvement with and understanding of your customers to not know what they might like. But despite what should be a great awakening in which consumers are using every technology and trick in the book to shield themselves from ads and commercials, brand self-obsession continues as marketers concentrate on their message, their campaign, what they want to say, and what they want social users to do. When individuals conduct themselves in that same fashion on Facebook and Twitter, it gets tiresome and starts losing value pretty quickly. Their posts eventually get hidden. Conversely, friends who post things that consistently entertain or inform, with little self-marketing desperation involved, win the coveted “show all updates” setting. Of course brands are going to use social to market. It’s pretty much the point of having social in the marketing mix. And yes, people who follow a brand’s Twitter account or “Like” a brand’s Facebook Page implicitly state they want to know what’s going on with that brand’s products and services. But if you have a Facebook friend that assumes you want every one of her posts to be about what wine she likes (Mitsubishi’s current campaign is even based around weeding out pretentious Facebook friends, then running them over), then you know how it must feel for your fans and followers to get a sales pitch for your crackers or whatever you’re selling every single time. Is there such a thing as content that doesn’t sell but that still advances the brand and makes the consumer more involved and valuable? Of course. And perhaps there are no better companies than enterprise brands to do it. Enterprise organizations are large enough to go beyond a product and engage readers/viewers at higher, broader levels…communicating expertise across entire sectors, subjects and industries. You’re going from pitchman to news source, and getting full credit for it as the presenter. A recent GigaOM article pointed out the success a San Francisco-based startup called Crunchyroll is having. Their niche (and they proudly admit it’s a niche) is providing Japanese anime, Korean drama and Asian live action content to countries that can’t get it any other way via licensing deals. Shows are available in HD and on the same day they air in the host country. Crunchyroll not only gets 8 million viewers a month, they have 100,000 paying subscribers at $7-12/month. Got a point, Mike? I do happen to have one. Crunchyroll illustrates the content opportunity enterprise companies have…which is to determine your “area,” the interest graph of your customers, then provide content that speaks to and satisfies those interests that can’t be found anywhere else. At least not in the same style, or of the same quality, or with the same authority. Do what no one else is doing. Provide what no one else is providing in your sector. If underserved users are willing to pay monthly for access to awkwardly moving cartoon dragons, imagine the audience you could attract with free, useful, non-sales content in your customers’ area of interest. It’s an audience you’ll want in place when the time does come to put out that marketing message. A content challenge is better than a content conundrum any day.

    Read the article

  • 45 minutes to talk about C# [closed]

    - by Philip
    I have the opportunity to give a 45 minute talk on C# in the theory of programming languages class I'm taking. The college teaches Java almost exclusively, so that's what all the students are most familiar with. (There's a little C, assembly, Prolog and LISP as well.) I decide what to talk about. It seems to me the best approach is to focus on a few of the big, obvious differences between C# and Java. I don't intend it to be a recommendation to use C# -- there are reasons to use each, mostly because of their ecosystems. So I want to focus on C# as a language. I don't want to go too fast and end up listing a whole bunch of features without showing their usefulness. My current plan is this: Functions as first class objects. This is, in my opinion, one of the biggest differences between C# and Java. The professor briefly mentioned this notion and showed a LISP example, but many of the students have probably never used it. I can show real world examples where it's made my code more readable. Lambda expressions as concise syntax for anonymous functions. Obviously with examples to show how this is useful. The real hit-home examples will be at the end when it's combined with the rest. I don't see an advantage to first showing the old delegate syntax and then replacing it with lambdas -- most of us won't have ever seen delegates anyway so it would just be confusing. The yield keyword and how it's different from returning an array. I have the impression that a lot of C# developers aren't familiar with how to use this. It will likely be very foreign to Java developers. I have some examples from my own work where it was really useful, such as iterating over a tree traversal, or iterating over neighbors in a graph where the neighbors aren't stored in memory. In both cases, doing it in Java would likely mean returning a complete list -- with yield I can stop iterating if I find what I want early on, without using memory for superfluous lists or arrays. Extension methods as a way to write implementation on interfaces. We'll all be familiar with how interfaces don't allow method implementation, and how this leads to code duplication. I'll show a specific example of this and how the extension method can solve the problem. Demonstrate how the above can be combined by implementing some simple Linq methods and using them. Where, Select, First, maybe more depending on how much time is left. Ideas on which ones might 'hit home' the best? There are other things I could talk about such as generics, value types, properties and more. I haven't yet though of good ways to incorporate these. In the case of generics and value types, the advantages might not be obvious or as relevant. Properties are obviously useful, particularly since we're taught strict JavaBeans here, but I don't know if I could integrate it with the "path to Linq" discussion above without it feeling tacked on. So I'm looking for thoughts on how to talk about C#, and what to talk about. Even minor details. I'm sure there are more experienced C# developers than me here who have good insight about what's really important in the language, and what would miss the point.

    Read the article

< Previous Page | 143 144 145 146 147 148 149 150 151 152 153 154  | Next Page >