Search Results

Search found 1914 results on 77 pages for 'mongrel cluster'.

Page 73/77 | < Previous Page | 69 70 71 72 73 74 75 76 77  | Next Page >

  • SQL SERVER – Solution – Puzzle – SELECT * vs SELECT COUNT(*)

    - by pinaldave
    Earlier I have published Puzzle Why SELECT * throws an error but SELECT COUNT(*) does not. This question have received many interesting comments. Let us go over few of the answers, which are valid. Before I start the same, let me acknowledge Rob Farley who has not only answered correctly very first but also started interesting conversation in the same thread. The usual question will be what is the right answer. I would like to point to official Microsoft Connect Items which discusses the same. RGarvao https://connect.microsoft.com/SQLServer/feedback/details/671475/select-test-where-exists-select tiberiu utan http://connect.microsoft.com/SQLServer/feedback/details/338532/count-returns-a-value-1 Rob Farley count(*) is about counting rows, not a particular column. It doesn’t even look to see what columns are available, it’ll just count the rows, which in the case of a missing FROM clause, is 1. “select *” is designed to return columns, and therefore barfs if there are none available. Even more odd is this one: select ‘blah’ where exists (select *) You might be surprised at the results… Koushik The engine performs a “Constant scan” for Count(*) where as in the case of “SELECT *” the engine is trying to perform either Index/Cluster/Table scans. amikolaj When you query ‘select * from sometable’, SQL replaces * with the current schema of that table. With out a source for the schema, SQL throws an error. so when you query ‘select count(*)’, you are counting the one row. * is just a constant to SQL here. Check out the execution plan. Like the description states – ‘Scan an internal table of constants.’ You could do ‘select COUNT(‘my name is adam and this is my answer’)’ and get the same answer. Netra Acharya SELECT * Here, * represents all columns from a table. So it always looks for a table (As we know, there should be FROM clause before specifying table name). So, it throws an error whenever this condition is not satisfied. SELECT COUNT(*) Here, COUNT is a Function. So it is not mandetory to provide a table. Check it out this: DECLARE @cnt INT SET @cnt = COUNT(*) SELECT @cnt SET @cnt = COUNT(‘x’) SELECT @cnt Naveen Select 1 / Select ‘*’ will return 1/* as expected. Select Count(1)/Count(*) will return the count of result set of select statement. Count(1)/Count(*) will have one 1/* for each row in the result set of select statement. Select 1 or Select ‘*’ result set will contain only 1 result. so count is 1. Where as “Select *” is a sysntax which expects the table or equauivalent to table (table functions, etc..). It is like compilation error for that query. Ramesh Hi Friends, Count is an aggregate function and it expects the rows (list of records) for a specified single column or whole rows for *. So, when we use ‘select *’ it definitely give and error because ‘*’ is meant to have all the fields but there is not any table and without table it can only raise an error. So, in the case of ‘Select Count(*)’, there will be an error as a record in the count function so you will get the result as ’1'. Try using : Select COUNT(‘RAMESH’) and think there is an error ‘Must specify table to select from.’ in place of ‘RAMESH’ Pinal : If i am wrong then please clarify this. Sachin Nandanwar Any aggregate function expects a constant or a column name as an expression. DO NOT be confused with * in an aggregate function.The aggregate function does not treat it as a column name or a set of column names but a constant value, as * is a key word in SQL. You can replace any value instead of * for the COUNT function.Ex Select COUNT(5) will result as 1. The error resulting from select * is obvious it expects an object where it can extract the result set. I sincerely thank you all for wonderful conversation, I personally enjoyed it and I am sure all of you have the same feeling. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

    Read the article

  • Auto DOP and Concurrency

    - by jean-pierre.dijcks
    After spending some time in the cloud, I figured it is time to come down to earth and start discussing some of the new Auto DOP features some more. As Database Machines (the v2 machine runs Oracle Database 11.2) are effectively selling like hotcakes, it makes some sense to talk about the new parallel features in more detail. For basic understanding make sure you have read the initial post. The focus there is on Auto DOP and queuing, which is to some extend the focus here. But now I want to discuss the concurrency a little and explain some of the relevant parameters and their impact, specifically in a situation with concurrency on the system. The goal of Auto DOP The idea behind calculating the Automatic Degree of Parallelism is to find the highest possible DOP (ideal DOP) that still scales. In other words, if we were to increase the DOP even more  above a certain DOP we would see a tailing off of the performance curve and the resource cost / performance would become less optimal. Therefore the ideal DOP is the best resource/performance point for that statement. The goal of Queuing On a normal production system we should see statements running concurrently. On a Database Machine we typically see high concurrency rates, so we need to find a way to deal with both high DOP’s and high concurrency. Queuing is intended to make sure we Don’t throttle down a DOP because other statements are running on the system Stay within the physical limits of a system’s processing power Instead of making statements go at a lower DOP we queue them to make sure they will get all the resources they want to run efficiently without trashing the system. The theory – and hopefully – practice is that by giving a statement the optimal DOP the sum of all statements runs faster with queuing than without queuing. Increasing the Number of Potential Parallel Statements To determine how many statements we will consider running in parallel a single parameter should be looked at. That parameter is called PARALLEL_MIN_TIME_THRESHOLD. The default value is set to 10 seconds. So far there is nothing new here…, but do realize that anything serial (e.g. that stays under the threshold) goes straight into processing as is not considered in the rest of this post. Now, if you have a system where you have two groups of queries, serial short running and potentially parallel long running ones, you may want to worry only about the long running ones with this parallel statement threshold. As an example, lets assume the short running stuff runs on average between 1 and 15 seconds in serial (and the business is quite happy with that). The long running stuff is in the realm of 1 – 5 minutes. It might be a good choice to set the threshold to somewhere north of 30 seconds. That way the short running queries all run serial as they do today (if it ain’t broken, don’t fix it) and allows the long running ones to be evaluated for (higher degrees of) parallelism. This makes sense because the longer running ones are (at least in theory) more interesting to unleash a parallel processing model on and the benefits of running these in parallel are much more significant (again, that is mostly the case). Setting a Maximum DOP for a Statement Now that you know how to control how many of your statements are considered to run in parallel, lets talk about the specific degree of any given statement that will be evaluated. As the initial post describes this is controlled by PARALLEL_DEGREE_LIMIT. This parameter controls the degree on the entire cluster and by default it is CPU (meaning it equals Default DOP). For the sake of an example, let’s say our Default DOP is 32. Looking at our 5 minute queries from the previous paragraph, the limit to 32 means that none of the statements that are evaluated for Auto DOP ever runs at more than DOP of 32. Concurrently Running a High DOP A basic assumption about running high DOP statements at high concurrency is that you at some point in time (and this is true on any parallel processing platform!) will run into a resource limitation. And yes, you can then buy more hardware (e.g. expand the Database Machine in Oracle’s case), but that is not the point of this post… The goal is to find a balance between the highest possible DOP for each statement and the number of statements running concurrently, but with an emphasis on running each statement at that highest efficiency DOP. The PARALLEL_SERVER_TARGET parameter is the all important concurrency slider here. Setting this parameter to a higher number means more statements get to run at their maximum parallel degree before queuing kicks in.  PARALLEL_SERVER_TARGET is set per instance (so needs to be set to the same value on all 8 nodes in a full rack Database Machine). Just as a side note, this parameter is set in processes, not in DOP, which equates to 4* Default DOP (2 processes for a DOP, default value is 2 * Default DOP, hence a default of 4 * Default DOP). Let’s say we have PARALLEL_SERVER_TARGET set to 128. With our limit set to 32 (the default) we are able to run 4 statements concurrently at the highest DOP possible on this system before we start queuing. If these 4 statements are running, any next statement will be queued. To run a system at high concurrency the PARALLEL_SERVER_TARGET should be raised from its default to be much closer (start with 60% or so) to PARALLEL_MAX_SERVERS. By using both PARALLEL_SERVER_TARGET and PARALLEL_DEGREE_LIMIT you can control easily how many statements run concurrently at good DOPs without excessive queuing. Because each workload is a little different, it makes sense to plan ahead and look at these parameters and set these based on your requirements.

    Read the article

  • Replication - between pools in the same system

    - by Steve Tunstall
    OK, I fully understand that's it's been a LONG time since I've blogged with any tips or tricks on the ZFSSA, and I'm way behind. Hey, I just wrote TWO BLOGS ON THE SAME DAY!!! Make sure you keep scrolling down to see the next one too, or you may have missed it. To celebrate, for the one or two of you out there who are still reading this, I got something for you. The first TWO people who make any comment below, with your real name and email so I can contact you, will get some cool Oracle SWAG that I have to give away. Don't get excited, it's not an iPad, but it pretty good stuff. Only the first two, so if you already see two below, then settle down. Now, let's talk about Replication and Migration.  I have talked before about Shadow Migration here: https://blogs.oracle.com/7000tips/entry/shadow_migrationShadow Migration lets one take a NFS or CIFS share in one pool on a system and migrate that data over to another pool in the same system. That's handy, but right now it's only for file systems like NFS and CIFS. It will not work for LUNs. LUN shadow migration is a roadmap item, however. So.... What if you have a ZFSSA cluster with multiple pools, and you have a LUN in one pool but later you decide it's best if it was in the other pool? No problem. Replication to the rescue. What's that? Replication is only for replicating data between two different systems? Who told you that? We've been able to replicate to the same system now for a few code updates back. These instructions below will also work just fine if you're setting up replication between two different systems. After replication is complete, you can easily break replication, change the new LUN into a primary LUN and then delete the source LUN. Bam. Step 1- setup a target system. In our case, the target system is ourself, but you still have to set it up like it's far away. Go to Configuration-->Services-->Remote Replication. Click the plus sign and setup the target, which is the ZFSSA you're on now. Step 2. Now you can go to the LUN you want to replicate. Take note which Pool and Project you're in. In my case, I have a LUN in Pool2 called LUNp2 that I wish to replicate to Pool1.  Step 3. In my case, I made a Project called "Luns" and it has LUNp2 inside of it. I am going to replicate the Project, which will automatically replicate all of the LUNs and/or Filesystems inside of it.  Now, you can also replicate from the Share level instead of the Project. That will only replicate the share, and not all the other shares of a project. If someone tells you that if you replicate a share, it always replicates all the other shares also in that Project, don't listen to them.Note below how I can choose not only the Target (which is myself), but I can also choose which Pool to replicate it to. So I choose Pool1.  Step 4. I did not choose a schedule or pick the "Continuous" button, which means my replication will be manual only. I can now push the Manual Replicate button on my Actions list and you will see it start. You will see both a barber pole animation and also an update in the status bar on the top of the screen that a replication event has begun. This also goes into the event log.  Step 5. The status bar will also log an event when it's done. Step 6. If you go back to Configuration-->Services-->Remote Replication, you will see your event. Step 7. Done. To see your new replica, go to the other Pool (Pool1 for me), and click the "Replica" area below the words "Filesystems | LUNs" Here, you will see any replicas that have come in from any of your sources. It's a simple matter from here to break the replication, which will change this to a "Local" LUN, and then delete the original LUN back in Pool2. Ok, that's all for now, but I promise to give out more tricks sometime in November !!! There's very exciting stuff coming down the pipe for the ZFSSA. Both new hardware and new software features that I'm just drooling over. That's all I can say, but contact your local sales SC to get a NDA roadmap talk if you want to hear more.   Happy Halloween,Steve 

    Read the article

  • Oracle Linux and Oracle VM pricing guide

    - by wcoekaer
    A few days ago someone showed me a pricing guide from a Linux vendor and I was a bit surprised at the complexity of it. Especially when you look at larger servers (4 or 8 sockets) and when adding virtual machine use into the mix. I think we have a very compelling and simple pricing model for both Oracle Linux and Oracle VM. Let me see if I can explain it in 1 page, not 10 pages. This pricing information is publicly available on the Oracle store, I am using the current public list prices. Also keep in mind that this is for customers using non-oracle x86 servers. When a customer purchases an Oracle x86 server, the annual systems support includes full use (all you can eat) of Oracle Linux, Oracle VM and Oracle Solaris (no matter how many VMs you run on that server, in case you deploy guests on a hypervisor). This support level is the equivalent of premier support in the list below. Let's start with Oracle VM (x86) : Oracle VM support subscriptions are per physical server on which you deploy the Oracle VM Server product. (1) Oracle VM Premier Limited - 1- or 2 socket server : $599 per server per year (2) Oracle VM Premier - more than 2 socket server (4, or 8 or whatever more) : $1199 per server per year The above includes the use of Oracle VM Manager and Oracle Enterprise Manager Cloud Control's Virtualization management pack (including self service cloud portal, etc..) 24x7 support, access to bugfixes, updates and new releases. It also includes all options, live migrate, dynamic resource scheduling, high availability, dynamic power management, etc If you want to play with the product, or even use the product without access to support services, the product is freely downloadable from edelivery. Next, Oracle Linux : Oracle Linux support subscriptions are per physical server. If you plan to run Oracle Linux as a guest on Oracle VM, VMWare or Hyper-v, you only have to pay for a single subscription per system, we do not charge per guest or per number of guests. In other words, you can run any number of Oracle Linux guests per physical server and count it as just a single subscription. (1) Oracle Linux Network Support - any number of sockets per server : $119 per server per year Network support does not offer support services. It provides access to the Unbreakable Linux Network and also offers full indemnification for Oracle Linux. (2) Oracle Linux Basic Limited Support - 1- or 2 socket servers : $499 per server per year This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management. It includes ocfs2 as a clustered filesystem. (3) Oracle Linux Basic Support - more than 2 socket server (4, or 8 or more) : $1199 per server per year This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management. It includes ocfs2 as a clustered filesystem (4) Oracle Linux Premier Limited Support - 1- or 2 socket servers : $1399 per server per year This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management, XFS filesystem support. It also offers Oracle Lifetime support, backporting of patches for critical customers in previous versions of package and ksplice zero-downtime updates. (5) Oracle Linux Premier Support - more than 2 socket servers : $2299 per server per year This subscription provides 24x7 support services, access to the Unbreakable Linux Network and the Oracle Support portal, indemnification, use of Oracle Clusterware for Linux HA and use of Oracle Enterprise Manager Cloud control for Linux OS management, XFS filesystem support. It also offers Oracle Lifetime support, backporting of patches for critical customers in previous versions of package and ksplice zero-downtime updates. (6) Freely available Oracle Linux - any number of sockets You can freely download Oracle Linux, install it on any number of servers and use it for any reason, without support, without right to use of these extra features like Oracle Clusterware or ksplice, without indemnification. However, you do have full access to all errata as well. Need support? then use options (1)..(5) So that's it. Count number of 2 socket boxes, more than 2 socket boxes, decide on basic or premier support level and you are done. You don't have to worry about different levels based on how many virtual instance you deploy or want to deploy. A very simple menu of choices. We offer, inclusive, Linux OS clusterware, Linux OS Management, provisioning and monitoring, cluster filesystem (ocfs), high performance filesystem (xfs), dtrace, ksplice, ofed (infiniband stack for high performance networking). No separate add-on menus. NOTE : socket/cpu can have any number of cores. So whether you have a 4,6,8,10 or 12 core CPU doesn't matter, we count the number of physical CPUs.

    Read the article

  • FairScheduling Conventions in Hadoop

    - by dan.mcclary
    While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things: Organizations are still determining what their dataflow and analysis workloads will comprise Small deployments under tests aren't likely to show the signs of strains that would send someone looking for resource allocation options The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation. However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask ourselves something about what the off-the-rack scheduling options are. We have some choices: The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis. The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis. Writing your own implementation of the abstract class org.apache.hadoop.mapred.job.TaskScheduler is an option, but usually overkill. If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g. Pig and Hive scripting), the FairScheduler is definitely the way to go. In particular, we can do user-specific pools so that default users get their fair share, and specific users are given the resources their workloads require. To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use scheduling and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf: <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> <property> <name>mapred.fairscheduler.allocation.file</name> <value>/path/to/allocations.xml</value> </property> <property> <name>mapred.fairscheduler.poolnameproperty</name> <value>pool.name</value> </property> <property> <name>pool.name</name> <value>${user.name}</name> </property> What we've done here is simply tell the JobTracker that we'd like to task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml For reference, the allocation file is read every 15s or so, which allows for tuning allocations without having to take down the JobTracker. Our allocation file is now going to look a little like this <?xml version="1.0"?> <allocations> <pool name="dan"> <minMaps>5</minMaps> <minReduces>5</minReduces> <maxMaps>25</maxMaps> <maxReduces>25</maxReduces> <minSharePreemptionTimeout>300</minSharePreemptionTimeout> </pool> <mapreduce.job.user.name="dan"> <maxRunningJobs>6</maxRunningJobs> </user> <userMaxJobsDefault>3</userMaxJobsDefault> <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout> </allocations> In this case, I've explicitly set my username to have upper and lower bounds on the maps and reduces, and allotted myself double the number of running jobs. Now, if I run hive or pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig down into the description and start trying out allocations that might fit your workload.

    Read the article

  • SPARC M7 Chip - 32 cores - Mind Blowing performance

    - by Angelo-Oracle
    The M7 Chip Oracle just announced its Next Generation Processor at the HotChips HC26 conference. As the Tech Lead in our Systems Division's Partner group, I had a front row seat to the extraordinary price performance advantage of Oracle current T5 and M6 based systems. Partner after partner tested  these systems and were impressed with it performance. Just read some of the quotes to see what our partner has been saying about our hardware. We just announced our next generation processor, the M7. This has 32 cores (up from 16-cores in T5 and 12-cores in M6). With 20 nm technology  this is our most advanced processor. The processor has more cores than anything else in the industry today. After the Sun acquisition Oracle has released 5 processors in 4 years and this is the 6th.  The S4 core  The M7 is built using the foundation of the S4 core. This is the next generation core technology. Like its predecessor, the S4 has 8 dynamic threads. It increases the frequency while maintaining the Pipeline depth. Each core has its own fine grain power estimator that keeps the core within its power envelop in 250 nano-sec granularity. Each core also includes Software in Silicon features for Application Acceleration Support. Each core includes features to improve Application Data Integrity, with almost no performance loss. The core also allows using part of the Virtual Address to store meta-data.  User-Level Synchronization Instructions are also part of the S4 core. Each core has 16 KB Instruction and 16 KB Data L1 cache. The Core Clusters  The cores on the M7 chip are organized in sets of 4-core clusters. The core clusters share  L2 cache.  All four cores in the complex share 256 KB of 4 way set associative L2 Instruction Cache, with over 1/2 TB/s of throughput. Two cores share 256 KB of 8 way set associative L2 Data Cache, with over 1/2 TB/s of throughput. With this innovative Core Cluster architecture, the M7 doubles core execution bandwidth. to maximize per-thread performance.  The Chip  Each  M7 chip has 8 sets of these core-clusters. The chip has 64 MB on-chip L3 cache. This L3 caches is shared among all the cores and is partitioned into 8 x 8 MB chunks. Each chunk is  8-way set associative cache. The aggregate bandwidth for the L3 cache on the chip is over 1.6TB/s. Each chip has 4 DDR4 memory controllers and can support upto 16 DDR4 DIMMs, allowing for 2 TB of RAM/chip. The chip also includes 4 internal links of PCIe Gen3 I/O controllers.  Each chip has 7 coherence links, allowing for 8 of these chips to be connected together gluelessly. Also 32 of these chips can be connected in an SMP configuration. A potential system with 32 chips will have 1024 cores and 8192 threads and 64 TB of RAM.  Software in Silicon The M7 chip has many built in Application Accelerators in Silicon. These features will be exposed to our Software partners using the SPARC Accelerator Program.  The M7  has built-in logic to decompress data at the speed of memory access. This means that applications can directly work on compressed data in memory increasing the data access rates. The VA Masking feature allows the use of part of the virtual address to store meta-data.  Realtime Application Data Integrity The Realtime Application Data Integrity feature helps applications safeguard against invalid, stale memory reference and buffer overflows. The first 4-bits if the Pointer can be used to store a version number and this version number is also maintained in the memory & cache lines. When a pointer accesses memory the hardware checks to make sure the two versions match. A SEGV signal is raised when there is a mismatch. This feature can be used by the Database, applications and the OS.  M7 Database In-Memory Query Accelerator The M7 chip also includes a In-Silicon Query Engines.  These accelerate tasks that work on In-Memory Columnar Vectors. Oracle In-Memory options stores data in Column Format. The M7 Query Engine can speed up In-Memory Format Conversion, Value and Range Comparisons and Set Membership lookups. This engine can work on Compressed data - this means not only are we accelerating the query performance but also increasing the memory bandwidth for queries.  SPARC Accelerated Program  At the Hotchips conference we also introduced the SPARC Accelerated Program to provide our partners and third part developers access to all the goodness of the M7's SPARC Application Acceleration features. Please get in touch with us if you are interested in knowing more about this program. 

    Read the article

  • Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers.  It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • New Replication, Optimizer and High Availability features in MySQL 5.6.5!

    - by Rob Young
    As the Product Manager for the MySQL database it is always great to announce when the MySQL Engineering team delivers another great product release.  As a field DBA and developer it is even better when that release contains improvements and innovation that I know will help those currently using MySQL for apps that range from modest intranet sites to the most highly trafficked web sites on the web.  That said, it is my pleasure to take my hat off to MySQL Engineering for today's release of the MySQL 5.6.5 Development Milestone Release ("DMR"). The new highlighted features in MySQL 5.6.5 are discussed here: New Self-Healing Replication ClustersThe 5.6.5 DMR improves MySQL Replication by adding Global Transaction Ids and automated utilities for self-healing Replication clusters.  Prior to 5.6.5 this has been somewhat of a pain point for MySQL users with most developing custom solutions or looking to costly, complex third-party solutions for these capabilities.  With 5.6.5 these shackles are all but removed by a solution that is included with the GPL version of the database and supporting GPL tools.  You can learn all about the details of the great, problem solving Replication features in MySQL 5.6 in Mat Keep's Developer Zone article.  New Replication Administration and Failover UtilitiesAs mentioned above, the new Replication features, Global Transaction Ids specifically, are now supported by a set of automated GPL utilities that leverage the new GTIDs to provide administration and manual or auto failover to the most up to date slave (that is the default, but user configurable if needed) in the event of a master failure. The new utilities, along with links to Engineering related blogs, are discussed in detail in the DevZone Article noted above. Better Query Optimization and ThroughputThe MySQL Optimizer team continues to amaze with the latest round of improvements in 5.6.5. Along with much refactoring of the legacy code base, the Optimizer team has improved complex query optimization and throughput by adding these functional improvements: Subquery Optimizations - Subqueries are now included in the Optimizer path for runtime optimization.  Better throughput of nested queries enables application developers to simplify and consolidate multiple queries and result sets into a single unit or work. Optimizer now uses CURRENT_TIMESTAMP as default for DATETIME columns - For simplification, this eliminates the need for application developers to assign this value when a column of this type is blank by default. Optimizations for Range based queries - Optimizer now uses ready statistics vs Index based scans for queries with multiple range values. Optimizations for queries using filesort and ORDER BY.  Optimization criteria/decision on execution method is done now at optimization vs parsing stage. Print EXPLAIN in JSON format for hierarchical readability and Enterprise tool consumption. You can learn the details about these new features as well all of the Optimizer based improvements in MySQL 5.6 by following the Optimizer team blog. You can download and try the MySQL 5.6.5 DMR here. (look under "Development Releases")  Please let us know what you think!  The new HA utilities for Replication Administration and Failover are available as part of the MySQL Workbench Community Edition, which you can download here .Also New in MySQL LabsAs has become our tradition when announcing DMRs we also like to provide "Early Access" development features to the MySQL Community via the MySQL Labs.  Today is no exception as we are also releasing the following to Labs for you to download, try and let us know your thoughts on where we need to improve:InnoDB Online OperationsMySQL 5.6 now provides Online ADD Index, FK Drop and Online Column RENAME.  These operations are non-blocking and will continue to evolve in future DMRs.  You can learn the grainy details by following John Russell's blog.InnoDB data access via Memcached API ("NotOnlySQL") - Improved refresh of an earlier feature releaseSimilar to Cluster 7.2, MySQL 5.6 provides direct NotOnlySQL access to InnoDB data via the familiar Memcached API. This provides the ultimate in flexibility for developers who need fast, simple key/value access and complex query support commingled within their applications.Improved Transactional Performance, ScaleThe InnoDB Engineering team has once again under promised and over delivered in the area of improved performance and scale.  These improvements are also included in the aggregated Spring 2012 labs release:InnoDB CPU cache performance improvements for modern, multi-core/CPU systems show great promise with internal tests showing:    2x throughput improvement for read only activity 6x throughput improvement for SELECT range Read/Write benchmarks are in progress More details on the above are available here. You can download all of the above in an aggregated "InnoDB 2012 Spring Labs Release" binary from the MySQL Labs. You can also learn more about these improvements and about related fixes to mysys mutex and hash sort by checking out the InnoDB team blog.MySQL 5.6.5 is another installment in what we believe will be the best release of the MySQL database ever.  It also serves as a shining example of how the MySQL Engineering team at Oracle leads in MySQL innovation.You can get the overall Oracle message on the MySQL 5.6.5 DMR and Early Access labs features here. As always, thanks for your continued support of MySQL, the #1 open source database on the planet!

    Read the article

  • PASS Summit 2011 &ndash; Part III

    - by Tara Kizer
    Well we’re about a month past PASS Summit 2011, and yet I haven’t finished blogging my notes! Between work and home life, I haven’t been able to come up for air in a bit.  Now on to my notes… On Thursday of the PASS Summit 2011, I attended Klaus Aschenbrenner’s (blog|twitter) “Advanced SQL Server 2008 Troubleshooting”, Joe Webb’s (blog|twitter) “SQL Server Locking & Blocking Made Simple”, Kalen Delaney’s (blog|twitter) “What Happened? Exploring the Plan Cache”, and Paul Randal’s (blog|twitter) “More DBA Mythbusters”.  I think my head grew two times in size from the Thursday sessions.  Just WOW! I took a ton of notes in Klaus' session.  He took a deep dive into how to troubleshoot performance problems.  Here is how he goes about solving a performance problem: Start by checking the wait stats DMV System health Memory issues I/O issues I normally start with blocking and then hit the wait stats.  Here’s the wait stat query (Paul Randal’s) that I use when working on a performance problem.  He highlighted a few waits to be aware of such as WRITELOG (indicates IO subsystem problem), SOS_SCHEDULER_YIELD (indicates CPU problem), and PAGEIOLATCH_XX (indicates an IO subsystem problem or a buffer pool problem).  Regarding memory issues, Klaus recommended that as a bare minimum, one should set the “max server memory (MB)” in sp_configure to 2GB or 10% reserved for the OS (whichever comes first).  This is just a starting point though! Regarding I/O issues, Klaus talked about disk partition alignment, which can improve SQL I/O performance by up to 100%.  You should use 64kb for NTFS cluster, and it’s automatic in Windows 2008 R2. Joe’s locking and blocking presentation was a good session to really clear up the fog in my mind about locking.  One takeaway that I had no idea could be done was that you can set a timeout in T-SQL code view LOCK_TIMEOUT.  If you do this via the application, you should trap error 1222. Kalen’s session went into execution plans.  The minimum size of a plan is 24k.  This adds up fast especially if you have a lot of plans that don’t get reused much.  You can use sys.dm_exec_cached_plans to check how often a plan is being reused by checking the usecounts column.  She said that we can use DBCC FLUSHPROCINDB to clear out the stored procedure cache for a specific database.  I didn’t know we had this available, so this was great to hear.  This will be less intrusive when an emergency comes up where I’ve needed to run DBCC FREEPROCCACHE. Kalen said one should enable “optimize for ad hoc workloads” if you have an adhoc loc.  This stores only a 300-byte stub of the first plan, and if it gets run again, it’ll store the whole thing.  This helps with plan cache bloat.  I have a lot of systems that use prepared statements, and Kalen says we simulate those calls by using sp_executesql.  Cool! Paul did a series of posts last year to debunk various myths and misconceptions around SQL Server.  He continues to debunk things via “DBA Mythbusters”.  You can get a PDF of a bunch of these here.  One of the myths he went over is the number of tempdb data files that you should have.  Back in 2000, the recommendation was to have as many tempdb data files as there are CPU cores on your server.  This no longer holds true due to the numerous cores we have on our servers.  Paul says you should start out with 1/4 to 1/2 the number of cores and work your way up from there.  BUT!  Paul likes what Bob Ward (twitter) says on this topic: 8 or less cores –> set number of files equal to the number of cores Greater than 8 cores –> start with 8 files and increase in blocks of 4 One common myth out there is to set your MAXDOP to 1 for an OLTP workload with high CXPACKET waits.  Instead of that, dig deeper first.  Look for missing indexes, out-of-date statistics, increase the “cost threshold for parallelism” setting, and perhaps set MAXDOP at the query level.  Paul stressed that you should not plan a backup strategy but instead plan a restore strategy.  What are your recoverability requirements?  Once you know that, now plan out your backups. As Paul always does, he talked about DBCC CHECKDB.  He said how fabulous it is.  I didn’t want to interrupt the presentation, so after his session had ended, I asked Paul about the need to run DBCC CHECKDB on your mirror systems.  You could have data corruption occur at the mirror and not at the principal server.  If you aren’t checking for data corruption on your mirror systems, you could be failing over to a corrupt database in the case of a disaster or even a planned failover.  You can’t run DBCC CHECKDB against the mirrored database, but you can run it against a snapshot off the mirrored database.

    Read the article

  • Interview with Ronald Bradford about MySQL Connect

    - by Keith Larson
    Ronald Bradford,  an Oracle ACE Director has been busy working with  database consulting, book writing (EffectiveMySQL) while traveling and speaking around the world in support of MySQL. I was able to take some of his time to get an interview on this thoughts about theMySQL Connect conference. Keith Larson: What where your thoughts when you heard that Oracle was going to provide the community the MySQL Conference ?Ronald Bradford: Oracle has already been providing various different local community events including OTN Tech Days and  MySQL community days. These are great for local regions both in the US and abroad.  In previous years there has been an increase of content at Oracle Open World, however that benefits the Oracle community far more then the MySQL community.  It is good to see that Oracle is realizing the benefit in providing a large scale dedicated event for the MySQL community that includes speakers from the MySQL development teams, invested companies in the ecosystem and other community evangelists.I fully expect a successful event and look forward to hopefully seeing MySQL Connect at the upcoming Brazil and Japan OOW conferences and perhaps an event on the East Coast.Keith Larson: Since you are part of the content committee, what did you think of the submissions that were received during call for papers?Ronald Bradford: There was a large number of quality submissions to the number of available presentation sessions. As with the previous years as a committee member for the annual MySQL conference, there is always a large variety of common cornerstone MySQL features as well as new products and upcoming companies sharing their MySQL experiences. All of the usual major players in the ecosystem will in presenting at MySQL Connect including Facebook, Twitter, Yahoo, Continuent, Percona, Tokutek, Sphinx and Amazon to name a few.  This is ensuring the event will have a large number of quality speakers and a difficult time in choosing what to attend. Keith Larson: What sessions do you look forwarding to attending? Ronald Bradford: As with most quality conferences you can only be in one place at one time, so with multiple tracks per session it is always difficult to decide. The continued work and success with MySQL Cluster, and with a number of sessions I am sure will be popular. The features that interest me the most are around the optimizer, where there are several sessions on new features, and on the importance of backups. There are three presentations in this area to choose from.Keith Larson: Are you going to cover any of the content in your books at your MySQL Connect sessions?Ronald Bradford: I will be giving two presentations at MySQL Connect. The first will include the techniques available for creating better indexes where I will be touching on some aspects of the first Effective MySQL book on Optimizing SQL Statements.  In my second presentation from experiences of managing 500+ AWS MySQL instances, I will be touching on areas including SQL tuning, backup and recovery and scale out with replication.   These are the key topics of the initial books in the Effective MySQL series that focus on performance, scalability and business continuity.  The books however cover a far greater amount of detail then can be presented in a 1 hour session. Keith Larson: What features of MySQL 5.6 do you look forward to the most ?Ronald Bradford: I am very impressed with the optimizer trace feature. The ability to see exposed information is invaluable not just for MySQL 5.6, but to also apply information discerned for optimizing SQL statements in earlier versions of MySQL.  Not everybody understands that it is easy to deploy a MySQL 5.6 slave into an existing topology running an older version if MySQL for evaluation of many new features.  You can use the new mysqlbinlog streaming feature for duplicating master binary logs on an older version with a MySQL 5.6 slave.  The improvements in instrumentation in the Performance Schema are exciting.   However, as with my upcoming Replication Techniques in Depth title, that will be available for sale at MySQL Connect, there are numerous replication features, some long overdue with provide significant management benefits. Crash Save Slaves, Global transaction Identifiers (GTID)  and checksums just to mention a few.Keith Larson: You have been to numerous conferences, what would you recommend for people at the conference? Ronald Bradford: Make the time to meet and introduce yourself to the speakers that cover the topics that most interest you. The MySQL ecosystem has a very strong community.  The relationships you build with presenters, developers and architects in MySQL can be invaluable, however they are created over time. Get to know these people, interact with them over time.  This is the opportunity to learn more then just the content from a 1 hour session. Keith Larson: Any additional tips to handling the long hours ? Ronald Bradford: Conferences can be hard, especially with all the post event drinking.  This is a two day event and I am sure will include additional events on Friday and Saturday night so come well prepared, and leave work behind. Take the time to learn something new.   You can always catchup on sleep later. Keith Larson: Thank you so much for taking some time to do this I look forward to seeing you at the MySQL Connect conference.  Please stay tuned here for more updates on MySQL. 

    Read the article

  • Availability Best Practices on Oracle VM Server for SPARC

    - by jsavit
    This is the first of a series of blog posts on configuring Oracle VM Server for SPARC (also called Logical Domains) for availability. This series will show how to how to plan for availability, improve serviceability, avoid single points of failure, and provide resiliency against hardware and software failures. Availability is a broad topic that has filled entire books, so these posts will focus on aspects specifically related to Oracle VM Server for SPARC. The goal is to improve Reliability, Availability and Serviceability (RAS): An article defining RAS can be found here. Oracle VM Server for SPARC Principles for Availability Let's state some guiding principles for availability that apply to Oracle VM Server for SPARC: Avoid Single Points Of Failure (SPOFs). Systems should be configured so a component failure does not result in a loss of application service. The general method to avoid SPOFs is to provide redundancy so service can continue without interruption if a component fails. For a critical application there may be multiple levels of redundancy so multiple failures can be tolerated. Oracle VM Server for SPARC makes it possible to configure systems that avoid SPOFs. Configure for availability at a level of resource and effort consistent with business needs. Effort and resource should be consistent with business requirements. Production has different availability requirements than test/development, so it's worth expending resources to provide higher availability. Even within the category of production there may be different levels of criticality, outage tolerances, recovery and repair time requirements. Keep in mind that a simple design may be more understandable and effective than a complex design that attempts to "do everything". Design for availability at the appropriate tier or level of the platform stack. Availability can be provided in the application, in the database, or in the virtualization, hardware and network layers they depend on - or using a combination of all of them. It may not be necessary to engineer resilient virtualization for stateless web applications applications where availability is provided by a network load balancer, or for enterprise applications like Oracle Real Application Clusters (RAC) and WebLogic that provide their own resiliency. It's (often) the same architecture whether virtual or not: For example, providing resiliency against a lost device path or failing disk media is done for the same reasons and may use the same design whether in a domain or not. It's (often) the same technique whether using domains or not: Many configuration steps are the same. For example, configuring IPMP or creating a redundant ZFS pool is pretty much the same within the guest whether you're in a guest domain or not. There are configuration steps and choices for provisioning the guest with the virtual network and disk devices, which we will discuss. Sometimes it is different using domains: There are new resources to configure. Most notable is the use of alternate service domains, which provides resiliency in case of a domain failure, and also permits improved serviceability via "rolling upgrades". This is an important differentiator between Oracle VM Server for SPARC and traditional virtual machine environments where all virtual I/O is provided by a monolithic infrastructure that itself is a SPOF. Alternate service domains are widely used to provide resiliency in production logical domains environments. Some things are done via logical domains commands, and some are done in the guest: For example, with Oracle VM Server for SPARC we provide multiple network connections to the guest, and then configure network resiliency in the guest via IP Multi Pathing (IPMP) - essentially the same as for non-virtual systems. On the other hand, we configure virtual disk availability in the virtualization layer, and the guest sees an already-resilient disk without being aware of the details. These blogs will discuss configuration details like this. Live migration is not "high availability" in the sense of "continuous availability": If the server is down, then you don't live migrate from it! (A cluster or VM restart elsewhere would be used). However, live migration can be part of the RAS (Reliability, Availability, Serviceability) picture by improving Serviceability - you can move running domains off of a box before planned service or maintenance. The blog Best Practices - Live Migration on Oracle VM Server for SPARC discusses this. Topics Here are some of the topics that will be covered: Network availability using IP Multipathing and aggregates Disk path availability using virtual disks defined with multipath groups ("mpgroup") Disk media resiliency configuring for redundant disks that can tolerate media loss Multiple service domains - this is probably the most significant item and the one most specific to Oracle VM Server for SPARC. It is very widely deployed in production environments as the means to provide network and disk availability, but it can be confusing. Subsequent articles will describe why and how to configure multiple service domains. Note, for the sake of precision: an I/O domain is any domain that has a physical I/O resource (such as a PCIe bus root complex). A service domain is a domain providing virtual device services to other domains; it is almost always an I/O domain too (so it can have something to serve). Resources Here are some important links; we'll be drawing on their content in the next several articles: Oracle VM Server for SPARC Documentation Maximizing Application Reliability and Availability with SPARC T5 Servers whitepaper by Gary Combs Maximizing Application Reliability and Availability with the SPARC M5-32 Server whitepaper by Gary Combs Summary Oracle VM Server for SPARC offers features that can be used to provide highly-available environments. This and the following blog entries will describe how to plan and deploy them.

    Read the article

  • Simple Project Templates

    - by Geertjan
    The NetBeans sources include a module named "simple.project.templates": In the module sources, Tim Boudreau turns out to be the author of the code, so I asked him what it was all about, and if he could provide some usage code. His response, from approximately this time last year because it's been sitting in my inbox for a while, is below. Sure - though I think the javadoc in it is fairly complete.  I wrote it because I needed to create a bunch of project templates for Javacard, and all of the ways that is usually done were grotesque and complicated.  I figured we already have the ability to create files from templates, and we already have the ability to do substitutions in templates, so why not have a single file that defines the project as a list of file templates to create (with substitutions in the names) and some definitions of what should be in project properties. You can also add files to the project programmatically if you want.Basically, a template for an entire project is a .properties file.  Any line which doesn't have the prefix 'pp.' or 'pvp.' is treated as the definition of one file which should be created in the new project.  Any such line where the key ends in * means that file should be opened once the new project is created.  So, for example, in the nodejs module, the definition looks like: {{projectName}}.js*=Templates/javascript/HelloWorld.js .npmignore=node_hidden_templates/npmignore So, the first line means:  - Create a file with the same name as the project, using the HelloWorld template    - I.e. the left side of the line is the relative path of the file to create, and the right side is the path in the system filesystem for the template to use       - If the template is not one you normally want users to see, just register it in the system filesystem somewhere other than Templates/ (but remember to set the attribute that marks it as a template)  - Include that file in the set of files which should be opened in the editor once the new project is created. To actually create a project, first you just create a new ProjectCreator: ProjectCreator gen = new ProjectCreator( parentFolderOfNewProject ); Now, if you want to programmatically generate any files, in addition to those defined in the template, you can: gen.add (new FileCreator("nbproject", "project.xml", false) {     public DataObject create (FileObject project, Map<String,String> substitutions) throws IOException {          ...     } }); Then pass the FileObject for the project template (the properties file) to the ProjectCreator's createProject method (hmm, maybe it should be the string path to the project template instead, to save the caller trouble looking up the FileObject for the template).  That method looks like this: public final GeneratedProject createProject(final ProgressHandle handle, final String name, final FileObject template, final Map<String, String> substitutions) throws IOException { The name parameter should be the directory name for the new project;  the map is the strings you gathered in the wizard which should be used for substitutions.  createProject should be called on a background thread (i.e. use a ProgressInstantiatingIterator for the wizard iterator and just pass in the ProgressHandle you are given). The return value is a GeneratedProject object, which is just a holder for the created project directory and the set of DataObjects which should be opened when the wizard finishes. I'd love to see simple.project.templates moved out of the javacard cluster, as it is really useful and much simpler than any of the stuff currently done for generating projects.  It would also be possible to do much richer tools for creating projects in apisupport - i.e. choose (or create in the wizard) the templates you want to use, generate a skeleton wizard with a UI for all the properties you'd like to substitute, etc. Here is a partial project template from Javacard - for example usage, see org.netbeans.modules.javacard.wizard.ProjectWizardIterator in javacard.project (or the much simpler one in contrib/nodejs). #This properties file describes what to create when a project template is#instantiated.  The keys are paths on disk relative to the project root. #The values are paths to the templates to use for those files in the system#filesystem.  Any string inside {{ and }}'s will be substituted using properties#gathered in the template wizard.#Special key prefixes are #  pp. - indicates an entry for nbproject/project.properties#  pvp. - indicates an entry for nbproject/private/private.properties #File templates, in format [path-in-project=path-to-template]META-INF/javacard.xml=org-netbeans-modules-javacard/templates/javacard.xmlMETA-INF/MANIFEST.MF=org-netbeans-modules-javacard/templates/EAP_MANIFEST.MF APPLET-INF/applet.xml=org-netbeans-modules-javacard/templates/applet.xmlscripts/{{classnamelowercase}}.scr=org-netbeans-modules-javacard/templates/test.scrsrc/{{packagepath}}/{{classname}}.java*=Templates/javacard/ExtendedApplet.java nbproject/deployment.xml=org-netbeans-modules-javacard/templates/deployment.xml#project.properties contentspp.display.name={{projectname}}pp.platform.active={{activeplatform}} pp.active.device={{activedevice}}pp.includes=**pp.excludes= I will be using the above info in an upcoming blog entry and provide step by step instructions showing how to use them. However, anyone else out there should have enough info from the above to get started yourself!

    Read the article

  • SQL Table stored as a Heap - the dangers within

    - by MikeD
    Nearly all of the time I create a table, I include a primary key, and often that PK is implemented as a clustered index. Those two don't always have to go together, but in my world they almost always do. On a recent project, I was working on a data warehouse and a set of SSIS packages to import data from an OLTP database into my data warehouse. The data I was importing from the business database into the warehouse was mostly new rows, sometimes updates to existing rows, and sometimes deletes. I decided to use the MERGE statement to implement the insert, update or delete in the data warehouse, I found it quite performant to have a stored procedure that extracted all the new, updated, and deleted rows from the source database and dump it into a working table in my data warehouse, then run a stored proc in the warehouse that was the MERGE statement that took the rows from the working table and updated the real fact table. Use Warehouse CREATE TABLE Integration.MergePolicy (PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date, Operation varchar(5)) CREATE TABLE fact.Policy (PolicyKey int identity primary key, PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date) CREATE PROC Integration.MergePolicy as begin begin tran Merge fact.Policy as tgtUsing Integration.MergePolicy as SrcOn (tgt.PolicyId = Src.PolicyId) When not matched by Target then Insert (PolicyId, PolicyTypeKey, Premium, Deductible, EffectiveDate)values (src.PolicyId, src.PolicyTypeKey, src.Premium, src.Deductible, src.EffectiveDate) When matched and src.Operation = 'U' then Update set PolicyTypeKey = src.PolicyTypeKey,Premium = src.Premium,Deductible = src.Deductible,EffectiveDate = src.EffectiveDate When matched and src.Operation = 'D' then Delete ;delete from Integration.WorkPolicy commit end Notice that my worktable (Integration.MergePolicy) doesn't have any primary key or clustered index. I didn't think this would be a problem, since it was relatively small table and was empty after each time I ran the stored proc. For one of the work tables, during the initial loads of the warehouse, it was getting about 1.5 million rows inserted, processed, then deleted. Also, because of a bug in the extraction process, the same 1.5 million rows (plus a few hundred more each time) was getting inserted, processed, and deleted. This was being sone on a fairly hefty server that was otherwise unused, and no one was paying any attention to the time it was taking. This week I received a backup of this database and loaded it on my laptop to troubleshoot the problem, and of course it took a good ten minutes or more to run the process. However, what seemed strange to me was that after I fixed the problem and happened to run the merge sproc when the work table was completely empty, it still took almost ten minutes to complete. I immediately looked back at the MERGE statement to see if I had some sort of outer join that meant it would be scanning the target table (which had about 2 million rows in it), then turned on the execution plan output to see what was happening under the hood. Running the stored procedure again took a long time, and the plan output didn't show me much - 55% on the MERGE statement, and 45% on the DELETE statement, and table scans on the work table in both places. I was surprised at the relative cost of the DELETE statement, because there were really 0 rows to delete, but I was expecting to see the table scans. (I was beginning now to suspect that my problem was because the work table was being stored as a heap.) Then I turned on STATS_IO and ran the sproc again. The output was quite interesting.Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Policy'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'MergePolicy'. Scan count 1, logical reads 433276, physical reads 60, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. I've reproduced the above from memory, the details aren't exact, but the essential bit was the very high number of logical reads on the table stored as a heap. Even just doing a SELECT Count(*) from Integration.MergePolicy incurred that sort of output, even though the result was always 0. I suppose I should research more on the allocation and deallocation of pages to tables stored as a heap, but I haven't, and my original assumption that a table stored as a heap with no rows would only need to read one page to answer any query was definitely proven wrong. It's likely that some sort of physical defragmentation of the table may have cleaned that up, but it seemed that the easiest answer was to put a clustered index on the table. After doing so, the execution plan showed a cluster index scan, and the IO stats showed only a single page read. (I aborted my first attempt at adding a clustered index on the table because it was taking too long - instead I ran TRUNCATE TABLE Integration.MergePolicy first and added the clustered index, both of which took very little time). I suspect I may not have noticed this if I had used TRUNCATE TABLE Integration.MergePolicy instead of DELETE FROM Integration.MergePolicy, since I'm guessing that the truncate operation does some rather quick releasing of pages allocated to the heap table. In the future, I will likely be much more careful to have a clustered index on every table I use, even the working tables. Mike  

    Read the article

  • Big Data – Buzz Words: What is HDFS – Day 8 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned what is MapReduce. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – HDFS. What is HDFS ? HDFS stands for Hadoop Distributed File System and it is a primary storage system used by Hadoop. It provides high performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware. In commodity hardware deployment server failures are very common. Due to the same reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to reduced risk of failure. HDFS creates smaller pieces of the big data and distributes it on different nodes. It also copies each smaller piece to multiple times on different nodes. Hence when any node with the data crashes the system is automatically able to use the data from a different node and continue the process. This is the key feature of the HDFS system. Architecture of HDFS The architecture of the HDFS is master/slave architecture. An HDFS cluster always consists of single NameNode. This single NameNode is a master server and it manages the file system as well regulates access to various files. In additional to NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks and those blocks are stored in a set of DataNodes. The primary task of the NameNode is to open, close or rename files and directory and regulate access to the file system, whereas the primary task of the DataNode is read and write to the file systems. DataNode is also responsible for the creation, deletion or replication of the data based on the instruction from NameNode. In reality, NameNode and DataNode are software designed to run on commodity machine build in Java language. Visual Representation of HDFS Architecture Let us understand how HDFS works with the help of the diagram. Client APP or HDFS Client connects to NameSpace as well as DataNode. Client App access to the DataNode is regulated by NameSpace Node. NameSpace Node allows Client App to connect to the DataNode based by allowing the connection to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A,B,C and D. Client App will later on write data blocks directly to the DataNode. Client App does not have to directly write to all the node. It just has to write to any one of the node and NameNode will decide on which other DataNode it will have to replicate the data. In our example Client App directly writes to DataNode 1 and detained 3. However, data chunks are automatically replicated to other nodes. All the information like in which DataNode which data block is placed is written back to NameNode. High Availability During Disaster Now as multiple DataNode have same data blocks in the case of any DataNode which faces the disaster, the entire process will continue as other DataNode will assume the role to serve the specific data block which was on the failed node. This system provides very high tolerance to disaster and provides high availability. If you notice there is only single NameNode in our architecture. If that node fails our entire Hadoop Application will stop performing as it is a single node where we store all the metadata. As this node is very critical, it is usually replicated on another clustered as well as on another data rack. Though, that replicated node is not operational in architecture, it has all the necessary data to perform the task of the NameNode in the case of the NameNode fails. The entire Hadoop architecture is built to function smoothly even there are node failures or hardware malfunction. It is built on the simple concept that data is so big it is impossible to have come up with a single piece of the hardware which can manage it properly. We need lots of commodity (cheap) hardware to manage our big data and hardware failure is part of the commodity servers. To reduce the impact of hardware failure Hadoop architecture is built to overcome the limitation of the non-functioning hardware. Tomorrow In tomorrow’s blog post we will discuss the importance of the relational database in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Computer Networks UNISA - Chap 14 &ndash; Insuring Integrity &amp; Availability

    - by MarkPearl
    After reading this section you should be able to Identify the characteristics of a network that keep data safe from loss or damage Protect an enterprise-wide network from viruses Explain network and system level fault tolerance techniques Discuss issues related to network backup and recovery strategies Describe the components of a useful disaster recovery plan and the options for disaster contingencies What are integrity and availability? Integrity – the soundness of a networks programs, data, services, devices, and connections Availability – How consistently and reliably a file or system can be accessed by authorized personnel A number of phenomena can compromise both integrity and availability including… security breaches natural disasters malicious intruders power flaws human error users etc Although you cannot predict every type of vulnerability, you can take measures to guard against the most damaging events. The following are some guidelines… Allow only network administrators to create or modify NOS and application system users. Monitor the network for unauthorized access or changes Record authorized system changes in a change management system’ Install redundant components Perform regular health checks on the network Check system performance, error logs, and the system log book regularly Keep backups Implement and enforce security and disaster recovery policies These are just some of the basics… Malware Malware refers to any program or piece of code designed to intrude upon or harm a system or its resources. Types of Malware… Boot sector viruses Macro viruses File infector viruses Worms Trojan Horse Network Viruses Bots Malware characteristics Some common characteristics of Malware include… Encryption Stealth Polymorphism Time dependence Malware Protection There are various tools available to protect you from malware called anti-malware software. These monitor your system for indications that a program is performing potential malware operations. A number of techniques are used to detect malware including… Signature Scanning Integrity Checking Monitoring unexpected file changes or virus like behaviours It is important to decide where anti-malware tools will be installed and find a balance between performance and protection. There are several general purpose malware policies that can be implemented to protect your network including… Every compute in an organization should be equipped with malware detection and cleaning software that regularly runs Users should not be allowed to alter or disable the anti-malware software Users should know what to do in case the anti-malware program detects a malware virus Users should be prohibited from installing any unauthorized software on their systems System wide alerts should be issued to network users notifying them if a serious malware virus has been detected. Fault Tolerance Besides guarding against malware, another key factor in maintaining the availability and integrity of data is fault tolerance. Fault tolerance is the ability for a system to continue performing despite an unexpected hardware or software malfunction. Fault tolerance can be realized in varying degrees, the optimal level of fault tolerance for a system depends on how critical its services and files are to productivity. Generally the more fault tolerant the system, the more expensive it is. The following describe some of the areas that need to be considered for fault tolerance. Environment (Temperature and humidity) Power Topology and Connectivity Servers Storage Power Typical power flaws include Surges – a brief increase in voltage due to lightening strikes, solar flares or some idiot at City Power Noise – Fluctuation in voltage levels caused by other devices on the network or electromagnetic interference Brownout – A sag in voltage for just a moment Blackout – A complete power loss The are various alternate power sources to consider including UPS’s and Generators. UPS’s are found in two categories… Standby UPS – provides continuous power when mains goes down (brief period of switching over) Online UPS – is online all the time and the device receives power from the UPS all the time (the UPS is charged continuously) Servers There are various techniques for fault tolerance with servers. Server mirroring is an option where one device or component duplicates the activities of another. It is generally an expensive process. Clustering is a fault tolerance technique that links multiple servers together to appear as a single server. They share processing and storage responsibilities and if one unit in the cluster goes down, another unit can be brought in to replace it. Storage There are various techniques available including the following… RAID Arrays NAS (Storage (Network Attached Storage) SANs (Storage Area Networks) Data Backup A backup is a copy of data or program files created for archiving or safekeeping. Many different options for backups exist with various media including… These vary in cost and speed. Optical Media Tape Backup External Disk Drives Network Backups Backup Strategy After selecting the appropriate tool for performing your servers backup, devise a backup strategy to guide you through performing reliable backups that provide maximum data protection. Questions that should be answered include… What data must be backed up At what time of day or night will the backups occur How will you verify the accuracy of the backups Where and for how long will backup media be stored Who will take responsibility for ensuring that backups occurred How long will you save backups Where will backup and recovery documentation be stored Different backup methods provide varying levels of certainty and corresponding labour cost. There are also different ways to determine which files should be backed up including… Full backup – all data on all servers is copied to storage media Incremental backup – Only data that has changed since the last full or incremental backup is copied to a storage medium Differential backup – Only data that has changed since the last backup is coped to a storage medium Disaster Recovery Disaster recovery is the process of restoring your critical functionality and data after an enterprise wide outage has occurred. A disaster recovery plan is for extreme scenarios (i.e. fire, line fault, etc). A cold site is a place were the computers, devices, and connectivity necessary to rebuild a network exist but they are not appropriately configured. A warm site is a place where the computers, devices, and connectivity necessary to rebuild a network exists with some appropriately configured devices. A hot site is a place where the computers, devices, and connectivity necessary to rebuild a network exists and all are appropriately configured.

    Read the article

  • Oracle NoSQL Database Exceeds 1 Million Mixed YCSB Ops/Sec

    - by Charles Lamb
    We ran a set of YCSB performance tests on Oracle NoSQL Database using SSD cards and Intel Xeon E5-2690 CPUs with the goal of achieving 1M mixed ops/sec on a 95% read / 5% update workload. We used the standard YCSB parameters: 13 byte keys and 1KB data size (1,102 bytes after serialization). The maximum database size was 2 billion records, or approximately 2 TB of data. We sized the shards to ensure that this was not an "in-memory" test (i.e. the data portion of the B-Trees did not fit into memory). All updates were durable and used the "simple majority" replica ack policy, effectively 'committing to the network'. All read operations used the Consistency.NONE_REQUIRED parameter allowing reads to be performed on any replica. In the past we have achieved 100K ops/sec using SSD cards on a single shard cluster (replication factor 3) so for this test we used 10 shards on 15 Storage Nodes with each SN carrying 2 Rep Nodes and each RN assigned to its own SSD card. After correcting a scaling problem in YCSB, we blew past the 1M ops/sec mark with 8 shards and proceeded to hit 1.2M ops/sec with 10 shards.  Hardware Configuration We used 15 servers, each configured with two 335 GB SSD cards. We did not have homogeneous CPUs across all 15 servers available to us so 12 of the 15 were Xeon E5-2690, 2.9 GHz, 2 sockets, 32 threads, 193 GB RAM, and the other 3 were Xeon E5-2680, 2.7 GHz, 2 sockets, 32 threads, 193 GB RAM.  There might have been some upside in having all 15 machines configured with the faster CPU, but since CPU was not the limiting factor we don't believe the improvement would be significant. The client machines were Xeon X5670, 2.93 GHz, 2 sockets, 24 threads, 96 GB RAM. Although the clients had 96 GB of RAM, neither the NoSQL Database or YCSB clients require anywhere near that amount of memory and the test could have just easily been run with much less. Networking was all 10GigE. YCSB Scaling Problem We made three modifications to the YCSB benchmark. The first was to allow the test to accommodate more than 2 billion records (effectively int's vs long's). To keep the key size constant, we changed the code to use base 32 for the user ids. The second change involved to the way we run the YCSB client in order to make the test itself horizontally scalable.The basic problem has to do with the way the YCSB test creates its Zipfian distribution of keys which is intended to model "real" loads by generating clusters of key collisions. Unfortunately, the percentage of collisions on the most contentious keys remains the same even as the number of keys in the database increases. As we scale up the load, the number of collisions on those keys increases as well, eventually exceeding the capacity of the single server used for a given key.This is not a workload that is realistic or amenable to horizontal scaling. YCSB does provide alternate key distribution algorithms so this is not a shortcoming of YCSB in general. We decided that a better model would be for the key collisions to be limited to a given YCSB client process. That way, as additional YCSB client processes (i.e. additional load) are added, they each maintain the same number of collisions they encounter themselves, but do not increase the number of collisions on a single key in the entire store. We added client processes proportionally to the number of records in the database (and therefore the number of shards). This change to the use of YCSB better models a use case where new groups of users are likely to access either just their own entries, or entries within their own subgroups, rather than all users showing the same interest in a single global collection of keys. If an application finds every user having the same likelihood of wanting to modify a single global key, that application has no real hope of getting horizontal scaling. Finally, we used read/modify/write (also known as "Compare And Set") style updates during the mixed phase. This uses versioned operations to make sure that no updates are lost. This mode of operation provides better application behavior than the way we have typically run YCSB in the past, and is only practical at scale because we eliminated the shared key collision hotspots.It is also a more realistic testing scenario. To reiterate, all updates used a simple majority replica ack policy making them durable. Scalability Results In the table below, the "KVS Size" column is the number of records with the number of shards and the replication factor. Hence, the first row indicates 400m total records in the NoSQL Database (KV Store), 2 shards, and a replication factor of 3. The "Clients" column indicates the number of YCSB client processes. "Threads" is the number of threads per process with the total number of threads. Hence, 90 threads per YCSB process for a total of 360 threads. The client processes were distributed across 10 client machines. Shards KVS Size Clients Mixed (records) Threads OverallThroughput(ops/sec) Read Latencyav/95%/99%(ms) Write Latencyav/95%/99%(ms) 2 400m(2x3) 4 90(360) 302,152 0.76/1/3 3.08/8/35 4 800m(4x3) 8 90(720) 558,569 0.79/1/4 3.82/16/45 8 1600m(8x3) 16 90(1440) 1,028,868 0.85/2/5 4.29/21/51 10 2000m(10x3) 20 90(1800) 1,244,550 0.88/2/6 4.47/23/53

    Read the article

  • Understanding the 'High Performance' meaning in Extreme Transaction Processing

    - by kyap
    Despite my previous blogs entries on SOA/BPM and Identity Management, the domain where I'm the most passionated is definitely the Extreme Transaction Processing, commonly called XTP.I came across XTP back to 2007 while I was still FMW Product Manager in EMEA. At that time Oracle acquired a company called Tangosol, which owned an unique product called Coherence that we renamed to Oracle Coherence. Beside this innovative renaming of the product, to be honest, I didn't know much about it, except being a "distributed in-memory cache for Extreme Transaction Processing"... not very helpful still.In general when people doesn't fully understand a technology or a concept, they tend to find some shortcuts, either correct or not, to justify their lack-of understanding... and of course I was part of this category of individuals. And the shortcut was "Oracle Coherence Cache helps to improve Performance". Excellent marketing slogan... but not very meaningful still. By chance I was able to get away quickly from that group in July 2007* at Thames Valley Park (UK), after I attended one of the most interesting workshops, in my 10 years career in Oracle, delivered by Brian Oliver. The biggest mistake I made was to assume that performance improvement with Coherence was related to the response time. Which can be considered as legitimus at that time, because after-all caches help to reduce latency on cached data access, hence reduce the response-time. But like all caches, you need to define caching and expiration policies, thinking about the cache-missed strategy, and most of the time you have to re-write partially your application in order to work with the cache. At a result, the expected benefit vanishes... so, not very useful then?The key mistake I made was my perception or obsession on how performance improvement should be driven, but I strongly believe this is still a common problem to most of the developers. In fact we all know the that the performance of a system is generally presented by the Capacity (or Throughput), with the 2 important dimensions Speed (response-time) and Volume (load) :Capacity (TPS) = Volume (T) / Speed (S)To increase the Capacity, we can either reduce the Speed(in terms of response-time), or to increase the Volume. However we tend to only focus on reducing the Speed dimension, perhaps it is more concrete and tangible to measure, and nicer to present to our management because there's a direct impact onto the end-users experience. On the other hand, we assume the Volume can be addressed by the underlying hardware or software stack, so if we need more capacity (scale out), we just add more hardware or software. Unfortunately, the reality proves that IT is never as ideal as we assume...The challenge with Speed improvement approach is that it is generally difficult and costly to make things already fast... faster. And by adding Coherence will not necessarily help either. Even though we manage to do so, the Capacity can not increase forever because... the Speed can be influenced by the Volume. For all system, we always have a performance illustration as follow: In all traditional system, the increase of Volume (Transaction) will also increase the Speed (Response-Time) as some point. The reason is simple: most of the time the Application logics were not designed to scale. As an example, if you have a while-loop in your application, it is natural to conceive that parsing 200 entries will require double execution-time compared to 100 entries. If you need to "Speed-up" the execution, you can only upgrade your hardware (scale-up) with faster CPU and/or network to reduce network latency. It is technically limited and economically inefficient. And this is exactly where XTP and Coherence kick in. The primary objective of XTP is about designing applications which can scale-out for increasing the Volume, by applying coding techniques to keep the execution-time as constant as possible, independently of the number of runtime data being manipulated. It is actually not just about having an application running as fast as possible, but about having a much more predictable system, with constant response-time and linearly scale, so we can easily increase throughput by adding more hardwares in parallel. It is in general combined with the Low Latency Programming model, where we tried to optimize the network usage as much as possible, either from the programmatic angle (less network-hoops to complete a task), and/or from a hardware angle (faster network equipments). In this picture, Oracle Coherence can be considered as software-level XTP enabler, via the Distributed-Cache because it can guarantee: - Constant Data Objects access time, independently from the number of Objects and the Coherence Cluster size - Data Objects Distribution by Affinity for in-memory data grouping - In-place Data Processing for parallel executionTo summarize, Oracle Coherence is indeed useful to improve your application performance, just not in the way we commonly think. It's not about the Speed itself, but about the overall Capacity with Extreme Load while keeping consistant Speed. In the future I will keep adding new blog entries around this topic, with some sample codes experiences sharing that I capture in the last few years. In the meanwhile if you want to know more how Oracle Coherence, I strongly suggest you to start with checking how our worldwide customers are using Oracle Coherence first, then you can start playing with the product through our tutorial.Have Fun !

    Read the article

  • What do the participants say about the Open Day in South Africa?

    - by Maria Sandu
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 On the 26th of September, a group of students who were specifically selected to attend an Open day at Oracle South Africa, joined us at our offices in Woodmead, Johannesburg. The Conference room was filled with inquisitive minds. What we had in store for them was a detailed presentation about Oracle which was delivered by Zuko - Cluster Leader: Tech GB South Africa. The student’s many questions were all answered especially when we started addressing the opportunities we have and detailed information on our Graduate Programme. Our employees then came to talk about their experience. This allowed all the students to have an integrated learning experience. By inviting the students to walk around our Oracle Offices allowed them to see, talk, experience a bit of the culture and ask more questions. Here is some of the feedback from the attendees: Maxwell Moloi: “The open day truly served its purpose and exceeded expectations in the sense that I got to find out more about Oracle and all the different opportunities it has to offer. The fact that Oracle supplies a full solution to a customer and not just part of it and how the company manages to setup professional development for their employees is what entices me to want to join the rapidly growing team of Oracle.” Nqobile Mabaso: “I found the open day to be quite informative and enlightening because coming from a marketing background I could apply the knowledge I got from varsity to the Company I was able to point out what they do as part of their corporate social responsibility (Oracle recently partnered with the department of education to build a school), how Oracle emphasizes on relationship building because they know they sell to people and not companies and how they offer the full stack of solutions which gives them a competitive advantage over their competitors.” Nondumiso Mvelase: “The Open Day was a wonderful experience for me especially because I have never been part of an Open Day before, so it was absolutely amazing for me. It gave me a good idea of how it is to be part of Oracle. We were served with lovely breakfast and lunch which I enjoyed. I wish the Open Day went on for a whole week. Seeing and hearing from 2013 Graduates, telling us about their experience within Oracle was very inspiring to me. They were encouraging us to work hard if we ever got the opportunity they had. After hearing this from them I will definitely not take it for granted.” Itumeleng Moraka: “Before I walked into the Oracle offices all that was in my mind was databases and cloud storage. I was then surrounded by passionate, enthusiastic and welcoming employees. I came across a positive energy within the multinational company. I realized that Oracle is not a company that operates in survival mode. This may sound idealistic, but they operate in a non-traditional way investing more into innovation, they stay focused on what matters most about where technology is going and at the same time they are not losing sight of how their products make a difference in the world.” For more information on how to be part of the Oracle Graduate Programme please follow us on Facebook! https://www.facebook.com/CampusAtOracle /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

    Read the article

  • Lesi, from Graduate Trainee to Territory Manager

    - by Maria Sandu
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 It’s the final year, University is now coming to an end. A new chapter now awaits my arrival. This part of my life is called “Looking for a Job”. With no form of experience whatsoever, getting a job at a well renowned IT company is something that every IT student dreams about. CV: v, Application form: v, interviews: v. Acceptance Call, “Lesi I’m pleased to inform you that you have been accepted to be part of the Oracle Graduate Program for 2012”. Life would never again be the same. Being Part of the Graduate Program Going into the Graduate program, I felt like a baby seeing candy for the first time. The Program gave me the platform to not only break in to the workplace but also to help launch my career. Over the next 3 months, I went through various trainings / workshops / events / coaching / mentorship sessions. Like a construction worker building a solid foundation for a beautifully designed architecture, a clear path to build my career was set. With training out the way, it was now time to start working closely with my team. For the rest of the year, it was all about selling. Sales, Pipeline, Forecasting and numbers soon became the common words in my career. As the saying goes, “once a sales man, always a sales man”. There was no turning back now, a career in sales was the new hustle in my life. I worked closely with my mentor & coach (Ibrahim) who was heading up Zambia and Malawi. This was to be one of my best moments in the program as I started engaging with customers and getting some hands on experience in the field. By the end of the program all the experience, hard work, training and resources came in handy as I was now ready and fully groomed to be a sales rep. Life after the Graduate Program I’m proud to say that now I’m a Territory Manager, heading up Malawi, selling Technology, Middleware & Applications across all industries. I’m part of the Transition Cluster Team, a powerful team headed by the seasoned Senior Director. As a Territory Manager my role is to push for coverage, to penetrate the market by selling Oracle from end- to- end to all accounts in Malawi. I now spend my days living out of a suitcase, moving from hotel to hotel, chasing after business in all areas of Malawi. It’s the life of a Sales Man and I’m enjoying every minute of it. I’m truly fortunate and grateful to have been part of such a wonderful graduate program. I owe my Sales career to the graduate program, and I truly hope that the program will continue to develop and to groom new talent amongst the youth of this world. If you're interested in joining the Graduate Program in South Africa keep an eye on our CampusatOracle Facebook Page page to get the latest updates! /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;}

    Read the article

  • GI ????

    - by Allen Gao
    Normal 0 7.8 ? 0 2 false false false MicrosoftInternetExplorer4 classid="clsid:38481807-CA0E-42D2-BF39-B33AF135CC4D" id=ieooui st1\:*{behavior:url(#ieooui) } /* Style Definitions */ table.MsoNormalTable {mso-style-name:????; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} ??????????11gR2 GI ?????????,??????GI????????????????????? ????????GI???????3???,ohasd??,??????,??????? ??,ohasd??? 1. /etc/inittab?????? h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null ???,??????? root 4865 1 0 Dec02 ? 00:01:01 /bin/sh /etc/init.d/init.ohasd run ??????????????,???? +init.ohasd ????????? + os????????? + ??S* ohasd????, ??S96ohasd + GI????????(crsctl enable crs) ??,ohasd.bin ??????,????OLR????,??,??ohasd.bin??????,?????OLR??????????????OLR???$GRID_HOME/cdata/${HOSTNAME}.olr 2. ohasd.bin????????agents(orarootagent, oraagent, cssdagnet ? cssdmonitor) ???????????????,?????agent??????,??????????$GRID_HOME/bin ???????????,??,?????????,??corruption. ???,??????? 1. Mdnsd ??????(Multicast)???????????????????,??????????????????????????? 2. Gpnpd ????,??????????bootstrap ??,??????????????gpnp profile???,?????mdnsd??????,???????????,?????????????,??gpnp profile (<gi_home>/gpnp/profiles/peer/profile.xml)?????????? 3. Gipcd ????,????????????????(cluster interconnect)?????,???????gpnpd???,??,??????????,?????gpnpd ??????? 4. Ocssd.bin ?????????????gpnp profile?????????(Voting Disk),????gpnpd ??????????,?????????????,??ocssd.bin ??????,?????????? + gpnp profile ?????????? + gpnpd ??????? + ??????asm disk ??????????? + ??????????? 5. ??????????:ora.ctssd, ora.asm, ora.cluster_interconnect.haip, ora.crf, ora.crsd ?? ??:????????????????ocssd.bin, gpnpd.bin ? gipcd.bin ????,??gpnpd.bin????,ocssd.bin ? gipcd.bin ?????????,?gpnpd.bin????????,ocssd.bin ? gipcd.bin ????????gpnp profile?????????? ??,????????????,?????crsd????????? 1. Crsd?????????????OCR,????OCR????ASM?,???? ASM??????,??OCR???ASM??????????OCR???????,???????????????? 2. Crsd ?????agents(orarootagent, oraagent_<rdbms_owner>, oraagent_<gi_owner> )???agent????,??????????$GRID_HOME/bin ???????????,??,?????????,??corruption. 3. ????????  ora.net1.network : ????,?????????????,scanvip, vip, listener?????????????,??????????,vip, scanvip ?listener ??offline,?????????????? ora.<scan_name>.vip:scan???vip??,?????3?? ora.<node_name>.vip : ?????vip ?? ora.<listener_name>.lsnr: ???????????????,?11gR2??,listener.ora???????,????????? ora.LISTENER_SCAN<n>.lsnr: scan ????? ora.<????>.dg: ASM ????????????????mount???,dismount???? ora.<????>.db: ???????11gR2????????????,??????????rac ????????,??????????,???????“USR_ORA_INST_NAME@SERVERNAME(<node name> )”???????,??????????ASM???,???????????????????,??dependency?????????,??????????????????,???dependancy???????,??????(crsctl modify res ……)? ora.<???>.svc:?????????11gR2 ??,?????????,???10gR2??,???????????,srv ?cs ????? ora.cvu :?????11.2.0.2???,???????cluvfy??,???????????????? ora.ons : ONS??,????????,????? ??,?????GI??????????????????? $GRID_HOME/log/<node_name>/ocssd <== ocssd.bin ?? $GRID_HOME/log/<node_name>/gpnpd <== gpnpd.bin ?? $GRID_HOME/log/<node_name>/gipcd <== gipcd.bin ?? $GRID_HOME/log/<node_name>/agent/crsd <== crsd.bin ?? $GRID_HOME/log/<node_name>/agent/ohasd <== ohasd.bin ?? $GRID_HOME/log/<node_name>/mdnsd <== mdnsd.bin ?? $GRID_HOME/log/<node_name>/client <== ????GI ??(ocrdump, crsctl, ocrcheck, gpnptool??)??????????? $GRID_HOME/log/<node_name>/ctssd <== ctssd.bin ?? $GRID_HOME/log/<node_name>/crsd <== crsd.bin ?? $GRID_HOME/log/<node_name>/cvu <== cluvfy ????????? $GRID_HOME/bin/diagcollection.sh <== ????????????????? ??,????????(/var/tmp/.oracle ? /tmp/.oracle),??????????????????ipc???,??,?????????????????????,???GI?????????????????????,??????????GI??????????????

    Read the article

  • Memory Leak Issue in Weblogic, SUN, Apache and Oracle classes Options

    - by Amit
    Hi All, Please find below the description of memory leaks issues. Statistics show major growth in the perm area (static classes). Flows were ran for 8 hours , Heap dump was taken after 2 hours and at the end. A growth in Perm area was identified Statistics show from our last run 240MB growth in 6 hour,40mb growth every hour 2GB heap –can hold ¾ days ,heap will be full in ¾ days Heap dump show –growth in area as mentioned below JMS connection/session Area Apache org.apache.xml.dtm.DTM[] org.apache.xml.dtm.ref.ExpandedNameTable$ExtendedType org.jdom.AttributeList org.jdom.Content[] org.jdom.ContentList org.jdom.Element SUN * ConstantPoolCacheKlass * ConstantPoolKlass * ConstMethodKlass * MethodDataKlass * MethodKlass * SymbolKlass byte[] char[] com.sun.org.apache.xml.internal.dtm.DTM[] com.sun.org.apache.xml.internal.dtm.ref.ExtendedType java.beans.PropertyDescriptor java.lang.Class java.lang.Long java.lang.ref.WeakReference java.lang.ref.SoftReference java.lang.String java.text.Format[] java.util.concurrent.ConcurrentHashMap$Segment java.util.LinkedList$Entry Weblogic com.bea.console.cvo.ConsoleValueObject$PropertyInfo com.bea.jsptools.tree.TreeNode com.bea.netuix.servlets.controls.content.StrutsContent com.bea.netuix.servlets.controls.layout.FlowLayout com.bea.netuix.servlets.controls.layout.GridLayout com.bea.netuix.servlets.controls.layout.Placeholder com.bea.netuix.servlets.controls.page.Book com.bea.netuix.servlets.controls.window.Window[] com.bea.netuix.servlets.controls.window.WindowMode javax.management.modelmbean.ModelMBeanAttributeInfo weblogic.apache.xerces.parsers.SecurityConfiguration weblogic.apache.xerces.util.AugmentationsImpl weblogic.apache.xerces.util.AugmentationsImpl$SmallContainer weblogic.apache.xerces.util.SymbolTable$Entry weblogic.apache.xerces.util.XMLAttributesImpl$Attribute weblogic.apache.xerces.xni.QName weblogic.apache.xerces.xni.QName[] weblogic.ejb.container.cache.CacheKey weblogic.ejb20.manager.SimpleKey weblogic.jdbc.common.internal.ConnectionEnv weblogic.jdbc.common.internal.StatementCacheKey weblogic.jms.common.Item weblogic.jms.common.JMSID weblogic.jms.frontend.FEConnection weblogic.logging.MessageLogger$1 weblogic.logging.WLLogRecord weblogic.rjvm.BubblingAbbrever$BubblingAbbreverEntry weblogic.rjvm.ClassTableEntry weblogic.rjvm.JVMID weblogic.rmi.cluster.ClusterableRemoteRef weblogic.rmi.internal.CollocatedRemoteRef weblogic.rmi.internal.PhantomRef weblogic.rmi.spi.ServiceContext[] weblogic.security.acl.internal.AuthenticatedSubject weblogic.security.acl.internal.AuthenticatedSubject$SealableSet weblogic.servlet.internal.ServletRuntimeMBeanImpl weblogic.transaction.internal.XidImpl weblogic.utils.collections.ConcurrentHashMap$Entry Oracle XA Transaction oracle.jdbc.driver.Binder[] oracle.jdbc.driver.OracleDatabaseMetaData oracle.jdbc.driver.T4C7Ocommoncall oracle.jdbc.driver.T4C7Oversion oracle.jdbc.driver.T4C8Oall oracle.jdbc.driver.T4C8Oclose oracle.jdbc.driver.T4C8TTIBfile oracle.jdbc.driver.T4C8TTIBlob oracle.jdbc.driver.T4C8TTIClob oracle.jdbc.driver.T4C8TTIdty oracle.jdbc.driver.T4C8TTILobd oracle.jdbc.driver.T4C8TTIpro oracle.jdbc.driver.T4C8TTIrxh oracle.jdbc.driver.T4C8TTIuds oracle.jdbc.driver.T4CCallableStatement oracle.jdbc.driver.T4CClobAccessor oracle.jdbc.driver.T4CConnection oracle.jdbc.driver.T4CMAREngine oracle.jdbc.driver.T4CNumberAccessor oracle.jdbc.driver.T4CPreparedStatement oracle.jdbc.driver.T4CTTIdcb oracle.jdbc.driver.T4CTTIk2rpc oracle.jdbc.driver.T4CTTIoac oracle.jdbc.driver.T4CTTIoac[] oracle.jdbc.driver.T4CTTIoauthenticate oracle.jdbc.driver.T4CTTIokeyval oracle.jdbc.driver.T4CTTIoscid oracle.jdbc.driver.T4CTTIoses oracle.jdbc.driver.T4CTTIOtxen oracle.jdbc.driver.T4CTTIOtxse oracle.jdbc.driver.T4CTTIsto oracle.jdbc.driver.T4CXAConnection oracle.jdbc.driver.T4CXAResource oracle.jdbc.oracore.OracleTypeADT[] oracle.jdbc.xa.OracleXAResource$XidListEntry oracle.net.ano.Ano oracle.net.ns.ClientProfile oracle.net.ns.ClientProfile oracle.net.ns.NetInputStream oracle.net.ns.NetOutputStream oracle.net.ns.SessionAtts oracle.net.nt.ConnOption oracle.net.nt.ConnStrategy oracle.net.resolver.AddrResolution oracle.sql.CharacterSet1Byte we are using Oracle BEA Weblogic 9.2 MP3 JDK 1.5.12 Oracle versoin 10.2.0.4 (for oracle we found one path which is needed to applied to avoid XA transaction memory leaks). But we are stuck to resolve SUN, BEA Weblgogic and Apache leaks. please suggest... regards, Amit J.

    Read the article

  • Loading, listing, and using R Modules and Functions in PL/R

    - by Dave Jarvis
    I am having difficulty with: Listing the R packages and functions available to PostgreSQL. Installing a package (such as Kendall) for use with PL/R Calling an R function within PostgreSQL Listing Available R Packages Q.1. How do you find out what R modules have been loaded? SELECT * FROM r_typenames(); That shows the types that are available, but what about checking if Kendall( X, Y ) is loaded? For example, the documentation shows: CREATE TABLE plr_modules ( modseq int4, modsrc text ); That seems to allow inserting records to dictate that Kendall is to be loaded, but the following code doesn't explain, syntactically, how to ensure that it gets loaded: INSERT INTO plr_modules VALUES (0, 'pg.test.module.load <-function(msg) {print(msg)}'); Q.2. What would the above line look like if you were trying to load Kendall? Q.3. Is it applicable? Installing R Packages Using the "synaptic" package manager the following packages have been installed: r-base r-base-core r-base-dev r-base-html r-base-latex r-cran-acepack r-cran-boot r-cran-car r-cran-chron r-cran-cluster r-cran-codetools r-cran-design r-cran-foreign r-cran-hmisc r-cran-kernsmooth r-cran-lattice r-cran-matrix r-cran-mgcv r-cran-nlme r-cran-quadprog r-cran-robustbase r-cran-rpart r-cran-survival r-cran-vr r-recommended Q.4. How do I know if Kendall is in there? Q.5. If it isn't, how do I find out what package it is in? Q.6. If it isn't in a package suitable for installing with apt-get (aptitude, synaptic, dpkg, what have you), how do I go about installing it on Ubuntu? Q.7. Where are the installation steps documented? Calling R Functions I have the following code: EXECUTE 'SELECT ' 'regr_slope( amount, year_taken ),' 'regr_intercept( amount, year_taken ),' 'corr( amount, year_taken ),' 'sum( measurements ) AS total_measurements ' 'FROM temp_regression' INTO STRICT slope, intercept, correlation, total_measurements; This code calls the PostgreSQL function corr to calculate Pearson's correlation over the data. Ideally, I'd like to do the following (by switching corr for plr_kendall): EXECUTE 'SELECT ' 'regr_slope( amount, year_taken ),' 'regr_intercept( amount, year_taken ),' 'plr_kendall( amount, year_taken ),' 'sum( measurements ) AS total_measurements ' 'FROM temp_regression' INTO STRICT slope, intercept, correlation, total_measurements; Q.8. Do I have to write plr_kendall myself? Q.9. Where can I find a simple example that walks through: Loading an R module into PG. Writing a PG wrapper for the desired R function. Calling the PG wrapper from a SELECT. For example, would the last two steps look like: create or replace function plr_kendall( _float8, _float8 ) returns float as ' agg_kendall(arg1, arg2) ' language 'plr'; CREATE AGGREGATE agg_kendall ( sfunc = plr_array_accum, basetype = float8, -- ??? stype = _float8, -- ??? finalfunc = plr_kendall ); And then the SELECT as above? Thank you!

    Read the article

  • Tridion Installation

    - by Kevin Brydon
    I am currently upgrading an installation of Tridion from 5.3 to 2011 starting almost from scratch (aside from migrating the database), brand new virtual servers. I just want to ask for some advice on my current server setup... a sanity check. All servers are running Windows Server 2008. The pages on our website are all classic ASP. Database SQL Server cluster. The 5.3 database has been migrated using the DatabaseManager. This is pretty standard and works well (in test anyway). Content Manager A single server to run the Content Manager and the Publisher. There are around 10 people using it at any one time so not under a particularly heavy load. Content Data Store Filesystem located somewhere on the network. One directory for live and one for staging. Content Delivery Two servers (cd1 and cd2) each with the the following server roles installed. cd1 writes to a filesystem content data store for the live website, cd2 writes to the content data store for the staging website. Presentation Two public facing web servers (web1 and web2) serving both the live and staging websites. The web servers read directly from the content data store as its a filesystem. Each of the web servers have the Content Delivery Server installed so that I can use dynamic linking (and other features?). I've so far set up everything but the web servers. Any thoughts? edit Thanks to Ram S who linked me to a decent walkthrough, upvoted. I suppose I should have posed some questions as I didn't really ask a question. I guess I'm a little confused over the content deliver aspect. I have the Content Delivery split in two separate parts. cd1 and cd2 do the work of shifting information from the Content Manager to the Staging/Live web directories. web1 and web2 should do the work of serving the web pages to the outside world and will interact with the content data store (file system). Is this a correct setup? I need some parts of the Content Delivery on my web servers right? Theoretically I could get rid of the cd1 and cd2 servers and use web1 and web2 to do the deployment right? But I suspect this will put the web servers under unnecessary strain should there ever be a big publish. I've been reading the 2011 Installation Manual, Content Delivery section, and I'm finding it quite hard to get my head around!

    Read the article

  • Problem with bootstrap loader and kernel

    - by dboarman-FissureStudios
    We are working on a project to learn how to write a kernel and learn the ins and outs. We have a bootstrap loader written and it appears to work. However we are having a problem with the kernel loading. I'll start with the first part: bootloader.asm: [BITS 16] [ORG 0x0000] ; ; all the stuff in between ; ; the bottom of the bootstrap loader datasector dw 0x0000 cluster dw 0x0000 ImageName db "KERNEL SYS" msgLoading db 0x0D, 0x0A, "Loading Kernel Shell", 0x0D, 0x0A, 0x00 msgCRLF db 0x0D, 0x0A, 0x00 msgProgress db ".", 0x00 msgFailure db 0x0D, 0x0A, "ERROR : Press key to reboot", 0x00 TIMES 510-($-$$) DB 0 DW 0xAA55 ;************************************************************************* The bootloader.asm is too long for the editor without causing it to chug and choke. In addition, the bootloader and kernel do work within bochs as we do get the message "Welcome to our OS". Anyway, the following is what we have for a kernel at this point. kernel.asm: [BITS 16] [ORG 0x0000] [SEGMENT .text] ; code segment mov ax, 0x0100 ; location where kernel is loaded mov ds, ax mov es, ax cli mov ss, ax ; stack segment mov sp, 0xFFFF ; stack pointer at 64k limit sti mov si, strWelcomeMsg ; load message call _disp_str mov ah, 0x00 int 0x16 ; interrupt: await keypress int 0x19 ; interrupt: reboot _disp_str: lodsb ; load next character or al, al ; test for NUL character jz .DONE mov ah, 0x0E ; BIOS teletype mov bh, 0x00 ; display page 0 mov bl, 0x07 ; text attribute int 0x10 ; interrupt: invoke BIOS jmp _disp_str .DONE: ret [SEGMENT .data] ; initialized data segment strWelcomeMsg db "Welcome to our OS", 0x00 [SEGMENT .bss] ; uninitialized data segment Using nasm 2.06rc2 I compile as such: nasm bootloader.asm -o bootloader.bin -f bin nasm kernel.asm -o kernel.sys -f bin We write bootloader.bin to the floppy as such: dd if=bootloader.bin bs=512 count=1 of/dev/fd0 We write kernel.sys to the floppy as such: cp kernel.sys /dev/fd0 As I stated, this works in bochs. But booting from the floppy we get output like so: Loading Kernel Shell ........... ERROR : Press key to reboot Other specifics: OpenSUSE 11.2, GNOME desktop, AMD x64 Any other information I may have missed, feel free to ask. I tried to get everything in here that would be needed. If I need to, I can find a way to get the entire bootloader.asm posted somewhere. We are not really interested in using GRUB either for several reasons. This could change, but we want to see this boot successful before we really consider GRUB.

    Read the article

  • With sqlalchemy how to dynamically bind to database engine on a per-request basis

    - by Peter Hansen
    I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security. Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date. We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter. Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files. My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request. We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation. Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this: # in lib/base.py on the BaseController class def __call__(self, environ, start_response): # note: web session contains {'username': XXX, 'password': YYY} url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session finance = create_engine(url1) staff = create_engine(url2) db_configure(staff, finance) # see below ... etc # in another file Session = scoped_session(sessionmaker()) def db_configure(staff, finance): s = Session() from db.finance import Employee, Customer, Invoice for c in [ Employee, Customer, Invoice, ]: s.bind_mapper(c, finance) from db.staff import Project, Hour for c in [ Project, Hour, ]: s.bind_mapper(c, staff) s.close() # prevents leaking connections between sessions? So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly. But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way. Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.

    Read the article

< Previous Page | 69 70 71 72 73 74 75 76 77  | Next Page >