Search Results

Search found 7116 results on 285 pages for 'nested queries'.


  • Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the operational database in the Big Data story. In this article we will look at what Hive and HQL are in the Big Data story. Yahoo started working on Pig (we will cover that in the next blog post) for their application deployment on Hadoop. Yahoo's goal was to manage their unstructured data. Similarly, Facebook started deploying their warehouse solutions on Hadoop, which resulted in Hive. The reason for going with Hive is that traditional warehousing solutions are getting very expensive. What is Hive? Hive is a data warehousing infrastructure for Hadoop. Its primary responsibility is to provide data summarization, query and analysis. It supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of Hive is that it supports SQL-like access to structured data, known as HiveQL (or HQL), as well as big data analysis with the help of MapReduce. Hive is not built to give a quick response to queries; it is built for data mining applications. Data mining applications can take from several minutes to several hours to analyze the data, and Hive is primarily used there. Hive Organization: The data are organized in three different formats in Hive. Tables: They are very similar to RDBMS tables and contain rows and columns. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystem. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive, data may be divided into buckets. Buckets are stored as files in the partition in the underlying file system. Hive also has a metastore which stores all the metadata. It is a relational database containing various information related to the Hive schema (column types, owners, key-value data, statistics etc.).
A MySQL database can be used for the metastore. What is HiveQL (HQL)? The Hive query language provides basic SQL-like operations. Here are a few of the tasks which HQL can do easily: create and manage tables and partitions; support various relational, arithmetic and logical operators; evaluate functions; download the contents of a table to a local directory, or the result of queries to an HDFS directory. Here are two examples of HQL queries: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above queries, you can see they are very similar to SQL queries. Tomorrow: In tomorrow’s blog post we will discuss a very important component of the Big Data ecosystem, Pig. Reference: Pinal Dave (http://blog.sqlauthority.com)
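To see what the two sample queries return, the sketch below runs them unchanged against an in-memory SQLite database from Python. SQLite is only a stand-in here (Hive would turn the same statements into MapReduce jobs over data in HDFS), and the table contents are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive. Assumption: the
# sample rows are made up; only the two SELECT statements come from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (name TEXT, salesprice REAL);
    CREATE TABLE products (name TEXT, category TEXT);
    INSERT INTO sales VALUES ('widget', 9.5), ('gadget', 20.0);
    INSERT INTO products VALUES
        ('widget', 'tools'), ('gadget', 'tools'), ('novel', 'books');
""")

# Query 1: apply a scalar function to every row.
upper_rows = conn.execute("SELECT upper(name), salesprice FROM sales").fetchall()

# Query 2: count rows per category.
category_counts = dict(
    conn.execute("SELECT category, count(1) FROM products GROUP BY category")
)

print(upper_rows)
print(category_counts)
```

The statements themselves are plain SQL, which is the point the post makes: moving from an RDBMS to HiveQL changes where and how the query executes far more than how it is written.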


  • Looking under the hood of SSRS

    - by Jim Giercyk
    SSRS is a powerful tool, but there is very little available to measure its performance or view the SSRS execution log or catalog in detail. Here are a few simple queries that will give you insight into the system that you never had before.

    ACTIVE REPORTS: Have you ever seen your SQL Server performance take a nose dive due to a long-running report? If the SPID is executing under a generic Report ID, or it is a scheduled job, you may have no way to tell which report is killing your server. Running this query will show you which reports are executing at a given time, and WHO is executing them.

        USE ReportServerNative
        SELECT runningjobs.computername,
               runningjobs.requestname,
               runningjobs.startdate,
               users.username,
               Datediff(s, runningjobs.startdate, Getdate()) / 60 AS 'Active Minutes'
        FROM runningjobs
             INNER JOIN users ON runningjobs.userid = users.userid
        ORDER BY runningjobs.startdate

    SSRS CATALOG: We have all asked “What was the last thing that changed”, or better yet, “Who in the world did that!”. Here is a query that will show the reports in your SSRS catalog, when they were created and changed, and by whom.

        USE ReportServerNative
        SELECT DISTINCT catalog.PATH,
               catalog.name,
               users.username AS [Created By],
               catalog.creationdate,
               users_1.username AS [Modified By],
               catalog.modifieddate
        FROM catalog
             INNER JOIN users ON catalog.createdbyid = users.userid
             INNER JOIN users AS users_1 ON catalog.modifiedbyid = users_1.userid
             INNER JOIN executionlogstorage ON catalog.itemid = executionlogstorage.reportid
        WHERE ( catalog.name <> '' )

    SSRS EXECUTION LOG: Sometimes we need to know what was happening on the SSRS report server at a given time in the past. This query will help you do just that.
You will need to set the timestart and timeend in the WHERE clause to suit your needs.

        USE ReportServerNative
        SELECT catalog.name AS report,
               e.username AS [User],
               e.timestart,
               e.timeend,
               Datediff(mi, e.timestart, e.timeend) AS 'Time In Minutes',
               catalog.modifieddate AS [Report Last Modified],
               users.username
        FROM   catalog (nolock)
               INNER JOIN executionlogstorage e (nolock)
                 ON catalog.itemid = e.reportid
               INNER JOIN users (nolock)
                 ON catalog.modifiedbyid = users.userid
        WHERE  e.timestart >= Dateadd(s, -1, '03/31/2012')
               AND e.timeend <= Dateadd(DAY, 1, '04/02/2012')

    LONG RUNNING REPORTS: This query will show the longest running reports over a given time period. Note that the “>5” in the WHERE clause sets the report threshold at 5 minutes, so anything that ran less than 5 minutes will not appear in the result set. Adjust the threshold and start/end times to your liking. With this information in hand, you can better optimize your system by tweaking the longest running reports first.
        USE ReportServerNative
        SELECT e.instancename,
               catalog.PATH,
               catalog.name,
               e.username,
               e.timestart,
               e.timeend,
               Datediff(mi, e.timestart, e.timeend) AS 'Minutes',
               e.timedataretrieval,
               e.timeprocessing,
               e.timerendering,
               e.[RowCount],
               users_1.username AS createdby,
               CONVERT(VARCHAR(10), catalog.creationdate, 101) AS 'Creation Date',
               users.username AS modifiedby,
               CONVERT(VARCHAR(10), catalog.modifieddate, 101) AS 'Modified Date'
        FROM   executionlogstorage e
               INNER JOIN catalog
                 ON e.reportid = catalog.itemid
               INNER JOIN users
                 ON catalog.modifiedbyid = users.userid
               INNER JOIN users AS users_1
                 ON catalog.createdbyid = users_1.userid
        WHERE  ( e.timestart > '03/31/2012' )
               AND ( e.timestart <= '04/02/2012' )
               AND Datediff(mi, e.timestart, e.timeend) > 5
               AND catalog.name <> ''
        ORDER  BY 'Minutes' DESC

    I have used these queries to build SSRS reports that I can refer to quickly, and export to Excel if I need to report or quantify my findings. I encourage you to look at the data in the ReportServerNative database on your report server to understand the queries and create some of your own. For instance, you may want a query to determine which reports are using which shared data sources. Work smarter, not harder!
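One detail worth keeping in mind when reading the minute figures from these queries: DATEDIFF counts the datepart boundaries crossed between two values, not the elapsed time, so Datediff(mi, ...) and Datediff(s, ...) / 60 can differ by one minute for the same run. The Python sketch below mimics both calculations. The function names and timestamps are invented for illustration and are not part of the original queries.

```python
from datetime import datetime

def datediff_mi(start: datetime, end: datetime) -> int:
    """Mimic T-SQL DATEDIFF(mi, start, end): count minute boundaries crossed."""
    start_floor = start.replace(second=0, microsecond=0)
    end_floor = end.replace(second=0, microsecond=0)
    return int((end_floor - start_floor).total_seconds()) // 60

def datediff_s_div_60(start: datetime, end: datetime) -> int:
    """Mimic DATEDIFF(s, start, end) / 60 (integer division), for end >= start."""
    return int((end - start).total_seconds()) // 60

# Hypothetical report run: 5 minutes 20 seconds, starting late in the minute.
long_start = datetime(2012, 4, 1, 10, 0, 50)
long_end = datetime(2012, 4, 1, 10, 6, 10)

# Hypothetical report run: 2 seconds, straddling a minute boundary.
short_start = datetime(2012, 4, 1, 10, 0, 59)
short_end = datetime(2012, 4, 1, 10, 1, 1)

print(datediff_mi(long_start, long_end), datediff_s_div_60(long_start, long_end))      # 6 5
print(datediff_mi(short_start, short_end), datediff_s_div_60(short_start, short_end))  # 1 0
```

If exact durations matter for a threshold such as the 5-minute filter, computing the difference in seconds and dividing is the more predictable of the two.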


  • Getting Started with StreamInsight 2.1

    - by Roman Schindlauer
    If you're just beginning to get familiar with StreamInsight, you may be looking for a way to get started. What are the basics? How can I get my first StreamInsight application running so I can see how it works? Where is the 'front door' that will get me going? If that describes you, then this blog entry might be just what you need. If you're already a StreamInsight wiz, keep reading anyway - you may find some helpful links here that you weren't aware of. But here's what we'd like from you experienced readers in particular: if you know of other good resources that we missed, please feel free to add them in the comments below. We appreciate you sharing your expertise. The Book The basic documentation for StreamInsight is located in the MSDN Library (Microsoft StreamInsight 2.1). You'll notice that previous versions of StreamInsight are still there (1.2 and 2.0), but if you're just getting started you can stick to the 2.1 section. The documentation has been organized to function as reference material, which is fine after you're familiar with the technology. But if you're trying to learn the basics, you might want to take a different path instead of just starting at the top. The following is one map you can use. What Is StreamInsight? Here is a sequence of topics that should give you a good overview of what StreamInsight is and how it works: Overview answers the question, "what is it?" 
StreamInsight Server Architecture gives you a quick look at a high-level architectural drawing StreamInsight Concepts lays out an overview of the basic components Deploying StreamInsight Entities to a StreamInsight Server describes the mechanics of how these components work together Getting an Example Running Once you have this background, go ahead and install StreamInsight and get a basic example up and running: Installation download and install the software StreamInsight Examples walk through a set of 3 simple StreamInsight applications that work together to demonstrate what you learned in the topics above; you can copy and paste the code into Visual Studio, compile, and run That's it - you now have a real, functioning StreamInsight system! Now that you have a handle on the basics, you might want to start digging deeper. Digging Deeper Here's a suggested path through the documentation to help you understand the next layer of StreamInsight technologies: Using Event Sources and Event Sinks sources supply data and sinks consume it; this topic gives you an overview of how they work Publishing and Connecting to the StreamInsight Server practical details on how to set up a StreamInsight server A Hitchhiker’s Guide to StreamInsight 2.1 Queries queries are the heart of how StreamInsight performs data analytics, and this whitepaper will help you really understand how they work Using StreamInsight LINQ root through this section for technical details on specific query components Using the StreamInsight Event Flow Debugger in addition to troubleshooting, the debugger is a great way to learn more about what goes on inside a StreamInsight application And Even Deeper Finally, to get a handle on some of the more complex things you can do with StreamInsight, dig into these: Input and Output Adapters adapters can be useful for handling more complex sources and sinks Building Resilient StreamInsight Applications a resilient application is able to recover from system failures 
Operations this section will help you monitor and troubleshoot a running StreamInsight system The StreamInsight Community As you're designing and developing your StreamInsight solutions, you probably will find it helpful to see working examples or to learn tips and tricks from others. Or maybe you need a place to post a vexing question. Here are some community resources that we have found useful. If you know of others, please add them in the comments below. Code samples and tools Official StreamInsight code samples Introduction to LinqPad Driver for StreamInsight 2.1 - LinqPad is a very useful tool for developing queries The following case studies are based on earlier versions of StreamInsight, but they still are useful examples: Microsoft Media Analytics - real-time monitoring and analytic Edgenet - responding to information from multiple source ICONICS - managing energy usage Blogs Microsoft StreamInsight Ruminations of J.net Richard Seroter's Architecture Musings pluralsight Forums MSDN StreamInsight Forum stackoverflow Training Microsoft StreamInsight Fundamentals (“Introducing StreamInsight” is free) from pluralsight Twitter @streaminsight   You’re a StreamInsight Expert That should get you going. Please add any other resources you have found useful in the comments below.   Regards, The StreamInsight Team


  • Tuning Red Gate: #1 of Many

    - by Grant Fritchey
    Everyone runs into performance issues at some point. Same thing goes for Red Gate software. Some of our internal systems were running into some serious bottlenecks. It just so happens that we have this nice little SQL Server monitoring tool. What if I were to, oh, I don't know, use the monitoring tool to identify the bottlenecks, figure out the causes and then apply a fix (where possible) and then start the whole thing all over again? Just a crazy thought. OK, I was asked to. This is my first time looking through these servers, so here's how I'd go about using SQL Monitor to get a quick health check, sort of like checking the vitals on a patient. First time opening up our internal SQL Monitor instance and I was greeted with this: Oh my. Maybe I need to get our internal guys to read my blog. Anyway, I know that there are two servers where most of the load is. I'll drill down on the first. I'm selecting the server, not the instance, by clicking on the server name. That opens up the Global Overview page for the server. The information here is much more applicable to the "oh my gosh, I have a problem now" type of monitoring. But, looking at this, I am seeing something immediately. There are four (4) drives on the system. The C:\ drive has an average read time of 16.9ms, more than double the others. Is that a problem? Not sure, but it's something I'll look at. Its write time is higher too. I'll keep drilling down, first, to the unclosed alerts on the server. Now things get interesting. SQL Monitor has a number of different types of alerts, some related to error states, others to service status, and then some related to performance. Guess what I'm seeing a bunch of right here: Long running queries and long job durations. If you check the dates, they're all recent, within the last 24 hours. If they had just been old, uncleared alerts, I wouldn't be that concerned. But with all these, all performance related, and all in the last 24 hours, yeah, I'm concerned.
At this point, I could just start responding to the alerts. If I click on one of the Long-running query alerts, I'll get all kinds of cool data that can help me determine why the query ran long. But, I'm not in a reactive mode here yet. I'm still gathering data, trying to understand how the server works. I have the information that we're generating a lot of performance alerts; let's sock that away for the moment. Instead, I'm going to back up and look at the Global Overview for the SQL instance. It shows all the databases on the server and their status. Then it shows a number of basic metrics about the SQL Server instance, again for that "what's happening now" view of things. Then, down at the bottom, there is the Top 10 expensive queries list: This is great stuff. And no, not because I can see the top queries for the last 5 minutes, but because I can adjust that out to 3 days. Now I can see where some serious pain is occurring over the last few days. Databases have been blocked out to protect the guilty. That's it for the moment. I have enough knowledge of what's going on in the system that I can start to try to figure out why the system is running slowly. But, I want to look a little more at some historical data, to understand better how this server is behaving. More next time.


  • SQL SERVER – Puzzle #1 – Querying Pattern Ranges and Wild Cards

    - by Pinal Dave
    Note: Read to the end of the blog post to find out how you can win one of five copies of Joes 2 Pros Book #1 and a surprise gift. I have been blogging for almost 7 years and every other day I receive questions about Querying Pattern Ranges. The most common way to solve the problem is to use wildcards. However, not everyone knows how to use wildcards properly. SQL Queries 2012 Joes 2 Pros Volume 1 – The SQL Queries 2012 Hands-On Tutorial for Beginners Book On Amazon | Book On Flipkart Learn SQL Server get all the five parts combo kit Kit on Amazon | Kit on Flipkart Many people know wildcards are great for finding patterns in character data. There are also some special sequences with wildcards that can give you even more power. This series from SQL Queries 2012 Joes 2 Pros® Volume 1 will show you some of these cool tricks. All supporting files are available with a free download from the www.Joes2Pros.com web site. This example is from the SQL 2012 series Volume 1 in the file SQLQueries2012Vol1Chapter2.2Setup.sql. If you need help setting up then look in the “Free Videos” section on Joes2Pros under “Getting Started” called “How to install your labs”. Querying Pattern Ranges: The % wildcard character represents any string of zero or more characters. Let’s find all first names that end in the letter ‘A’. By using the percentage ‘%’ sign with the letter ‘A’, we achieve this goal using the code sample below: SELECT * FROM Employee WHERE FirstName LIKE '%A' To find all FirstName values beginning with the letters ‘A’ or ‘B’ we can use two predicates in our WHERE clause, separating them with the OR operator.
Finding names beginning with an ‘A’ or ‘B’ is easy and this works fine until we want a larger range of letters, as in the example below for ‘A’ thru ‘K’:

        SELECT * FROM Employee
        WHERE FirstName LIKE 'A%' OR FirstName LIKE 'B%' OR FirstName LIKE 'C%'
           OR FirstName LIKE 'D%' OR FirstName LIKE 'E%' OR FirstName LIKE 'F%'
           OR FirstName LIKE 'G%' OR FirstName LIKE 'H%' OR FirstName LIKE 'I%'
           OR FirstName LIKE 'J%' OR FirstName LIKE 'K%'

    The previous query does find FirstName values beginning with the letters ‘A’ thru ‘K’. However, when a query requires a large range of letters, the LIKE operator has an even better option. Since the first letter of the FirstName field can be ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’, ‘H’, ‘I’, ‘J’ or ‘K’, simply list all these choices inside a set of square brackets followed by the ‘%’ wildcard, as in the example below:

        SELECT * FROM Employee WHERE FirstName LIKE '[ABCDEFGHIJK]%'

    A more elegant example of this technique recognizes that all these letters are in a continuous range, so we really only need to list the first and last letter of the range inside the square brackets (LIKE '[A-K]%'), followed by the ‘%’ wildcard allowing for any number of characters after the first letter in the range. Note: A predicate that uses a range will not work with the ‘=’ operator (equals sign). It will not raise an error, but it will not return the rows you expect either, because ‘=’ compares against the pattern as a literal string.

        --Bad query (will not error or return any records)
        SELECT * FROM Employee WHERE FirstName = '[A-K]%'

    Question: You want to find all first names that start with the letters A-M in your Customer table and end with the letter Z. Which SQL code would you use?

        a. SELECT * FROM Customer WHERE FirstName LIKE 'm%z'
        b. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z'
        c. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z'
        d. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z'
        e. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%'
        f. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z'
        g. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%'

    Contest: Leave a valid answer before June 18, 2013 in the comment section. 5 winners will be selected from all the valid answers and will receive Joes 2 Pros Book #1. 1 lucky person will get a surprise gift from Joes 2 Pros. The contest is open to all the countries where Amazon ships the book (USA, UK, Canada, India and many others). Special Note: Read all the options before you provide a valid answer, as there is a small trick hidden in the answers. Reference: Pinal Dave (http://blog.sqlauthority.com)
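For readers without a SQL Server instance at hand, the following Python sketch imitates how LIKE treats `%`, `_` and bracketed ranges by translating a pattern into a regular expression. It is a simplified stand-in, not SQL Server's implementation: it ignores the ESCAPE clause, assumes a case-insensitive collation, and the helper names and sample first names are invented for illustration.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a T-SQL LIKE pattern into a compiled regular expression.

    Simplified sketch: supports %, _, [set] and [^set]; no ESCAPE clause.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")            # any string of zero or more characters
        elif ch == "_":
            parts.append(".")             # exactly one character
        elif ch == "[" and "]" in pattern[i + 2:]:
            end = pattern.index("]", i + 2)
            body = pattern[i + 1:end].replace("\\", "\\\\").replace("[", "\\[")
            parts.append("[" + body + "]")  # set or range, e.g. [A-K]
            i = end
        else:
            parts.append(re.escape(ch))   # everything else matches literally
        i += 1
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def sql_like(value: str, pattern: str) -> bool:
    """Return True when value matches the LIKE pattern in full."""
    return like_to_regex(pattern).fullmatch(value) is not None

names = ["Alex", "Barbara", "Lisa", "Zack"]
print([n for n in names if sql_like(n, "%A")])      # names ending in 'a'
print([n for n in names if sql_like(n, "[A-K]%")])  # range inside brackets
print([n for n in names if sql_like(n, "A-K%")])    # no brackets: literal "A-K"
print([n for n in names if n == "[A-K]%"])          # '=' compares literally
```

The last two lines show why the brackets and the LIKE operator both matter: without brackets the hyphen is just a character, and with `=` the whole pattern is just a string.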


  • PASS Summit 2012: keynote and Mobile BI announcements #sqlpass

    - by Marco Russo (SQLBI)
    Today at PASS Summit 2012 there were several announcements during the keynote. Moreover, other news was not highlighted in the keynote but is equally if not more important for the BI community. Let’s start with the big news in the keynote (other details on the SQL Server Blog): Hekaton: this is the codename for the in-memory OLTP technology that will appear (I suppose) in the next release of the SQL Server relational engine. The improvement in performance and scalability is impressive and it enables new scenarios. I’m curious to see whether it can also be used to improve ETL performance and how it differs from using SSD technology. Updates on Columnstore: In the next major release of SQL Server the columnstore indexes will be updatable and it will be possible to create a clustered index with a Columnstore index. This is really great news for near real-time reporting needs! Polybase: SQL Server 2012 Parallel Data Warehouse (PDW) will debut in 2013 and will include the Polybase technology. By using Polybase a single T-SQL query will run queries across relational data and Hadoop data. A single query language for both. Sounds really interesting for using BigData in a more integrated way with existing relational databases. And, of course, to load a data warehouse using BigData, which is the ultimate goal that all of us BI pros have, right? SQL Server 2012 SP1: Service Pack 1 for SQL Server 2012 is available now and it enables the use of PowerPivot for SharePoint and Power View on a SharePoint 2013 installation with Excel 2013. Power View works with Multidimensional cubes: the long-awaited feature of being able to use Power View with Multidimensional cubes was shown by Amir Netz in an amazing demonstration during the keynote. The interesting thing is that the data model behind it was based on a many-to-many relationship (something that is not fully supported by Power View with Tabular models).
Another interesting aspect is that it is Analysis Services 2012 that supports DAX queries run on a Multidimensional model, enabling the use of any future tool generating DAX queries on top of a Multidimensional model. There is still no information about availability, but this is *not* included in SQL Server 2012 SP1. So what about Mobile BI? Well, even if it was not announced during the keynote, there is a dedicated session on this topic and there is very important news in this area: iOS, Android and Microsoft mobile platforms: the commitment is to get data exploration and visualization capabilities working by June 2013. This should impact at least Power View and SharePoint/Excel Services. This is the type of UI experience we are all waiting for, in order to satisfy the requests coming from users and customers. The important news here is that native applications will be available for both iOS and Windows 8, so it seems that Android will initially be supported only through the web. Unfortunately we haven’t seen any demo, so it’s not clear what the offline navigation experience will be (and whether there will be one). But at least we know that Microsoft is working on native applications in this area. I’m not too surprised that HTML5 is not the magic bullet for all the platforms. The next PASS Business Analytics conference in 2013 seems a good place to see this in action, even if I hope we don’t have to wait another six months before seeing some demo of native BI applications on mobile platforms! Viewing Reporting Services reports on iPad is supported starting with SQL Server 2012 SP1, which has been released today. This is another good reason to install SP1 on SQL Server 2012. If you are at PASS Summit 2012, come and join me, Alberto Ferrari and Chris Webb at our book signing event tomorrow, Thursday 8 2012, at the bookstore between 12:00pm and 12:30pm, or follow one of our sessions!


  • SQL Server Optimizer Malfunction?

    - by Tony Davis
    There was a sharp intake of breath from the audience when Adam Machanic declared the SQL Server optimizer to be essentially "stuck in 1997". It was during his fascinating "Query Tuning Mastery: Manhandling Parallelism" session at the recent PASS SQL Summit. Paraphrasing somewhat, Adam (blog | @AdamMachanic) offered a convincing argument that the optimizer often delivers flawed plans based on assumptions that are no longer valid with today’s hardware. In 1997, when Microsoft engineers re-designed the database engine for SQL Server 7.0, SQL Server got its initial implementation of a cost-based optimizer. Up to SQL Server 2000, the developer often had to deploy a steady stream of hints in SQL statements to combat the occasionally wilful plan choices made by the optimizer. However, with each successive release, the optimizer has evolved and improved in its decision-making. It is still prone to the occasional stumble when we tackle difficult problems, join large numbers of tables, perform complex aggregations, and so on, but for most of us, most of the time, the optimizer purrs along efficiently in the background. Adam, however, challenged further any assumption that the current optimizer is competent at providing the most efficient plans for our more complex analytical queries, and in particular of offering up correctly parallelized plans. He painted a picture of a present where complex analytical queries have become ever more prevalent; where disk IO is ever faster so that reads from disk come into buffer cache faster than ever; where the improving RAM-to-data ratio means that we have a better chance of finding our data in cache. Most importantly, we have more CPUs at our disposal than ever before. To get these queries to perform, we not only need to have the right indexes, but also to be able to split the data up into subsets and spread its processing evenly across all these available CPUs. 
Improvements such as support for ColumnStore indexes are taking things in the right direction, but, unfortunately, deficiencies in the current Optimizer mean that SQL Server is yet to be able to exploit properly all those extra CPUs. Adam’s contention was that the current optimizer uses essentially the same costing model for many of its core operations as it did back in the days of SQL Server 7, based on assumptions that are no longer valid. One example he gave was a "slow disk" bias that may have been valid back in 1997 but certainly is not on modern disk systems. Essentially, the optimizer assesses the relative cost of serial versus parallel plans based on the assumption that there is no IO cost benefit from parallelization, only CPU. It assumes that a single request will saturate the IO channel, and so a query would not run any faster if we parallelized IO because the disk system simply wouldn’t be able to handle the extra pressure. As such, the optimizer often decides that a serial plan is lower cost, often in cases where a parallel plan would improve performance dramatically. It was challenging and thought provoking stuff, as were his techniques for driving parallelism through query logic based on subsets of rows that define the "grain" of the query. I highly recommend you catch the session if you missed it. I’m interested to hear though, when and how often people feel the force of the optimizer’s shortcomings. Barring mistakes, such as stale statistics, how often do you feel the Optimizer fails to find the plan you think it should, and what are the most common causes? Is it fighting to induce it toward parallelism? Combating unexpected plans, arising from table partitioning? Something altogether more prosaic? Cheers, Tony.


  • The five steps of business intelligence adoption: where are you?

    - by Red Gate Software BI Tools Team
    When I was in Orlando and New York last month, I spoke to a lot of business intelligence users. What they told me suggested a path of BI adoption. The user’s place on the path depends on the size and sophistication of their organisation. Step 1: A company with a database of customer transactions will often want to examine particular data, like revenue and unit sales over the last period for each product and territory. To do this, they probably use simple SQL queries or stored procedures to produce data on demand. Step 2: The results from step one are saved in an Excel document, so business users can analyse them with filters or pivot tables. Alternatively, SQL Server Reporting Services (SSRS) might be used to generate a report of the SQL query for display on an intranet page. Step 3: If these queries are run frequently, or business users want to explore data from multiple sources more freely, it may become necessary to create a new database structured for analysis rather than CRUD (create, retrieve, update, and delete). For example, data from more than one system — plus external information — may be incorporated into a data warehouse. This can become ‘one source of truth’ for the business’s operational activities. The warehouse will probably have a simple ‘star’ schema, with fact tables representing the measures to be analysed (e.g. unit sales, revenue) and dimension tables defining how this data is aggregated (e.g. by time, region or product). Reports can be generated from the warehouse with Excel, SSRS or other tools. Step 4: Not too long ago, Microsoft introduced an Excel plug-in, PowerPivot, which allows users to bring larger volumes of data into Excel documents and create links between multiple tables.  These BISM Tabular documents can be created by the database owners or other expert Excel users and viewed by anyone with Excel PowerPivot. 
Sometimes, business users may use PowerPivot to create reports directly from the primary database, bypassing the need for a data warehouse. This can introduce problems when there are misunderstandings of the database structure or no single ‘source of truth’ for key data. Step 5: Steps three or four are often enough to satisfy business intelligence needs, especially if users are sophisticated enough to work with the warehouse in Excel or SSRS. However, sometimes the relationships between data are too complex, or the queries which aggregate across periods, regions etc. are too slow. In these cases, it can be necessary to formalise how the data is analysed and pre-build some of the aggregations. To do this, a business intelligence professional will typically use SQL Server Analysis Services (SSAS) to create a multidimensional model — or “cube” — that more simply represents key measures and aggregates them across specified dimensions. Step five is where our tool, SSAS Compare, becomes useful, as it helps review and deploy changes from development to production. For us at Red Gate, the primary value of SSAS Compare is to establish a dialog with BI users, so we can develop a portfolio of products that support creation and deployment across a range of report and model types. For example, PowerPivot and the new BISM Tabular model create a potential customer base for tools that extend beyond BI professionals. We’re interested in learning where people are in this story, so we’ve created a six-question survey to find out. Whether you’re at step one or step five, we’d love to know how you use BI so we can decide how to build tools that solve your problems. So if you have sixty seconds to spare, tell us on the survey!
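The ‘star’ schema mentioned in Step 3 can be sketched in a few lines. The example below builds a tiny warehouse in an in-memory SQLite database from Python, with one fact table holding the measures and two dimension tables describing how to slice them. All table names, column names and figures are invented for illustration; a real warehouse would live in SQL Server and be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: the ways in which measures can be aggregated.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, region  TEXT);

    -- Fact table: one row per sale, holding the measures to analyse.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        region_id  INTEGER REFERENCES dim_region(region_id),
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO fact_sales  VALUES
        (1, 1, 10, 100.0), (2, 1, 5, 250.0), (1, 2, 3, 30.0);
""")

# Aggregate a measure (revenue) along one dimension (region).
revenue_by_region = dict(conn.execute("""
    SELECT r.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.region
"""))

print(revenue_by_region)
```

Swapping `dim_region` for `dim_product` in the join gives the same measure sliced by product, which is the flexibility the star layout is meant to provide.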

    Read the article

  • How to get decent MySQL driver performance in Ruby

    - by Zombies
    I notice that I am getting very poor performance for inserts, queries, or both. The queries themselves are basic and execute with no delay when run directly in mysql. The Ruby script I wrote is single-threaded, so only one connection is used, and it is never closed unless the script is terminated. It is pretty basic: I am just trying to insert a lot of rows. There is a look-up or two to get a surrogate key or to check for duplicates, but the complexity is just O(n). There are not millions of records either, so again the queries themselves take no time to run. I am using: Ruby 1.9.1; gem/driver: ruby-mysql 2.9.2; MySQL 5.1.37-1ubuntu5.1 (all 32-bit versions on a 32-bit Ubuntu distro). I am getting about 1-2 inserts per second, which is pretty slow. I know a lot of people will suggest changing drivers, but that means I have some refactoring and testing to do. So I would really appreciate any help, but if you do recommend that, please at least say why (e.g. you have used ruby-mysql x.x.x before and found another MySQL driver to be better). What I would like to know: How can I improve performance with ruby-mysql 2.9.2? If and only if I cannot do this with ruby-mysql 2.9.2, what should I do?

    Read the article

  • Lazy loading of Blob properties of one class

    - by Khosro
    Hi, My class contains "summary" and "title" properties, which are Blobs, along with other properties. Code (part of the class): public class News extends BaseEntity{ @Lob @Basic(fetch = FetchType.LAZY) public String getSummary() { return summary; } @Lob @Basic(fetch = FetchType.LAZY) public String getTitle() { return title; } @Temporal(TemporalType.TIMESTAMP) public Date getPublishDate() { return publishDate; } } I instrument this class to lazy load the Blob properties using "org.hibernate.tool.instrument.javassist.InstrumentTask". When I write this code to retrieve only the summary of a news item, newsDAO.findByid(1L).getSummary(); Hibernate generates these queries: Hibernate: select news0_.id as id1_, news0_.entityVersion as entityVe2_1_, news0_.publishDate as publish15_1_, news0_.url as url1_ from News news0_ Hibernate: select news_.summary as summary1_, news_.title as title1_ from News news_ where news_.id=? I have two questions: 1. I only want to retrieve the "summary" property, not the "title" property, but the Hibernate queries show that it also retrieves "title". Why does this happen (I want lazy loading per property)? It seems that when I load one Blob property, Hibernate loads all of them. Why? (This is my main question.) 2. Why does Hibernate generate two queries to retrieve only the "summary" property of News? Khosro.

    Read the article

  • Alternatives to decompiling MS Access MDE files

    - by booyaa
    I've been tasked with finding a suitable tool to decompile MDE files. The MDEs were created by staff who have since left (familiar story, eh?) and we do not have access to the original MDB files. The reason we need access to the original code is that the data source is changing (the backend as well as some of the tables and queries) and we need a way to update the queries. An example of a change: a SELECT statement where the WHERE clause looks for zero as a string ("0") rather than an integer. I'm aware that unless you use the services of people like EverythingAccess.com, it's unlikely you will ever get the source code back. My main query is to ask for alternative methods to decompiling code. An example of the kind of method I'm thinking about is to spy on the traffic between the app and the ODBC DSN using tcpdump. I might then be able to write code to translate the data source queries between the old and new systems. Ideally I'd prefer a solution that is application centric rather than one that analyses all network traffic. I should add one caveat: no doubt most of you are thinking the best solution is to rewrite the code, based on its perceived functionality. This is the option we're not considering (at the moment).

    Read the article

  • Help with fql.multiQuery

    - by Daniel Schaffer
    I'm playing around with the Facebook API's fql.multiQuery method. I'm just using the API Test Console, and trying to get a successful response but can't seem to figure out exactly what it wants. Here's the text I'm entering into the "queries" field: {"tags" : "select subject from photo_tag where subject != 601599551 and pid in ( select pid from photo_tag where subject = 601599551 ) and subject in ( select uid2 from friend where uid1 = 601599551 )", "foo" : "select uid from user where uid = 601599551"} All it'll give me is a queries parameter: array expected. error. I've also tried just about every permutation I could think of involving wrapping the name/query pairs in their own curly braces, adding brackets, adding whitespace, removing whitespace in case it didn't want an associative array (for those watching the edits, I just found out about these wonderful things now... oy), all to no avail. Is there something painfully obvious I'm missing here, or do I need to make like Chuck Norris Jon Skeet and simply will it to do my bidding? Update: A note to anyone finding this question now: The fql.multiquery test console appears to be broken. You can test your query by clicking on the generated url in the test console and manually adding the "queries" parameter into the querystring.
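
The workaround in the update (adding the "queries" parameter to the querystring by hand) could be sketched as follows. This is a minimal Python sketch, assuming the parameter is expected as a JSON-encoded object mapping names to FQL strings; the helper name build_queries_param is hypothetical, and no Facebook endpoint is called.

```python
import json
from urllib.parse import parse_qs, urlencode

# The two named queries from the question above.
queries = {
    "tags": (
        "select subject from photo_tag where subject != 601599551 "
        "and pid in ( select pid from photo_tag where subject = 601599551 ) "
        "and subject in ( select uid2 from friend where uid1 = 601599551 )"
    ),
    "foo": "select uid from user where uid = 601599551",
}


def build_queries_param(named_queries):
    """Serialise the name/query pairs as JSON, then URL-encode the result
    so it can be appended to an existing querystring (hypothetical helper)."""
    return urlencode({"queries": json.dumps(named_queries)})


query_string = build_queries_param(queries)
print(query_string)
```

Encoding in two steps (JSON first, then URL escaping) keeps the braces, quotes, and spaces in the FQL from being mangled when the value is pasted into a URL.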

    Read the article

  • Several Small, Specific, MySQL Query Cache Questions

    - by Robbie
    I've looked all over the web and in the questions asked here about MySQL caching, and most of them seem very non-specific about a couple of questions that I have about performance and MySQL query caching. Specifically, I want answers to these questions; assume for all of them that I have the query cache enabled and that it is of type 2, or "DEMAND": 1. Is the query cache per table, per database, or per server? Meaning, if I have the cache size set to X and have T tables and D databases, will I be caching TX, DX, or X amount of data? 2. If I have table T1, which I regularly use the SQL_CACHE hint on for SELECT queries, and table T2, which I never do, will a SELECT query on T2 check through the cache first before performing the query? *Note: I don't want to use SQL_NO_CACHE for all T2 queries.* 3. Assume the same situation as in question 2. If I alter (INSERT, DELETE) table T2, will any processing be done on the cache? 4. For the answers to 2 and 3, is this processing time negligible if T2 is constantly being altered and is the target of a majority of my SELECT queries?

    Read the article

  • dynamically horizontal scalable key value store

    - by Zubair
    Hi, Is there a key value store that will give me the following: Allow me to simply add and remove nodes, and redistribute the data automatically; Allow me to remove nodes and still have 2 extra data nodes to provide redundancy; Allow me to store text or images up to 1GB in size; Can store small-sized data, up to 100TB of data; Fast (so it will allow queries to be performed on top of it); Make all this transparent to the client; Works on Ubuntu/FreeBSD or Mac; Free or open source. I basically want something I can use as a "single" store, and not have to worry about having memcached, a db, and several storage components, so yes, I do want a database "silver bullet", you could say. Thanks, Zubair. Answers so far: MogileFS on top of BackBlaze: as far as I can see this is just a filesystem, and after some research it only seems to be appropriate for large image files. Tokyo Tyrant: needs lightcloud. This doesn't auto scale as you add new nodes. I did look into this, and it seems it is very fast for queries which fit onto a single node, though. Riak: this is one I am looking into myself, but I don't have any results yet. Amazon S3: is anyone using this as their sole persistence layer in production? From what I have seen, it seems to be used for storage of images, as complex queries are too expensive. @shaman suggested Cassandra: definitely one I am looking into. So far it seems that there is no database or key value store that fulfills the criteria I mentioned; even after I offered a bounty of 100 points, the question did not get answered!

    Read the article

  • Should we have a database independent SQL like query language in Django? [closed]

    - by Yugal Jindle
    Note: I know we already have the Django ORM, which keeps things database independent and converts to database-specific SQL queries. Once things start getting complicated, it is preferable to write raw SQL queries for better efficiency. When you write raw SQL queries, your code gets tied to the database you are using. I also understand that it is important to use the full power of your database, which cannot be achieved with the Django ORM alone. My question: Until I use a database-specific feature, why should I be tied to the database? For instance: we have a query with multiple joins and we decide to write a raw SQL query. That makes my website Postgres specific, even though I have not used any Postgres-specific feature. I feel there should be some fake SQL language that can translate to any database's SQL. Even Django's ORM could be built on top of it, so that if you step outside the ORM but use nothing database specific, you still remain database independent. I asked Jacob Kaplan-Moss the same question in person: he advised me to stay with the database that I like and use its full power, to which I agree. But my point was not that we should be database independent. My point is that we should be database independent until we use a database-specific feature. Please explain: why should there be a fake SQL layer over the actual SQL?

    Read the article

  • How best to use XPath with very large XML files in .NET?

    - by glenatron
    I need to do some processing on fairly large XML files (large here being potentially upwards of a gigabyte) in C#, including performing some complex XPath queries. The problem I have is that the standard way I would normally do this, through the System.Xml libraries, likes to load the whole file into memory before it does anything with it, which can cause memory problems with files of this size. I don't need to update the files at all, just read them and query the data contained in them. Some of the XPath queries are quite involved and go across several levels of parent-child type relationships - I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block. One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrap the XPath statements into XSLT transformations that I could run across the files afterward, although it seems a little convoluted. Alternatively, I know that there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on its original tree structure, which could perhaps be small enough to process in memory without causing too much havoc. I've tried to explain my objective here, so if I'm barking up totally the wrong tree in terms of general approach, I'm sure you folks can set me right...
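
The fragment idea at the end of the question (stream through the file, but run path queries inside each self-contained fragment) can be sketched as below. This is only an illustration in Python's standard xml.etree.ElementTree, not the C#/System.Xml API the question is about; the element names (orders, order, item, price) and the helper stream_matches are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def stream_matches(source, record_tag, predicate):
    """Yield the id attribute of each record element that satisfies predicate.

    The file is parsed incrementally; each record is queried once it is
    complete and then cleared, so its subtree does not stay in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            if predicate(elem):
                yield elem.attrib.get("id")
            # Drop the record's children and text; the emptied element itself
            # is still referenced by the root until parsing finishes.
            elem.clear()


def has_expensive_item(order):
    # ElementTree supports a limited XPath subset, evaluated per fragment.
    return any(float(p.text) > 100 for p in order.findall("./item/price"))


sample = (
    b"<orders>"
    b"<order id='1'><item><price>5</price></item></order>"
    b"<order id='2'><item><price>150</price></item>"
    b"<item><price>20</price></item></order>"
    b"</orders>"
)

expensive = list(stream_matches(io.BytesIO(sample), "order", has_expensive_item))
print(expensive)
```

This only works when each query stays inside one fragment; queries that cross fragment boundaries would need state carried between iterations.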

    Read the article

  • SQL Server insert performance

    - by Jose
    I have an insert query that gets generated like this: INSERT INTO InvoiceDetail (LegacyId,InvoiceId,DetailTypeId,Fee,FeeTax,Investigatorid,SalespersonId,CreateDate,CreatedById,IsChargeBack,Expense,RepoAgentId,PayeeName,ExpensePaymentId,AdjustDetailId) VALUES(1,1,2,1500.0000,0.0000,163,1002,'11/30/2001 12:00:00 AM',1116,0,550.0000,850,NULL,@ExpensePay1,NULL); DECLARE @InvDetail1 INT; SET @InvDetail1 = (SELECT @@IDENTITY); This query is generated for only 110K rows. It takes 30 minutes for all of these queries to execute. I checked the query plan, and the nodes with the largest percentages are: a Clustered Index Insert at 57% query cost, which has a long XML that I don't want to post; and a Table Spool at 38% query cost: <RelOp AvgRowSize="35" EstimateCPU="5.01038E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Eager Spool" NodeId="80" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="0.0466109"> <OutputList> <ColumnReference Database="[SkipPro]" Schema="[dbo]" Table="[InvoiceDetail]" Column="InvoiceId" /> <ColumnReference Database="[SkipPro]" Schema="[dbo]" Table="[InvoiceDetail]" Column="InvestigatorId" /> <ColumnReference Column="Expr1054" /> <ColumnReference Column="Expr1055" /> </OutputList> <Spool PrimaryNodeId="3" /> </RelOp> So my question is: what can I do to improve the speed of this thing? I already run ALTER TABLE TABLENAME NOCHECK CONSTRAINT ALL before the queries and then ALTER TABLE TABLENAME CHECK CONSTRAINT ALL after the queries, and that shaved hardly anything off the time. Note that I am running these queries in a .NET application that uses a SqlCommand object to send the query. I then tried to output the SQL commands to a file and execute it using sqlcmd, but I wasn't getting any updates on how it was doing, so I gave up on that. Any ideas, hints, or help?

    Read the article

  • Can't return a List from a Compiled Query.

    - by Andrew
    I was speeding up my app by using compiled queries for queries which were getting hit over and over. I tried to implement it like this: Function Select(ByVal fk_id As Integer) As List(Of SomeEntity) Using db As New DataContext() db.ObjectTrackingEnabled = False Return CompiledSelect(db, fk_id) End Using End Function Shared CompiledSelect As Func(Of DataContext, Integer, List(Of SomeEntity)) = _ CompiledQuery.Compile(Function(db As DataContext, fk_id As Integer) _ (From u In db.SomeEntities _ Where u.SomeLinkedEntity.ID = fk_id _ Select u).ToList()) This did not work and I got this error message: Type : System.ArgumentNullException, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 Message : Value cannot be null. Parameter name: value However, when I changed my compiled query to return IQueryable instead of List, like so: Function Select(ByVal fk_id As Integer) As List(Of SomeEntity) Using db As New DataContext() db.ObjectTrackingEnabled = False Return CompiledSelect(db, fk_id).ToList() End Using End Function Shared CompiledSelect As Func(Of DataContext, Integer, IQueryable(Of SomeEntity)) = _ CompiledQuery.Compile(Function(db As DataContext, fk_id As Integer) _ From u In db.SomeEntities _ Where u.SomeLinkedEntity.ID = fk_id _ Select u) it worked fine. Can anyone shed any light as to why this is? BTW, compiled queries rock! They sped up my app by a factor of 2.

    Read the article

  • Setting Connection Parameters via ADO for MSSQL

    - by taspeotis
    Is it possible to set a connection parameter on a connection to SQL Server and have that variable persist throughout the life of the connection? The parameter must be usable by subsequent queries. We have some old Access reports that use a handful of VBScript functions in the SQL queries (let's call them GetStartDate and GetEndDate) that return global variables. Our application would set these before invoking the query, and then the queries can return information between date ranges specified in our application. We are looking at changing to a ReportViewer control running in local mode, but I don't see any convenient way to use these custom functions in straight T-SQL. I have two concept solutions (not tested yet), but I would like to know if there is a better way. Below is some pseudo code. 1. Set all variables before running Recordset.OpenForward: Connection->Execute("SET @GetStartDate = ..."); Connection->Execute("SET @GetEndDate = ..."); // Repeat for all parameters Will these variables persist to later calls of Recordset->OpenForward? Can anything reset the variables aside from another SET/SELECT @variable statement? 2. Create an ADOCommand "factory" that automatically adds parameters to each ADOCommand object I will use to execute SQL: // Command has previously been created ADOParameter *Parameter1 = Command->CreateParameter("GetStartDate"); ADOParameter *Parameter2 = Command->CreateParameter("GetEndDate"); // Set values and attach etc... What I would like to know is whether there is something like: Connection->SetParameter("GetStartDate", "20090101"); Connection->SetParameter("GetEndDate", "20100101"); where these persist for the lifetime of the connection, and the SQL can do something like @GetStartDate to access them. This may be exactly solution #1, if the variables persist throughout the lifetime of the connection.

    Read the article

  • transactions and delete using fluent nhibernate

    - by Will I Am
    I am starting to play with (Fluent) nHibernate and I am wondering if someone can help with the following. I'm sure it's a total noob question. I want to do: delete from TABX where name = 'abc' where table TABX is defined as: ID int name varchar(32) ... I built the code based on internet samples: using (ITransaction transaction = session.BeginTransaction()) { IQuery query = session.CreateQuery("FROM TABX WHERE name = :uid") .SetString("uid", "abc"); session.Delete(query.List<Person>()[0]); transaction.Commit(); } but alas, it's generating two queries (one select and one delete). I want to do this in a single statement, as in my original SQL. What is the correct way of doing this? Also, I noticed that in most samples on the internet, people tend to always wrap all queries in transactions. Why is that? If I'm only running a single statement, that seems like overkill. Do people tend to just mindlessly cut and paste, or is there a reason beyond that? For example, in my query above, if I do manage to get it from two queries down to one, I should be able to remove the begin/commit transaction, no? If it matters, I'm using PostgreSQL for experimenting.

    Read the article

  • Lucene (.NET) Document structure and performance suggestions.

    - by Josh Handel
    Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into Numeric Field, but I don't think it's the right choice here. My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms. So a document looks like StringField:[someString] and N DataField:[someNumber]. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))). Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do. I am open to suggestions on field structure and query structure :-). Thanks, Josh. PS: I have already read over all the other Lucene performance discussions on here, on the Lucene wiki, and at Lucid Imagination... I'm a bit further down the rabbit hole than that...

    Read the article

  • How to query JDO persistent objects in unowned relationship model?

    - by Paul B
    Hello, I'm trying to migrate my app from PHP and RDBMS (MySQL) to Google App Engine and have a hard time figuring out data model and relationships in JDO. In my current app I use a lot of JOIN queries like: SELECT users.name, comments.comment FROM users, comments WHERE users.user_id = comments.user_id AND users.email = '[email protected]' As I understand, JOIN queries are not supported in this way so the only(?) way to store data is using unowned relationships and "foreign" keys. There is a documentation regarding that, but no useful examples. So far I have something like this: @PersistenceCapable public class Users {     @PrimaryKey     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)     private Key key;     @Persistent     private String name;         @Persistent     private String email;         @Persistent     private Set<Key> commentKeys;     // Accessors... } @PersistenceCapable public class Comments {     @PrimaryKey     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)     private Key key;     @Persistent     private String comment;         @Persistent     private Date commentDate;     @Persistent     private Key userKey;     // Accessors... } So, how do I get a list with commenter's name, comment and date in one query? I see how I probably could get away with 3 queries but that seems wrong and would create unnecessary overhead. Please, help me out with some code examples. -- Paul.

    Read the article

  • MS Access 2003 - Is there a way to programmatically define the data for a chart?

    - by Justin
    So I have some VBA for taking charts built with the form's Chart Wizard and automatically inserting them into PowerPoint presentation slides. I use those chart forms as subforms within a larger form that has parameters the user can select to determine what is on the chart. The idea is that the user can set the parameters, build the chart to his/her liking, click a button, and have it in a ppt slide with the company's background template, blah blah blah... So it works, though it is very bulky in terms of the number of objects I have to use to accomplish this. I use expressions such as the following: like forms!frmMain.Month&* to get the input values into the saved queries, which was fine when I first started, but it went over so well, and they want so many options, that it is driving the number of saved queries/objects up. I need several saved forms with charts because of the number of different types of charts this has to handle. SO FINALLY TO MY QUESTION: I would much rather do all this on the fly with some VBA. I know how to insert list boxes and text boxes on a form, and I know how to use SQL in VBA to get the values I want from tables/queries; I just don't know if there is some VBA I can use to set the data values of the charts from a resulting recordset: Dim rs As DAO.Recordset Dim db As DAO.Database Dim sql As String sql = "SELECT TOP 5 Count(tblMain.TransactionID) AS Total, tblMain.Location FROM tblMain WHERE (((tblMain.Month) = """ & Me.txtMonth & """ )) GROUP BY tblMain.Location ORDER BY Count(tblMain.TransactionID) DESC;" Set db = CurrentDb Set rs = db.OpenRecordset(sql) rs.MoveFirst [some kind of cool code in here to make this recordset the data of the chart in frmChart ("Chart01")] Thanks for your help. Apologies for the length of the explanation.

    Read the article

  • Mysql InnoDB performance optimization and indexing

    - by Davide C
    Hello everybody, I have 2 databases and I need to link information between two big tables (more than 3M entries each, continuously growing). The 1st database has a table 'pages' that stores various information about web pages, and includes the URL of each one. The column 'URL' is a varchar(512) and has no index. The 2nd database has a table 'urlHops' defined as: CREATE TABLE urlHops ( dest varchar(512) NOT NULL, src varchar(512) DEFAULT NULL, timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY dest_key (dest), KEY src_key (src) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 Now, I basically need to issue queries like this efficiently: select p.id,p.URL from db1.pages p, db2.urlHops u where u.src=p.URL and u.dest=? At first, I thought of adding an index on pages(URL). But it's a very long column, and I already issue a lot of INSERTs and UPDATEs on the same table (way more than the number of SELECTs I would do using this index). Other possible solutions I thought of are: - adding a column to pages, storing the md5 hash of the URL and indexing it; this way I could do queries using the md5 of the URL, with the advantage of an index on a smaller column. - adding another table that contains only page id and page URL, indexing both columns. But this is maybe a waste of space, having only the advantage of not slowing down the inserts and updates I execute on 'pages'. I don't want to slow down the inserts and updates, but at the same time I would like to be able to do the queries on the URL efficiently. Any advice? My primary concern is performance; if needed, wasting some disk space is not a problem. Thank you, regards Davide
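
The first option in the question (an indexed md5 column next to the long URL) can be sketched as below. This is a minimal illustration using Python's sqlite3 and hashlib instead of MySQL/InnoDB; the column name url_md5, the index name, and the helpers add_page and find_page are assumptions made up for the example.

```python
import hashlib
import sqlite3


def url_hash(url):
    """Fixed-length (32 hex chars) digest used as the indexed lookup key."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id      INTEGER PRIMARY KEY,
        url     TEXT NOT NULL,
        url_md5 TEXT NOT NULL      -- hypothetical extra column
    );
    -- The index covers the short hash, not the long URL column.
    CREATE INDEX pages_url_md5 ON pages (url_md5);
""")


def add_page(conn, url):
    """Store the URL together with its hash; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO pages (url, url_md5) VALUES (?, ?)", (url, url_hash(url))
    )
    return cur.lastrowid


def find_page(conn, url):
    """Look up by hash, then compare the full URL to rule out collisions."""
    return conn.execute(
        "SELECT id, url FROM pages WHERE url_md5 = ? AND url = ?",
        (url_hash(url), url),
    ).fetchone()


page_id = add_page(conn, "http://example.com/a")
print(find_page(conn, "http://example.com/a"))
```

In MySQL the same idea would presumably join on something like p.url_md5 = MD5(u.src) while keeping u.src = p.URL as a second condition; whether that beats a plain index on the URL would need measuring on the real tables.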

    Read the article

  • Setting Connection Parameters via ADO for SQL Server

    - by taspeotis
    Is it possible to set a connection parameter on a connection to SQL Server and have that variable persist throughout the life of the connection? The parameter must be usable by subsequent queries. We have some old Access reports that use a handful of VBScript functions in the SQL queries (let's call them GetStartDate and GetEndDate) that return global variables. Our application would set these before invoking the query, and then the queries can return information between date ranges specified in our application. We are looking at changing to a ReportViewer control running in local mode, but I don't see any convenient way to use these custom functions in straight T-SQL. I have two concept solutions (not tested yet), but I would like to know if there is a better way. Below is some pseudo code. 1. Set all variables before running Recordset.OpenForward: Connection->Execute("SET @GetStartDate = ..."); Connection->Execute("SET @GetEndDate = ..."); // Repeat for all parameters Will these variables persist to later calls of Recordset->OpenForward? Can anything reset the variables aside from another SET/SELECT @variable statement? 2. Create an ADOCommand "factory" that automatically adds parameters to each ADOCommand object I will use to execute SQL: // Command has previously been created ADOParameter *Parameter1 = Command->CreateParameter("GetStartDate"); ADOParameter *Parameter2 = Command->CreateParameter("GetEndDate"); // Set values and attach etc... What I would like to know is whether there is something like: Connection->SetParameter("GetStartDate", "20090101"); Connection->SetParameter("GetEndDate", "20100101"); where these persist for the lifetime of the connection, and the SQL can do something like @GetStartDate to access them. This may be exactly solution #1, if the variables persist throughout the lifetime of the connection.

    Read the article
