Search Results

Search found 356 results on 15 pages for 'datasets'.

Page 13/15 | < Previous Page | 9 10 11 12 13 14 15  | Next Page >

  • Best way to create a SPARQL endpoint for a RDBMS (MySQL database)

    - by Ankur
    I am doing (want to do) some experiments with Linked Open Datasets particularly those put out by governments. I have a RDBMS (more specifically MySQL). I designed it with semantic web ideas in mind i.e. I have a information stored as objects, predicates and classes which define objects. In turn all objects are related to each other though statements of the form subject -- predicate -- object (where the subjects are from the objects table). I want to be able to query other RDF triple stores from my application and let other triple stores query my data. Is it possible to "set something up" so that this is possible? I have looked at Jena. Using Jena seems to mean I have to it as a storage application rather than MySQL - the only problem with this is that I include a new concept called a category (which I don't think is part of the semantic web languages). I will use categories to help with displaying information (they don't have any other meaning) but using Jena seems to mean that I can't organise predicates under categories for more convenient viewing. I am using Java so a JAVA API is preferred. It's also possible I misunderstood the purpose of Jena, and maybe that can be of use, but I am not sure how. I am sure four days from now this question will seem rather silly, but at the moment I am somewhat confused about how to proceed.

    Read the article

  • Architecture for database analytics

    - by David Cournapeau
    Hi, We have an architecture where we provide each customer Business Intelligence-like services for their website (internet merchant). Now, I need to analyze those data internally (for algorithmic improvement, performance tracking, etc...) and those are potentially quite heavy: we have up to millions of rows / customer / day, and I may want to know how many queries we had in the last month, weekly compared, etc... that is the order of billions entries if not more. The way it is currently done is quite standard: daily scripts which scan the databases, and generate big CSV files. I don't like this solutions for several reasons: as typical with those kinds of scripts, they fall into the write-once and never-touched-again category tracking things in "real-time" is necessary (we have separate toolset to query the last few hours ATM). this is slow and non-"agile" Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBM go. It seems that using column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issues.

    Read the article

  • Testing When Correctness is Poorly Defined?

    - by dsimcha
    I generally try to use unit tests for any code that has easily defined correct behavior given some reasonably small, well-defined set of inputs. This works quite well for catching bugs, and I do it all the time in my personal library of generic functions. However, a lot of the code I write is data mining code that basically looks for significant patterns in large datasets. Correct behavior in this case is often not well defined and depends on a lot of different inputs in ways that are not easy for a human to predict (i.e. the math can't reasonably be done by hand, which is why I'm using a computer to solve the problem in the first place). These inputs can be very complex, to the point where coming up with a reasonable test case is near impossible. Identifying the edge cases that are worth testing is extremely difficult. Sometimes the algorithm isn't even deterministic. Usually, I do the best I can by using asserts for sanity checks and creating a small toy test case with a known pattern and informally seeing if the answer at least "looks reasonable", without it necessarily being objectively correct. Is there any better way to test these kinds of cases?

    Read the article

  • In SQL can I return a tables with a varying number of columns

    - by Matt
    I have a somewhat more complicated scenario, but I think it should be possible. I have a large SPROC whose result is a set of characteristics for a set of persons. So the Table would look something like this: Property | Client1 Client 2 Client3 ----------------------------------------------------------- Sex | M F M Age | 67 56 67 Income | Low Mid Low It's built using cursors, iterating over different datasets. The problem I am facing is that there is a varying number of Clients and Properties, so an equally valid result over different input sets might be: Property | Client1 Client 2 ------------------------------------------- Sex | M F Age | 67 56 Weight | 122 122 The different number of properties is easy, those are just extra rows. My problem is that I need to declare a temporary table with a varying number of columns. There could be 2 clients or 100. Every client in guaranteed to have every property ultimately listed. What SQL structure would statisfy this and how can I declare it and insert things into it? I can't just flip the columns and rows either because there is a variable number of each.

    Read the article

  • When should I implement globalization and localization in C#?

    - by Geo Ego
    I am cleaning up some code in a C# app that I wrote and really trying to focus on best practices and coding style. As such, I am running my assembly through FXCop and trying to research each message it gives me to decide what should and shouldn't be changed. What I am currently focusing on are locale settings. For instance, the two errors that I have currently are that I should be specifying the IFormatProvider parameter for Convert.ToString(int), and setting the Dataset and Datatable locale. This is something that I've never done, and never put much thought into. I've always just left that overload out. The current app that I am working on is an internal app for a small company that will very likely never need to run in another country. As such, it is my opinion that I do not need to set these at all. On the other hand, doing so would not be such a big deal, but it seems like it is unneccessary and could hinder readability to a degree. I understand that Microsoft's contention is to use it if it's there, period. Well, I'm technically supposed to call Dispose() on every object that implements IDisposable, but I don't bother doing that with Datasets and Datatables, so I wonder what the practice is "in the wild."

    Read the article

  • C# Type conversion between two similar Datatable objects

    - by Ali
    I have .NET project with sync framework and two separate Datasets for MS SQL and Compact SQL. in my base class I have a generic DataTable object. in my derived classed I assign Typed DataTable to the generic object based on whether the application is operating online or offline: example: if (online) _dataTable = new MSSQLDataSet.Customer; else _dataTable = new CompactSQLDataSet.Customer; Now every where in my code i have to check and do a cast based on the current network mode like this: public void changeCustomerID(int ID) { if (online) (MSSQLDataSet.CustomerDataTable)_dataTable)[i].CustomerID = value; else (CompactMSSQLDataSet.CustomerDataTable)_dataTable)[i].CustomerID = value; } but I don't think this is very efficient and I believe it can be done in a smarter way to only use one line of code by dynamically getting the Type of _dataTable on the run time. my problem is at the design time, in order to acess datatable porperties such as "CustomerID" it has to be casted to either MSSQLDataSet.CustomerDataTable or CompactMSSQLDataSet.CustomerDataTable. Is there a way to have a function or a operator to convert the _datatable to its runtime type but still be able to use it's design time properties which are the same between the two types? something like: ((aType)_dataTable)[i].CustomerID = value; //or GetRuntimeType(_dataTable)[i].CustomerID = value;

    Read the article

  • Test the sequentiality of a column with a single SQL query

    - by LauriE
    Hey, I have a table that contains sets of sequential datasets, like that: ID set_ID some_column n 1 'set-1' 'aaaaaaaaaa' 1 2 'set-1' 'bbbbbbbbbb' 2 3 'set-1' 'cccccccccc' 3 4 'set-2' 'dddddddddd' 1 5 'set-2' 'eeeeeeeeee' 2 6 'set-3' 'ffffffffff' 2 7 'set-3' 'gggggggggg' 1 At the end of a transaction that makes several types of modifications to those rows, I would like to ensure that within a single set, all the values of "n" are still sequential (rollback otherwise). They do not need to be in the same order according to the PK, just sequential, like 1-2-3 or 3-1-2, but not like 1-3-4. Due to the fact that there might be thousands of rows within a single set I would prefer to do it in the db to avoid the overhead of fetching the data just for verification after making some small changes. Also there is the issue of concurrency. The way locking in InnoDB (repeatable read) works (as I understand) is that if I have an index on "n" then InnoDB also locks the "gaps" between values. If I combine set_ID and n to a single index, would that eliminate the problem of phantom rows appearing? Looks to me like a common problem. Any brilliant ideas? Thanks! Note: using MySQL + InnoDB

    Read the article

  • How do you make use of the provider independency in the Entity Framework?

    - by Anders Svensson
    I'm trying to learn more about data access and the Entity Framework. My goal is to have a "provider independent" data access layer (to be able to switch easily e.g. from SQL Server to MySQL or vice versa), and since the EF is supposed to be provider independent it seems like a good way to go. But how do you use this provider independence? I mean, I was expecting to be able to sort of "program to an interface", and then just be able to switch database provider. But as far as I can tell I'm only getting a concrete type to program against, an "Entities" class, e.g. in my case it is: UserDBEntities _context = new UserDBEntities(); What I would have expected to be able to switch provider easily was to have an interface e.g. like IEntities _context = new UserDBEntities(); Sort of like I can do with datasets... But maybe that isn't how it works at all with EF? Or do you just switch provider in the connectionstring, and the model stays the same?? Please remember that I'm a complete newbie at this EF, and rather inexperienced with databases in general, so I would really appreciate if you could be as clear as possible :-)

    Read the article

  • query sql database for specific value in vb.net

    - by user2952298
    I am trying to convert VBA code to vb.net, im having trouble trying to search the database for a specific value around an if statement. any suggestions would be greatly appriciated. thedatabase is called confirmation, type is the column and email is the value im looking for. could datasets work? Function SendEmails() As Boolean Dim objOutlook As Outlook.Application Dim objOutlookMsg As Outlook.MailItem Dim objOutlookRecip As Outlook.Recipient Dim objOutlookAttach As Outlook.Attachment Dim intResponse As Integer Dim confirmation As New ADODB.Recordset Dim details As New ADODB.Recordset On Error GoTo Err_Execute Dim MyConnObj As New ADODB.Connection Dim cn As New ADODB.Connection() MyConnObj.Open( _ "Provider = sqloledb;" & _ "Server=myserver;" & _ "Database=Email_Text;" & _ "User Id=bla;" & _ "Password=bla;") confirmation.Open("Confirmation_list", MyConnObj) details.Open("MessagesToSend", MyConnObj) If details.EOF = False Then confirmation.MoveFirst() Do While Not confirmation.EOF If confirmation![Type] = "Email" Then ' Create the Outlook session. objOutlook = CreateObject("Outlook.Application") ' Create the message. End IF

    Read the article

  • In R, how to get powers of ten in bold font in a plot label?

    - by wfoolhill
    I want to have "10^4 points" in bold as my x-axis label. I know how to make a simple label in bold: plot(1:10, xlab="") mtext(text="10 points", side=1, font=2, line=3) Thanks to this answer, I know how to make a label with a power of ten but nothing is in bold: mtext(text=expression(paste(10^4, " points")), side=1, font=2, line=3) Thanks to this answer, I also know how to make a label with a Greek letter in bold: mtext(text=expression(bold(paste(beta, "=", 10^1, " points"))), side=1, line=3) But still the power of ten is not in bold! It doesn't work either with bquote: mtext(text=bquote(bold(10^1~points)), side=1, line=3) Any idea? Here are some details about my system. Let me know if you need anything else. > sessionInfo() R version 2.15.0 (2012-03-30) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base

    Read the article

  • What techniques do you use for emitting data from the server that will solely be used in client side scripts?

    - by chuck
    Hi all, I never found an optimal solution for this problem so I am hoping that some of you out there have a few solutions. Let's say I need to render out a list of checkboxes and each checkbox has a set of additional data that goes with it. This data will be used purely in the context of javascript and jquery. My usual strategy is to render this data in hidden fields that are grouped in the same container as the checkbox. My rendered HTML will look something like this: <div> <input type="checkbox" /> <input type="hidden" class="genreId" /> <input type="hidden" class="titleId" /> </div> My only problem with this is that the data in the hidden fields get posted to the server when the form is submitted. For small amounts of data, this is fine. However, I frequently work with large datasets and a large amount of data is needlessly transferred. UPDATE: Before submitting this post, I just saw that I can add a "DISABLED" attribute to my input element to suppress the submission of data. Is this pretty much the best approach that I can take? Thanks

    Read the article

  • Deserialization of a DataSet... deal with column name changes? how to migrate data from one column to another?

    - by Brian Kennedy
    So, we wanted to slightly generalize a couple columns in our typed dataset... basically dropped a foreign key constraint and then wanted to change a couple column names to better reflect their new state. All that is easy. The problem is that our users may have serialized out the old version of the DataSet as XML. We want to be able to read those old XML files and deserialize them into the revised DataSet. It seems that would be a fairly common desire... but I haven't yet figured out the right thing to search the internet for. One possible solution would seem to be some way to give a DataColumn an alias or alternate name such that when it reads the old column name, it knows that data can be read into the column with the new column name. I can find no support for any such thing. Another approach would seem to be an "after deserialization" method of some sort... so, I would let it read in the old column values into a normal DataColumn with that name, and then in the "after deserialization" method I would just move the data from the obsolete column into the new column, and then delete the old columns. That would seem to generalize to many other situations... and having such events or hooks is pretty common in ADO.NET. But I have looked for such a hook and haven't yet found it. If no "after deserialization" hook, it would seem I ought to be able to override ReadXml or ReadXmlSerializable methods to call the base and then do my "after" stuff to fix up old data into new. But it does not appear that is possible. Soooo, I have to think backward compatibility with old serialized DataSets and simple data migration would be a well-solved problem... so, trying to reinvent that wheel seems silly. But so far, I haven't seemed to find any documentation on doing those things. Suggestions? What is best practice for this?

    Read the article

  • Speed up csv export when using php from mysql database query

    - by John
    Ok, so i've got a web system (built on codeigniter & running on mysql) that allows people to query a database of postal address data by making selections in a series of forms until they arrive at the selection that want, pretty standard stuff. They can then buy that information and download it via that system. The queries run very fast, but when it comes to applying that query to the database,and exporting it to csv, once the datasets get to around the 30,000 record mark (each row has around 40 columns of which about 20 are all populated with on average 20 chars of data per cell) it can take 5 or so minutes to export to csv. So, my question is, what is the main cause for the slowness? Is it that the resultset of data from the query is so large, that it is running into memory issues? Therefore should i allow much more memory to the process? Or, is there a much more efficient way of exporting to csv from a mysql query that i'm not doing? Should i save the contents of the query to a temp table and simply export the temp table to csv? Or am i going about this all wrong? Also, is the fact that i'm using Codeigniters Active Record for this prohibitive due to the way that it stores the resultset? Any advice is welcome! Thank you for reading!

    Read the article

  • Tigther code - javascript object array

    - by Scott Silvi
    Inside the callback of a $.getJSON call, I have the code outlined below. The first for block aggregates 'total' & assigns values to sov[i]. The map function calculates the percentage of total. I then instantiate a variable called sovData. With the jQuery Flot graph, any objects that are empty aren't added to the pie chart, so this works for up to 7 different slices/datasets. What I'd like to do is only initialize the ones I need (e.g. sovData would have up to 'howMany - 1' (kws.length -1 ) objects inside of it, likely via something similar to dashboards[i] & sov[i]. How would I do this? Code: var sov = [], howMany = kws.length, total = 0, i = 0; for ( i; i < howMany; i++) { total += sov[ i ] = +parseInt(data.sov['sov' + ( i+1 ) ],10) || 0; } var dashboards = data.dashboards; sov = $.map( sov, function(v) { var s = Math.round( ( (v / total) * 10e3 ) / 100); return s < 1 ? 1 : s; }); var sovData = [{ label : dashboards[0], data : sov[0] }, { label : dashboards[1], data : sov[1] }, { label : dashboards[2], data : sov[2] }, { label : dashboards[3], data : sov[3] }, { label : dashboards[4], data : sov[4] }, { label : dashboards[5], data : sov[5] }, { label : dashboards[6], data : sov[6] } ]

    Read the article

  • [PHP] Does unsetting array values during iterating save on memory?

    - by saturn_rising
    Hello fellow code warriors, This is a simple programming question, coming from my lack of knowledge of how PHP handles array copying and unsetting during a foreach loop. It's like this, I have an array that comes to me from an outside source formatted in a way I want to change. A simple example would be: $myData = array('Key1' => array('value1', 'value2')); But what I want would be something like: $myData = array([0] => array('MyKey' => array('Key1' => array('value1', 'value2')))); So I take the first $myData and format it like the second $myData. I'm totally fine with my formatting algorithm. My question lies in finding a way to conserve memory since these arrays might get a little unwieldy. So, during my foreach loop I copy the current array value(s) into the new format, then I unset the value I'm working with from the original array. E.g.: $formattedData = array(); foreach ($myData as $key => $val) { // do some formatting here, copy to $reformattedVal $formattedData[] = $reformattedVal; unset($myData[$key]); } Is the call to unset() a good idea here? I.e., does it conserve memory since I have copied the data and no longer need the original value? Or, does PHP automatically garbage collect the data since I don't reference it in any subsequent code? The code runs fine, and so far my datasets have been too negligible in size to test for performance differences. I just don't know if I'm setting myself up for some weird bugs or CPU hits later on. Thanks for any insights. -sR

    Read the article

  • Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service.  Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia:   Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence  they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Integrate Microsoft Translator into your ASP.Net application

    - by sreejukg
    In this article I am going to explain how easily you can integrate the Microsoft translator API to your ASP.Net application. Why we need a translation API? Once you published a website, you are opening a channel to the global audience. So making the web content available only in one language doesn’t cover all your audience. Especially when you are offering products/services it is important to provide contents in multiple languages. Users will be more comfortable when they see the content in their native language. How to achieve this, hiring translators and translate the content to all your user’s languages will cost you lot of money, and it is not a one time job, you need to translate the contents on the go. What is the alternative, we need to look for machine translation. Thankfully there are some translator engines available that gives you API level access, so that automatically you can translate the content and display to the user. Microsoft Translator API is an excellent set of web service APIs that allows developers to use the machine translation technology in their own applications. The Microsoft Translator API is offered through Windows Azure market place. In order to access the data services published in Windows Azure market place, you need to have an account. The registration process is simple, and it is common for all the services offered through the market place. Last year I had written an article about Bing Search API, where I covered the registration process. You can refer the article here. http://weblogs.asp.net/sreejukg/archive/2012/07/04/integrate-bing-search-api-to-asp-net-application.aspx Once you registered with Windows market place, you will get your APP ID. Now you can visit the Microsoft Translator page and click on the sign up button. http://datamarket.azure.com/dataset/bing/microsofttranslator As you can see, there are several options available for you to subscribe. There is a free version available, great. Click on the sign up button under the package that suits you. Clicking on the sign up button will bring the sign up form, where you need to agree on the terms and conditions and go ahead. You need to have a windows live account in order to sign up for any service available in Windows Azure market place. Once you signed up successfully, you will receive the thank you page. You can download the C# class library from here so that the integration can be made without writing much code. The C# file name is TranslatorContainer.cs. At any point of time, you can visit https://datamarket.azure.com/account/datasets to see the applications you are subscribed to. Click on the Use link next to each service will give you the details of the application. You need to not the primary account key and URL of the service to use in your application. Now let us start our ASP.Net project. I have created an empty ASP.Net web application using Visual Studio 2010 and named it Translator Sample, any name could work. By default, the web application in solution explorer looks as follows. Now right click the project and select Add -> Existing Item and then browse to the TranslatorContainer.cs. Now let us create a page where user enter some data and perform the translation. I have added a new web form to the project with name Translate.aspx. I have placed one textbox control for user to type the text to translate, the dropdown list to select the target language, a label to display the translated text and a button to perform the translation. For the dropdown list I have selected some languages supported by Microsoft translator. You can get all the supported languages with their codes from the below link. http://msdn.microsoft.com/en-us/library/hh456380.aspx The form looks as below in the design surface of Visual Studio. All the class libraries in the windows market place requires reference to System.Data.Services.Client, let us add the reference. You can find the documentation of how to use the downloaded class library from the below link. http://msdn.microsoft.com/en-us/library/gg312154.aspx Let us evaluate the translatorContainer.cs file. You can refer the code and it is self-explanatory. Note the namespace name used (Microsoft), you need to add the namespace reference to your page. I have added the following event for the translate button. The code is self-explanatory. You are creating an object of TranslatorContainer class by passing the translation service URL. Now you need to set credentials for your Translator container object, which will be your account key. The TranslatorContainer support a method that accept a text input, source language and destination language and returns DataServiceQuery<Translation>. Let us see this working, I just ran the application and entered Good Morning in the Textbox. Selected target language and see the output as follows. It is easy to build great translator applications using Microsoft translator API, and there is a reasonable amount of translation you can perform in your application for free. For enterprises, you can subscribe to the appropriate package and make your application multi-lingual.

    Read the article

  • Joins in single-table queries

    - by Rob Farley
    Tables are only metadata. They don’t store data. I’ve written something about this before, but I want to take a viewpoint of this idea around the topic of joins, especially since it’s the topic for T-SQL Tuesday this month. Hosted this time by Sebastian Meine (@sqlity), who has a whole series on joins this month. Good for him – it’s a great topic. In that last post I discussed the fact that we write queries against tables, but that the engine turns it into a plan against indexes. My point wasn’t simply that a table is actually just a Clustered Index (or heap, which I consider just a special type of index), but that data access always happens against indexes – never tables – and we should be thinking about the indexes (specifically the non-clustered ones) when we write our queries. I described the scenario of looking up phone numbers, and how it never really occurs to us that there is a master list of phone numbers, because we think in terms of the useful non-clustered indexes that the phone companies provide us, but anyway – that’s not the point of this post. So a table is metadata. It stores information about the names of columns and their data types. Nullability, default values, constraints, triggers – these are all things that define the table, but the data isn’t stored in the table. The data that a table describes is stored in a heap or clustered index, but it goes further than this. All the useful data is going to live in non-clustered indexes. Remember this. It’s important. Stop thinking about tables, and start thinking about indexes. So let’s think about tables as indexes. This applies even in a world created by someone else, who doesn’t have the best indexes in mind for you. I’m sure you don’t need me to explain Covering Index bit – the fact that if you don’t have sufficient columns “included” in your index, your query plan will either have to do a Lookup, or else it’ll give up using your index and use one that does have everything it needs (even if that means scanning it). If you haven’t seen that before, drop me a line and I’ll run through it with you. Or go and read a post I did a long while ago about the maths involved in that decision. So – what I’m going to tell you is that a Lookup is a join. When I run SELECT CustomerID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 285; against the AdventureWorks2012 get the following plan: I’m sure you can see the join. Don’t look in the query, it’s not there. But you should be able to see the join in the plan. It’s an Inner Join, implemented by a Nested Loop. It’s pulling data in from the Index Seek, and joining that to the results of a Key Lookup. It clearly is – the QO wouldn’t call it that if it wasn’t really one. It behaves exactly like any other Nested Loop (Inner Join) operator, pulling rows from one side and putting a request in from the other. You wouldn’t have a problem accepting it as a join if the query were slightly different, such as SELECT sod.OrderQty FROM Sales.SalesOrderHeader AS soh JOIN Sales.SalesOrderDetail as sod on sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesPersonID = 285; Amazingly similar, of course. This one is an explicit join, the first example was just as much a join, even thought you didn’t actually ask for one. You need to consider this when you’re thinking about your queries. But it gets more interesting. Consider this query: SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 276 AND CustomerID = 29522; It doesn’t look like there’s a join here either, but look at the plan. That’s not some Lookup in action – that’s a proper Merge Join. The Query Optimizer has worked out that it can get the data it needs by looking in two separate indexes and then doing a Merge Join on the data that it gets. Both indexes used are ordered by the column that’s indexed (one on SalesPersonID, one on CustomerID), and then by the CIX key SalesOrderID. Just like when you seek in the phone book to Farley, the Farleys you have are ordered by FirstName, these seek operations return the data ordered by the next field. This order is SalesOrderID, even though you didn’t explicitly put that column in the index definition. The result is two datasets that are ordered by SalesOrderID, making them very mergeable. Another example is the simple query SELECT CustomerID FROM Sales.SalesOrderHeader WHERE SalesPersonID = 276; This one prefers a Hash Match to a standard lookup even! This isn’t just ordinary index intersection, this is something else again! Just like before, we could imagine it better with two whole tables, but we shouldn’t try to distinguish between joining two tables and joining two indexes. The Query Optimizer can see (using basic maths) that it’s worth doing these particular operations using these two less-than-ideal indexes (because of course, the best indexese would be on both columns – a composite such as (SalesPersonID, CustomerID – and it would have the SalesOrderID column as part of it as the CIX key still). You need to think like this too. Not in terms of excusing single-column indexes like the ones in AdventureWorks2012, but in terms of having a picture about how you’d like your queries to run. If you start to think about what data you need, where it’s coming from, and how it’s going to be used, then you will almost certainly write better queries. …and yes, this would include when you’re dealing with regular joins across multiples, not just against joins within single table queries.

    Read the article

  • Quadratic Programming with Oracle R Enterprise

    - by Jeff Taylor-Oracle
         I wanted to use quadprog with ORE on a server running Oracle Solaris 11.2 on a Oracle SPARC T-4 server For background, see: Oracle SPARC T4-2 http://docs.oracle.com/cd/E23075_01/ Oracle Solaris 11.2 http://www.oracle.com/technetwork/server-storage/solaris11/overview/index.html quadprog: Functions to solve Quadratic Programming Problems http://cran.r-project.org/web/packages/quadprog/index.html Oracle R Enterprise 1.4 ("ORE") 1.4 http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/ore-downloads-1502823.html Problem: path to Solaris Studio doesn't match my installation: $ ORE CMD INSTALL quadprog_1.5-5.tar.gz * installing to library \u2018/u01/app/oracle/product/12.1.0/dbhome_1/R/library\u2019 * installing *source* package \u2018quadprog\u2019 ... ** package \u2018quadprog\u2019 successfully unpacked and MD5 sums checked ** libs /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95 -m64   -PIC  -g  -c aind.f -o aind.o bash: /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95: No such file or directory *** Error code 1 make: Fatal error: Command failed for target `aind.o' ERROR: compilation failed for package \u2018quadprog\u2019 * removing \u2018/u01/app/oracle/product/12.1.0/dbhome_1/R/library/quadprog\u2019 $ ls -l /opt/solarisstudio12.3/bin/f95 lrwxrwxrwx   1 root     root          15 Aug 19 17:36 /opt/solarisstudio12.3/bin/f95 -> ../prod/bin/f90 Solution: a symbolic link: $ sudo mkdir -p /opt/SunProd/studio12u3 $ sudo ln -s /opt/solarisstudio12.3 /opt/SunProd/studio12u3/ Now, it is all good: $ ORE CMD INSTALL quadprog_1.5-5.tar.gz * installing to library \u2018/u01/app/oracle/product/12.1.0/dbhome_1/R/library\u2019 * installing *source* package \u2018quadprog\u2019 ... ** package \u2018quadprog\u2019 successfully unpacked and MD5 sums checked ** libs /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95 -m64   -PIC  -g  -c aind.f -o aind.o /opt/SunProd/studio12u3/solarisstudio12.3/bin/ cc -xc99 -m64 -I/usr/lib/64/R/include -DNDEBUG -KPIC  -xlibmieee  -c init.c -o init.o /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95 -m64  -PIC -g  -c -o solve.QP.compact.o solve.QP.compact.f /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95 -m64  -PIC -g  -c -o solve.QP.o solve.QP.f /opt/SunProd/studio12u3/solarisstudio12.3/bin/f95 -m64   -PIC  -g  -c util.f -o util.o /opt/SunProd/studio12u3/solarisstudio12.3/bin/ cc -xc99 -m64 -G -o quadprog.so aind.o init.o solve.QP.compact.o solve.QP.o util.o -xlic_lib=sunperf -lsunmath -lifai -lsunimath -lfai -lfai2 -lfsumai -lfprodai -lfminlai -lfmaxlai -lfminvai -lfmaxvai -lfui -lfsu -lsunmath -lmtsk -lm -lifai -lsunimath -lfai -lfai2 -lfsumai -lfprodai -lfminlai -lfmaxlai -lfminvai -lfmaxvai -lfui -lfsu -lsunmath -lmtsk -lm -L/usr/lib/64/R/lib -lR installing to /u01/app/oracle/product/12.1.0/dbhome_1/R/library/quadprog/libs ** R ** preparing package for lazy loading ** help *** installing help indices   converting help for package \u2018quadprog\u2019     finding HTML links ... done     solve.QP                                html      solve.QP.compact                        html  ** building package indices ** testing if installed package can be loaded * DONE (quadprog) ====== Here is an example from http://cran.r-project.org/web/packages/quadprog/quadprog.pdf > require(quadprog) > Dmat <- matrix(0,3,3) > diag(Dmat) <- 1 > dvec <- c(0,5,0) > Amat <- matrix(c(-4,-3,0,2,1,0,0,-2,1),3,3) > bvec <- c(-8,2,0) > solve.QP(Dmat,dvec,Amat,bvec=bvec) $solution [1] 0.4761905 1.0476190 2.0952381 $value [1] -2.380952 $unconstrained.solution [1] 0 5 0 $iterations [1] 3 0 $Lagrangian [1] 0.0000000 0.2380952 2.0952381 $iact [1] 3 2 Here, the standard example is modified to work with Oracle R Enterprise require(ORE) ore.connect("my-name", "my-sid", "my-host", "my-pass", 1521) ore.doEval(   function () {     require(quadprog)   } ) ore.doEval(   function () {     Dmat <- matrix(0,3,3)     diag(Dmat) <- 1     dvec <- c(0,5,0)     Amat <- matrix(c(-4,-3,0,2,1,0,0,-2,1),3,3)     bvec <- c(-8,2,0)    solve.QP(Dmat,dvec,Amat,bvec=bvec)   } ) $solution [1] 0.4761905 1.0476190 2.0952381 $value [1] -2.380952 $unconstrained.solution [1] 0 5 0 $iterations [1] 3 0 $Lagrangian [1] 0.0000000 0.2380952 2.0952381 $iact [1] 3 2 Now I can combine the quadprog compute algorithms with the Oracle R Enterprise Database engine functionality: Scale to large datasets Access to tables, views, and external tables in the database, as well as those accessible through database links Use SQL query parallel execution Use in-database statistical and data mining functionality

    Read the article

  • Qt vs .NET - plz no n00bs who don't know wtf they're talking about [closed]

    - by Pirate for Profit
    Man in all these Qt vs. .NET discussions 90% these people don't know WTF they're talking about. Trying to get a real comparison chart going before we embark on a major fucking project. And yes I'm drunk, and yes I use cocaine. Event Handling In Qt the event handling system you just emit signals when something cool happens and then catch them in slots, for instance emit valueChanged(int percent, bool something); and void MyCatcherObj::valueChanged(int p, bool ok){} blocking them and disconnecting them when needed, doing it across threads... once you get the hang of it, it just seems a lot more natural and intuitive than the way the .NET event handling is set up (you know, object sender, CustomEventArgs e). And I'm not just talking about syntax, because in the end the .NET delegate crap is the bomb. I'm also talking about in more than just reflection (because, yes, .NET obviously has much stronger reflection capabilities). I'm talking about in the way the system feels to a human being. Qt wins hands down i m o. Basically, the footprints make more sense and you can visualize the project easier without the clunky event handling system. I wish I could it explain it better. The only thing is, I do love some of the ease of C# compared to C++ and .NET's assembly architecture. That is a big bonus for modular projects, which are a PITA to do in C++. Database Ease of Doing Crap Also what about datasets and database manipulations. I think .net wins here but I'm not sure. Threading/Conccurency How do you guys think of the threading? In .NET, all I've ever done is make like a list of master worker threads with locks. I like QConcurrentFramework, you don't worry about locks or anything, and with the ease of the signal slot system across threads it's nice to get notified about the progress of things. Memory Usage Also what do you think of the overall memory usage comparison. Is the .NET garbage collector pretty on the ball and quick compared to the instantaneous nature of native memory management? Or does it just let programs leak up a storm and lag the computer then clean it up when it's about to really lag? However, I am a n00b who doesn't know what I'm talking about, please school me on the subject.

    Read the article

  • Importing a large dataset into a database

    - by peaceful
    I'm a beginning programmer in the relevant areas to this question, so if possible, it'd be helpful to avoid assuming I know a lot already. I'm trying to import the OpenLibrary dataset into a local Postgres database. After it's imported, I plan to use it as a starting seed for a Ruby on Rails application that will include information on books. The OpenLibrary datasets are available here, in a modified JSON format: http://openlibrary.org/dev/docs/jsondump I only need very basic information for my application, much less than what is provided in the dumps. I'm only trying to get out book titles, author names, and relationships between books and authors. Below are two typical entries from their dataset, the first for an author, and the second for a book (they seem to have an entry for each edition of a book). The entries seem to lead off with a primary key, and then with a type, before including the actual JSON database dump. /a/OL2A /type/author {"name": "U. Venkatakrishna Rao", "personal_name": "U. Venkatakrishna Rao", "last_modified": {"type": "/type/datetime", "value": "2008-09-10 08:44:01.978456"}, "key": "/a/OL2A", "birth_date": "1904", "type": {"key": "/type/author"}, "id": 99, "revision": 3} /b/OL345M /type/edition {"publishers": ["Social Science Research Project, Dept. of Geography, University of Dacca"], "pagination": "ii, 54 p.", "title": "Land use in Fayadabad area", "lccn": ["sa 65000491"], "subject_place": ["East Pakistan", "Dacca region."], "number_of_pages": 54, "languages": [{"comment": "initial import", "code": "eng", "name": "English", "key": "/l/eng"}], "lc_classifications": ["S471.P162 E23"], "publish_date": "1963", "publish_country": "pk ", "key": "/b/OL345M", "authors": [{"birth_date": "1911", "name": "Nafis Ahmad", "key": "/a/OL302A", "personal_name": "Nafis Ahmad"}], "publish_places": ["Dacca, East Pakistan"], "by_statement": "[by] Nafis Ahmad and F. Karim Khan.", "oclc_numbers": ["4671066"], "contributions": ["Khan, Fazle Karim, joint author."], "subjects": ["Land use -- East Pakistan -- Dacca region."]} The size of the uncompressed dumps are enormous, about 2GB for the authors list, and 18GB for the book editions list. OpenLibrary does not provide any tools for this themselves, they provide a simple unoptimized Python script for reading in sample data (which unlike the actual dumps comes in pure JSON format), but they estimate if that was modified for use on their actual data it would take 2 months (!) to finish loading the data. How can I read this into the database? I assume I'll need to write a program to do this. What language and any guidance on how I should do it to finish in a reasonable amount of time? The only scripting language I have any experience with is Ruby.

    Read the article

  • QT vs. Net - REAL comparisons for R.A.D. projects

    - by Pirate for Profit
    Man in all these Qt vs. .NET discussions 90% these people argue about the dumbest crap. Trying to get a real comparison chart here, because I know a little about both frameworks but I don't know everything. I believe Qt and .NET both have strengths and weaknesses. This is to make a comparison that highlights these so people can make more informed decisions before embarking on a project, in the spirit of R.A.D. Event Handling In Qt the event handling system is very simple. You just emit signals when something cool happens and then catch them in slots. ie. // run some calculations, then emit valueChanged(30, false, 20.2); and then catching it, any object can make a slot to recieve that message easily void MyObj::valueChanged(int percent, bool ok, float timeRemaining). It's easy to "block" an event or "disconnect" when needed, and works seamlessly across threads... once you get the hang of it, it just seems a lot more natural and intuitive than the way the .NET event handling is set up (you know, void valueChanged(object sender, CustomEventArgs e). And I'm not just talking about syntax, because in the end the .NET anonymous delegates are the bomb. I'm also talking about in more than just reflection (because, yes, .NET obviously has much stronger reflection capabilities). I'm talking about in the way the system feels to a human being. Qt wins hands down for the simplest yet still flexible event handling system ever i m o. Plugins and such I do love some of the ease of C# compared to C++, as well as .NET's assembly architecture, even though it leads to a bunch of .dll's (there's ways to combine everything into a single exe though). That is a big bonus for modular projects, which are a PITA to import stuff in C++ as far as RAD is concerned. Database Ease of Doing Crap Also what about datasets and database manipulations. I think .net wins here but I'm not sure. Threading/Conccurency How do you guys think of the threading? In .NET, all I've ever done is make like a list of master worker threads with locks. I like QConcurrentFramework, you don't worry about locks or anything, and with the ease of the signal slot system across threads it's nice to get notified about the progress of things. QConcurrent is the simplest threading mechanism I've ever played with. Memory Usage Also what do you think of the overall memory usage comparison. Is the .NET garbage collector pretty on the ball and quick compared to the instantaneous nature of native memory management? Or does it just let programs leak up a storm and lag the computer then clean it up when it's about to really lag? Doesn't the just-in-time compiler make native code that is pretty good, like and that only happens the first time the program is run? However, I am a n00b who doesn't know what I'm talking about, please school me on the subject.

    Read the article

  • Qt vs .NET - a few comparisons [closed]

    - by Pirate for Profit
    Event Handling In Qt the event handling system you just emit signals when something cool happens and then catch them in slots, for instance emit valueChanged(int percent, bool something); and void MyCatcherObj::valueChanged(int p, bool ok){} blocking them and disconnecting them when needed, doing it across threads... once you get the hang of it, it just seems a lot more natural and intuitive than the way the .NET event handling is set up (you know, object sender, CustomEventArgs e). And I'm not just talking about syntax, because in the end the .NET delegate crap is the bomb. I'm also talking about in more than just reflection (because, yes, .NET obviously has much stronger reflection capabilities). I'm talking about in the way the system feels to a human being. Qt wins hands down i m o. Basically, the footprints make more sense and you can visualize the project easier without the clunky event handling system. I wish I could it explain it better. The only thing is, I do love some of the ease of C# compared to C++ and .NET's assembly architecture. That is a big bonus for modular projects, which are a PITA to do in C++. Database Ease of Doing Crap Also what about datasets and database manipulations. I think .net wins here but I'm not sure. Threading/Conccurency How do you guys think of the threading? In .NET, all I've ever done is make like a list of master worker threads with locks. I like QConcurrentFramework, you don't worry about locks or anything, and with the ease of the signal slot system across threads it's nice to get notified about the progress of things. Memory Usage Also what do you think of the overall memory usage comparison. Is the .NET garbage collector pretty on the ball and quick compared to the instantaneous nature of native memory management? Or does it just let programs leak up a storm and lag the computer then clean it up when it's about to really lag? However, I am a n00b who doesn't know what I'm talking about, please school me on the subject.

    Read the article

  • Workflow for statistical analysis and report writing

    - by ws
    Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use-case is basically this: Client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries). The analyst analyzes the data created in (2), gets close to her goal, but sees that needs more data and so goes back to (1). Rinse repeat until the tables and graphics meet QA/QC and satisfy the client. Write report incorporating tables and graphics. Next year, the happy client comes back and wants an update. This should be as simple as updating the upstream data by a new download (e.g. get the building permits from the last year), and pressing a "RECALCULATE" button, unless specifications change. At the moment, I just start a directory and ad-hoc it the best I can. I would like a more systematic approach, so I am hoping someone has figured this out... I use a mix of spreadsheets, SQL, ARCGIS, R, and Unix tools. Thanks! PS: Below is a basic Makefile that checks for dependencies on various intermediate datasets (w/ ".RData" suffix) and scripts (".R" suffix). Make uses timestamps to check dependencies, so if you 'touch ss07por.csv', it will see that this file is newer than all the files / targets that depend on it, and execute the given scripts in order to update them accordingly. This is still a work in progress, including a step for putting into SQL database, and a step for a templating language like sweave. Note that Make relies on tabs in its syntax, so read the manual before cutting and pasting. Enjoy and give feedback! http://www.gnu.org/software/make/manual/html%5Fnode/index.html#Top R=/home/wsprague/R-2.9.2/bin/R persondata.RData : ImportData.R ../../DATA/ss07por.csv Functions.R $R --slave -f ImportData.R persondata.Munged.RData : MungeData.R persondata.RData Functions.R $R --slave -f MungeData.R report.txt: TabulateAndGraph.R persondata.Munged.RData Functions.R $R --slave -f TabulateAndGraph.R report.txt

    Read the article

  • What could cause a Labwindows/CVI C program to hate the number 2573?

    - by Adam Bard
    Using Windows So I'm reading from a binary file a list of unsigned int data values. The file contains a number of datasets listed sequentially. Here's the function to read a single dataset from a char* pointing to the start of it: function read_dataset(char* stream, t_dataset *dataset){ //...some init, including setting dataset->size; for(i=0;i<dataset->size;i++){ dataset->samples[i] = *((unsigned int *) stream); stream += sizeof(unsigned int); } //... } Where read_dataset in such a context as this: //... char buff[10000]; t_dataset* dataset = malloc( sizeof( *dataset) ); unsigned long offset = 0; for(i=0;i<number_of_datasets; i++){ fseek(fd_in, offset, SEEK_SET); if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){ break; } read_dataset(buff, *dataset); // Do something with dataset here. It's screwed up before this, I checked. offset += profileSize; } //... Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers. For example, what should be ... 1831 2229 2406 2637 2609 2573 2523 2247 ... becomes ... 1831 2229 2406 2637 2609 0xDB00000A 0xC7000009 0xB2000008 ... If you think those hex numbers look suspicious, you're right. Turns out the hex values for the values that were changed are really familiar: 2573 -> 0xA0D 2523 -> 0x9DB 2247 -> 0x8C7 So apparently this number 2573 causes my stream pointer to gain a byte. This remains until the next dataset is loaded and parsed, and god forbid it contain a number 2573. I have checked a number of spots where this happens, and each one I've checked began on 2573. I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.

    Read the article

< Previous Page | 9 10 11 12 13 14 15  | Next Page >