Search Results

Search found 4788 results on 192 pages for 'adhoc queries'.

Page 66/192 | < Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73  | Next Page >

  • Rethinking Oracle Optimizer Statistics for P6 Part 2

    - by Brian Diehl
    In the previous post (Part 1), I tried to draw some key insights about the relationship between P6 and Oracle Optimizer Statistics.  The first is that average cardinality has the greatest impact on query optimization and that the particular queries generated by P6 are more likely to use this average during calculations. The second is that these are statistics that are unlikely to change greatly over the life of the application. Ultimately, our goal is to get the best query optimization possible.  Or is it? Stability No application administrator wants to get the call at 9am that their application users cannot get there work done because everything is running slow. This is a possibility with a regularly scheduled nightly collection of statistics. It may not just be slow performance, but a complete loss of service because one or more queries are optimized poorly. Ideally, this should not be the case. The database optimizer should make better decisions with more up-to-date data. Better statistics may give incremental performance benefit. However, this benefit must be balanced against the potential cost of system down time.  It is stability that we ultimately desire and not absolute optimal performance. We do want the benefit from more accurate statistics and better query plans, but not at the risk of an unusable system. As a result, I've developed the following methodology around managing database statistics for the P6 database.  1. No Automatic Re-Gathering - The daily, weekly, or other interval of statistic gathering is unlikely to be beneficial. Quite the opposite. It is more likely to cause problems. 2. Smart Re-Gathering - The time to collect statistics is when things have changed significantly. For a new installation of P6, this is happening more often because the data is growing from a few rows to thousands and more. But for a mature system, the data is not changing significantly from week-to-week. There are times to collect statistics: New releases of the application Changes in the underlying hardware or software versions (ex. new Oracle RDBMS version) When additional user groups are added. The new groups may use the software in significantly different ways. After significant changes in the data. This may be monthly, quarterly or yearly.  3. Always Test - If you take away one thing from this post, it would be to always have a plan to test after changing statistics. In reality, statistics can be collected as often as you desire provided there are tests in place to verify that performance is the same or better. These might be automated tests or simply a manual script of application functions. 4. Have a Way Out - Never change the statistics without a way to return to the previous set. Think of the statistics as one part of the overall application code that also includes the source code--both application and RDBMS. It would be foolish to change to the new code without a way to get back to the previous version. In the final post, I will talk about the actual script I created for P6 PMDB and possible future direction for managing query performance. 

    Read the article

  • WPF application with MS Access database as a data source

    - by Kay Zed
    I have a Microsoft Access 2010 database. Now, using Visual Studio 2010, I want to create a WPF application and add the database as a data source. The app will have a window with a frame that provides navigation through pages. No problem so far. But: -What is the right way to set up the database in this scenario? Tables only? Or must everything go via queries? (VS2010 talks about views which I assume (?) are queries) -Database data must be updatable and records can be added. Some relationships go through link tables (many-to-many) and there are nullable foreign key relationships. Must I take manual steps to make it work? -While adding the data source VS2010 created an xsd from my Access database. I think the xsd might need further tweaking for the application to work the right way. What if I change my Access database design, I'd have to regenerate the xsd again as well. Is this right, and is it the way it is usually done? OR, should I let the original Access database go and give the application the capability to create new empty databases? -How do you provide controls in a page to step through the records in a table? Is there a special database control? -What is the way (WPF class?) to load records into the data context that displays in a page? (At this level it probably does not matter what type of data source it is.)

    Read the article

  • pylucene: install error

    - by Pradeep
    I am trying to install Pylucene (pylucene-3.3-3-src.tar.gz) on my ubuntu linux 11.10. I have python 2.7.2. I was able to compile JCC (I think) because I didnt see any error when I installed it. When I tried to install Pylucene I get the following error. Can someone help? Thanks. ICU not installed /usr/bin/python -m jcc --shared --jar lucene-java-3.3/lucene/build/lucene-core-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/analyzers/common/lucene-analyzers-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/memory/lucene-memory-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/highlighter/lucene-highlighter-3.3.jar --jar build/jar/extensions.jar --jar lucene-java-3.3/lucene/build/contrib/queries/lucene-queries-3.3.jar --jar lucene-java-3.3/lucene/build/contrib/grouping/lucene-grouping-3.3.jar --package java.lang java.lang.System java.lang.Runtime --package java.util java.util.Arrays java.util.HashMap java.util.HashSet java.text.SimpleDateFormat java.text.DecimalFormat java.text.Collator --package java.util.regex --package java.io java.io.StringReader java.io.InputStreamReader java.io.FileInputStream --exclude org.apache.lucene.queryParser.Token --exclude org.apache.lucene.queryParser.TokenMgrError --exclude org.apache.lucene.queryParser.QueryParserTokenManager --exclude org.apache.lucene.queryParser.ParseException --exclude org.apache.lucene.search.regex.JakartaRegexpCapabilities --exclude org.apache.regexp.RegexpTunnel --exclude org.apache.lucene.analysis.cn.smart.AnalyzerProfile --python lucene --mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' --mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' --sequence java.util.AbstractList 'size:()I' 'get:(I)Ljava/lang/Object;' --rename org.apache.lucene.search.highlight.SpanScorer=HighlighterSpanScorer --version 3.3 --module python/collections.py --module python/ICUNormalizer2Filter.py --module python/ICUFoldingFilter.py --module python/ICUTransformFilter.py --files 3 --build /usr/bin/python: No module named jcc make: *** [compile] Error 1 Here is my Makefile configuration which I uncommented PREFIX_PYTHON=/usr ANT=ant PYTHON=$(PREFIX_PYTHON)/bin/python JCC=$(PYTHON) -m jcc --shared NUM_FILES=3

    Read the article

  • Unit Test For NpgsqlCommand With Rhino Mocks

    - by J Pollack
    My unit test keeps getting the following error: "System.InvalidOperationException: The Connection is not open." The Test [TestFixture] public class Test { [Test] public void Test1() { NpgsqlConnection connection = MockRepository.GenerateStub<NpgsqlConnection>(); // Tried to fake the open connection connection.Stub(x => x.State).Return(ConnectionState.Open); connection.Stub(x => x.FullState).Return(ConnectionState.Open); DbQueries queries = new DbQueries(connection); bool procedure = queries.ExecutePreProcedure("201003"); Assert.IsTrue(procedure); } } Code Under Test using System.Data; using Npgsql; public class DbQueries { private readonly NpgsqlConnection _connection; public DbQueries(NpgsqlConnection connection) { _connection = connection; } public bool ExecutePreProcedure(string date) { var command = new NpgsqlCommand("name_of_procedure", _connection); command.CommandType = CommandType.StoredProcedure; NpgsqlParameter parameter = new NpgsqlParameter {DbType = DbType.String, Value = date}; command.Parameters.Add(parameter); command.ExecuteScalar(); return true; } } How would you test the code using Rhino Mocks 3.6? PS. NpgsqlConnection is a connection to a PostgreSQL server.

    Read the article

  • NHibernate L2 Cache - fluent nHibernate configuration

    - by AWC
    I've managed to configure the L2 cache for Get\Load in FHN, but it's not working for queries configured using the ICriteria interface - it doesn't cache the results from these queries. Does anyone know why? The configurations are as follows: ICriteria: return unitOfWork .CurrentSession .CreateCriteria(typeof(Country)) .SetCacheable(true); Entity Mapping: public sealed class CountryMap : ClassMap<Country>, IMap { public CountryMap() { Table("Countries"); Not.LazyLoad(); Cache.ReadWrite().IncludeAll(); Id(x => x.Id); Map(x => x.TwoLetter); Map(x => x.ThreeLetter); Map(x => x.Name); } } And the session factory configuration for the database property: return () => MsSqlConfiguration.MsSql2005 .ConnectionString(BuildConnectionString()) .ShowSql() .Cache(c => c.UseQueryCache() .QueryCacheFactory<StandardQueryCacheFactory>() .ProviderClass(configuration.RepositoryCacheType) .UseMinimalPuts()) .FormatSql() .UseReflectionOptimizer(); Cheers AWC

    Read the article

  • should I use Entity Framework instead of raw ADO.NET

    - by user110182
    I am new to CSLA and Entity Framework. I am creating a new CSLA / Silverlight application that will replace a 12 year old Win32 C++ system. The old system uses a custom DCOM business object library and uses ODBC to get to SQL Server. The new system will not immediately replace the old system -- they must coexist against the same database for years to come. At first I thought EF was the way to go since it is the latest and greatest. After making a small EF model and only 2 CSLA editable root objects (I will eventually have hundreds of objects as my DB has 800+ tables) I am seriously questioning the use of EF. In the current system I have the need many times to do fine detail performance tuning of the queries which I can do because of 100% control of generated SQL. But it seems in EF that so much happens behind the scenes that I lose that control. Article like http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html don't help my impression of EF. People seem to like EF because of LINQ to EF but since my criteria is passed between client and server as criteria object it seems like I could build queries just as easily without LINQ. I understand in WCF RIA that there is query projection (or something like that) where I can do client side LINQ which does move to the server before translation into actual SQL so in that case I can see the benefit of EF, but not in CSLA. If I use raw ADO.NET, will I regret my decision 5 years from now? Has anyone else made this choice recently and which way did you go?

    Read the article

  • should I use Entity Framework instead of raw ADO.NET

    - by user110182
    I am new to CSLA and Entity Framework. I am creating a new CSLA / Silverlight application that will replace a 12 year old Win32 C++ system. The old system uses a custom DCOM business object library and uses ODBC to get to SQL Server. The new system will not immediately replace the old system -- they must coexist against the same database for years to come. At first I thought EF was the way to go since it is the latest and greatest. After making a small EF model and only 2 CSLA editable root objects (I will eventually have hundreds of objects as my DB has 800+ tables) I am seriously questioning the use of EF. In the current system I have the need many times to do fine detail performance tuning of the queries which I can do because of 100% control of generated SQL. But it seems in EF that so much happens behind the scenes that I lose that control. Article like http://toomanylayers.blogspot.com/2009/01/entity-framework-and-linq-to-sql.html don't help my impression of EF. People seem to like EF because of LINQ to EF but since my criteria is passed between client and server as criteria object it seems like I could build queries just as easily without LINQ. I understand in WCF RIA that there is query projection (or something like that) where I can do client side LINQ which does move to the server before translation into actual SQL so in that case I can see the benefit of EF, but not in CSLA. If I use raw ADO.NET, will I regret my decision 5 years from now? Has anyone else made this choice recently and which way did you go?

    Read the article

  • MySQL performance - 100Mb ethernet vs 1Gb ethernet

    - by Rob Penridge
    Hi All I've just started a new job and noticed that the analysts computers are connected to the network at 100Mbps. The ODBC queries we run against the MySQL server can easily return 500MB+ and it seems at times when the servers are under high load the DBAs kill low priority jobs as they are taking too long to run. My question is this... How much of this server time is spent executing the request, and how much time is spent returning the data to the client? Could the query speeds be improved by upgrading the network connections to 1Gbps? (Updated for the why): The database in question was built to accomodate reporting needs and contains massive amounts of data. We usually work with subsets of this data at a granular level in external applications such as SAS or Excel, hence the reason for the large amounts of data being transmitted. The queries are not poorly structured - they are very simple and the appropriate joins/indexes etc are being used. I've removed 'query' from the Title of the post as I realised this question is more to do with general MySQL performance rather than query related performance. I was kind of hoping that someone with a Gigabit connection may be able to actually quantify some results for me here by running a query that returns a decent amount of data, then they could limit their connection speed to 100Mb and rerun the same query. Hopefully this could be done in an environment where loads are reasonably stable so as not to skew the results. If ethernet speed can improve the situation I wanted some quantifiable results to help argue my case for upgrading the network connections. Thanks Rob

    Read the article

  • MySql query optimization help

    - by rohitgu
    I have few queries and am not able to figure out how to optimize them, QUERY 1 select * from t_twitter_tracking where classified is null and tweetType='ENGLISH' order by id limit 500; QUERY 2 Select count(*) as cnt, DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as dat from t_twitter_tracking wrdTrk where wrdTrk.word like ('dell') and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30') between '2010-12-12 00:00:00' and '2010-12-26 00:00:00' group by dat; Both these queries run on the same table, CREATE TABLE `t_twitter_tracking` ( `id` BIGINT(20) NOT NULL AUTO_INCREMENT, `word` VARCHAR(200) NOT NULL, `tweetId` BIGINT(100) NOT NULL, `twtText` VARCHAR(800) NULL DEFAULT NULL, `language` TEXT NULL, `links` TEXT NULL, `tweetType` VARCHAR(20) NULL DEFAULT NULL, `source` TEXT NULL, `sourceStripped` TEXT NULL, `isTruncated` VARCHAR(40) NULL DEFAULT NULL, `inReplyToStatusId` BIGINT(30) NULL DEFAULT NULL, `inReplyToUserId` INT(11) NULL DEFAULT NULL, `rtUsrProfilePicUrl` TEXT NULL, `isFavorited` VARCHAR(40) NULL DEFAULT NULL, `inReplyToScreenName` VARCHAR(40) NULL DEFAULT NULL, `latitude` BIGINT(100) NOT NULL, `longitude` BIGINT(100) NOT NULL, `retweetedStatus` VARCHAR(40) NULL DEFAULT NULL, `statusInReplyToStatusId` BIGINT(100) NOT NULL, `statusInReplyToUserId` BIGINT(100) NOT NULL, `statusFavorited` VARCHAR(40) NULL DEFAULT NULL, `statusInReplyToScreenName` TEXT NULL, `screenName` TEXT NULL, `profilePicUrl` TEXT NULL, `twitterId` BIGINT(100) NOT NULL, `name` TEXT NULL, `location` VARCHAR(100) NULL DEFAULT NULL, `bio` TEXT NULL, `url` TEXT NULL COLLATE 'latin1_swedish_ci', `utcOffset` INT(11) NULL DEFAULT NULL, `timeZone` VARCHAR(100) NULL DEFAULT NULL, `frenCnt` BIGINT(20) NULL DEFAULT '0', `createdAt` DATETIME NULL DEFAULT NULL, `createdOnGMT` VARCHAR(40) NULL DEFAULT NULL, `createdOnServerTime` DATETIME NULL DEFAULT NULL, `follCnt` BIGINT(20) NULL DEFAULT '0', `favCnt` BIGINT(20) NULL DEFAULT '0', `totStatusCnt` BIGINT(20) NULL DEFAULT NULL, `usrCrtDate` VARCHAR(200) NULL DEFAULT NULL, `humanSentiment` VARCHAR(30) NULL DEFAULT NULL, `replied` BIT(1) NULL DEFAULT NULL, `replyMsg` TEXT NULL, `classified` INT(32) NULL DEFAULT NULL, `createdOnGMTDate` DATETIME NULL DEFAULT NULL, `locationDetail` TEXT NULL, `geonameid` INT(11) NULL DEFAULT NULL, `country` VARCHAR(255) NULL DEFAULT NULL, `continent` CHAR(2) NULL DEFAULT NULL, `placeLongitude` FLOAT NULL DEFAULT NULL, `placeLatitude` FLOAT NULL DEFAULT NULL, PRIMARY KEY (`id`), INDEX `id` (`id`, `word`), INDEX `createdOnGMT_index` (`createdOnGMT`) USING BTREE, INDEX `word_index` (`word`) USING BTREE, INDEX `location_index` (`location`) USING BTREE, INDEX `classified_index` (`classified`) USING BTREE, INDEX `tweetType_index` (`tweetType`) USING BTREE, INDEX `getunclassified_index` (`classified`, `tweetType`) USING BTREE, INDEX `timeline_index` (`word`, `createdOnGMTDate`, `classified`) USING BTREE, INDEX `createdOnGMTDate_index` (`createdOnGMTDate`) USING BTREE, INDEX `locdetail_index` (`country`, `id`) USING BTREE, FULLTEXT INDEX `twtText_index` (`twtText`) ) COLLATE='utf8_general_ci' ENGINE=MyISAM ROW_FORMAT=DEFAULT AUTO_INCREMENT=12608048; The table has more than 10 million records. How can I optimize it?

    Read the article

  • SQL Server 2005 and 'General network error'

    - by Mariusz
    I know there is a lot of information in the Internet about solving this problem, but it didn't help me. My Delphi application uses dbExpress controls to access the database and execute SQL queries. Once every couple of days, however, it stops working because the database connection fails. This happens on several different computers with different versions of Windows. MSSQL Server 2005 (version 9.0.4035) is installed on each of them. The above mentioned application executes queries every couple of seconds, and they are mainly insert commands. Every couple of days I get a series of exceptions like the following one: [DBNETLIB][ConnectionOpen (PreLoginHandshake()).]General network error. Check your network documentation. And then the SQL server becomes inaccessible until I restart it manually. The information I found in the Internet say that I should install some service packs, change some registry entries etc., but believe me, none of these helps and I don't know what else to do now. Could you please help me solve this problem? Any clues or ideas? I can give you some more information about the server or the application if necessary. Thank you very much in advance.

    Read the article

  • MySQL Unique hash insertion

    - by Jesse
    So, imagine a mysql table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE). Is it possible to give mysql a query that will add a column, and generate a unique hash without multiple queries? Currently, the only way I can think of to achieve this is with a while, which I worry would become more and more processor intensive the more entries were in the db. Here's some pseudo-php, obviously untested, but gets the general idea across: while(!query("INSERT INTO table (hash) VALUES (".generate_hash().");")){ //found conflict, try again. } In the above example, the hash column would be UNIQUE, and so the query would fail. The problem is, say there's 500,000 entries in the db and I'm working off of a base36 hash generator, with 4 characters. The likelyhood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable. So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert. I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake. Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too. Thanks!

    Read the article

  • "Executing SQL directly; no cursor" error when using SCOPE_IDENTITY

    - by Chris
    There wasn't much on google about this error, so I'm askin here. I'm switching a PHP web application from using MySQL to SQL Server 2008 (using ODBC, not php_mssql). Running queries or anything else isn't a problem, but when I try to do scope_identity (or any similar functions), I get the error "Executing SQL directly; no cursor". I'm doing this immediately after an insert, so it should still be in scope. Running the same insert statement then query for the insert ID works fine in SQL Server Management Studio. Here's my code right now (everything else in the database wrapper class works fine for other queries, so I'll assume it isn't relevant right now): function insert_id(){ $x = $this->query_first("SELECT SCOPE_IDENTITY('session_log') as insert_id"); echo "($x)"; return $x; } query_first being a function that returns the first result from the first field of a query (basically the equivalent of execute_scalar() on .net). The full error message: Warning: odbc_exec() [function.odbc-exec]: SQL error: [Microsoft][SQL Server Native Client 10.0][SQL Server]Executing SQL directly; no cursor., SQL state 01000 in SQLExecDirect in C:[...]\Database_MSSQL.php on line 110

    Read the article

  • SQL SERVER 2008 JOIN hints

    - by Nai
    Hi all, Recently, I was trying to optimise this query UPDATE Analytics SET UserID = x.UserID FROM Analytics z INNER JOIN UserDetail x ON x.UserGUID = z.UserGUID Estimated execution plan show 57% on the Table Update and 40% on a Hash Match (Aggregate). I did some snooping around and came across the topic of JOIN hints. So I added a LOOP hint to my inner join and WA-ZHAM! The new execution plan shows 38% on the Table Update and 58% on an Index Seek. So I was about to start applying LOOP hints to all my queries until prudence got the better of me. After some googling, I realised that JOIN hints are not very well covered in BOL. Therefore... Can someone please tell me why applying LOOP hints to all my queries is a bad idea. I read somewhere that a LOOP JOIN is default JOIN method for query optimiser but couldn't verify the validity of the statement? When are JOIN hints used? When the sh*t hits the fan and ghost busters ain't in town? What's the difference between LOOP, HASH and MERGE hints? BOL states that MERGE seems to be the slowest but what is the application of each hint? Thanks for your time and help people! I'm running SQL Server 2008 BTW. The statistics mentioned above are ESTIMATED execution plans.

    Read the article

  • Call Multiple Stored Procedures with the Zend Framework

    - by Brian Fisher
    I'm using Zend Framework 1.7.2, MySQL and the MySQLi PDO adapter. I would like to call multiple stored procedures during a given action. I've found that on Windows there is a problem calling multiple stored procedures. If you try it you get the following error message: SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute. I found that to work around this issue I could just close the connection to the database after each call to a stored procedure: if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') { //If on windows close the connection $db->closeConnection(); } This has worked well for me, however, now I want to call multiple stored procedures wrapped in a transaction. Of course, closing the connection isn't an option in this situation, since it causes a rollback of the open transaction. Any ideas, how to fix this problem and/or work around the issue. More info about the work around Bug report about the problem

    Read the article

  • Retrieve 2 last posts for each category.

    - by Savageman
    Hello, Lets say I have 2 tables: blog_posts and categories. Each blog post belongs to only ONE category, so there is basically a foreign key between the 2 tables here. I would like to retrieve the 2 lasts posts from each category, is it possible to achieve this in a single request? GROUP BY would group everything and leave me with only one row in each category. But I want 2 of them. It would be easy to perform 1 + N query (N = number of category). First retrieve the categories. And then retrieve 2 posts from each category. I believe it would also be quite easy to perform M queries (M = number of posts I want from each category). First query selects the first post for each category (with a group by). Second query retrieves the second post for each category. etc. I'm just wondering if someone has a better solution for this. I don't really mind doing 1+N queries for that, but for curiosity and general SQL knowledge, it would be appreciated! Thanks in advance to whom can help me with this.

    Read the article

  • Can't get MySQL source query to work using Python mysqldb module

    - by Chris
    I have the following lines of code: sql = "source C:\\My Dropbox\\workspace\\projects\\hosted_inv\\create_site_db.sql" cursor.execute (sql) When I execute my program, I get the following error: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'source C:\My Dropbox\workspace\projects\hosted_inv\create_site_db.sql' at line 1 Now I can copy and past the following into mysql as a query: source C:\\My Dropbox\\workspace\\projects\\hosted_inv\\create_site_db.sql And it works perfect. When I check the query log for the query executed by my script, it shows that my query was the following: source C:\\My Dropbox\\workspace\\projects\\hosted_inv\\create_site_db.sql However, when I manually paste it in and execute, the entire create_site_db.sql gets expanded in the query log and it shows all the sql queries in that file. Am I missing something here on how mysqldb does queries? Am I running into a limitation. My goal is to run a sql script to create the schema structure, but I don't want to have to call mysql in a shell process to source the sql file. Any thoughts? Thanks!

    Read the article

  • RESTful API design question - how should one allow users to create new resource instances?

    - by Tamás
    I'm working in a research group where we intend to publish implementations of some of the algorithms we develop on the web via a RESTful API. Most of these algorithms work on small to medium size datasets, and in many cases, a user of our services might want to run multiple queries (with different parameters) on the same dataset, so for me it seems reasonable to allow users to upload their datasets in advance and refer to them in their queries later. In this sense, a dataset could be a resource in my API, and an algorithm could be another. My question is: how should I let the users upload their own datasets? I cannot simply let users upload their data to /dataset/dataset_id as letting the users invent their own dataset_ids might result in ID collision and users overwriting each other's datasets by accident. (I believe one of the most frequently used dataset ID would be test). I think an ideal way would be to have a dedicated URL (like /dataset/upload) where users can POST their datasets and the response would contain a unique ID under which the dataset was stored, but I'm not sure that it does not violate the basic principles of REST. What is the preferred way of dealing with such scenarios?

    Read the article

  • ai: Determining what tests to run to get most useful data

    - by Sai Emrys
    This is for http://cssfingerprint.com I have a system (see about page on site for details) where: I need to output a ranked list, with confidences, of categories that match a particular feature vector the binary feature vectors are a list of site IDs & whether this session detected a hit feature vectors are, for a given categorization, somewhat noisy (sites will decay out of history, and people will visit sites they don't normally visit) categories are a large, non-closed set (user IDs) my total feature space is approximately 50 million items (URLs) for any given test, I can only query approx. 0.2% of that space I can only make the decision of what to query, based on results so far, ~10-30 times, and must do so in <~100ms (though I can take much longer to do post-processing, relevant aggregation, etc) getting the AI's probability ranking of categories based on results so far is mildly expensive; ideally the decision will depend mostly on a few cheap sql queries I have training data that can say authoritatively that any two feature vectors are the same category but not that they are different (people sometimes forget their codes and use new ones, thereby making a new user id) I need an algorithm to determine what features (sites) are most likely to have a high ROI to query (i.e. to better discriminate between plausible-so-far categories [users], and to increase certainty that it's any given one). This needs to take into balance exploitation (test based on prior test data) and exploration (test stuff that's not been tested enough to find out how it performs). There's another question that deals with a priori ranking; this one is specifically about a posteriori ranking based on results gathered so far. Right now, I have little enough data that I can just always test everything that anyone else has ever gotten a hit for, but eventually that won't be the case, at which point this problem will need to be solved. I imagine that this is a fairly standard problem in AI - having a cheap heuristic for what expensive queries to make - but it wasn't covered in my AI class, so I don't actually know whether there's a standard answer. So, relevant reading that's not too math-heavy would be helpful, as well as suggestions for particular algorithms. What's a good way to approach this problem?

    Read the article

  • Practical size limitations for RDBMS

    - by grenade
    I am working on a project that must store very large datasets and associated reference data. I have never come across a project that required tables quite this large. I have proved that at least one development environment cannot cope at the database tier with the processing required by the complex queries against views that the application layer generates (views with multiple inner and outer joins, grouping, summing and averaging against tables with 90 million rows). The RDBMS that I have tested against is DB2 on AIX. The dev environment that failed was loaded with 1/20th of the volume that will be processed in production. I am assured that the production hardware is superior to the dev and staging hardware but I just don't believe that it will cope with the sheer volume of data and complexity of queries. Before the dev environment failed, it was taking in excess of 5 minutes to return a small dataset (several hundred rows) that was produced by a complex query (many joins, lots of grouping, summing and averaging) against the large tables. My gut feeling is that the db architecture must change so that the aggregations currently provided by the views are performed as part of an off-peak batch process. Now for my question. I am assured by people who claim to have experience of this sort of thing (which I do not) that my fears are unfounded. Are they? Can a modern RDBMS (SQL Server 2008, Oracle, DB2) cope with the volume and complexity I have described (given an appropriate amount of hardware) or are we in the realm of technologies like Google's BigTable? I'm hoping for answers from folks who have actually had to work with this sort of volume at a non-theoretical level.

    Read the article

  • Reporting System architecture for better performance

    - by pauloya
    Hi, We have a product that runs Sql Server Express 2005 and uses mainly ASP.NET. The database has around 200 tables, with a few (4 or 5) that can grow from 300 to 5000 rows per day and keep a history of 5 years, so they can grow to have 10 million rows. We have built a reporting platform, that allows customers to build reports based on templates, fields and filters. We face performance problems almost since the beginning, we try to keep reports display under 10 seconds but some of them go up to 25 seconds (specially on those customers with long history). We keep checking indexes and trying to improve the queries but we get the feeling that there's only so much we can do. Off course the fact that the queries are generated dynamically doesn't help with the optimization. We also added a few tables that keep redundant data, but then we have the added problem of maintaining this data up to date, and also Sql Express has a limit on the size of databases. We are now facing a point where we have to decide if we want to give up real time reports, or maybe cut the history to be able to have better performance. I would like to ask what is the recommended approach for this kind of systems. Also, should we start looking for third party tools/platforms? I know OLAP can be an option but can we make it work on Sql Server Express, or at least with a license that is cheap enough to distribute to thousands of deployments? Thanks

    Read the article

  • Python: How to execute a SQL file or command

    - by Mestika
    Hi, I have this Python script: import sys import getopt import timeit import random import os import re import ibm_db import time from string import maketrans runs=5 queries=50 file = open("results.txt", "a") for r in range(5): print "Run %s\n" % r os.system("python reads.py -r1 -pquery1.sql -q50 -sespec") file.write('END QUERY READ 01') file.close() os.system("python query_read_02.py") Everything here is working, it is creating the results.txt file, it run the os.system("python reads.py...") file and that file is doing everything it's suppose to, but the problem comes when go and run the query_read_02.py file. In this file, it should execute a SQL command or a SQL file on my database, so I can create an index and see what the performance of that input is, but how do i do it? I create the connection to the database in the reads.py file, but it's hard to create the queries in there because I doesn't keep track of which file it has reached, it just execute commands from what the parameters are. I hope I've explained myself clear enough, otherwise please let me know. I just want to execute a SQL command or file which each query_read_0x.py file. Sincerely Mestika

    Read the article

  • NHibernate with nothing but stored procedures

    - by ChrisB2010
    I'd like to have NHibernate call a stored procedure when ISession.Get is called to fetch an entity by its key instead of using dynamic SQL. We have been using NHibernate and allowing it to generate our SQL for queries and inserts/updates/deletes, but now may have to deploy our application to an environment that requires us to use stored procedures for all database access. We can use sql-insert, sql-update, and sql-delete in our .hbm.xml mapping files for inserts/updates/deletes. Our hql and criteria queries will have to be replaced with stored procedure calls. However, I have not figured out how to force NHibernate to use a custom stored procedure to fetch an entity by its key. I still want to be able to call ISession.Get, as in: using (ISession session = MySessionFactory.OpenSession()) { return session.Get<Customer>(customerId); } and also lazy load objects, but I want NHibernate to call my "GetCustomerById" stored procedure instead of generating the dynamic SQL. Can this be done? Perhaps NHibernate is no longer a fit given this new environment we must support.

    Read the article

  • php: showing my country based on my IP, mysql optimized

    - by andufo
    I'm downloaded WIPmania's worldip table from http://www.wipmania.com/en/base/ -- the table has 3 fields and around 79k rows: startip // example: 3363110912 endip // example: 3363112063 country // example: AR (Argentina) So, lets suppose i'm in Argentina and my IP address is: 200.117.248.17 1) I use this function to convert my ip to long function ip_address_to_number($ip) { if(!$ip) { return false; } else { $ip = split('\.',$ip); return($ip[0]*16777216 + $ip[1]*65536 + $ip[2]*256 + $ip[3]); } } 2) I search for the proper country code by matching the long converted ip: $sql = 'SELECT * FROM worldip WHERE '.ip_address_to_number($_SERVER['REMOTE_ADDR']).' BETWEEN startip AND endip'; which is equivalent to: SELECT country FROM worldip WHERE 3363174417 BETWEEN startip AND endip (benchmark: Showing rows 0 - 0 (1 total, Query took 0.2109 sec)) Now comes the real question. What if another bunch of argentinian guys also open the website and they all have these ip addresses: 200.117.248.17 200.117.233.10 200.117.241.88 200.117.159.24 Since i'm caching all the sql queries; instead of matching EACH of the ip queries in the database, would it be better (and right) just to match the 2 first sections of the ip by modifying the function like this? function ip_address_to_number($ip) { if(!$ip) { return false; } else { $ip = split('\.',$ip); return($ip[0]*16777216 + $ip[1]*65536); } } (notice that the 3rd and 4th splitted values of the IP have been removed). That way instead of querying these 4 values: 3363174417 3363170570 3363172696 3363151640 ...all i have to query is: 3363110912 (which is 200.117.0.0 converted to long). Is this right? any other ideas to optimize this process? Thanks!

    Read the article

  • Mysqli query only works on localhost, not webserver

    - by whamo
    Hello. I have changed some of my old queries to the Mysqli framework to improve performance. Everything works fine on localhost but when i upload it to the webserver it outputs nothing. After connecting I check for errors and there are none. I also checked the php modules installed and mysqli is enabled. I am certain that it creates a connection to the database as no errors are displayed. (when i changed the database name string it gave the error) There is no output from the query on the webserver, which looks like this: $mysqli = new mysqli("server", "user", "password"); if (mysqli_connect_errno()) { printf("Can't connect Errorcode: %s\n", mysqli_connect_error()); exit; } // Query used $query = "SELECT name FROM users WHERE id = ?"; if ($stmt = $mysqli->prepare("$query")) { // Specify parameters to replace '?' $stmt->bind_param("d", $id); $stmt->execute(); // bind variables to prepared statement $stmt->bind_result($_userName); while ($stmt->fetch()) { echo $_userName; } $stmt-close(); } } //close connection $mysqli-close(); As I said this code works perfectly on my localserver just not online. Checked the error logs and there is nothing so everything points to a good connection. All the tables exists as well etc. Anyone any ideas because this one has me stuck! Also, if i get this working, will all my other queries still work? Or will i need to make them use the mysqli framework as well? Thanks in advance.

    Read the article

  • Converting a bash script to python (small script)

    - by Mestika
    Hi, I’ve a bash script I’ve been using for a Linux environment but now I have to use it on a Windows platform and want to convert the bash script to a python script which I can run. The bash script is rather simple (I think) and I’ve tried to convert it by google by way around but can’t convert it successfully. The bash script looks like this: runs=5 queries=50 outfile=outputfile.txt date >> $outfile echo -e "\n---------------------------------" echo -e "\n----------- Normal --------------" echo -e "\n---------------------------------" echo -e "\n----------- Normal --------------" >> $outfile for ((r = 1; r < ($runs + 1); r++)) do echo -e "Run $r of $runs\n" db2 FLUSH PACKAGE CACHE DYNAMIC python reads.py -r1 -pquery1.sql -q$queries -shotelspec -k6 -a5 >> $outfile done The main command, the python read.py … etc. is another python file I’ve been given and have the arguments as you see. I know it is a lot to ask for, but it would really help me out if someone could convert this to a python script I can use or at least give me some hints and directions. Sincerely Mestika

    Read the article

< Previous Page | 62 63 64 65 66 67 68 69 70 71 72 73  | Next Page >