Search Results

Search found 177 results on 8 pages for 'normalization'.


round off and displaying the values

- by S.PRATHIBA

Hi all, I have the following code:

    import java.io.*;
    import java.sql.*;
    import java.math.*;
    import java.lang.*;

    public class Testd1 {
        public static void main(String[] args) {
            System.out.println("Sum of the specific column!");
            Connection con = null;
            int m = 1;
            double sum, sum1, sum2;
            int e[];
            e = new int[100];
            int p;
            int decimalPlaces = 5;
            for (int i = 0; i < e.length; i++) {
                e[i] = 0;
            }
            double b2, c2, d2, u2, v2;
            int i, j, k, x, y;
            double mat[][] = new double[10][10];
            try {
                Class.forName("com.mysql.jdbc.Driver");
                con = DriverManager.getConnection("jdbc:mysql://localhost:3306/prathi", "root", "mysql");
                try {
                    Statement st = con.createStatement();
                    ResultSet res = st.executeQuery("SELECT Service_ID,SUM(consumer_feedback) FROM consumer1 group by Service_ID");
                    while (res.next()) {
                        int data = res.getInt(1);
                        System.out.println(data);
                        System.out.println("\n\n");
                        int c1 = res.getInt(2);
                        e[m] = res.getInt(2);
                        if (e[m] < 0)
                            e[m] = 0;
                        m++;
                        System.out.print(c1);
                        System.out.println("\t\t");
                    }
                    sum = e[1] + e[2] + e[3] + e[4] + e[5];
                    System.out.println("\n \n The sum is" + sum);
                    for (p = 21; p <= 25; p++) {
                        if (e[p] != 0)
                            e[p] = e[p] / (int) sum; // I have type casted sum to get output
                        BigDecimal bd1 = new BigDecimal(e[p]);
                        bd1 = bd1.setScale(decimalPlaces, BigDecimal.ROUND_HALF_UP); // setScale is immutable
                        e[p] = bd1.intValue();
                        System.out.println("\n\n The normalized value is" + e[p]);
                        mat[4][p - 21] = e[p];
                    }
                } catch (SQLException s) {
                    System.out.println("SQL statement is not executed!");
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
    }

I have a table named consumer1. After calculating the sums I get the following values:

    mysql> select Service_ID, sum(consumer_feedback) from consumer1 group by Service_ID;

    Service_ID | sum(consumer_feedback)
    31         | 17
    32         | 0
    33         | 60
    34         | 38
    35         | 38

In my program I get the sum for each Service_ID correctly. But after normalization, i.e. when I calculate 17/153 = 0.111, the normalized value comes out as 0. I want the normalized values to be displayed correctly after rounding off. My output is as follows:

    C:\> javac Testd1.java
    C:\> java Testd1
    Sum of the specific column!
    31 17
    32 0
    33 60
    34 38
    35 38
    The sum is153.0
    The normalized value is0
    The normalized value is0
    The normalized value is0
    The normalized value is0
    The normalized value is0

After normalization I expect 17/153 = 0.111, but I get 0. I want these values to be rounded off.
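The zeros are consistent with integer arithmetic: `e[p] / (int) sum` divides one int by another, so the fraction is gone before BigDecimal sees it, and `intValue()` would truncate it again anyway. As a minimal sketch of the intended arithmetic, written in Python rather than the poster's Java, the following divides with a decimal type and rounds half-up. The function name `normalize` and the choice of 3 decimal places are illustrative assumptions, not part of the original program.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(sums, places=3):
    """Divide each value by the total and round half-up to `places` decimals."""
    total = sum(sums)
    if total == 0:
        raise ValueError("cannot normalize when the total is zero")
    quantum = Decimal(10) ** -places  # Decimal('0.001') for places=3
    result = []
    for value in sums:
        # Decimal division keeps the fractional part that int / int discards.
        ratio = Decimal(value) / Decimal(total)
        result.append(ratio.quantize(quantum, rounding=ROUND_HALF_UP))
    return result


# Per-service sums taken from the question's query output.
feedback_sums = [17, 0, 60, 38, 38]
print([str(v) for v in normalize(feedback_sums)])
```

The key point is that the result has to live in a fractional type (here `Decimal`, in Java a `double` or `BigDecimal`) from the division onward; storing it back into an int array discards it.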

Read the article
Database design for very large amount of data

- by Hossein

Hi, I am working on a project involving a large amount of data from the Delicious website. The available data comes in files of the form "Date,UserId,Url,Tags" (one record per bookmark). I normalized my database to 3NF and, because of the nature of the queries we wanted to run in combination, I ended up with 6 tables. The design looks fine. However, now that a large amount of data is in the database, most of the queries need to join at least 2 tables to get the answer, sometimes 3 or 4. At first we had no performance issues, because for testing we had not added much data to the database. Now that we have a lot of data, simply joining extremely large tables takes a lot of time, which is a disaster for a project that has to be real-time. I was wondering how big companies solve these issues. It looks like normalizing tables just adds complexity, but how do big companies handle large amounts of data in their databases? Don't they normalize? Thanks

Read the article
Store in DB or not to store ?

- by eugeneK

There are a few string lists in my web application that I don't know whether to store in the DB or just in a class. For example, I have 7 major browsers with which users enter the site. I want to save these stats, so I need to create a browser column in the UserLogin table. I don't want to waste space and resources by saving the full browser name in each login row. So I either need to save a browserID field and link it to a Browsers table that stores the names, following DB normalization rules, or have some sort of Dataholder abstract class holding a list of browsers from which I can retrieve a browser name by its ID. The question is: what should I do? These few data lists contain no more than 200 items each, so I think it makes sense to keep them in an abstract class, but then again I don't know whether MS-SQL will handle multiple joins well. Think of the case where I have a user with country, IP, language, browser and a few more stats. Thanks
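The lookup-table option described above can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's standard library instead of MS-SQL, and the column names other than `browserID` (for example `userName`, `loginID`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Browsers (
        browserID INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE UserLogin (
        loginID   INTEGER PRIMARY KEY,
        userName  TEXT NOT NULL,
        browserID INTEGER NOT NULL REFERENCES Browsers(browserID)
    );
""")
conn.executemany("INSERT INTO Browsers (browserID, name) VALUES (?, ?)",
                 [(1, "Firefox"), (2, "Chrome"), (3, "Safari")])
# Each login row stores only the small integer key, not the browser name.
conn.executemany("INSERT INTO UserLogin (userName, browserID) VALUES (?, ?)",
                 [("alice", 1), ("bob", 2), ("carol", 1)])


def logins_per_browser(connection):
    """Join the login rows back to the lookup table to report names."""
    return connection.execute("""
        SELECT b.name, COUNT(*)
        FROM UserLogin AS l
        JOIN Browsers AS b ON b.browserID = l.browserID
        GROUP BY b.name
        ORDER BY b.name
    """).fetchall()


print(logins_per_browser(conn))
```

With a lookup table of a few hundred rows the join is against a very small table; whether that matters in practice for a given MS-SQL workload would need to be measured.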

Read the article
Views performance in MySQL for denormalization

- by Gianluca Bargelli

I am currently writing my very first PHP application and I would like to know how to design and implement MySQL views properly. In my particular case user data is spread across several tables (as a consequence of database normalization) and I was thinking of using a view to group the data into one large table:

    CREATE VIEW `Users_Merged` (name, surname, email, phone, role) AS
    (SELECT name, surname, email, phone, 'Customer' FROM `Customer`)
    UNION
    (SELECT name, surname, email, tel, 'Admin' FROM `Administrator`)
    UNION
    (SELECT name, surname, email, tel, 'Manager' FROM `manager`);

This way I can use the view's data from the PHP app easily, but I don't really know how much this affects performance. For example:

    SELECT * from `Users_Merged` WHERE role = 'Admin';

Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself? (I need this to have a list of users and the ability to filter them by role.)

EDIT: Specifically, what I'm trying to obtain is denormalization of three tables into one. Is my solution correct? See Denormalization on Wikipedia.
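A small sketch of filtering such a merged view, assuming SQLite via Python's standard library as a stand-in for MySQL (so backticks and the parenthesised SELECTs are dropped, and the sample rows are invented). It uses UNION ALL because the constant role column already keeps the three branches distinct, and UNION ALL skips the duplicate-elimination step that plain UNION performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer      (name TEXT, surname TEXT, email TEXT, phone TEXT);
    CREATE TABLE Administrator (name TEXT, surname TEXT, email TEXT, tel TEXT);
    CREATE TABLE Manager       (name TEXT, surname TEXT, email TEXT, tel TEXT);

    -- Column names of the view come from the first SELECT.
    CREATE VIEW Users_Merged AS
        SELECT name, surname, email, phone, 'Customer' AS role FROM Customer
        UNION ALL
        SELECT name, surname, email, tel, 'Admin' FROM Administrator
        UNION ALL
        SELECT name, surname, email, tel, 'Manager' FROM Manager;

    INSERT INTO Customer      VALUES ('Ann', 'Lee', 'ann@example.com', '111');
    INSERT INTO Administrator VALUES ('Bob', 'Ray', 'bob@example.com', '222');
    INSERT INTO Manager       VALUES ('Cy',  'Fox', 'cy@example.com',  '333');
""")


def users_by_role(connection, role):
    """Filter the merged view by its constant role column."""
    return connection.execute(
        "SELECT name, surname FROM Users_Merged WHERE role = ? ORDER BY name",
        (role,),
    ).fetchall()


print(users_by_role(conn, "Admin"))
```

Note that a view like this is a stored query rather than a denormalized copy of the data; how MySQL executes a filter against it is best checked with EXPLAIN on the actual server.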

Read the article
Views performance in MySQL

- by Gianluca Bargelli

I am currently writing my very first PHP application and I would like to know how to design and implement MySQL views properly. In my particular case user data is spread across several tables (as a consequence of database normalization) and I was thinking of using a view to group the data into one large table:

    CREATE VIEW `Users_Merged` (name, surname, email, phone, role) AS
    (SELECT name, surname, email, phone, 'Customer' FROM `Customer`)
    UNION
    (SELECT name, surname, email, tel, 'Admin' FROM `Administrator`)
    UNION
    (SELECT name, surname, email, tel, 'Manager' FROM `manager`);

This way I can use the view's data from the PHP app easily, but I don't really know how much this affects performance. For example:

    SELECT * from `Users_Merged` WHERE role = 'Admin';

Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself? (I need this to have a list of users and the ability to filter them by role.)

Read the article
Numpy modify array in place?

- by User

I have the following code which is attempting to normalize the values of an m x n array (it will be used as input to a neural network, where m is the number of training examples and n is the number of features). However, when I inspect the array in the interpreter after the script runs, I see that the values are not normalized; that is, they still have the original values. I guess this is because the assignment to the array variable inside the function is only seen within the function. How can I do this normalization in place? Or do I have to return a new array from the normalize function?

    import numpy

    def normalize(array, imin = -1, imax = 1):
        """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
        dmin = array.min()
        dmax = array.max()
        array = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
        print array[0]

    def main():
        array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
        for column in array.T:
            normalize(column)
        return array

    if __name__ == "__main__":
        a = main()
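The guess in the question matches how Python name binding works: `array = ...` inside the function rebinds the local name to a new object and leaves the caller's object alone. The sketch below shows the difference with plain lists so that it runs without NumPy; the function names are illustrative. For a NumPy array the analogous in-place form is slice assignment, `array[:] = ...`, which writes into the existing buffer (and, for a column taken from `array.T`, into the parent array).

```python
def normalize_rebind(values, imin=-1.0, imax=1.0):
    """Computes the scaled values but only rebinds the local name."""
    dmin, dmax = min(values), max(values)
    # The caller's list is untouched: `values` now points at a new list.
    values = [imin + (imax - imin) * (v - dmin) / (dmax - dmin) for v in values]


def normalize_in_place(values, imin=-1.0, imax=1.0):
    """Writes the scaled values into the object the caller also holds."""
    dmin, dmax = min(values), max(values)
    # Slice assignment replaces the contents of the existing list.
    values[:] = [imin + (imax - imin) * (v - dmin) / (dmax - dmin) for v in values]


data = [0.0, 5.0, 10.0]
normalize_rebind(data)
print(data)  # still the original values
normalize_in_place(data)
print(data)  # now scaled to [imin, imax]
```

Both versions assume the values are not all equal; a constant column would make `dmax - dmin` zero and needs separate handling.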

Read the article
How can I draw a log-normalized imshow plot with a colorbar representing the raw data in matplotlib

- by Adam Fraser

I'm using matplotlib to plot log-normalized images but I would like the original raw image data to be represented in the colorbar rather than the [0-1] interval. I get the feeling there's a more matplotlib'y way of doing this by using some sort of normalization object and not transforming the data beforehand... in any case, there could be negative values in the raw image.

    import matplotlib.pyplot as plt
    import numpy as np

    def log_transform(im):
        '''returns log(image) scaled to the interval [0,1]'''
        try:
            (min, max) = (im[im > 0].min(), im.max())
            if (max > min) and (max > 0):
                return (np.log(im.clip(min, max)) - np.log(min)) / (np.log(max) - np.log(min))
        except:
            pass
        return im

    a = np.ones((100,100))
    for i in range(100):
        a[i] = i
    f = plt.figure()
    ax = f.add_subplot(111)
    res = ax.imshow(log_transform(a))
    # the colorbar drawn shows [0-1], but I want to see [0-99]
    cb = f.colorbar(res)

I've tried using cb.set_array, but that didn't appear to do anything, and cb.set_clim, but that rescales the colors completely. Thanks in advance for any help :)

Read the article
Convert object to DateRange

- by user655832

I'm querying an underlying PostgreSQL database using Pandas 0.8. Pandas is returning the DataFrame properly, but the underlying timestamp column in my database is being returned as a generic "object" type in Pandas. As I would eventually like to do seasonal normalization of my data, I am curious as to how to convert this generic "object" column to something that is appropriate for analysis. Here is my current code to retrieve the data:

    # get records from db example
    import pandas.io.sql as psql
    import psycopg2

    # define query to get all subs created this year
    QRY = """
        select i i,
               i * random() f,
               case when random() > 0.5 then true else false end t,
               (current_date - (i*random())::int)::timestamp with time zone tsz
        from generate_series(1,1000) as s(i)
        order by 4 ;
    """
    CONN_STRING = "host='localhost' port=5432 dbname='postgres' user='postgres'"

    # connect to db
    conn = psycopg2.connect(CONN_STRING)

    # get some data set index on relid column
    df = psql.frame_query(QRY, con=conn)
    print "Row count retrieved: %i" % (len(df),)

Thanks for any help you can render. M

Read the article
Java data structure to use with Hibernate to store unknown number of parameters?

- by Lunikon

The problem: I want to render a news stream of short messages based on localized texts. In various places in these messages I have to insert parameters to "customize" them. I guess you know what I mean ;) My question probably falls into the "Which is the best style to do it?" category: how would you store these parameters (they may be Strings and Numbers that need to be formatted according to Locale) in the database? I'm using Hibernate for the ORM and I can think of the following solutions:

1. Build a combined String and save it as such (ugly and hard to maintain, I think).
2. Do some kind of fancy normalization and make every parameter a single row in the database (clean, I guess, but a performance nightmare).
3. Put the params into an Array, Map or other Java data structure and save it in binary format (probably causes a lot of overhead size-wise).

I tend towards option #3, but I'm afraid that it might be too costly in terms of size in the database. What do you think?

Read the article
Selecting item from set given distribution

- by JH

I have a set of X items such as {blower, mower, stove}, and each item has a certain percentage of times it should be selected from the overall set {blower=25%, mower=25%, stove=75%}, along with a certain distribution that the selections should follow (blower should be selected more at the beginning of selection and stove more at the end). We are given the total number of objects to select (e.g. 100) and an overall time in which to do it (say 100 seconds). I was thinking of using a roulette wheel algorithm where the weights on the wheel depend on the current distribution as a function of the elapsed time (and the allowed duration), so that simple functions can be used to determine the weights. Are there any common approaches to problems like this that anyone is aware of? Currently I have programmed something similar in Java, using functions such as x^2 (with correct normalization of the weights) to ensure that a good distribution occurs. Other suggestions or common practices would be welcome :-)
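The roulette-wheel idea with time-dependent weights might look like the sketch below, written in Python rather than the poster's Java. The weight curves are hypothetical choices (quadratics in the elapsed fraction t, echoing the x^2 mentioned above), and the sketch only shapes when each item is likely to be picked; it does not enforce the overall percentages from the question.

```python
import random

# Hypothetical weight curves over t in [0, 1], the elapsed fraction of the
# allowed duration. They are assumptions for illustration only.
WEIGHT_FUNCTIONS = {
    "blower": lambda t: (1.0 - t) ** 2,  # favoured early
    "mower": lambda t: 0.5,              # flat throughout
    "stove": lambda t: t ** 2,           # favoured late
}


def normalized_weights(t):
    """Evaluate every curve at t and scale the results to sum to 1."""
    raw = {item: curve(t) for item, curve in WEIGHT_FUNCTIONS.items()}
    total = sum(raw.values())
    return {item: weight / total for item, weight in raw.items()}


def pick(t, rng=random):
    """Spin the wheel once using the weights that apply at time fraction t."""
    weights = normalized_weights(t)
    items = list(weights)
    return rng.choices(items, weights=[weights[i] for i in items], k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same sequence
total_picks = 100
selections = [pick(i / (total_picks - 1), rng) for i in range(total_picks)]
print(selections[:5], selections[-5:])
```

Because the flat curve never reaches zero, the total weight is always positive and the normalization step is safe for every t in [0, 1].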

Read the article
Is there a combination of "LIKE" and "IN" in SQL?

- by Techpriester

Hi folks. In SQL I (sadly) often have to use "LIKE" conditions due to databases that violate nearly every rule of normalization. I can't change that right now, but that's irrelevant to the question. Further, I often use conditions like

    WHERE something IN (1,1,2,3,5,8,13,21)

for better readability and flexibility of my SQL statements. Is there any way to combine these two things without writing complicated sub-selects? I want something as easy as

    WHERE something LIKE ('bla%', '%foo%', 'batz%')

instead of

    WHERE something LIKE 'bla%' OR something LIKE '%foo%' OR something LIKE 'batz%'

I'm working with MS SQL Server and Oracle here, but I'm interested in whether this is possible in any RDBMS at all.
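One way to approximate a "LIKE IN" is to keep the patterns in a table and test each row against it with EXISTS. The sketch below demonstrates this with SQLite through Python's standard library; the table names `items` and `patterns` are hypothetical, and whether the same query suits MS SQL Server or Oracle unchanged is an assumption to verify on those systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items    (something TEXT);
    CREATE TABLE patterns (pattern TEXT);
""")
conn.executemany("INSERT INTO items VALUES (?)",
                 [("blabla",), ("xfoox",), ("batzman",), ("other",)])
conn.executemany("INSERT INTO patterns VALUES (?)",
                 [("bla%",), ("%foo%",), ("batz%",)])


def match_any_pattern(connection):
    """Return the rows whose value matches at least one stored LIKE pattern."""
    rows = connection.execute("""
        SELECT i.something
        FROM items AS i
        WHERE EXISTS (
            SELECT 1 FROM patterns AS p WHERE i.something LIKE p.pattern
        )
        ORDER BY i.something
    """).fetchall()
    return [row[0] for row in rows]


print(match_any_pattern(conn))
```

EXISTS is used instead of a plain join so that a row matching several patterns is returned only once.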

Read the article
Implementing an ActiveRecord before_find

- by thaiyoshi

I am building a search with the keywords cached in a table. Before a user-entered keyword is looked up in the table, it is normalized: for example, some punctuation such as '-' is removed and the casing is standardized. The normalized keyword is then used to fetch the search results. I am currently handling the normalization in the controller with a before_filter. I was wondering if there is a way to do this in the model instead. Something conceptually like a "before_find" callback would work, although that wouldn't make sense at the instance level.
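Independently of where the hook lives, the normalization step itself is a small pure function, which is what makes it easy to move out of the controller. A sketch in Python (not the poster's Ruby), assuming "some punctuation" means ASCII punctuation and that "standardized casing" means lower case; the name `normalize_keyword` is hypothetical.

```python
import string

# Assumption: every ASCII punctuation character is stripped. Narrow this
# set if only a few characters such as '-' should be removed.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def normalize_keyword(raw):
    """Strip punctuation, lower-case, and collapse runs of whitespace."""
    cleaned = raw.translate(_PUNCTUATION_TABLE)
    return " ".join(cleaned.lower().split())


print(normalize_keyword("  Wi-Fi  Router! "))
```

A model-level finder could call a function like this on its argument before querying, so that every caller gets the same normalization without a controller filter.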

Read the article
SQL SERVER – How to easily work with Database Diagrams

- by Pinal Dave

Databases are very widely used in the modern world. Regardless of its complexity, each database requires in-depth design. To practice along, please download dbForge Studio now.

The right methodology for designing a database is based on the foundations of data normalization, according to which we should first define the database's key elements: entities. Afterwards, the attributes of the entities and the relations between them are determined. There is a strong opinion that the process of database design should start with a pencil and a blank sheet of paper. This might look old-fashioned nowadays, because SQL Server provides much wider functionality for designing databases: Database Diagrams.

When using SSMS to work with Database Diagrams I realized two things. On the one hand, visualizing a schema allows a database to be designed more efficiently; on the other, when it came to creating a big schema, some difficulties occurred when designing with SSMS. Alternatives did not take long to appear, and dbForge Studio for SQL Server is one of them. Its functions offer more advantages for working with Database Diagrams. For example, unlike SSMS, dbForge Studio lets you drag and drop several tables at once from the Database Explorer. This is my opinion, but personally I find this option very useful.

Another great thing is that a diagram can be saved both as a graphic file and as a special XML file, which, given an identical environment, can easily be opened on another server to continue the work. While working with dbForge Studio it turned out that it offers a wide set of elements to operate with on the diagram. Noteworthy among these are containers, which allow diagram objects to be aggregated into thematic groups. Moreover, you can even place an image directly on the diagram if the schema design is based on a standard template.

Each of the development environments has a different approach to storing a diagram (for example, SSMS stores them on the server side, whereas dbForge Studio stores them in a local file). I haven't yet found a way to convert existing diagrams from SSMS to dbForge Studio. However, I hope the Devart developers will implement this feature in one of the following releases. All in all, editing Database Diagrams through dbForge Studio was a nice experience and sped up common database design tasks. Download dbForge Studio now.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Read the article
Meaning of Crawl errors

- by com

My question is about the definition of crawl errors in Google Webmaster Tools. Crawl errors are divided into a few sections.

HTTP section. I assume that all broken links in this section were somehow found by the crawler and are not links from the sitemap. If all these links were found by scanning the pages from the sitemap for links, why doesn't it mention the source page, as the Sitemap section does with its Linked From column? Please correct me if I am wrong.

Sitemap section. It looks like all those links came from my sitemap. But there is a Linked From column, even though I already know that all those broken links are from the sitemap, so in order to fix the errors I should revise my sitemap. Am I wrong?

Not followed section. I don't know what this means. It looks like it accumulates all links that caused a redirect, but for some reason Google considers all those redirects to be wrong. Do you know if there is a set of rules for determining a wrong redirect? Actually, I found where my mistake was: I tried to normalize the URL and redirect to the right URL, but I did the normalization the wrong way.

Not found section. This section is like the HTTP section but for 404 errors. It has a Linked From column, but very often Linked From shows "unavailable". What does that mean? Can Google not tell me how it found this non-existent page? How is this section related to the Sitemap section? Does it contain all 404 links from the sitemap too? There are too many 404 links, many more than in the sitemap. I looked at what Linked From shows and saw that one link came from the sitemap two months ago. But why does Google keep it indexed? The link is already dead and the new sitemap doesn't have it. Is there an expiry date for old links?

Unreachable section. It looks like this section is for 500 errors. It doesn't have a Linked From column. There are too many completely meaningless links; I really don't know where they came from, and without Linked From I cannot figure out how to deal with them.

Sorry for such a long post, but I want to be clear about what every section stands for, because that is crucial for dealing with all those problems. Hopefully it will be useful to others as well. Thanks!

Read the article
How to remove the boundary effects arising due to zero padding in scipy/numpy fft?

- by Omkar

I have written Python code to smooth a given signal using the Weierstrass transform, which is basically the convolution of a normalised Gaussian with a signal. The code is as follows:

    #Importing relevant libraries
    from __future__ import division
    from scipy.signal import fftconvolve
    import numpy as np

    def smooth_func(sig, x, t= 0.002):
        N = len(x)
        x1 = x[-1]
        x0 = x[0]
        # defining a new array y which is symmetric around zero, to make the gaussian symmetric.
        y = np.linspace(-(x1-x0)/2, (x1-x0)/2, N)
        #gaussian centered around zero.
        gaus = np.exp(-y**(2)/t)
        #using fftconvolve to speed up the convolution; gaus.sum() is the normalization constant.
        return fftconvolve(sig, gaus/gaus.sum(), mode='same')

If I run this code on, say, a step function, it smooths the corner, but at the boundary it sees another corner and smooths that too, giving unwanted behaviour at the boundary. I explain this with the figure linked here: Boundary effects.

This problem does not arise if we integrate directly to find the convolution. Hence the problem is not in the Weierstrass transform but in the fftconvolve function of scipy. To understand why it arises, we first need to understand how fftconvolve works. It basically uses the convolution theorem to speed up the computation. In short, the theorem says:

    convolution(int1,int2)=ifft(fft(int1)*fft(int2))

If we apply this theorem directly we don't get the desired result. To get it, we need to take the FFT on an array double the size of max(int1,int2). But this leads to the undesired boundary effects, because in the FFT code, if the size over which the FFT is taken is greater than size(int), the input is zero-padded before the FFT is taken. This zero padding is exactly what is responsible for the undesired boundary effects. Can you suggest a way to remove these boundary effects?

I have tried to remove them with a simple trick. After smoothing the function I compare the value of the smoothed signal with the original signal near the boundaries, and if they don't match I replace the value of the smoothed function with the input signal at that point. It is as follows:

    i = 0
    eps=1e-3
    while abs(smooth[i]-sig[i])> eps: #comparing the signals on the left boundary
        smooth[i] = sig[i]
        i = i + 1
    j = -1
    while abs(smooth[j]-sig[j])> eps: # comparing on the right boundary.
        smooth[j] = sig[j]
        j = j - 1

There is a problem with this method: because of the epsilon there are small jumps in the smoothed function, as shown in the figure linked here: jumps in the smooth func. Can any changes be made to the above method to solve this boundary problem?
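One common alternative to patching values afterwards is to extend the signal with copies of its end values before convolving and to keep only the central part of the result, so that the kernel never sees the artificial zeros. The sketch below illustrates the idea in pure Python with a direct (non-FFT) sum so that it runs without NumPy or SciPy. The function names are hypothetical, the Gaussian is sampled at integer sample offsets rather than on the question's x grid, and edge padding is only appropriate when the signal is roughly flat near its ends. The same padding could be applied to the array passed to fftconvolve, followed by cropping.

```python
import math


def gaussian_kernel(half_width, t):
    """Normalised Gaussian exp(-k^2/t) at offsets -half_width..half_width."""
    raw = [math.exp(-(k * k) / t) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_edge_padded(sig, kernel):
    """Convolve after extending the signal with copies of its end values."""
    half = len(kernel) // 2
    padded = [sig[0]] * half + list(sig) + [sig[-1]] * half
    # The kernel is symmetric, so this sliding weighted sum is a convolution.
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(sig))
    ]


step = [0.0] * 20 + [1.0] * 20
kernel = gaussian_kernel(half_width=5, t=8.0)
smoothed = smooth_edge_padded(step, kernel)
print(smoothed[0], smoothed[-1])  # the ends stay at the signal's end values
```

With zero padding the right-hand end of this step would be pulled toward zero, which is the extra "corner" described above; with edge padding only the genuine step in the middle is smoothed.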

Read the article
career in Mobile sw/Application Development [closed]

- by pramod

I am planning to take a course on Wireless & Mobile Computing. The syllabus is given below. Please check it and let me know whether it is worth doing, and what the job prospects are afterwards. I am a fresher with an electronics engineering background. The modules are:

Wireless and Mobile Computing (WiMC) modules

C, C++ Programming and Data Structures (100 Hours): C Revision; C, C++ programming tools on Linux (Vi editor, gdb etc.); OOP concepts; Programming constructs; Functions; Access Specifiers; Classes and Objects; Overloading; Inheritance; Polymorphism; Templates; Data Structures in C++; Arrays, Stacks, Queues, Linked Lists (Singly, Doubly, Circular); Trees, Threaded trees, AVL Trees; Graphs, Sorting (Bubble, Quick, Heap, Merge)

System Development Methodology (18 Hours): Software life cycle and various life cycle models; Project Management; Software: A Process; Various Phases in s/w Development; Risk Analysis and Management; Software Quality Assurance; Introduction to Coding Standards; Software Project Management; Testing Strategies and Tactics; Project Management and Introduction to Risk Management

Java Programming (110 Hours): Data Types, Operators and Language Constructs; Classes and Objects, Inner Classes and Inheritance; Inheritance; Interface and Package; Exceptions; Threads; Java.lang; Java.util; Java.awt; Java.io; Java.applet; Java.swing; XML, XSL, DTD; Java n/w programming; Introduction to servlet

Mobile and Wireless Technologies (30 Hours): Basics of Wireless Technologies; Cellular Communication: single cell systems, multi-cell systems, frequency reuse, analog cellular systems, digital cellular systems; GSM standard: Mobile Station, BTS, BSC, MSC, SMS server, call processing and protocols; CDMA standard: spread spectrum technologies; 2.5G and 3G Systems: HSCSD, GPRS, W-CDMA/UMTS, 3GPP and international roaming, Multimedia services; CDMA based cellular mobile communication systems; Wireless Personal Area Networks: Bluetooth, IEEE 802.11a/b/g standards; Mobile Handset Device Interfacing: Data Cables, IrDA, Bluetooth, Touch-Screen Interfacing; Wireless Security, Telemetry

Java Wireless Programming and Applications Development (J2ME) (100 Hours): J2ME Architecture; The CLDC and the KVM; Tools and Development Process; Classification of CLDC Target Devices; CLDC Collections API; CLDC Streams Model; MIDlets; MIDlet Lifecycle; MIDP Programming; MIDP Event Architecture; High-Level Event Handling; Low-Level Event Handling; The CLDC Streams Model; The CLDC Networking Package; The MIDP Implementation; Introduction to WAP, WML Script and XHTML; Introduction to Multimedia Messaging Services (MMS)

Symbian Programming (60 Hours): Symbian OS basics; Symbian OS services; Symbian OS organization; GUI approaches; ROM building; Debugging; Hardware abstraction; Base porting; Symbian OS reference design porting; File systems; Overview of Symbian OS Development: DevKits, CustKits and SDKs; CodeWarrior Tool; Application & UI Development; Client Server Framework; ECOM; STDLIB in Symbian

iPhone Programming (80 Hours): Introducing iPhone core specifications; Understanding iPhone input and output; Designing web pages for the iPhone; Capturing iPhone events; Introducing the WebKit; CSS transforms, transitions and animations; Using iUI for web apps; Using Canvas for web apps; Building web apps with Dashcode; Writing Dashcode programs; Debugging iPhone web pages; SDK programming for web developers; An introduction to object-oriented programming; Introducing the iPhone OS; Using Xcode and Interface Builder; Programming with the SDK Toolkit

OS Concepts & Linux Programming (60 Hours): Operating System Concepts; What is an OS?; Processes; Scheduling & Synchronization; Memory management; Virtual Memory and Paging; Linux Architecture; Programming in Linux; Linux Shell Programming; Writing Device Drivers; Configuring and Building GNU Cross-tool chain; Configuring and Compiling Linux; Virtual File System; Porting Linux on Target Hardware

WinCE.NET and Database Technology (80 Hours): Execution Process in .NET Environment; Language Interoperability; Assemblies; Need of C#; Operators; Namespaces & Assemblies; Arrays; Preprocessors; Delegates and Events; Boxing and Unboxing; Regular Expression; Collections; Multithreading Programming; Memory Management; Exceptions Handling; Win Forms; Working with database; ASP .NET Server Controls and client-side scripts; ASP .NET Web Server Controls; Validation Controls; Principles of database management; Need of RDBMS etc; Client/Server Computing; RDBMS Technologies; Codd's Rules; Data Models; Normalization Techniques; ER Diagrams; Data Flow Diagrams; Database recovery & backup; SQL

Android Application (80 Hours): Introduction of Android; Why develop for Android; Android SDK features; Creating Android activities; Fundamental Android UI design; Intents, adapters, dialogs; Android technique for saving data; Database in Android; Maps, Geocoding, Location based services; Toast, using alarms, Instant messaging; Using Bluetooth; Using Telephony; Introducing sensor manager; Managing network and Wi-Fi connection; Advanced Android development; Linux kernel security; Implement AIDL Interface

Project (120 Hours)

Read the article
How the number of indexes built on a table can impact performances?

- by Davide Mauri

We all know that putting too many indexes (I'm talking about non-clustered indexes only, of course) on a table may produce performance problems, due to the overhead each index adds to all insert/update/delete operations on that table. But how much? We all agree, I think, that generally speaking having many indexes on a table is "bad". But how bad can it be? How much will performance degrade? And on a concurrent system, how much can this situation also hurt SELECT performance? If SQL Server takes more time to update a row because of the number of indexes it also has to update, locks are held for longer, slowing down the perceived performance of all queries involved.

I was quite curious to measure this, also because when teaching it is far more impressive and effective to show attendees a chart with the measured impact, so that they can really "feel" what it means!

To do the tests, I created a script that creates a table (with a clustered index on the primary key, which is an identity column), loads 1000 rows into the table (using a single insert of 1000 rows instead of 1000 single-row inserts, in order to minimize the transaction-handling overhead), and measures the time taken. The process is then repeated 16 times, each time adding a new index on the table, using columns from the table in a round-robin fashion. Tests are run against different row sizes, so that it is possible to check whether performance changes depending on row size.

The results are interesting, although expected. This is the chart showing how much time it takes to insert 1000 rows into a table that has from 0 to 16 non-clustered indexes. Each test was run 20 times in order to get an average value, and outliers caused by unpredictable fluctuations in machine activity were removed.

The test shows that in a table with a row size of 80 bytes, 1000 rows can be inserted in 9.05 msec if no indexes are present on the table, and the value grows to 88 (!!!) msec with 16 indexes on it. This means an impact on performance of 975%. That's *huge*!

Now, what happens with a bigger row size? Say we have a table with a row size of 1520 bytes. Here's the data, from 0 to 16 indexes on that table: in this case we need nearly 22 msec to insert 1000 rows into a table with no indexes, but more than 500 msec if the table has 16 active indexes! Now we're talking about a 2410% impact on performance!

Now we have a tangible idea of the impact of having (too?) many indexes on a table, and also of how row size affects performance. That's why the golden rule of OLTP databases, "few indexes, but good", is so true! (In fact, last week I saw a database with tables with a 1700-byte row size and 23 (!!!) indexes on them!)

This also means that too heavy a denormalization is really not a good idea (we're always talking about OLTP systems, keep that in mind), since performance gets worse as the row size increases.

So, be careful out there, and keep in mind that "equilibrium" is the key word for a database professional: equilibrium between read and write performance, between normalization and denormalization, between too few and too many indexes.

PS: Tests were done on a VMware Workstation 7 VM with 2 CPUs and 4 GB of memory. The host machine is a Dell Precision M6500 with an i7 Extreme X920 quad-core HT 2.0 GHz and 16 GB of RAM. The database is stored on an Intel X25-E SSD, uses the Simple recovery model, and runs on SQL Server 2008 R2. If you want to run the tests on your own, you can download the test script here: TestIndexPerformance.sql
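The measurement procedure described in the post can be approximated with the sketch below. It is not the author's TestIndexPerformance.sql: it uses an in-memory SQLite database through Python's standard library instead of SQL Server, the table layout and function name are hypothetical, and the timings it prints are not comparable with the figures quoted above. It only shows the shape of the experiment: one batch insert of 1000 rows, timed against a table carrying a growing number of indexes.

```python
import sqlite3
import time


def time_bulk_insert(index_count, rows=1000, columns=4):
    """Time one batch insert into a table carrying `index_count` indexes."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"c{i} INTEGER" for i in range(columns))
    conn.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {column_defs})")
    for n in range(index_count):
        # Pick the indexed column round-robin, as the post describes.
        conn.execute(f"CREATE INDEX ix{n} ON t (c{n % columns})")
    data = [tuple(r + i for i in range(columns)) for r in range(rows)]
    names = ", ".join(f"c{i}" for i in range(columns))
    placeholders = ", ".join("?" for _ in range(columns))
    start = time.perf_counter()
    with conn:  # a single transaction for the whole batch
        conn.executemany(
            f"INSERT INTO t ({names}) VALUES ({placeholders})", data
        )
    elapsed = time.perf_counter() - start
    inserted = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, inserted


for count in (0, 4, 8, 16):
    elapsed, inserted = time_bulk_insert(count)
    print(f"{count:2d} indexes: {elapsed * 1000:.2f} ms for {inserted} rows")
```

As in the post, a single run is noisy; repeating each configuration several times and averaging would give steadier numbers.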

Read the article
What Counts for a DBA: Skill

- by drsql

“Practice makes perfect:” right? Well, not exactly. The reality of it all is that this saying is an untrustworthy aphorism. I discovered this in my “younger” days when I was a passionate tennis player, practicing and playing 20+ hours a week. No matter what my passion level was, without some serious coaching (and perhaps a change in dietary habits), my skill level was never going to rise to a level where I could make any money at the sport that involved something other than selling tennis balls at a sporting goods store. My game may have improved with all that practice but I had too many bad practices to overcome. Practice by itself merely reinforces what we know and what we can figure out naturally. The truth is actually closer to the expression used by Vince Lombardi: “Perfect practice makes perfect.” So how do you get to become skilled as a DBA if practice alone isn’t sufficient? Hit the Internet and start searching for SQL training and you can find 100 different sites. There are also hundreds of blogs, magazines, books, conferences both onsite and virtual. But then how do you know who is good? Unfortunately often the worst guide can be to find out the experience level of the writer. Some of the best DBAs are frighteningly young, and some got their start back when databases were stored on stacks of paper with little holes in it. As a programmer, is it really so hard to understand normalization? Set based theory? Query optimization? Indexing and performance tuning? The biggest barrier often is previous knowledge, particularly programming skills cultivated before you get started with SQL. In the world of technology, it is pretty rare that a fresh programmer will gravitate to database programming. Database programming is very unsexy work, because without a UI all you have are a bunch of text strings that you could never impress anyone with. Newbies spend most of their time building UIs or apps with procedural code in C# or VB scoring obvious interesting wins. 
Making matters worse is that SQL programming requires mastery of a much different toolset than most any mainstream programming skill. Instead of controlling everything yourself, most of the really difficult work is done by the internals of the engine (written by other non-relational programmers…we just can’t get away from them.) So is there a golden road to achieving a high skill level? Sadly, with tennis, I am pretty sure I’ll never discover it. However, with programming it seems to boil down to practice in applying the appropriate techniques for whatever type of programming you are doing. Can a C# programmer build a great database? As long as they don’t treat SQL like C#, absolutely. Same goes for a DBA writing C# code. None of this stuff is rocket science, as long as you learn to understand that different types of programming require different skill sets and you as a programmer must recognize the difference between one of the procedural languages and SQL and treat them differently. Skill comes from practicing doing things the right way and making “right” a habit.

Read the article
Database structure - is mySQL the right choice?

- by Industrial

Hi everyone, We are currently planning the database structure of a quite complex e-commerce web app that has flexibility as its main cornerstone. Our app features a large amount of data (products) and we have run into a slight headache trying to keep performance high without compromising normalization rules in the database, or leaving our highly beloved flexibility concept behind when integrating product options (also widely known as product attributes or parameters). Based on various references and sources available, we have made up lists of pros and cons of all major and well-known database patterns to solve this. After comparing these, we have come up with two final alternatives: EAV (Entity-attribute-value model): Pros: Database is used for all sorting. Cons: All related queries will include a number of joins between multiple tables in order to complete the collection of data. SLOB (Serialized LOB, also known as Facade?): Pros: Very flexible. Keeps the number of necessary joins low compared to an EAV design pattern. Easy to update/add/remove data from each product. Cons: All sorting will be done by the application instead of the database. Will use lots of performance (memory?) when big datasets are processed by a large number of users. Our main questions: Which pattern/structure would you use, or maybe even a different solution? Are there better databases besides mySQL available nowadays to accomplish what we want? Thanks a lot! Reference: http://stackoverflow.com/questions/695752/product-table-many-kinds-of-product-each-product-has-many-parameters

Read the article
Non-Relational Database Design

- by Ian Varley

I'm interested in hearing about design strategies you have used with non-relational "nosql" databases - that is, the (mostly new) class of data stores that don't use traditional relational design or SQL (such as Hypertable, CouchDB, SimpleDB, Google App Engine datastore, Voldemort, Cassandra, SQL Data Services, etc.). They're also often referred to as "key/value stores", and at base they act like giant distributed persistent hash tables. Specifically, I want to learn about the differences in conceptual data design with these new databases. What's easier, what's harder, what can't be done at all? Have you come up with alternate designs that work much better in the non-relational world? Have you hit your head against anything that seems impossible? Have you bridged the gap with any design patterns, e.g. to translate from one to the other? Do you even do explicit data models at all now (e.g. in UML) or have you chucked them entirely in favor of semi-structured / document-oriented data blobs? Do you miss any of the major extra services that RDBMSes provide, like relational integrity, arbitrarily complex transaction support, triggers, etc.? I come from a SQL relational DB background, so normalization is in my blood. That said, I get the advantages of non-relational databases for simplicity and scaling, and my gut tells me that there has to be a richer overlap of design capabilities. What have you done? FYI, there have been StackOverflow discussions on similar topics here: "the next generation of databases"; "changing schemas to work with Google App Engine"; "choosing a document-oriented database".

Read the article
Reordering arguments using recursion (pro, cons, alternatives)

- by polygenelubricants

I find that I often make a recursive call just to reorder arguments. For example, here's my solution for endOther from codingbat.com: "Given two strings, return true if either of the strings appears at the very end of the other string, ignoring upper/lower case differences (in other words, the computation should not be "case sensitive"). Note: str.toLowerCase() returns the lowercase version of a string." public boolean endOther(String a, String b) { return a.length() < b.length() ? endOther(b, a) : a.toLowerCase().endsWith(b.toLowerCase()); } I'm very comfortable with recursion, but I can certainly understand why some perhaps would object to it. There are two obvious alternatives to this recursion technique. Swap a and b traditionally: public boolean endOther(String a, String b) { if (a.length() < b.length()) { String t = a; a = b; b = t; } return a.toLowerCase().endsWith(b.toLowerCase()); } Not convenient in a language like Java that doesn't pass by reference. Lots of code just to do a simple operation. An extra if statement breaks the "flow". Repeat code: public boolean endOther(String a, String b) { return (a.length() < b.length()) ? b.toLowerCase().endsWith(a.toLowerCase()) : a.toLowerCase().endsWith(b.toLowerCase()); } Explicit symmetry may be a nice thing (or not?). Bad idea unless the repeated code is very simple ...though in this case you can get rid of the ternary and just || the two expressions. So my questions are: Is there a name for these 3 techniques? (Are there more?) Is there a name for what they achieve? (e.g. "parameter normalization", perhaps?) Are there official recommendations on which technique to use (when)? What are other pros/cons that I may have missed?
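The remark about dropping the ternary and using || on the two expressions can be sketched as follows. This is only an illustration of that one variant, not a recommendation over the others; the class name EndOtherSketch is made up for the example. It relies on the fact that a string can never end with a strictly longer string, so no length comparison or argument reordering is needed.

```java
class EndOtherSketch {
    // Symmetric version: lower-case each argument once, then test both
    // directions. A shorter string cannot end with a longer one, so at most
    // one of the two checks can succeed when the lengths differ.
    static boolean endOther(String a, String b) {
        String la = a.toLowerCase();
        String lb = b.toLowerCase();
        return la.endsWith(lb) || lb.endsWith(la);
    }

    public static void main(String[] args) {
        System.out.println(endOther("Hiabc", "abc"));   // true
        System.out.println(endOther("AbC", "HiaBc"));   // true
        System.out.println(endOther("abc", "xyz"));     // false
    }
}
```

The trade-off is one extra endsWith call in exchange for having no swap, no recursion, and no duplicated toLowerCase() expressions.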

Read the article
Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer

- by polygenelubricants

I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question are applicable to other databases. I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema: Badges (Id, UserId, Name, Date). This works well for badges like [Epic] and [Legendary] which have unique names, but the silver and gold tag-specific badges seem to be mixed in together by having the same exact name. Here's an example query I wrote for the [mysql] tag: SELECT UserId as [User Link], Date FROM Badges Where Name = 'mysql' Order By Date ASC

The (slightly annotated) output, as seen on stexdex, is:

User Link        Date
---------------  -------------------    // all for silver except where noted
Bill Karwin      2009-02-20 11:00:25
Quassnoi         2009-06-01 10:00:16
Greg             2009-10-22 10:00:25
Quassnoi         2009-10-31 10:00:24    // for gold
Bill Karwin      2009-11-23 11:00:30    // for gold
cletus           2010-01-01 11:00:23
OMG Ponies       2010-01-03 11:00:48
Pascal MARTIN    2010-02-17 11:00:29
Mark Byers       2010-04-07 10:00:35
Daniel Vassallo  2010-05-14 10:00:38

This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql] tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice. So this is the way I understand it: the first time an Id appears (in chronological order) is for the silver badge; the second time is for the gold. Now, the above result mixes the silver and gold entries together. My questions are: Is this a typical design, or is there a much friendlier schema/normalization/whatever you call it? In the current design, how would you query the silver and gold badges separately? GROUP BY Id and picking the min/max or first/second by the Date somehow? How can you write a query that lists all the silver badges first, then all the gold badges next?
Imagine also that the "real" query may be more complicated, i.e. not just listing by date. How would you write it so that it doesn't have too much repetition between the silver and gold subqueries? Is it perhaps more typical to do two totally separate queries instead? What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
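The "first appearance is silver, second is gold" rule described in the question can be illustrated outside the database with a small in-memory sketch. This is not an answer in SQL; there, the same numbering is commonly obtained with a window function such as ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date). The class BadgeBuckets, its Row type, and the sample() helper are all hypothetical names invented for this example, and the rule itself is the asker's stated understanding, not a confirmed fact about the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BadgeBuckets {
    // Hypothetical row type: one line of the query result from the question.
    static final class Row {
        final String user;
        final String date;
        Row(String user, String date) { this.user = user; this.date = date; }
    }

    // Splits rows into two buckets: index 0 = silver, index 1 = gold.
    // Assumes rows are already sorted by date ascending, and that a user's
    // first row is the silver badge and the second is the gold badge.
    static List<List<Row>> split(List<Row> rowsByDate) {
        List<Row> silver = new ArrayList<>();
        List<Row> gold = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        for (Row r : rowsByDate) {
            int occurrence = seen.merge(r.user, 1, Integer::sum);
            if (occurrence == 1) {
                silver.add(r);
            } else if (occurrence == 2) {
                gold.add(r);
            }
            // A third or later row is ignored; the stated rule does not cover it.
        }
        return Arrays.asList(silver, gold);
    }

    // The ten rows quoted in the question, in date order.
    static List<Row> sample() {
        return Arrays.asList(
            new Row("Bill Karwin", "2009-02-20 11:00:25"),
            new Row("Quassnoi", "2009-06-01 10:00:16"),
            new Row("Greg", "2009-10-22 10:00:25"),
            new Row("Quassnoi", "2009-10-31 10:00:24"),
            new Row("Bill Karwin", "2009-11-23 11:00:30"),
            new Row("cletus", "2010-01-01 11:00:23"),
            new Row("OMG Ponies", "2010-01-03 11:00:48"),
            new Row("Pascal MARTIN", "2010-02-17 11:00:29"),
            new Row("Mark Byers", "2010-04-07 10:00:35"),
            new Row("Daniel Vassallo", "2010-05-14 10:00:38"));
    }

    public static void main(String[] args) {
        List<List<Row>> buckets = split(sample());
        // Silver rows first, then gold rows, as asked in the question.
        for (Row r : buckets.get(0)) System.out.println("silver  " + r.user + "  " + r.date);
        for (Row r : buckets.get(1)) System.out.println("gold    " + r.user + "  " + r.date);
    }
}
```

Numbering each user's rows once and then filtering on that number is what keeps the silver and gold cases from repeating the same logic.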

Read the article
Java UTF-8 to ASCII conversion with supplements

- by bozo

Hi, we are accepting all sorts of national characters in UTF-8 strings on the input, and we need to convert them to ASCII strings on the output for some legacy use. (We don't accept Chinese and Japanese chars, only European languages.) We have a small utility to get rid of all the diacritics: public static final String toBaseCharacters(final String sText) { if (sText == null || sText.length() == 0) return sText; final char[] chars = sText.toCharArray(); final int iSize = chars.length; final StringBuilder sb = new StringBuilder(iSize); for (int i = 0; i < iSize; i++) { String sLetter = new String(new char[] { chars[i] }); sLetter = Normalizer.normalize(sLetter, Normalizer.Form.NFC); try { byte[] bLetter = sLetter.getBytes("UTF-8"); sb.append((char) bLetter[0]); } catch (UnsupportedEncodingException e) { } } return sb.toString(); } The question is how to replace the German sharp s (ß), Đ, đ and other characters that get through the above normalization method with their supplements (in the case of ß, the supplement would probably be "ss", and in the case of Đ it would be either "D" or "Dj"). Is there some simple way to do it, without a million .replaceAll() calls? So for example: Đonardan = Djonardan, Blaß = Blass and so on. We can replace all "problematic" chars with an empty space, but would like to avoid this to make the output as similar to the input as possible. Thank you for your answers, Bozo
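One way to avoid a long chain of .replaceAll() calls is a single pass over the decomposed string with a lookup table for the characters that have no decomposition. The sketch below is a minimal illustration under several assumptions: the class name AsciiFolder, the contents of the SUPPLEMENTS table, and the '?' fallback are all made up for the example. It also uses Normalizer.Form.NFD rather than the NFC form shown in the question, because NFC composes characters and so leaves an accented letter as a single non-ASCII code point, whereas NFD splits it into a base letter plus combining marks.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class AsciiFolder {
    // Hypothetical replacement table for letters with no canonical
    // decomposition. Unicode escapes keep the source encoding-independent.
    private static final Map<Character, String> SUPPLEMENTS = new HashMap<>();
    static {
        SUPPLEMENTS.put('\u00DF', "ss"); // sharp s (U+00DF)
        SUPPLEMENTS.put('\u0110', "Dj"); // D with stroke (U+0110)
        SUPPLEMENTS.put('\u0111', "dj"); // d with stroke (U+0111)
    }

    static String toAscii(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        // NFD turns an accented letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            String supplement = SUPPLEMENTS.get(c);
            if (supplement != null) {
                sb.append(supplement);           // table hit, e.g. sharp s
            } else if (c < 128) {
                sb.append(c);                    // already ASCII
            } else if (Character.getType(c) != Character.NON_SPACING_MARK) {
                sb.append('?');                  // unknown letter: visible placeholder (assumption)
            }
            // Combining marks (the stripped diacritics) are simply dropped.
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Bla\u00DF"));        // Blass
        System.out.println(toAscii("\u0110onardan"));    // Djonardan
    }
}
```

New mappings only require another entry in the table, and the '?' placeholder makes unmapped characters easy to spot instead of silently turning them into spaces.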

Read the article
how can I speed up insertion of many rows to a table via ADO.NET?

- by jcollum

I have a table that has 5 columns: AcctId (int), Address1 (varchar), Address2 (varchar), Person1 (varchar), Person2 (varchar). I'm generating random data to insert into this table via a C# console application. I've tried doing this random data insert via SQL-Server and decided it was not a good solution -- SQL is not good at random on an each-row basis. Generating the random data -- 975k rows of it -- takes a minimal amount of time. It's in a List of custom objects. I need to take this random data and update many rows in the database with the new random data. I tried updating the rows one at a time, which was very slow because of the repeated searching of the List object in code. So I think the best approach is to put all the randomized data into a table in the database, then update all the other tables that use this data. I.e. UPDATE t SET t.Address1=d.Address1 FROM Table1 t INNER JOIN RandomizedData d ON d.AcctId = t.Acct_ID. The database is very un-normalized so this Acct data is sprinkled all over the place. I've got no control of the normalization. So, having decided to insert all of the randomized data into a single table, I set out to create insert scripts: USE TheDatabase Insert tmp_RandomizedData SELECT 1,'4392 EIGHTH AVE','','JENNIFER CARTER','BARBARA CARTER' UNION ALL SELECT 2,'2168 MAIN ST','HNGR F','DANIEL HERNANDEZ','SUSAN MARTIN' // etc another 98 times... // FYI, this is not real data! I'm building this INSERT script in batches of 100. It's taking on average 175 ms to run each insert. Does this seem like a long time? It's going to take about 35 mins to run the whole insert. The table doesn't have a primary key or any indexes. I was planning on adding those after all the data is inserted (thinking that that would be faster). Is there a better way to do this?

Read the article
How should I design my MYSQL table/s?

- by yaya3

I built a really basic php/mysql site for an architect that uses one 'projects' table. The website showcases various projects that he has worked on. Each project contained one piece of text and one series of images. Original projects table (create syntax): CREATE TABLE `projects` ( `project_id` int(11) NOT NULL auto_increment, `project_name` text, `project_text` text, `image_filenames` text, `image_folder` text, `project_pdf` text, PRIMARY KEY (`project_id`) ) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1; The client now requires the following, and I'm not sure how to handle the expansions in my DB. My suspicion is that I will need an additional table. Each project now has 'pages'. Pages contain either one image, one "piece" of text, or one image and one piece of text. Each page could use one of three layouts. As each project does not currently have more than 4 pieces of text (a very risky assumption) I have expanded the original table to accommodate everything. New projects table attempt (create syntax): CREATE TABLE `projects` ( `project_id` int(11) NOT NULL AUTO_INCREMENT, `project_name` text, `project_pdf` text, `project_image_folder` text, `project_img_filenames` text, `pages_with_text` text, `pages_without_img` text, `pages_layout_type` text, `pages_title` text, `page_text_a` text, `page_text_b` text, `page_text_c` text, `page_text_d` text, PRIMARY KEY (`project_id`) ) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1; In trying to learn more about MYSQL table structuring I have just read an intro to normalization and A Simple Guide to Five Normal Forms in Relational Database Theory. I'm going to keep reading! Thanks in advance

Read the article
