normalizing - Page 2 - Developer IT

Converting Numpy Lstsq residual value to R^2

- by whatnick

I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. Numpy returns a value of unscaled residual, what would be a sensible way of normalizing this. field_clean,back_clean = rid_zeros(backscatter,field_data) num_vals = len(field_clean) x = field_clean[:,row:row+1] y = 10*log10(back_clean) A = hstack([x, ones((num_vals,1))]) soln = lstsq(A, y ) m, c = soln [0] residues = soln [1] print residues

Read the article

Setting up RADIUS + LDAP for WPA2 on Ubuntu

- by Morten Siebuhr

I'm setting up a wireless network for ~150 users. In short, I'm looking for a guide to set RADIUS server to authenticate WPA2 against a LDAP. On Ubuntu. I got a working LDAP, but as it is not in production use, it can very easily be adapted to whatever changes this project may require. I've been looking at FreeRADIUS, but any RADIUS server will do. We got a separate physical network just for WiFi, so not too many worries about security on that front. Our AP's are HP's low end enterprise stuff - they seem to support whatever you can think of. All Ubuntu Server, baby! And the bad news: I now somebody less knowledgeable than me will eventually take over administration, so the setup has to be as "trivial" as possible. So far, our setup is based only on software from the Ubuntu repositories, with exception of our LDAP administration web application and a few small special scripts. So no "fetch package X, untar, ./configure"-things if avoidable. UPDATE 2009-08-18: While I found several useful resources, there is one serious obstacle: Ignoring EAP-Type/tls because we do not have OpenSSL support. Ignoring EAP-Type/ttls because we do not have OpenSSL support. Ignoring EAP-Type/peap because we do not have OpenSSL support. Basically the Ubuntu version of FreeRADIUS does not support SSL (bug 183840), which makes all the secure EAP-types useless. Bummer. But some useful documentation for anybody interested: http://vuksan.com/linux/dot1x/802-1x-LDAP.html http://tldp.org/HOWTO/html_single/8021X-HOWTO/#confradius UPDATE 2009-08-19: I ended up compiling my own FreeRADIUS package yesterday evening - there's a really good recipe at http://www.linuxinsight.com/building-debian-freeradius-package-with-eap-tls-ttls-peap-support.html (See the comments to the post for updated instructions). I got a certificate from http://CACert.org (you should probably get a "real" cert if possible) Then I followed the instructions at http://vuksan.com/linux/dot1x/802-1x-LDAP.html. This links to http://tldp.org/HOWTO/html_single/8021X-HOWTO/, which is a very worthwhile read if you want to know how WiFi security works. UPDATE 2009-08-27: After following the above guide, I've managed to get FreeRADIUS to talk to LDAP: I've created a test user in LDAP, with the password mr2Yx36M - this gives an LDAP entry roughly of: uid: testuser sambaLMPassword: CF3D6F8A92967E0FE72C57EF50F76A05 sambaNTPassword: DA44187ECA97B7C14A22F29F52BEBD90 userPassword: {SSHA}Z0SwaKO5tuGxgxtceRDjiDGFy6bRL6ja When using radtest, I can connect fine: > radtest testuser "mr2Yx36N" sbhr.dk 0 radius-private-password Sending Access-Request of id 215 to 130.225.235.6 port 1812 User-Name = "msiebuhr" User-Password = "mr2Yx36N" NAS-IP-Address = 127.0.1.1 NAS-Port = 0 rad_recv: Access-Accept packet from host 130.225.235.6 port 1812, id=215, length=20 > But when I try through the AP, it doesn't fly - while it does confirm that it figures out the NT and LM passwords: ... rlm_ldap: sambaNTPassword -> NT-Password == 0x4441343431383745434139374237433134413232463239463532424542443930 rlm_ldap: sambaLMPassword -> LM-Password == 0x4346334436463841393239363745304645373243353745463530463736413035 [ldap] looking for reply items in directory... WARNING: No "known good" password was found in LDAP. Are you sure that the user is configured correctly? [ldap] user testuser authorized to use remote access rlm_ldap: ldap_release_conn: Release Id: 0 ++[ldap] returns ok ++[expiration] returns noop ++[logintime] returns noop [pap] Normalizing NT-Password from hex encoding [pap] Normalizing LM-Password from hex encoding ... It is clear that the NT and LM passwords differ from the above, yet the message [ldap] user testuser authorized to use remote access - and the user is later rejected...

Read the article

Five Key Strategies in Master Data Management

- by david.butler(at)oracle.com

Here is a very interesting Profit Magazine article on MDM: A recent customer survey reveals the deleterious effects of data fragmentation. by Trevor Naidoo, December 2010 Across industries and geographies, IT organizations have grown in complexity, whether due to mergers and acquisitions, or decentralized systems supporting functional or departmental requirements. With systems architected over time to support unique, one-off process needs, they are becoming costly to maintain, and the Internet has only further added to the complexity. Data fragmentation has become a key inhibitor in delivering flexible, user-friendly systems. The Oracle Insight team conducted a survey assessing customers' master data management (MDM) capabilities over the past two years to get a sense of where they are in terms of their capabilities. The responses, by 27 respondents from six different industries, reveal five key areas in which customers need to improve their data management in order to get better financial results. 1. Less than 15 percent of organizations surveyed understand the sources and quality of their master data, and have a roadmap to address missing data domains. Examples of the types of master data domains referred to are customer, supplier, product, financial and site. Many organizations have multiple sources of master data with varying degrees of data quality in each source -- customer data stored in the customer relationship management system is inconsistent with customer data stored in the order management system. Imagine not knowing how many places you stored your customer information, and whether a customer's address was the most up to date in each source. In fact, more than 55 percent of the respondents in the survey manage their data quality on an ad-hoc basis. It is important for organizations to document their inventory of data sources and then profile these data sources to ensure that there is a consistent definition of key data entities throughout the organization. Some questions to ask are: How do we define a customer? What is a product? How do we define a site? The goal is to strive for one common repository for master data that acts as a cross reference for all other sources and ensures consistent, high-quality master data throughout the organization. 2. Only 18 percent of respondents have an enterprise data management strategy to ensure that data is treated as an asset to the organization. Most respondents handle data at the department or functional level and do not have an enterprise view of their master data. The sales department may track all their interactions with customers as they move through the sales cycle, the service department is tracking their interactions with the same customers independently, and the finance department also has a different perspective on the same customer. The salesperson may not be aware that the customer she is trying to sell to is experiencing issues with existing products purchased, or that the customer is behind on previous invoices. The lack of a data strategy makes it difficult for business users to turn data into information via reports. Without the key building blocks in place, it is difficult to create key linkages between customer, product, site, supplier and financial data. These linkages make it possible to understand patterns. A well-defined data management strategy is aligned to the business strategy and helps create the governance needed to ensure that data stewardship is in place and data integrity is intact. 3. Almost 60 percent of respondents have no strategy to integrate data across operational applications. Many respondents have several disparate sources of data with no strategy to keep them in sync with each other. Even though there is no clear strategy to integrate the data (see #2 above), the data needs to be synced and cross-referenced to keep the business processes running. About 55 percent of respondents said they perform this integration on an ad hoc basis, and in many cases, it is done manually with the help of Microsoft Excel spreadsheets. For example, a salesperson needs a report on global sales for a specific product, but the product has different product numbers in different countries. Typically, an analyst will pull all the data into Excel, manually create a cross reference for that product, and then aggregate the sales. The exact same procedure has to be followed if the same report is needed the following month. A well-defined consolidation strategy will ensure that a central cross-reference is maintained with updates in any one application being propagated to all the other systems, so that data is synchronized and up to date. This can be done in real time or in batch mode using integration technology. 4. Approximately 50 percent of respondents spend manual efforts cleansing and normalizing data. Information stored in various systems usually follows different standards and formats, making it difficult to match the data. A customer's address can be stored in different ways using a variety of abbreviations -- for example, "av" or "ave" for avenue. Similarly, a product's attributes can be stored in a number of different ways; for example, a size attribute can be stored in inches and can also be entered as "'' ". These types of variations make it difficult to match up data from different sources. Today, most customers rely on manual, heroic efforts to match, cleanse, and de-duplicate data -- clearly not a scalable, sustainable model. To solve this challenge, organizations need the ability to standardize data for customers, products, sites, suppliers and financial accounts; however, less than 10 percent of respondents have technology in place to automatically resolve duplicates. It is no wonder, therefore, that we get communications about products we don't own, at addresses we don't reside, and using channels (like direct mail) we don't like. An all-too-common example of a potential challenge follows: Customers end up receiving duplicate communications, which not only impacts customer satisfaction, but also incurs additional mailing costs. Cleansing, normalizing, and standardizing data will help address most of these issues. 5. Only 10 percent of respondents have the ability to share data that was mastered in a master data hub. Close to 60 percent of respondents have efforts in place that profile, standardize and cleanse data manually, and the output of these efforts are stored in spreadsheets in various parts of the organization. This valuable information is not easily shared with the rest of the organization and, more importantly, this enriched information cannot be sent back to the source systems so that the data is fixed at the source. A key benefit of a master data management strategy is not only to clean the data, but to also share the data back to the source systems as well as other systems that need the information. Aside from the source systems, another key beneficiary of this data is the business intelligence system. Having clean master data as input to business intelligence systems provides more accurate and enhanced reporting. Characteristics of Stellar MDM When deciding on the right master data management technology, organizations should look for solutions that have four main characteristics: enterprise-grade MDM performance complete technology that can be rapidly deployed and addresses multiple business issues end-to-end MDM process management with data quality monitoring and assurance pre-built MDM business relevant applications with data stores and workflows These master data management capabilities will aid in moving closer to a best-practice maturity level, delivering tremendous efficiencies and savings as well as revenue growth opportunities as a result of better understanding your customers. Trevor Naidoo is a senior director in Industry Strategy and Insight at Oracle.

Read the article

How to distill / rasterize a PDF in Linux

- by Sampo

We have a printer at our office that prints PDF files from a USB stick. It prints most files okay, but it has problems with some, especially ones generated with Latex. Some PDFs it simply refuses to print, some PDFs it prints with courier-type font, and some it prints fine except for equations. I'm looking for a way to "distill" PDFs into a dead-sure format to print. Either by simplifying / normalizing the PDF to the point that any renderer will render it correctly, or by simply making each page a 600dpi raster image in the PDF. (I could split the PDF into individual raster images and combine them manually, but I want something scriptable.) The output file size doesn't matter, as long as it's sure to print, has A4 paper size (or the original) and 300~600dpi resolution.

Read the article

Database design for very large amount of data

- by Hossein

Hi, I am working on a project, involving large amount of data from the delicious website.The data available is at files are "Date,UserId,Url,Tags" (for each bookmark). I normalized my database to a 3NF, and because of the nature of the queries that we wanted to use In combination I came down to 6 tables....The design looks fine, however, now a large amount of data is in the database, most of the queries needs to "join" at least 2 tables together to get the answer, sometimes 3 or 4. At first, we didn't have any performance issues, because for testing matters we haven't had added too much data in the database. No that we have a lot of data, simply joining extremely large tables does take a lot of time and for our project which has to be real-time is a disaster.I was wondering how big companies solve these issues.Looks like normalizing tables just adds complexity, but how does the big company handle large amounts of data in their databases, don't they do the normalization? thanks

Read the article

Hashes vs Numeric id's

- by Karan Bhangui

When creating a web application that some how displays the display of a unique identifier for a recurring entity (videos on YouTube, or book section on a site like mine), would it be better to use a uniform length identifier like a hash or the unique key of the item in the database (1, 2, 3, etc). Besides revealing a little, what I think is immaterial, information about the internals of your app, why would using a hash be better than just using the unique id? In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database? Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future? The platform uses php/python with ISAM /w MySQL.

Read the article

What's the best way to normalize scores for ranking things?

- by beagleguy

hi all, I'm curious how to do normalizing of numbers for a ranking algorithm let's say I want to rank a link based on importance and I have two columns to work with so a table would look like url | comments | views now I want to rank comments higher than views so I would first think to do comments*3 or something to weight it, however if there is a large view number like 40,000 and only 4 comments then the comments weight gets dropped out. So I'm thinking I have to normalize those scores down to a more equal playing field before I can weight them. Any ideas or pointers to how that's usually done? thanks

Read the article

C/C++ - Convert 24-bit signed integer to float

- by e-t172

I'm programming in C++. I need to convert a 24-bit signed integer (stored in a 3-byte array) to float (normalizing to [-1.0,1.0]). The platform is MSVC++ on x86 (which means the input is little-endian). I tried this: float convert(const unsigned char* src) { int i = src[2]; i = (i << 8) | src[1]; i = (i << 8) | src[0]; const float Q = 2.0 / ((1 << 24) - 1.0); return (i + 0.5) * Q; } I'm not entirely sure, but it seems the results I'm getting from this code are incorrect. So, is my code wrong and if so, why?

Read the article

How to normalize SVG path data (cross browser)?

- by Timo

I have tried to find a way to implement cross browser path normalizer. There IS a native way which is described here and functional example is here, but it works only in newest Opera (but not in IE, FF, Safari, Chrome). The native way uses pathElm.normalizedPathSegList and it converts all relative coordinates to absolute ones and represents all path segment types as a following subset of types: M,L,C,z. I have found only one javascript code and jsfiddled functional example of it, but it works only in IE and FF. Chrome gives "Uncaught Error: INDEX_SIZE_ERR: DOM Exception 1". How this could be fixed to work also in Opera, Safari and Chrome or is there any other way for normalizing SVG paths?

Read the article

How to make a huge ram drive?

- by Brandon Moore

At my old job when a report was needed I could sit down with someone and pull up results and get immediate feedback, and then refine my queries and ultimately have the data we needed, in the format we needed within 30-90 minutes. I just started working for a new company with a database containing millions of records and I spent my whole 8 hours making a report that I feel I could have made in less than 2 hours if it were not for the massive amount of data the queries are working with, and the fact that I couldn't ask the person needing the data to sit down with me and give me feedback as I pulled up results as I am used to. So I am trying to think of how we can make the server faster... much faster, so that I can have the same level of productivity I'm used to. One thought that just came to mind is that memory is so cheap these days, and by my calculations I could buy 10 8gig ram sticks for 1000 bucks. What I have never heard of though is a device that would let me combine these into a huge ram drive. So I'd like to know if any such device exists, and if not what is the largest ram drive I could realistically make and how would I go about doing so? EDIT: To you guys who are saying the database shema needs to be analyzed... you can't make a query such as "Select f1, f2, f3, etc from SomeTable" run any faster by normalizing or indexing the table. What I'm talking about IS ABSOLUTELY a need for improved performance at the hardware level. I am used to having results come back to me in a few seconds, not a few minutes or much less a half an hour. Maybe that's what you guys are used to who have 100 billion record tables and you feel like that's fast, but I'm looking for results back from tables with about 10 million records to come back to me withing less than half a minute TOPS.

Read the article

Windows Phone 8, possible tablets and what the latest update might mean

- by Roger Hart

Microsoft have just announced an update to Windows Phone 8. As one of the five, maybe six people who actually bought a WP8 handset I found this interesting. Then I read the blog post about it, and rushed off to write somewhat less than a thousand words about a single picture. The blog post announces an extra column of tiles on the start screen, and support for higher resolutions. If we ignore all the usual flummery about how this will make your life better, that (and the rotation lock) sounds a little like stage setting for tablets. Looking at the preview screenshot, I started to wonder. What it’s called Phablet_5F00_StartScreenProductivity_5F00_01_5F00_072A1240.jpg Pretty conclusive. If you can brand something a “phablet” and sleep at night you’re made of sterner stuff than I am, but that’s beside the point. It’s explicit in the post that Microsoft are expecting a broader range of form factors for WP8, but they stop short of quite calling out tablet size. The extra columns and resolution definitely back that up, so why stop at a 6 inch “phablet”? Sadly, the string of numbers there don’t really look like a Lumia model number – that would be a bit tendentious even for a speculative blog post about a single screenshot. “Productivity” is interesting too. I get into this a bit more below, but this is a pretty clear pitch for a business device. What it looks like Something that would look quite decent on a 7 inch screen, but something a bit too vertical to go toe-to-toe with the Surface. Certainly, it would look a lot better on a large-factor phone than any of the current models. Those tiles are going to get cramped and a bit ugly if the handsets aren’t getting bigger. What’s on it You have a bunch of missed calls, you rarely text, use a stocks app, and your budget spreadsheet and meeting notes are a thumb-reach away. Outlook is your main form of email. You care enough about LinkedIn to not only install its app but give it a huge live tile. There’s no beating about the bush here, the implicit persona is a corporate exec. With Nokia in the bag and Blackberry pushing daisies, that may not be a stupid play. There’s almost certainly a niche there if they can parlay their corporate credentials into filling it before BYOD (which functionally means an iPhone) reaches the late adopters. The really quite slick WP8 Office implementation ought to help here. This is the face they’ve chosen to present, the cultural milieu they’re normalizing for Windows Phone. It’s an iPhone for Serious Business Grown-ups. Could work, I guess. Does it mean anything? Is the latest WP8 update a sign that we can expect to see tablets running Windows Phone rather than WinRT? Well, WinRT tablets haven’t exactly taken off but I’m not quite going to make a leap like that just from a file name and a column of icons. I feel pretty safe, however, conjecturing that Microsoft would like to squeeze a WP8 “phablet” into the palm of every exec who’s ever grumbled about their Blackberry, and this release might get them a bit closer. If it works well incrementing up to larger devices, then that could be a fair hedge against WinRt crashing and burning any harder in the marketplace.

Read the article

Sample uniformly at random from an n-dimensional unit simplex.

- by dreeves

Sampling uniformly at random from an n-dimensional unit simplex is the fancy way to say that you want n random numbers such that they are all non-negative, they sum to one, and every possible vector of n non-negative numbers that sum to one are equally likely. In the n=2 case you want to sample uniformly from the segment of the line x+y=1 (ie, y=1-x) that is in the positive quadrant. In the n=3 case you're sampling from the triangle-shaped part of the plane x+y+z=1 that is in the positive octant of R3: (Image from http://en.wikipedia.org/wiki/Simplex.) Note that picking n uniform random numbers and then normalizing them so they sum to one does not work. You end up with a bias towards less extreme numbers. Similarly, picking n-1 uniform random numbers and then taking the nth to be one minus the sum of them also introduces bias. Wikipedia gives two algorithms to do this correctly: http://en.wikipedia.org/wiki/Simplex#Random_sampling (Though the second one currently claims to only be correct in practice, not in theory. I'm hoping to clean that up or clarify it when I understand this better. I initially stuck in a "WARNING: such-and-such paper claims the following is wrong" on that Wikipedia page and someone else turned it into the "works only in practice" caveat.) Finally, the question: What do you consider the best implementation of simplex sampling in Mathematica (preferably with empirical confirmation that it's correct)? Related questions http://stackoverflow.com/questions/2171074/generating-a-probability-distribution http://stackoverflow.com/questions/3007975/java-random-percentages

Read the article

What's the recommended implemenation for hashing OLE Variants?

- by Barry Kelly

OLE Variants, as used by older versions of Visual Basic and pervasively in COM Automation, can store lots of different types: basic types like integers and floats, more complicated types like strings and arrays, and all the way up to IDispatch implementations and pointers in the form of ByRef variants. Variants are also weakly typed: they convert the value to another type without warning depending on which operator you apply and what the current types are of the values passed to the operator. For example, comparing two variants, one containing the integer 1 and another containing the string "1", for equality will return True. So assuming that I'm working with variants at the underlying data level (e.g. VARIANT in C++ or TVarData in Delphi - i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules? Rules: Variants that hash unequally should compare as unequal, both in sorting and direct equality Variants that compare as equal for both sorting and direct equality should hash as equal It's OK if I have to use different sorting and direct comparison rules in order to make the hashing fit. The way I'm currently working is I'm normalizing the variants to strings (if they fit), and treating them as strings, otherwise I'm working with the variant data as if it was an opaque blob, and hashing and comparing its raw bytes. That has some limitations, of course: numbers 1..10 sort as [1, 10, 2, ... 9] etc. This is mildly annoying, but it is consistent and it is very little work. However, I do wonder if there is an accepted practice for this problem.

Read the article

How to keep text inside a circle using Cairo?

- by miguelrios

I a drawing a graph using Cairo (pycairo specifically) and I need to know how can I draw text inside a circle without overlapping it, by keeping it inside the bounds of the circle. I have this simple code snippet that draws a letter "a" inside the circle: ''' Created on May 8, 2010 @author: mrios ''' import cairo, math WIDTH, HEIGHT = 1000, 1000 #surface = cairo.PDFSurface ("/Users/mrios/Desktop/exampleplaces.pdf", WIDTH, HEIGHT) surface = cairo.ImageSurface (cairo.FORMAT_ARGB32, WIDTH, HEIGHT) ctx = cairo.Context (surface) ctx.scale (WIDTH/1.0, HEIGHT/1.0) # Normalizing the canvas ctx.rectangle(0, 0, 1, 1) # Rectangle(x0, y0, x1, y1) ctx.set_source_rgb(255,255,255) ctx.fill() ctx.arc(0.5, 0.5, .4, 0, 2*math.pi) ctx.set_source_rgb(0,0,0) ctx.set_line_width(0.03) ctx.stroke() ctx.arc(0.5, 0.5, .4, 0, 2*math.pi) ctx.set_source_rgb(0,0,0) ctx.set_line_width(0.01) ctx.set_source_rgb(255,0,255) ctx.fill() ctx.set_source_rgb(0,0,0) ctx.select_font_face("Georgia", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_BOLD) ctx.set_font_size(1.0) x_bearing, y_bearing, width, height = ctx.text_extents("a")[:4] print ctx.text_extents("a")[:4] ctx.move_to(0.5 - width / 2 - x_bearing, 0.5 - height / 2 - y_bearing) ctx.show_text("a") surface.write_to_png ("/Users/mrios/Desktop/node.png") # Output to PNG The problem is that my labels have variable amount of characters (with a limit of 20) and I need to set the size of the font dynamically. It must fit inside the circle, no matter the size of the circle nor the size of the label. Also, every label has one line of text, no spaces, no line breaks. Any suggestion?

Read the article

javascript normalize whitespace and other plain-text formatting routines

- by dreftymac

Background: The language is JavaScript. The goal is to find a library or pre-existing code to do low-level plain-text formatting. I can write it myself, but why re-invent the wheel. The issue is: it is tough to determine if a "wheel" is out there, since any search for JavaScript libraries pulls up an ocean of HTML-centric stuff. I am not interested in HTML necessarily, just text. Example: I need a JavaScript function that changes this: BEFORE: nisi ut aliquip | ex ea commodo consequat duis |aute irure dolor in esse cillum dolore | eu fugiat nulla pariatur |excepteur sint occa in culpa qui | officia deserunt mollit anim id |est laborum ... into this ... AFTER: nisi ut aliquip | ex ea commodo consequat duis | aute irure dolor in esse cillum dolore | eu fugiat nulla pariatur | excepteur sint occa in culpa qui | officia deserunt mollit anim id | est laborum Question: Does it exist, a JavaScript library that is non-html-web-development-centric that has functions for normalizing spaces in delimited plain text, justifying and spacing plain text? Rationale: Investigating JavaScript for use in a programmer's text editor.

Read the article

Could not determine the size of this expression.

- by Søren

Hi, I have resently started to use MATLAB Simulink, and my problem is that i can't implement an AMDF function, because simulink compiler cannot determine the lengths. Simulink errors: |--------------------------------------------------------------------------------------- Could not determine the size of this expression. Function 'Embedded MATLAB Function2' (#38.728.741), line 33, column 32: "1:flength-k+1" Errors occurred during parsing of Embedded MATLAB function 'Embedded MATLAB Function2'(#38) Embedded MATLAB Interface Error: Errors occurred during parsing of Embedded MATLAB function 'Embedded MATLAB Function2'(#38) . |--------------------------------------------------------------------------------------- MY CODE: |--------------------------------------------------------------------------------------- persistent sLength persistent fLength persistent amdf % Length of the frame flength = length(frame); % Pitch period is between 2.5 ms and 19.5 ms for LPC-10 algorithm % This because this algorithm assumes the frequencyspan is 50 and 400 Hz pH = ceil((1/min(fspan))*fs); if(pH flength) pH = flength; end; pL = ceil((1/max(fspan))*fs); if(pL <= 0 || pL = flength) pL = 0; end; sLength = pH - pL; % Normalize the frame frame = frame/max(max(abs(frame))); % Allocating memory for the calculation of the amdf %amdf = zeros(1,sLength); %%%%%%%% amdf = 0; % Calculating the AMDF with unbiased normalizing for k = (pL+1):pH amdf(k-pL) = sum(abs(frame(1:flength-k+1) - frame(k:flength)))/(flength-k+1); end; % Output of the AMDF if(min(amdf) < lvlThr) voiced = 1; else voiced = 0; end; % Output of the minimum of the amdf minAMDF = min(amdf); |---------------------------------------------------------------------------------------- HELP Kind regards Søren

Read the article

Normalize whitespace and other plain-text formatting routines

- by dreftymac

Background: The language is JavaScript. The goal is to find a library or pre-existing code to do low-level plain-text formatting. I can write it myself, but why re-invent the wheel. The issue is: it is tough to determine if a "wheel" is out there, since any search for JavaScript libraries pulls up an ocean of HTML-centric stuff. I am not interested in HTML necessarily, just text. Example: I need a JavaScript function that changes this: BEFORE: nisi ut aliquip | ex ea commodo consequat duis |aute irure dolor in esse cillum dolore | eu fugiat nulla pariatur |excepteur sint occa in culpa qui | officia deserunt mollit anim id |est laborum ... into this ... AFTER: nisi ut aliquip | ex ea commodo consequat duis | aute irure dolor in esse cillum dolore | eu fugiat nulla pariatur | excepteur sint occa in culpa qui | officia deserunt mollit anim id | est laborum Question: Does it exist, a JavaScript library that is non-html-web-development-centric that has functions for normalizing spaces in delimited plain text, justifying and spacing plain text? Rationale: Investigating JavaScript for use in a programmer's text editor.

Read the article

Is there a standard for storing normalized phone numbers in a database?

- by Eric Z Beard

What is a good data structure for storing phone numbers in database fields? I'm looking for something that is flexible enough to handle international numbers, and also something that allows the various parts of the number to be queried efficiently. [Edit] Just to clarify the use case here: I currently store numbers in a single varchar field, and I leave them just as the customer entered them. Then, when the number is needed by code, I normalize it. The problem is that if I want to query a few million rows to find matching phone numbers, it involves a function, like where dbo.f_normalizenum(num1) = dbo.f_normalizenum(num2) which is terribly inefficient. Also queries that are looking for things like the area code become extremely tricky when it's just a single varchar field. [Edit] People have made lots of good suggestions here, thanks! As an update, here is what I'm doing now: I still store numbers exactly as they were entered, in a varchar field, but instead of normalizing things at query time, I have a trigger that does all that work as records are inserted or updated. So I have ints or bigints for any parts that I need to query, and those fields are indexed to make queries run faster.

Read the article

Populating a foreign key table with variable user input

- by Vincent

I'm working on a website that will be based on user contributed data, submitted using a regular HTML form. To simplify my question, let's say that there will be two fields in the form: "User Name" and "Country" (this is just an example, not the actual site). There will be two tables in the database : "countries" and "users," with "users.country_id" being a foreign key to the "countries" table (one-to-many). The initial database will be empty. Users from all over the world will submit their names and the countries they live in and eventually the "countries" table will get filled out with all of the country names in the world. Since one country can have several alternative names, input like Chile, Chili, Chilli will generate 3 different records in the countries table, but in fact there is only one country. When I search for records from Chile, Chili and Chilli will not be included. So my question is - what would be the best way to deal with a situation like this, with conditions such that the initial database is empty, no other resources are available and everything is based on user input? How can I organize it in such way that Chile, Chili and Chilli would be treated as one country, with minimum manual interference. What are the best practices when it comes to normalizing user submitted data and is there a scientific term for this? I'm sure this is a common problem. Again, I used country names just to simplify my question, it can be anything that has possible different spellings.

Read the article

What's the recommended implementation for hashing OLE Variants?

- by Barry Kelly

OLE Variants, as used by older versions of Visual Basic and pervasively in COM Automation, can store lots of different types: basic types like integers and floats, more complicated types like strings and arrays, and all the way up to IDispatch implementations and pointers in the form of ByRef variants. Variants are also weakly typed: they convert the value to another type without warning depending on which operator you apply and what the current types are of the values passed to the operator. For example, comparing two variants, one containing the integer 1 and another containing the string "1", for equality will return True. So assuming that I'm working with variants at the underlying data level (e.g. VARIANT in C++ or TVarData in Delphi - i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules? Rules: Variants that hash unequally should compare as unequal, both in sorting and direct equality Variants that compare as equal for both sorting and direct equality should hash as equal It's OK if I have to use different sorting and direct comparison rules in order to make the hashing fit. The way I'm currently working is I'm normalizing the variants to strings (if they fit), and treating them as strings, otherwise I'm working with the variant data as if it was an opaque blob, and hashing and comparing its raw bytes. That has some limitations, of course: numbers 1..10 sort as [1, 10, 2, ... 9] etc. This is mildly annoying, but it is consistent and it is very little work. However, I do wonder if there is an accepted practice for this problem.

Read the article

Database schema for a library

- by ABach

Hi all - I'm designing a library management system for a department at my university and I wanted to enlist your eyes on my proposed schema. This post is primarily concerned with how we store multiple copies of each book; something about what I've designed rubs me the wrong way, and I'm hoping you all can point out better ways to tackle things. For dealing with users checking out books, I've devised three tables: book, customer, and book_copy. The relationships between these tables are as follows: Every book has many book_copies (to avoid duplicating the book's information while storing the fact that we have multiple copies of that book). Every user has many book_copies (the other end of the relationship) The tables themselves are designed like this: ------------------------------------------------ book ------------------------------------------------ + id + title + author + isbn + etc. ------------------------------------------------ ------------------------------------------------ customer ------------------------------------------------ + id + first_name + first_name + email + address + city + state + zip + etc. ------------------------------------------------ ------------------------------------------------ book_copy ------------------------------------------------ + id + book_id (FK to book) + customer_id (FK to customer) + checked_out + due_date + etc. ------------------------------------------------ Something about this seems incorrect (or at least inefficient to me) - the perfectionist in me feels like I'm not normalizing this data correctly. What say ye? Is there a better, more effective way to design this schema? Thanks!

Read the article

Looking for: nosql (redis/mongodb) based event logging for Django

- by Parand

I'm looking for a flexible event logging platform to store both pre-defined (username, ip address) and non-pre-defined (can be generated as needed by any piece of code) events for Django. I'm currently doing some of this with log files, but it ends up requiring various analysis scripts and ends up in a DB anyway, so I'm considering throwing it immediately into a nosql store such as MongoDB or Redis. The idea is to be easily able to query, for example, which ip address the user most commonly comes from, whether the user has ever performed some action, lookup the outcome for a specific event, etc. Is there something that already does this? If not, I'm thinking of this: The "event" is a dictionary attached to the request object. Middleware fills in various pieces (username, ip, sql timing), code fills in the rest as needed. After the request is served a post-request hook drops the event into mongodb/redis, normalizing various fields (eg. incrementing the username:ip address counter) and dropping the rest in as is. Words of wisdom / pointers to code that does some/all of this would be appreciated.

Read the article

Increasing efficiency of N-Body gravity simulation

- by Postman

I'm making a space exploration type game, it will have many planets and other objects that will all have realistic gravity. I currently have a system in place that works, but if the number of planets goes above 70, the FPS decreases an practically exponential rates. I'm making it in C# and XNA. My guess is that I should be able to do gravity calculations between 100 objects without this kind of strain, so clearly my method is not as efficient as it should be. I have two files, Gravity.cs and EntityEngine.cs. Gravity manages JUST the gravity calculations, EntityEngine creates an instance of Gravity and runs it, along with other entity related methods. EntityEngine.cs public void Update() { foreach (KeyValuePair<string, Entity> e in Entities) { e.Value.Update(); } gravity.Update(); } (Only relevant piece of code from EntityEngine, self explanatory. When an instance of Gravity is made in entityEngine, it passes itself (this) into it, so that gravity can have access to entityEngine.Entities (a dictionary of all planet objects)) Gravity.cs namespace ExplorationEngine { public class Gravity { private EntityEngine entityEngine; private Vector2 Force; private Vector2 VecForce; private float distance; private float mult; public Gravity(EntityEngine e) { entityEngine = e; } public void Update() { //First loop foreach (KeyValuePair<string, Entity> e in entityEngine.Entities) { //Reset the force vector Force = new Vector2(); //Second loop foreach (KeyValuePair<string, Entity> e2 in entityEngine.Entities) { //Make sure the second value is not the current value from the first loop if (e2.Value != e.Value ) { //Find the distance between the two objects. Because Fg = G * ((M1 * M2) / r^2), using Vector2.Distance() and then squaring it //is pointless and inefficient because distance uses a sqrt, squaring the result simple cancels that sqrt. distance = Vector2.DistanceSquared(e2.Value.Position, e.Value.Position); //This makes sure that two planets do not attract eachother if they are touching, completely unnecessary when I add collision, //For now it just makes it so that the planets are not glitchy, performance is not significantly improved by removing this IF if (Math.Sqrt(distance) > (e.Value.Texture.Width / 2 + e2.Value.Texture.Width / 2)) { //Calculate the magnitude of Fg (I'm using my own gravitational constant (G) for the sake of time (I know it's 1 at the moment, but I've been changing it) mult = 1.0f * ((e.Value.Mass * e2.Value.Mass) / distance); //Calculate the direction of the force, simply subtracting the positions and normalizing works, this fixes diagonal vectors //from having a larger value, and basically makes VecForce a direction. VecForce = e2.Value.Position - e.Value.Position; VecForce.Normalize(); //Add the vector for each planet in the second loop to a force var. Force = Vector2.Add(Force, VecForce * mult); //I have tried Force += VecForce * mult, and have not noticed much of an increase in speed. } } } //Add that force to the first loop's planet's position (later on I'll instead add to acceleration, to account for inertia) e.Value.Position += Force; } } } } I have used various tips (about gravity optimizing, not threading) from THIS question (that I made yesterday). I've made this gravity method (Gravity.Update) as efficient as I know how to make it. This O(N^2) algorithm still seems to be eating up all of my CPU power though. Here is a LINK (google drive, go to File download, keep .Exe with the content folder, you will need XNA Framework 4.0 Redist. if you don't already have it) to the current version of my game. Left click makes a planet, right click removes the last planet. Mouse moves the camera, scroll wheel zooms in and out. Watch the FPS and Planet Count to see what I mean about performance issues past 70 planets. (ALL 70 planets must be moving, I've had 100 stationary planets and only 5 or so moving ones while still having 300 fps, the issue arises when 70+ are moving around) After 70 planets are made, performance tanks exponentially. With < 70 planets, I get 330 fps (I have it capped at 300). At 90 planets, the FPS is about 2, more than that and it sticks around at 0 FPS. Strangely enough, when all planets are stationary, the FPS climbs back up to around 300, but as soon as something moves, it goes right back down to what it was, I have no systems in place to make this happen, it just does. I considered multithreading, but that previous question I asked taught me a thing or two, and I see now that that's not a viable option. I've also thought maybe I could do the calculations on my GPU instead, though I don't think it should be necessary. I also do not know how to do this, it is not a simple concept and I want to avoid it unless someone knows a really noob friendly simple way to do it that will work for an n-body gravity calculation. (I have an NVidia gtx 660) Lastly I've considered using a quadtree type system. (Barnes Hut simulation) I've been told (in the previous question) that this is a good method that is commonly used, and it seems logical and straightforward, however the implementation is way over my head and I haven't found a good tutorial for C# yet that explains it in a way I can understand, or uses code I can eventually figure out. So my question is this: How can I make my gravity method more efficient, allowing me to use more than 100 objects (I can render 1000 planets with constant 300+ FPS without gravity calculations), and if I can't do much to improve performance (including some kind of quadtree system), could I use my GPU to do the calculations?

Read the article

Can MySQL reasonably perform queries on billions of rows?

- by haxney

I am planning on storing scans from a mass spectrometer in a MySQL database and would like to know whether storing and analyzing this amount of data is remotely feasible. I know performance varies wildly depending on the environment, but I'm looking for the rough order of magnitude: will queries take 5 days or 5 milliseconds? Input format Each input file contains a single run of the spectrometer; each run is comprised of a set of scans, and each scan has an ordered array of datapoints. There is a bit of metadata, but the majority of the file is comprised of arrays 32- or 64-bit ints or floats. Host system |----------------+-------------------------------| | OS | Windows 2008 64-bit | | MySQL version | 5.5.24 (x86_64) | | CPU | 2x Xeon E5420 (8 cores total) | | RAM | 8GB | | SSD filesystem | 500 GiB | | HDD RAID | 12 TiB | |----------------+-------------------------------| There are some other services running on the server using negligible processor time. File statistics |------------------+--------------| | number of files | ~16,000 | | total size | 1.3 TiB | | min size | 0 bytes | | max size | 12 GiB | | mean | 800 MiB | | median | 500 MiB | | total datapoints | ~200 billion | |------------------+--------------| The total number of datapoints is a very rough estimate. Proposed schema I'm planning on doing things "right" (i.e. normalizing the data like crazy) and so would have a runs table, a spectra table with a foreign key to runs, and a datapoints table with a foreign key to spectra. The 200 Billion datapoint question I am going to be analyzing across multiple spectra and possibly even multiple runs, resulting in queries which could touch millions of rows. Assuming I index everything properly (which is a topic for another question) and am not trying to shuffle hundreds of MiB across the network, is it remotely plausible for MySQL to handle this? UPDATE: additional info The scan data will be coming from files in the XML-based mzML format. The meat of this format is in the <binaryDataArrayList> elements where the data is stored. Each scan produces = 2 <binaryDataArray> elements which, taken together, form a 2-dimensional (or more) array of the form [[123.456, 234.567, ...], ...]. These data are write-once, so update performance and transaction safety are not concerns. My naïve plan for a database schema is: runs table | column name | type | |-------------+-------------| | id | PRIMARY KEY | | start_time | TIMESTAMP | | name | VARCHAR | |-------------+-------------| spectra table | column name | type | |----------------+-------------| | id | PRIMARY KEY | | name | VARCHAR | | index | INT | | spectrum_type | INT | | representation | INT | | run_id | FOREIGN KEY | |----------------+-------------| datapoints table | column name | type | |-------------+-------------| | id | PRIMARY KEY | | spectrum_id | FOREIGN KEY | | mz | DOUBLE | | num_counts | DOUBLE | | index | INT | |-------------+-------------| Is this reasonable?

Read the article

Mobile BI Comes of Age

- by rich.clayton(at)oracle.com

Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} One of the hot topics in the Business Intelligence industry is mobility. More specifically the question is how business can be transformed by the iPhone and the iPad. In June 2003, Gartner predicted that Mobile BI would be obsolete and that the technology was headed for the 'trough of disillusionment'. I agreed with them at that time. Many vendors like MicroStrategy and Business Objects jumped into the fray attempting to show how PDA's like Palm Pilots could be integrated with BI. Their investments resulted in interesting demos with no commercial traction. Why, because wireless networks and mobile operating systems were primitive, immature and slow. In my opinion, Apple's iOS has changed everything in Mobile BI. Yes Blackberry, Android and Symbian and all the rest have their place in the market but I believe that increasingly consumers (not IT departments) influence BI decision making processes. Consumers are choosing the iPhone and the iPad. The number of iPads I see in business meetings now is staggering. Some use it for email and note taking and others are starting to use corporate applications. The possibilities for Mobile BI are countless and I would expect to see iPads enterprise-wide over the next few years. These new devices will provide just-in-time access to critical business information. Front-line managers interacting with customers, suppliers, patients or citizens will have information literally at their fingertips. I've experimented with several mobile BI tools. They look cool but like their Executive Information System (EIS) predecessors of the 1990's these tools lack a backbone and a plausible integration strategy. EIS was a viral technology in the early 1990's. Executives from every industry and job function were showcasing their dashboards to fellow co-workers and colleagues at the country club. Just like the iPad, every senior manager wanted one. EIS wasn't a device however, it was a software application. EIS quickly faded into the software sunset as it lacked integration with corporate information systems. BI servers replaced EIS because the technology focused on the heavy data lifting of integrating, normalizing, aggregating and managing large, complex data volumes. The devices are here to stay. The cute stand-alone mobile BI tools, not so much. If all you're looking to do is put Excel files on your iPad, there are plenty of free tools on the market. You'll look cool at your next management meeting but after a few weeks, the cool factor will fade away and you'll be wondering how you will ever maintain it. If however you want secure, consistent, reliable information on your iPad, you need an integration strategy and a way to model the data. BI Server technologies like the Oracle BI Foundation is a market leading approach to tackle that issue. I liken the BI mobility frenzy to buying classic cars. Classic Cars have two buying groups - teenagers and middle-age folks looking to tinker. Teenagers look at the pin-stripes and the paint job while middle-agers (like me) kick the tires a bit and look under the hood to check out the quality and reliability of the engine. Mobile BI tools sure look sexy but don't go very far without an engine and a transmission or an integration strategy. The strategic question in Mobile BI is can these startups build a motor and transmission faster than Oracle can re-paint the car? Oracle has a great engine and a transmission that connects to all enterprise information assets. We're working on the new paint job and are excited about the possibilities. Just as vertical integration worked in the automotive business, it too works in the technology industry.

Search Results

Search found 53 results on 3 pages for 'normalizing'.

Page 2/3 | < Previous Page | 1 2 3 | Next Page >

- by whatnick

- by Morten Siebuhr

- by david.butler(at)oracle.com

- by Sampo

- by Hossein

- by Karan Bhangui

- by beagleguy

- by e-t172

- by Timo

- by Brandon Moore

- by Roger Hart

- by dreeves

- by Barry Kelly

- by miguelrios

- by dreftymac

- by Søren

- by dreftymac

- by Eric Z Beard

- by Vincent

- by Barry Kelly

- by ABach

- by Parand

- by Postman

- by haxney

- by rich.clayton(at)oracle.com

< Previous Page | 1 2 3 | Next Page >