Search Results

Search found 7490 results on 300 pages for 'algorithm analysis'.

  • Is it illegal to rewrite every line of an open source project in a slightly different way, and use it in a closed source project?

    - by Chris Barry
    There is some code which is GPL or LGPL that I am considering using for an iPhone project. If I took that code (JavaScript) and rewrote it in a different language for use on the iPhone, would that be a legal issue? In theory, the process is that I have gone through each line of the project, learnt what it is doing, and then reimplemented the ideas in a new language. To me this seems like learning how to implement something, and then reimplementing it separately from the original licence. Therefore you have only copied the algorithm, which arguably you could have learnt from somewhere other than the original project. Does the licence cover the specific implementation, or the algorithm as well?

    EDIT: Really glad to see this topic create a good conversation. To give a bit more background on the project, the code involved does some kind of audio analysis. I believe it is non-trivial to learn or implement, although I was prepared to embark on this task (I'm at the level where I can implement an FFT algorithm, and this was going to go beyond that). It is a fairly low LOC script, so I didn't think it would be too hard to do a straight port. I really like the idea of rereleasing my port as well as using it in the application. I don't see any problem with that, and it would be a great way to give something back to the community.

    I was going to add a line about not wanting to discuss the moral issues, but I'm quite glad I didn't, as it seems to have fired up the debate a bit. I still feel a bit odd about using open source code to learn from. Does this mean that anything one learns from an open source project is not allowed to be used in a closed source project? And how much later, or how different, does an implementation have to be to not be considered a violation of the licence? Murky!

    EDIT 2: Follow up question

  • How to find the average color of an image.

    - by Edward Boyle
    Years ago I was the lead developer of a large scrapbook web site. One of the things I implemented was to allow shoppers to find scrapbook papers and embellishments of like colors ("more like this color"). Below is the base algorithm I wrote to extract the average color from an image. It worked out pretty well. I took the returned values and stored them in an associated table for the products. Yet another algorithm was used to SELECT near matches. This algorithm has turned out to be very handy for me. I have used it for borders and subtle outlined text overlays. I am sure you will find more creative uses for it. Enjoy…

        private Color GetColor(Bitmap bmp)
        {
            // Use long accumulators so the sums cannot overflow on large images.
            long r = 0;
            long g = 0;
            long b = 0;

            // Start both loops at 0 so the first column and row are included.
            for (int x = 0; x < bmp.Width; x++)
            {
                for (int y = 0; y < bmp.Height; y++)
                {
                    Color pixel = bmp.GetPixel(x, y);
                    r += pixel.R;
                    g += pixel.G;
                    b += pixel.B;
                }
            }

            // Divide each channel sum by the number of pixels visited.
            long total = (long)bmp.Width * bmp.Height;
            return System.Drawing.Color.FromArgb((int)(r / total), (int)(g / total), (int)(b / total));
        }

    You could also get the RGB values by passing in the RGB by ref

        private Color GetColor(ref int r, ref int g, ref int b, Bitmap bmp)

    but that is a bit much, as you can simply get them from the return value: mReturnedColor.R; mReturnedColor.G; mReturnedColor.B;

  • Continuous Physics Engine's Collision Detection Techniques

    - by Griffin
    I'm working on a purely continuous physics engine, and I need to choose algorithms for broad and narrow phase collision detection. "Purely continuous" means I never do intersection tests, but instead want to find ways to catch every collision before it happens, and put each into a "planned collisions" stack that is ordered by time of impact (TOI).

    Broad Phase

    The only continuous broad-phase method I can think of is encasing each body in a circle and testing whether each circle will ever overlap another. This seems horribly inefficient, however, and lacks any culling. I have no idea what continuous analogs might exist for today's discrete collision culling methods, such as quad-trees, either. How might I go about avoiding inappropriate and pointless broad-phase tests, as a discrete engine does?

    Narrow Phase

    I've managed to adapt the narrow-phase SAT to a continuous check rather than a discrete one, but I'm sure there are other, better algorithms out there in papers or on sites you might have come across. Which fast or accurate algorithms do you suggest I use, and what are the advantages and disadvantages of each?

    Final Note: I say techniques and not algorithms because I have not yet decided how I will store different polygons, which might be concave, convex, round, or even have holes. I plan to make a decision on this based on what the algorithm requires (for instance, if I choose an algorithm that breaks down a polygon into triangles or convex shapes, I will simply store the polygon data in this form).
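
The circle-versus-circle test mentioned under Broad Phase comes down to solving a quadratic for the time of impact. Below is a minimal sketch, assuming both circles move with constant velocity over the interval of interest; the function name `circle_toi` and its parameters are illustrative and do not come from the original post.

```python
import math

def circle_toi(p1, v1, r1, p2, v2, r2):
    """Return the earliest time t >= 0 at which two circles moving with
    constant velocity first touch, or None if they never do."""
    # Work in the frame of circle 1: relative position and velocity.
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    radius = r1 + r2

    # Solve |d + v*t|^2 = radius^2, i.e. a*t^2 + b*t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy - radius * radius

    if c <= 0.0:
        return 0.0      # already touching or overlapping
    if a == 0.0:
        return None     # no relative motion, so the gap never closes
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None     # the paths never bring the circles close enough
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    # A negative root here means the circles are moving apart.
    return t if t >= 0.0 else None
```

Each non-None result could then be pushed onto the TOI-ordered "planned collisions" structure the post describes; the sketch does not address culling, which is the open question.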

  • How to practice object oriented programming?

    - by user1620696
    I've always programmed in procedural languages and currently I'm moving towards object orientation. The main problem I've faced is that I can't see a way to practice object orientation effectively. I'll explain my point. When I learned PHP and C it was pretty easy to practice: it was just a matter of choosing something and thinking about an algorithm for that thing. In PHP, for example, it was a matter of sitting down and thinking: "well, just to practice, let me build an application with an administration area where people can add products". This was pretty easy: it was a matter of thinking of an algorithm to register a user, to log the user in, and to add the products. Combined with PHP features, this was a good way to practice.

    Now, in object orientation we have lots of additional things. It's not just a matter of thinking about an algorithm, but of analysing requirements more deeply, writing use cases, figuring out class diagrams, properties and methods, setting up dependency injection, and lots of other things. The main point is that, in the way I've been learning object orientation, a good design seems crucial, while in procedural languages one vague idea was enough. I'm not saying that in procedural languages we can write good software without design, just that for the sake of practicing it is feasible, while in object orientation it does not seem feasible to go without a good design, even for practicing.

    This seems to be a problem, because if each time I practice I need to figure out tons of requirements, use cases and so on, it doesn't seem like a good way to get better at object orientation: it requires me to have one whole idea for an app every time I'm going to practice. Given that, what's a good way to practice object orientation?

  • In the days of modern computing, in 'typical business apps' - why does performance matter?

    - by Prog
    This may seem like an odd question to some of you. I'm a hobbyist Java programmer. I have developed several games, an AI program that creates music, another program for painting, and similar stuff. I mention this to show that I have experience in programming, but not in professional development of business applications.

    I see a lot of talk on this site about performance. People often debate what would be the most efficient algorithm in C# to perform a task, or why Python is slow and Java is faster, etc. What I'm trying to understand is: why does this matter? There are specific areas of computing where I see why performance matters: games, where tens of thousands of computations are happening every second in a constant-update loop, or low-level systems which other programs rely on, such as OSs and VMs. But for the normal, typical high-level business app, why does performance matter?

    I can understand why it used to matter, decades ago. Computers were much slower and had much less memory, so you had to think carefully about these things. But today we have so much memory to spare and computers are so fast: does it actually matter if a particular Java algorithm is O(n^2)? Will it actually make a difference for the end users of this typical business app? When you press a GUI button in a typical business app, and behind the scenes it invokes an O(n^2) algorithm, do you actually feel the inefficiency in these days of modern computing?

    My question is split in two:

      1. In practice, does performance matter today in a typical business program?
      2. If it does, please give me real-world examples of places in such an application where performance and optimizations are important.

  • String patterns that can be used to filter and group files

    - by Louis Rhys
    One of our applications filters files in a certain directory, extracts some data from them, and exports a document from the extracted data. The algorithm for extracting the data depends on the file, and so far we use regexes to select the algorithm to be used. For example, files matching .*\.txt will be processed by algorithm A, files matching foo[0-5]\.xml will be processed by algorithm B, etc.

    However, we now need some files to be processed together. For example, in one case we need two files, foo.*\.xml and bar.*\.xml. Part of the information to be extracted exists in the foo file, and the other part in the bar file. Moreover, we need to make sure the wildcard parts are compatible. For example, if there are 6 files

        foo1.xml foo23.xml bar1.xml bar9.xml bar23.xml foo4.xml

    I would expect foo1 and bar1 to be identified as one group, and foo23 and bar23 as another. bar9 and foo4 have no pair, so they will not be processed.

    Now, since the filter is configured by the user, we need a pattern that can express the above requirement. I don't think you can express this in standard regex: (foo|bar).*\.xml will match all 6 files above, and we can't identify which file is paired with a particular file. Is there any standard pattern that can express it? Or any idea how to modify regex to support this in a way that can be implemented easily?
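
One possible way to express the pairing requirement is to give each pattern a capture group and treat the captured text as a grouping key. The sketch below assumes a convention, invented here for illustration and not any standard, that every user-supplied pattern contains a named group called `key`; the function name `group_files` and the role names are also hypothetical.

```python
import re
from collections import defaultdict

def group_files(filenames, patterns):
    """Group files whose `key` capture agrees across every pattern.

    `patterns` maps a role name to a regex containing a named group `key`.
    Only keys that were matched by all roles are returned.
    """
    compiled = {role: re.compile(p) for role, p in patterns.items()}
    groups = defaultdict(dict)
    for name in filenames:
        for role, rx in compiled.items():
            m = rx.fullmatch(name)
            if m:
                groups[m.group("key")][role] = name
    # Drop incomplete groups such as bar9 or foo4, which have no partner.
    return {k: g for k, g in groups.items() if len(g) == len(compiled)}

files = ["foo1.xml", "foo23.xml", "bar1.xml", "bar9.xml", "bar23.xml", "foo4.xml"]
patterns = {"foo": r"foo(?P<key>.*)\.xml", "bar": r"bar(?P<key>.*)\.xml"}
pairs = group_files(files, patterns)
print(pairs)
```

With the six files from the question, only the keys `1` and `23` survive, matching the grouping the post expects. The user still writes ordinary regexes, so existing single-file patterns would keep working unchanged.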

  • Automatic Appointment Conflict Resolution

    - by Thomas
    I'm trying to figure out an algorithm for resolving appointment times. I currently have a naive algorithm that pushes down conflicting appointments repeatedly, until there are no more appointments.

        # The appointment list is always sorted on start time
        appointment_list = [
            <Appointment: 10:00 -> 12:00>,
            <Appointment: 11:00 -> 12:30>,
            <Appointment: 13:00 -> 14:00>,
            <Appointment: 13:30 -> 14:30>,
        ]

    Constraints are that appointments:

      - cannot be after 15:00
      - cannot be before 9:00

    This is the naive algorithm:

        for i, app in enumerate(appointment_list):
            for possible_conflict in appointment_list[i+1:]:
                if possible_conflict.start < app.end:
                    difference = app.end - possible_conflict.start
                    possible_conflict.end += difference
                    possible_conflict.start += difference
                else:
                    break

    This results in the following resolution, which obviously breaks those constraints, and the last appointment will have to be pushed to the following day.

        appointment_list = [
            <Appointment: 10:00 -> 12:00>,
            <Appointment: 12:00 -> 13:30>,
            <Appointment: 13:30 -> 14:30>,
            <Appointment: 14:30 -> 15:30>,
        ]

    Obviously this is sub-optimal. It performs 3 appointment moves when the conflict could have been resolved with one: if we were able to push the first appointment backwards, we could avoid moving all the subsequent appointments down. I'm thinking that there should be a sort of edit-distance approach that would calculate the least number of appointments that should be moved in order to resolve the scheduling conflict, but I can't get a handle on the methodology. Should it be a breadth-first or a depth-first solution search? How do I know when the solution is "good enough"?
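
For reference, the naive push-down pass above can be reproduced as a self-contained script. This is only a sketch of the behaviour described in the post, not a solution to it: the `Appointment` dataclass, the `hm` helper, and the reading of "cannot be after 15:00" as "must end by 15:00" are assumptions made here so that the example runs.

```python
from dataclasses import dataclass

@dataclass
class Appointment:
    start: int  # minutes since midnight
    end: int

def hm(text):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)

def push_down(appointments):
    """Naive resolution: shift each later conflicting appointment so that it
    starts when the current one ends. Mutates and returns the list."""
    for i, app in enumerate(appointments):
        for other in appointments[i + 1:]:
            if other.start < app.end:
                shift = app.end - other.start
                other.start += shift
                other.end += shift
            else:
                break
    return appointments

def violations(appointments, earliest=hm("9:00"), latest=hm("15:00")):
    """Appointments that start before `earliest` or end after `latest`."""
    return [a for a in appointments if a.start < earliest or a.end > latest]

schedule = [
    Appointment(hm("10:00"), hm("12:00")),
    Appointment(hm("11:00"), hm("12:30")),
    Appointment(hm("13:00"), hm("14:00")),
    Appointment(hm("13:30"), hm("14:30")),
]
resolved = push_down(schedule)
late = violations(resolved)
```

Running it on the four appointments from the question moves three of them and leaves the last one ending at 15:30, which is the constraint violation the post describes.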

  • Obstacle Avoidance steering behavior: how can an entity avoid an obstacle while other forces are acting on the entity?

    - by Prog
    I'm trying to implement the Obstacle Avoidance steering behavior in my 2D game. Currently my approach is to apply a force on the entity, in the direction of the normal of the heading, scaled by a number that gets bigger the closer we are to the obstacle. This is supposed to push the entity to the side and avoid the obstacle that blocks its way.

    However, at the same time that my entity tries to avoid an obstacle, it Seeks to a point more or less behind the obstacle (which is the reason it needs to avoid the obstacle in the first place). The Seek algorithm constantly applies a force on the entity that pushes it (more or less) in the direction of the obstacle, while the Obstacle Avoidance algorithm constantly applies a force that pushes the entity away from (more accurately, to the side of) the obstacle. The result is that sometimes the entity successfully avoids the obstacle, and sometimes it collides with it, depending on the strength of the avoidance force I'm applying.

    How can I make sure that a force will succeed in steering the entity in some direction while other forces are acting on the entity, and while still looking natural? I can't allow entities to collide with obstacles when realistically they should be able to avoid them easily, no matter what they're currently doing. Also, the Obstacle Avoidance algorithm is made exactly for the case where another force is acting on the entity. Otherwise the entity wouldn't be moving and there would be no need to avoid anything. So maybe I'm missing something. Thanks
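
The setup described above can be sketched as two force vectors that are simply added each frame. This is a hedged illustration of the situation in the question, not a fix for it: the function names, the `max_force` and `strength` parameters, and the inverse-distance scaling are all assumptions made for the example.

```python
import math

def normalize(v):
    """Return v scaled to unit length, or the zero vector if v is zero."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length) if length > 0 else (0.0, 0.0)

def seek_force(position, target, max_force):
    """Force of magnitude max_force pointing from position towards target."""
    d = normalize((target[0] - position[0], target[1] - position[1]))
    return (d[0] * max_force, d[1] * max_force)

def avoidance_force(heading, distance, strength):
    """Sideways force along the heading's normal, growing as distance shrinks."""
    h = normalize(heading)
    normal = (-h[1], h[0])  # heading rotated 90 degrees counter-clockwise
    scale = strength / max(distance, 1e-6)
    return (normal[0] * scale, normal[1] * scale)

def combined_force(position, heading, target, obstacle_distance,
                   max_force=1.0, strength=2.0):
    """Plain sum of the seek and avoidance forces."""
    s = seek_force(position, target, max_force)
    a = avoidance_force(heading, obstacle_distance, strength)
    return (s[0] + a[0], s[1] + a[1])

force = combined_force((0.0, 0.0), (1.0, 0.0), (10.0, 0.0), 4.0)
```

With a plain sum like this, whether the entity clears the obstacle depends on the relative magnitudes of the two forces, which is consistent with the inconsistent results the post reports.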

  • Finding header files

    - by rwallace
    A C or C++ compiler looks for header files using a strict set of rules: relative to the directory of the including file (if "" was used), then along the specified and default include paths, and fail if still not found. An ancillary tool such as a code analyzer (which I'm currently working on) has different requirements: for a number of reasons it may not have the benefit of the setup performed by a complex build process, and has to make the best of what it is given. In other words, it may encounter a header file that is not present in the include paths it knows, and have to take its best shot at finding the file itself. I'm currently thinking of using the following algorithm:

      1. Start in the directory of the including file.
      2. Is the header file found in the current directory or any subdirectory thereof? If so, done.
      3. If we are at the root directory, the file doesn't seem to be present on this machine, so skip it. Otherwise move to the parent of the current directory and go to step 2.

    Is this the best algorithm to use? In particular, does anyone know of any case where a different algorithm would work better?
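
The three steps above can be sketched as follows. This is a minimal illustration that assumes the header is referenced by a bare file name (no directory component); the function name `find_header` is hypothetical.

```python
import os

def find_header(including_file, header_name):
    """Search the including file's directory tree, then each ancestor's tree,
    for header_name. Returns the first path found, or None."""
    # Step 1: start in the directory of the including file.
    current = os.path.dirname(os.path.abspath(including_file))
    while True:
        # Step 2: look in the current directory and all of its subdirectories.
        for dirpath, _dirnames, filenames in os.walk(current):
            if header_name in filenames:
                return os.path.join(dirpath, header_name)
        # Step 3: give up at the root, otherwise move to the parent and retry.
        parent = os.path.dirname(current)
        if parent == current:
            return None
        current = parent
```

Note that each move to a parent walks the already-searched subtree again, and a miss ends up walking the whole filesystem, so a real tool would probably want to prune visited directories or cap the search depth.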

  • Mechanics of reasoning during programming interviews

    - by user129506
    This is not the usual "I don't want to write code during an interview" question. In this question the assumptions are:

      - I need to write code during an interview (think of rewriting quicksort or mergesort from scratch).
      - I know how the algorithm works, or I have a basic idea of how I should start working from there, i.e. I don't remember the algorithm by heart.

    I have noticed that even on a whiteboard, I always end up writing buggy code or code that doesn't compile. If there's a typo, I can usually live with that, but when there's a crash due to some unhandled special case I end up losing confidence in my skills. I realize that interviewers might want to look at how I write code and/or how I solve problems rather than proof-compiling my whiteboard code, but I'd like to ask how I should approach the above problem in mental terms, i.e. what mental steps I should follow when writing code for an interview under the two assumptions above. There must be an agreed series of steps I can follow to avoid getting stuck on particular edge cases that waste my time and energy, so that I can focus on the overall algorithm for the general case. I hope I made my point clear.

  • Approach to Authenticate Clients to TCP Server

    - by dab
    I'm writing a server/client application where clients will connect to the server. What I want to do is make sure that the client connecting to the server is actually using my protocol, so that I can "trust" the data being sent from the client to the server. What I thought about doing is creating a sort of hash on the client's machine that follows a particular algorithm.

    In a previous version, I took the client's IP address, the client version, and a few other attributes of the client, and sent them as a calculated hash to the server. The server then took the client's IP and the version of the protocol the client claimed to be using, and calculated the same number to see if they matched. This works OK until you get clients that connect from behind a router, where their internal IP is different from their external IP. My fix for this was to pass the client's internal IP used to calculate this hash along with the authentication protocol. My fear is that this approach is not secure enough, since I'm passing the data used to create the "auth hash". Here's an example of what I'm talking about:

        Client IP: 192.168.1.10, Version: 2.4.5.2
        hash = 2*4*5*1 * (1+9+2) * (1+6+8) * (1) * (1+0)

      1. Client connects to server.
      2. Client sends: auth hash, IP, version.
      3. Server calculates that info, and accepts or denies the hash.

    Before I go and come up with another algorithm to prove a client can provide data to a server (or use this existing algorithm), I was wondering if there are any existing, proven, and secure systems out there for generating a hash that both sides can generate with general knowledge. The server won't know about the client until the very first connection is established.

    The protocol's intent is to manage a network of clients who will be contributing data to the server periodically. New clients will be added simply by connecting the client to the server and "registering" with the server. So a client connects to the server for the first time and registers its info (MAC address or some other kind of unique computer identifier); then, when it connects again, the server will recognize it as a previously seen client and associate it with its data in the database.

  • How do I do high quality scaling of a image?

    - by pbhogan
    I'm writing some code to scale a 32-bit RGBA image in C/C++. I have written a few attempts that have been somewhat successful, but they're slow and, most importantly, the quality of the resized image is not acceptable. I compared the same image scaled by OpenGL (i.e. my video card) and by my routine, and they're miles apart in quality. I've Google Code Searched and scoured the source trees of anything I thought would shed some light (SDL, Allegro, wxWidgets, CxImage, GD, ImageMagick, etc.), but usually their code is either convoluted and scattered all over the place or riddled with assembler and has little or no commenting. I've also read multiple articles on Wikipedia and elsewhere, and I'm just not finding a clear explanation of what I need. I understand the basic concepts of interpolation and sampling, but I'm struggling to get the algorithm right.

    I do NOT want to rely on an external library for one routine and have to convert to their image format and back. Besides, I'd like to know how to do it myself anyway. :) I have seen a similar question asked on Stack Overflow before, but it wasn't really answered in this way, so I'm hoping there's someone out there who can help nudge me in the right direction. Maybe point me to some articles or pseudo code... anything to help me learn and do. Here's what I'm looking for:

      1. No assembler (I'm writing very portable code for multiple processor types).
      2. No dependencies on external libraries.
      3. I am primarily concerned with scaling DOWN, but will also need to write a scale-up routine later.
      4. Quality of the result and clarity of the algorithm are most important (I can optimize it later).

    My routine essentially takes the following form:

        DrawScaled( uint32 *src, uint32 *dst, src_x, src_y, src_w, src_h, dst_x, dst_y, dst_w, dst_h );

    Thanks!

    UPDATE: To clarify, I need something more advanced than a box resample for downscaling, which blurs the image too much. I suspect what I want is some kind of bicubic (or other) filter that is somewhat the reverse of a bicubic upscaling algorithm (i.e. each destination pixel is computed from all contributing source pixels, combined with a weighting algorithm that keeps things sharp).

    EXAMPLE: [Images not preserved: the wxWidgets BoxResample result vs. the desired result for a 256x256 bitmap scaled to 55x55, and the original 256x256 image.]
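
The idea in the UPDATE, that each destination pixel is a weighted combination of all contributing source pixels, can be illustrated in one dimension. The sketch below uses a triangle (tent) filter widened by the scale factor, which is a simpler stand-in for the bicubic kernel the post asks about, and it handles a single channel of a single row. The function name `resample_row` is hypothetical; a 2-D version would typically apply the same pass to rows and then to columns, per channel.

```python
import math

def resample_row(src, dst_w):
    """Resample a 1-D list of channel values to dst_w samples with a tent
    filter whose radius grows with the downscale factor."""
    src_w = len(src)
    scale = src_w / dst_w
    radius = max(scale, 1.0)  # filter radius in source pixels
    out = []
    for dx in range(dst_w):
        # Centre of this destination pixel in source coordinates.
        center = (dx + 0.5) * scale
        first = max(0, math.floor(center - radius))
        last = min(src_w, math.ceil(center + radius))
        total = 0.0
        weight_sum = 0.0
        for sx in range(first, last):
            # Weight falls off linearly with distance from the centre.
            w = 1.0 - abs(sx + 0.5 - center) / radius
            if w > 0.0:
                total += w * src[sx]
                weight_sum += w
        # Normalising by the weight sum keeps flat regions flat at the edges.
        out.append(total / weight_sum)
    return out

flat = resample_row([10.0] * 256, 55)
edge = resample_row([0.0, 0.0, 100.0, 100.0], 2)
```

Swapping the weight expression for a different kernel (for example a cubic one) changes the sharpness without changing the surrounding loop structure, which is why the kernel is kept on its own line.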

  • How to fix “The requested service, ‘net.pipe://localhost/SecurityTokenServiceApplication/appsts.svc’ could not be activated.”

    - by ybbest
    Problem: When I tried to publish a SharePoint 2013 workflow, I received the error: The requested service, ‘net.pipe://localhost/SecurityTokenServiceApplication/appsts.svc’ could not be activated. After that, my workflow stopped working, and every time I start a workflow I receive the following error message:

        System.ApplicationException: PreconditionFailed ---> System.ApplicationException: Error in the application.
        --- End of inner exception stack trace ---
        at System.Activities.Statements.Throw.Execute(CodeActivityContext context)
        at System.Activities.CodeActivity.InternalExecute(ActivityInstance instance, ActivityExecutor executor, BookmarkManager bookmarkManager)
        at System.Activities.Runtime.ActivityExecutor.ExecuteActivityWorkItem.ExecuteBody(ActivityExecutor executor, BookmarkManager bookmarkManager, Location resultLocation)

    Analysis: I found the underlying error by visiting http://localhost:32843/SecurityTokenServiceApplication/securitytoken.svc. [Screenshot of the error message not preserved.]

    Solution: The solution is basically to give the server more memory. In a development environment, you can restart noderunner.exe or some other services to release some memory. To verify you have enough memory, browse to http://localhost:32843/SecurityTokenServiceApplication/securitytoken.svc and check that it returns the service information page. [Screenshot not preserved.] Then you can republish your workflow and it will work like a charm.

  • Migration from Exchange to BPOS - Microsoft Assessment and Planning (MAP) Toolkit Link

    - by Harish Pavithran
    The Microsoft Assessment and Planning (MAP) Toolkit is an agentless toolkit that finds computers on a network and performs a detailed inventory of the computers using Windows Management Instrumentation (WMI) and the Remote Registry Service. The data and analysis provided by this toolkit can significantly simplify the planning process for migrating to Windows® 7, Windows Vista®, Microsoft Office 2007, Windows Server® 2008 R2, Windows Server 2008, Hyper-V, Microsoft Application Virtualization, Microsoft SQL Server 2008, and Forefront® Client Security and Network Access Protection. Assessments for Windows Server 2008 R2, Windows Server 2008, Windows 7, and Windows Vista include device driver availability as well as recommendations for hardware upgrades. If you are interested in server virtualization planning, MAP provides the ability to gather performance metrics from computers you are considering for virtualization and a feature to model a library of potential host hardware and storage configurations. This information can be used to quickly perform "what-if" analysis using Hyper-V and Microsoft Virtual Server 2005 R2 as virtualization platforms. http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=67240b76-3148-4e49-943d-4d9ea7f77730

  • Trace File Source Adapter

    The Trace File Source adapter is a useful addition to your SSIS toolbox. It allows you to read SQL Server 2005 and 2008 Profiler traces stored as .trc files into the Data Flow. From there you can perform filtering and analysis using the power of SSIS. There is no need for a SQL Server connection; this just uses the trace file.

    Example Usages

      - Cache warming for SQL Server Analysis Services
      - Reading the flight recorder
      - Find out the longest running queries on a server
      - Analyze statements for CPU, memory by user or some other criteria you choose

    Properties

    The Trace File Source adapter has two properties, both of which combine to control the source trace file that is read at runtime. SQL Server 2005 and SQL Server 2008 trace files are supported for both the Database Engine (SQL Server) and Analysis Services. The properties are managed by the Editor form or can be set directly from the Properties Grid in Visual Studio.

        Property     Type          Description
        AccessMode   Enumeration   Determines how the Filename property is interpreted. The values available are DirectInput and Variable.
        Filename     String        Holds the path of the trace file to load (*.trc). The value is either a full path, or the name of a variable which contains the full path to the trace file, depending on the AccessMode property.

    Trace Column Definition

    Hopefully the majority of you can skip this section entirely, but if you encounter problems processing a trace file, this may explain them and allow you to fix the problem. The component is built upon the trace management API provided by Microsoft. Unfortunately, the API methods that expose the schema of a trace file have known issues and are unreliable; put simply, the data often differs from what was specified. To overcome these limitations the component uses some simple XML files. These files enable the trace column data types and sizing attributes to be overridden. For example, SQL Server Profiler or TMO generated structures define EventClass as an integer, but the real value is a string.

      - TraceDataColumnsSQL.xml - SQL Server Database Engine Trace Columns
      - TraceDataColumnsAS.xml - SQL Server Analysis Services Trace Columns

    The files can be found in the %ProgramFiles%\Microsoft SQL Server\100\DTS\PipelineComponents folder, e.g.

        "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsSQL.xml"
        "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml"

    If the component encounters a type conversion or sizing error at runtime, it is most likely due to a discrepancy between the column definition as reported by the API and the actual value encountered. Whilst most common issues have already been fixed through these files, we have implemented specific exception traps that direct you to the files, so that you can fix any further issues caused by usage or data scenarios that we have not tested. An example error that you can fix through these files is shown below.

        Buffer exception writing value to column 'Column Name'. The string value is 999 characters in length, the column is only 111. Columns can be overridden by the TraceDataColumns XML files in "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml".

    Installation

    The component is provided as an MSI file which you can download and run to install it. This simply places the files on disk in the correct locations and also installs the assemblies in the Global Assembly Cache, as per Microsoft's recommendations. You may need to restart the SQL Server Integration Services service, as it caches information about which components are installed, as well as restarting any open instances of Business Intelligence Development Studio (BIDS) / Visual Studio that you may be using to build your SSIS packages.

    Finally, you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the Trace File Source transformation in the Choose Toolbox Items window. This process is described in detail in the related FAQ entry, How do I install a task or transform component?

    We recommend you follow best practice and apply the current Microsoft SQL Server Service Pack to your SQL Server servers and workstations.

    Please note that the Microsoft Trace classes used in the component are not supported on 64-bit platforms. To use the Trace File Source on a 64-bit host, you need to ensure that you have the 32-bit (x86) tools available and that the way you execute your package is set up to use them. Please see the help topic 64-bit Considerations for Integration Services for more details.

    Downloads

      - Trace Sources for SQL Server 2005
      - Trace Sources for SQL Server 2008

    Version History

      - SQL Server 2008: Version 2.0.0.382 - SQL Server 2008 public release. (9 Apr 2009)
      - SQL Server 2005: Version 1.0.0.321 - SQL Server 2005 public release. (18 Nov 2008)

  • To sample or not to sample...

    Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell).

    Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data.

    In data mining, we focus on the use of data, that is, data that has already been collected. In some cases, we may have all the data (all purchases made by all customers); in others, the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation.

    When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1,000 to 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. It also avoids waiting for a model to build on a large dataset only to find that the results don't meet expectations. Once you are satisfied with the results on the initial sample, you can take a larger sample to see whether model quality improves and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size.

    Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing.

    Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, the sampled values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets.

    In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.
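
The 60/40 split and the missing-target-value pitfall described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Oracle Database technique the later posts cover: the function names, the `target` field, and the synthetic records are all hypothetical.

```python
import random

def split_build_test(records, build_fraction=0.6, seed=42):
    """Randomly split records into build and test sets (without replacement)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * build_fraction))
    return shuffled[:cut], shuffled[cut:]

def missing_target_values(all_records, sample, target_key):
    """Return target values present in the full data but absent from the sample."""
    full = {r[target_key] for r in all_records}
    seen = {r[target_key] for r in sample}
    return full - seen

# Hypothetical data: 1000 records, 2% of which carry a rare target value.
records = [{"id": i, "target": "rare" if i % 50 == 0 else "common"}
           for i in range(1000)]

build, test = split_build_test(records)
print(len(build), len(test))
print("Target values missing from build set:",
      missing_target_values(records, build, "target"))
```

Running a check like `missing_target_values` before building a model is one cheap way to notice that a sample has dropped a target value entirely; stratified sampling is the usual remedy when it has.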

    Read the article

  • GDD-BR 2010 [2F] Storage, Bigquery and Prediction APIs

    Speaker: Patrick Chanezon
    Track: Cloud Computing
    Time slot: F [15:30 - 16:15]
    Room: 2
    Level: 101

    Google is expanding our storage products by introducing Google Storage for Developers. It offers a RESTful API for storing and accessing data at Google. Developers can take advantage of the performance and reliability of Google's storage infrastructure, as well as the advanced security and sharing capabilities. We will demonstrate key functionality of the product as well as customer use cases. Google relies heavily on data analysis and has developed many tools to understand large datasets. Two of these tools are now available on a limited sign-up basis to developers: (1) BigQuery: interactive analysis of very large data sets and (2) Prediction API: make informed predictions from your data. We will demonstrate their use and give instructions on how to get access.

    From: GoogleDevelopers
    Time: 39:27

    Read the article

  • New e learning course on Business Intelligence

    - by simonsabin
    I just got this from fellow SQL MVP Chris Testa O'Neil: "I am pleased to announce the release of the Author Model eCourseCollection 6233 AE: Implementing and Maintaining Business Intelligence in Microsoft® SQL Server® 2008: Integration Services, Reporting Services and Analysis Services. This 24-hour collection provides you with the skills and knowledge required for implementing and maintaining business intelligence solutions on SQL Server 2008. You will learn about the SQL Server technologies, such as Integration Services, Analysis Services, and Reporting Services. This collection also helps students to prepare for Exam 70-448 and can be accessed from: http://www.microsoft.com/learning/elearning/course/6233.mspx"

    Read the article

  • Trace File Source Adapter

    The Trace File Source adapter is a useful addition to your SSIS toolbox. It allows you to read SQL Server 2005 and 2008 Profiler traces stored as .trc files into the Data Flow. From there you can perform filtering and analysis using the power of SSIS. There is no need for a SQL Server connection; the adapter uses only the trace file.

    Example Usages
    - Cache warming for SQL Server Analysis Services
    - Reading the flight recorder
    - Find out the longest running queries on a server
    - Analyze statements for CPU, memory by user or some other criteria you choose

    Properties
    The Trace File Source adapter has two properties, which combine to control the source trace file that is read at runtime. SQL Server 2005 and SQL Server 2008 trace files are supported for both the Database Engine (SQL Server) and Analysis Services. The properties are managed by the Editor form or can be set directly from the Properties Grid in Visual Studio.

    Property    Type         Description
    AccessMode  Enumeration  Determines how the Filename property is interpreted. The values available are DirectInput and Variable.
    Filename    String       Holds the path of the trace file to load (*.trc). The value is either a full path, or the name of a variable which contains the full path to the trace file, depending on the AccessMode property.

    Trace Column Definition
    Hopefully the majority of you can skip this section entirely, but if you encounter problems processing a trace file, this may explain them and allow you to fix the problem. The component is built upon the trace management API provided by Microsoft. Unfortunately, the API methods that expose the schema of a trace file have known issues and are unreliable; put simply, the data often differs from what was specified. To overcome these limitations the component uses some simple XML files. These files enable the trace column data types and sizing attributes to be overridden. For example, SQL Server Profiler or TMO generated structures define EventClass as an integer, but the real value is a string.

    - TraceDataColumnsSQL.xml - SQL Server Database Engine Trace Columns
    - TraceDataColumnsAS.xml - SQL Server Analysis Services Trace Columns

    The files can be found in the %ProgramFiles%\Microsoft SQL Server\100\DTS\PipelineComponents folder, e.g.
    "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsSQL.xml"
    "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml"

    If at runtime the component encounters a type conversion or sizing error, it is most likely due to a discrepancy between the column definition as reported by the API and the actual value encountered. Whilst most common issues have already been fixed through these files, we have implemented specific exception traps that direct you to the files, so that you can fix any further issues caused by usage or data scenarios that we have not tested. An example error that you can fix through these files is shown below.

    Buffer exception writing value to column 'Column Name'. The string value is 999 characters in length, the column is only 111. Columns can be overridden by the TraceDataColumns XML files in "C:\Program Files\Microsoft SQL Server\100\DTS\PipelineComponents\TraceDataColumnsAS.xml".

    Installation
    The component is provided as an MSI file which you can download and run to install it. This simply places the files on disk in the correct locations and also installs the assemblies in the Global Assembly Cache as per Microsoft’s recommendations. You may need to restart the SQL Server Integration Services service, as this caches information about what components are installed, as well as restarting any open instances of Business Intelligence Development Studio (BIDS) / Visual Studio that you may be using to build your SSIS packages.

    Finally you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the Trace File Source transformation in the Choose Toolbox Items window. This process has been described in detail in the related FAQ entry for How do I install a task or transform component?

    We recommend you follow best practice and apply the current Microsoft SQL Server Service pack to your SQL Server servers and workstations.

    Please note that the Microsoft Trace classes used in the component are not supported on 64-bit platforms. To use the Trace File Source on a 64-bit host you need to ensure you have the 32-bit (x86) tools available, and that the way you execute your package is set up to use them. Please see the help topic 64-bit Considerations for Integration Services for more details.

    Downloads
    - Trace Sources for SQL Server 2005
    - Trace Sources for SQL Server 2008

    Version History
    - SQL Server 2008: Version 2.0.0.382 - SQL Server 2008 public release. (9 Apr 2009)
    - SQL Server 2005: Version 1.0.0.321 - SQL Server 2005 public release. (18 Nov 2008)
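
To make the interplay of the two properties concrete, here is a minimal Python sketch of the lookup they describe. It is not the component's code: the `AccessMode` enum, `resolve_trace_path`, the `variables` dictionary, and the variable name `User::TraceFile` are hypothetical stand-ins, and only the DirectInput/Variable behaviour is taken from the property descriptions above.

```python
from enum import Enum

class AccessMode(Enum):
    """Hypothetical mirror of the adapter's AccessMode values."""
    DIRECT_INPUT = "DirectInput"
    VARIABLE = "Variable"

def resolve_trace_path(access_mode, filename, variables):
    """Return the trace file path implied by AccessMode and Filename.

    DirectInput: Filename is itself the full path.
    Variable:    Filename names a variable that holds the full path.
    """
    if access_mode is AccessMode.DIRECT_INPUT:
        return filename
    if access_mode is AccessMode.VARIABLE:
        if filename not in variables:
            raise KeyError(f"Variable {filename!r} is not defined")
        return variables[filename]
    raise ValueError(f"Unsupported access mode: {access_mode!r}")

# Hypothetical package variables, for illustration only.
package_variables = {"User::TraceFile": r"C:\Traces\server1.trc"}

print(resolve_trace_path(AccessMode.DIRECT_INPUT, r"C:\Traces\adhoc.trc", package_variables))
print(resolve_trace_path(AccessMode.VARIABLE, "User::TraceFile", package_variables))
```

The Variable mode is the one to reach for when the same package has to process different trace files per run, since only the variable's value changes.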

    Read the article

  • How to Use RDA to Generate WLS Thread Dumps At Specified Intervals?

    - by Daniel Mortimer
    Introduction

    There are many ways to generate a thread dump of a WebLogic Managed Server. For example, take a look at:
    - Taking Thread Dumps (an excellent blog post on the Middleware Magic site)
    - Different ways to take thread dumps in WebLogic Server (Document 1098691.1)

    There is another method: use Remote Diagnostic Agent (RDA). The solution described below is not documented, but it is relatively straightforward to execute. One advantage of using RDA to collect the thread dumps is that RDA also collects configuration, log file, network, system, and performance information at the same time.

    Instructions

    1. Not familiar with Remote Diagnostic Agent? Take a look at my previous blog post "Resolve SRs Faster Using RDA - Find the Right Profile".

    2. Choose a profile that includes the WebLogic Server data collection modules (for example, the profile "WebLogicServer"). At RDA setup time you should see the prompt below:

    -------------------------------------------------------------------------------
    S301WLS: Collects Oracle WebLogic Server Information
    -------------------------------------------------------------------------------
    Enter the location of the directory where the domains to analyze are located
    (For example in UNIX, <BEA Home>/user_projects/domains or
    <Middleware Home>/user_projects/domains)
    Hit 'Return' to accept the default
    (/oracle/11AS/Middleware/user_projects/domains)
    >

    For a successful WLS connection, ensure that the domain Admin Server is up and running.

    Data Collection Type:
      1  Collect for a single server (offline mode)
      2  Collect for a single server (using WLS connection)
      3  Collect for multiple servers (using WLS connection)
    Enter the item number
    Hit 'Return' to accept the default (1)
    > 2

    Choose option 2 or 3.

    Note: Collecting for a single server or multiple servers using a WLS connection means that RDA will attempt to connect and execute online WLST commands against the targeted server(s). The thread dumps are collected using the WLST function "threadDump()". If WLST cannot connect to the managed server, RDA will proceed to collect other data and ignore the request to collect thread dumps. If you see no Thread Dump menu item in the final output, it is likely that the managed server is in a state which prevents new connections to it. In that scenario, you would have to employ alternative methods for collecting thread dumps.

    3. The RDA setup will create a setup.cfg file in the RDA_HOME directory. Open this file in an editor. You will find the following parameters, which govern the number of thread dumps and the thread dump interval:

    #N.Number of thread dumps to capture
    WREQ_THREAD_DUMP=10
    #N.Thread dump interval
    WREQ_THREAD_DUMP_INTERVAL=5000

    The example lines above show the default settings. In other words, RDA will collect 10 thread dumps at 5000 millisecond (5 second) intervals. You may want to change this to something like:

    #N.Number of thread dumps to capture
    WREQ_THREAD_DUMP=10
    #N.Thread dump interval
    WREQ_THREAD_DUMP_INTERVAL=30000

    However, bear in mind that such a change will increase the total amount of time it takes for RDA to complete its run.

    4. Once you are happy with the setup.cfg, run RDA. RDA will collect, render, generate and package all files in the output directory.

    5. For ease of viewing, open up the RDA Start html file, "xxxx__start.htm". The thread dumps can be found under the WLST Collections for the target managed server(s). See the screenshots below:

    Screenshot 1: RDA Start Page - Main Index
    Screenshot 2: Managed Server Sub Index
    Screenshot 3: WLST Collections
    Screenshot 4: Thread Dump Page - List of dump file links
    Screenshot 5: Thread Dump Dat File Link

    Additional Comment

    A) You can view the thread dump files within the RDA Start Page framework, but most likely you will want to download the dat files for in-depth analysis via thread dump analysis tools such as:
    - Thread Dump Analyzer
    - Samurai - a GUI-based tail and thread dump analysis tool

    If you are new to thread dump analysis, take a look at the recorded Support Advisor Webcast "Oracle WebLogic Server: Diagnosing Performance Issues through Java Thread Dumps" [slide deck from the webcast in PDF format].

    B) I have logged a couple of enhancement requests for the RDA Development Team to consider:
    - Add a timestamp to the dump file links, to the dat filename, and at the top of the body of the dat file.
    - Package the individual thread dumps in a zip so all dump files can be conveniently downloaded in one go.
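
To get a feel for how the two setup.cfg parameters affect run time, the following Python sketch parses them and estimates the span between the first and last thread dump. It rests on assumptions: the helper names are hypothetical, the parser only handles the simple KEY=VALUE and "#" comment lines quoted above, and the estimate counts the gaps between dumps only (how long RDA itself takes per dump, or whether it waits after the last one, is not stated in the post).

```python
def parse_setup_cfg(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def dump_window_seconds(settings):
    """Estimate seconds between the first and last dump: (count - 1) gaps."""
    count = int(settings["WREQ_THREAD_DUMP"])
    interval_ms = int(settings["WREQ_THREAD_DUMP_INTERVAL"])
    return max(count - 1, 0) * interval_ms / 1000.0

DEFAULT_CFG = """\
#N.Number of thread dumps to capture
WREQ_THREAD_DUMP=10
#N.Thread dump interval
WREQ_THREAD_DUMP_INTERVAL=5000
"""

# Same dump count, interval raised from 5000 ms to 30000 ms.
CHANGED_CFG = DEFAULT_CFG.replace("INTERVAL=5000", "INTERVAL=30000")

print(dump_window_seconds(parse_setup_cfg(DEFAULT_CFG)))   # 45.0
print(dump_window_seconds(parse_setup_cfg(CHANGED_CFG)))   # 270.0
```

Under these assumptions, raising the interval from 5 to 30 seconds stretches the dump window from under a minute to about four and a half minutes, which is the extra run time the post warns about.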

    Read the article
