Search Results

Search found 2585 results on 104 pages for 'forensic analysis'.


Investigating on xVelocity (VertiPaq) column size

- by Marco Russo (SQLBI)

In January I published an article about how to optimize high-cardinality columns in VertiPaq. In the meantime, VertiPaq has been rebranded to xVelocity: the official name is now “xVelocity in-memory analytics engine (VertiPaq)”, but xVelocity and VertiPaq mean the same thing when we talk about Analysis Services. In this post I’ll show how to investigate the column sizes of an existing Tabular database so that you can find the most important columns to optimize.

A first approach is to look in the DataDir of Analysis Services for the folder containing the database. Then look for the biggest files in all subfolders, and you will find a file whose name contains the name of the most expensive column. However, this heuristic process is not very efficient.

A better approach is to use a DMV that provides the exact information. For example, the following query (open SSMS, open an MDX query window on the database you are interested in and execute it) returns all database objects sorted by used size in descending order.

    SELECT *
    FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
    ORDER BY used_size DESC

You can look at the first rows to see which columns are the most expensive in your tabular model. The interesting data provided are:

TABLE_ID: the name of the object (it can also be a dictionary or an index)
COLUMN_ID: the name of the column the object belongs to (you can also see ID_TO_POS and POS_TO_ID in case they refer to internal indexes)
RECORDS_COUNT: the number of rows in the column
USED_SIZE: the memory used by the object

By looking at the ratio between USED_SIZE and RECORDS_COUNT you can understand what you can do to optimize your tabular model. Your options are:

Remove the column. Yes, if it contains data you will never use in a query, simply remove the column from the tabular model.
Change granularity. If you are tracking time and you included milliseconds but seconds would be enough, round the data source column to the nearest second. If you have a floating point number but two decimals are good enough (e.g. a temperature), round the number to the precision that is relevant to you.
Split the column. Create two or more columns that have to be combined in order to produce the original value. This technique is described in the VertiPaq optimization article.
Sort the table by that column. When you read the data source, you might consider sorting data by this column, so that the compression will be more efficient. However, this technique works better on columns that don’t have too many distinct values, and you will probably move the problem to another column. Sorting data starting from the lower density columns (those with few distinct values) and going to higher density columns (those with high cardinality) is the technique that provides the best compression ratio.

After the optimization you should be able to reduce the used size and improve the count/size ratio you measured before. If you are interested in a longer discussion about internal storage in VertiPaq and you want to understand why this approach can save you space (and time), you can attend my 24 Hours of PASS session “VertiPaq Under the Hood” on March 21 at 08:00 GMT.
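
The ratio between USED_SIZE and RECORDS_COUNT can be computed once the DMV rows have been exported. The following Python sketch assumes the rows are already available as dictionaries keyed by the column names listed above; the function name and the way the rows are obtained are assumptions for illustration, not part of the article.

```python
def rank_by_bytes_per_row(rows):
    """Rank storage objects by USED_SIZE / RECORDS_COUNT, largest first.

    `rows` is assumed to be a list of dicts with the keys TABLE_ID,
    COLUMN_ID, RECORDS_COUNT and USED_SIZE (hypothetical export format).
    """
    ranked = []
    for row in rows:
        records = row["RECORDS_COUNT"]
        if not records:
            # Guard against a zero row count: no meaningful ratio to report.
            continue
        ratio = row["USED_SIZE"] / records
        ranked.append((row["TABLE_ID"], row["COLUMN_ID"], ratio))
    # Highest bytes-per-row first: these are the first candidates to optimize.
    return sorted(ranked, key=lambda item: item[2], reverse=True)
```

Objects at the top of the returned list use the most memory per row, so they are reasonable first candidates for the options described above.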

The five steps of business intelligence adoption: where are you?

- by Red Gate Software BI Tools Team

When I was in Orlando and New York last month, I spoke to a lot of business intelligence users. What they told me suggested a path of BI adoption. The user’s place on the path depends on the size and sophistication of their organisation.

Step 1: A company with a database of customer transactions will often want to examine particular data, like revenue and unit sales over the last period for each product and territory. To do this, they probably use simple SQL queries or stored procedures to produce data on demand.

Step 2: The results from step one are saved in an Excel document, so business users can analyse them with filters or pivot tables. Alternatively, SQL Server Reporting Services (SSRS) might be used to generate a report of the SQL query for display on an intranet page.

Step 3: If these queries are run frequently, or business users want to explore data from multiple sources more freely, it may become necessary to create a new database structured for analysis rather than CRUD (create, retrieve, update, and delete). For example, data from more than one system, plus external information, may be incorporated into a data warehouse. This can become ‘one source of truth’ for the business’s operational activities. The warehouse will probably have a simple ‘star’ schema, with fact tables representing the measures to be analysed (e.g. unit sales, revenue) and dimension tables defining how this data is aggregated (e.g. by time, region or product). Reports can be generated from the warehouse with Excel, SSRS or other tools.

Step 4: Not too long ago, Microsoft introduced an Excel plug-in, PowerPivot, which allows users to bring larger volumes of data into Excel documents and create links between multiple tables. These BISM Tabular documents can be created by the database owners or other expert Excel users and viewed by anyone with Excel PowerPivot. Sometimes, business users may use PowerPivot to create reports directly from the primary database, bypassing the need for a data warehouse. This can introduce problems when there are misunderstandings of the database structure or no single ‘source of truth’ for key data.

Step 5: Steps three or four are often enough to satisfy business intelligence needs, especially if users are sophisticated enough to work with the warehouse in Excel or SSRS. However, sometimes the relationships between data are too complex, or the queries which aggregate across periods, regions etc. are too slow. In these cases, it can be necessary to formalise how the data is analysed and pre-build some of the aggregations. To do this, a business intelligence professional will typically use SQL Server Analysis Services (SSAS) to create a multidimensional model, or “cube”, that more simply represents key measures and aggregates them across specified dimensions.

Step five is where our tool, SSAS Compare, becomes useful, as it helps review and deploy changes from development to production. For us at Red Gate, the primary value of SSAS Compare is to establish a dialog with BI users, so we can develop a portfolio of products that support creation and deployment across a range of report and model types. For example, PowerPivot and the new BISM Tabular model create a potential customer base for tools that extend beyond BI professionals.

We’re interested in learning where people are in this story, so we’ve created a six-question survey to find out. Whether you’re at step one or step five, we’d love to know how you use BI so we can decide how to build tools that solve your problems. So if you have sixty seconds to spare, tell us on the survey!

Synchronizing Asynchronous request handlers in Silverlight environment

- by Eric Lifka

For our senior design project my group is making a Silverlight application that utilizes graph theory concepts and stores the data in a database on the back end. We have a situation where we add a link between two nodes in the graph and upon doing so we run analysis to re-categorize our clusters of nodes. The problem is that this re-categorization is quite complex and involves multiple queries and updates to the database, so if multiple instances of it run at once it quickly garbles data and breaks (by trying to re-insert already used primary keys). Essentially it's not thread safe, and we're trying to make it safe, and that's where we're failing and need help :).

The create link function looks like this:

    private Semaphore dblock = new Semaphore(1, 1);

    // This function is on our service reference and gets called
    // by the client code.
    public int addNeed(int nodeOne, int nodeTwo)
    {
        dblock.WaitOne();
        submitNewNeed(createNewNeed(nodeOne, nodeTwo));
        verifyClusters(nodeOne, nodeTwo);
        dblock.Release();
        return 0;
    }

    private void verifyClusters(int nodeOne, int nodeTwo)
    {
        // Run analysis of nodeOne and nodeTwo in graph
    }

All copies of addNeed should wait for the first one that comes in to finish before another can execute. But instead they all seem to be running and conflicting with each other in the verifyClusters method.

One solution would be to force our front end calls to be made synchronously. And in fact, when we do that everything works fine, so the code logic isn't broken. But when it's launched our application will be deployed within a business setting and used by internal IT staff (or at least that's the plan), so we'll have the same problem. We can't force all clients to submit data at different times, so we really need to get it synchronized on the back end.

Thanks for any help you can give, I'd be glad to supply any additional information that you could need!
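
One possible cause, which the post does not confirm, is that the service host creates a new service object for each request, so every request gets its own dblock and nothing is actually shared. The Python sketch below illustrates that idea only: the lock lives on the class rather than on the instance, so all instances contend for the same lock. The class name, the counters and the sleep standing in for verifyClusters are all hypothetical.

```python
import threading
import time


class NeedService:
    # Class-level lock: shared by every instance, unlike a lock created per object.
    _db_lock = threading.Lock()
    active = 0       # requests currently inside the critical section
    max_active = 0   # highest concurrency ever observed

    def add_need(self, node_one, node_two):
        with NeedService._db_lock:  # released even if the body raises
            NeedService.active += 1
            NeedService.max_active = max(NeedService.max_active, NeedService.active)
            time.sleep(0.001)  # stand-in for the database work and cluster analysis
            NeedService.active -= 1
        return 0


def run_requests(count):
    """Simulate `count` concurrent requests, each served by a fresh instance."""
    threads = [
        threading.Thread(target=NeedService().add_need, args=(i, i + 1))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return NeedService.max_active
```

Using a `with` block (the counterpart of a try/finally around Release) also ensures the lock is released if the analysis throws, which the posted code does not guarantee.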

OCR: How to improve accuracy - existing libraries for removing non-text 'furniture', shapes, etc to

- by Rob

I want to remove rectangles etc. that enclose text in a screenshot image, so that I can perform optical character recognition to get accurate text from the screenshot.

Background: I am doing this to extract data from a legacy application for use with other applications. This is the only way to get at this data, as the associated files are in a closed, proprietary, binary format. I will be using AutoItScript to drive the application to show data in its UI, then I will screenshot this and feed it to tesseract. I've already had some success in automating the UI, and have been able to use tesseract to get plain ASCII text out of the bitmap. There are several AutoItScript forum articles discussing its use with tesseract/OCR, but none specifically address my question. http://www.autoitscript.com/forum/index.php?s=6c32c3ece12756e635a619cdf175eff9&showforum=2

What I need to do: There are thin, 1-pixel wide rectangles that closely enclose some text. When fed to tesseract, they are misread; for example, a vertical line of the rectangle is seen as the letter I. Any thoughts on how to remove the rectangles, or best practices? I'm asking if there is a generic command-line toolset to overwrite rectangles, for example, in .png files. I could then pass the .png through this, then pass it to tesseract.

Details on the tesseract release/setup I've used are as follows: Go here: http://code.google.com/p/tesseract-ocr/downloads/list - For the basic English generic character set to get Tesseract up and running and recognising your bitmapped text into ASCII text, use tesseract-2.00.eng.tar.gz (current version at time of writing is: "English language data for Tesseract (2.00 and up) Jul 2007 989 KB 84845")

Related questions I have already looked at on Stack Overflow:

http://stackoverflow.com/questions/1335581/how-to-give-best-chance-of-success-to-an-ocr-software
http://stackoverflow.com/questions/2296568/analysis-and-transformation-of-the-image-on-the-basis-of-this-analysis-for-better
http://stackoverflow.com/questions/2268028/reading-characters-off-of-the-screen

In these, either my question is not completely answered or a commercial solution is being sold. I do not want to consider a commercial solution at this stage.
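
One heuristic for the rectangles, sketched here in Python, is to erase any unbroken run of dark pixels that is much longer than a character stroke, scanning rows and then columns. This is an illustration under assumptions rather than an existing tool: it expects the screenshot to have been converted already to a grid of 0 (background) and 1 (dark) values, and the function names and the min_run threshold are hypothetical.

```python
def long_runs(line, min_run):
    """Yield (start, end) for each run of dark pixels at least min_run long."""
    start = None
    for i, value in enumerate(list(line) + [0]):  # trailing 0 closes a final run
        if value and start is None:
            start = i
        elif not value and start is not None:
            if i - start >= min_run:
                yield start, i
            start = None


def remove_long_lines(pixels, min_run=20):
    """Return a copy of the grid with long horizontal and vertical runs cleared."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    cleaned = [list(row) for row in pixels]
    # Horizontal edges of rectangles.
    for y in range(height):
        for start, end in long_runs(pixels[y], min_run):
            for x in range(start, end):
                cleaned[y][x] = 0
    # Vertical edges of rectangles.
    for x in range(width):
        column = [pixels[y][x] for y in range(height)]
        for start, end in long_runs(column, min_run):
            for y in range(start, end):
                cleaned[y][x] = 0
    return cleaned
```

The threshold needs to be longer than the longest stroke in the font, otherwise parts of characters are erased as well; runs are detected on the original grid so that clearing a horizontal edge does not hide the vertical edges that share its corners.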

Text mining on large database (data mining)

- by yox

Hello, I have a large database of resumes (CVs), and a table, skills, that groups all users' skills. Inside that table there's a field skill_text that describes the skill in full text. I'm looking for an algorithm/software/method to extract significant terms/phrases from that table in order to build a new table with standardized skills.

Here are some example skills extracted from the DB:

Sectoral and competitive analysis
Business Development (incl. in international settings)
Specific structure and road design software - Microstation, Macao, AutoCAD (basic knowledge)
Creative work (Photoshop, In-Design, Illustrator)
checking and reporting back on campaign progress
organising and attending events and exhibitions
Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX
Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program) Mix marketing, Viral Marketing, Social network marketing.

The output should be something like:

Sectoral and competitive analysis Business Development Specific structure and road design software - Macao AutoCAD Photoshop In-Design Illustrator organising events Development Aptana Studio PHP HTML CSS JavaScript SQL AJAX Mix marketing Viral Marketing Social network marketing emailing SEO One to one marketing

As you can see, only the skills remain, with no other descriptive text. I know this is possible using text mining techniques, but how do I do it? The database is really large, which is a good thing because we can calculate term frequency and decide whether something is a real skill or just meaningless text. The big problem is: how do I determine that "blablabla" is a skill? Thanks
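
The frequency idea mentioned in the question can be sketched as follows: split each skill_text on punctuation that usually separates items, then keep only the terms that occur in several rows. This is a minimal Python sketch under assumptions, not a full text mining pipeline; the separator set, the function names and the min_count threshold are hypothetical choices.

```python
import re
from collections import Counter


def candidate_terms(skill_text):
    """Split one skill description into lower-cased candidate terms."""
    # Treat brackets, colons, semicolons, slashes, ampersands and " - " as separators.
    flattened = re.sub(r"[():;/&]| - ", ",", skill_text)
    return [term.strip().lower() for term in flattened.split(",") if term.strip()]


def frequent_terms(skill_texts, min_count=2):
    """Count in how many rows each term appears; keep those seen min_count times or more."""
    counts = Counter()
    for text in skill_texts:
        counts.update(set(candidate_terms(text)))  # set(): count each row once
    return {term: n for term, n in counts.items() if n >= min_count}
```

On a large table, terms that many users write the same way (for example "php") would pass the threshold, while one-off phrases would not; deciding that a frequent term is really a skill still needs a manual review or a reference vocabulary.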

Agile and Scrum burning me down please help me figuring out the truth

- by jadook

Hi all,

Recently I installed MS-TFS 2008 and started preparing to use the Agile Process Guidance template shipped with TFS. With a little googling I went through Mike Cohn's materials:

I watched his conference talk on YouTube, "sponsored by Google": http://www.youtube.com/watch?v=fb9Rzyi8b90 http://www.youtube.com/watch?v=jeT0pOVg0EI
I read his book "Agile Estimating and Planning"
I am watching the video series on his website: http://www.mountaingoatsoftware.com/presentations-tag/video-recorded

I was very happy absorbing the techniques he uses with teams, and seeing how agile and Scrum make such a great software process/methodology, until I saw Mike answering a question about the architect role and talking about the requirements document. At that point everything started falling apart, for the following reason.

Last year I was assigned to do the full analysis, "including requirements gathering", for a big, "very high priority" project. Within 2 months of hard work, dedication and commitment I delivered the whole analysis, to the full satisfaction of the customer and my BOSS, with ZERO amendments. Later on, the project entered the architecture, development ... phases. Because the system included many competitive and exciting features, I requested a patent for it, and that is in process.

So imagine you are the kind of person who loves facing all kinds of challenges and returning with excellent experience and results for the stakeholders and yourself. How fairly will agile and Scrum processes credit and acknowledge your talent and passion when the scrum master/coach treats the team as one unit that accomplishes user stories and converges through a trial-and-error approach?

With those dark thoughts about agile and Scrum, I found many people who are "anti agile", and chief among them is "Crispin Rogers Johnson": http://agile-crispin.blogspot.com/ That guy has made a counter-statement to everything Mike Cohn talks about.

I really don't know what to do next, so any guidance will be appreciated.

Thanks,

rails + compass: advantages vs using haml + blueprint directly

- by egarcia

I've got some experience using haml (+sass) on rails projects. I recently started using them with blueprintcss: the only thing I did was transform blueprint.css into a sass file, and I started coding from there. I even have a rails generator that includes all this by default.

It seems that Compass does what I do, and other things. I'm trying to understand what those other things are, but the documentation/tutorials weren't very clear. These are my conclusions:

Compass comes with built-in sass mixins that implement common CSS idioms, such as links with icons or horizontal lists. My solution doesn't provide anything like that. (1 point for Compass).
Compass has several command-line options: you can create a rails project, but you can also "install" it on an existing rails project. A rails generator could be personalized to do the same thing, I guess. (Tie).
Compass has two modes of working with blueprint: "basic" and "semantic" usage. I'm not clear about the differences between those. With my rails generator I only have one mode, but it seems enough. (Tie).
Apparently, Compass is prepared to use other frameworks besides blueprint (e.g. YUI). I could not find much documentation about this, and I'm not interested in it anyway: blueprint is ok for me. (Tie).
Compass' learning curve seems a bit steep and the documentation seems sparse. Learning could be a bit difficult. On the other hand, I know the ins and outs of my own system and can use it right away. (1 point for my system).

With this analysis, I'm hesitant to give Compass a try. Is my analysis correct? Am I missing any key points, or have I evaluated any of these points wrongly?

structured vs. unstructured data in db

- by Igor

The question is one of design. I'm gathering a big chunk of performance data with lots of key-value pairs: pretty much everything in /proc/cpuinfo, /proc/meminfo, /proc/loadavg, plus a bunch of other stuff, from several hundred hosts. Right now, I just need to display the latest chunk of data in my UI. I will probably end up doing some analysis of the gathered data to figure out performance problems down the road, but this is a new application, so I'm not sure what exactly I'm looking for performance-wise just yet.

I could structure the data in the db and have a column for each key I'm gathering. The table would end up being O(100) columns wide, it would be a pain to put into the db, and I would have to add new columns if I start gathering a new stat. But it would be easy to sort/analyze the data just using SQL.

Or I could just dump my unstructured data blob into the table: maybe three columns (host id, timestamp, and a serialized version of my array, probably using JSON in a TEXT field).

Which should I do? Am I going to be sorry if I go with the unstructured approach? When doing analysis, should I just convert the fields I'm interested in and create a new, more structured table? What are the trade-offs I'm missing here?
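
The three-column option, together with the "convert the fields I'm interested in later" step, can be sketched with Python's built-in sqlite3 and json modules. This is a minimal sketch under assumptions: the table name, column names and helper functions are hypothetical, and an in-memory SQLite database stands in for whatever database the application really uses.

```python
import json
import sqlite3


def open_store():
    """Create the three-column table: host id, timestamp, serialized stats."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (host_id TEXT, ts INTEGER, stats TEXT)")
    return conn


def record(conn, host_id, ts, stats):
    """Store one sample; `stats` is any dict of key-value pairs."""
    conn.execute(
        "INSERT INTO samples (host_id, ts, stats) VALUES (?, ?, ?)",
        (host_id, ts, json.dumps(stats)),
    )


def latest(conn, host_id):
    """Return the most recent stats dict for a host (what the UI needs today)."""
    row = conn.execute(
        "SELECT stats FROM samples WHERE host_id = ? ORDER BY ts DESC LIMIT 1",
        (host_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


def extract_column(conn, key):
    """Promote one key to (host_id, ts, value) rows for later structured analysis."""
    rows = conn.execute(
        "SELECT host_id, ts, stats FROM samples ORDER BY host_id, ts"
    ).fetchall()
    return [(host, ts, json.loads(stats).get(key)) for host, ts, stats in rows]
```

New stats need no schema change with this layout; the cost shows up in extract_column, which has to deserialize every row before any sorting or aggregation on that key can happen.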

Exponential regression : p-value and F significance

- by Saravanan K

I am new to statistics. I have a set of independent and dependent data (X, Y) on which I would like to do an exponential regression and obtain its p-value and Significance F (I have already obtained R2 and the coefficients through mathematical calculation). What is the natural progression from the (X, Y) data to calculating those values mathematically? I have spent a week on the internet studying this but was unable to find the right answer.

Often exponential data, y = be^(mx), is first converted to linear data, ln y = mx + ln b. Then a linear regression is done on the converted data, obtaining its p-value etc. Assume we use a statistical tool such as Excel's Analysis ToolPak (Data Analysis : Regression); it will produce a result such as the one shown in the image. I believe the p-value and Significance F value represent the converted linear data and not the original exponential data.

Questions:

What approach/steps does Excel use to get the p-value and Significance F value for the converted linear data, as shown in the statistics output in the image? It is not clear from their help page or website.
Can the p-value and Significance F be mathematically calculated for an exponential regression without using a statistical tool?

Can you point me to the right link if this has been answered before?
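
For the converted linear data, the standard least-squares quantities can be computed by hand. The Python sketch below fits ln y = mx + ln b and returns the F statistic with its degrees of freedom; it is an illustration of the textbook formulas, not a statement of what Excel does internally, and the function name is hypothetical. With a single predictor, Significance F is the upper-tail probability of an F distribution with (1, n - 2) degrees of freedom at that F value, and it equals the p-value of the slope because the slope's t statistic squared is F.

```python
import math


def exponential_fit(xs, ys):
    """Least-squares fit of ln(y) = m*x + ln(b), with the regression F statistic."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    m = sxy / sxx
    ln_b = mean_ly - m * mean_x
    # Sums of squares on the log scale.
    ss_total = sum((ly - mean_ly) ** 2 for ly in log_ys)
    ss_resid = sum((ly - (ln_b + m * x)) ** 2 for x, ly in zip(xs, log_ys))
    ss_reg = ss_total - ss_resid
    df_resid = n - 2
    # F = (regression mean square) / (residual mean square), one predictor.
    f_stat = (ss_reg / 1) / (ss_resid / df_resid)
    # The p-value is the upper tail of F(1, n - 2) at f_stat; evaluating it needs
    # an F-distribution function, e.g. scipy.stats.f.sf(f_stat, 1, df_resid)
    # if SciPy is available (not used here, to keep the sketch dependency-free).
    return {"m": m, "b": math.exp(ln_b), "r2": ss_reg / ss_total,
            "F": f_stat, "df": (1, df_resid)}
```

Note that these statistics describe the fit on the log scale, which matches the suspicion in the question that the reported values belong to the converted linear data.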

What is the difference between cubes and the Unified Dimensional Model (if any)?

- by ngm

I'm currently researching SQL Server 2008 as a business intelligence solution, and am looking at Analysis Services (I'm pretty new to business intelligence as a whole). I'm a bit confused by some of the terms in SSAS, particularly the conceptual differences between cubes and MS's Unified Dimensional Model.

I believe that a cube in SSAS is basically an OLAP cube: dimensions, measures, something that sits between the underlying data source and a business user. But then that's kind of what I understand UDM to be as well.

The docs for SQL Server 2005 seem to suggest as much: "A cube is essentially synonymous with a Unified Dimensional Model (UDM)".

But then the SQL Server 2008 pages sort of suggest that UDM is a wrapper for both multidimensional data (cubes) and relational data: "Use the Unified Dimensional Model to provide one consolidated business view for relational and multidimensional data that includes business entities, business logic, calculations, and metrics."

This blog post suggests similarly: "UDM provides a single dimensional model for all OLAP analysis and relational reporting needs. So you can use either MDX or SQL"

Is UDM something that sits above cubes? Or are they the same thing? I presume I would develop cubes with the Cube Designer application; what would I develop a UDM with?

Is there a disassembler + debugger for java (ala OllyDbg / SoftICE for assembler)?

- by Ran Biron

Is there a utility similar to OllyDbg / SoftICE for java? I.e. execute a class (from a jar / with a class path) and, without source code, show the disassembly of the intermediate code with the ability to step through / step over / search for references / edit specific intermediate code in memory / apply the edit to the file... If not, is it even possible to write something like this (assuming we're willing to live without hotspot for the debug duration)?

Edit: I'm not talking about JAD or JD or Cavaj. These are fine decompilers, but I don't want a decompiler for several reasons, the most notable being that their output is inexact at best and sometimes just plain wrong. I'm not looking for a magical "compiled bytes to java code" tool; I want to see the actual bytes that are about to be executed. Also, I'd like the ability to change those bytes (just like in an assembly debugger) and, hopefully, write the changed part back to the class file.

Edit2: I know javap exists, but it only goes one way (and without any sort of analysis). Example (code taken from the vmspec documentation): from java code, we use "javac" to compile this:

    void setIt(int value) {
        i = value;
    }

    int getIt() {
        return i;
    }

to a java .class file. Using javap -c I can get this output:

    Method void setIt(int)
       0 aload_0
       1 iload_1
       2 putfield #4
       5 return

    Method int getIt()
       0 aload_0
       1 getfield #4
       4 ireturn

This is OK for the disassembly part (not really good without analysis, such as "field #4 is Example.i"), but I can't find the two other "tools":

A debugger that goes over the instructions themselves (with stack, memory dumps, etc), allowing me to examine the actual code and environment.
A way to reverse the process: edit the disassembled code and recreate the .class file (with the edited code).
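
To make the "actual bytes" concrete, here is a small Python sketch of the disassembly step only: a table-driven decoder that turns a method's raw code bytes into listings like the javap output above. It is an illustration, not an existing tool. The opcode table covers only the six instructions in the example (values as given in the JVM specification), and the byte strings in the usage note are the encodings those listings imply, not bytes read from a real class file.

```python
# opcode -> (mnemonic, number of operand bytes); only the opcodes in the example
OPCODES = {
    0x1B: ("iload_1", 0),
    0x2A: ("aload_0", 0),
    0xAC: ("ireturn", 0),
    0xB1: ("return", 0),
    0xB4: ("getfield", 2),   # operand: 2-byte constant pool index
    0xB5: ("putfield", 2),   # operand: 2-byte constant pool index
}


def disassemble(code):
    """Decode raw method bytecode into 'offset mnemonic [#index]' lines."""
    lines = []
    pc = 0
    while pc < len(code):
        mnemonic, operand_bytes = OPCODES[code[pc]]  # KeyError on unknown opcodes
        if operand_bytes:
            operand = code[pc + 1:pc + 1 + operand_bytes]
            index = int.from_bytes(operand, "big")  # class files are big-endian
            lines.append(f"{pc} {mnemonic} #{index}")
        else:
            lines.append(f"{pc} {mnemonic}")
        pc += 1 + operand_bytes
    return lines
```

For example, `disassemble(bytes.fromhex("2a1bb50004b1"))` yields the four setIt lines. Editing is the reverse direction: copying the bytes into a bytearray and overwriting an opcode or operand changes the instruction, although writing a valid .class file back also requires keeping the surrounding attributes and lengths consistent, which this sketch does not attempt.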

Scripts to parse and download iTunes Connect and AppStore data

- by bradhouse

I'm looking for recommendations of a script or series of scripts that download and parse iTunes Connect sales data and AppStore comments, ratings and rankings data for a defined app. I'm also aware of solutions like:

AppViz
appsales-mobile
iphone-stats
Heartbeat.app

I'm sure I'll find a few more with more searching. I can't help but feel there must be a really decent set of open source scripts out there to do this, given how many developers are now writing apps for the AppStore. I would be interested to hear about any commercial offerings as well (although my personal preference is for open source, so I can at least see what it is doing with my iTunes Connect login credentials).

To be clear, I'm really looking for something that hits all of the areas mentioned:

App Store (per store)
    Comments
    Ratings
    Category/store rankings
iTunes Connect
    The contents of the sales reports

Analysis/graphs of the data is not necessary (but would be a nice to have I guess). I'm not really looking for something like AppSales Mobile above; I would like the raw data so I can do my own analysis and formatting. So far it looks like AppViz (listed above) is the best out there. Any suggestions on what is good/available, or should I just go roll my own?

deadlock because of foreign key?

- by George2

Hello everyone,

I am using SQL Server 2008 Enterprise. I ran into a deadlock in the following stored procedure but, through my own fault, I did not record the deadlock graph, and now I cannot reproduce the issue. I want to do a postmortem to find the root cause of the deadlock so that I can avoid it in the future.

The deadlock happens on the delete statement. For the delete statement, Param1 is a column of table FooTable, and Param1 is a foreign key to another table (it refers to the clustered primary key column of that other table). There is no index on Param1 itself in FooTable. FooTable has another column which is used as its clustered primary key, not the Param1 column.

Here is my guess at why there is a deadlock. Could people review whether my analysis is correct?

1. Since the Param1 column has no index, there will be a table scan, which will acquire a table-level lock; because of the foreign key, the delete operation will also need to check the master table (e.g. to acquire a lock on the master table);
2. Some operation on the master table acquires a lock on the master table, but also wants to acquire a lock on FooTable;
3. (1) and (2) form a lock cycle, which makes the deadlock happen.

Is my analysis correct? Is there any scenario that reproduces it?

    create PROCEDURE [dbo].[FooProc]
    (
        @Param1 int
        ,@Param2 int
        ,@Param3 int
    )
    AS

    DELETE FooTable WHERE Param1 = @Param1

    INSERT INTO FooTable
    (
        Param1
        ,Param2
        ,Param3
    )
    VALUES
    (
        @Param1
        ,@Param2
        ,@Param3
    )

    DECLARE @ID bigint
    SET @ID = ISNULL(@@Identity,-1)
    IF @ID > 0
    BEGIN
        SELECT IdentityStr FROM FooTable WHERE ID = @ID
    END

thanks in advance,
George
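
The cycle described in steps 1 to 3 can be modelled as a wait-for graph: each session points at the session holding the lock it wants, and a deadlock is a cycle in that graph. The Python sketch below models only that reasoning. It does not reproduce SQL Server's actual locking behaviour or confirm the guess, and the session and table names in the usage note are hypothetical.

```python
def wait_for_graph(holds, wants):
    """Build {session: [sessions it waits for]} from held and requested locks."""
    owner = {lock: session for session, locks in holds.items() for lock in locks}
    return {
        session: [owner[lock] for lock in locks
                  if lock in owner and owner[lock] != session]
        for session, locks in wants.items()
    }


def find_cycle(waits_for):
    """Return the sessions forming a cycle (a deadlock), or None if there is none."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            # Reached a session already on the current path: that is the cycle.
            return path[path.index(node):]
        visiting.add(node)
        for blocker in waits_for.get(node, ()):
            cycle = visit(blocker, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None
```

With `holds = {"FooProc delete": ["FooTable"], "master update": ["MasterTable"]}` and `wants = {"FooProc delete": ["MasterTable"], "master update": ["FooTable"]}`, the graph contains a two-session cycle, which is the situation the guess describes. If either session acquired its locks in the same order as the other, no cycle would form.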

Getting started with massive data

- by Max

I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I need to know and what are some good resources to learn from?

Hadoop/MapReduce is one obvious start.
Is there a particular programming language I should pick up? (I primarily work now in Python, Ruby, R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis?)
I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with?
Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale.

Any other suggestions on things to learn would be great!

Is this a good job description? What title would you give this position?

- by Zack Peterson

Department: Information Technology
Reports To: Chief Information Officer

Purpose: Company's ________________ is specifically engaged in the development of World Wide Web applications and distributed network applications. This person is concerned with all facets of the software development process and specializes in software product management. He or she contributes to projects in an application architect role and also performs individual programming tasks.

Essential Duties & Responsibilities: This person is involved in all aspects of the software development process such as:

Participation in software product definitions, including requirements analysis and specification
Development and refinement of simulations or prototypes to confirm requirements
Feasibility and cost-benefit analysis, including the choice of architecture and framework
Application and database design
Implementation (e.g. installation, configuration, customization, integration, data migration)
Authoring of documentation needed by users and partners
Testing, including defining/supporting acceptance testing and gathering feedback from pre-release testers
Participation in software release and post-release activities, including support for product launch evangelism (e.g. developing demonstrations and/or samples) and subsequent product build/release cycles
Maintenance

Qualifications:

Bachelor's degree in computer science or software engineering
Several years of professional programming experience
Proficiency in the general technology of the World Wide Web:
    Hypertext Transfer Protocol (HTTP)
    Hypertext Markup Language (HTML)
    JavaScript
    Cascading Style Sheets (CSS)
Proficiency in the following principles, practices, and techniques:
    Accessibility
    Interoperability
    Usability
    Security (especially prevention of SQL injection and cross-site scripting (XSS) attacks)
    Object-oriented programming (e.g. encapsulation, inheritance, modularity, polymorphism, etc.)
    Relational database design (e.g. normalization, orthogonality)
    Search engine optimization (SEO)
    Asynchronous JavaScript and XML (AJAX)
Proficiency in the following specific technologies utilized by Company:
    C# or Visual Basic .NET
    ADO.NET (including ADO.NET Entity Framework)
    ASP.NET (including ASP.NET MVC Framework)
    Windows Presentation Foundation (WPF)
    Language Integrated Query (LINQ)
    Extensible Application Markup Language (XAML)
    jQuery
    Transact-SQL (T-SQL)
    Microsoft Visual Studio
    Microsoft Internet Information Services (IIS)
    Microsoft SQL Server
    Adobe Photoshop

Simplest way to automatically alter "const" value in Java during compile time

- by Michael Mao

Hi all,

This question relates to my uni assignment, so I am very sorry to say that I cannot adopt any of the best practices from the following link in the short time I have. You know, the assignment is due tomorrow :(

link to Best way to alter const in Java on StackOverflow

Basically the only task (I hope so) left for me is the performance tuning. I've got a bunch of predefined "const" values in my single-class agent source code like this:

    //static final values
    private static final long FT_THRESHOLD = 400;
    private static final long FT_THRESHOLD_MARGIN = 50;
    private static final long FT_SMOOTH_PRICE_IDICATOR = 20;
    private static final long FT_BUY_PRICE_MODIFIER = 0;
    private static final long FT_LAST_ROUNDS_STARTTIME = 90;
    private static final long FT_AMOUNT_ADJUST_MODIFIER = 5;
    private static final long FT_HISTORY_PIRCES_LENGTH = 10;
    private static final long FT_TRACK_DURATION = 5;
    private static final int MAX_BED_BID_NUM_PER_AUC = 12;

I can definitely alter the values manually and then compile the code to give it another go. But a thorough "statistical analysis" usually requires over 2000 executions, which typically takes more than half an hour on my own laptop...

So I hope there is a way to alter the values other than digging into the source code to change the "const" values there, so that I can automatically distribute the compiled code to other people's PCs and let them run the statistical analysis instead. Another reason for automatic value adjustment is that I can try using my own agent to defeat itself by choosing different "const" values. Although my values are derived from previous history and statistical results, they are far from optimal.

I hope there is an easy way that I can adopt quickly, so I can have a good sleep tonight while the computer does everything for me... :) Any hints on this sort of stuff? Any suggestion is welcome and much appreciated.
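
One general way to avoid recompiling is to read the tunable values from outside the program at start-up, falling back to the current values as defaults, and to let a driver script generate the combinations to try. The Python sketch below shows that pattern only; it is not the assignment's code, and the function names and the choice of environment variables as the source are assumptions. In Java the same idea could be expressed by replacing a literal with something like `Long.getLong("FT_THRESHOLD", 400)`, which reads a system property passed with `-D` on the command line.

```python
import itertools
import os

# Defaults mirror two of the constants in the question; the rest would follow suit.
DEFAULTS = {"FT_THRESHOLD": 400, "FT_THRESHOLD_MARGIN": 50}


def load_params(environ=None):
    """Read each parameter from the environment, falling back to its default."""
    environ = os.environ if environ is None else environ
    return {name: int(environ.get(name, default)) for name, default in DEFAULTS.items()}


def sweep(grid):
    """Yield one parameter dict per combination of the candidate values in `grid`."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))
```

A driver can then loop over `sweep({"FT_THRESHOLD": [350, 400, 450], "FT_THRESHOLD_MARGIN": [25, 50]})`, launch one run per combination with those values set, and collect the results, which also makes it easy to split the combinations across several machines.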

Read the article
Using an element against an entire list in Haskell

- by Snick

I have an assignment and am currently caught in one section of what I'm trying to do. Without going into specific detail, here is the basic layout. I'm given a data element, f, that holds four different types inside (each with their own purpose):

    data F = F Float Int, Int

a function:

    func :: F -> F -> Q

which takes two data elements and (by simple calculations) returns a type that is now an updated version of one of the types in the first f. I now have an entire list of these elements and need to run the given function using one data element and return the type's value (not the data element). My first attempt was to use a foldl function:

    myfunc :: F -> [F] -> Q
    myfunc y [] = func y y -- func deals with the same data element calls
    myfunc y (x:xs) = foldl func y (x:xs)

However, I keep getting the same error: "Couldn't match expected type 'F' against inferred type 'Q'. In the first argument of 'foldl', namely 'myfunc' In the expression: foldl func y (x:xs)". I apologise for such an abstract description of my problem, but could anyone give me an idea as to what I should do? Should I even use a fold function, or is there a recursion I'm not thinking about?

Read the article
R: Are there any alternatives to loops for subsetting from an optimization standpoint?

- by Adam

A recurring analysis paradigm I encounter in my research is the need to subset based on all different group id values, performing statistical analysis on each group in turn, and putting the results in an output matrix for further processing/summarizing. How I typically do this in R is something like the following:

    data.mat <- read.csv("...")
    groupids <- unique(data.mat$ID) #Assume there are then 100 unique groups
    results <- matrix(rep("NA",300),ncol=3,nrow=100)
    for(i in 1:100) {
      tempmat <- subset(data.mat,ID==groupids[i])
      #Run various stats on tempmat (correlations, regressions, etc), checking to
      #make sure this specific group doesn't have NAs in the variables I'm using
      #and assign results to x, y, and z, for example.
      results[i,1] <- x
      results[i,2] <- y
      results[i,3] <- z
    }

This ends up working for me, but depending on the size of the data and the number of groups I'm working with, this can take up to three days. Besides branching out into parallel processing, is there any "trick" for making something like this run faster? For instance, converting the loops into something else (something like an apply with a function containing the stats I want to run inside the loop), or eliminating the need to actually assign the subset of data to a variable?
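One idea the question hints at can be sketched as follows. The loop above rescans the whole data set once per group, so its cost grows with (number of groups) x (number of rows); splitting the rows into groups in a single pass avoids the repeated scans. In R, base functions such as split() and lapply() follow this pattern. The sketch below is in Python rather than R, and the sample data, the column names x and y, and the helper names group_rows and summarize are all hypothetical stand-ins, not the asker's code.

```python
from collections import defaultdict
from statistics import mean


def group_rows(rows, key):
    """Split rows into per-group lists in a single pass over the data."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def summarize(rows):
    """Stand-in for the per-group statistics (here: just two means)."""
    return (mean(r["x"] for r in rows), mean(r["y"] for r in rows))


# Hypothetical data: an ID column plus two numeric variables.
data = [
    {"ID": "a", "x": 1.0, "y": 10.0},
    {"ID": "b", "x": 2.0, "y": 20.0},
    {"ID": "a", "x": 3.0, "y": 30.0},
]

# One pass to split, then one statistics call per group.
results = {gid: summarize(rows) for gid, rows in group_rows(data, "ID").items()}
print(results)
```

Whether this helps in practice depends on where the time actually goes: if the per-group statistics dominate, removing the repeated subsetting will change little, so it is worth timing both parts first.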

Read the article
Can ScalaCheck/Specs warnings safely be ignored when using SBT with ScalaTest?

- by pdbartlett

I have a simple FunSuite-based ScalaTest:

    package pdbartlett.hello_sbt

    import org.scalatest.FunSuite

    class SanityTest extends FunSuite {
      test("a simple test") { assert(true) }
      test("a very slightly more complicated test - purposely fails") { assert(42 === (6 * 9)) }
    }

Which I'm running with the following SBT project config:

    import sbt._

    class HelloSbtProject(info: ProjectInfo) extends DefaultProject(info) {
      // Dummy action, just to show config working OK.
      lazy val solveQ = task { println("42"); None }

      // Managed dependencies
      val scalatest = "org.scalatest" % "scalatest" % "1.0" % "test"
    }

However, when I run sbt test I get the following warnings:

    ...
    [info] == test-compile ==
    [info] Source analysis: 0 new/modified, 0 indirectly invalidated, 0 removed.
    [info] Compiling test sources...
    [info] Nothing to compile.
    [warn] Could not load superclass 'org.scalacheck.Properties' : java.lang.ClassNotFoundException: org.scalacheck.Properties
    [warn] Could not load superclass 'org.specs.Specification' : java.lang.ClassNotFoundException: org.specs.Specification
    [warn] Could not load superclass 'org.specs.Specification' : java.lang.ClassNotFoundException: org.specs.Specification
    [info] Post-analysis: 3 classes.
    [info] == test-compile ==

For the moment I'm assuming these are just "noise" (caused by the unified test interface?) and that I can safely ignore them. But it is slightly annoying to some inner OCD part of me (though not so annoying that I'm prepared to add dependencies for the other frameworks). Is this a correct assumption, or are there subtle errors in my test/config code? If it is safe to ignore, is there any other way to suppress these errors, or do people routinely include all three frameworks so they can pick and choose the best approach for different tests? TIA, Paul. (ADDED: scala v2.7.7 and sbt v0.7.4)

Read the article
SPSS - sum of squares change radically with slight model changes in ANOVA??

- by Pat

I have noticed that the sum of squares in my models can change fairly radically with even the slightest adjustment to my models. Is this normal? I'm using SPSS 16, and both models presented below used the same data and variables with only one small change - categorizing one of the variables as either a 2 level or 3 level variable. Details - using a 2 x 2 x 6 mixed model ANOVA with the 6 being the repeated measure, I get the following in the between-group analysis:

    ------------------------------------------------------
    Source    | Type III SS | df | MS      | F      | Sig
    ------------------------------------------------------
    intercept | 4086.46     | 1  | 4086.46 | 104.93 | .000
    X         | 224.61      | 1  | 224.61  | 5.77   | .019
    Y         | 2.60        | 1  | 2.60    | .07    | .80
    X by Y    | 19.25       | 1  | 19.25   | .49    | .49
    Error     | 2570.40     | 66 | 38.95   |        |

Then, when I use the exact same data but a slightly different model in which variable Y has 3 levels instead of 2 levels, I get the following:

    ------------------------------------------------------
    Source    | Type III SS | df | MS      | F      | Sig
    ------------------------------------------------------
    intercept | 3603.88     | 1  | 3603.88 | 90.89  | .000
    X         | 171.89      | 1  | 171.89  | 4.34   | .041
    Y         | 19.23       | 2  | 9.62    | .24    | .79
    X by Y    | 17.90       | 2  | 17.90   | .80    | .80
    Error     | 2537.76     | 64 | 39.65   |        |

I don't understand why variable X would have a different sum of squares simply because variable Y gets divided up into 3 levels instead of 2. This is also the case in the within-groups analysis. Please help me understand :D Thank you in advance, Pat

Read the article
Git add not working with .png files?

- by D Lawson

I have a dirty working tree, dirty because I made changes to source files and touched up some images. I was trying to add just the images to the index, so I ran this command:

    git add *.png

But this doesn't add the files. There were a few new image files that were added, but none of the ones that were modified/pre-existing were added. What gives?

Edit: Here is some relevant terminal output

    $ git status
    # On branch master
    #
    # Changed but not updated:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    #   modified: src/main/java/net/plugins/analysis/FormMatcher.java
    #   modified: src/main/resources/icons/doctor_edit_male.png
    #   modified: src/main/resources/icons/doctor_female.png
    #
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #
    #   src/main/resources/icons/arrow_up.png
    #   src/main/resources/icons/bullet_arrow_down.png
    #   src/main/resources/icons/bullet_arrow_up.png
    no changes added to commit (use "git add" and/or "git commit -a")

Then executed "git add *.png" (no output after command). Then:

    $ git status
    # On branch master
    #
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #   new file: src/main/resources/icons/arrow_up.png
    #   new file: src/main/resources/icons/bullet_arrow_down.png
    #   new file: src/main/resources/icons/bullet_arrow_up.png
    #
    # Changed but not updated:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    #   modified: src/main/java/net/plugins/analysis/FormMatcher.java
    #   modified: src/main/resources/icons/doctor_edit_female.png
    #   modified: src/main/resources/icons/doctor_edit_male.png

Read the article
Dimension Reduction in Categorical Data with missing values

- by user227290

I have a regression model in which the dependent variable is continuous but ninety percent of the independent variables are categorical (both ordered and unordered) and around thirty percent of the records have missing values (to make matters worse they are missing randomly without any pattern, that is, more than forty five percent of the data have at least one missing value). There is no a priori theory to choose the specification of the model, so one of the key tasks is dimension reduction before running the regression. While I am aware of several methods for dimension reduction for continuous variables, I am not aware of a similar statistical literature for categorical data (except, perhaps, as a part of correspondence analysis, which is basically a variation of principal component analysis on a frequency table). Let me also add that the dataset is of moderate size: 500000 observations with 200 variables. I have two questions.

1. Is there a good statistical reference out there for dimension reduction for categorical data along with robust imputation (I think the first issue is imputation and then dimension reduction)?
2. This is linked to the implementation of the above problem. I have used R extensively earlier and tend to use the transcan and impute functions heavily for continuous variables, and use a variation of the tree method to impute categorical values. I have a working knowledge of Python, so if something nice is out there for this purpose then I will use it. Any implementation pointers in Python or R will be of great help.

Thank you.

Read the article
Where should test classes be stored in the project?

- by limc

I build all my web projects at work using RAD/Eclipse, and I'm interested to know where you guys normally store your tests' *.class files. All my web projects have 2 source folders: "src" for source and "test" for testcases. The generated *.class files for both source folders are currently placed under the WebContent/WEB-INF/classes folder. I want to separate the test *.class files from the src *.class files for 2 reasons:

- There's no point in storing them in WebContent/WEB-INF/classes and deploying them in production.
- Sonar and some other static code analysis tools don't produce an accurate static code analysis because they take into account my crappy yet correct testcase code.

So, right now, I have the following output folders:

- "src" source folder compiles to the WebContent/WEB-INF/classes folder.
- "test" source folder compiles to the target/test-classes folder.

Now, I'm getting this warning from RAD:

    Broken single-root rule: A project may not contain more than one output folder.

So, it seems like Eclipse-based IDEs prefer one project = one output folder, yet they provide an option for me to set up a custom output folder for my additional source folder from the "build path" dialog, and then bark at me. I know I can just disable this warning myself, but I want to know how you guys handle this. Thanks.

Read the article
What type of data store should I use for my ios app?

- by mwiederrecht

I am pretty new to iOS and to using servers, so forgive me. I am building an iOS app for research. I need to monitor things that the user does and then push them up to a server for analysis (yes, with user and IRB permission). On the client side I need to keep quite a bit of data that won't really change except in the case of pulling an updated version from the server, and then a minimal amount of user-specific data. Most of the data I will collect needs to be pushed to a server for analysis and can then be deleted from the client side. I am struggling to figure out what kind of data store I need to use, especially since I am not quite sure how the process of pushing to and pulling from the server works yet. Does it make sense to use Core Data? XML? SQLite? I like the Core Data idea, but I am not sure what kind of problems I will run into when I need to send large amounts of data to and from the server. I imagine I might need to send data in a different form than it is probably stored in on either end - so what kind of overhead am I likely to run into in the process of converting that data? Is there a good format to save stuff in that would work well for me on both ends AND for sending the data? As you can probably tell, I could use some advice. Thanks!

Read the article
Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?

- by user89

I repeat the following idiom again and again. I read from a large file (sometimes, up to 1.2 million records!) and store the output into an SQLite database. Putting stuff into the SQLite DB seems to be fairly fast.

    import os, struct, time

    def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numberOfObjects):
        insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)"
        # NOTE: outputPath is not defined anywhere in this excerpt
        analysisNumber = int(outputPath[-3:])
        outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
        outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)
        numberOfRecordsPerObject = (numberOfRecordsInFileObject//numberOfObjects)
        loop1StartTime = time.time()
        for i in range(numberOfRecordsPerObject):
            processedRecords = []
            loop2StartTime = time.time()
            for j in range(numberOfObjects):
                fout = outputFileObject.read(recordSize)
                processedRecords.append(tuple([j+1, analysisNumber, i] + [x for x in list(struct.unpack(recordFormat, fout))]))
            loop2EndTime = time.time()
            print "Time taken to finish loop2: {}".format(loop2EndTime-loop2StartTime)
            dbInsertStartTime = time.time()
            connection.executemany(insertString, processedRecords)
            dbInsertEndTime = time.time()
        loop1EndTime = time.time()
        print "Time taken to finish loop1: {}".format(loop1EndTime-loop1StartTime)
        outputFileObject.close()
        print "Finished reading output file for analysis {}...".format(analysisNumber)

When I run the code, it seems that "loop 2" and "inserting into the database" are where most of the execution time is spent. The average "loop 2" time is 0.003s, but it is run up to 50,000 times in some analyses. The time spent putting stuff into the database is about the same: 0.004s. Currently, I am inserting into the database every time loop2 finishes so that I don't have to deal with running out of RAM. What could I do to speed up "loop 2"?
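One direction worth trying for "loop 2" can be sketched like this: compile the record format once with struct.Struct, fetch all records of a time step with a single read() call, and unpack each record from that buffer with unpack_from instead of issuing one read() per record. The sketch is written for Python 3 rather than the Python 2.7 of the question, and the function name read_time_step, the six-double record layout and the in-memory sample file are assumptions made for illustration only.

```python
import io
import struct


def read_time_step(file_obj, record_struct, num_objects, analysis_number, time_step):
    """Read one time step (num_objects records) with a single read() call.

    Returns a list of tuples shaped like the rows in the question:
    (node, analysis, timeStep, value1, value2, ...).
    """
    size = record_struct.size
    block = file_obj.read(size * num_objects)  # one I/O call instead of num_objects
    if len(block) != size * num_objects:
        raise ValueError("file ended in the middle of a time step")
    return [
        (j + 1, analysis_number, time_step) + record_struct.unpack_from(block, j * size)
        for j in range(num_objects)
    ]


# Hypothetical record layout: six little-endian doubles per node.
RECORD = struct.Struct("<6d")

# Build a small in-memory "file" holding two nodes for one time step.
sample = io.BytesIO(RECORD.pack(1, 2, 3, 4, 5, 6) + RECORD.pack(7, 8, 9, 10, 11, 12))
rows = read_time_step(sample, RECORD, num_objects=2, analysis_number=1, time_step=0)
print(rows[0])
```

The returned list can be handed to connection.executemany() exactly as processedRecords is in the question. Building each row by tuple concatenation also avoids the intermediate lists created by tuple([...] + [x for x in list(...)]). How much time this saves is something to measure on the real files rather than assume.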

Read the article
