Search Results

Search found 356 results on 15 pages for 'datasets'.

Page 8/15

  • What is a spreadsheet useful for?

    - by zvrba
    I have been in the computer business for 15 years in various roles (sysadmin, developer, researcher), and I have never encountered someone using Excel for something more advanced than formatting tables, or as an ad-hoc database that could have been maintained in a text file. I had to do heavy data-processing and plotting, and for that I used some Perl scripts + gnuplot, got tired of it, and eventually went over to R. A 2D spreadsheet just didn't seem well-suited for doing statistical analyses over 5-dimensional datasets (not to mention that it produces UGLY plots). I attempted to use a spreadsheet for time-tracking, and found out that I would have been better served by a relational database, so I gave up on using Excel for that too. For example, it's important to consistently name tasks, and I needed to find the unique task names in a given column across several sheets (I had one timesheet for each month). How do you make such a "query" in a program that essentially evaluates independent cells and has little notion of relations between them? So, what are spreadsheets useful for? Why do they have a bunch of mathematical stuff built into them when, AFAICT, people use them mostly as table formatters or bad substitutes for databases?

    Read the article

  • Trying to learn how to use WCF services in a WPF app, using MVVM

    - by Rod
    We're working on a major re-write of a legacy VB6 app into a WPF app. I've written several WCF services, which are meant to be used with the new WPF app. We want to use the MVVM design pattern to do this, but we don't have experience with it. So, in order to learn MVVM, we've watched a video on WindowsClient called "How Do I: Build Data-driven WPF Application using the MVVM pattern". This is a great introduction, and we refer to it a lot, but for our situation it doesn't quite give us enough. For example, we're not certain how to use the datasets returned by my WCF services in our new WPF app using the ideas that Todd Miranda introduced in the video I referenced. If we did as we think we're supposed to, then we should design a class that is exactly like the class of data returned by my WCF service. But we're wondering: why do that, when the WCF service already has such a class? And yet, the class in the WPF app has to at least implement the INotifyPropertyChanged interface. So, we're not sure what to do.
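
    One common resolution of exactly this tension is to keep the WCF-generated class as a plain DTO and wrap it in a thin ViewModel that adds INotifyPropertyChanged, rather than duplicating its fields. A minimal sketch (CustomerDto and CustomerViewModel are hypothetical names, not from the video or the services in question):

        using System.ComponentModel;

        // stand-in for a class generated from the WCF data contract
        public class CustomerDto
        {
            public string Name { get; set; }
        }

        // wraps the DTO instead of re-declaring its properties
        public class CustomerViewModel : INotifyPropertyChanged
        {
            private readonly CustomerDto _dto;

            public CustomerViewModel(CustomerDto dto)
            {
                _dto = dto;
            }

            public event PropertyChangedEventHandler PropertyChanged;

            public string Name
            {
                get { return _dto.Name; }
                set
                {
                    if (_dto.Name == value) return;
                    _dto.Name = value;
                    var handler = PropertyChanged;
                    if (handler != null)
                        handler(this, new PropertyChangedEventArgs("Name"));
                }
            }
        }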

    Read the article

  • How do I express subtle relationships in my data?

    - by Chuck H
    "A" is related to "B" and "C". How do I show that "B" and "C" might, by this context, be related as well? Example: Here are a few headlines about a recent Broadway play: 1 - David Mamet's Glengarry Glen Ross, Starring Al Pacino, Opens on Broadway 2 - Al Pacino in 'Glengarry Glen Ross': What did the critics think? 3 - Al Pacino earns lackluster reviews for Broadway turn 4 - Theater Review: Glengarry Glen Ross Is Selling Its Stars Hard 5 - Glengarry Glen Ross; Hey, Who Killed the Klieg Lights? Problem: Running a fuzzy-string match over these records will establish some relationships, but not others, even though a human reader could pick them out from context in much larger datasets. How do I find the relationship that suggests #3 is related to #4? Both of them can be easily connected to #1, but not to each other. Is there a (Googlable) name for this kind of data or structure? What kind of algorithm am I looking for? Goal: Given 1,000 headlines, a system that automatically suggests that these 5 items are all probably about the same thing. To be honest, it's been so long since I've programmed I'm at a loss how to properly articulate this problem. (I don't know what I don't know, if that makes sense). This is a personal project and I'm writing it in Python. Thanks in advance for any help, advice, and pointers!

    Read the article

  • Working with data and meta data that are separated on different servers

    - by afuzzyllama
    While developing a product, I've come across a situation where my group wants to store the meta data for data-entry forms (questions, layout, etc.) in a different database than the database where the collected data is stored. This is mostly for security, because we want to be able to have our meta data public facing while keeping the collected data as secure as possible. I was thinking about writing a web service that provides the meta information that the data collection program could access. The only issue I see with this approach is that the front end is going to have to match the meta data with the collected data, which would be more efficient as a join on the back end. Currently, this system is slated to run on .NET and MSSQL. I haven't played around with .NET libraries running in SQL, but I'm considering trying to create logic that would pull from the web service, convert the meta data into a table that SQL can join on, and return the combined data and meta data that way. Is this solution the wrong way to approach the problem? Is there a pattern or "industry standard" way of bringing together two datasets that don't live in the same database?
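
    One lighter-weight alternative to pulling the service data into SQL is to do the join in the middle tier with LINQ over the two result sets. A sketch of that shape only; GetMetaDataTable, GetCollectedDataTable, and all column names here are hypothetical stand-ins:

        using System;
        using System.Data;
        using System.Linq;

        class FormAssembler
        {
            static void Main()
            {
                DataTable meta = GetMetaDataTable();       // from the public web service
                DataTable data = GetCollectedDataTable();  // from the secure MSSQL store

                var combined =
                    from d in data.AsEnumerable()
                    join m in meta.AsEnumerable()
                        on d.Field<int>("QuestionId") equals m.Field<int>("QuestionId")
                    select new
                    {
                        Question = m.Field<string>("QuestionText"),  // meta side
                        Answer   = d.Field<string>("Value")          // data side
                    };

                foreach (var row in combined)
                    Console.WriteLine(row.Question + ": " + row.Answer);
            }

            // stubs standing in for the real service proxy and data access layer
            static DataTable GetMetaDataTable()      { return new DataTable(); }
            static DataTable GetCollectedDataTable() { return new DataTable(); }
        }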

    Read the article

  • Minimize useless tweaking of a numeric app

    - by Potatoswatter
    I'm developing a numeric application (a nonlinear optimizer) with a zillion knobs to tweak, and the number is rising. It's not my first foray into this domain, but this time there are even more variables in the code and I'm on a tight schedule. I don't want to waste time fiddling. Days or even months can potentially be wasted adjusting variables, recompiling, and reprocessing benchmark datasets. The resulting data is viewed and trouble spots are checked. The overall quality of the solution is reported by the program, but the meaning of the report could change over time. (Numeric units for the report are one thing I'm trying to nail down.) One main problem is organizing result files to identify each with specific code changes. Note-taking can be a pain; is there software to help with this? Are there agreed best practices for making this kind of development cycle reliably move forward? The solver package converges to its optimal solution with mechanical determination, but I'm all too familiar with the way an excess of design decisions can mire development.
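
    Whatever tool ends up holding the notes, one cheap habit that removes most of the guesswork is having every benchmark run write its own manifest (knob values plus source revision) next to its result files. A sketch, with all names illustrative rather than prescribed:

        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;

        class RunLogger
        {
            static void Main()
            {
                var knobs = new Dictionary<string, double>
                {
                    { "step_size", 0.01 },   // illustrative knob names
                    { "tolerance", 1e-6 },
                };
                string runDir = Path.Combine("results",
                    DateTime.Now.ToString("yyyyMMdd-HHmmss"));
                Directory.CreateDirectory(runDir);
                File.WriteAllLines(Path.Combine(runDir, "manifest.txt"),
                    new[] { "revision: " + SourceRevision() }
                        .Concat(knobs.Select(k => k.Key + ": " + k.Value))
                        .ToArray());
                // ...point the solver's output at runDir...
            }

            // stub: in practice, ask your version control for the current revision
            static string SourceRevision() { return "unknown"; }
        }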

    Read the article

  • Self Service Reporting With PowerPivot

    - by blakmk
    There are so many cool new features in SQL Server 2008 R2 that it was difficult for me to pick a topic for T-SQL Tuesday. But the one that I am now a secret fan of, I once resented for its creation. Let me explain: for years I have encountered reporting systems cobbled together in tools like Access and Excel, built by "database hobbyists" who had no formal training in database design or best practices. They would take their monstrosities as far as they could go before ultimately it stopped working or the person that wrote it left the company. At that point it would become the resident DBA's problem to support it as a live application. So when I first heard of PowerPivot, a sense of deja vu overtook me and I felt like the guy in the Austin Powers movie, knowing the inevitable is coming but somehow unsure how to get out of the way. But when I eventually saw it in action, I quickly realised that it is a very powerful tool. It has a much smaller "time to market" than traditional BI architectures. Combined with the new features of Excel, some pretty impressive dashboards can be produced. Of course PowerPivot is not a magic bullet, and along with potential scalability issues there are the usual issues, such as master data management and data quality, that cannot be overcome easily with PowerPivot. As a tool, though, it has potential. Traditional BI is expensive, both in terms of time and the amount of resources it takes to deliver the system. The time lag between an analyst or a commercial accountant requesting reports and the report being delivered can make a huge commercial difference. I have observed companies where empowered end users become extremely productive when allowed to plough into various disparate datasets. It may not be the correct way or the most sustainable, but it's cheap and quick. In these times when budgets are being slashed and we are forced to deliver more with less, why not empower the end user with a tool that is designed for exactly this task? @blakmk

    Read the article

  • Object oriented EDI handling in PHP

    - by Robert van der Linde
    I'm currently starting a new sub-project where I will:

    - Retrieve the order information from our mainframe
    - Save the order information to our web app's database
    - Send the order as EDI (either D01b or D93a)
    - Receive the order response, despatch advice and invoice messages
    - Do all kinds of fun things with the resulting datasets

    However, I am struggling with my initial class designs. The order information will be retrieved from the mainframe, which will result in an "AOrder" class; this isn't a problem. I am not sure about how to mold this local object into an EDI string. Should I create EDIOrder/EDIOrderResponse/etc. classes with matching decorators (EDIOrderD01BDecorator, EDIOrderD93ADecorator)? Do I need builder objects, or can I do:

        // $myOrder is an instance of AOrder
        $myOrder->toEDIOrder();
        $decorator = new EDIOrderD01BDecorator($myOrder);
        $edi = $decorator->getEDIString();

    And it'll have to work the other way around as well. Is the following code a good way of handling this problem, or should I go about this differently?

        $ediString = $myEDIMessageBroker->fetch();
        $ediOrderResponse = EDIOrderResponse::fromString($ediString);

    I'm just not so sure about how I should go about designing the classes and the interactions between them. Thanks for reading and helping.

    Read the article

  • flat files vs. RDBMS database, few read/writes, few changes

    - by Bob Lapique
    I have to handle data from long-term (years, decades) climate monitoring stations. The data flow usually starts with raw data (voltages, etc.) plus quality-check information (pressure, temperature, flow rate, etc.), generally recorded @ 1Hz. Then the data are assigned a quality flag (by a human and/or a program), processed (calibration curves applied) and flagged. So we basically end up with 2 datasets: raw and processed data. New data are typically added once a day (~500KB/day/instrument). Simultaneous queries are not likely to ever happen. I wanted to go for an RDBMS (we have a MySQL server) and have some experience in database design, but the IT guy keeps telling me that flat files will do the job just as well. I suspect he's trying to make his life easier when it comes to backing up/upgrading MySQL. There are not so many links between data, and they don't change much, but the quality flags will change. An RDBMS makes it easier to compare data from different instruments on a "many days" scale than daily text files do. Well, what would you advise? Thanks.

    Read the article

  • Testing of visualization projects

    - by paxRoman
    We develop small to large visualization projects for different tasks and industries, and sometimes, while rewriting them a couple of times in the process, we hit walls because we discover that we need to add a lot of code to support new requirements. Now we have established a design process that seems to work well (at least we reduced the development time for each new project quite a bit), but we're still left scratching our heads over this question: what exactly should we test when testing visualizations? Whether everything that we want to explore is on the screen (bounded visualizations)? Whether the data is valid (that's one of the nice things about visualizations: you can spot errors in your datasets)? Usability? User interaction? Code quality? I can tell you for sure that a simple check of the code quality is certainly not enough! Is there a classic paper/book about how to test visualizations? Also, do you happen to know about classic design patterns for visualizations (except the obvious ones like Pub-Sub)?

    Read the article

  • What to do as a new team lead on a project with maintainability problems?

    - by Mr_E
    I have just been put in charge of a code project with maintainability problems. What things can I do to get the project on a stable footing? I find myself in a place where we are working with a very large multi-tiered .NET system that is missing a lot of the important things, such as unit tests, IoC, and MEF, and is full of too many static classes, pure datasets, etc. I'm only 24, but I've been here for almost three years (this app has been in development for 5), and mostly due to time constraints we've been just adding in more crap to fit the other crap. After doing a number of projects in my free time, I have begun to understand just how important all those concepts are. Also, due to employee shifting, I find myself to now be the team lead on this project, and I really want to come up with some smart ways to improve this app, ways whose value can be explained to management. I have ideas of what I would like to do, but they all seem so overwhelming without much upfront gain. Any stories of how people have or would have dealt with this would be a very interesting read. Thanks.

    Read the article

  • High-level strategy for distinguishing a regular string from invalid JSON (ie. JSON-like string detection)

    - by Jonline
    Disclaimer on absence of code: I have no code to post because I haven't started writing; I was looking for more theoretical guidance, as I doubt I'll have trouble coding it but am pretty befuddled on what approach(es) would yield the best results. I'm not seeking any code, either; just direction. Dilemma: I'm toying with adding a "magic method"-style feature to a UI I'm building for a client, and it would require intelligently detecting whether or not a string was meant to be JSON, as against a simple string. I had considered these general ideas:

    1. Look for a sort of arbitrarily-determined acceptable ratio of the frequency of JSON-like syntax (i.e. regex to find strings separated by colons; look for colons between curly braces, etc.) to the number of quote-encapsulated strings + nulls, bools and ints/floats. But the smaller the data set, the more fickle this would get.
    2. Look for key identifiers like opening and closing curly braces... I'm not sure if there even are more easy identifiers, and this doesn't appeal anyway because it's so prescriptive about the kinds of mistakes it could find.
    3. Try incrementally parsing chunks, such as those between curly braces, and seeing what proportion of these fractional statements turn out to be valid JSON; this seems like it would suffer less than (1) from smaller datasets, but would probably be much more processing-intensive, and very susceptible to a missing or inverted brace.

    Just curious if the computational folks or algorithm pros out there have any approaches in mind that my semantics-oriented brain might have missed. PS: It occurs to me that natural language processing, about which I am totally ignorant, might be a cool approach; but if NLP is a good strategy here, it sort of doesn't matter, because I have zero experience with it and don't have time to learn and then implement it; this feature isn't worth that to the client.
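
    For what it's worth, ideas (1) and (2) can be combined with a real parse attempt into a cheap three-way classifier. A sketch in C# (the same shape works in any language; the 0.05 density threshold is an arbitrary placeholder that would need tuning against real inputs):

        using System.Text.Json;

        static class JsonLikeness
        {
            // Returns "valid JSON", "probably malformed JSON", or "plain string".
            public static string Classify(string input)
            {
                string s = input.Trim();
                if (s.Length == 0) return "plain string";

                try
                {
                    using (JsonDocument.Parse(s))      // the authoritative check
                        return "valid JSON";
                }
                catch (JsonException)
                {
                    // idea (2): structural opener
                    bool looksStructural = s[0] == '{' || s[0] == '[';
                    // idea (1): density of JSON-ish punctuation
                    int hits = 0;
                    foreach (char c in s)
                        if (c == '{' || c == '}' || c == ':' || c == '"' || c == ',')
                            hits++;
                    bool jsonLike = looksStructural ||
                                    (double)hits / s.Length > 0.05;
                    return jsonLike ? "probably malformed JSON" : "plain string";
                }
            }
        }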

    Read the article

  • Serving east/west coasts with Geoipdns and MaxMind GeoLite data

    - by netvope
    I want to serve east (west) coast visitors with my Virginia (California) server. To do so, I plan to use Geoipdns and the IP-to-location mappings from MaxMind. MaxMind provide two datasets for free: GeoLite Country and GeoLite City. However, neither of them has east/west coast regions defined. A possible solution is to write a script to combine all the IP ranges for the east/west coast cities in GeoLite City, but that sounds a little bit stupid. What is the best practice in doing this? Any suggestions or alternatives?
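
    The combining script is mostly a lookup from a location's US state code to a coast. A sketch of just that piece (state lists abbreviated; the CSV parsing of the GeoLite City files is left out, and treating the record's region field as a US state code is an assumption about that data):

        using System.Collections.Generic;

        static class CoastLookup
        {
            static readonly HashSet<string> East = new HashSet<string>
                { "ME", "NH", "MA", "NY", "NJ", "VA", "NC", "FL" /* ... */ };
            static readonly HashSet<string> West = new HashSet<string>
                { "WA", "OR", "CA", "NV", "AZ" /* ... */ };

            // region: the state code carried by a GeoLite City location record
            public static string Coast(string region)
            {
                if (East.Contains(region)) return "east";   // -> Virginia server
                if (West.Contains(region)) return "west";   // -> California server
                return "default";  // everything else needs a fallback policy
            }
        }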

    Read the article

  • How do I aggregate results from an Adjacency list using PHP's SPL

    - by Stephen J. Fuhry
    I've tried using nested sets, and they become very difficult to maintain when dealing with multiple trees and lots of other complications... I'd like to take a stab at this with PHP's SPL library (btw, we are on PHP 5.3, MySQL 5.1). Given two datasets:

    The Groups:

    +-------+--------+---------------------+---------------+
    | id    | parent | Category Name       | child_key     |
    +-------+--------+---------------------+---------------+
    | 11133 | 7707   | Really Cool Products| 47054         |
    | 7709  | 7708   | 24" Monitors        | 57910         |
    | 7713  | 7710   | Hot Tubs            | 35585         |
    | 7716  | 7710   | Hot Dogs            | 00395         |
    | 11133 | 7707   | Really Cool Products| 66647         |
    | 7715  | 7710   | Suction Cups        | 08396         |
    +-------+--------+---------------------+---------------+

    The Items:

    +------------+------------+-----------+----------+---------+
    | child_key  | totalprice | totalcost | totalqty | onorder | (jan, feb, mar..)
    +------------+------------+-----------+----------+---------+
    | 24171      | 10.50      | 20.10     | 200      | 100     |
    | 35685      | 10.50      | 20.10     | 200      | 100     |
    | 76505      | 10.50      | 20.10     | 200      | 100     |
    | 04365      | 10.50      | 20.10     | 200      | 100     |
    | 01975      | 10.50      | 20.10     | 200      | 100     |
    | 12150      | 10.50      | 20.10     | 200      | 100     |
    | 40060      | 10.50      | 20.10     | 200      | 100     |
    | 08396      | 10.50      | 20.10     | 200      | 100     |
    +------------+------------+-----------+----------+---------+

    The figures are actually much more complicated than this (I am actually aggregating a variable number of months or years over the past 15 yrs, so there may need to be 20 columns of aggregated results). I have been trying to figure out RecursiveIterator and IteratorAggregate, but I am having a difficult time finding real-world examples that are generic enough to really wrap my head around these classes. Can someone give me a head start?

    Read the article

  • SSRS 2005: Filter Nested Table within a List

    - by Even Mien
    In SQL Server Reporting Services 2005, how can I filter a nested table within a list? I have 2 datasets. The first, datasetHeader, contains one row per account. The second, datasetDetails, contains multiple rows per account.

    Control    Dataset name
    List       datasetHeader
    Table      datasetDetails

    The table is placed within the list. When I attempt to filter on the table, I get fields from datasetHeader instead of datasetDetails. Previously I had the table within a subreport, and I had that working by using parameters; however, I needed to pull it into the main report because of the implied KeepTogether=true property for subreports, which was causing undesired pagination.

    Read the article

  • Are TClientDataSets part of your toolkit, or have they been replaced by something else?

    - by Tom1952
    I have 50 or 60 records of four or five fields. I need to load the records into RAM (From a CSV file), search on different fields, enumerate, etc. Not a lot of data, not a lot of functionality. I was all excited to use the new (to me in D2010) TDictionary or TList, but thought that a TClientDataset (which I've never used before) might be more appropriate. With a TClientDataSet, I can use .Locate on any field, enumerate with while NOT CDS.EOF, etc. And, what exactly is this MidasLib that I have to use with CDS? Can I reasonably expect it to be supported in the future? Is TClientDataSet still considered state-of-the-art, or is it showing its age and somewhat deprecated (literally and figuratively)? I've seen colleagues use DX's TdxMemData. Why use it (or any of the other handful of memory datasets I've seen while googling this issue) rather than a CDS? Related question: http://stackoverflow.com/questions/274958/delphi-using-tclientdataset-as-an-in-memory-dataset

    Read the article

  • Algorithm to generate numerical concept hierarchy

    - by Christophe Herreman
    I have a couple of numerical datasets for which I need to create a concept hierarchy. For now, I have been doing this manually by observing the data (and a corresponding line chart). Based on my intuition, I created some acceptable hierarchies. This seems like a task that can be automated. Does anyone know if there is an algorithm to generate a concept hierarchy for numerical data? To give an example, I have the following dataset:

    Bangladesh      521
    Brazil         8295
    Burma           446
    China          3259
    Congo          2952
    Egypt          2162
    Ethiopia        333
    France        46037
    Germany       44729
    India          1017
    Indonesia      2239
    Iran           4600
    Italy         38996
    Japan         38457
    Mexico        10200
    Nigeria        1401
    Pakistan       1022
    Philippines    1845
    Russia        11807
    South Africa   5685
    Thailand       4116
    Turkey        10479
    UK            43734
    US            47440
    Vietnam        1042

    for which I created the following hierarchy:

    LOWEST  (< 1000)
    LOW     (1000 - 2500)
    MEDIUM  (2501 - 7500)
    HIGH    (7501 - 30000)
    HIGHEST (> 30000)
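
    One family of algorithms that automates exactly this is one-dimensional clustering, e.g. Jenks natural breaks optimization or a 1-D k-means; the boundaries between the resulting clusters become the hierarchy's cut points. A minimal k-means sketch (assuming k >= 2 and a non-empty input; with k = 5 it yields a five-level hierarchy like the one above):

        using System;
        using System.Linq;

        static class ConceptHierarchy
        {
            // Lloyd's algorithm in one dimension; returns the cluster centroids.
            // Midpoints between adjacent centroids become the bin boundaries.
            public static double[] KMeans1D(double[] values, int k, int iterations = 50)
            {
                var sorted = values.OrderBy(v => v).ToArray();
                // seed the centroids with evenly spaced order statistics
                var centroids = Enumerable.Range(0, k)
                    .Select(i => sorted[i * (sorted.Length - 1) / (k - 1)])
                    .ToArray();

                for (int it = 0; it < iterations; it++)
                {
                    var sums = new double[k];
                    var counts = new int[k];
                    foreach (double v in values)
                    {
                        int best = 0;
                        for (int c = 1; c < k; c++)
                            if (Math.Abs(v - centroids[c]) < Math.Abs(v - centroids[best]))
                                best = c;
                        sums[best] += v;
                        counts[best]++;
                    }
                    for (int c = 0; c < k; c++)
                        if (counts[c] > 0)
                            centroids[c] = sums[c] / counts[c];
                }
                return centroids.OrderBy(c => c).ToArray();
            }
        }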

    Read the article

  • Metalanguage like BNF or XML-Schema to validate a tree-instance against a tree-model

    - by Stefan
    Hi! I'm implementing a new machine learning algorithm in Java that extracts a prototype data structure from a set of structured datasets (tree-structured). As I'm developing a generic library for that purpose, I kept my design independent from concrete data representations like XML. My problem now is that I need a way to define a data model, which is basically a ruleset describing valid trees, against which a set of trees is being matched. I thought of using BNF or a similar dialect. Basically, I need a way to iterate through the space of all valid TreeNodes defined by the ModelTree (like a search through the search space for algorithms like A*) so that I can compare my set of concrete trees with the model. I know that I'll have to deal with infinite spaces there, but first things first. I know it's rather tricky (and my sentences are pretty bumpy), but I would appreciate any clues. Thanks in advance, Stefan
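
    A lightweight alternative to a full BNF dialect is to encode the model as a map from node label to the child labels it permits, matched recursively; infinite tree spaces stay representable because rules may refer to themselves. A toy sketch (in C# for consistency with the other sketches on this page, though the question's context is Java, where it translates one-for-one; all names are hypothetical):

        using System.Collections.Generic;
        using System.Linq;

        class Node
        {
            public string Label;
            public List<Node> Children = new List<Node>();
        }

        class TreeModel
        {
            // label -> the set of labels its children may carry; a recursive
            // rule such as "expr" allowing "expr" describes unbounded depth
            private readonly Dictionary<string, HashSet<string>> _rules;

            public TreeModel(Dictionary<string, HashSet<string>> rules)
            {
                _rules = rules;
            }

            public bool Matches(Node n)
            {
                HashSet<string> allowed;
                if (!_rules.TryGetValue(n.Label, out allowed))
                    return n.Children.Count == 0;   // labels without rules: leaves only
                return n.Children.All(c => allowed.Contains(c.Label) && Matches(c));
            }
        }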

    Read the article

  • Using LINQ to fetch result from nested SQL queries

    - by Shantanu Gupta
    This is my first question and my first day in LINQ, so it is a bit of a difficult day for me. I want to fetch some records from the database, i.e.:

        select * from tblDepartment
        where department_id in
        (
            select department_id from tblMap
            where Guest_Id = @GuestId
        )

    I have taken two DataTables, i.e. tblDepartment and tblMap. Now I want to fetch this result and store it in a third DataTable. How can I do this? I have been able to construct this query up till now after googling:

        var query = from myrow in _dtDepartment.AsEnumerable()
                    where myrow.Field<int>("Department_Id") == _departmentId
                    select myrow;

    Please provide me some links for learning LINQ, mainly for DataTables and DataSets. EDIT: I have got a very similar example here, but I am still not able to understand how it is working. Please shed some light on it.
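
    For the nested query itself, one LINQ-to-DataSet shape that mirrors the SQL IN clause is sketched below; it assumes the map table is held in a _dtMap DataTable with columns named Department_Id and Guest_Id, matching the SQL:

        // subquery: the department ids mapped to this guest
        var mappedIds =
            (from m in _dtMap.AsEnumerable()
             where m.Field<int>("Guest_Id") == guestId
             select m.Field<int>("Department_Id")).ToList();

        // outer query: departments whose id is IN the subquery's results
        var departments =
            from d in _dtDepartment.AsEnumerable()
            where mappedIds.Contains(d.Field<int>("Department_Id"))
            select d;

        // the "third DataTable"; note CopyToDataTable throws if no rows matched
        DataTable result = departments.CopyToDataTable();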

    Read the article

  • Large XML files in dataset (outofmemory)

    - by dklein
    Hi folks, I am currently trying to load a slightly large XML file into a DataSet. The XML file is about 700 MB, and every time I try to read the XML it needs plenty of time, and after a while it throws an "out of memory" exception.

        DataSet ds = new DataSet();
        ds.ReadXml(pathtofile);

    The main problem is that it is necessary for me to use those datasets (I use them to import the data from the XML file into a Sybase database (foreach table, foreach row, foreach column)) and that I have no schema file. I already googled a while, but I only found solutions that won't be usable for me.
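
    The usual way past the memory ceiling is to stream the file with XmlReader and import one record at a time instead of materializing the whole document. A sketch, assuming the data sits in repeating elements; the element name "row" is a placeholder for whatever the file actually uses:

        using System.Xml;

        using (XmlReader reader = XmlReader.Create(pathtofile))
        {
            while (reader.ReadToFollowing("row"))
            {
                using (XmlReader record = reader.ReadSubtree())
                {
                    record.Read();
                    // materialize just this one record and insert it into
                    // Sybase, then let it go before reading the next one
                }
            }
        }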

    Read the article

  • Entity Framework 4 relationship management in POCO Templates - More lazy than FixupCollection?

    - by Joe Wood
    I've been taking a look at the EF4 POCO templates in Beta 2. The FixupCollection looks fine for maintaining model correctness after updating a relationship collection property (i.e. updating product.Orders would set the order.Product reference). But what about support for handling the scenario where some of those Order objects are removed from the context? The use-case of maintaining cascading deletes in the in-memory model. The old typed DataSet model used to do this by performing the query through the container to derive the relationship results. Like the DataSet, this would require a reference to the ObjectContext inside the entity class so that it could query the top-level Order collection. Better support for separation of concerns in the ObjectContext would be required. It looks like EF is not suited to this use-case, which DataSets handled out of the box... am I right?

    Read the article

  • MVVM Good Design. DataSet or a RowViewModel

    - by LnDCobra
    I have just started learning MVVM and am having a dilemma. I have a main ViewModel, and inside this model I have a number of datasets. Now, should I be creating a new ViewModel for each row inside the dataset? Or expose the DataSet itself as a DependencyProperty? For now the dataset has about 20 rows inside it, and the thought of iterating through each row to create a ViewModel bound to each row... might not be the best option for performance and memory reasons in the future, like when there are 1000+ rows. Should I still go ahead and create a RowViewModel and iterate through the dataset, and have an ObservableCollection of it, or just expose the dataset? Any help would be greatly appreciated.
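
    One data point for the decision: WPF can bind an ItemsControl straight to a DataView, so no per-row ViewModel is strictly required until rows need behavior of their own. A sketch (the property and table names are hypothetical):

        // In the main ViewModel: expose the view rather than the raw DataSet;
        // DataView implements IBindingList, so WPF picks up row changes.
        public DataView Orders
        {
            get { return _dataSet.Tables["Orders"].DefaultView; }
        }

        // XAML usage:
        // <DataGrid ItemsSource="{Binding Orders}" />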

    Read the article

  • High volume SVM (machine learning) system

    - by flyingcrab
    I'm working on a possible machine learning project that would be expected to do high-speed computations for machine learning using SVM (support vector machines) and possibly some ANN. I'm reasonably comfortable working on MATLAB with these, but primarily with small datasets, just for experimentation. I'm wondering if this MATLAB-based approach will scale, or should I be looking into something else? C++/GPU-based computing? Java wrapping of the MATLAB code and pushing it onto App Engine? Incidentally, there seems to be a lot of literature on GPUs, but not much on how useful they are for machine learning applications using MATLAB. Also, what's the cheapest CUDA-enabled GPU money can buy? Is it even worth the trouble?

    Read the article

  • Professional jQuery based Combobox control?

    - by splattne
    Are there any professional combobox controls (dropdown list with autosuggestion) based on the jQuery library? It should be able to handle large datasets and have some skinning options. A multi-column result list would be great too. I'm working with ASP.NET, but it's not a problem if I had to write a wrapper for it. I'm already using a third-party control, but I ran into some compatibility issues between two vendors' controls. Well, I want to get rid of this kind of dependency.

    Read the article

  • Using Solr and Zends Lucene port together...

    - by thebluefox
    Afternoon chaps, After my adventures with Zend_Search_Lucene, and discovering it isn't all it's cracked up to be when indexing large datasets, I've turned to Solr (thanks to Bill Karwin for that :) ). I've got Solr indexing the DB far, far quicker now, taking just over 8 minutes to index a table of just over 1.7 million rows, which I'm very pleased with. However, when I come to try and search the index with the Zend port, I run into the following error:

        Fatal error: Uncaught exception 'Zend_Search_Lucene_Exception' with message
        'Unsupported segments file format' in /var/www/Zend/Search/Lucene.php:407
        Stack trace:
        #0 /var/www/Zend/Search/Lucene.php(555): Zend_Search_Lucene->_readSegmentsFile()
        #1 /var/www/z_search.php(12): Zend_Search_Lucene->__construct('tmp/feeds_index')
        #2 {main}
          thrown in /var/www/Zend/Search/Lucene.php on line 407

    I've tried to have a search around but can't seem to find anything about this problem; everyone else just seems to be able to get them to work? Any help as always much appreciated :) Thanks, Tom
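
    If the cause is what the message suggests (Solr writing a newer Lucene segments format than the Zend port can read), the usual workaround is to query Solr over its HTTP interface rather than opening the index files directly. A sketch of that request (in C# for consistency with the other sketches on this page; in PHP it is the same URL via curl or file_get_contents, and the host/port/core layout here is an assumption):

        using System;
        using System.Net;

        class SolrQuery
        {
            static void Main()
            {
                // Solr's standard select handler, asking for a JSON response
                string url = "http://localhost:8983/solr/select?q=" +
                             Uri.EscapeDataString("datasets") + "&wt=json";
                using (var client = new WebClient())
                {
                    string json = client.DownloadString(url);
                    Console.WriteLine(json);  // parse with your JSON library of choice
                }
            }
        }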

    Read the article

  • Is there an extensible SQL like query language that is safe for exposing via a public API?

    - by Lokkju
    I want to expose some spatial (and a few non-spatial) datasets via a public API. The backend store will be either PostgreSQL/PostGIS, SQLite/SpatiaLite, or CouchDB/GeoCouch. My goal is to find some, preferably standard, way to allow people to make complex spatial queries against the data. I would like it to be a simple GET-based request. The idea is to allow safe SQL-type queries without allowing unsafe ones. I would rather modify something off the shelf than do the entire thing myself. I specifically want to support requesting specific fields from a table, joining results, and spatial functions that are already implemented by the underlying datastore. Ideas, anyone?

    Read the article
