Search Results

Search found 36 results on 2 pages for 'datamining'.

Page 1/2 | 1 2 | Next Page >

Best DataMining Database

- by Eric

I am an ocasional Python programer who only have worked so far with MYSQL or SQLITE databases. I am the computer person for everything in a small compamy and I have been started a new project where I think it is about time to try new databases. Sales departament makes a CSV dump every week and I need to make a small scripting application that allow people form other departaments mixing the information, mostly linking the records. I have all this solved, my problem is the speed, I am using just plain text files for all this and unsurprisingly it is very slow. I thought about using mysql, but then I need installing mysql in every desktop, sqlite is easier, but it is very slow. I do not need a full relational database, just some way of play with big amounts of data in a decent time. Many thanks!

Read the article
DataMining / Analyzing responses to Multiple Choice Questions in a survey

- by Shailesh Tainwala

Hi, I have a set of training data consisting of 20 multiple choice questions (A/B/C/D) answered by a hundred respondents. The answers are purely categorical and cannot be scaled to numerical values. 50 of these respondents were selected for free product trial. The selection process is not known. What interesting knowledge can be mined from this information? The following is a list of what I have come up with so far- A study of percentages (Example - Percentage of people who answered B on Qs.5 and got selected for free product trial) Conditional probabilities (Example - What is the probability that a person will get selected for free product trial given that he answered B on Qs.5) Naive Bayesian classifier (This can be used to predict whether a person will be selected or not for a given set of values for any subset of questions). Can you think of any other interesting analysis or data-mining activities that can be performed? The usual suspects like correlation can be eliminated as the response is not quantifiable/scoreable. Is my approach correct?

Read the article
Datamining on a mysql database

- by sliptix

Hello, I Begin with textmining. I have two database tables with thousands of data.. a table for "skills" and a table for "skills categories" every "skill" belongs to a skills categorie. a "skill" is , physicaly, a varchar(200) field in the database, where there is some text describing the skill. Here are some skills extracted from the skills table: "PHP (good level), Java (intermediaite), C++" "PHP5" "project management and quality management" "begining Javascript" "water engineering" "dfsdf zerze rzer" "cibling customers" what i want to do is to extract knowledge from those fields, i mean extract only the real skill and ignore the rest of useless text. for the above example i want to get only an array with: "PHP" "Java" "C++" "PHP5" "project management" "quality management" "Javascript" "water engineering" "cibling customers" what should i do to extract the skills from tons of data please ? do you know specific algorithms to do this ? ex : k-means ... ? Thanks in advance.

Read the article
Datamining library for .NET

- by ryudice

Hi, Does anybody know about any dataming libraries for .net?

Read the article
Machine learning challenge: diagnosing program in java/groovy (datamining, machine learning)

- by Registered User

Hi All! I'm planning to develop program in Java which will provide diagnosis. The data set is divided into two parts one for training and the other for testing. My program should learn to classify from the training data (BTW which contain answer for 30 questions each in new column, each record in new line the last column will be diagnosis 0 or 1, in the testing part of data diagnosis column will be empty - data set contain about 1000 records) and then make predictions in testing part of data :/ I've never done anything similar so I'll appreciate any advice or information about solution to similar problem. I was thinking about Java Machine Learning Library or Java Data Mining Package but I'm not sure if it's right direction... ? and I'm still not sure how to tackle this challenge... Please advise. All the best!

Read the article
Detecting similar words among n text documents

- by javanes

Hi; I have n documents and want to find common words that are included in these documents. For example I want to say (n-3) documents include the word "web". Certainly I can do this by basic data structures but there maybe efficient algorithm or a way to handle same words with different suffix. Is there any algorithm for such purposes? I am unfamiliar with datamining world. In general manner is there a term used for efforts of finding similarities between different documents? If there is then I will make my research easily. Thanks.

Read the article
Does data mining qualify as an abuse?

- by Hybryd

Hi all, today I had a strange experience with my ISP. They disabled my password for internet connection, and when I called them, they enabled it again, but they didn't say why it happened. In the last couple of days I was running a data mining that I made for one forum to get some useful info about business that I'm in. So I thought, maybe my ISP figured that 10,000 page requests in couple of hours to the same site may be some kind of attack. What do you think, does it qualify as an attack? Is it even ok to data mine in that way?

Read the article
Overwhelmed by Machine Learning---is there an ML101 book?

- by StackUnderflow

It seems like there are so many subfields linked to Machine Learning. Is there a book or a blog that gives an overview of those different fields and what each of them do, maybe how to get started, and what background knowledge is required?

Read the article
Sparse parameter selection using Genetic Algorithm

- by bgbg

Hello, I'm facing a parameter selection problem, which I would like to solve using Genetic Algorithm (GA). I'm supposed to select not more than 4 parameters out of 3000 possible ones. Using the binary chromosome representation seems like a natural choice. The evaluation function punishes too many "selected" attributes and if the number of attributes is acceptable, it then evaluates the selection. The problem is that in these sparse conditions the GA can hardly improve the population. Neither the average fitness cost, nor the fitness of the "worst" individual improves over the generations. All I see is slight (even tiny) improvement in the score of the best individual, which, I suppose, is a result of random sampling. Encoding the problem using indices of the parameters doesn't work either. This is most probably, due to the fact that the chromosomes are directional, while the selection problem isn't (i.e. chromosomes [1, 2, 3, 4]; [4, 3, 2, 1]; [3, 2, 4, 1] etc. are identical) What problem representation would you suggest? P.S If this matters, I use PyEvolve.

Read the article
What data mining tools do you use?

- by python dude

Hello everyone, Besides the two well-known Open Source tools RapidMiner and Weka, are there any other good tools (either Open Source or Commercial), which you can recommend for data mining? Thanks in advance!

Read the article
Text mining on large database (data mining)

- by yox

Hello, I have a large database of resumes (CV), and a certain table skills grouping all users skills. inside that table there's a field skill_text that describes the skill in full text. I'm looking for an algorithm/software/method to extract significant terms/phrases from that table in order to build a new table with standarized skills.. Here are some examples skills extracted from the DB : Sectoral and competitive analysis Business Development (incl. in international settings) Specific structure and road design software - Microstation, Macao, AutoCAD (basic knowledge) Creative work (Photoshop, In-Design, Illustrator) checking and reporting back on campaign progress organising and attending events and exhibitions Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program) Mix marketing, Viral Marketing, Social network marketing. The output shoud be something like : Sectoral and competitive analysis Business Development Specific structure and road design software - Macao AutoCAD Photoshop In-Design Illustrator organising events Development Aptana Studio PHP HTML CSS JavaScript SQL AJAX Mix marketing Viral Marketing Social network marketing emailing SEO One to one marketing As you see only skills remains no other representation text. I know this is possible using text mining technics but how to do it ? the database is realy large.. it's a good thing because we can calculate text frequency and decide if it's a real skill or just meaningless text... The big problem is .. how to determin that "blablabla" is a skill ? thanks

Read the article
Mining Groups of people from Wikipedia

- by AlgoMan

I am trying to get the list of people from the http://en.wikipedia.org/wiki/Category:People_by_occupation . I have to go through all the sections and get people from each section. How should i go about it ? Should i use a crawler and get the pages and search through those using BeautifulSoup ? Or is there any other alternative to get the same from Wikipedia ?

Read the article
browse data in Android SQLite Database

- by Justin C

Is there a way for an Android user to browse the SQLite databases on his/her phone and view the data in the databases? I use the SoftTrace beta program a lot. It's great but has no way that I can find to download the data it tracks to a PC. Thanks

Read the article
Netflix Dataset

- by user108088

I know the Netflix dataset has been removed from the Netflix site as well as from the UCI repository, is there any other site that still has it. If it isn't anywhere online, would someone who still has the dataset be willing to make it into a torrent?

Read the article
Data Mining - Predictive Analysis

- by IMHO

We are looking at acquiring Data Mining software to primarily run predictive analysis processes. How does SQL Server Data Mining solution compares to other solutions like SPSS from IBM? Since SQL Server DM is included in SQL Server Enterprise license - what would be the justification to spend extra couple 100K to buy separate software just to do DM?

Read the article
Software to find dependencys in a database that not have them as restrictions

- by Tomas Friden

I have a database in SQL server 2005 that originaly comes fom an old mainframe. All relations was set in the surrounding software and there are non i the database. I need to find the relations, not by field name but by actual contence in the registers. (as suggestions, I realize I'l have to check them up) It would be nice with some extras, like finding motherless posts etc. but not a primary demand. There are some 200 registers with max 2 000 000 posts in any register. Does any one know of a software that can help me with this? The software I can find presuposes that relations are set in the database :(

Read the article
Retail knowledge inference

- by blueomega

So i am doing a research on how can i infer knowledge from reports (not with a specific format), but after pre processing, i should have some kind of formatted data. A fairly basic inference would be: "Retailer has X stock." and "X is sellable." - "Retailer sells X" the knowledge i focus is retail domain oriented, and if possible i should improve its efficiency with each iteration. Is this scifi(some of my friends think it is)? The related stuff i find online are "expert systems" that find anomalies, fuzzy inference systems and some rants about "easy knowledge". Can you help me with some points for me to focus or orient me in some research directions? blueomega

Read the article
Data Mining open source tools

- by Andriyev

Hi I'm due to take up a project which is into data mining. Before I jump in I wanted to probe around for different data mining tools (preferably open source) which allows web based reporting. In my scenario the all the data would be provided to me, so I'm not supposed to crawl for it. In n nutshell, am looking for a tool which does - Data Analysis, Web based Reporting, provides some kind of a dashboard and mining features. I have worked on the Microsoft Analysis Services and BOXI and off late I have been looking at Pentaho, which seems to be a good option. Please share your experiences on any such tool which you know of. cheers

Read the article
Best XML format for log events in terms of tool support for data mining and visualization?

- by Thorbjørn Ravn Andersen

We want to be able to create log files from our Java application which is suited for later processing by tools to help investigate bugs and gather performance statistics. Currently we use the traditional "log stuff which may or may not be flattened into text form and appended to a log file", but this works the best for small amounts of information read by a human. After careful consideration the best bet has been to store the log events as XML snippets in text files (which is then treated like any other log file), and then download them to the machine with the appropriate tool for post processing. I'd like to use as widely supported an XML format as possible, and right now I am in the "research-then-make-decision" phase. I'd appreciate any help both in terms of XML format and tools and I'd be happy to write glue code to get what I need. What I've found so far: log4j XML format: Supported by chainsaw and Vigilog. Lilith XML format: Supported by Lilith Uninvestigated tools: Microsoft Log Parser: Apparently supports XML. OS X log viewer: plus there is a lot of tools on http://www.loganalysis.org/sections/parsing/generic-log-parsers/ Any suggestions?

Read the article
'Similarity' in Data Mining

- by Shailesh Tainwala

In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If yes, what does it deal with. Any examples, links, references will be helpful. Also, being new to the field, I would like the community opinion on how closely related Data Mining and Artificial Intelligence are. Are they synonyms, is one the subset of the other? Thanks in advance for sharing your knowledge.

Read the article
How to apply Data Mining (Association Rule) to a huge database ?

- by stckvrflw

Hello What I want to do is to apply Association method of data mining on my SQL Server 2000 database. Association rule is something like "finding the most frequent items that appear together in database." For those who don't know or who want to remember what is association method is like, take a look at this presentation about Association rule in Data Mining. www.authorstream.com/Presentation/a.besimi-233030-data-mining-intro-seeu-education-ppt-powerpoint 17th slide gives a nice example of applying association rule on a database. So Can you help me about how should I write my SQL codes (If they will be sufficient of course) Thanks.

Read the article
Languages for implementing decision trees

- by Shailesh Tainwala

What would be a good choice of programming language in which to implement a decision tree? The results of the implementation will be for personal use only, so no need to consider ability to publish etc. I have heard that Octave is a good option, can anyone explain why a matrix based language is recommended for implementing decision trees?

Read the article
Performing a SVD on tweets. Memory problem

- by plotti

I have generated a huge csv file as an output from my pos tagging and stemming. It looks like this: word1, word2, word3, ..., word14400 person1 1 2 0 1 person2 0 0 1 0 ... person650 It contains the word counts for each person. Like this I am getting characteristic vectors for each person. I want to run a SVD on this beast, but it seems the matrix is too big to be held in memory to perform the operation. My quesion is: should i reduce the column size by removing words which have a column sum of for example 1, which means that they have been used only once. Do I bias the data too much with this attempt? I tried the rapidminer attempt, by loading the csv into the db. and then sequentially reading it in with batches for processing, like rapidminer proposes. But Mysql can't store that many columns in a table. If i transpose the data, and then retranspose it on import it also takes ages.... -- So in general I am asking for advice how to perform a svd on such a corpus.

Read the article
How to avoid becoming a programmer while still beign closely involved with computer science/Industry

- by WeShallOvercome

I am studying computer science (A masters at an Ivy league), however most of the jobs i find involve way too much of programming. And frankly programming is not an issue, however programming without a meaning (read financial institution (non trading), other non mainstream jobs) bore me to death! I dont want to end up becoming a .NET,C#, Java kind of programmer. Can someone tell where i should look for jobs if i wish to do some real computer science work such as Machine Learning etc. I don't mind programming but becoming a Financial Software dev at Bloomeberg or an SDET at Microsoft isn't actually one of my goals. [note: I have interviewed for intern both positions listed above, and thankfully i got an intern for a data mining position in a top 750 Alexa rank web company] Sorry if angered anyone with a subjective question

Read the article
Keyword sorting algorithm

- by Nai

I have over 1000 surveys, many of which contains open-ended replies. I would like to be able to 'parse' in all the words and get a ranking of the most used words (disregarding common words) to spot a trend. How can I do this? Is there a program I can use? EDIT If a 3rd party solution is not available, it would be great if we can keep the discussion to microsoft technologies only. Cheers.

Read the article

1 2 | Next Page >