cluster analysis - Page 150

Interesting questions related to lighttpd on Amazon EC2

- by terence410

This problem appeared today and I have no idea what is going on. Please share you ideas. I have 1 EC2 DB server (MYSQL + NFS File Sharing + Memcached). And I have 3 EC2 Web servers (lighttpd) where it will mounted the NFS folders on the DB server. Everything going smoothly for months but suddenly there is an interesting phenomenon. In every 8 minutes to 10 minutes, PHP file will be unreachable. This will last about 1 minute and then back to normal. Normal files like .html file are unaffected. All servers have the same problem exactly at the same time. I have spent one whole day to analysis the reason. Finally, I find out when the problem appear, the file descriptor of lighttpd suddenly increased a lot. I used ls /proc/1234/fd | wc -l to check the number of fd. The # of fd is around 250 in normal time. However, when the problem appeared, it will be raised to 1500 and then back to normal. It sounds funny, right? Do you have any idea what's going on? ======================== The CPU graph of one of the web server.

Read the article

Debugging Actionmailer

- by Trip

I have actionmailer set up. Emails are not being sent, and no errors. Where can I start my search to debug this? class Notifier < ActionMailer::Base default_url_options[:host] = APP_DOMAIN def email_blast(user, subject, message) subject subject from NOTIFIER_EMAIL recipients user.email sent_on Time.zone.now body :user => user.first_name + ' ' + user.last_name, :message => message end I do get a return in my log that the email was sent, just no actual email goes through. Also the reason, that this is not working is because I switched form a cluster to a solo box and some server settings were overwritten. I suspect that is probably the reason why this is not working. Anyone know what specific server settings I would have to look at ? UPDATE: ActionMailer::Base.delivery_method = :sendmail config.action_mailer.default_url_options = { :host => "75.101.153.93" } I found this in my production.rb . This code was originally here when it worked. Again, I believe that there must be something missing on my server..I did a 'which sendmail' and it returned /usr/bin/sendmail , so I added this : config.action_mailer.raise_delivery_errors = false config.action_mailer.perform_deliveries = true config.action_mailer.sendmail_settings = { :location => '/usr/bin/sendmail', :arguments => '-i -t' } Redeployed, restarted the server, and tested it. No emails were sent. The production.log said something was sent : Processing MediaController#create_a_video (for 173.161.167.41 at 2010-06-03 11:58:13) [GET] Parameters: {"action"=>"create_a_video", "controller"=>"media", "organization_id"=>"470", "_"=>"1275591493194"} Sent mail to [email protected] Rendering media/create_a_video Completed in 128ms (View: 51, DB: 1) | 200 OK [http://invent.hqchannel.com/organizations/470/media/create_a_video?_=1275591493194]

Read the article

mysql subselect alternative

- by Arnold

Hi, Lets say I am analyzing how high school sports records affect school attendance. So I have a table in which each row corresponds to a high school basketball game. Each game has an away team id and a home team id (FK to another "team table") and a home score and an away score and a date. I am writing a query that matches attendance with this seasons basketball games. My sample output will be (#_students_missed_class, day_of_game, home_team, away_team, home_team_wins_this_season, away_team_wins_this_season) I now want to add how each team did the previous season to my analysis. Well, I have their previous season stored in the game table but i should be able to accomplish that with a subselect. So in my main select statement I add the subselect: SELECT COUNT(*) FROM game_table WHERE game_table.date BETWEEN 'start of previous season' AND 'end of previous season' AND ( (game_table.home_team = team_table.id AND game_table.home_score > game_table.away_score) OR (game_table.away_team = team_table.id AND game_table.away_score > game_table.home_score)) In this case team-table.id refers to the id of the home_team so I now have all their wins calculated from the previous year. This method of calculation is neither time nor resource intensive. The Explain SQL shows that I have ALL in the Type field and I am not using a Key and the query times out. I'm not sure how I can accomplish a more efficient query with a subselect. It seems proposterously inefficient to have to write 4 of these queries (for home wins, home losses, away wins, away losses). I am sure this could be more lucid. I'll absolutely add color tomorrow if anyone has questions

Read the article

Which version of Grady Booch's OOA/D book should I buy?

- by jackj

Grady Booch's "Object-Oriented Analysis and Design with Applications" is available brand new in both the 2nd edition (1993) and the 3rd edition (2007), while many used copies of both editions are available. Here are my concerns: 1) The 2nd edition uses C++: given that I just finished reading my first two C++ books (Accelerated C++ and C++ Primer) I guess practical tips can only help, so the 2nd edition is probably best (I think the 3rd edition has absolutely no code). On the other hand, the C++ books I read insist on the importance of using standard C++, whereas Booch's 2nd edition was published before the 1998 standard. 2) The 2nd edition is shorter (608 pages vs. 720) so, I guess, it will be slightly easier to get through. 3) The 3rd edition uses UML 2.0, whereas the 2nd edition is pre-UML. Some reviews say that the notation in the 2nd edition is close enough to UML, so it doesn't matter, but I don't know if I should be worrying about this or not. 4) The 2nd edition is available in good-shape used copies for considerably less than what the 3rd one goes for. Given all the above factors, do you think I should buy the 2nd or the 3rd edition? Recommendations on other books are also welcome but I would prefer it if whoever answers has read at least one of the versions of Booch's book (preferably both!). I have already bought but not read GoF and Riel's books. I also know that I should practice a lot with real-life code. Thanks.

Read the article

Hadoop/MapReduce: Reading and writing classes generated from DDL

- by Dave

Hi, Can someone walk me though the basic work-flow of reading and writing data with classes generated from DDL? I have defined some struct-like records using DDL. For example: class Customer { ustring FirstName; ustring LastName; ustring CardNo; long LastPurchase; } I've compiled this to get a Customer class and included it into my project. I can easily see how to use this as input and output for mappers and reducers (the generated class implements Writable), but not how to read and write it to file. The JavaDoc for the org.apache.hadoop.record package talks about serializing these records in Binary, CSV or XML format. How do I actually do that? Say my reducer produces IntWritable keys and Customer values. What OutputFormat do I use to write the result in CSV format? What InputFormat would I use to read the resulting files in later, if I wanted to perform analysis over them?

Read the article

Access Violation in std::pair

- by sameer karjatkar

I have an application which is trying to populate a pair. Out of nowhere the application crashes. The Windbg analysis on the crash dump suggests: PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ DEFAULT_BUCKET_ID: INVALID_POINTER_READ STACK_TEXT: 0389f1dc EPFilter32!std::vector<std::pair<unsigned int,unsigned int>,std::allocator<std::pair<unsigned int,unsigned int> > >::size+0xc INVALID_POINTER_READ_c0000005_Test.DLL!std::vector_std::pair_unsigned_int, unsigned_int_,std::allocator_std::pair_unsigned_int,unsigned_int_____::size Following is the code snap in the code where it fails: for (unsigned i1 = 0; i1 < size1; ++i1) { for (unsigned i2 = 0; i2 < size2; ++i2) { const branch_info& b1 = en1.m_branches[i1]; //Exception here :crash const branch_info& b2 = en2.m_branches[i2]; } } where branch_info is std::pair<unsigned int,unsigned int> and the en1.m_branches[i1] fetches me a pair value.

Read the article

How can I visualise a "broken" hierarchical dataset?

I have a reasonably large datatable structured something like this: StaffNo Grade Direct Boss2 Boss3 Boss4 Boss5 Boss6 ------- ----- ----- ----- ----- ----- ----- ----- 10001 1 10002 10002 10057 10094 10043 10099 10002 2 10057 NULL 10057 10094 10043 10099 10003 1 10004 10004 10057 10094 10043 10099 10004 2 10057 NULL 10057 10094 10043 10099 10057 3 10094 NULL NULL 10094 10043 10099 etc.... i.e. a unique id , their level (grade) in the hierarchy, a record of their bosses ID and the IDs of the supervisors above. (The 2,3,4, etc refers to the boss at that particular grade). The system relies on a strict hierarchy - if you are my boss (/parent) then your boss must be my grandparent. Unfortunately this rule is not enforced within the data model and the data ultimately comes from other systems which don't even know about the rule, let alone observe it. So you and I may share the same boss, but our bosses boss won't be the same. note: I cannot change the data model I cannot fix the data at source. So (for the moment) I have to fix the data once it's in place. Once a fortnight someone will do something which breaks the model and I'll need to modify the procs slightly to resolve. Not ideal, but I'm stuck with this for the next six months. Anyway, specific queries are easy to produce but I find it hard to keep track of the bigger picutre. The application which sits on this runs without complaint regardless but navigating around the system becoming extraordinarily confusing. So my question is: Can anyone recommend a tool (or technique) for generating some kind of "broken tree" diagram in this sort of circumstances? I don't want something that will fix things for me, or attempt statistical analysis but at least something that will give a visual indication of how broken it is at any one time. Note : At the moment this is in a SQL Server database but I'm open to ideas utilising C#, Perl or Python.

Read the article

Handling missing/incomplete data in R--is there function to mask but not remove NAs?

- by doug

As you would expect from a DSL aimed at data analysis, R handles missing/incomplete data very well, for instance: Many R functions have an 'na.rm' flag that you can set to 'T' to remove the NAs: mean( c(5,6,12,87,9,NA,43,67), na.rm=T) But if you want to deal with NAs before the function call, you need to do something like this: to remove each 'NA' from a vector: vx = vx[!is.na(a)] to remove each 'NA' from a vector and replace it w/ a '0': ifelse(is.na(vx), 0, vx) to remove entire each row that contains 'NA' from a data frame: dfx = dfx[complete.cases(dfx),] All of these functions permanently remove 'NA' or rows with an 'NA' in them. Sometimes this isn't quite what you want though--making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow but in subsequent steps you often want those rows back (e.g., to calculate a column-wise statistic for a column that has missing rows caused by a prior call to 'complete cases' yet that column has no 'NA' values in it). to be as clear as possible about what i'm looking for: python/numpy has a class, 'masked array', with a 'mask' method, which lets you conceal--but not remove--NAs during a function call. Is there an analogous function in R?

Read the article

Hadoop reduce task gets hung

- by user806098

I set up a hadoop cluster with 4 nodes, When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27% percent. I checked the log, it's that the reduce task fails to fetch map output from map nodes. The job tracker log of master shows messages like this: 2011-06-27 19:55:14,748 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201106271953_0001_r_000000_0' to tip task_201106271953_0001_r_000000, for tracker 'tracker_web30.bbn.com.cn:localhost/127.0.0.1:56476' And the name node log of master shows messages like this: 2011-06-27 14:00:52,898 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call register(DatanodeRegistration(202.106.199.39:50010, storageID=DS-1989397900-202.106.199.39-50010-1308723051262, infoPort=50075, ipcPort=50020)) from 192.168.225.19:16129: error: java.io.IOException: verifyNodeRegistration: unknown datanode 202.106.199.3 9:50010 However, neither the "web30.bbn.com.cn" or 202.106.199.39, 202.106.199.3 is the slave node. I think such ip/domains appear because hadoop fails to resolve a node(first in the Intranet DNS server), then it goes to a higher-level DNS server, later to the top, still fails, then the "junk" ip/domains are returned. But I checked my config, it goes like this: /etc/hosts: 127.0.0.1 localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 192.168.225.16 master 192.168.225.66 slave1 192.168.225.20 slave5 192.168.225.17 slave17 conf/core-site.xml: hadoop.tmp.dir /root/hadoop_tmp/hadoop_${user.name} fs.default.name hdfs://master:54310 io.sort.mb 1024 hdfs-site.xml: dfs.replication 3 masters: master slaves: master slave1 slave5 slave17 Also, all firewalls(iptables) are turned off, and ssh between each 2 nodes is ok. so I don't know where exact the error comes from. Please help. Thanks a lot.

Read the article

Generating easy-to-remember random identifiers

- by Carl Seleborg

Hi all, As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? Thanks! Carl

Read the article

Strange Matlab error: "??? Subscript indices must either be real positive integers or logicals"

- by Roee Adler

I have a function func that returns a vector a. I usually plot a and then perform further analysis on it. I have a certain scenario when once I try to plot a, I get a "??? Subscript indices must either be real positive integers or logicals" error. Take a look at the following piece of code to see the vector's behavior: K>> a a = 5.7047 6.3529 6.4826 5.5750 4.1488 5.8343 5.3157 5.4454 K>> plot(a) ??? Subscript indices must either be real positive integers or logicals. K>> for i=1:length(a); b(i) = a(i); end; K>> b b = 5.7047 6.3529 6.4826 5.5750 4.1488 5.8343 5.3157 5.4454 K>> plot(b) ??? Subscript indices must either be real positive integers or logicals. The scenario where this happens is when I call function func from within another function (call it outer_func), and return the result directly as outer_func's result. When debugging inside outer_func, I can plot a properly, but outside the scope of outer_func, its result has the above behavior. What can cause this? Where do I start from?

Read the article

Advanced All In One .NET Framework

- by alfredo dobrekk

Hi, i m starting a new project that would basically take input from user and save them to database among about 30 screens, and i would like to find a framework that will allow the maximum number of these features out of the box : .net c#. windows form. unit testing continuous integration screens with lists, combo boxes, text boxes, add, delete, save, cancel that are easy to update when you add a property to your classes or a field to your database. auto completion on controls to help user find its way use of an orm like nhibernate easy multithreading and display of wait screens for user easy undo redo tabbed child windows search forms ability to grant access to some functionnalities according to user profiles mvp/mvvm or whatever design patterns either some code generation from database to c# classe or generation of database schema from c# classes some kind of database versioning / upgrade to easily update database when i release patches to application once in production automatic control resizing code metrics analysis some code generator i can use against my entities that would generate some rough form i can rearrange after code documentation generator ... Any ideas ? I know its lot but i really would like to use existing code to build upon so i can focus on business rules. Could splitting the requirements on 3 or 4 existing open source framework be possible ? Do u have any suggestion to add to the list before starting ? What open source tools would u use to achieve these ?

Read the article

Generic dataset handling library

- by Pep.

Hello, I want to build a generic Perl module for handling and analysing biomedical character separated datasets and which can, most certain, be used on any kind of datasets that contain a mixture of categorical (A,B,C,..) and continuous (1.2,3,881..) and identifier (XXX1,XXX2...). The plan is to have people initialize the module and then use some arguments to point to the data file(s), the place were the analysis reports should be placed and the structure of the data. By structure of data I mean which variable is in which place and its name/type. And this is where I need some enlightenment. I am baffled how to do this in a clean way. Obviously, having people create a simple schema file, be it XML or some other format would be the cleanest but maybe not all people enjoy doing something like this. The solutions I can think of are: Create a configuration file in XML or similar and with a prespecified format. Pass the information during initialization of the module. Use the first row of the data as headers and try to guess types (ouch) Surely there must be a "canonical" way of doing this that is also usable and efficient. Thanks p.

Read the article

How can I superimpose modified loess lines on a ggplot2 qplot?

- by briandk

Background Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-) My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that: The residuals don't systematically covary with the predictor The residuals are homoscedastic with respect to each predictor in the model On point (2), my textbook has this to say: Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131) How can I modify loess lines? I know how to generate a scatterplot with a "0-line,": # First, I'll make a simple linear model and get its diagnostic stats library(ggplot2) data(cars) mod <- fortify(lm(speed ~ dist, data = cars)) attach(mod) str(mod) # Now I want to make sure the residuals are homoscedastic qplot (x = dist, y = .resid, data = mod) + geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands But does anyone know how I can use ggplot2 and qplot to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?

Read the article

How to keep your unit test Arrange step simple and still guarantee DDD invariants ?

- by ian31

DDD recommends that the domain objects should be in a valid state at any time. Aggregate roots are responsible for guaranteeing the invariants and Factories for assembling objects with all the required parts so that they are initialized in a valid state. However this seems to complicate the task of creating simple, isolated unit tests a lot. Let's assume we have a BookRepository that contains Books. A Book has : an Author a Category a list of Bookstores you can find the book in These are required attributes : a book has to have an author, a category and at least a book store you can buy the book from. There's likely to be a BookFactory since it is quite a complex object, and the Factory will initialize the Book with at least all the mentioned attributes. Now we want to unit test a method of the BookRepository that returns all the Books. To test if the method returns the books, we have to set up a test context (the Arrange step in AAA terms) where some Books are already in the Repository. If the only tool at our disposal to create Book objects is the Factory, the unit test now also uses and is dependent on the Factory and inderectly on Category, Author and Store since we need those objects to build up a Book and then place it in the test context. Would you consider this is a dependency in the same way that in a Service unit test we would be dependent on, say, a Repository that the Service would call ? How would you solve the problem of having to re-create a whole cluster of objects in order to be able to test a simple thing ? How would you break that dependency and get rid of all these attributes we don't need in our test ? By using mocks or stubs ? If you mock up things a Repository contains, what kind of mock/stubs would you use as opposed to when you mock up something the object under test talks to or consumes ?

Read the article

Deterministic and non uniform long string generation from seed

- by Limonup

I had this weird idea for an encryption that I wanted to try out, it may be bad, and it may have done before, but I'm just doing it for fun. The short version of the question is: Is it possible to generate a long, deterministic and non-uniformly distributed string/sequence of numbers from a small seed? Long(er) version: I was thinking to encrypt a text by changing encoding. The new encoding would be generated via Huffman algorithm. To work well, the Huffman algorithm would need a fairly long text with non uniform distribution. Then characters can have different bit-lengths which would be the primary strength of this encryption. The problem is that its impractical to enter in/remember a long text each time you want to decrypt the text. So I was wondering if it was possible to generate a text from password seed? It doesn't matter what the text is, as long as it has non uniform distribution of characters and that the exact same sequence can be recreated each time you give it the same seed. Preferably, are there any functions/extensions in Python that can do this? EDIT: To expand on the "strength" of varying bit length: if I have a string "test", ASCII values 116, 101, 115, 116, which gives bit values of 1110100 1100101 1110011 1110100 Then, say my Huffman algorithm generates encoding like t = 101 e = 1100111 s = 10001 The final string is 101 1100111 10001 101, if we encode this back to ASCII, we get 1011100 1111000 1101000, which is 3 entirely different characters. Obviously its impossible to perform any kind of frequency analysis or something like that on this.

Read the article

html body inside if condition

- by gautam

Hi, i have two different bodies for different conditions. can i do like this.... <% if(yes) %> <body onload="JavaScript:timedRefresh(120000);"> <% } else { %> <body> <% } %> Please help me... Thanks in advance.. I want to compare my title in if condition how can i do??? currently i am doing like this: <head> <title> <tiles:getAsString name="title"/> </title> </head> <% if(%><tiles:getAsString name="title"/><%.equals("Trade Confirmation Analysis Screen")) %> <body onload="JavaScript:timedRefresh(120000);"> <%else { %> <body> <% } %> Its giving me error..Unable to compile class for JSP:

Read the article

writing into csv file

- by arin

I recived alarms in a form of text as the following NE=JKHGDUWJ3 Alarm=Data Center STALZ AC Failure Occurrence=2012/4/3 22:18:19 GMT+02:00 Clearance=Details=Cabinet No.=0, Subrack No.=40, Slot No.=0, Port No.=5, Board Type=EDS, Site No.=49, Site Type=HSJYG500 GSM, Site Name=Google1 . I need to fill it in csv file so later I can perform some analysis I came up with this namespace AlarmSaver { public partial class Form1 : Form { string filesName = "c:\\alarms.csv"; public Form1() { InitializeComponent(); } private void buttonSave_Click(object sender, EventArgs e) { if (!File.Exists(filesName)) { string header = "NE,Alarm,Occurrence,Clearance,Details"; File.AppendAllText(filesName,header+Environment.NewLine); } StringBuilder sb = new StringBuilder(); string line = textBoxAlarm.Text; int index = line.IndexOf(" "); while (index > 0) { string token = line.Substring(0, index).Trim(); line = line.Remove(0, index + 2); string[] values = token.Split('='); if (values.Length ==2) { sb.Append(values[1] + ","); } else { if (values.Length % 2 == 0) { string v = token.Remove(0, values[0].Length + 1).Replace(',', ';'); sb.Append(v + ","); } else { sb.Append("********" + ","); string v = token.Remove(0, values[0].Length + 1 + values[1].Length + 1).Replace(',', ';'); sb.Append(v + ","); } } index = line.IndexOf(" "); } File.AppendAllText(filesName, sb.ToString() + Environment.NewLine); } } } the results are as I want except when I reach the part of Details=Cabinet No.=0, Subrack No.=40, Slot No.=0, Port No.=5, Board Type=KDL, Site No.=49, Site Type=JDKJH99 GSM, Site Name=Google1 . I couldnt split them into seperate fields. as the results showing is as I want to split the details I want each element in details to be in a column to be somthin like its realy annoying :-) please help , thank you in advance

Read the article

Raw types and subtyping

- by Dmitrii

We have generic class SomeClass<T>{ } We can write the line: SomeClass s= new SomeClass<String>(); It's ok, because raw type is supertype for generic type. But SomeClass<String> s= new SomeClass(); is correct to. Why is it correct? I thought that type erasure was before type checking, but it's wrong. From Hacker's Guide to Javac When the Java compiler is invoked with default compile policy it performs the following passes: parse: Reads a set of *.java source files and maps the resulting token sequence into AST-Nodes. enter: Enters symbols for the definitions into the symbol table. process annotations: If Requested, processes annotations found in the specified compilation units. attribute: Attributes the Syntax trees. This step includes name resolution, type checking and constant folding. flow: Performs data ow analysis on the trees from the previous step. This includes checks for assignments and reachability. desugar: Rewrites the AST and translates away some syntactic sugar. generate: Generates Source Files or Class Files. Generic is syntax sugar, hence type erasure invoked at 6 pass, after type checking, which invoked at 4 pass. I'm confused.

Read the article

Which to use, XMP or RDF?

- by zotty

What's the difference between RDF and XMP? From what I can tell, XMP is derived from RDF... so what does it offer that RDF doesn't? My particular situation is this: I've got some images which need tagging with details of how an experiment was performed, and what sort of data analysis has been performed on the images. A colleague of mine is pushing for XMP, but he's thinking of the images as photos - they're not really, they're just bits of data. From what I've seen (mainly by opening images in notepad++) the XMP data looks very similar to RDF - even so far as using RDF in the tag names (e.g. <rdf:Seq>). I'd like this data to be usable by other people who use similar instruments for similar experiments, so creating a mini standard (schema?) seems like the way to go. Apologies for the lack of fundemental understanding - I'm a Doctor, not a programmer! If it makes any difference, the language of choice will be C#. Edit for more information: First off, thanks for the excellent replies - thinking of XMP as a vocabulary for RDF makes things a lot clearer. The sort of data I'll be storing wont be avaliable in any of the pre-defined sets. It'll detail experimental set ups, locations and results. I think using RDF is the way to go.

Read the article

How to keep your unit tests simple and isolated and still guarantee DDD invariants ?

- by ian31

DDD recommends that the domain objects should be in a valid state at any time. Aggregate roots are responsible for guaranteeing the invariants and Factories for assembling objects with all the required parts so that they are initialized in a valid state. However this seems to complicate the task of creating simple, isolated unit tests a lot. Let's assume we have a BookRepository that contains Books. A Book has : an Author a Category a list of Bookstores you can find the book in These are required attributes : a book has to have an author, a category and at least a book store you can buy the book from. There's likely to be a BookFactory since it is quite a complex object, and the Factory will initialize the Book with at least all the mentioned attributes. Now we want to unit test a method of the BookRepository that returns all the Books. To test if the method returns the books, we have to set up a test context (the Arrange step in AAA terms) where some Books are already in the Repository. If the only tool at our disposal to create Book objects is the Factory, the unit test now also uses and is dependent on the Factory and inderectly on Category, Author and Store since we need those objects to build up a Book and then place it in the test context. Would you consider this is a dependency in the same way that in a Service unit test we would be dependent on, say, a Repository that the Service would call ? How would you solve the problem of having to re-create a whole cluster of objects in order to be able to test a simple thing ? How would you break that dependency and get rid of all these attributes we don't need in our test ? By using mocks or stubs ? If you mock up things a Repository contains, what kind of mock/stubs would you use as opposed to when you mock up something the object under test talks to or consumes ?

Read the article

Are functional programming languages good for practical tasks?

- by Clueless

It seems to me from my experimenting with Haskell, Erlang and Scheme that functional programming languages are a fantastic way to answer scientific questions. For example, taking a small set of data and performing some extensive analysis on it to return a significant answer. It's great for working through some tough Project Euler questions or trying out the Google Code Jam in an original way. At the same time it seems that by their very nature, they are more suited to finding analytical solutions than actually performing practical tasks. I noticed this most strongly in Haskell, where everything is evaluated lazily and your whole program boils down to one giant analytical solution for some given data that you either hard-code into the program or tack on messily through Haskell's limited IO capabilities. Basically, the tasks I would call 'practical' such as Aceept a request, find and process requested data, and return it formatted as needed seem to translate much more directly into procedural languages. The most luck I have had finding a functional language that works like this is Factor, which I would liken to a reverse-polish-notation version of Python. So I am just curious whether I have missed something in these languages or I am just way off the ball in how I ask this question. Does anyone have examples of functional languages that are great at performing practical tasks or practical tasks that are best performed by functional languages?

Read the article

How do I get started designing and implementing a script interface for my .NET application?

- by Peter Mortensen

How do I get started designing and implementing a script interface for my .NET application? There is VSTA (the .NET equivalent of VBA for COM), but as far as I understand I would have to pay a license fee for every installation of my application. It is an open source application so this will not work. There is also e.g. the embedding of interpreters (IronPython?), but I don't understand how this would allow exposing an "object model" (see below) to external (or internal) scripts. What is the scripting interface story in .NET? Is it somehow trivial in .NET to do this? Background: I have once designed and implemented a fairly involved script interface for a Macintosh application for acquisition and analysis of data from a mass spectrometer (Mac OS, System 7) and later a COM interface for a Windows application. Both were designed with an "object model" and classes (that can have properties). These are overloaded words, but in a scripting interface context object model is essentially a containment hiarchy of objects of specific classes. Classes have properties and lists of contained objects. E.g. like in the COM interfaces exposed in Microsoft Office applications, where the application object can be used to add to its list of documents (with the side effect of creating the GUI representation of a document). External scripts can create new objects in a container and navigate through the content of the hiarchy at any given time. In the Macintosh case scripts could be written in e.g. AppleScript or Frontier. On the Macintosh the implementation of a scripting interface was very complicated. Support for it in Metroworks' C++ class library (the name escapes me right now) made it much simpler.

Read the article

Need help implementing this algorithm with map Hadoop MapReduce

- by Julia

Hi all! i have algorithm that will go through a large data set read some text files and search for specific terms in those lines. I have it implemented in Java, but I didnt want to post code so that it doesnt look i am searching for someone to implement it for me, but it is true i really need a lot of help!!! This was not planned for my project, but data set turned out to be huge, so teacher told me I have to do it like this. EDIT(i did not clarified i previos version)The data set I have is on a Hadoop cluster, and I should make its MapReduce implementation I was reading about MapReduce and thaught that i first do the standard implementation and then it will be more/less easier to do it with mapreduce. But didnt happen, since algorithm is quite stupid and nothing special, and map reduce...i cant wrap my mind around it. So here is shortly pseudo code of my algorithm LIST termList (there is method that creates this list from lucene index) FOLDER topFolder INPUT topFolder IF it is folder and not empty list files (there are 30 sub folders inside) FOR EACH sub folder GET file "CheckedFile.txt" analyze(CheckedFile) ENDFOR END IF Method ANALYZE(CheckedFile) read CheckedFile WHILE CheckedFile has next line GET line FOR(loops through termList) GET third word from line IF third word = term from list append whole line to string buffer ENDIF ENDFOR END WHILE OUTPUT string buffer to file Also, as you can see, each time when "analyze" is called, new file has to be created, i understood that map reduce is difficult to write to many outputs??? I understand mapreduce intuition, and my example seems perfectly suited for mapreduce, but when it comes to do this, obviously I do not know enough and i am STUCK! Please please help.

Read the article

Are document-oriented databases any more suitable than relational ones for persisting objects?

- by Owen Fraser-Green

In terms of database usage, the last decade was the age of the ORM with hundreds competing to persist our object graphs in plain old-fashioned RMDBS. Now we seem to be witnessing the coming of age of document-oriented databases. These databases are highly optimized for schema-free documents but are also very attractive for their ability to scale out and query a cluster in parallel. Document-oriented databases also hold a couple of advantages over RDBMS's for persisting data models in object-oriented designs. As the tables are schema-free, one can store objects belonging to different classes in an inheritance hierarchy side-by-side. Also, as the domain model changes, so long as the code can cope with getting back objects from an old version of the domain classes, one can avoid having to migrate the whole database at every change. On the other hand, the performance benefits of document-oriented databases mainly appear to come about when storing deeper documents. In object-oriented terms, classes which are composed of other classes, for example, a blog post and its comments. In most of the examples of this I can come up with though, such as the blog one, the gain in read access would appear to be offset by the penalty in having to write the whole blog post "document" every time a new comment is added. It looks to me as though document-oriented databases can bring significant benefits to object-oriented systems if one takes extreme care to organize the objects in deep graphs optimized for the way the data will be read and written but this means knowing the use cases up front. In the real world, we often don't know until we actually have a live implementation we can profile. So is the case of relational vs. document-oriented databases one of swings and roundabouts? I'm interested in people's opinions and advice, in particular if anyone has built any significant applications on a document-oriented database.

Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 150/172 | < Previous Page | 146 147 148 149 150 151 152 153 154 155 156 157 | Next Page >

- by terence410

- by Trip

- by Arnold

- by jackj

- by Dave

- by sameer karjatkar

- by doug

- by user806098

- by Carl Seleborg

- by Roee Adler

- by alfredo dobrekk

- by Pep.

- by briandk

- by ian31

- by Limonup

- by gautam

- by arin

- by Dmitrii

- by zotty

- by ian31

- by Clueless

- by Peter Mortensen

- by Julia

- by Owen Fraser-Green

< Previous Page | 146 147 148 149 150 151 152 153 154 155 156 157 | Next Page >