Search Results

Search found 142 results on 6 pages for 'weighted'.

Page 4/6 | < Previous Page | 1 2 3 4 5 6  | Next Page >

  • All minimum spanning trees implementation

    - by russtbarnacle
    I've been looking for an implementation (I'm using the networkx library) that will find all the minimum spanning trees (MSTs) of an undirected weighted graph. I can only find implementations of Kruskal's algorithm and Prim's algorithm, both of which return a single MST. I've seen papers that address this problem (such as http://fano.ics.uci.edu/cites/Publication/Epp-TR-95-50.html), but my head tends to explode somewhere along the way when I try to translate them into code. In fact, I've not been able to find an implementation in any language!
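
    A brute-force sketch (my own illustration, not the algorithm from the Eppstein paper above): enumerate every spanning tree of a small graph with networkx and keep the minimum-weight ones. The function name and the sample graph are made up, and this is only practical for small graphs.

        from itertools import combinations
        import networkx as nx

        def all_minimum_spanning_trees(G):
            """Return every spanning tree of G whose total weight is minimal (brute force)."""
            n = G.number_of_nodes()
            candidates = []
            for edges in combinations(G.edges(data="weight"), n - 1):
                T = nx.Graph()
                T.add_nodes_from(G.nodes())
                T.add_weighted_edges_from(edges)
                if nx.is_connected(T):  # n-1 edges and connected => spanning tree
                    candidates.append((sum(w for _, _, w in edges), T))
            best = min(weight for weight, _ in candidates)
            return [T for weight, T in candidates if weight == best]

        G = nx.Graph()
        G.add_weighted_edges_from([(1, 2, 1), (2, 3, 1), (1, 3, 1), (3, 4, 2)])
        for T in all_minimum_spanning_trees(G):
            print(sorted(T.edges()))  # three MSTs: any two of the three unit-weight triangle edges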

    Read the article

  • construct graph from python set type.

    - by Vincent
    The short question: is there an off-the-shelf function to make a graph from a list of Python sets? The longer version: I have several Python sets. They each overlap, or some are subsets of others. I would like to make a graph (as in nodes and edges) with the edges weighted by the size of the intersection between the sets. There are several graphing packages for Python (NetworkX, igraph, ...), but I am not familiar with the use of any of them. Will any of them make a graph directly from a list of sets, i.e. MakeGraphfromSets(alistofsets)? If not, do you know of an example of how to take the list of sets and define the edges? It actually looks like it might be straightforward, but an example is always good to have.
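
    I don't know of a ready-made MakeGraphfromSets, but a minimal sketch with networkx, assuming the edge weight should be the size of the intersection, could look like this (function and variable names are made up):

        from itertools import combinations
        import networkx as nx

        def make_graph_from_sets(list_of_sets):
            G = nx.Graph()
            G.add_nodes_from(range(len(list_of_sets)))  # node i stands for list_of_sets[i]
            for i, j in combinations(range(len(list_of_sets)), 2):
                common = list_of_sets[i] & list_of_sets[j]
                if common:  # only connect sets that actually overlap
                    G.add_edge(i, j, weight=len(common))
            return G

        sets = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
        print(make_graph_from_sets(sets).edges(data=True))  # [(0, 1, {'weight': 2})]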

    Read the article

  • How does TopCoder evaluate code?

    - by Carlos
    If you are familiar with TopCoder, you know that your source code gets a final grade/points. This depends on time, the number of compiles, etc., with performance being one of the most heavily weighted factors. But how can they test that? Is there some sort of simple code (Java or C++) to do it that you could share, so I could evaluate it and hopefully write my own to test the programs I write for university? This is sort of a follow-up question to this one, where I ask whether shorter code results in better performance. P.S.: I'm interested both in how TopCoder measures performance and in writing code to test performance.
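
    I don't know TopCoder's actual grader, but a tiny timing harness you could adapt for your own programs might look like the sketch below (essentially what Python's timeit module does under the hood).

        import time

        def benchmark(func, *args, repeats=5, loops=1000):
            """Return the average seconds per call in the best of `repeats` runs."""
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                for _ in range(loops):
                    func(*args)
                best = min(best, time.perf_counter() - start)
            return best / loops

        def fib(n):
            return n if n < 2 else fib(n - 1) + fib(n - 2)

        print(f"fib(20): {benchmark(fib, 20):.6f} seconds per call")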

    Read the article

  • How do you boost term relevance in SQL Server Full Text Search like you can in Lucene?

    - by Snives
    I'm doing a typical full-text search using CONTAINSTABLE with 'ISABOUT(term1, term2, term3)', and although it supports term weighting, that's not what I need. I need the ability to boost the relevance of terms contained in certain portions of the text. For example, it is customary for meta tags or the page title to be weighted differently than the body text when searching web pages. Although I'm not dealing with web pages, I do need the same functionality. In Lucene this is called document field-level boosting. How would one do this natively in SQL Server Full Text Search?

    Read the article

  • F# replace ref variable with something fun

    - by Stephen Swensen
    I have the following F# function, which makes use of a ref variable to seed and keep track of a running total; something tells me this isn't in the spirit of FP, or even particularly clear on its own. I'd like some direction on the clearest way (preferably functional, but if an imperative approach is clearer I'd be open to that) to express this in F#. Note that selectItem implements a random weighted selection algorithm.

        type WeightedItem(id: int, weight: int) =
            member self.id = id
            member self.weight = weight

        let selectItem (items: WeightedItem list) (rand: System.Random) =
            let totalWeight = List.sumBy (fun (item: WeightedItem) -> item.weight) items
            let selection = rand.Next(totalWeight) + 1
            let runningWeight = ref 0
            List.find (fun (item: WeightedItem) ->
                runningWeight := !runningWeight + item.weight
                !runningWeight >= selection) items

        let items = [new WeightedItem(1, 100); new WeightedItem(2, 50); new WeightedItem(3, 25)]
        let selection = selectItem items (new System.Random())
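
    Not F#, but the same running-total idea without a mutable cell, sketched in Python for comparison: build the cumulative weights up front and binary-search the draw (in F# the analogue would be List.scan or List.fold instead of the ref).

        import bisect
        import random
        from itertools import accumulate

        def select_item(items, rand=random):
            """items: list of (id, weight) pairs; returns one pair, weighted by weight."""
            cumulative = list(accumulate(weight for _, weight in items))
            draw = rand.randint(1, cumulative[-1])  # 1..totalWeight, like rand.Next(total) + 1
            return items[bisect.bisect_left(cumulative, draw)]

        items = [(1, 100), (2, 50), (3, 25)]
        print(select_item(items))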

    Read the article

  • How to serialize a graph?

    - by Michael
    This is an interview question: how do you serialize a graph? I saw this answer, but I am not sure it is enough. It looks like a very open-ended question, and the candidate is probably expected to ask more questions about the requirements: what the nodes and edges are, how they are serialized themselves, whether the graph is weighted, directed, etc., and how many nodes/edges are in the graph. What about the infrastructure? Is it a plain file system, or should/can we use a database? So, how would you answer this question?
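
    As one possible concrete answer (all names here are invented for illustration), a weighted, optionally directed graph stored as an adjacency list serializes naturally to JSON:

        import json

        def serialize(adjacency, directed=False):
            """adjacency: {node: {neighbour: weight, ...}, ...} with string node names."""
            return json.dumps({"directed": directed, "adjacency": adjacency})

        def deserialize(text):
            data = json.loads(text)
            return data["adjacency"], data["directed"]

        g = {"A": {"B": 3, "C": 1}, "B": {"C": 7}, "C": {}}
        blob = serialize(g, directed=True)
        print(deserialize(blob))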

    Read the article

  • Java: how to represent graphs?

    - by Rosarch
    I'm implementing some algorithms to teach myself about graphs and how to work with them. What would you recommend is the best way to do that in Java? I was thinking something like this:

        public class Vertex {
            private ArrayList<Vertex> outnodes; // Adjacency list. if I wanted to support edge weight, this would be a hash map.
            // methods to manipulate outnodes
        }

        public class Graph {
            private ArrayList<Vertex> nodes;
            // algorithms on graphs
        }

    But I basically just made this up. Is there a better way? Also, I want it to be able to support variations on vanilla graphs like digraphs, weighted edges, multigraphs, etc.
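
    A common alternative is to keep the adjacency structure in the graph object rather than in each vertex, which makes weights and direction easy to add. A sketch (shown in Python for brevity; in Java the same shape would be a HashMap<Vertex, Map<Vertex, Double>> held by the Graph):

        class Graph:
            def __init__(self, directed=False):
                self.directed = directed
                self.adj = {}  # node -> {neighbour: weight}

            def add_edge(self, u, v, weight=1.0):
                self.adj.setdefault(u, {})[v] = weight
                self.adj.setdefault(v, {})  # make sure v is known even with no outgoing edges
                if not self.directed:
                    self.adj[v][u] = weight

            def neighbors(self, u):
                return self.adj.get(u, {})

        g = Graph(directed=True)
        g.add_edge("A", "B", 2.0)
        print(g.neighbors("A"))  # {'B': 2.0}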

    Read the article

  • Find a node in a Graph that minimizes the distance between two other nodes

    - by Andrés
    Here is the thing. I have a directed weighted graph G, with V vertices and E edges. Given two nodes in the graph, say A and B, and given the weight of an edge A-B denoted as w(A, B), I need to find a node C such that max(w(A, C), w(B, C)) is minimal among all possibilities. By possibilities I mean all the values C can take. I don't know if that is completely clear; if it's not, I'll try to be more precise. Thanks in advance.
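
    Taking the question literally (w(A, C) is the weight of the direct edge), a brute-force sketch is just a minimum over the candidate nodes; if path distances rather than direct edges are intended, replace the lookups with shortest-path distances from A and from B (e.g. via Dijkstra). Names and data below are made up.

        def best_midpoint(nodes, w, a, b):
            """w maps (u, v) pairs to edge weights; returns the C minimising max(w[a,c], w[b,c])."""
            candidates = [c for c in nodes
                          if c not in (a, b) and (a, c) in w and (b, c) in w]
            return min(candidates, key=lambda c: max(w[(a, c)], w[(b, c)]))

        w = {("A", "X"): 4, ("B", "X"): 7, ("A", "Y"): 6, ("B", "Y"): 5}
        print(best_midpoint({"A", "B", "X", "Y"}, w, "A", "B"))  # Y: max(6, 5) beats max(4, 7)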

    Read the article

  • Optimizing a Parking Lot Problem. What algorithms should I use to fit the most amount of cars in th

    - by Adam Gent
    What algorithms (brute force or not) would I use to fit as many cars as possible (assume all cars are the same size) in a parking lot, so that there is at least one exit (from the container) and no car can be blocked in? Or can someone show me an example of this problem solved programmatically? A parking lot that varies in shape would be nice, but if you want to assume some invariant shape, that is fine. Another edit: Assume that driving distance in the parking lot is not a factor (although it would be totally awesome if it were a weighted factor against the number of cars in the lot). Another edit: Assume 2 dimensions (no cranes or driving over cars). Another edit: You cannot move cars around once they are parked (it's not a valet parking lot). I hope the question is specific enough now.

    Read the article

  • Data structure for unrooted trees

    - by Esmond
    I'm having problems figuring out how to build an unrooted tree with weighted edges, and what data structure to store such a tree in. An example of an unrooted tree is the one here: http://www.bio.davidson.edu/courses/GENOMICS/seq/unrooted.gif The problem I am having is that the leaves would only have 1 link to an internal node, while the internal nodes would have 3 links (an internal node would have 2 children plus a link to another internal node). Do I have to distinguish between the 2 different kinds of nodes, or can I have one class serving the function of both types of nodes?
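
    One class can serve both roles. A minimal sketch (names made up) where every node keeps a neighbour-to-weight map, so a leaf is simply a node with one neighbour and an internal node one with three:

        class Node:
            def __init__(self, name):
                self.name = name
                self.neighbors = {}  # Node -> edge weight

            def connect(self, other, weight):
                self.neighbors[other] = weight
                other.neighbors[self] = weight  # unrooted/undirected: store both directions

            def is_leaf(self):
                return len(self.neighbors) == 1

        a, b, internal = Node("A"), Node("B"), Node("X")
        internal.connect(a, 2.5)
        internal.connect(b, 1.0)
        print(a.is_leaf(), internal.is_leaf())  # True False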

    Read the article

  • Information Modeling

    - by Betamoo
    The sensor module in my project consists of a rotating camera that collects noisy information about moving objects in the surrounding environment. The information consists of the distance, angle, and relative change of the moving objects. The limited view range of the camera makes it essential to rotate the camera periodically to update the environment information. I was looking for algorithms/ways to model this information, in order to be able to guess/predict/learn the motion properties of these objects. My current proposed idea is to store the last n snapshots of each object in a queue and take a weighted average of the positions and velocities of the moving object, but I think it is a poor method... Can you suggest some topics that suit this case? Thanks
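
    A toy sketch of the queue-of-snapshots idea described above (1-D position for simplicity, made-up weights that favour recent observations); for noisy measurements like these, a Kalman filter would be the more standard topic to read up on.

        from collections import deque

        class TrackedObject:
            def __init__(self, history=5):
                self.snapshots = deque(maxlen=history)  # (time, position) pairs

            def observe(self, t, position):
                self.snapshots.append((t, position))

            def predict(self, t_future):
                pts = list(self.snapshots)
                if len(pts) < 2:
                    return pts[-1][1] if pts else None
                # velocity between consecutive snapshots, weighted so newer estimates count more
                vels = [(p2 - p1) / (t2 - t1) for (t1, p1), (t2, p2) in zip(pts, pts[1:])]
                weights = range(1, len(vels) + 1)
                v = sum(wt * vel for wt, vel in zip(weights, vels)) / sum(weights)
                t_last, p_last = pts[-1]
                return p_last + v * (t_future - t_last)

        obj = TrackedObject()
        for t, x in [(0, 0.0), (1, 1.1), (2, 1.9), (3, 3.2)]:
            obj.observe(t, x)
        print(obj.predict(4))  # about 4.3 with these observations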

    Read the article

  • Distinct select on Oracle

    - by funktku
    What I am trying to do is a simple recommender: it must take the NODE2 values of the top 40 elements by weight, where the weight comes from (E.WEIGHT * K.GRADE). The query below successfully returns the top 40 elements; however, I don't want E.NODE2 to contain duplicates. PostgreSQL allowed me to do SELECT DISTINCT ON (NODE2) E.NODE2, (E.WEIGHT * K.GRADE). How can I do the same in Oracle? The complete SQL query:

        SELECT *
        FROM (SELECT DISTINCT E.NODE2, (E.WEIGHT * K.GRADE)
              FROM KUAISFAST K, EDGES E
              WHERE K.ID = 1
                AND K.COURSE_ID = E.NODE1
                AND E.NODE2 NOT IN (SELECT K2.COURSE_ID
                                    FROM KUAISFAST K2
                                    WHERE K2.ID = 1)
              ORDER BY (E.WEIGHT * K.GRADE) DESC) TEMP
        WHERE rownum <= 40

    Read the article

  • Sorting results by a char(1) column

    - by Brandon
    I have a stored procedure which basically does something like:

        select top 1 expiryDate, flag, (bunch of other columns)
        from someTable
        (bunch of joins)
        order by expiryDate desc

    So this will grab the record that expires last. This works for most cases, except that some records have a flag, which is just a char(1). Most of the time it's just Y or N. So it'll return something like:

        2010-12-31 N
        2010-10-05 Y
        2010-08-05 N
        2010-03-01 F
        2010-01-31 N

    This works most of the time, but is there any way to order by the flag column as well? I'd want to group the results by Y, then N, then F, and any other flags can go last in any order. I thought this would just be an order by, but since the flags are not weighted by their alphabetic value, I'm a little stumped. (Note: these are not my tables; I don't know if using characters like this was a good idea or not, but it's not something I can change.)

    Read the article

  • Manipulate data for scaling

    - by user1487000
    I have this data:

        Game 1: 7.0/10.0, Reviewed: 1000 times
        Game 2: 7.5/10.0, Reviewed: 3000 times
        Game 3: 8.9/10.0, Reviewed: 140,000 times
        Game 4: 10.0/10.0, Reviewed: 5 times
        ...

    I want to manipulate this data in a way that makes each rating reflect how many times the game has been reviewed. For example, Game 3 should carry a little more weight than Game 4, since it has been reviewed far more often, and Game 2's 7.5 should be weighted more than Game 1's 7.0. Is there a proper function to do this scaling? Something like ScaledGameRating = OldGameRating * (some exponential function?)
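
    One common choice (a sketch, not the only sensible function) is the Bayesian weighted rating used for the IMDb Top 250: WR = v/(v+m) * R + m/(v+m) * C, where R is the game's rating, v its review count, C the mean rating over all games, and m a constant meaning "how many reviews before I trust the score". The m = 500 below is an arbitrary assumption.

        def weighted_rating(rating, votes, global_mean, m=500):
            """Bayesian / IMDb-style weighted rating; m = reviews needed before the score is trusted."""
            return (votes / (votes + m)) * rating + (m / (votes + m)) * global_mean

        games = [("Game 1", 7.0, 1000), ("Game 2", 7.5, 3000),
                 ("Game 3", 8.9, 140000), ("Game 4", 10.0, 5)]
        c = sum(rating for _, rating, _ in games) / len(games)  # mean rating across all games
        for name, rating, votes in games:
            print(name, round(weighted_rating(rating, votes, c), 2))
        # Game 4's perfect 10 from only 5 reviews is pulled strongly toward the mean,
        # while Game 3's 8.9 from 140,000 reviews barely moves.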

    Read the article

  • Tuxedo Load Balancing

    - by Todd Little
    A question I often receive is how does Tuxedo perform load balancing. This is often asked by customers that see an imbalance in the number of requests handled by servers offering a specific service.

    First of all, let me say that Tuxedo really does load or request optimization instead of load balancing. What I mean by that is that Tuxedo doesn't attempt to ensure that all servers offering a specific service get the same number of requests, but instead attempts to ensure that requests are processed in the least amount of time. Simple round-robin "load balancing" can be employed to ensure that all servers for a particular service are given the same number of requests. But the question I ask is, "to what benefit"?

    Instead, Tuxedo scans the queues (which may or may not correspond to servers, based upon SSSQ - Single Server Single Queue - or MSSQ - Multiple Server Single Queue) to determine on which queue a request should be placed. The scan is always performed in the same order, and during the scan, if a queue is empty the request is immediately placed on that queue and request routing is done. However, should all the queues be busy, meaning that requests are currently being processed, Tuxedo chooses the queue with the least amount of "work" queued to it, where work is the sum of all the requests queued, weighted by their "load" value as defined in the UBBCONFIG file.

    What this means is that under light loads, only the first few queues (servers) process all the requests, as an empty queue is often found before reaching the end of the scan. Thus the first few servers in the scan handle most of the requests. While this sounds non-optimal, in fact it capitalizes on the underlying operating system and hardware behavior to produce the best possible performance. Round-robin scheduling would spread the requests across all the available servers and thus require all of them to be in memory, and they would likely not share much in the way of hardware or memory caches. Tuxedo's system maximizes the various caches and thus optimizes overall performance. Hopefully this makes sense and now explains why you may see a few servers handling most of the requests. Under heavy load, meaning enough load to keep all servers that can handle a request busy, you should see a relatively equal number of requests processed. Next post I'll try and cover how this applies to servers in a clustered (MP) environment, because the load balancing there is a little more complicated.

    Regards,
    Todd Little
    Oracle Tuxedo Chief Architect
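
    A toy illustration of the scan behaviour described in the post (plain Python, nothing to do with real Tuxedo code): take the first empty queue in the fixed scan order, otherwise the queue with the least total queued load.

        def choose_queue(queues, request_load):
            """queues: list of lists of pending request loads, scanned in a fixed order."""
            for i, q in enumerate(queues):
                if not q:  # empty queue found: route here immediately, scan stops
                    q.append(request_load)
                    return i
            # every queue is busy: pick the one with the least total queued "load"
            i = min(range(len(queues)), key=lambda k: sum(queues[k]))
            queues[i].append(request_load)
            return i

        queues = [[], [], []]
        for load in [10, 10, 5, 20, 5]:
            print("request ->", choose_queue(queues, load), queues)
        # The fixed scan order fills the first queues first; the weighted-load
        # comparison only kicks in once every queue is busy.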

    Read the article

  • What algorithms can I use to detect if articles or posts are duplicates?

    - by michael
    I'm trying to detect whether an article or forum post is a duplicate entry within the database. I've given this some thought and come to the conclusion that someone who duplicates content will do so in one of three ways (in descending order of difficulty to detect):

        1. simple copy-paste of the whole text
        2. copy and paste parts of the text, merging it with their own
        3. copy an article from an external site and masquerade it as their own

    Prepping text for analysis: basically, I strip any anomalies; the goal is to make the text as "pure" as possible. For more accurate results, the text is "standardized" by:

        - stripping duplicate white space and trimming leading and trailing white space
        - standardizing newlines to \n
        - removing HTML tags
        - stripping URLs, using a RegEx called Daring Fireball
        - I use BB code in my application, so that goes too
        - converting accented and foreign (non-English) characters to their plain form

    I store information about each article in (1) a statistics table and (2) a keywords table.

    (1) Statistics table. The following statistics are stored about the textual content (much like this post): text length, letter count, word count, sentence count, average words per sentence, Automated Readability Index, Gunning fog score. For European languages, Coleman-Liau and the Automated Readability Index should be used, as they do not use syllable counting and so should produce a reasonably accurate score.

    (2) Keywords table. The keywords are generated by excluding a huge list of stop words (common words), e.g. 'the', 'a', 'of', 'to', etc.

    Sample data:

        text_length, 3963
        letter_count, 3052
        word_count, 684
        sentence_count, 33
        word_per_sentence, 21
        gunning_fog, 11.5
        auto_read_index, 9.9
        keyword 1, killed
        keyword 2, officers
        keyword 3, police

    It should be noted that once an article gets updated, all of the above statistics are regenerated and could be completely different values. How could I use the above information to detect whether an article that is being published for the first time already exists within the database? I'm aware that anything I design will not be perfect, the biggest risks being that (1) content that is not a duplicate gets flagged as a duplicate and (2) the system lets duplicate content through. So the algorithm should generate a risk-assessment number, from 0 (no duplicate risk) through 5 (possible duplicate) to 10 (duplicate). Anything above 5 means there is a good possibility that the content is a duplicate; in that case the content could be flagged and linked to the articles that are possible duplicates, and a human could decide whether to delete or allow it. As I said before, I'm storing keywords for the whole article, but I wonder if I could do the same on a per-paragraph basis; this would also mean further separating my data in the DB, but it would make it easier to detect case (2) from my list above. I'm thinking of a weighted average between the statistics, but in what order and with what consequences...
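
    A rough sketch of one way to turn the stored keywords and statistics into the 0-10 risk score described above; the 0.6/0.4 weights and the chosen statistics are made up and would need tuning against real data.

        def risk_score(new, existing):
            """new/existing: dicts with a 'keywords' set plus the numeric statistics."""
            union = new["keywords"] | existing["keywords"]
            kw_sim = len(new["keywords"] & existing["keywords"]) / max(1, len(union))
            stat_sims = []
            for stat in ("word_count", "sentence_count", "gunning_fog"):
                a, b = new[stat], existing[stat]
                stat_sims.append(1 - abs(a - b) / max(a, b))  # 1.0 means identical
            similarity = 0.6 * kw_sim + 0.4 * (sum(stat_sims) / len(stat_sims))
            return round(10 * similarity, 1)

        a = {"keywords": {"killed", "officers", "police"},
             "word_count": 684, "sentence_count": 33, "gunning_fog": 11.5}
        b = {"keywords": {"killed", "officers", "riot"},
             "word_count": 701, "sentence_count": 35, "gunning_fog": 11.9}
        print(risk_score(a, b))  # 6.8 with these made-up numbers: flag for human review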

    Read the article

  • Why are MVC & TDD not employed more in game architecture?

    - by secoif
    I will preface this by saying I haven't looked at a huge amount of game source, nor built much in the way of games. But coming from trying to employ 'enterprise' coding practices in web apps, looking at game source code seriously hurts my head: "What is this view logic doing in with business logic? This needs refactoring... so does this, refactor, refactorrr." This worries me as I'm about to start a game project, and I'm not sure whether trying to MVC/TDD the dev process is going to hinder us or help us, as I don't see many game examples that use this, or much push for better architectural practices in the community. The following is an extract from a great article on prototyping games, though to me it seems to capture exactly the attitude many game devs use when writing production game code:

        Mistake #4: Building a system, not a game. ...if you ever find yourself working on something that isn't directly moving you forward, stop right there. As programmers, we have a tendency to try to generalize our code, and make it elegant and be able to handle every situation. We find that an itch terribly hard not to scratch, but we need to learn how. It took me many years to realize that it's not about the code, it's about the game you ship in the end. Don't write an elegant game component system, skip the editor completely and hardwire the state in code, avoid the data-driven, self-parsing, XML craziness, and just code the damned thing. ... Just get stuff on the screen as quickly as you can. And don't ever, ever, use the argument "if we take some extra time and do this the right way, we can reuse it in the game". EVER.

    Is it because games are (mostly) visually oriented, so it makes sense that the code will be weighted heavily towards the view, and thus any benefit from moving stuff out to models/controllers is fairly minimal, so why bother? I've heard the argument that MVC introduces a performance overhead, but this seems to me to be a premature optimisation; there'd be more important performance issues to tackle before you worry about MVC overhead (e.g. render pipeline, AI algorithms, data structure traversal, etc.). The same goes for TDD. It's not often I see games employing test cases, but perhaps this is due to the design issues above (mixed view/business logic) and the fact that it's difficult to test visual components, or components that rely on probabilistic results (e.g. those operating within physics simulations). Perhaps I'm just looking at the wrong source code, but why do we not see more of these 'enterprise' practices employed in game design? Are games really so different in their requirements, or is it a people/culture issue (i.e. game devs come from a different background and thus have different coding habits)?

    Read the article

  • Postfix Problem (helo/hostname mismatch)!

    - by CuSS
    Hi all, I have a server, and it is throwing an error for one email only (all other mail in that domain is working). How can I fix it? (The error is below.)

        May 17 11:43:56 webserver postfix/policyd-weight[5596]: weighted check: IN_DYN_PBL_SPAMHAUS=3.25 NOT_IN_SBL_XBL_SPAMHAUS=-1.5 NOT_IN_SPAMCOP=-1.5 NOT_IN_BL_NJABL=-1.5 DSBL_ORG=ERR(0) CL_IP_NE_HELO=4.75 RESOLVED_IP_IS_NOT_HELO=1.5 HELO_NUMERIC=10.625 (check from: .eticagest. - helo: .[10.0.0.17]. - helo-domain: .17].) FROM_NOT_FAILED_HELO(DOMAIN)=6.25; <client=188.80.139.211> <helo=[10.0.0.17]> <[email protected]> <[email protected]>; rate: 21.875
        May 17 11:43:56 webserver postfix/policyd-weight[5596]: decided action=550 Mail appeared to be SPAM or forged. Ask your Mail/DNS-Administrator to correct HELO and DNS MX settings or to get removed from DNSBLs; MTA helo: [10.0.0.17], MTA hostname: bl15-139-211.dsl.telepac.pt[188.80.139.211] (helo/hostname mismatch); <client=188.80.139.211> <helo=[10.0.0.17]> <[email protected]> <[email protected]>; delay: 6s

    Read the article

  • How to calculate unweighted averages in Excel PivotTable?

    - by yonatron
    I often make PivotTables in which each row contains a number of per-person average measures. I then want to look at the unweighted column average for each measure, and usually make some kind of chart from these. Because my individual cells are often averaged from different numbers of data points, the Grand Total row ends up being a weighted average, which I'm not interested in. So I usually make my own average row a few rows above the table to use for my charts. That's not too much work, but there's another problem: I often add a few more people's worth of data to the PivotTables' source, then refresh the tables. This means my average row needs to be updated to encompass more rows from the PivotTable. Not a huge deal with one table, but when I have lots of them across lots of sheets, I have to do find/replace on a whole bunch of formulas. So: is there a way to automatically get unweighted column averages in a PivotTable, such that when the table is refreshed, the averages don't change locations and encompass the newly added (or removed) data? Thanks
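
    Not an Excel answer, but the distinction itself is easy to state outside Excel; a pandas sketch with made-up data shows why the Grand Total (a pooled, weighted mean over every data point) differs from the mean of the per-person means:

        import pandas as pd

        df = pd.DataFrame({
            "person":  ["a", "a", "a", "b"],
            "measure": [1.0, 1.0, 1.0, 5.0],
        })
        per_person = df.groupby("person")["measure"].mean()
        print(per_person.mean())     # unweighted column average: (1.0 + 5.0) / 2 = 3.0
        print(df["measure"].mean())  # pooled "Grand Total" style mean: 8.0 / 4 = 2.0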

    Read the article

  • “Query cost (relative to the batch)” <> Query cost relative to batch

    - by Dave Ballantyne
    OK, so that is quite a contradictory title, but unfortunately it is true that a common misconception is that the query with the highest percentage relative to batch is the worst performing. Simply put, it is a lie, or more accurately, we don't understand what these figures mean. Consider the two simple queries below:

        SELECT *
        FROM Person.BusinessEntity
        JOIN Person.BusinessEntityAddress
          ON Person.BusinessEntity.BusinessEntityID = Person.BusinessEntityAddress.BusinessEntityID
        go
        SELECT *
        FROM Sales.SalesOrderDetail
        JOIN Sales.SalesOrderHeader
          ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID

    After executing these and looking at the plans, I see a 13% / 87% split. But 13% / 87% of WHAT? CPU? Duration? Reads? Writes? Or some magical weighted algorithm? In a Profiler trace of the two we can find the metrics we are interested in. CPU and duration are well out, but what about reads (210 and 1935)? To save you doing the maths, though you are more than welcome to, that's a 90.2% / 9.8% split. Close, but no cigar. Let's try a different tack. Looking at the execution plan, the "Estimated Subtree Cost" of query 1 is 0.29449 and of query 2 it is 1.96596. Again, to save you the maths, that works out to 13.03% and 86.97%; round those and that's the figures we are after. But what is the worrying word there? "Estimated". So these are not "actual" execution costs, but what's the problem in comparing the estimated costs to derive a meaning of "most costly"? Well, in the case of simple queries such as the above, probably not a lot. In more complicated queries, a fair bit. Modify the second query to also show the total number of lines on each order:

        SELECT *, COUNT(*) OVER (PARTITION BY Sales.SalesOrderDetail.SalesOrderID)
        FROM Sales.SalesOrderDetail
        JOIN Sales.SalesOrderHeader
          ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID

    The split in percentages is now 6% / 94%, and the profiler metrics show even more of a discrepancy. Estimates can be out with actuals for a whole host of reasons; scalar UDFs are a particular bugbear of mine, and in fact the cost of a UDF call is entirely hidden inside the execution plan. It always estimates to 0 (well, a very small number). Take for instance the following UDF:

        Create Function dbo.udfSumSalesForCustomer(@CustomerId integer)
        returns money
        as
        begin
            Declare @Sum money
            Select @Sum = SUM(SalesOrderHeader.TotalDue)
            from Sales.SalesOrderHeader
            where CustomerID = @CustomerId
            return @Sum
        end

    If we have two statements, one that fires the UDF and another that doesn't:

        Select CustomerID from Sales.Customer order by CustomerID
        go
        Select CustomerID, dbo.udfSumSalesForCustomer(Customer.CustomerID) from Sales.Customer order by CustomerID

    The cost relative to batch is a 50/50 split, but there has to be an actual cost to firing the UDF. Indeed, the Profiler trace is nowhere even remotely near 50/50! Moving forward to the window framing functionality in SQL Server 2012, the optimizer sees ROWS and RANGE (see here for their functional differences) as the same 'cost' too:

        SELECT SalesOrderDetailID, SalesOrderId,
               SUM(LineTotal) OVER (PARTITION BY salesorderid ORDER BY Salesorderdetailid RANGE unbounded preceding)
        from Sales.SalesOrderdetail
        go
        SELECT SalesOrderDetailID, SalesOrderId,
               SUM(LineTotal) OVER (PARTITION BY salesorderid ORDER BY Salesorderdetailid Rows unbounded preceding)
        from Sales.SalesOrderdetail

    By now it won't be a great surprise that the Profiler trace reads a *tiny* bit different.
    So, the moral of the story: "percentage relative to batch" can give a rough 'finger in the air' measurement, but don't rely on it as fact.

    Read the article

  • A proposal for #DAX Code Formatting #ssas #powerpivot #tabular

    - by Marco Russo (SQLBI)
    I recently published a set of rules for DAX code formatting. The following is an example of what I obtain:

        CALCULATE (
            SUMX (
                Orders,
                Orders[Amount]
            ),
            FILTER (
                ALL ( Customers ),
                CALCULATE (
                    COUNTROWS ( Sales ),
                    ALL ( Calendar[Date] )
                ) > 42 + 8 – 25 * ( 3 - 1 )
                    + 2 – 1 + 2 – 1
                    + CALCULATE (
                          2 + 2 – 2
                          + 2 - 2
                      )
                    – CALCULATE ( 4 )
            )
        )

    The goal is to improve code readability, and I look forward to implementing a code formatting feature in DAX Studio. The DAX Editor already supports the rules described in the article. I am also considering whether to add a rule specific to ADDCOLUMNS / SUMMARIZE, because I would like to see the "pairs" of arguments that define a column either in the same row or with a special indentation rule (the DAX expression for a column is indented on the line following the column name).

        EVALUATE
        CALCULATETABLE (
            CALCULATETABLE (
                SUMMARIZE (
                    Audience,
                    'Date'[Year],
                    Individuals[Gender],
                    Individuals[AgeRange],
                    "Num of Rows", FORMAT ( COUNTROWS ( Audience ), "#,#" ),
                    "Weighted Mean Age",
                        SUMX ( Audience, Audience[Weight] * Audience[Age] ) / SUM ( Audience[Weight] )
                ),
                SUMMARIZE (
                    BridgeIndividualsTargets,
                    Individuals[ID_Individual]
                ),
                Audience[Weight] > 0
            ),
            Targets[Target] = "Maschi",
            'Date'[Year] = 2010,
            'Date'[MonthName] = "January"
        )

    I would like to get feedback on that – you can use the comments here or the comments on the original article. Thanks!

    Read the article

  • Fast Data - Big Data's Achilles heel

    - by thegreeneman
    At OOW 2013, in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and a number of customers deploying Oracle's Big Data / Fast Data solutions, in particular Oracle's NoSQL Database. Since that time, there have been a large number of requests seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions.

    Fast Data is a software solution stack that deals with one aspect of Big Data: high velocity. The software in the Fast Data solution stack involves three key pieces and their integration: Oracle Event Processing, Oracle Coherence, and Oracle NoSQL Database. All three of these technologies address a high-throughput, low-latency data management requirement.

    Oracle Event Processing enables continuous query to filter the Big Data fire hose, enables intelligent chained events for real-time service invocation, and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time, so that SQL statements can look for triggers on events like a breach of a weighted moving average on a real-time data stream.

    Oracle Coherence is a distributed grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory-latency-speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing.

    The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent, low latency reads. For example, when large sensor networks are generating data that needs to be captured while analysts are simultaneously extracting the data using range-based queries for upstream analytics. Another example might be storing cookies from user web sessions for ultra-low-latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis. Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure through clustered, always-on, parallel processing in a shared-nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry-specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location-based personalization service.

    The question then becomes how these things work together to deliver an end-to-end Fast Data solution. The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme-speed cache overflow and retrieval. Here is a great reference to how these two technologies are integrated and work together: Coherence & Oracle NoSQL Database.

    On the stream processing side, it is similar to the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult, and out-of-band data may need to be offloaded into persistent storage. OEP needs an extreme-speed database like Oracle NoSQL Database to help it continue to perform for the real-time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together: OEP & Oracle NoSQL Database.

    Read the article

  • Programming R/Sweave for proper \Sexpr output

    - by deoksu
    Hi, I'm having a bit of a problem programming R for Sweave, and the #rstats twitter group often points here, so I thought I'd put this question to the SO crowd. I'm an analyst, not a programmer, so go easy on me; this is my first post. Here's the problem: I am drafting a survey report in Sweave with R and would like to report the marginal returns in line using \Sexpr{}. For example, rather than saying "Only 14% of respondents said 'X'", I want to write the report like this: Only \Sexpr{p.mean(variable)}$\%$ of respondents said 'X'. The problem is that Sweave() converts the result of the expression in \Sexpr{} to a character string, which means that the output from the expression in R and the output that appears in my document are different. For example, above I use the function 'p.mean':

        p.mean <- function (x) {
            options(digits=1)
            mmm <- weighted.mean(x, weight=weight, na.rm=T)
            print(100*mmm)
        }

    In R, the output looks like this:

        p.mean(variable)
        >14

    but when I use \Sexpr{p.mean(variable)}, I get an unrounded character string (in this case: 13.5857142857143) in my document. I have tried to limit the output of my function with 'digits=1' in the global environment, in the function itself, and in various commands. It only seems to affect what R prints, not the character transformation that is the result of the expression and which eventually prints in the LaTeX file.

        as.character(p.mean(variable))
        >[1] 14
        >[1] "13.5857142857143"

    Does anyone know what I can do to limit the digits printed in the LaTeX file, either by reprogramming the R function or with a setting in Sweave or \Sexpr{}? I'd greatly appreciate any help you can give. Thanks, David

    Read the article

  • Sorting Arrays by More than One Value, and Prioritizing the Sort Based on Column Data

    - by Mark Tomlin
    I'm looking for a way to sort an array (we call this a row) with an array of values (that I'll call columns). Each row has columns that must be sorted based on the priority of laptime, lapcount & timestamp. Each column contains this information: split1, split2, split3, laptime, lapcount, timestamp. laptime is in hundredths of a second (1:23.45, or 1 minute, 23 seconds & 45 hundredths, is 8345). lapcount is a simple unsigned tiny int, or unsigned char. timestamp is a Unix epoch. The lowest laptime should get a better standing in this sort. Should two people's laptimes be equal, then the timestamp is used to decide the better standing. Should two people's timestamps be equal, then the person with the lower lapcount gets the better standing. By better standing, I mean closer to the top of the array, closer to index zero were it a numerical array. I think the array sorting functions built into PHP can do this with a callback; I was wondering what the best approach would be for a weighted sort like this.
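
    The usual trick is a comparator that checks the fields in priority order; in PHP that would be usort with a callback, and the same idea sketched in Python with a tuple key looks like this (field names taken from the question, data made up):

        rows = [
            {"driver": "a", "laptime": 8345, "lapcount": 12, "timestamp": 1300000100},
            {"driver": "b", "laptime": 8345, "lapcount": 10, "timestamp": 1300000100},
            {"driver": "c", "laptime": 8201, "lapcount": 15, "timestamp": 1300000400},
        ]
        # lowest laptime first; ties broken by earlier timestamp, then by lower lapcount
        standings = sorted(rows, key=lambda r: (r["laptime"], r["timestamp"], r["lapcount"]))
        print([r["driver"] for r in standings])  # ['c', 'b', 'a']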

    Read the article
