Search Results

Search found 118 results on 5 pages for 'aggregates'.

Page 3/5 | < Previous Page | 1 2 3 4 5 | Next Page >

juicy couture sale would nanytime go for any added casting

- by user109129

Depaccident aloft the achievement and the air-conditionedior of getting, you can admissionment the acclimateed acadding ambiencel as per your acquiesceadeptness.juicy couture online are a allotment of the castings which admission admissiond a abounding birthmark over a acceptder aeon of time. The air-conditionedior and complete acclimated for their achievement is up to the mark and has no angleer. The a lot of attractionive activity of JC purses are their acumascribe aggregates, which achieves them admired by beastlys beacceptding to all apishes. These are simple to be admissionmentd from onbandage retail aliment. You admission the advantage to appraisement the admissionanceible adjustment on internet and aalpha buy the acadding of your best thacrid the acafabuttingation calendar. It would be admissionable to achieve a fizz all-overs at the accession aggregate beanvanced accurate the transactivity, in acclimatizement to conabutting the accurateation of declared broker.Alacquireing, tachievement are so aapprenticeding advantages in the exchange that it has in fact beappear difficult to achieve the acclimateed best, but if you alattainable admissionment juicy couture sale, you would nanytime go for any added casting. Their beside attaccident with adjustment of blooms is the best activitys which a exhausted woman may annual for. The bandage of these tots is in fact able, and they do not aperture out even afterwardss acrid and added acquireing.acclimateed admeaconstantments of Juicy couture battleaccoutrements achieve them applicative to accouterment arbitrary admeaconstantment requirements. Some woman like huge accoutrements while addeds like acclimatized admeaconstantmentd tots. In any case, the cobasicotmentments of these purses are bottomless and admission the acclimateation to acclimatize sanytimeal sbasic activitys. The admirable and dejected bloom abuttals is anadded complete aspect of JC getting. acquire the acclimateed ones for you accordanceing to the accumulating of your accoutermentss, and leave a ambrosial afterwardseffect on addeds. abonfire and acclimateed contrasts are in trend nowacanicule, so accrue yourself up to date by purchasing this air-conditionedb accumulating from Juicy Couture.

Read the article
Designing communications for extensibility

- by Thomas S.

I am working on the design stages of an application that will a) collect data from various sources (in my case that's scientific data from serial ports), keeping track of the age of the data, b) generate real-time statistics (e.g. running averages) c) display, record, and otherwise handle the data (and statistics). I anticipate that I will be adding both data producers and consumers over time, and would like to design this application abstractly so that I will be able to trivially add functionality with a small amount of interface code. What I'm stumbling on is deciding what communication infrastructure I should use to handle the interfaces. In particular, how should I make the processed data and statistics available to multiple consumers? Some things I've considered: Writing to several named pipes (variable number). Each consumer reads from one of them. Using FUSE to make a userspace filesystem where a read() returns the latest line of data even if another process has already read it. Making a TCP server, and having consumers connect and request data individually. Simply writing the consumers as part of the same program that aggregates the data. So I would like to hear your all's advice on deciding how to interface these functions in the best way to keep them separate and allow room for extenstions.

Read the article
Allowing client to select data to return via REST interface

- by CMP

I have a rest service that is essentially a proxy to a variety of other services. So if I call GET /users/{id} It will get their user profile, as well as order history, and contact info, etc... all from various services, and aggregates them into one nice object. My problem is that each call to a different service has the potential to add time to the original request, so we would rather not get ALL the data ALL of the time if a particular client does not care about all of the pieces. A solution I have arrived at is to do something like this: GET /users/{id}?includeOrders=true&includeX=true&includeY=true... That works, and it allow me to do only what I need to, but it is cumbersome. We have added enough different data sources that there are too many parameters for that style to be useful. I could do something similar with a single integer and a bitmask or something, but that only makes it harder to read, and it does not feel very Restful. I could break it down into multiple calls so they would need to call /users/{id}/orders and /users/{id}/profile separately, but that sort of defeats the purpose of an aggregating proxy, who's purpose is to make clients jobs easier. Are there any good patterns that can help me return just enough data for each client, without making it too difficult for them to filter and select what they want?

Read the article
Neo4j increasing latency as SKIP increases on Cypher query + REST API

- by voldomazta

My setup: Java(TM) SE Runtime Environment (build 1.7.0_45-b18) Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode) Neo4j 2.0.0-M06 Enterprise First I made sure I warmed up the cache by executing the following: START n=node(*) RETURN COUNT(n); START r=relationship(*) RETURN count(r); The size of the table is 63,677 nodes and 7,169,995 relationships Now I have the following query: START u1=node:node_auto_index('uid:39') MATCH (u1:user)-[w:WANTS]->(c:card)<-[h:HAS]-(u2:user) WHERE u2.uid <> 39 WITH u2.uid AS uid, (CASE WHEN w.qty < h.qty THEN w.qty ELSE h.qty END) AS have RETURN uid, SUM(have) AS total ORDER BY total DESC SKIP 0 LIMIT 25 This UID has about 40k+ results that I want to be able to put a pagination to. The initial skip was around 773ms. I tried page 2 (skip 25) and the latency was around the same even up to page 500 it only rose up to 900ms so I didn't really bother. Now I tried some fast forward paging and jumped by thousands so I did 1000, then 2000, then 3000. I was hoping the ORDER BY arrangement will already have been cached by Neo4j and using SKIP will just move to that index in the result and wont have to iterate through each one again. But for each thousand skip I made the latency increased by alot. It's not just cache warming because for one I already warmed up the cache and two, I tried the same skip a couple of times for each skip and it yielded the same results: SKIP 0: 773ms SKIP 1000: 1369ms SKIP 2000: 2491ms SKIP 3000: 3899ms SKIP 4000: 5686ms SKIP 5000: 7424ms Now who the hell would want to view 5000 pages of results? 40k even?! :) Good point! I will probably put a cap on the maximum results a user can view but I was just curious about this phenomenon. Will somebody please explain why Neo4j seems to be re-iterating through stuff which appears to be already known to it? Here is my profiling for the 0 skip: ==> ColumnFilter(symKeys=["uid", " INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14"], returnItemNames=["uid", "total"], _rows=25, _db_hits=0) ==> Slice(skip="Literal(0)", _rows=25, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14 of type Any),false)"], limit="Add(Literal(0),Literal(25))", _rows=25, _db_hits=0) ==> EagerAggregation(keys=["uid"], aggregates=["( INTERNAL_AGGREGATE65c4d6a2-1930-4f32-8fd9-5e4399ce6f14,Sum(have))"], _rows=41659, _db_hits=0) ==> ColumnFilter(symKeys=["have", "u1", "uid", "c", "h", "w", "u2"], returnItemNames=["uid", "have"], _rows=146826, _db_hits=0) ==> Extract(symKeys=["u1", "c", "h", "w", "u2"], exprKeys=["uid", "have"], _rows=146826, _db_hits=587304) ==> Filter(pred="((NOT(Product(u2,uid(0),true) == Literal(39)) AND hasLabel(u1:user(0))) AND hasLabel(u2:user(0)))", _rows=146826, _db_hits=146826) ==> TraversalMatcher(trail="(u1)-[w:WANTS WHERE (hasLabel(NodeIdentifier():card(1)) AND hasLabel(NodeIdentifier():card(1))) AND true]->(c)<-[h:HAS WHERE (NOT(Product(NodeIdentifier(),uid(0),true) == Literal(39)) AND hasLabel(NodeIdentifier():user(0))) AND true]-(u2)", _rows=146826, _db_hits=293696) And for the 5000 skip: ==> ColumnFilter(symKeys=["uid", " INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872"], returnItemNames=["uid", "total"], _rows=25, _db_hits=0) ==> Slice(skip="Literal(5000)", _rows=25, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872 of type Any),false)"], limit="Add(Literal(5000),Literal(25))", _rows=5025, _db_hits=0) ==> EagerAggregation(keys=["uid"], aggregates=["( INTERNAL_AGGREGATE99329ea5-03cd-4d53-a6bc-3ad554b47872,Sum(have))"], _rows=41659, _db_hits=0) ==> ColumnFilter(symKeys=["have", "u1", "uid", "c", "h", "w", "u2"], returnItemNames=["uid", "have"], _rows=146826, _db_hits=0) ==> Extract(symKeys=["u1", "c", "h", "w", "u2"], exprKeys=["uid", "have"], _rows=146826, _db_hits=587304) ==> Filter(pred="((NOT(Product(u2,uid(0),true) == Literal(39)) AND hasLabel(u1:user(0))) AND hasLabel(u2:user(0)))", _rows=146826, _db_hits=146826) ==> TraversalMatcher(trail="(u1)-[w:WANTS WHERE (hasLabel(NodeIdentifier():card(1)) AND hasLabel(NodeIdentifier():card(1))) AND true]->(c)<-[h:HAS WHERE (NOT(Product(NodeIdentifier(),uid(0),true) == Literal(39)) AND hasLabel(NodeIdentifier():user(0))) AND true]-(u2)", _rows=146826, _db_hits=293696) The only difference is the LIMIT clause on the Top function. I hope we can make this work as intended, I really don't want to delve into doing an embedded Neo4j + my own Jetty REST API for the web app.

Read the article
SSRS sum(distinct()) equivalent

- by HurnsMobile

I am currently working with an SSRS 2008 report that returns a dataset similar to the following: Job# ClientId MoneyIn MoneyOut ------------------------------ 1 ABC123 10 25 1 ABC123 10 25 1 ABC123 5 25 2 XYZ123 25 50 2 XYZ123 25 50 3 XYZ123 15 15 Where MoneyOut should be equal to the total amount of MoneyIn for a job if the job has been balanced out correctly. The problem that I am running into is when displaying this in a tablix in SSRS I can return the correct MoneyOut value for a job by setting the field to =first(Fields!MoneyOut.Value) but I also need to sum the value of these by day and attempting to do =sum(first(Fields!MoneyOut.Value)) yields an error about nesting aggregate functions. I've also attempted to sum the value of the textboxes using something like =sum(ReportItems!MoneyOut1.Value) which yields an error that you can only use aggregates on report items in the header or footer. So my question is, is there some way to duplicate the functionality of distinct() in SSRS reports or is there some way to just total up the values of text fields that I am unaware of? Thanks in advance, TJ

Read the article
Unit Testing & Fake Repository implementation with cascading CRUD operations

- by Erik Ashepa

Hi, i'm having trouble writing integration tests which use a fake repository, For example : Suppose I have a classroom entity, which aggregates students... var classroom = new Classroom(); classroom.Students.Add(new Student("Adam")); _fakeRepository.Save(classroom); _fakeRepostiory.GetAll<Student>().Where((student) => student.Name == "Adam")); // This query will return null... When using my real implementation for repository (NHibernate based), the above code works (because the save operation would cascade to the student added at the previous line), Do you know of any fake repository implementation which support this behaviour? Ideas on how to implement one myself? Or do you have any other suggestions which could help me avoid this issue? Thanks in advance, Erik.

Read the article
Is there a way to chain multiple value converters in XAML?

- by Mal Ross

I've got a situation in which I need to show an integer value, bound to a property on my data context, after putting it through two separate conversions: Reverse the value within a range (e.g. range is 1 to 100; value in datacontext is 90; user sees value of 10) convert the number to a string I realise I could do both steps by creating my own converter (that implements IValueConverter). However, I've already got a separate value converter that does just the first step, and the second step is covered by Int32Converter. Is there a way I can chain these two existing classes in XAML without having to create a further class that aggregates them? If I need to clarify any of this, please let me know. :) Thanks.

Read the article
Reporting on data when data is missing (ie. how to report zero activities for a customer on a given

- by Christian Vik

I want to create a report which aggregates the number of activities per customer per week. If there has been no activites on that customer for a given week, 0 should be displayed (i.e week 3 and 4 in the sample below) CUSTOMER | #ACTIVITIES | WEEKNUMBER A | 4 | 1 A | 2 | 2 A | 0 | 3 A | 0 | 4 A | 1 | 5 B ... C ... The problem is that if there are no activities there is no data to report on and therefor week 3 and 4 in the sample below is not in the report. What is the "best" way to solve this?

Read the article
Assert parameters in a table-valued UDF

- by Clay Lenhart

Is there a way to create "asserts" on the parameters of a table-valued UDF. I'd like to use a table-valued UDF for performance reasons, however I know that certain parameter combinations (like start and end dates that are more than a month apart) will cause performance issues on the server for all users. End users query the database via Excel using UDFs. UDFs (and table-valued UDFs in particular) are useful when the data is too large for Excel. Users write simple SQL queries that categorizes the data into groups to reduce the number of rows. For example, the user may be interested in weekly aggregates rather than hourly ones. Users write a group by SELECT statement to reduce the rows by 24x7=168 times. I know I can write RAISERROR statements in multistatement UDFs, but table-valued UDFs are integrated in the query optimizer so these queries are more efficient with table-valued UDFs. So, can I define assertions on the parameters passed to a table-valued UDF?

Read the article
initializer_list in the VC10

- by user335870

hi i wrote this program in VC++ 2010: class class1 { public: class1 (initializer_list<int> a){}; int foo; float Bar; }; void main() { class1 c = {2,3}; getchar(); } but i get this errors when i compile project: Error 1 error C2552: 'c' : non-aggregates cannot be initialized with initializer list c:\users\pswin\documents\visual studio 2010\projects\test_c++0x\test_c++0x\main.cpp 27 and 2 IntelliSense: initialization with '{...}' is not allowed for object of type "class1" c:\users\pswin\documents\visual studio 2010\projects\test_c++0x\test_c++0x\main.cpp 27 what is the problem?

Read the article
Embarrassingly parallel workflow creates too many output files

- by Hooked

On a Linux cluster I run many (N > 10^6) independent computations. Each computation takes only a few minutes and the output is a handful of lines. When N was small I was able to store each result in a separate file to be parsed later. With large N however, I find that I am wasting storage space (for the file creation) and simple commands like ls require extra care due to internal limits of bash: -bash: /bin/ls: Argument list too long. Each computation is required to run through a qsub scheduling algorithm so I am unable to create a master program which simply aggregates the output data to a single file. The simple solution of appending to a single fails when two programs finish at the same time and interleave their output. I have no admin access to the cluster, so installing a system-wide database is not an option. How can I collate the output data from embarrassingly parallel computation before it gets unmanageable?

Read the article
A simple group-by (no count) in Django

- by Daniel Quinn

If this were raw-SQL, it'd be a no-brainer, but in Django, this is proving to be quite difficult to find. What I want is this really: SELECT user_id FROM django_comments WHERE content_type_id = ? AND object_pk = ? GROUP BY user_id It's those last two lines that're the problem. I'd like to do this the "Django-way" but the only thing I've found is mention of aggregates and annotations, which I don't think solve this issue... do they? If someone could explain this to me, I'd really appreciate it.

Read the article
How do I exclude outliers from an aggregate query?

- by Margaret

I'm creating a report comparing total time and volume across units. Here a simplification of the query I'm using at the moment: SELECT m.Unit, COUNT(*) AS Count, SUM(m.TimeInMinutes) AS TotalTime FROM main_table m WHERE m.unit <> '' AND m.TimeInMinutes > 0 GROUP BY m.Unit HAVING COUNT(*) > 15 However, I have been told that I need to exclude cases where the row's time is in the highest or lowest 5% to try and get rid of a few wacky outliers. (As in, remove the rows before the aggregates are applied.) How do I do that?

Read the article
SQL: empty string vs NULL value

- by Jacek Prucia

I know this subject is a bit controversial and there are a lot of various articles/opinions floating around the internet. Unfortunatelly, most of them assume the person doesn't know what the difference between NULL and empty string is. So they tell stories about surprising results with joins/aggregates and generally do a bit more advanced SQL lessons. By doing this, they absolutely miss the whole point and are therefore useless for me. So hopefully this question and all answers will move subject a bit forward. Let's suppose I have a table with personal information (name, birth, etc) where one of the columns is an email address with varchar type. We assume that for some reason some people might not want to provide an email address. When inserting such data (without email) into the table, there are two available choices: set cell to NULL or set it to empty string (''). Let's assume that I'm aware of all the technical implications of choosing one solution over another and I can create correct SQL queries for either scenario. The problem is even when both values differ on the technical level, they are exactly the same on logical level. After looking at NULL and '' I came to a single conclusion: I don't know email address of the guy. Also no matter how hard i tried, I was not able to sent an e-mail using either NULL or empty string, so apparently most SMTP servers out there agree with my logic. So i tend to use NULL where i don't know the value and consider empty string a bad thing. After some intense discussions with colleagues i came with two questions: am I right in assuming that using empty string for an unknown value is causing a database to "lie" about the facts? To be more precise: using SQL's idea of what is value and what is not, I might come to conclusion: we have e-mail address, just by finding out it is not null. But then later on, when trying to send e-mail I'll come to contradictory conclusion: no, we don't have e-mail address, that @!#$ Database must have been lying! Is there any logical scenario in which an empty string '' could be such a good carrier of important information (besides value and no value), which would be troublesome/inefficient to store by any other way (like additional column). I've seen many posts claiming that sometimes it's good to use empty string along with real values and NULLs, but so far haven't seen a scenario that would be logical (in terms of SQL/DB design). P.S. Some people will be tempted to answer, that it is just a matter of personal taste. I don't agree. To me it is a design decision with important consequences. So i'd like to see answers where opion about this is backed by some logical and/or technical reasons.

Read the article
Excel 2007 pivot table does not aggregate properly

- by Patrick

I am using a an excel pivot table to summarize some data and just found a problem. The problem deals with how aggregate values are calculated. Let's say I have a table of data with three columns: Name, Date, Value. If I create a table where Name and then Date are used as Row Labels and Value is the aggregate value, ie Average. The pivot table will look something like this: +John .3450 5/14/2010 1.234 5/15/2010 3.450 5/16/2010 -3.25 What I think should be happening here is that the values for each date are averaged and then those values are averaged to come up with the value in the same row as the Name, John. But that is not what it does. It takes the average for each date, which it shows across from the date, but then instead of taking the average of those numbers, it actually uses the raw data and computes the average for all of John's values. It should show the average of the daily averages to correspond with the tree hierarchy, but instead just shows me the average for all of John's values. It essential will only aggregate at one level, but visually creates sub levels that it is not using. Does anyone know how to change this or understand by what logic this makes sense? Why would I create any sub groupings if I cannot compute aggregates on them?

Read the article
Advanced Data Source Engine coming to Telerik Reporting Q1 2010

This is the final blog post from the pre-release series. In it we are going to share with you some of the updates coming to our reporting solution in Q1 2010. A new Declarative Data Source Engine will be added to Telerik Reporting, that will allow full control over data management, and deliver significant gains in rendering performance and memory consumption. Some of the engines new features will be: Data source parameters - those parameters will be used to limit data retrieved from the data source to just the data needed for the report. Data source parameters are processed on the data source side, however only queried data is fetched to the reporting engine, rather than the full data source. This leads to lower memory consumption, because data operations are performed on queried data only, rather than on all data. As a result, only the queried data needs to be stored in the memory vs. the whole dataset, which was the case with the old approach Support for stored procedures - they will assist in achieving a consistent implementation of logic across applications, and are especially practical for performing repetitive tasks. A stored procedure stores the SQL statements and logic, which can then be executed in different reports and/or applications. Stored Procedures will not only save development time, but they will also improve performance, because each stored procedure is compiled on the data base server once, and then is reutilized. In Telerik Reporting, the stored procedure will also be parameterized, where elements of the SQL statement will be bound to parameters. These parameterized SQL queries will be handled through the data source parameters, and are evaluated at run time. Using parameterized SQL queries will improve the performance and decrease the memory footprint of your application, because they will be applied directly on the database server and only the necessary data will be downloaded on the middle tier or client machine; Calculated fields through expressions - with the help of the new reporting engine you will be able to use field values in formulas to come up with a calculated field. A calculated field is a user defined field that is computed "on the fly" and does not exist in the data source, but can perform calculations using the data of the data source object it belongs to. Calculated fields are very handy for adding frequently used formulas to your reports; Improved performance and optimized in-memory OLAP engine - the new data source will come with several improvements in how aggregates are calculated, and memory is managed. As a result, you may experience between 30% (for simpler reports) and 400% (for calculation-intensive reports) in rendering performance, and about 50% decrease in memory consumption. Full design time support through wizards - Declarative data sources are a great advance and will save developers countless hours of coding. In Q1 2010, and true to Telerik Reportings essence, using the new data source engine and its features requires little to no coding, because we have extended most of the wizards to support the new functionality. The newly extended wizards are available in VS2005/VS2008/VS2010 design-time. More features will be revealed on the product's what's new page when the new version is officially released in a few days. Also make sure you attend the free webinar on Thursday, March 11th that will be dedicated to the updates in Telerik Reporting Q1 2010. Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
Logging errors caused by exceptions deep in the application

- by Kaleb Pederson

What are best-practices for logging deep within an application's source? Is it bad practice to have multiple event log entries for a single error? For example, let's say that I have an ETL system whose transform step involves: a transformer, pipeline, processing algorithm, and processing engine. In brief, the transformer takes in an input file, parses out records, and sends the records through the pipeline. The pipeline aggregates the results of the processing algorithm (which could do serial or parallel processing). The processing algorithm sends each record through one or more processing engines. So, I have at least four levels: Transformer - Pipeline - Algorithm - Engine. My code might then look something like the following: class Transformer { void Process(InputSource input) { try { var inRecords = _parser.Parse(input.Stream); var outRecords = _pipeline.Transform(inRecords); } catch (Exception ex) { var inner = new ProcessException(input, ex); _logger.Error("Unable to parse source " + input.Name, inner); throw inner; } } } class Pipeline { IEnumerable<Result> Transform(IEnumerable<Record> records) { // NOTE: no try/catch as I have no useful information to provide // at this point in the process var results = _algorithm.Process(records); // examine and do useful things with results return results; } } class Algorithm { IEnumerable<Result> Process(IEnumerable<Record> records) { var results = new List<Result>(); foreach (var engine in Engines) { foreach (var record in records) { try { engine.Process(record); } catch (Exception ex) { var inner = new EngineProcessingException(engine, record, ex); _logger.Error("Engine {0} unable to parse record {1}", engine, record); throw inner; } } } } } class Engine { Result Process(Record record) { for (int i=0; i<record.SubRecords.Count; ++i) { try { Validate(record.subRecords[i]); } catch (Exception ex) { var inner = new RecordValidationException(record, i, ex); _logger.Error( "Validation of subrecord {0} failed for record {1}", i, record ); } } } } There's a few important things to notice: A single error at the deepest level causes three log entries (ugly? DOS?) Thrown exceptions contain all important and useful information Logging only happens when failure to do so would cause loss of useful information at a lower level. Thoughts and concerns: I don't like having so many log entries for each error I don't want to lose important, useful data; the exceptions contain all the important but the stacktrace is typically the only thing displayed besides the message. I can log at different levels (e.g., warning, informational) The higher level classes should be completely unaware of the structure of the lower-level exceptions (which may change as the different implementations are replaced). The information available at higher levels should not be passed to the lower levels. So, to restate the main questions: What are best-practices for logging deep within an application's source? Is it bad practice to have multiple event log entries for a single error?

Read the article
Designing status management for a file processing module

- by bot

The background One of the functionality of a product that I am currently working on is to process a set of compressed files ( containing XML files ) that will be made available at a fixed location periodically (local or remote location - doesn't really matter for now) and dump the contents of each XML file in a database. I have taken care of the design for a generic parsing module that should be able to accommodate the parsing of any file type as I have explained in my question linked below. There is no need to take a look at the following link to answer my question but it would definitely provide a better context to the problem Generic file parser design in Java using the Strategy pattern The Goal I want to be able to keep a track of the status of each XML file and the status of each compressed file containing the XML files. I can probably have different statuses defined for the XML files such as NEW, PROCESSING, LOADING, COMPLETE or FAILED. I can derive the status of a compressed file based on the status of the XML files within the compressed file. e.g status of the compressed file is COMPLETE if no XML file inside the compressed file is in a FAILED state or status of the compressed file is FAILED if the status of at-least one XML file inside the compressed file is FAILED. A possible solution The Model I need to maintain the status of each XML file and the compressed file. I will have to define some POJOs for holding the information about an XML file as shown below. Note that there is no need to store the status of a compressed file as the status of a compressed file can be derived from the status of its XML files. public class FileInformation { private String compressedFileName; private String xmlFileName; private long lastModifiedDate; private int status; public FileInformation(final String compressedFileName, final String xmlFileName, final long lastModified, final int status) { this.compressedFileName = compressedFileName; this.xmlFileName = xmlFileName; this.lastModifiedDate = lastModified; this.status = status; } } I can then have a class called StatusManager that aggregates a Map of FileInformation instances and provides me the status of a given file at any given time in the lifetime of the appliciation as shown below : public class StatusManager { private Map<String,FileInformation> processingMap = new HashMap<String,FileInformation>(); public void add(FileInformation fileInformation) { fileInformation.setStatus(0); // 0 will indicates that the file is in NEW state. 1 will indicate that the file is in process and so on.. processingMap.put(fileInformation.getXmlFileName(),fileInformation); } public void update(String filename,int status) { FileInformation fileInformation = processingMap.get(filename); fileInformation.setStatus(status); } } That takes care of the model for the sake of explanation. So whats my question? Edited after comments from Loki and answer from Eric : - I would like to know if there are any existing design patterns that I can refer to while coming up with a design. I would also like to know how I should go about designing the status management classes. I am more interested in understanding how I can model the status management classes. I am not interested in how other components are going to be updated about a change in status at the moment as suggested by Eric.

Read the article
Big Data – ClustrixDB – Extreme Scale SQL Database with Real-time Analytics, Releases Software Download – NewSQL

- by Pinal Dave

There are so many things to learn and there is so little time we all have. As we have little time we need to be selective to learn whatever we learn. I believe I know quite a lot of things in SQL but I still do not know what is around SQL. I have started to learn about NewSQL recently. If you wonder what is NewSQL I encourage all of you to read my blog post about NewSQL over here Big Data – Buzz Words: What is NewSQL – Day 10 of 21. NewSQL databases are quickly becoming popular – providing the scale of NoSQL with the SQL features and transactions. As a part of learning NewSQL database, I have recently started to learn about ClustrixDB. ClustrixDB has been the most mature NewSQL database used by some of the largest internet sites in the world for over 3 years, with extensive SQL support. In addition to scale, it provides fast real-time analytics by bringing massively parallel processing (MPP), available only in warehousing databases, to the transactional database. The reason I am more intrigued about learning ClustrixDB is their recent announcement on Oct 31. ClustrixDB was only available as an appliance, but now with their software release on Oct 31, everyone can use it. It is now available as forever free for up to 12 cores with community support, and there is a 45 day trial for unlimited cluster sizes. With the forever free world, I am indeed interested in ClustrixDB now. I know that few of the leading eCommerce sites in the world uses them for their transactional database. Here are few of the details I have quickly noted for ClustrixDB. ClustrixDB allows user to: Scale by simply adding nodes to the cluster with a single command Run billions of transactions a day Run fast real-time analytics Achieve high-availability with recovery from node failure Manages itself Easily migrate from MySQL as it is nearly plug-and-play compatible, use MySQL drivers, tools and replication. While I was going through the documentation I realized that ClustrixDB also has extensive support for SQL features including complex queries involving joins on a dozen or more tables, aggregates, sorts, sub-queries. It also supports stored procedures, triggers, foreign keys, partitioned and temporary tables, and fully online schema changes. It is indeed a very matured product and SQL solution. Indeed Clusterix sound very promising solution, I decided to dig a bit deeper to understand who are current customers of the Clustrix as they exist in the industry for quite a few years. Their client list is indeed very interesting and here is my quick research about them. Twoo.com – Europe’s largest social discovery (dating) site runs 4.4 Billion Transactions a day with table sizes over a Terabyte, on a 168 core cluster. EngageBDR – Top 3 in the online advertising category uses ClustrixDB to serve 6.9 billion ads a day through real-time bidding platform. Their reports went from 4 hours to 15 seconds. NoMoreRack – Top 2 fastest growing e-commerce company in US used ClustrixDB for high availability and fast growth through Amazon cloud. MakeMyTrip – India’s leading travel site runs on ClustrixDB with two clusters running as multi-master in Chennai and Bangalore. Many enterprises such as AOL, CSC, Rakuten, Symantec use ClustrixDB when their applications need scale. I must accept that I am impressed with the information I have learned so far and now is the time to do some hand’s on experience with their product. I want to learn this technology so in future when it is about NewSQL, I know what I am talking about. Read more why Clustrix explains why you ClustrixDB might be the right database for you. Download ClustrixDB with me today and install it on your machine so in future when we discuss the technical aspects of it, we all are on the same page. The software can be downloaded here. Reference : Pinal Dave (http://blog.SQLAuthority.com)Filed under: Big Data, MySQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Clustrix

Read the article
The five steps of business intelligence adoption: where are you?

- by Red Gate Software BI Tools Team

When I was in Orlando and New York last month, I spoke to a lot of business intelligence users. What they told me suggested a path of BI adoption. The user’s place on the path depends on the size and sophistication of their organisation. Step 1: A company with a database of customer transactions will often want to examine particular data, like revenue and unit sales over the last period for each product and territory. To do this, they probably use simple SQL queries or stored procedures to produce data on demand. Step 2: The results from step one are saved in an Excel document, so business users can analyse them with filters or pivot tables. Alternatively, SQL Server Reporting Services (SSRS) might be used to generate a report of the SQL query for display on an intranet page. Step 3: If these queries are run frequently, or business users want to explore data from multiple sources more freely, it may become necessary to create a new database structured for analysis rather than CRUD (create, retrieve, update, and delete). For example, data from more than one system — plus external information — may be incorporated into a data warehouse. This can become ‘one source of truth’ for the business’s operational activities. The warehouse will probably have a simple ‘star’ schema, with fact tables representing the measures to be analysed (e.g. unit sales, revenue) and dimension tables defining how this data is aggregated (e.g. by time, region or product). Reports can be generated from the warehouse with Excel, SSRS or other tools. Step 4: Not too long ago, Microsoft introduced an Excel plug-in, PowerPivot, which allows users to bring larger volumes of data into Excel documents and create links between multiple tables. These BISM Tabular documents can be created by the database owners or other expert Excel users and viewed by anyone with Excel PowerPivot. Sometimes, business users may use PowerPivot to create reports directly from the primary database, bypassing the need for a data warehouse. This can introduce problems when there are misunderstandings of the database structure or no single ‘source of truth’ for key data. Step 5: Steps three or four are often enough to satisfy business intelligence needs, especially if users are sophisticated enough to work with the warehouse in Excel or SSRS. However, sometimes the relationships between data are too complex or the queries which aggregate across periods, regions etc are too slow. In these cases, it can be necessary to formalise how the data is analysed and pre-build some of the aggregations. To do this, a business intelligence professional will typically use SQL Server Analysis Services (SSAS) to create a multidimensional model — or “cube” — that more simply represents key measures and aggregates them across specified dimensions. Step five is where our tool, SSAS Compare, becomes useful, as it helps review and deploy changes from development to production. For us at Red Gate, the primary value of SSAS Compare is to establish a dialog with BI users, so we can develop a portfolio of products that support creation and deployment across a range of report and model types. For example, PowerPivot and the new BISM Tabular model create a potential customer base for tools that extend beyond BI professionals. We’re interested in learning where people are in this story, so we’ve created a six-question survey to find out. Whether you’re at step one or step five, we’d love to know how you use BI so we can decide how to build tools that solve your problems. So if you have a sixty seconds to spare, tell us on the survey!

Read the article
Should I expose IObservable<T> on my interfaces?

- by Alex

My colleague and I have dispute. We are writing a .NET application that processes massive amounts of data. It receives data elements, groups subsets of them into blocks according to some criterion and processes those blocks. Let's say we have data items of type Foo arriving some source (from the network, for example) one by one. We wish to gather subsets of related objects of type Foo, construct an object of type Bar from each such subset and process objects of type Bar. One of us suggested the following design. Its main theme is exposing IObservable objects directly from the interfaces of our components. // ********* Interfaces ********** interface IFooSource { // this is the event-stream of objects of type Foo IObservable<Foo> FooArrivals { get; } } interface IBarSource { // this is the event-stream of objects of type Bar IObservable<Bar> BarArrivals { get; } } / ********* Implementations ********* class FooSource : IFooSource { // Here we put logic that receives Foo objects from the network and publishes them to the FooArrivals event stream. } class FooSubsetsToBarConverter : IBarSource { IFooSource fooSource; IObservable<Bar> BarArrivals { get { // Do some fancy Rx operators on fooSource.FooArrivals, like Buffer, Window, Join and others and return IObservable<Bar> } } } // this class will subscribe to the bar source and do processing class BarsProcessor { BarsProcessor(IBarSource barSource); void Subscribe(); } // ******************* Main ************************ class Program { public static void Main(string[] args) { var fooSource = FooSourceFactory.Create(); var barsProcessor = BarsProcessorFactory.Create(fooSource) // this will create FooSubsetToBarConverter and BarsProcessor barsProcessor.Subscribe(); fooSource.Run(); // this enters a loop of listening for Foo objects from the network and notifying about their arrival. } } The other suggested another design that its main theme is using our own publisher/subscriber interfaces and using Rx inside the implementations only when needed. //********** interfaces ********* interface IPublisher<T> { void Subscribe(ISubscriber<T> subscriber); } interface ISubscriber<T> { Action<T> Callback { get; } } //********** implementations ********* class FooSource : IPublisher<Foo> { public void Subscribe(ISubscriber<Foo> subscriber) { /* ... */ } // here we put logic that receives Foo objects from some source (the network?) publishes them to the registered subscribers } class FooSubsetsToBarConverter : ISubscriber<Foo>, IPublisher<Bar> { void Callback(Foo foo) { // here we put logic that aggregates Foo objects and publishes Bars when we have received a subset of Foos that match our criteria // maybe we use Rx here internally. } public void Subscribe(ISubscriber<Bar> subscriber) { /* ... */ } } class BarsProcessor : ISubscriber<Bar> { void Callback(Bar bar) { // here we put code that processes Bar objects } } //********** program ********* class Program { public static void Main(string[] args) { var fooSource = fooSourceFactory.Create(); var barsProcessor = barsProcessorFactory.Create(fooSource) // this will create BarsProcessor and perform all the necessary subscriptions fooSource.Run(); // this enters a loop of listening for Foo objects from the network and notifying about their arrival. } } Which one do you think is better? Exposing IObservable and making our components create new event streams from Rx operators, or defining our own publisher/subscriber interfaces and using Rx internally if needed? Here are some things to consider about the designs: In the first design the consumer of our interfaces has the whole power of Rx at his/her fingertips and can perform any Rx operators. One of us claims this is an advantage and the other claims that this is a drawback. The second design allows us to use any publisher/subscriber architecture under the hood. The first design ties us to Rx. If we wish to use the power of Rx, it requires more work in the second design because we need to translate the custom publisher/subscriber implementation to Rx and back. It requires writing glue code for every class that wishes to do some event processing.

Read the article
Implement DDD and drawing the line between the an Entity and value object

- by William

I am implementing an EMR project. I would like to apply a DDD based approach to the problem. I have identified the "Patient" as being the core object of the system. I understand Patient would be an entity object as well as an aggregrate. I have also identified that every patient must have a "Doctor" and "Medical Records". The medical records would encompass Labs, XRays, Encounter.... I believe those would be entity objects as well. Let us take a Encounter for example. My implementation currently has a few fields as "String" properties, which are the complaint, assessment and plan. The other items necessary for an Encounter are vitals. I have implemented vitals as a value object. Given that it will be necessary to retrieve vitals without haveing to retrieve each Encounter then do vitals become part of the Encounter aggregate and patient aggregrate. I am assuming I could view the Encounter as an aggregrate, because other items are spwaned from the Encounter like prescriptions, lab orders, xrays. Is approach right that I am taking in identifying my entities and aggregates. In the case of vitals, they are specific to a patient, but outside of that there is not any other identity associated with them.

Read the article
ORM Persistence by Reachability violates Aggregate Root Boundaries?

- by Johannes Rudolph

Most common ORMs implement persistence by reachability, either as the default object graph change tracking mechanism or an optional. Persistence by reachability means the ORM will check the aggregate roots object graph and determines wether any objects are (also indirectly) reachable that are not stored inside it's identity map (Linq2Sql) or don't have their identity column set (NHibernate). In NHibernate this corresponds to cascade="save-update", for Linq2Sql it is the only supported mechanism. They do both, however only implement it for the "add" side of things, objects removed from the aggregate roots graph must be marked for deletion explicitly. In a DDD context one would use a Repository per Aggregate Root. Objects inside an Aggregate Root may only hold references to other Aggregate Roots. Due to persistence by reachability it is possible this other root will be inserted in the database event though it's corresponding repository wasn't invoked at all! Consider the following two Aggregate Roots: Contract and Order. Request is part of the Contract Aggregate. The object graph looks like Contract->Request->Order. Each time a Contractor makes a request, a corresponding order is created. As this involves two different Aggregate Roots, this operation is encapsulated by a Service. //Unit Of Work begins Request r = ...; Contract c = ContractRepository.FindSingleByKey(1); Order o = OrderForRequest(r); // creates a new order aggregate r.Order = o; // associates the aggregates c.Request.Add(r); ContractRepository.SaveOrUpdate(c); // OrderAggregate is reachable and will be inserted Since this Operation happens in a Service, I could still invoke the OrderRepository manually, however I wouldn't be forced to!. Persistence by reachability is a very useful feature inside Aggregate Roots, however I see no way to enforce my Aggregate Boundaries.

Read the article
ProgrammingError when aggregating over an annotated & grouped Django ORM query

- by ento

I'm trying to construct a query to get the "average, maximum, minimum number of items purchased by a single user". The data source is this simple sales record table: class SalesRecord(models.Model): id = models.IntegerField(primary_key=True) user_id = models.IntegerField() product_code = models.CharField() price = models.IntegerField() created_at = models.DateTimeField() A new record is inserted into this table for every item purchased by a user. Here's my attempt at building the query: q = SalesRecord.objects.all() q = q.values('user_id').annotate( # group by user and count the # of records count=Count('id'), # (= # of items) ).order_by() result = q.aggregate(Max('count'), Min('count'), Avg('count')) When I try to execute the code, a ProgrammingError is raised at the last line: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM (SELECT sales_records.user_id AS user_id, COUNT(sales_records.`' at line 1") Django's error screen shows that the SQL is SELECT FROM (SELECT `sales_records`.`player_id` AS `player_id`, COUNT(`sales_records`.`id`) AS `count` FROM `sales_records` WHERE (`sales_records`.`created_at` >= %s AND `sales_records`.`created_at` <= %s ) GROUP BY `sales_records`.`player_id` ORDER BY NULL) subquery It's not selecting anything! Can someone please show me the right way to do this? Hacking Django I've found that clearing the cache of selected fields in django.db.models.sql.BaseQuery.get_aggregation() seems to solve the problem. Though I'm not really sure this is a fix or a workaround. @@ -327,10 +327,13 @@ # Remove any aggregates marked for reduction from the subquery # and move them to the outer AggregateQuery. + self._aggregate_select_cache = None + self.aggregate_select_mask = None for alias, aggregate in self.aggregate_select.items(): if aggregate.is_summary: query.aggregate_select[alias] = aggregate - del obj.aggregate_select[alias] + if alias in obj.aggregate_select: + del obj.aggregate_select[alias] ... yields result: {'count__max': 267, 'count__avg': 26.2563, 'count__min': 1}

Read the article
Resources related to data-mining and gaming on social networks

- by darren

Hi all I'm interested in the problem of patterning mining among players of social networking games. For example detecting cheaters of a game, given a company's user database. So far I have been following the usual recipe for a data mining project: construct a data warehouse that aggregates significant information select a classifier, and train it with a subsectio of records from the warehouse validate classifier with another test set lather, rinse, repeat Surprisingly, I've found very little in this area regarding literature, best practices, etc. I am hoping to crowdsource the information gathering problem here. Specifically what I'm looking for: What classifiers have worked will for this type of pattern mining (it seems highly temporal, users playing games, users receiving rewards, users transferring prizes etc). Are there any highly agreed upon attributes specific to social networking / gaming data? What is a practical amount of information that should be considered? One problem I've run into is data overload, where queries and data cleansing may take days to complete. Related to point above, what hardware resources are required to produce results? I've found it difficult to estimate the amount of computing power I will require for production use. It has become apparent that a white box in the corner does not have enough horse-power for such a project. Are companies generally resorting to cloud solutions? Are they buying clusters? Basically, any resources (theoretical, academic, or practical) about implementing a social networking / gaming pattern-mining program would be very much appreciated. Thanks.

Read the article

Search Results

Search found 118 results on 5 pages for 'aggregates'.

Page 3/5 | < Previous Page | 1 2 3 4 5 | Next Page >

- by user109129

- by Thomas S.

- by CMP

- by voldomazta

- by HurnsMobile

- by Erik Ashepa

- by Mal Ross

- by Christian Vik

- by Clay Lenhart

- by user335870

- by Hooked

- by Daniel Quinn

- by Margaret

- by Jacek Prucia

- by Patrick

- by Kaleb Pederson

- by bot

- by Pinal Dave

- by Red Gate Software BI Tools Team

- by Alex

- by William

- by Johannes Rudolph

- by ento

- by darren

< Previous Page | 1 2 3 4 5 | Next Page >