Search Results

Search found 118 results on 5 pages for 'aggregates'.

Page 4/5 | < Previous Page | 1 2 3 4 5  | Next Page >

  • Service Oriented Architecture & Domain-Driven Design

    - by Michael
    I've always developed code in a SOA type of way. This year I've been trying to do more DDD but I keep getting the feeling that I'm not getting it. At work our systems are load balanced and designed not to have state. The architecture is: Website ===Physical Layer== Main Service ==Physical Layer== Server 1/Service 2/Service 3/Service 4 Only Server 1,Service 2,Service 3 and Service 4 can talk to the database and the Main Service calls the correct service based on products ordered. Every physical layer is load balanced too. Now when I develop a new service, I try to think DDD in that service even though it doesn't really feel like it fits. I use good DDD principles like entities, value types, repositories, aggregates, factories and etc. I've even tried using ORM's but they just don't seem like they fit in a stateless architecture. I know there are ways around it, for example use IStatelessSession instead of ISession with NHibernate. However, ORM just feel like they don't fit in a stateless architecture. I've noticed I really only use some of the concepts and patterns DDD has taught me but the overall architecture is still SOA. I am starting to think DDD doesn't fit in large systems but I do think some of the patterns and concepts do fit in large systems. Like I said, maybe I'm just not grasping DDD or maybe I'm over analyzing my designs? Maybe by using the patterns and concepts DDD has taught me I am using DDD? Not sure if there is really a question to this post but more of thoughts I've had when trying to figure out where DDD fits in overall systems and how scalable it truly is. The truth is, I don't think I really even know what DDD is?

    Read the article

  • ASP.NET: aggregating validators in a user control

    - by orsogufo
    I am developing a web application where I would like to perform a set of validations on a certain field (an account name in the specific case). I need to check that the value is not empty, matches a certain pattern and is not already used. I tried to create a UserControl that aggregates a RequiredFieldValidator, a RegexValidator and a CustomValidator, then I created a ControlToValidate property like this: public partial class AccountNameValidator : System.Web.UI.UserControl { public string ControlToValidate { get { return ViewState["ControlToValidate"] as string; } set { ViewState["ControlToValidate"] = value; AccountNameRequiredFieldValidator.ControlToValidate = value; AccountNameRegexValidator.ControlToValidate = value; AccountNameUniqueValidator.ControlToValidate = value; } } } However, if I insert the control on a page and set ControlToValidate to some control ID, when the page loads I get an error that says Unable to find control id 'AccountName' referenced by the 'ControlToValidate' property of 'AccountNameRequiredFieldValidator', which makes me think that the controls inside my UserControl cannot resolve correctly the controls in the parent page. So, I have two questions: 1) Is it possible to have validator controls inside a UserControl validate a control in the parent page? 2) Is it correct and good practice to "aggregate" multiple validator controls in a UserControl? If not, what is the standard way to proceed?

    Read the article

  • Architectural conundrum

    - by Dejan
    The worst thing when working on a one man project is the lack of input that you usually get from your coworkers. And because of the lack of that you tend to make obvious mistakes. After going down that road for some time I would need some help from the community. I started a little home-brew project that should turn into a portal of some sorts. And the main thing that is bothering me is the persistence layer that i have concocted. It should be completely separated from the presentation layer for starters and a OR mapper is also somewhere. This is because I have multiple data stores that have to be used. So the base idea was that the individual "repositories" operate each on their individual database and that the business layer then aggregates the business objects which are then transformed in the presentation layer into view objects. The main problem I face is the following: Multiple classes for the same concept - There is a DAL representation of a user and BL representation of user and a view representation of a user. I can handle the transformation with a tool but is this really the right way. I mean they are all nicely separated, but the overhead is quite something. What do you think? Am I going too deep into the separation of concern rabbit hole or is this still normal?

    Read the article

  • Copy SQL Server data from one server to another on a schedule

    - by rwmnau
    I have a pair of SQL Servers at different webhosts, and I'm looking for a way to periodically update the one server using the other. Here's what I'm looking for: As automated as possible - ideally, without any involvement on my part once it's set up. Pushes a number of databases, in their entirely (including any schema changes) from one server to the other Freely allows changes on the source server without breaking my process. For this reason, I don't want to use replication, as I'd have to break it every time there's an update on the source, and then recreate the publication and subscription One database is about 4GB in size and contains binary data. I'm not sure if there's a way to export this to a script, but it would be a mammoth file if I did. Originally, I was thinking of writing something that takes a scheduled full backup of each database, FTPs the backups from one server to the other once they're done, and then the new server picks it up and restores it. The only downside I can see to this is that there's no way to know that the backups are done before starting to transfer them - can these backups be done synchronously? Also, the server being refreshes is our test server, so if there's some downtime involved in moving the data, that's fine. Does anybody out there have a better idea, or is what I'm currently considering the best non-replication way to go? Thanks for your help, everybody. UPDATE: I ended up designing a custom solution to get this done using BAT files, 7Zip,command line FTP, and OSQL, so it runs in a completely automatic way and aggregates the data from a dozen servers across the country. I've detailed the steps in a blog entry. Thanks for all your input!

    Read the article

  • Maven - 'all' or 'parent' project for aggregation?

    - by disown
    For educational purposes I have set up a project layout like so (flat in order to suite eclipse better): -product | |-parent |-core |-opt |-all Parent contains an aggregate project with core, opt and all. Core implements the mandatory part of the application. Opt is an optional part. All is supposed to combine core with opt, and has these two modules listed as dependencies. I am now trying to make the following artifacts: product-core.jar product-core-src.jar product-core-with-dependencies.jar product-opt.jar product-opt-src.jar product-opt-with-dependencies.jar product-all.jar product-all-src.jar product-all-with-dependencies.jar Most of them are fairly straightforward to produce. I do have some problem with the aggregating artifacts though. I have managed to make the product-all-src.jar with a custom assembly descriptor in the 'all' module which downloads the sources for all non-transitive deps, and this works fine. This technique also allows me to make the product-all-with-dependencies.jar. I however recently found out that you can use the source:aggregate goal in the source plugin to aggregate sources of the entire aggregate project. This is also true for the javadoc plugin, which also aggregates through the usage of the parent project. So I am torn between my 'all' module approach and ditching the 'all' module and just use the 'parent' module for all aggregation. It feels unclean to have some aggregate artifacts produced in 'parent', and others produced in 'all'. Is there a way of making an 'product-all' jar in the parent project, or to aggregate javadoc in the 'all' project? Or should I just keep both? Thanks

    Read the article

  • What database systems should an startup company consider?

    - by Am
    Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough. This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?

    Read the article

  • How should rules for Aggregate Roots be enforced?

    - by MylesRip
    While searching the web, I came across a list of rules from Eric Evans' book that should be enforced for aggregates: The root Entity has global identity and is ultimately responsible for checking invariants Root Entities have global identity. Entities inside the boundary have local identity, unique only within the Aggregate. Nothing outside the Aggregate boundary can hold a reference to anything inside, except to the root Entity. The root Entity can hand references to the internal Entities to other objects, but they can only use them transiently (within a single method or block). Only Aggregate Roots can be obtained directly with database queries. Everything else must be done through traversal. Objects within the Aggregate can hold references to other Aggregate roots. A delete operation must remove everything within the Aggregate boundary all at once When a change to any object within the Aggregate boundary is committed, all invariants of the whole Aggregate must be satisfied. This all seems fine in theory, but I don't see how these rules would be enforced in the real world. Take rule 3 for example. Once the root entity has given an exteral object a reference to an internal entity, what's to keep that external object from holding on to the reference beyond the single method or block? (If the enforcement of this is platform-specific, I would be interested in knowing how this would be enforced within a C#/.NET/NHibernate environment.)

    Read the article

  • Sort queryset by a generic foreign key (django)?

    - by thornomad
    I am using Django's comment framework which utilizes generic foreign keys. Question: How do I sort a given model's queryset by their comment count using the generic foreign key lookup? Reading the django docs on the subject it says one needs to calculate them not using the aggregation API: Django's database aggregation API doesn't work with a GenericRelation. [...] For now, if you need aggregates on generic relations, you'll need to calculate them without using the aggregation API. The only way I can think of, though, would be to iterate through my queryset, generate a list with content_type and object_id's for each item, then run a second queryset on the Comment model filtering by this list of content_type and object_id ... sort the objects by the count, then re-create a new queryset in this order by pulling the content_object for each comment ... This just seems wrong and I'm not even sure how to pull it off. Ideas? Someone must have done this before. I found this post online but it requires me handwriting SQL -- is that really necessary ?

    Read the article

  • Can I split a single SQL 2008 DB Table into multiple filegroups, based on a discriminator column?

    - by Pure.Krome
    Hi folks, I've got a SQL Server 2008 R2 database which has a number of tables. Two of these tables contains a lot of large data .. mainly because one of them is VARBINARY(MAX) and the sister table is GEOGRAPHY. (Why two tables? Read Below if you're interested***) The data in these tables are geospatial shapes, such as zipcode boundaries. Now, the first 70K odd rows are for DataType = 1 the rest 5mil rows are for DataType = 2 Now, is it possible to split the table data into two files? so all rows that are for DataType != 2 goes into File_A and DataType = 2 goes into File_B? This way, when I backup the DB, I can skip adding File_B so my download is waaaaay smaller? Is this possible? I guessing you might be thinking - why not keep them as TWO extra tables? Mainly because in the code, the data is conceptually the same .. it's just happens that I want to split the storage of this model data. It really messes up my model if I now how two aggregates in my model, instead of one. ***Entity Framework doesn't like Tables with GEOGRAPHY, so i have to create a new table which transforms the GEOGRAPHY to VARBINARY, and then drop that into EF.

    Read the article

  • Is it possible to aggregate over differing where clauses?

    - by BenAlabaster
    Is it possible to calculate multiple aggregates based on differing where clauses? For instance: Let's say I have two tables, one for Invoice and one for InvoiceLineItems. The invoice table has a total field for the invoice total, and each of the invoice line item records in the InvoiceLineItems table contains a field that denotes whether the line item is discountable or not. I want three sum totals, one where Discountable = 0 and one where Discountable = 1 and one where Discountable is irrelevant. Such that my output would be: InvoiceNumber Total DiscountableTotal NonDiscountableTotal ------------- ----- ----------------- -------------------- 1 53.27 27.27 16.00 2 38.94 4.76 34.18 3... The only way I've found so far is by using something like: Select i.InvoiceNumber, i.Total, t0.Total As DiscountableTotal, t1.Total As NonDiscountableTotal From Invoices i Left Join ( Select InvoiceNumber, Sum(Amount), From InvoiceLineItems Where Discountable = 0 Group By InvoiceNumber ) As t0 On i.InvoiceNumber = t0.InvoiceNumber Left Join ( Select InvoiceNumber, Sum(Amount) From InvoiceLineItems Where Discountable = 1 Group By InvoiceNumber ) As t1 On i.InvoiceNumber = t1.InvoiceNumber This seems somewhat cumbersome, it would be nice if I could do something like: Select InvoiceNumber, Sum(Amount) Where Discountable = 1 As Discountable Sum(Amount) Where Discountable = 0 As NonDiscountable Group By InvoiceNumber I realize that SQL is completely invalid, but it logically portrays what I'm trying to do... TIA P.S. I need this to run on a SQL Server 2000 instance, but I am also interested (for future reference) if/how I would achieve this on SQL Server 2005/2008.

    Read the article

  • Conceptual data modeling: Is RDF the right tool? Other solutions?

    - by paprika
    I'm planning a system that combines various data sources and lets users do simple queries on these. A part of the system needs to act as an abstraction layer that knows all connected data sources: the user shouldn't [need to] know about the underlying data "providers". A data provider could be anything: a relational DBMS, a bug tracking system, ..., a weather station. They are hooked up to the query system through a common API that defines how to "offer" data. The type of queries a certain data provider understands is given by its "offer" (e.g. I know these entities, I can give you aggregates of type X for relationship Y, ...). My concern right now is the unification of the data: the various data providers need to agree on a common vocabulary (e.g. the name of the entity "customer" could vary across different systems). Thus, defining a high level representation of the entities and their relationships is required. So far I have the following requirements: I need to be able to define objects and their properties/attributes. Further, arbitrary relations between these objects need to be represented: a verb that defines the nature of the relation (e.g. "knows"), the multiplicity (e.g. 1:n) and the direction/navigability of the relation. It occurs to me that RDF is a viable option, but is it "the right tool" for this job? What other solutions/frameworks do exist for semantic data modeling that have a machine readable representation and why are they better suited for this task? I'm grateful for every opinion and pointer to helpful resources.

    Read the article

  • Tigther code - javascript object array

    - by Scott Silvi
    Inside the callback of a $.getJSON call, I have the code outlined below. The first for block aggregates 'total' & assigns values to sov[i]. The map function calculates the percentage of total. I then instantiate a variable called sovData. With the jQuery Flot graph, any objects that are empty aren't added to the pie chart, so this works for up to 7 different slices/datasets. What I'd like to do is only initialize the ones I need (e.g. sovData would have up to 'howMany - 1' (kws.length -1 ) objects inside of it, likely via something similar to dashboards[i] & sov[i]. How would I do this? Code: var sov = [], howMany = kws.length, total = 0, i = 0; for ( i; i < howMany; i++) { total += sov[ i ] = +parseInt(data.sov['sov' + ( i+1 ) ],10) || 0; } var dashboards = data.dashboards; sov = $.map( sov, function(v) { var s = Math.round( ( (v / total) * 10e3 ) / 100); return s < 1 ? 1 : s; }); var sovData = [{ label : dashboards[0], data : sov[0] }, { label : dashboards[1], data : sov[1] }, { label : dashboards[2], data : sov[2] }, { label : dashboards[3], data : sov[3] }, { label : dashboards[4], data : sov[4] }, { label : dashboards[5], data : sov[5] }, { label : dashboards[6], data : sov[6] } ]

    Read the article

  • debugging JBoss 100% CPU usage

    - by Nate
    We are using JBoss to run two of our WARs. One is our web app, the other is our web service. The web app accesses a database on another machine and makes requests to the web service. The web service makes JMS requests to other machines, aggregates the data, and returns it. At our biggest client, about once a month the JBoss Java process takes 100% of all CPUs. The machine running JBoss has 8 CPUs. Our web app is still accessible during this time, however pages take about 3 minutes to load. Restarting JBoss restores everything to normal. The database machine and all the other machines are fine, only the machine running JBoss is affected. Memory usage is normal. Network utilization is normal. There are no suspect error messages in the JBoss logs. I have set up a test environment as close as possible to the client's production environment and I've done load testing with as much as 2x the number of concurrent users. I have not gotten my test environment to replicate the problem. Where do we go from here? How can we narrow down the problem? Currently the only plan we have is to wait until the problem occurs in production on its own, then do some debugging to determine the cause. So far people have just restarted JBoss when the problem occurred to minimize down time. Next time it happens they will get a developer to take a look. The question is, next time it happens, what can be done to determine the cause? We could setup a separate JBoss instance on the same box and install the web app separately from the web service. This way when the problem next occurs we will know which WAR has the problem (assuming it is our code). This doesn't narrow it down much though. Should I enable JMX remote? This way the next time the problem occurs I can connect with VisualVM and see which threads are taking the CPU and what the hell they are doing. However, is there a significant down side to enabling JMX remote in a production environment? Is there another way to see what threads are eating the CPU and to get a stacktrace to see what they are doing? Any other ideas? Thanks!

    Read the article

  • Integrating Code Metrics in TFS 2010 Build

    - by Jakob Ehn
    The build process template and custom activity described in this post is available here: http://cid-ee034c9f620cd58d.office.live.com/self.aspx/BlogSamples/CodeMetricsSample.zip Running code metrics has been available since VS 2008, but only from inside the IDE. Yesterday Microsoft finally releases a Visual Studio Code Metrics Power Tool 10.0, a command line tool that lets you run code metrics on your applications.  This means that it is now possible to perform code metrics analysis on the build server as part of your nightly/QA builds (for example). In this post I will show how you can run the metrics command line tool, and also a custom activity that reads the output and appends the results to the build log, and also fails he build if the metric values exceeds certain (configurable) treshold values. The code metrics tool analyzes all the methods in the assemblies, measuring cyclomatic complexity, class coupling, depth of inheritance and lines of code. Then it calculates a Maintainability Index from these values that is a measure f how maintanable this method is, between 0 (worst) and 100 (best). For information on hwo this value is calculated, see http://blogs.msdn.com/b/codeanalysis/archive/2007/11/20/maintainability-index-range-and-meaning.aspx. After this it aggregates the information and present it at the class, namespace and module level as well. Running Metrics.exe in a build definition Running the actual tool is easy, just use a InvokeProcess activity last in the Compile the Project sequence, reference the metrics.exe file and pass the correct arguments and you will end up with a result XML file in the drop directory. Here is how it is done in the attached build process template: In the above sequence I first assign the path to the code metrics result file ([BinariesDirectory]\result.xml) to a variable called MetricsResultFile, which is then sent to the InvokeProcess activity in the Arguments property. Here are the arguments for the InvokeProcess activity: Note that we tell metrics.exe to analyze all assemblies located in the Binaries folder. You might want to do some more intelligent filtering here, you probably don’t want to analyze all 3rd party assemblies for example. Note also the path to the metrics.exe, this is the default location when you install the Code Metrics power tool. You must of course install the power tool on all build servers. Using the standard output logging (in the Handle Standard Output/Handle Error Output sections), we get the following output when running the build: Integrating Code Metrics into the build Having the results available next to the build result is nice, but we want to have results integrated in the build result itself, and also to affect the outcome of the build. The point of having QA builds that measure, for example, code metrics is to make it very clear how the code being built measures up to the standards of the project/company. Just having a XML file available in the drop location will not cause the developers to improve their code, but a (partially) failing build will! To do this, we need to write a custom activity that parses the metrics result file, logs it to the build log and fails the build if the values frfom the metrics is below/above some predefined treshold values. The custom activity performs the following steps Parses the XML. I’m using Linq 2 XSD for this, since the XML schema for the result file is available, it is vey easy to generate code that lets you query the structure using standard Linq operators. Runs through the metric result hierarchy and logs the metrics for each level and also verifies maintainability index and the cyclomatic complexity with the treshold values. The treshold values are defined in the build process template are are sent in as arguments to the custom activity If the treshold values are exceeded, the activity either fails or partially fails the current build. For more information about the structure of the code metrics result file, read Cameron Skinner's post about it. It is very simpe and easy to understand. I won’t go through the code of the custom activity here, since there is nothing special about it and it is available for download so you can look at it and play with it yourself. The treshold values for Maintainability Index and Cyclomatic Complexity is defined in the build process template, and can be modified per build definition: I have taken the default value for these settings from my colleague Terje Sandström post on Code Metrics - suggestions for approriate limits. You’ll notice that this is quite an improvement compared to using code metrics inside the IDE, where Red/Yellow/Green limits are fixed (and the default values are somewaht strange, see Terjes post for a discussion on this) This is the first version of the code metrics integration with TFS 2010 Build, I will proabably enhance the functionality and the logging (the “tree view” structure in the log becomes quite hard to read) soon. I will also consider adding it to the Community TFS Build Extensions site when it becomes a bit more mature. Another obvious improvement is to extend the data warehouse of TFS and push the metric results back to the warehouse and make it visible in the reports.

    Read the article

  • Availability Best Practices on Oracle VM Server for SPARC

    - by jsavit
    This is the first of a series of blog posts on configuring Oracle VM Server for SPARC (also called Logical Domains) for availability. This series will show how to how to plan for availability, improve serviceability, avoid single points of failure, and provide resiliency against hardware and software failures. Availability is a broad topic that has filled entire books, so these posts will focus on aspects specifically related to Oracle VM Server for SPARC. The goal is to improve Reliability, Availability and Serviceability (RAS): An article defining RAS can be found here. Oracle VM Server for SPARC Principles for Availability Let's state some guiding principles for availability that apply to Oracle VM Server for SPARC: Avoid Single Points Of Failure (SPOFs). Systems should be configured so a component failure does not result in a loss of application service. The general method to avoid SPOFs is to provide redundancy so service can continue without interruption if a component fails. For a critical application there may be multiple levels of redundancy so multiple failures can be tolerated. Oracle VM Server for SPARC makes it possible to configure systems that avoid SPOFs. Configure for availability at a level of resource and effort consistent with business needs. Effort and resource should be consistent with business requirements. Production has different availability requirements than test/development, so it's worth expending resources to provide higher availability. Even within the category of production there may be different levels of criticality, outage tolerances, recovery and repair time requirements. Keep in mind that a simple design may be more understandable and effective than a complex design that attempts to "do everything". Design for availability at the appropriate tier or level of the platform stack. Availability can be provided in the application, in the database, or in the virtualization, hardware and network layers they depend on - or using a combination of all of them. It may not be necessary to engineer resilient virtualization for stateless web applications applications where availability is provided by a network load balancer, or for enterprise applications like Oracle Real Application Clusters (RAC) and WebLogic that provide their own resiliency. It's (often) the same architecture whether virtual or not: For example, providing resiliency against a lost device path or failing disk media is done for the same reasons and may use the same design whether in a domain or not. It's (often) the same technique whether using domains or not: Many configuration steps are the same. For example, configuring IPMP or creating a redundant ZFS pool is pretty much the same within the guest whether you're in a guest domain or not. There are configuration steps and choices for provisioning the guest with the virtual network and disk devices, which we will discuss. Sometimes it is different using domains: There are new resources to configure. Most notable is the use of alternate service domains, which provides resiliency in case of a domain failure, and also permits improved serviceability via "rolling upgrades". This is an important differentiator between Oracle VM Server for SPARC and traditional virtual machine environments where all virtual I/O is provided by a monolithic infrastructure that itself is a SPOF. Alternate service domains are widely used to provide resiliency in production logical domains environments. Some things are done via logical domains commands, and some are done in the guest: For example, with Oracle VM Server for SPARC we provide multiple network connections to the guest, and then configure network resiliency in the guest via IP Multi Pathing (IPMP) - essentially the same as for non-virtual systems. On the other hand, we configure virtual disk availability in the virtualization layer, and the guest sees an already-resilient disk without being aware of the details. These blogs will discuss configuration details like this. Live migration is not "high availability" in the sense of "continuous availability": If the server is down, then you don't live migrate from it! (A cluster or VM restart elsewhere would be used). However, live migration can be part of the RAS (Reliability, Availability, Serviceability) picture by improving Serviceability - you can move running domains off of a box before planned service or maintenance. The blog Best Practices - Live Migration on Oracle VM Server for SPARC discusses this. Topics Here are some of the topics that will be covered: Network availability using IP Multipathing and aggregates Disk path availability using virtual disks defined with multipath groups ("mpgroup") Disk media resiliency configuring for redundant disks that can tolerate media loss Multiple service domains - this is probably the most significant item and the one most specific to Oracle VM Server for SPARC. It is very widely deployed in production environments as the means to provide network and disk availability, but it can be confusing. Subsequent articles will describe why and how to configure multiple service domains. Note, for the sake of precision: an I/O domain is any domain that has a physical I/O resource (such as a PCIe bus root complex). A service domain is a domain providing virtual device services to other domains; it is almost always an I/O domain too (so it can have something to serve). Resources Here are some important links; we'll be drawing on their content in the next several articles: Oracle VM Server for SPARC Documentation Maximizing Application Reliability and Availability with SPARC T5 Servers whitepaper by Gary Combs Maximizing Application Reliability and Availability with the SPARC M5-32 Server whitepaper by Gary Combs Summary Oracle VM Server for SPARC offers features that can be used to provide highly-available environments. This and the following blog entries will describe how to plan and deploy them.

    Read the article

  • Windows in StreamInsight: Hopping vs. Snapshot

    - by Roman Schindlauer
    Three weeks ago, we explained the basic concept of windows in StreamInsight: defining sets of events that serve as arguments for set-based operations, like aggregations. Today, we want to discuss the so-called Hopping Windows and compare them with Snapshot Windows. We will compare these two, because they can serve similar purposes with different behaviors; we will discuss the remaining window type, Count Windows, another time. Hopping (and its syntactic-sugar-sister Tumbling) windows are probably the most straightforward windowing concept in StreamInsight. A hopping window is defined by its length, and the offset from one window to the next. They are aligned with some absolute point on the timeline (which can also be given as a parameter to the window) and create sets of events. The diagram below shows an example of a hopping window with length of 1h and hop size (the offset) of 15 minutes, hence creating overlapping windows:   Two aspects in this diagram are important: Since this window is overlapping, an event can fall into more than one windows. If an (interval) event spans a window boundary, its lifetime will be clipped to the window, before it is passed to the set-based operation. That’s the default and currently only available window input policy. (This should only concern you if you are using a time-sensitive user-defined aggregate or operator.) The set-based operation will be applied to each of these sets, yielding a result. This result is: A single scalar value in case of built-in or user-defined aggregates. A subset of the input payloads, in case of the TopK operator. Arbitrary events, when using a user-defined operator. The timestamps of the result are almost always the ones of the windows. Only the user-defined  operator can create new events with timestamps. (However, even these event lifetimes are subject to the window’s output policy, which is currently always to clip to the window end.) Let’s assume we were calculating the sum over some payload field: var result = from window in source.HoppingWindow( TimeSpan.FromHours(1), TimeSpan.FromMinutes(15), HoppingWindowOutputPolicy.ClipToWindowEnd) select new { avg = window.Avg(e => e.Value) }; Now each window is reflected by one result event:   As you can see, the window definition defines the output frequency. No matter how many or few events we got from the input, this hopping window will produce one result every 15 minutes – except for those windows that do not contain any events at all, because StreamInsight window operations are empty-preserving (more about that another time). The “forced” output for every window can become a performance issue if you have a real-time query with many events in a wide group & apply – let me explain: imagine you have a lot of events that you group by and then aggregate within each group – classical streaming pattern. The hopping window produces a result in each group at exactly the same point in time for all groups, since the window boundaries are aligned with the timeline, not with the event timestamps. This means that the query output will become very bursty, delivering the results of all the groups at the same point in time. This becomes especially obvious if the events are long-lasting, spanning multiple windows each, so that the produced result events do not change their value very often. In such a case, a snapshot window can remedy. Snapshot windows are more difficult to explain than hopping windows: they represent those periods in time, when no event changes occur. In other words, if you mark all event start and and times on your timeline, then you are looking at all snapshot window boundaries:   If your events are never overlapping, the snapshot window will not make much sense. It is commonly used together with timestamp modification, which make it a very powerful tool. Or as Allan Mitchell expressed in in a recent tweet: “I used to look at SnapshotWindow() with disdain. Now she is my mistress, the one I turn to in times of trouble and need”. Let’s look at a simple example: I want to compute the average of some value in my events over the last minute. I don’t want this output be produced at fixed intervals, but at soon as it changes (that’s the true event-driven spirit!). The snapshot window will include all currently active event at each point in time, hence we need to extend our original events’ lifetimes into the future: Applying the Snapshot window on these events, it will appear to be “looking back into the past”: If you look at the result produced in this diagram, you can easily prove that, at each point in time, the current event value represents the average of all original input event within the last minute. Here is the LINQ representation of that query, applying the lifetime extension before the snapshot window: var result = from window in source .AlterEventDuration(e => TimeSpan.FromMinutes(1)) .SnapshotWindow(SnapshotWindowOutputPolicy.Clip) select new { avg = window.Avg(e => e.Value) }; With more complex modifications of the event lifetimes you can achieve many more query patterns. For instance “running totals” by keeping the event start times, but snapping their end times to some fixed time, like the end of the day. Each snapshot then “sees” all events that have happened in the respective time period so far. Regards, The StreamInsight Team

    Read the article

  • NDepend Evaluation: Part 3

    - by Anthony Trudeau
    NDepend is a Visual Studio add-in designed for intense code analysis with the goal of high code quality. NDepend uses a number of metrics and aggregates the data in pleasing static and active visual reports. My evaluation of NDepend will be broken up into several different parts. In the first part of the evaluation I looked at installing the add-in.  And in the last part I went over my first impressions including an overview of the features.  In this installment I provide a little more detail on a few of the features that I really like. Dependency Matrix The dependency matrix is one of the rich visual components provided with NDepend.  At a glance it lets you know where you have coupling problems including cycles.  It does this with number indicating the weight of the dependency and a color-coding that indicates the nature of the dependency. Green and blue cells are direct dependencies (with the difference being whether the relationship is from row-to-column or column-to-row).  Black cells are the ones that you really want to know about.  These indicate that you have a cycle.  That is, type A refers to type B and type B also refers to Type A. But, that’s not the end of the story.  A handy pop-up appears when you hover over the cell in question.  It explains the color, the dependency, and provides several interesting links that will teach you more than you want to know about the dependency. You can double-click the problem cells to explode the dependency.  That will show the dependencies on a method-by-method basis allowing you to more easily target and fix the problem.  When you’re done you can click the back button on the toolbar. Dependency Graph The dependency graph is another component provided.  It’s complementary to the dependency matrix, but it isn’t as easy to identify dependency issues using the window. On a positive note, it does provide more information than the matrix. My biggest issue with the dependency graph is determining what is shown.  This was not readily obvious.  I ended up using the navigation buttons to get an acceptable view.  I would have liked to choose what I see. Once you see the types you want you can get a decent idea of coupling strength based on the width of the dependency lines.  Double-arrowed lines are problematic and are shown in red.  The size of the boxes will be related to the metric being displayed.  This is controlled using the Box Size drop-down in the toolbar.  Personally, I don’t find the size of the box to be helpful, so I change it to Constant Font. One nice thing about the display is that you can see the entire path of dependencies when you hover over a type.  This is done by color-coding the dependencies and dependants.  It would be nice if selecting the box for the type would lock the highlighting in place. I did find a perhaps unintended work-around to the color-coding.  You can lock the color-coding in by hovering over the type, right-clicking, and then clicking on the canvas area to clear the pop-up menu.  You can then do whatever with it including saving it to an image file with the color-coding. CQL NDepend uses a code query language (CQL) to work with your code just like it was a database.  CQL cannot be confused with the robustness of T-SQL or even LINQ, but it represents an impressive attempt at providing an expressive way to enumerate and interrogate your code. There are two main windows you’ll use when working with CQL.  The CQL Query Explorer allows you to define what queries (rules) are run as part of a report – I immediately unselected rules that I don’t want in my results.  The CQL Query Edit window is where you can view or author your own rules.  The explorer window is pretty self-explanatory, so I won’t mention it further other than to say that any queries you author will appear in the custom group. Authoring your own queries is really hard to screw-up.  The Intellisense-like pop-ups tell you what you can do while making composition easy.  I was able to create a query within two minutes of playing with the editor.  My query warns if any types that are interfaces don’t start with an “I”. WARN IF Count > 0 IN SELECT TYPES WHERE IsInterface AND !NameLike “I” The results from the CQL Query Edit window are immediate. That fact makes it useful for ad hoc querying.  It’s worth mentioning two things that could make the experience smoother.  First, out of habit from using Visual Studio I expect to be able to scroll and press Tab to select an item in the list (like Intellisense).  You have to press Enter when you scroll to the item you want.  Second, the commands are case-sensitive.  I don’t see a really good reason to enforce that. CQL has a lot of potential not just in enforcing code quality, but also enforcing architectural constraints that your enterprise has defined. Up Next My next update will be the final part of the evaluation.  I will summarize my experience and provide my conclusions on the NDepend add-in. ** View Part 1 of the Evaluation ** ** View Part 2 of the Evaluation ** Disclaimer: Patrick Smacchia contacted me about reviewing NDepend. I received a free license in return for sharing my experiences and talking about the capabilities of the add-in on this site. There is no expectation of a positive review elicited from the author of NDepend.

    Read the article

  • Get non-overlapping dates ranges for prices history data

    - by Anonymouse
    Hello, Let's assume that I have the following table: CREATE TABLE [dbo].[PricesHist]( [Product] varchar NOT NULL, [Price] [float] NOT NULL, [StartDate] [datetime] NOT NULL, [EndDate] [datetime] NOT NULL ) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D2C00000000 AS DateTime), CAST(0x00009D2C00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D2D00000000 AS DateTime), CAST(0x00009D2D00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 2.5, CAST(0x00009D2E00000000 AS DateTime), CAST(0x00009D2E00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3000000000 AS DateTime), CAST(0x00009D3000000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3100000000 AS DateTime), CAST(0x00009D3100000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3400000000 AS DateTime), CAST(0x00009D3400000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 2.5, CAST(0x00009D3500000000 AS DateTime), CAST(0x00009D3500000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3600000000 AS DateTime), CAST(0x00009D3600000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3700000000 AS DateTime), CAST(0x00009D3700000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3800000000 AS DateTime), CAST(0x00009D3800000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3A00000000 AS DateTime), CAST(0x00009D3A00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3B00000000 AS DateTime), CAST(0x00009D3B00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 2.5, CAST(0x00009D3C00000000 AS DateTime), CAST(0x00009D3C00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3D00000000 AS DateTime), CAST(0x00009D3D00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3E00000000 AS DateTime), CAST(0x00009D3E00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D3F00000000 AS DateTime), CAST(0x00009D3F00000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4100000000 AS DateTime), CAST(0x00009D4100000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4200000000 AS DateTime), CAST(0x00009D4200000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 2.5, CAST(0x00009D4300000000 AS DateTime), CAST(0x00009D4300000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4400000000 AS DateTime), CAST(0x00009D4400000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4500000000 AS DateTime), CAST(0x00009D4500000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4600000000 AS DateTime), CAST(0x00009D4600000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 4.9, CAST(0x00009D4800000000 AS DateTime), CAST(0x00009D4800000000 AS DateTime)) INSERT [dbo].[PricesHist] ([Product], [Price], [StartDate], [EndDate]) VALUES (N'Apples', 2.5, CAST(0x00009D4A00000000 AS DateTime), CAST(0x00009D4A00000000 AS DateTime)) As you can see, there are two prices on that month for Apples. 4.90 and 2.50. In order to tidy this table up, I need to get this information as a date range rather than a row per day as it currently is. I can obviously do this with Min and Max aggregates easily but the ranges overlap and other business code expect non-overlapping ranges. I also tried to achieve this with self joins and row_number(), but without much success... Here is what I'm trying to achieve as the output: Product | StartDate | EndDate | Price ------------------------------------------- Apples | 01 Mar 2010 | 02 Mar 2010 | 4.90 Apples | 03 Mar 2010 | 03 Mar 2010 | 2.50 Apples | 05 Mar 2010 | 09 Mar 2010 | 4.90 Apples | 10 Mar 2010 | 10 Mar 2010 | 2.50 Apples | 11 Mar 2010 | 16 Mar 2010 | 4.90 Apples | 17 Mar 2010 | 17 Mar 2010 | 2.50 Apples | 18 Mar 2010 | 23 Mar 2010 | 4.90 Apples | 24 Mar 2010 | 24 Mar 2010 | 2.50 Apples | 25 Mar 2010 | 30 Mar 2010 | 4.90 Apples | 31 Mar 2010 | 31 Mar 2010 | 2.50 What would please be the best approach to get this done? Thanks a lot in advance,

    Read the article

  • debugging JBoss 100% CPU usage

    - by NateS
    Originally posted on Server Fault, where it was suggested this question might better asked here. We are using JBoss to run two of our WARs. One is our web app, the other is our web service. The web app accesses a database on another machine and makes requests to the web service. The web service makes JMS requests to other machines, aggregates the data, and returns it. At our biggest client, about once a month the JBoss Java process takes 100% of all CPUs. The machine running JBoss has 8 CPUs. Our web app is still accessible during this time, however pages take about 3 minutes to load. Restarting JBoss restores everything to normal. The database machine and all the other machines are fine, only the machine running JBoss is affected. Memory usage is normal. Network utilization is normal. There are no suspect error messages in the JBoss logs. I have set up a test environment as close as possible to the client's production environment and I've done load testing with as much as 2x the number of concurrent users. I have not gotten my test environment to replicate the problem. Where do we go from here? How can we narrow down the problem? Currently the only plan we have is to wait until the problem occurs in production on its own, then do some debugging to determine the cause. So far people have just restarted JBoss when the problem occurred to minimize down time. Next time it happens they will get a developer to take a look. The question is, next time it happens, what can be done to determine the cause? We could setup a separate JBoss instance on the same box and install the web app separately from the web service. This way when the problem next occurs we will know which WAR has the problem (assuming it is our code). This doesn't narrow it down much though. Should I enable JMX remote? This way the next time the problem occurs I can connect with VisualVM and see which threads are taking the CPU and what the hell they are doing. However, is there a significant down side to enabling JMX remote in a production environment? Is there another way to see what threads are eating the CPU and to get a stacktrace to see what they are doing? Any other ideas? Thanks!

    Read the article

  • Django : presenting a form very different from the model and with multiple field values in a Django-

    - by sebpiq
    Hi ! I'm currently doing a firewall management application for Django, here's the (simplified) model : class Port(models.Model): number = models.PositiveIntegerField(primary_key=True) application = models.CharField(max_length=16, blank=True) class Rule(models.Model): port = models.ForeignKey(Port) ip_source = models.IPAddressField() ip_mask = models.IntegerField(validators=[MaxValueValidator(32)]) machine = models.ForeignKey("vmm.machine") What I would like to do, however, is to display to the user a form for entering rules, but with a very different organization than the model : Port 80 O Not open O Everywhere O Specific addresses : --------- delete field --------- delete field + add address field Port 443 ... etc Where Not open means that there is no rule for the given port, Everywhere means that there is only ONE rule (0.0.0.0/0) for the given port, and with specific addresses, you can add as many addresses as you want (I did this with JQuery), which will make as many rules. Now I did a version completely "handmade", meaning that I create the forms entirely in my templates, set input names with a prefix, and parse all the POSTed stuff in my view (which is quite painful, and means that there's no point in using a web framework). I also have a class which aggregates the rules together to easily pre-fill the forms with the informations "not open, everywhere, ...". I'm passing a list of those to the template, therefore it acts as an interface between my model and my "handmade" form : class MachinePort(object): def __init__(self, machine, port): self.machine = machine self.port = port @property def fully_open(self): for rule in self.port.rule_set.filter(machine=self.machine): if ipaddr.IPv4Network("%s/%s" % (rule.ip_source, rule.ip_mask)) == ipaddr.IPv4Network("0.0.0.0/0"): return True else : return False @property def partly_open(self): return bool(self.port.rule_set.filter(machine=self.machine)) and not self.fully_open @property def not_open(self): return not self.partly_open and not self.fully_open But all this is rather ugly ! Do anyone of you know if there is a classy way to do this ? In particular with the form... I don't know how to have a form that can have an undefined number of fields, neither how to transform these fields into Rule objects (because all the rule fields would have to be gathered from the form), neither how to save multiple objects... Well I could try to hack into the Form class, but seems like too much work for such a special case. Is there any nice feature I'm missing ?

    Read the article

  • Optimize date query for large child tables: GiST or GIN?

    - by Dave Jarvis
    Problem 72 child tables, each having a year index and a station index, are defined as follows: CREATE TABLE climate.measurement_12_013 ( -- Inherited from table climate.measurement_12_013: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass), -- Inherited from table climate.measurement_12_013: station_id integer NOT NULL, -- Inherited from table climate.measurement_12_013: taken date NOT NULL, -- Inherited from table climate.measurement_12_013: amount numeric(8,2) NOT NULL, -- Inherited from table climate.measurement_12_013: category_id smallint NOT NULL, -- Inherited from table climate.measurement_12_013: flag character varying(1) NOT NULL DEFAULT ' '::character varying, CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7), CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12) ) INHERITS (climate.measurement) CREATE INDEX measurement_12_013_s_idx ON climate.measurement_12_013 USING btree (station_id); CREATE INDEX measurement_12_013_y_idx ON climate.measurement_12_013 USING btree (date_part('year'::text, taken)); (Foreign key constraints to be added later.) The following query runs abysmally slow due to a full table scan: SELECT count(1) AS measurements, avg(m.amount) AS amount FROM climate.measurement m WHERE m.station_id IN ( SELECT s.id FROM climate.station s, climate.city c WHERE -- For one city ... -- c.id = 5182 AND -- Where stations are within an elevation range ... -- s.elevation BETWEEN 0 AND 3000 AND 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 ) AND -- -- Begin extracting the data from the database. -- -- The data before 1900 is shaky; insufficient after 2009. -- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND -- Whittled down by category ... -- m.category_id = 1 AND m.taken BETWEEN -- Start date. (extract( YEAR FROM m.taken )||'-01-01')::date AND -- End date. Calculated by checking to see if the end date wraps -- into the next year. If it does, then add 1 to the current year. -- (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date GROUP BY extract( YEAR FROM m.taken ) The sluggishness comes from this part of the query: m.taken BETWEEN /* Start date. */ (extract( YEAR FROM m.taken )||'-01-01')::date AND /* End date. Calculated by checking to see if the end date wraps into the next year. If it does, then add 1 to the current year. */ (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date The HashAggregate from the plan shows a cost of 10006220141.11, which is, I suspect, on the astronomically huge side. There is a full table scan on the measurement table (itself having neither data nor indexes) being performed. The table aggregates 237 million rows from its child tables. Question What is the proper way to index the dates to avoid full table scans? Options I have considered: GIN GiST Rewrite the WHERE clause Separate year_taken, month_taken, and day_taken columns to the tables What are your thoughts? Thank you!

    Read the article

  • Is a many-to-many relationship with extra fields the right tool for my job?

    - by whichhand
    Previously had a go at asking a more specific version of this question, but had trouble articulating what my question was. On reflection that made me doubt if my chosen solution was correct for the problem, so this time I will explain the problem and ask if a) I am on the right track and b) if there is a way around my current brick wall. I am currently building a web interface to enable an existing database to be interrogated by (a small number of) users. Sticking with the analogy from the docs, I have models that look something like this: class Musician(models.Model): first_name = models.CharField(max_length=50) last_name = models.CharField(max_length=50) dob = models.DateField() class Album(models.Model): artist = models.ForeignKey(Musician) name = models.CharField(max_length=100) class Instrument(models.Model): artist = models.ForeignKey(Musician) name = models.CharField(max_length=100) Where I have one central table (Musician) and several tables of associated data that are related by either ForeignKey or OneToOneFields. Users interact with the database by creating filtering criteria to select a subset of Musicians based on data the data on the main or related tables. Likewise, the users can then select what piece of data is used to rank results that are presented to them. The results are then viewed initially as a 2 dimensional table with a single row per Musician with selected data fields (or aggregates) in each column. To give you some idea of scale, the database has ~5,000 Musicians with around 20 fields of related data. Up to here is fine and I have a working implementation. However, it is important that I have the ability for a given user to upload there own annotation data sets (more than one) and then filter and order on these in the same way they can with the existing data. The way I had tried to do this was to add the models: class UserDataSets(models.Model): user = models.ForeignKey(User) name = models.CharField(max_length=100) description = models.CharField(max_length=64) results = models.ManyToManyField(Musician, through='UserData') class UserData(models.Model): artist = models.ForeignKey(Musician) dataset = models.ForeignKey(UserDataSets) score = models.IntegerField() class Meta: unique_together = (("artist", "dataset"),) I have a simple upload mechanism enabling users to upload a data set file that consists of 1 to 1 relationship between a Musician and their "score". Within a given user dataset each artist will be unique, but different datasets are independent from each other and will often contain entries for the same musician. This worked fine for displaying the data, starting from a given artist I can do something like this: artist = Musician.objects.get(pk=1) dataset = UserDataSets.objects.get(pk=5) print artist.userdata_set.get(dataset=dataset.pk) However, this approach fell over when I came to implement the filtering and ordering of query set of musicians based on the data contained in a single user data set. For example, I could easily order the query set based on all of the data in the UserData table like this: artists = Musician.objects.all().order_by(userdata__score) But that does not help me order by the results of a given single user dataset. Likewise I need to be able to filter the query set based on the "scores" from different user data sets (eg find all musicians with a score 5 in dataset1 and < 2 in dataset2). Is there a way of doing this, or am I going about the whole thing wrong?

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Five Ways Enterprise 2.0 Can Transform Your Business - Q&A from the Webcast

    - by [email protected]
    A few weeks ago, Vince Casarez and I presented with KMWorld on the Five Ways Enterprise 2.0 Can Transform Your Business. It was an enjoyable, interactive webcast in which Vince and I discussed the ways Enterprise 2.0 can transform your business and more importantly, highlighted key customer examples of how to do so. If you missed the webcast, you can catch a replay here. We had a lot of audience participation in some of the polls we conducted and in the Q&A session. We weren't able to address all of the questions during the broadcast, so we attempted to answer them here: Q: Which area within your firm focuses on Web 2.0? Meaning, do you find new departments developing just to manage the web 2.0 (Twitter, Facebook, etc.) user experience or are you structuring current departments? A: There are three distinct efforts within Oracle. The first is around delivery of these Web 2.0 services for enterprise deployments. This is the focus of the WebCenter team. The second effort is injecting these Web 2.0 services into use cases that drive the different enterprise applications. This effort is focused on how to manage these external services and bring them into a cohesive flow for marketing programs, customer care, and purchasing. The third effort is how we consume these services internally to enhance Oracle's business delivery. It leverages the technologies and use cases of the first two but also pushes the envelope with regards to future directions of these other two areas. Q: In a business, Web 2.0 is mostly like action logs. How can we leverage the official process practice versus the logs of a recent action? Example: a system configuration modified last night on a call out versus the official practice that everybody would use in the morning.A: The key thing to remember is that most Web 2.0 actions / activity streams today are based on collaboration and communication type actions. At least with public social sites like Facebook and Twitter. What we're delivering as part of the WebCenter Suite are not just these types of activities but also enterprise application activities. These enterprise application activities come from different application modules: purchasing, HR, order entry, sales opportunity, etc. The actions within these systems are normally tied to a business object or process: purchase order/customer, employee or department, customer and supplier, customer and product, respectively. Therefore, the activities or "logs" as you name them are able to be "typed" so that as a viewer, you can filter or decide to see only certain types of information. In your example, you could have a view that only showed you recent "configuration" changes and this could be right next to a view that showed off the items to be watched every morning. Q: It's great to hear about customers using the software but is there any plan for future webinars to show what the products/installs look like? That would be very helpful.A: We don't have a webinar planned to show off the install process. However, we have a viewlet that's posted on Oracle Technology Network. You can see it here:http://www.oracle.com/technetwork/testcontent/wcs-install-098014.htmlAnd we've got excellent documentation that walks you through the steps here:http://download.oracle.com/docs/cd/E14571_01/install.1111/e12001/install.htmAnd there's a whole set of demos and examples of what WebCenter can do at this URL:http://www.oracle.com/technetwork/middleware/webcenter/release11-demos-097468.html Q: How do you anticipate managing metadata across the enterprise to make content findable?A: We need to first make sure we are all talking about the same thing when we use a word like "metadata". Here's why...  For a developer, metadata means information that describes key elements of the portal or application and what the portal or application can do. For content systems, metadata means key terms that provide a taxonomy or folksonomy about the information that is being indexed, ordered, and managed. For business intelligence systems, metadata means key terms that provide labels to groups of data that most non-mathematicians need to understand. And for SOA, metadata means labels for parts of the processes that business owners should understand that connect development terminology. There are also additional requirements for metadata to be available to the team building these new solutions as well as requirements to make this metadata available to the running system. These requirements are often separated by "design time" and "run time" respectively. So clearly, a general goal of managing metadata across the enterprise is very challenging. We've invested a huge amount of resources around Oracle Metadata Services (MDS) to be able to provide a more generic system for all of these elements. No other vendor has anything like this technology foundation in their products. This provides a huge benefit to our customers as they will now be able to find content, processes, people, and information from a common set of search interfaces with consistent enterprise wide results. Q: Can you give your definition of terms as to document and content, please?A: Content applies to a broad category of information from Word documents, presentations and reports through attachments to invoices and/or purchase orders. Content is essentially any type of digital asset including images, video, and voice. A document is just one type of content. Q: Do you have special integration tools to realize an interaction between UCM and WebCenter Spaces/Services?A: Yes, we've dedicated a whole team of engineers to exploit the key features of Oracle UCM within WebCenter.  While ensuring that WebCenter can connect to other non-Oracle systems, we've made sure that with the combined set of Oracle technology, no other solution can match the combined power and integration.  This is part of the Oracle Fusion Middleware strategy which is to provide best in class capabilities for Content and Portals.  When combined together, the synergy between the two products enables users to quickly add capabilities when they are needed.  For example, simple document sharing is part of the combined product offering, but if legal discovery or archiving is required, Oracle UCM product includes these capabilities that can be quickly added.  There's no need to move content around or add another system to support this, it's just a feature that gets turned on within Oracle UCM. Q: All customers have some interaction with their applications and have many older versions, how do you see some of these new Enterprise 2.0 capabilities adding value to existing enterprise application deployments?A: Just as Service Oriented Architectures allowed for connecting the processes of different applications systems to work together, there's a need for a similar approach with regards to these enterprise 2.0 capabilities. Oracle WebCenter is built on a core architecture that allows for SOA of these Enterprise 2.0 services so that one set of scalable services can be used and integrated directly into any type of application. In this way, users can get immediate value out of the Enterprise 2.0 capabilities without having to wait for the next major release or upgrade. These centrally managed WebCenter services expose a set of standard interfaces that make it extremely easy to add them into existing applications no matter what technology the application has been implemented. Q: We've heard about Oracle Next Generation applications called "Fusion Applications", can you tell me how all this works together?A: Oracle WebCenter powers the core collaboration and social computing services found within Fusion Applications. It is the core user experience technology for how all the application screens have been implemented. And the core concept of task flows allows for all the Fusion Applications modules to be adaptable and composable by business users and IT without needing to be a professional developer. Oracle WebCenter is at the heart of the new Fusion Applications. In addition, the same patterns and technologies are now being added to the existing applications including JD Edwards, Siebel, Peoplesoft, and eBusiness Suite. The core technology enables all these customers to have a much smoother upgrade path to Fusion Applications. They get immediate benefits of injecting new user interactions into their existing applications without having to completely move to Fusion Applications. And then when the time comes, their users will already be well versed in how the new capabilities work. Q: Does any of this work with non Oracle software? Other databases? Other application servers? etc.A: We have made sure that Oracle WebCenter delivers the broadest set of development choices so that no matter what technology you developers are using, WebCenter capabilities can be quickly and easily added to the site or application. In addition, we have certified Oracle WebCenter to run against non-Oracle databases like DB2 and SQLServer. We have stated plans for certification against MySQL as well. Later in CY 2011, Oracle will provide certification on non-Oracle application servers such as WebSphere and JBoss. Q: How do we balance User and IT requirements in regards to Enterprise 2.0 technologies?A: Wrong decisions are often made because employee knowledge is not tapped efficiently and opportunities to innovate are often missed because the right people do not work together. Collaboration amongst workers in the right business context is critical for success. While standalone Enterprise 2.0 technologies can improve collaboration for collaboration's sake, using social collaboration tools in the context of business applications and processes will improve business responsiveness and lead companies to a more competitive position. As these systems become more mission critical it is essential that they maintain the highest level of performance and availability while scaling to support larger communities. Q: What are the ways in which Enterprise 2.0 can improve business responsiveness?A: With a wide range of Enterprise 2.0 tools in the marketplace, CIOs need to deploy solutions that will meet the requirements from users as well as address the requirements from IT. Workers want a next-generation user experience that is personalized and aggregates their daily tools and tasks, while IT needs to ensure the solution is secure, scalable, flexible, reliable and easily integrated with existing systems. An open and integrated approach to deploying portals, content management, and collaboration can enhance your business by addressing both the needs of knowledge workers for better information and the IT mandate to conserve resources by simplifying, consolidating and centralizing infrastructure and administration.  

    Read the article

  • Rebuilding CoasterBuzz, Part III: The architecture using the "Web stack of love"

    - by Jeff
    This is the third post in a series about rebuilding one of my Web sites, which has been around for 12 years. I hope to relaunch in the next month or two. More: Part I: Evolution, and death to WCF Part II: Hot data objects I finally hit a point in the re-do of CoasterBuzz where I feel like the major pieces are in place... rewritten, ported and what not, so that I can focus now on front-end design and more interesting creative problems. I've been asked on more than one occasion (OK, just twice) what's going on under the covers, so I figure this might be a good time to explain the overall architecture. As it turns out, I'm using a whole lof of the "Web stack of love," as Scott Hanselman likes to refer to it. Oh that Hanselman. First off, at the center of it all, is BizTalk. Just kidding. That's "enterprise architecture" humor, where every discussion starts with how they'll use BizTalk. Here are the bigger moving parts: It's fairly straight forward. A common library lives in a number of Web apps, all of which are (or will be) powered by ASP.NET MVC 4. They all talk to the same database. There is the main Web site, which also has the endpoint for the Silverlight-based Feed app. The cstr.bz site handles redirects, which are generated when news items are published and sent to Twitter. Facebook publishing is handled via the RSS Graffiti Facebook app. The API site handles requests from the Windows Phone app. The main site depends very heavily on POP Forums, the open source, MVC-based forum I maintain. It serves a number of functions, primarily handling users. These user objects serve in non-forum roles to handle things like news and database contributions, maintaining track records (coaster nerd for "list of rides I've been on") and, perhaps most importantly, paid club memberships. Before I get into more specifics, note that the "glue" for everything is Ninject, the dependency injection framework. I actually prefer StructureMap these days, but I started with Ninject in POP Forums a long time ago. POP Forums has a static class, PopForumsActivation, that new's up an instance of the container, and you can call it from where ever. The downside is that the forums require Ninject in your MVC app as the default dependency resolver. At some point, I'll decouple it, but for now it's not in the way. In the general sense, the entire set of apps follow a repository-service-controller-view pattern. Repos just do data access, service classes do business logic, controllers compose and route, views view. The forum also provides Scoring Game functionality. The Scoring Game is a reasonably abstract framework to award users points based on certain actions, and then award achievements when a certain number of point events happen. For example, the forum already awards a point when someone plus-one's a post you made. You can set up an achievement that says, "Give the user an award when they've had 100 posts plus'd." It also does zero-point entries into the ledger, so if you make a post, you could award an achievement based on 100 posts made. Wiring in the scoring game to CoasterBuzz functionality is just a matter of going to the Ninject container and getting an instance of the event publisher, and passing it events. Forum adapters were introduced into POP Forums a few versions ago, and they can intercept the model generated for forum topic lists and threads and designate an alternate view. These are used to make the "Day in Pictures" forum, where users can upload photos as frame-by-frame photo threads. Another adapter adds an association UI, so users can associate specific amusement parks with their trip report posts. The Silverlight-based Feed app talks to a simple JSON endpoint in the main app. This uses an underlying library I wrote ages ago, simply called Feeds, that aggregates event information. You inherit from a base class that creates instances of a publisher interface, and then use that class to send it an event type and any number of data fields. Feeds has two publishers: One is to the database, and that's used for the endpoint that talks to the Silverlight app. The second publisher publishes to Twitter, if the event is of the type "news." The wiring is a little strange, because for the new posts and topics events, I'm actually pulling out the forum repository classes from the Ninject container and replacing them with overridden methods to publish. I should probably be doing this at the service class level, but whatever. It's my mess. cstr.bz doesn't do anything interesting. It looks up the path, and if it has a match, does a 301 redirect to the long URL. The API site just serves up JSON for the Windows Phone app. The Windows Phone app is Silverlight, of course, and there isn't much to it. It does use the control toolkit, but beyond that, it relies on a simple class that creates a Webclient and calls the server for JSON to deserialize. The same class is now used by the Feed app, which used to use WCF. Simple is better. Data access in POP Forums is all straight SQL, because a lot of it was ported from the ASP.NET version. Most CoasterBuzz data access is handled by the Entity Framework, using the code-first model. The context class in this case does a lot of work to make sure that the table and key mapping works, since much of it breaks from the normal conventions of EF. One of the more powerful things you can do with EF, once you understand the little gotchas, is split tables by row into different entities. For example, a roller coaster photo has everything in the same row, including the metadata, the thumbnail bytes and the image itself. Obviously, if you want to get a list of photos to iterate over in a view, you don't want to get the image data. The use of navigation properties makes it easier to get just what you want. The front end includes Razor views in MVC, and jQuery is used for client-side goodness. I'm also using jQuery UI in a few places, for tabs, a dialog box and autocomplete. I'm also, tentatively, using jQuery Mobile. I've already ported most forum views to Mobile, but they need some work as v1.1 isn't finished yet. I'm not sure if I'll ship CoasterBuzz with mobile views or not yet. It's on the radar, but not something in my delivery criteria. That covers all of the big frameworks in play. Next time I hope to talk more about the front-end experience, which to me is where most of the fun is these days. Hoping to launch in the next month or two. Getting tired of looking at the old site!

    Read the article

< Previous Page | 1 2 3 4 5  | Next Page >