Search Results

Search found 946 results on 38 pages for 'surrogate pairs'.

Page 25/38 | < Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >

  • Awk or Sed: File Annotation

    - by lukmac
    Hallo, my SO friend, my question is: Specification: annotate the fields of FILE_2 to the corresponding position of FILE_1. A field is marked, and hence identified, by a delimiter pair. I did this job in python before I knew awk and sed, with a couple hundred lines of code. Now I want to see how powerful and efficient awk and sed can be. Show me some masterpiece of awk or sed, please! The delimiter pairs can be configured in FILE_3, but let's assume the first delimiter in a pair is 'Marker (number i) start', the other one is 'Marker (number i) done' Example: |-----------------FILE_1------------------| text text text text blabla Marker_1_start Marker_1_done any text in between blabla Marker_2_start Marker_2_done text text |-----------------FILE_2------------------| Marker_1_start 11 1111 Marker_1_done Marker_2_start 2222 22 Marker_2_done Expected Output: |-----------------FILE_Out------------------| text text text text blabla Marker_1_start 11 1111 Marker_1_done any text in between blabla Marker_2_start 2222 22 Marker_2_done text text

    Read the article

  • Fastest method in merging of the two: dicts vs lists

    - by tipu
    I'm doing some indexing and memory is sufficient but CPU isn't. So I have one huge dictionary and then a smaller dictionary I'm merging into the bigger one: big_dict = {"the" : {"1" : 1, "2" : 1, "3" : 1, "4" : 1, "5" : 1}} smaller_dict = {"the" : {"6" : 1, "7" : 1}} #after merging resulting_dict = {"the" : {"1" : 1, "2" : 1, "3" : 1, "4" : 1, "5" : 1, "6" : 1, "7" : 1}} My question is for the values in both dicts, should I use a dict (as displayed above) or list (as displayed below) when my priority is to use as much memory as possible to gain the most out of my CPU? For clarification, using a list would look like: big_dict = {"the" : [1, 2, 3, 4, 5]} smaller_dict = {"the" : [6,7]} #after merging resulting_dict = {"the" : [1, 2, 3, 4, 5, 6, 7]} Side note: The reason I'm using a dict nested into a dict rather than a set nested in a dict is because JSON won't let me do json.dumps because a set isn't key/value pairs, it's (as far as the JSON library is concerned) {"a", "series", "of", "keys"} Also, after choosing between using dict to a list, how would I go about implementing the most efficient, in terms of CPU, method of merging them? I appreciate the help.

    Read the article

  • Improve SQL query performance

    - by Anax
    I have three tables where I store actual person data (person), teams (team) and entries (athlete). The schema of the three tables is: In each team there might be two or more athletes. I'm trying to create a query to produce the most frequent pairs, meaning people who play in teams of two. I came up with the following query: SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq FROM person p1, athlete a1, person p2, athlete a2 WHERE p1.id = a1.person_id AND p2.id = a2.person_id AND a1.team_id = a2.team_id AND a1.team_id IN ( SELECT id FROM team, athlete WHERE team.id = athlete.team_id GROUP BY team.id HAVING COUNT(*) = 2 ) GROUP BY p1.id ORDER BY freq DESC Obviously this is a resource consuming query. Is there a way to improve it?

    Read the article

  • How to reverse a dictionary that it has repeated values (python)

    - by Galois
    Hi guys! So, I have a dictionary with almost 100,000 (key, values) pairs and the majority of the keys map to the same values. For example imagine something like that: dict = {'a': 1, 'c': 2, 'b': 1, 'e': 2, 'd': 3, 'h': 1, 'j': 3} What I want to do, is to reverse the dictionary so that each value in dict is going to be a key at the reverse_dict and is going to map to a list of all the dict.keys that used to map to that value at the dict. So based on the example above I would get: reversed_dict = {1: ['a', 'b', 'h'], 2:['e', 'c'] , 3:['d', 'j']} I came up with a solution that is very expensive and I would really want to hear any ideas more efficient than mine. my expensive solution: reversed_dict = {} for value in dict.values(): reversed_dict[value] = [] for key in dict.keys(): if dict[key] == value: if key not in reversed_dict[value]: reversed_dict[value].append(key) Output >> reversed_dict = {1: ['a', 'b', 'h'], 2: ['c', 'e'], 3: ['d', 'j']} I would really appreciate to hear any ideas better and more efficient than than mine. Thanks!

    Read the article

  • Python: Picking an element without replacement

    - by wpeters
    I would like to slice random letters from a string. Given s="howdy" I would like to pick elements from 's' without replacement but keep the index number. For example >>> random.sample(s,len(s)) ['w', 'h', 'o', 'd', 'y'] is close to what I want, but I would actually prefer something like [('w',2), ('h',0), ('o',1), ('d',3), ('y',4)] with letter-index pairs. This is important because the same letter appears in 's' more than once. ie) "letter" where 't' appears twice but I need to distinguish the first 't' from the 'second'. Ideally I actually only need to pick letters as I need them but scrambling and calculating all the letters in a list (as shown above) is ok.

    Read the article

  • Password generation, best practice

    - by Aidan
    I need to generate some passwords, I want to avoid characters that can be confused for each other. Is there a definitive list of characters I should avoid? my current list is il10o8B3Evu![]{} Are there any other pairs of characters that are easy to confuse? for special characters I was going to limit myself to those under the number keys, though I know that this differs depending on your keyboards nationality! As a rider question, I would like my passwords to be 'wordlike'do you have a favoured algorithm for that? Thanks :)

    Read the article

  • jQuery passing an array to an AJAX call

    - by Josh K
    I have an array: myarr = []; I'm filling it with some values: myarray['name'] = "Me!"; Now I want to transform that array into a set of Key = Value pairs. I though jQuery would do it automatically, but it doesn't seem to. $.ajax ({ type: "POST", dataType: "text", url: "myurl", data: myarr }); Is there a way to do this or something I'm doing wrong? I get no javascript errors, and no serverside errors other then no POST information at all.

    Read the article

  • Django: Converting an entire Model into a single dictionary

    - by LarrikJ
    Is there a good way in Django to convert an entire model to a dictionary? I mean, like this: class DictModel(models.Model): key = models.CharField(20) value = models.CharField(200) DictModel.objects.all().to_dict() ... with the result being a dictionary with the key/value pairs made up of records in the Model? Has anyone else seen this as being useful for them? Thanks. Update I just wanted to add is that my ultimate goal is to be able to do a simple variable lookup inside a Template. Something like: {{ DictModel.exampleKey }} With a result of DictModel.objects.get(key__exact=exampleKey).value Overall, though, you guys have really surprised me with how helpful allof your responses are, and how different the ways to approach it can be. Thanks a lot.

    Read the article

  • Updateable Priority Queue

    - by user1427661
    Is there anything built into the C++ Standard Library that allows me to work in a priority queue/heap like data structure (i.e., can always pop the highest value from the list, can define how the highest value is determined for custom classes, etc.) but allows me to update the keys in the heap? I'm dealing with fairly simple data, pairs to be exact, but I need to be able to update the value of a given key within the heap easily for my algorithm to function. WHat is the best way to achieve this in C++?

    Read the article

  • different brainpoolP521r1 parameters for flexiProvider. Why?

    - by user182846
    I am trying to generate ECC public private key pairs using flexiProvider. I have noticed that values for parameters like p and q are different in brainpoolP521r1 of flexiProvider than those which are specified in many sites. Values specified are Q= AADD.... but what I get is Q=8948... Any idea why flexiProvider does not use specified values and whether having different values is affectes security. I am new to ECC. Any help will be greatly appreciated.

    Read the article

  • In ruby is there detailed documentation / information on how "<<" can and should be used (see exampl

    - by kamal
    use a YML file, which has key , value pairs yml_hosts = YAML::load(File.open('hosts.yml')) ..... for each pair yml_hosts.each_pair {|key_hosts , value_hosts| ...... redirect to a String "value_hosts" value_hosts << "#{$.} #{line}" if line =~ /recoverable NFE/ Is there a better way of doing this, since i am using the condition: if ! value_hosts.empty? to do an action, like sending email, etc but value_hosts is never Empty so i always get an email, even though, i ONLY want top get an email, if line =~ /recoverable NFE/

    Read the article

  • Don’t miss rare Venus transit across Sun on June 5th. Once in a life time event.

    - by Gopinath
    Space lovers here is a rare event you don’t want to miss. On June 5th or 6th of 2012,depending on which part of the globe you live, the planet Venus will pass across Sun and it will not happen again until 2117. During the six hour long spectacular transit you can see the shadow of Venus cross Sun. The transit of Venus occurs in pairs eight years apart, with the previous one taking place in 2004. The next pair of transits occurs after 105.5 & 121.5 years later. The best place to watch the event would be a planetarium nearby with telescope facility. If not you watch it directly but must protect your eyes at all times with proper solar filters. Where can we see the transit? The transit of Venus is going to be clearly visible in Europe, Asia, United States and some part of Australia. Americans will be able to see transit in the evening of Tuesday, June 5, 2012. Eurasians and Africans can see the transit in the morning of June 6, 2012. At what time the event occurs? The principal events occurring during a transit are conveniently characterized by contacts, analogous to the contacts of an annular solar eclipse. The transit begins with contact I, the instant the planet’s disk is externally tangent to the Sun. Shortly after contact I, the planet can be seen as a small notch along the solar limb. The entire disk of the planet is first seen at contact II when the planet is internally tangent to the Sun. Over the course of several hours, the silhouetted planet slowly traverses the solar disk. At contact III, the planet reaches the opposite limb and once again is internally tangent to the Sun. Finally, the transit ends at contact IV when the planet’s limb is externally tangent to the Sun. Event Universal Time Contact I 22:09:38 Contact II 22:27:34 Greatest 01:29:36 Contact III 04:31:39 Contact IV 04:49:35   Transit of Venus animation Here is a nice video animation on the transit of Venus Map courtesy of Steven van Roode, source NASA

    Read the article

  • Craftsmanship Tour: Day 3 &amp; 4 8th Light

    - by Liam McLennan
    Thursday morning the Illinois public transport system came through for me again. I took the Metra train north from Union Station (which was seething with inbound commuters) to Prairie Crossing (Libertyville). At Prairie Crossing I met Paul and Justin from 8th Light and then Justin drove us to the office. The 8th Light office is in an small business park, in a semi-rural area, surrounded by ponds. Upstairs there are two spacious, open areas for developers. At one end of the floor is Doug Bradbury’s walk-and-code station; a treadmill with a desk and computer so that a developer can get exercise at work. At the other end of the floor is a hammock. This irregular office furniture is indicative of the 8th Light philosophy, to pursue excellence without being limited by conventional wisdom. 8th Light have a wall covered in posters, each illustrating one person’s software craftsmanship journey. The posters are a fascinating visualisation of the similarities and differences between each of our progressions. The first thing I did Thursday morning was to create my own poster and add it to the wall. Over two days at 8th Light I did some pairing with the 8th Lighters and we shared thoughts on software development. I am not accustomed to such a progressive and enlightened environment and I found the experience inspirational. At 8th Light TDD, clean code, pairing and kaizen are deeply ingrained in the culture. Friday, during lunch, 8th Light hosted a ‘lunch and learn’ event. Paul Pagel lead us through a coding exercise using micro-pomodori. We worked in pairs, focusing on the pedagogy of pair programming and TDD. After lunch I recorded this interview with Paul Pagel and Justin Martin. We discussed 8th light, craftsmanship, apprenticeships and the limelight framework. Interview with Paul Pagel and Justin Martin My time at Didit, Obtiva and 8th Light has convinced me that I need to give up some of my independence and go back to working in a team. Craftsmen advance their skills by learning from each other, and I can’t do that working at home by myself. The challenge is finding the right team, and becoming a part of it.

    Read the article

  • Equal Gifts Algorithm Problem

    - by 7Aces
    Problem Link - http://opc.iarcs.org.in/index.php/problems/EQGIFTS It is Lavanya's birthday and several families have been invited for the birthday party. As is customary, all of them have brought gifts for Lavanya as well as her brother Nikhil. Since their friends are all of the erudite kind, everyone has brought a pair of books. Unfortunately, the gift givers did not clearly indicate which book in the pair is for Lavanya and which one is for Nikhil. Now it is up to their father to divide up these books between them. He has decided that from each of these pairs, one book will go to Lavanya and one to Nikhil. Moreover, since Nikhil is quite a keen observer of the value of gifts, the books have to be divided in such a manner that the total value of the books for Lavanya is as close as possible to total value of the books for Nikhil. Since Lavanya and Nikhil are kids, no book that has been gifted will have a value higher than 300 Rupees... For the problem, I couldn't think of anything except recursion. The code I wrote is given below. But the problem is that the code is time-inefficient and gives TLE (Time Limit Exceeded) for 9 out of 10 test cases! What would be a better approach to the problem? Code - #include<cstdio> #include<climits> #include<algorithm> using namespace std; int n,g[150][2]; int diff(int a,int b,int f) { ++f; if(f==n) { if(a>b) { return a-b; } else { return b-a; } } return min(diff(a+g[f][0],b+g[f][1],f),diff(a+g[f][1],b+g[f][0],f)); } int main() { int i; scanf("%d",&n); for(i=0;i<n;++i) { scanf("%d%d",&g[i][0],&g[i][1]); } printf("%d",diff(g[0][0],g[0][1],0)); } Note - It is just a practice question, & is not part of a competition.

    Read the article

  • Would someone please explain Octree Collisions to me?

    - by A-Type
    I've been reading everything I can find on the subject and I feel like the pieces are just about to fall into place, but I just can't quite get it. I'm making a space game, where collisions will occur between planets, ships, asteroids, and the sun. Each of these objects can be subdivided into 'chunks', which I have implemented to speed up rendering (the vertices can and will change often at runtime, so I've separated the buffers). These subdivisions also have bounding primitives to test for collision. All of these objects are made of blocks (yeah, it's that kind of game). Blocks can also be tested for rough collisions, though they do not have individual bounding primitives for memory reasons. I think the rough testing seems to be sufficient, though. So, collision needs to be fairly precise; at block resolution. Some functions rely on two blocks colliding. And, of course, attacking specific blocks is important. Now what I am struggling with is filtering my collision pairs. As I said, I've read a lot about Octrees, but I'm having trouble applying it to my situation as many tutorials are vague with very little code. My main issues are: Are Octrees recalculated each frame, or are they stored in memory and objects are shuffled into different divisions as they move? Despite all my reading I still am not clear on this... the vagueness of it all has been frustrating. How far do Octrees subdivide? Planets in my game are quite large, while asteroids are smaller. Do I subdivide to the size of the planet, or asteroid (where planet is in multiple divisions)? Or is the limit something else entirely, like number of elements in the division? Should I load objects into the octrees as 'chunks' or in the whole, then break into chunks later? This could be specific to my implementation, I suppose. I was going to ask about how big my root needed to be, but I did manage to find this question, and the second answer seems sufficient for me. I'm afraid I don't really get what he means by adding new nodes and doing subdivisions upon adding new objects, probably because I'm confused about whether the tree is maintained in memory or recalculated per-frame.

    Read the article

  • Is MongoDB a good choice or not for my application?

    - by shubham
    I have a Reporting application which stores the reports in xml format as recieved from source (XML schema is not defined, it can be any format) and those reports contain some keys and values. Like jobid, setid be keys for 1 type of report and userid, groupId for another type of report etc. The type of keys that can be referred from the document is determined by the namespaces used in the xml doc. These keys are stored on the basis of namespace used in the xml document. For e.g. If a tag in xml fragment uses namespace= "myspace1", then I have keys A and B for myspace1 stored in another table. It will fetch those keys from that table for this namespace, look for their values in xml doc and store it in another table along with the pointer to this xml document (Id of a record storing complete xml document in a cell). Use cases: When the user comes and queries for that key and value, I return the document or a set of documents that are having those key/value pairs. When the user comes and queries for a certain key and provide a name for xslt (pre stored), I fetch the set of documents fulfilling that criteria and convert that xml to html with the specified xslt. When the user comes and asks for a particular fragment of a doc then it can fetch a subset from a particular document also. When the user comes and queries for top x values of a certain key, I return the set of documents that are having top 10 values of that key. I am using DB2 database for its support of xml along with relational capabilities. That makes easier for me to run xpath expressions and fetch values of keys and also aggregate a set of documents fullfilling a criteria, all on the database side. Problems: DB2 stores XML doc of upto 2GB in size. Retrieval is very slow. If some thing involves many documents, then it takes significant time for things to show up in browser, and the user has to wait. Can MongoDb help in this case, as it is document oriented? can I do xml related xpath queries and document transformations on db side? Or is it ok to use both in such a case?

    Read the article

  • Running Built-In Test Simulator with SOA Suite Healthcare 11g in PS4 and PS5

    - by Shub Lahiri, A-Team
    Background SOA Suite for Healthcare Integration pack comes with a pre-installed simulator that can be used as an external endpoint to generate inbound and outbound HL7 traffic on specified MLLP ports. This is a command-line utility that can be very handy when trying to build a complete end-to-end demo within a standalone, closed environment. The ant-based utility accepts the name of a configuration file as the command-line input argument. The format of this configuration file has changed between PS4 and PS5. In PS4, the configuration file was XML based and in PS5, it is name-value property based. The rest of this note highlights these differences and provides samples that can be used to run the first scenario from the product samples set. PS4 - Configuration File The sample configuration file for PS4 is shown below. The configuration file contains information about the following items: Directory for incoming and outgoing files for the host running SOA Suite Healthcare Polling Interval for the directory External Endpoint Logical Names External Endpoint Server Host Name and Ports Message throughput to be simulated for generating outbound messages Documents to be handled by different endpoints A copy of this file can be downloaded from here. PS5 - Configuration File The corresponding sample configuration file for PS5 is shown below. The configuration file contains similar information about the sample scenario but is not in XML format. It has name-value pairs specified in the form of a properties file. This sample file can be downloaded from here. Simulator Configuration Before running the simulator, the environment has to be set by defining the proper ANT_HOME and JAVA_HOME. The following extract is taken from a working sample shell script to set the environment: Also, as a part of setting the environment, template jndi.properties and logging.properties can be generated by using the following ant command: ant -f ant-b2bsimulator-util.xml b2bsimulator-prop Sample jndi.properties and logging.properties are shown below and can be modified, as needed. The jndi.properties contains information about connectivity to the local Weblogic Managed Server instance and the logging.properties file controls the amount of logging that can be generated from the running simulator process. Simulator Usage - Start and Stop The command syntax to launch the simulator via ant is the same in PS4 and PS5. Only the appropriate configuration file has to be supplied as the command-line argument, for example: ant -f ant-b2bsimulator-util.xml b2bsimulatorstart -Dargs="simulator1.hl7-config.xml" This will start the simulator and will keep running to provide an active external endpoint for SOA Healthcare Integration engine. To stop the simulator, a similar ant command can be used, for example: ant -f ant-b2bsimulator-util.xml b2bsimulatorstop

    Read the article

  • Nhibernate: One-To-Many mapping problem - Cannot cascade delete without inverse. Set NULL error

    - by KnaveT
    Hi, I have the current scenario whereby an Article has only 1 Outcome each. Each Article may or may not have an Outcome. In theory, this is a one-to-one mapping, but since NHibernate does not really support one-to-one, I used a One-To-Many to substitute. My Primary Key on the child table is actually the ArticleID (FK). So I have the following setup: Classes public class Article { public virtual Int32 ID { get;set;} private ICollection<ArticleOutcome> _Outcomes {get;set;} public virtual ArticleOutcome Outcome { get { if( this._Outcomes !=null && this._Outcomes.Count > 0 ) return this._Outcomes.First(); return null; } set { if( value == null ) { if( this._Outcomes !=null && this._Outcomes.Count > 0 ) this._Outcomes.Clear(); } else { if( this._Outcomes == null ) this._Outcomes = new HashSet<ArticleOutcome>(); else if ( this._Outcomes.Count >= 1 ) this._Outcomes.Clear(); this._Outcomes.Add( value ); } } } } public class ArticleOutcome { public virtual Int32 ID { get;set; } public virtual Article ParentArticle { get;set;} } Mappings public class ArticleMap : ClassMap<Article> { public ArticleMap() { this.Id( x=> x.ID ).GeneratedBy.Identity(); this.HasMany<ArticleOutcome>( Reveal.Property<Article>("_Outcomes") ) .AsSet().KeyColumn("ArticleID") .Cascade.AllDeleteOrphan() //Cascade.All() doesn't work too. .LazyLoad() .Fetch.Select(); } } public class ArticleOutcomeMap : ClassMap<ArticleOutcome> { public ArticleOutcomeMap(){ this.Id( x=> x.ID, "ArticleID").GeneratedBy.Foreign("ParentArticle"); this.HasOne( x=> x.ParentArticle ).Constrained (); //This do not work also. //this.References( x=> x.ParentArticle, "ArticleID" ).Not.Nullable(); } } Now my problem is this: It works when I do an insert/update of the Outcome. e.g. var article = new Article(); article.Outcome = new ArticleOutcome { xxx = "something" }; session.Save( article ); However, I encounter SQL errors when attempting to delete via the Article itself. var article = session.Get( 123 ); session.Delete( article ); //throws SQL error. The error is something to the like of Cannot insert NULL into ArticleID in ArticleOutcome table. The deletion works if I place Inverse() to the Article's HasMany() mapping, but insertion will fail. Does anyone have a solution for this? Or do I really have to add a surrogate key to the ArticleOutcome table?

    Read the article

  • please explain NHibernate HiLo

    - by Ben
    I'm struggling to get my head round how the HiLo generator works in NHibernate. I've read the explanation here which made things a little clearer. My understanding is that each SessionFactory retrieves the high value from the database. This improves performance because we have access to IDs without hitting the database. The explanation from the above link also states: For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available. How does this work in a web application as don't I only have one SessionFactory and therefore one hi value. Does this mean that in a disconnected application you can end up with duplicate (low) ids in your entity table? In my tests I used these settings: <id name="Id" unsaved-value="0"> <generator class="hilo"/> </id> I ran a test to save 100 objects. The IDs in my table went from 32768 - 32868. The next hi value was incremented to 2. Then I ran my test again and the Ids were in the range 65536 - 65636. First off, why start at 32768 and not 1, and secondly why the jump from 32868 to 65536? Now I know that my surrogate keys shouldn't have any meaning but we do use them in our application. Why can't I just have them increment nicely like a SQL Server identity field would. Finally can someone give me an explanation of how the max_lo parameter works? Is this the maximum number of low values (entity ids in my head) that can be created against the high value? This is one topic in NHibernate that I have struggled to find documentation for. I read the entire NHibernate in action book and it still doesn't go into how this works in any detail. Thanks Ben

    Read the article

  • What database table structure should I use for versions, codebases, deployables?

    - by Zac Thompson
    I'm having doubts about my table structure, and I wonder if there is a better approach. I've got a little database for version control repositories (e.g. SVN), the packages (e.g. Linux RPMs) built therefrom, and the versions (e.g. 1.2.3-4) thereof. A given repository might produce no packages, or several, but if there are more than one for a given repository then a particular version for that repository will indicate a single "tag" of the codebase. A particular version "string" might be used to tag a version of the source code in more than one repository, but there may be no relationship between "1.0" for two different repos. So if packages P and Q both come from repo R, then P 1.0 and Q 1.0 are both built from the 1.0 tag of repo R. But if package X comes from repo Y, then X 1.0 has no relationship to P 1.0. In my (simplified) model, I have the following tables (the x_id columns are auto-incrementing surrogate keys; you can pretend I'm using a different primary key if you wish, it's not really important): repository - repository_id - repository_name (unique) ... version - version_id - version_string (unique for a particular repository) - repository_id ... package - package_id - package_name (unique) - repository_id ... This makes it easy for me to see, for example, what are valid versions of a given package: I can join with the version table using the repository_id. However, suppose I would like to add some information to this database, e.g., to indicate which package versions have been approved for release. I certainly need a new table: package_version - version_id - package_id - package_version_released ... Again, the nature of the keys that I use are not really important to my problem, and you can imagine that the data column is "promotion_level" or something if that helps. My doubts arise when I realize that there's really a very close relationship between the version_id and the package_id in my new table ... they must share the same repository_id. Only a small subset of package/version combinations are valid. So I should have some kind of constraint on those columns, enforcing that ... ... I don't know, it just feels off, somehow. Like I'm including somehow more information than I really need? I don't know how to explain my hesitance here. I can't figure out which (if any) normal form I'm violating, but I also can't find an example of a schema with this sort of structure ... not being a DBA by profession I'm not sure where to look. So I'm asking: am I just being overly sensitive?

    Read the article

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • error reading keytab file krb5.keytab

    - by Banjer
    I've noticed these kerberos keytab error messages on both SLES 11.2 and CentOS 6.3: sshd[31442]: pam_krb5[31442]: error reading keytab 'FILE: / etc/ krb5. keytab' /etc/krb5.keytab does not exist on our hosts, and from what I understand of the keytab file, we don't need it. Per this kerberos keytab introduction: A keytab is a file containing pairs of Kerberos principals and encrypted keys (these are derived from the Kerberos password). You can use this file to log into Kerberos without being prompted for a password. The most common personal use of keytab files is to allow scripts to authenticate to Kerberos without human interaction, or store a password in a plaintext file. This sounds like something we do not need and is perhaps better security-wise to not have it. How can I keep this error from popping up in our system logs? Here is my krb5.conf if its useful: banjer@myhost:~> cat /etc/krb5.conf # This file managed by Puppet # [libdefaults] default_tkt_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC default_tgs_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC preferred_enctypes = RC4-HMAC DES-CBC-MD5 DES-CBC-CRC default_realm = FOO.EXAMPLE.COM dns_lookup_kdc = true clockskew = 300 [logging] default = SYSLOG:NOTICE:DAEMON kdc = FILE:/var/log/kdc.log kadmind = FILE:/var/log/kadmind.log [appdefaults] pam = { ticket_lifetime = 1d renew_lifetime = 1d forwardable = true proxiable = false retain_after_close = false minimum_uid = 0 debug = false banner = "Enter your current" } Let me know if you need to see any other configs. Thanks. EDIT This message shows up in /var/log/secure whenever a non-root user logs in via SSH or the console. It seems to only occur with password-based authentication. If I do a key-based ssh to a server, I don't see the error. If I log in with root, I do not see the error. Our Linux servers authenticate against Active Directory, so its a hearty mix of PAM, samba, kerberos, and winbind that is used to authenticate a user.

    Read the article

  • 3GB RAM Installed and Detected by BIOS, Windows Vista 32bit Only Sees 2GB

    - by Nathan Taylor
    I am attempting to install more RAM on a Windows Vista 32bit machine which is using a X6DAL-XG motherboard and the RAM amount reported in the BIOS is 3GB+, but Windows is only reporting 2GB installed. The motherboard has 6 RAM bays which I have populated with various combinations of 4 1GB sticks, and 2 512mb sticks, but no matter how I configure them Windows doesn't see more than 2GB. I realize of course 32-bit Windows has a 3gb cap on memory, but that doesn't explain why it will only report 2GB when there are in fact (currently) 5GB installed. I should think I would be able to see at least 3GB. According to the spec list for the motherboard the minimum RAM requirements are DDR333/266mhz installed in pairs. I have done this exactly, and the BIOS isn't reporting any problems at POST. RAM Configuration (according to CPU-Z): Slot #1: Kingston 128mx72D266C25 - 1024mb PC2100 (133mhz) Slot #2: Kingston KVR266X72RC25/1024 - 1024mb PC2100 (133mhz) Slot #3: PQI - 512mb PC2700 (166mhz) Slot #4: Kingston 128mx72D266C25 - 1024mb PC2100 (133mhz) Slot #5: Kingston KVR266X72RC25/1024 - 1024mb PC2100 (133mhz) Slot #6: PQI - 512mb PC2700 (166mhz) I'm not sure if memory specs above conflict with this statement in the motherboard manual or not: Memory Support The X6DAL-XG supports up to 12GB/24GB of registered ECC DDR333/266 (PC2700/PC2100) memory. The motherboard was designed to support 4GB (PC2100) modules in each slot, but only the 2GB modules have been tested. When using registered ECC DDR333 (PC2700) memory, installing four pieces of double-banked memory or six pieces of single-banked memory is supported. So, am I doing something wrong with the RAM I have now, or is there some sort of compatibility problem which I am missing? Thanks!

    Read the article

  • WS2008 NTP - Using time.windows.com,0x9 - Time always skewed forwards

    - by David
    I have a domain controller configured to use time.windows.com (with 0x09 flags set). I've noticed that frequently the systems' clock is fast - it varies from 10 minutes to even 45 minutes. I always have to keep resetting the system date/time back to what it should be. When I run "w32tm /query /source" it tells me it's using time.windows.com, and obviously I trust Microsoft not to serve incorrect times, but why is my server's clock fast? EDIT: There are a few Time-Service events in the System log: Event ID: 142 Message: The time service has stopped advertising as a time source because the local clock is not synchronized. Event ID: 139 Message: The time service has started advertising as a time source. These two messages appear in pairs every hour or so. Event 142 appears 14 to 16 minutes after 139 appears. Going back a few months, these events appear: Event ID: 35 Message: The time service is now synchronizing the system time with the time source time.windows.com,0x9 (ntp.m|0x9|0.0.0.0:123-65.55.21.21:123). Event ID: 37 Message: The time provider NtpClient is currently receiving valid time data from time.windows.com,0x9 (ntp.m|0x9|0.0.0.0:123-65.55.21.21:123). Event ID: 47 Message: Time Provider NtpClient: No valid response has been received from manually configured peer time.windows.com,0x9 after 8 attempts to contact it. This peer will be discarded as a time source and NtpClient will attempt to discover a new peer with this DNS name. The error was: The time sample was rejected because: The peer is not synchronized, or it has been too long since the peer's last synchronization. These three events only appear once in the log, back in October.

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >