Search Results

Search found 3880 results on 156 pages for 'duplicate'.

Page 24 of 156

  • Traversing ORM relationships returns duplicate results

    - by NKing253
    I have 4 tables -- store, catalog_galleries, catalog_images, and catalog_financials. When I traverse the relationship from store --> catalog_galleries --> catalog_images, in other words:

        store.getCatalogGallery().getCatalogImages()

    I get duplicate records. Does anyone know what could be the cause of this? Any suggestions on where to look? The store table has a OneToOne relationship with catalog_galleries, which in turn has a OneToMany relationship with catalog_images and an eager fetch type. The store table also has a OneToMany relationship with catalog_financials.

    Read the article

  • ASP.NET MVC2 Radio Button generates duplicate HTML IDs

    - by Dmitriy Nagirnyak
    Hi, it seems that the default ASP.NET MVC2 HTML helper generates duplicate HTML IDs when using code like this (EditorTemplates/UserType.ascx):

        <%@ Control Language="C#" Inherits="System.Web.Mvc.ViewUserControl<UserType>" %>
        <%: Html.RadioButton("", UserType.Primary, Model == UserType.Primary) %>
        <%: Html.RadioButton("", UserType.Standard, Model == UserType.Standard) %>
        <%: Html.RadioButton("", UserType.ReadOnly, Model == UserType.ReadOnly) %>

    The HTML it produces is:

        <input checked="checked" id="UserType" name="UserType" type="radio" value="Primary" />
        <input id="UserType" name="UserType" type="radio" value="Standard" />
        <input id="UserType" name="UserType" type="radio" value="ReadOnly" />

    That clearly shows a problem. So I must be misusing the helper or something. I can manually specify the id as an HTML attribute, but then I cannot guarantee it will be unique. So the question is: how do I make sure the IDs generated by the RadioButton helper are unique for each value, while still preserving the conventions for generating those IDs (so nested models are respected)? Preferably without generating IDs manually. Thanks, Dmitriy
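
    One possible workaround, sketched below rather than taken from the article: build a per-value id from MVC2's own TemplateInfo, so nested-model prefixes are preserved. The RadioButton overload taking htmlAttributes and TemplateInfo.GetFullHtmlFieldId both exist in MVC2, but treat the exact pattern as an assumption:

        <%@ Control Language="C#" Inherits="System.Web.Mvc.ViewUserControl<UserType>" %>
        <%
            // Derive a unique, convention-respecting id per radio value by
            // appending the value to the field id MVC would generate itself
            // (e.g. "UserType_Primary" when the prefix is "UserType").
            Func<UserType, object> attrs = v => new {
                id = Html.ViewData.TemplateInfo.GetFullHtmlFieldId(v.ToString())
            };
        %>
        <%: Html.RadioButton("", UserType.Primary, Model == UserType.Primary, attrs(UserType.Primary)) %>
        <%: Html.RadioButton("", UserType.Standard, Model == UserType.Standard, attrs(UserType.Standard)) %>
        <%: Html.RadioButton("", UserType.ReadOnly, Model == UserType.ReadOnly, attrs(UserType.ReadOnly)) %>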

    Read the article

  • How to prevent duplicate records being inserted with SqlBulkCopy when there is no primary key

    - by kscott
    I receive a daily XML file that contains thousands of records, each a business transaction that I need to store in an internal database for use in reporting and billing. I was under the impression that each day's file contained only unique records, but have discovered that my definition of unique is not exactly the same as the provider's.

    The current application that imports this data is a C# .NET 3.5 console application; it uses SqlBulkCopy into a MS SQL Server 2008 database table where the columns exactly match the structure of the XML records. Each record has just over 100 fields, and there is no natural key in the data -- or rather, the fields I can come up with that make sense as a composite key also end up having to allow nulls. Currently the table has several indexes, but no primary key. Basically the entire row needs to be unique: if one field is different, it is valid enough to be inserted.

    I looked at creating an MD5 hash of the entire row, inserting that into the database, and using a constraint to prevent SqlBulkCopy from inserting the row, but I don't see how to get the MD5 hash into the BulkCopy operation, and I'm not sure whether the whole operation would fail and roll back if any one record failed, or whether it would continue. The file contains a very large number of records, so going row by row through the XML, querying the database for a record that matches all fields, and then deciding to insert is really the only way I can see to do this. I was just hoping not to have to rewrite the application entirely, and the bulk copy operation is so much faster.

    Does anyone know of a way to use SqlBulkCopy while preventing duplicate rows, without a primary key? Or any suggestion for a different way to do this?
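
    A sketch of one client-side approach (an assumption, not the article's answer): add a RowHash column to the table, compute the same hash for each incoming row, and filter the DataTable before handing it to SqlBulkCopy. Bulk-copying into a keyless staging table and then running an INSERT ... SELECT that excludes existing rows is a common server-side alternative.

        // Requires: using System; using System.Collections.Generic;
        // using System.Data; using System.Security.Cryptography; using System.Text;
        static string HashRow(DataRow row)
        {
            var sb = new StringBuilder();
            foreach (object field in row.ItemArray)
                sb.Append(field).Append('|');   // '|' keeps adjacent fields from running together
            using (var md5 = MD5.Create())
                return Convert.ToBase64String(
                    md5.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString())));
        }

        // Drop rows whose hash is already in the table; seenHashes would be
        // loaded once via "SELECT RowHash FROM ..." before the import.
        static DataTable FilterNew(DataTable incoming, HashSet<string> seenHashes)
        {
            DataTable fresh = incoming.Clone();   // same schema, no rows
            foreach (DataRow row in incoming.Rows)
            {
                // HashSet.Add returns false for duplicates, so this also
                // dedupes rows that repeat within the same day's file.
                if (seenHashes.Add(HashRow(row)))
                    fresh.ImportRow(row);
            }
            return fresh;
        }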

    Read the article

  • JPA merge fails due to duplicate key

    - by wobblycogs
    I have a simple entity, Code, that I need to persist to a MySQL database:

        public class Code implements Serializable {
            @Id
            private String key;
            private String description;
            // ...getters and setters...
        }

    The user supplies a file full of key/description pairs which I read, convert to Code objects, and then insert in a single transaction using em.merge(code). The file will generally have duplicate entries, which I deal with by first adding them to a map keyed on the key field as I read them in.

    A problem arises, though, when keys differ only by case (for example: XYZ and XyZ). My map will, of course, contain both entries, but during the merge process MySQL sees the two keys as the same and the call to merge fails with a MySQLIntegrityConstraintViolationException. I could easily fix this by uppercasing the keys as I read them in, but I'd like to understand exactly what is going wrong.

    The conclusion I have come to is that JPA considers XYZ and XyZ to be different keys, but MySQL considers them to be the same. As such, when JPA checks its list of known keys (or does whatever it does to determine whether it needs to perform an insert or an update) it fails to find the previous insert and issues another, which then fails. Is this correct? Is there any way around this other than better filtering of the client data? I haven't defined .equals or .hashCode on the Code class, so perhaps this is the problem.

    Read the article

  • NHibernate returning duplicate object in child collections when using Fetch

    - by UpTheCreek
    When doing a query like this (using NHibernate 2.1.2):

        ICriteria criteria = session.CreateCriteria<MyRootType>()
            .SetFetchMode("ChildCollection1", FetchMode.Eager)
            .SetFetchMode("ChildCollection2", FetchMode.Eager)
            .Add(Restrictions.IdEq(id));

    I am getting multiple duplicate objects in some cartesian fashion. E.g. if ChildCollection1 has 3 elements and ChildCollection2 has 2 elements, then I get results with each element in ChildCollection1 duplicated and each element in ChildCollection2 triplicated! This was a bit of a WTF moment for me... So how do I do this correctly? Is using SetFetchMode like this only supported when specifying one collection? Am I just using it wrong? (I've seen some references to result transformers, but imagined this would be simpler.) Is this something that's different in NH3?

    Update: As per Felice's suggestion, I tried using the DistinctRootEntity transformer, but this is still returning duplicates. Code:

        ICriteria criteria = session.CreateCriteria<MyRootType>()
            .SetFetchMode("ChildCollection1", FetchMode.Eager)
            .SetFetchMode("ChildCollection2", FetchMode.Eager)
            .Add(Restrictions.IdEq(id));
        criteria.SetResultTransformer(Transformers.DistinctRootEntity);
        return criteria.UniqueResult<MyRootType>();
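
    The cartesian product comes from eagerly joining both collections in a single SQL statement, and DistinctRootEntity only dedupes the root entities, not the children. A sketch of one common workaround, assuming NHibernate 2.1's Future API: fetch each collection in its own query so the result sets are never multiplied together.

        // Both criteria are sent in one round trip; each collection is
        // populated from its own SQL statement, so no duplicates appear.
        var root = session.CreateCriteria<MyRootType>()
            .Add(Restrictions.IdEq(id))
            .SetFetchMode("ChildCollection1", FetchMode.Eager)
            .FutureValue<MyRootType>();

        session.CreateCriteria<MyRootType>()
            .Add(Restrictions.IdEq(id))
            .SetFetchMode("ChildCollection2", FetchMode.Eager)
            .Future<MyRootType>();   // piggybacks on the same batch

        return root.Value;           // both collections initialized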

    Read the article

  • How can I remove duplicate nodes in XQuery?

    - by Brabster
    I have an XML document I generate on the fly, and I need a function to eliminate any duplicate nodes from it. My function looks like:

        declare function local:start2() {
            let $data := local:scan_books()
            return <books>{$data}</books>
        };

    Sample output is:

        <books>
            <book>
                <title>XML in 24 hours</title>
                <author>Some Guy</author>
            </book>
            <book>
                <title>XML in 24 hours</title>
                <author>Some Guy</author>
            </book>
        </books>

    I want just the one entry in my books root tag, and there are other tags, like say pamphlet, in there too that need to have duplicates removed. Any ideas?

    Updated following comments: by unique nodes, I mean removing multiple occurrences of nodes that have the exact same content and structure.

    Read the article

  • Duplicate column name by JPA with @ElementCollection and @Inheritance

    - by gerry
    I've created the following scenario:

        @javax.persistence.Entity
        @Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
        public class MyEntity implements Serializable {
            @Id
            @GeneratedValue
            protected Long id;
            ...
            @ElementCollection
            @CollectionTable(name="ENTITY_PARAMS")
            @MapKeyColumn(name = "ENTITY_KEY")
            @Column(name = "ENTITY_VALUE")
            protected Map<String, String> parameters;
            ...
        }

    As well as:

        @javax.persistence.Entity
        public class Sensor extends MyEntity {
            @Id
            @GeneratedValue
            protected Long id;
            ...
            // so here "protected Map<String, String> parameters;" is inherited !!!!
            ...
        }

    Running this example, no tables are created and I get the following message:

        WARNING: Got SQLException executing statement "CREATE TABLE ENTITY_PARAMS (Entity_ID BIGINT NOT NULL, ENTITY_VALUE VARCHAR(255), ENTITY_KEY VARCHAR(255), Sensor_ID BIGINT NOT NULL, ENTITY_VALUE VARCHAR(255))": com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Duplicate column name 'ENTITY_VALUE'

    I also tried overriding the attributes on the Sensor class...

        @AttributeOverrides({
            @AttributeOverride(name = "ENTITY_KEY", column = @Column(name = "SENSOR_KEY")),
            @AttributeOverride(name = "ENTITY_VALUE", column = @Column(name = "SENSOR_VALUE"))
        })

    ...but I get the same error. Can anybody help me?

    Read the article

  • Duplicate method 'ProcessRequest' in ASPX

    - by Mauricio Scheffer
    I'm trying to code ASP.NET MVC views (WebForms view engine) in F#. I can already write regular ASP.NET WebForms ASPX and it works OK, e.g.:

        <%@ Page Language="F#" %>
        <% for i in 1..2 do %>
        <%=sprintf "%d" i %>

    so I assume I have everything in my web.config correctly set up. However, when I make the page inherit from ViewPage:

        <%@ Page Language="F#" Inherits="System.Web.Mvc.ViewPage" %>

    I get this error:

        Compiler Error Message: FS0442: Duplicate method. The abstract method 'ProcessRequest' has the same name and signature as an abstract method in an inherited type.

    The problem seems to be this piece of code generated by the F# CodeDom provider:

        [<System.Diagnostics.DebuggerNonUserCodeAttribute>]
        abstract ProcessRequest : System.Web.HttpContext -> unit
        [<System.Diagnostics.DebuggerNonUserCodeAttribute>]
        default this.ProcessRequest (context:System.Web.HttpContext) =
            let mutable context = context
            base.ProcessRequest(context) |> ignore

    When I change the Page directive to use C# instead, the generated code is:

        [System.Diagnostics.DebuggerNonUserCodeAttribute()]
        public new virtual void ProcessRequest(System.Web.HttpContext context) {
            base.ProcessRequest(context);
        }

    which of course works fine and, AFAIK, is not semantically the same as the generated F# code. I'm using .NET 4.0.30319.1 (RTM) and MVC 2 RTM.

    Read the article

  • Database Design Question regarding duplicate information

    - by galford13x
    I have a database that contains a history of product sales. For example, the following table:

        CREATE TABLE SalesHistoryTable (
            OrderID,   -- Order number, unique across all orders
            ProductID, -- Product ID, can be used as a key to look up product info in another table
            Price,     -- Price of the product per unit at the time of the order
            Quantity,  -- Quantity of the product for the order
            Total,     -- Total cost of the order for the product (Price * Quantity)
            Date,      -- Date of the order
            StoreID,   -- The store that created the order
            PRIMARY KEY(OrderID));

    The table will eventually have millions of transactions. From this, profiles can be created for products in different geographical regions (based on the StoreID). Creating these profiles can be very time consuming as a database query. For example:

        SELECT ProductID, StoreID,
               SUM(Total) AS Total,
               SUM(Quantity) AS QTY,
               SUM(Total)/SUM(Quantity) AS AvgPrice
        FROM SalesHistoryTable
        GROUP BY ProductID, StoreID;

    The above query could be used to get information on products for any particular store. You could then determine which store has sold the most, has made the most money, and on average sells for the most/least. This would be very costly to run as a normal query at any time. What are some design decisions that would allow these types of queries to run faster, assuming storage size isn't an issue?

    For example, I could create another table with duplicate information:

        StoreID (Key), ProductID, TotalCost, QTY, AvgPrice

    and provide a trigger so that when a new order is received, the entry for that store is updated in the new table. The cost of the update is almost nothing. What should be considered, given the above scenario?

    Read the article

  • Entity Framework Duplicate type name within an assembly (6.1.0)

    - by CodeMilian
    I am not sure what is going on, but I keep getting the following exception when doing a query: "Duplicate type name within an assembly." I have not been able to find a solution on the web. I had resolved the issue by removing Entity Framework from all the projects in the solution and re-installing it using NuGet; then, all of a sudden, the exception came back. I have verified my table schema over and over and find nothing wrong with it. This is the query causing the exception:

        var BaseQuery = from Users in db.Users
                        join UserInstalls in db.UserTenantInstalls
                            on Users.ID equals UserInstalls.UserID
                        join Installs in db.TenantInstalls
                            on UserInstalls.TenantInstallID equals Installs.ID
                        where Users.Username == Username
                           && Users.Password == Password
                           && Installs.Name == Install
                        select Users;

        var Query = BaseQuery.Include("UserTenantInstalls.TenantInstall");
        return Query.FirstOrDefault();

    As I mentioned previously, the same query was working before. The data has not changed and the code has not changed.

    Read the article

  • How to compare 2 lists and merge them in Python/MySQL?

    - by NJTechGuy
    I want to merge data. Following are my MySQL tables. I want to use Python to traverse a list of both kinds of rows (one with dupe = 'x' and the other with null dupes). For instance (aligning the sample data by column):

        a  b  c  d  e  f  key  dupe
        ---------------------------
        1  d  c  f  k  l  1    x
        2  g     h     j  1
        3  i     h  u  u  2
        4  u  r  t        2    x

    From the above sample table, the desired output is:

        a  b  c  d  e  f  key  dupe
        ---------------------------
        2  g  c  h  k  j  1
        3  i  r  h  u  u  2

    What I have so far:

        import string, os, sys
        import MySQLdb
        from EncryptedFile import EncryptedFile

        enc = EncryptedFile(os.getenv("HOME") + '/.py-encrypted-file')
        user = enc.getValue("user")
        pw = enc.getValue("pw")

        db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw, db=user)
        cursor = db.cursor()
        cursor2 = db.cursor()
        cursor.execute("select * from delThisTable where dupe is null")
        cursor2.execute("select * from delThisTable where dupe is not null")
        result = cursor.fetchall()
        result2 = cursor2.fetchall()

        for cursorFieldname in cursor.description:
            for cursorFieldname2 in cursor2.description:
                if cursorFieldname[0] == cursorFieldname2[0]:
                    # How do I compare the record with the same key value and
                    # update the original row's null field value with the
                    # non-null value from the duplicate? Please fill this void...
                    pass

        cursor.close()
        cursor2.close()
        db.close()

    Thanks guys!

    Read the article

  • SQL Query to duplicate records based on If statement

    - by user328371
    Hi, I'm trying to write an SQL query that will duplicate records depending on a field in another table. I am running MySQL 5. (I know duplicating records shows that the database structure is bad, but I did not design the database and am not in a position to redo it all -- it's a Shopp e-commerce database running on WordPress.)

    Each product with a particular attribute needs a link to the same few images, so the product will need a row per image in a table -- the database doesn't actually contain the image, just its filename. (The images are of clipart for a customer to select from.)

    Based on these records...

        SELECT * FROM `wp_shopp_spec` WHERE name='Can Be Personalised' AND content='Yes'

    ...I want to do something like this: for each record that matches that query, copy records 5134-5139 from wp_shopp_asset, but change the id so it's unique, and set the cell in column 'parent' to the value of 'product' from the table wp_shopp_spec. This will mean 6 new records are created for each record matching the above query, all with the same value in 'parent' but with unique ids and every other column copied from the original (i.e. records 5134-5139).

    Hope that's clear enough -- any help greatly appreciated.

    Read the article

  • Optimize a MySQL count each duplicate Query

    - by Onema
    I have the following query that gets the city name, city id, the region name, and a count of duplicate names for that record:

        SELECT Country_CA.City AS currentCity, Country_CA.CityID, globe_region.region_name,
            (SELECT count(Country_CA.City)
             FROM Country_CA
             WHERE City LIKE currentCity) AS counter
        FROM Country_CA
        LEFT JOIN globe_region
            ON globe_region.region_id = Country_CA.RegionID
            AND globe_region.country_code = Country_CA.CountryCode
        ORDER BY City

    This example is for Canada, and the cities will be displayed in a dropdown list. There are a few towns in Canada, and in other countries, that have the same names. Therefore, if there is more than one town with the same name, the region name will be appended to the town name. Region names are found in the globe_region table. Country_CA and globe_region look similar to this (I have changed a few things for visualization purposes):

        CREATE TABLE IF NOT EXISTS `Country_CA` (
            `City` varchar(75) NOT NULL DEFAULT '',
            `RegionID` varchar(10) NOT NULL DEFAULT '',
            `CountryCode` varchar(10) NOT NULL DEFAULT '',
            `CityID` int(11) NOT NULL DEFAULT '0',
            PRIMARY KEY (`City`,`RegionID`),
            KEY `CityID` (`CityID`)
        ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    and

        CREATE TABLE IF NOT EXISTS `globe_region` (
            `country_code` char(2) COLLATE utf8_unicode_ci NOT NULL,
            `region_code` char(2) COLLATE utf8_unicode_ci NOT NULL,
            `region_name` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
            PRIMARY KEY (`country_code`,`region_code`)
        ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    The query at the top does exactly what I want it to do, but it takes way too long to generate a list for 5000 records. I would like to know if there is a way to optimize the sub-query in order to obtain the same results faster. The results should look like this:

        City        CityID   region_name       counter
        sheraton    2349269  British Columbia  1
        sherbrooke  2349270  Quebec            2
        sherbrooke  2349271  Nova Scotia       2
        shere       2349273  British Columbia  1
        sherridon   2349274  Manitoba          1

    Read the article

  • duplicate rows in join table with has_many => through and accepts_nested_attributes_for

    - by shalako
    An event has many artists, and an artist has many events. The join of an artist and an event is called a performance. I want to add artists to an event. This works, except that I'm getting duplicate entries in my join table when creating a new event, which causes problems elsewhere.

    event.rb:

        has_many :performances, :dependent => :destroy
        has_many :artists, :through => :performances
        accepts_nested_attributes_for :artists, :reject_if => proc {|a| a['name'].blank?}
        accepts_nested_attributes_for :performances, :reject_if => proc { |a| a['artist_id'].blank? }, :allow_destroy => true

    artist.rb:

        has_many :performances, :dependent => :destroy
        has_many :events, :through => :performances

    performance.rb:

        belongs_to :artist
        belongs_to :event

    events_controller.rb:

        def new
          @event = Event.new
          @event.artists.build
          respond_to do |format|
            format.html # new.html.erb
            format.xml  { render :xml => @event }
          end
        end

        def create
          @event = Event.new(params[:event])
          respond_to do |format|
            if @event.save
              flash[:notice] = 'Event was successfully created.'
              format.html { redirect_to(admin_events_url) }
              format.xml  { render :xml => @event, :status => :created, :location => @event }
            else
              format.html { render :action => "new" }
              format.xml  { render :xml => @event.errors, :status => :unprocessable_entity }
            end
          end
        end

    Output:

        Performance Create (0.2ms) INSERT INTO `performances` (`event_id`, `artist_id`) VALUES(7, 19)
        Performance Create (0.1ms) INSERT INTO `performances` (`event_id`, `artist_id`) VALUES(7, 19)

    Read the article

  • how to elegantly duplicate a graph (neural network)

    - by macias
    I have a graph (network) which consists of layers, which contain nodes (neurons). I would like to write a procedure to duplicate the entire graph in the most elegant way possible -- i.e. with minimal or no overhead added to the structure of the node or layer. Or, in other words: the procedure could be complex, but the complexity should not "leak" into the structures. They should not be made complex just because they are copyable.

    I wrote the code in C#; so far it looks like this:

        - each neuron has an additional field, copy_of, which is a pointer to the neuron it was copied from -- this is my additional overhead
        - neuron has a parameterless method Clone()
        - neuron has a method Reconnect(), which exchanges a connection from the "source" neuron (parameter) to the "target" neuron (parameter)
        - layer has a parameterless method Clone(), which simply calls Clone() for all neurons
        - network has a parameterless method Clone(), which calls Clone() for every layer, then iterates over all neurons creating the neuron=copy_of mappings, and then calls Reconnect to exchange all the "wiring"

    I hope my approach is clear. The question is: is there a more elegant method? I particularly don't like keeping an extra pointer in the neuron class just in case it gets copied! I would like to gather the data in one point (the network's Clone) and then dispose of it completely (the Clone method cannot take an argument, though).
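
    A sketch of one way to keep the old-to-new mapping local to Network.Clone(), so no copy_of field is needed; the class shapes below are assumptions for illustration, not the asker's actual code:

        using System.Collections.Generic;
        using System.Linq;

        class Neuron
        {
            public List<Neuron> Inputs = new List<Neuron>();
        }

        class Network
        {
            public List<List<Neuron>> Layers = new List<List<Neuron>>();

            public Network Clone()
            {
                var map = new Dictionary<Neuron, Neuron>();
                var copy = new Network();

                // Pass 1: duplicate every node, recording old -> new in the map.
                foreach (var layer in Layers)
                    copy.Layers.Add(layer.Select(n =>
                    {
                        var clone = new Neuron();
                        map[n] = clone;
                        return clone;
                    }).ToList());

                // Pass 2: rewire the clones through the map, so connections
                // point at copies rather than originals.
                foreach (var layer in Layers)
                    foreach (var n in layer)
                        map[n].Inputs = n.Inputs.Select(i => map[i]).ToList();

                return copy;   // the map goes out of scope here, as desired
            }
        }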

    Read the article

  • MySQL query in Drupal database - groupwise maximum with duplicate data

    - by nselikoff
    I'm working on a MySQL query in a Drupal database that pulls together users and two different CCK content types. I know people ask for help with groupwise maximum queries all the time... I've done my best, but I need help. This is what I have so far:

        # the artists
        SELECT users.uid, users.name AS username, n1.title AS artist_name
        FROM users
        LEFT JOIN users_roles ur ON users.uid=ur.uid
        INNER JOIN role r ON ur.rid=r.rid AND r.name='artist'
        LEFT JOIN node n1 ON n1.uid = users.uid AND n1.type = 'submission'
        WHERE users.status = 1
        ORDER BY users.name;

    This gives me data that looks like:

        uid  username  artist_name
        1    foo       Joe the Plumber
        2    bar       Jane Doe
        3    baz       The Tooth Fairy

    Also, I've got this query:

        # artwork
        SELECT n.nid, n.uid, a.field_order_value
        FROM node n
        LEFT JOIN content_type_artwork a ON n.nid = a.nid
        WHERE n.type = 'artwork'
        ORDER BY n.uid, a.field_order_value;

    which gives me data like this:

        nid  uid  field_order_value
        1    1    1
        2    1    3
        3    1    2
        4    2    NULL
        5    3    1
        6    3    1

    Additional relevant info:

        - nid is the primary key for an Artwork
        - every Artist has one or more Artworks
        - valid data for field_order_value is NULL, 1, 2, 3, or 4
        - field_order_value is not necessarily unique per Artist -- an Artist could have 4 Artworks all with field_order_value = 1

    What I want is the row with the minimum field_order_value from my second query, joined with the artist information from the first query. In cases where the field_order_value is not valuable information (either because the Artist has used duplicate values among their Artworks or left that field NULL), I would like the row with the minimum nid from the second query.

    Read the article

  • Data structure choices for high-speed, memory-efficient detection of duplicate strings

    - by Jonathan Holland
    I have an interesting problem that could be solved in a number of ways:

        - I have a function that takes in a string.
        - If this function has never seen this string before, it needs to perform some processing.
        - If the function has seen the string before, it needs to skip processing.
        - After a specified amount of time, the function should accept duplicate strings again.
        - This function may be called thousands of times per second, and the string data may be very large.

    This is a highly abstracted explanation of the real application; I'm just trying to get down to the core concept for the purpose of the question.

    The function will need to store state in order to detect duplicates. It also will need to store an associated timestamp in order to expire duplicates. It does NOT need to store the strings; a unique hash of the string would be fine, provided there are no false positives due to collisions (use a perfect hash?) and the hash function is performant enough.

    The naive implementation would simply be (in C#):

        Dictionary<String, DateTime>

    though in the interest of lowering the memory footprint and potentially increasing performance, I'm evaluating custom data structures to handle this instead of a basic hashtable. So, given these constraints, what would you use?

    EDIT -- some additional information that might change proposed implementations:

        - 99% of the strings will not be duplicates.
        - Almost all of the duplicates will arrive back to back, or nearly sequentially.
        - In the real world, the function will be called from multiple worker threads, so state management will need to be synchronized.
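
    For reference, a minimal thread-safe sketch of the naive approach, assuming .NET 4's ConcurrentDictionary is available. Hashing bounds memory by entry count rather than string size, and the eviction here is deliberately simplistic (expired entries are refreshed on re-sighting rather than swept):

        using System;
        using System.Collections.Concurrent;
        using System.Security.Cryptography;
        using System.Text;

        class DuplicateFilter
        {
            private readonly ConcurrentDictionary<string, DateTime> _seen =
                new ConcurrentDictionary<string, DateTime>();
            private readonly TimeSpan _ttl;

            public DuplicateFilter(TimeSpan ttl) { _ttl = ttl; }

            // True if the caller should process this string, i.e. it is not
            // a duplicate seen within the TTL window.
            public bool ShouldProcess(string s)
            {
                string key;
                using (var sha = SHA256.Create())   // collision odds negligible
                    key = Convert.ToBase64String(
                        sha.ComputeHash(Encoding.UTF8.GetBytes(s)));

                DateTime now = DateTime.UtcNow;
                DateTime last = _seen.GetOrAdd(key, now);
                if (last == now) return true;           // first sighting
                if (now - last < _ttl) return false;    // live duplicate
                _seen[key] = now;                       // expired: treat as new
                return true;
            }
        }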

    Read the article

  • Prevent duplicate entries in an ArrayList

    - by timyh
    Say I create some object class like so:

        public class Thing {
            private String name;
            private Integer num;

            public Thing(String a, Integer b) {
                name = a;
                num = b;
            }
            // ...gets/sets/etc.
        }

    Now I want to create an ArrayList to hold a number of these objects, like so:

        ArrayList<Thing> myList = new ArrayList<Thing>();
        Thing first = new Thing("Star Wars", 3);
        Thing second = new Thing("Star Wars", 1);
        myList.add(first);
        myList.add(second);

    I would like to include some sort of logic so that in this case, when we try to add object "second", rather than adding a new object to the ArrayList, we add second.getNum() to first.getNum(). So if you were to iterate through the ArrayList, it would be "Star Wars", 4.

    I am having trouble coming up with an elegant way of handling this, and as the ArrayList grows, searching through it to determine whether there are duplicate name items becomes cumbersome. Can anyone provide some guidance on this?
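
    One common approach, sketched below in C# for illustration (a Java HashMap<String, Integer> works the same way): key the collection on the name and accumulate the quantity, instead of storing duplicate objects and searching linearly.

        using System.Collections.Generic;

        class ThingBag
        {
            private readonly Dictionary<string, int> _totals =
                new Dictionary<string, int>();

            // Adding an existing name accumulates instead of duplicating:
            // Add("Star Wars", 3); Add("Star Wars", 1)  =>  "Star Wars" -> 4
            public void Add(string name, int num)
            {
                int current;
                _totals.TryGetValue(name, out current);   // leaves 0 if absent
                _totals[name] = current + num;
            }
        }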

    Read the article

  • Replace duplicate values in array with new randomly generated values

    - by RussellDias
    I have the function below (created by Gordon in a previous question that went unanswered) that creates an array with n values whose sum equals $max:

        function randomDistinctPartition($n, $max) {
            $partition = array();
            for ($i = 1; $i < $n; $i++) {
                $maxSingleNumber = $max - $n;
                $partition[] = $number = rand(1, $maxSingleNumber);
                $max -= $number;
            }
            $partition[] = $max;
            return $partition;
        }

    For example, if I set $n = 4 and $max = 30, then I should get something like:

        array(5, 7, 10, 8);

    However, this function does not take duplicates and zeros into account. What I would like -- and have been trying to accomplish -- is to generate an array of unique numbers that add up to my predetermined variable $max: no duplicate numbers and no zero and/or negative integers.

    Read the article

  • MySQL Datefields: duplicate or calculate?

    - by Konerak
    We are using a table with a structure imposed upon us more than 10 years ago. We are allowed to add columns, but urged not to change existing columns. Certain columns are meant to represent dates, but are stored in different formats. Amongst others:

        - CHAR(6): YYMMDD
        - CHAR(6): DDMMYY
        - CHAR(8): YYYYMMDD
        - CHAR(8): DDMMYYYY
        - DATE
        - DATETIME

    Since we now would like to do some more complex queries using advanced date functions, my manager proposed to duplicate those problem columns into a properly typed FORMATTED_OLDCOLUMNNAME column using a DATE or DATETIME format.

    Is this the way to go? Couldn't we just use the STR_TO_DATE function each time we accessed the columns? To avoid every query having to copy-paste the function, I could still work with a view or a stored procedure, but duplicating data to avoid recalculation sounds wrong.

    Solutions I see (I guess I prefer 2.2.1):

        1. Physically duplicate columns
           1.1 In the same table
               1.1.1 Added by each script that does a modification (INSERT/UPDATE/REPLACE/...)
               1.1.2 Maintained by a trigger on each modification
           1.2 In a separate table
               1.2.1 Added by each script that does a modification (INSERT/UPDATE/REPLACE/...)
               1.2.2 Maintained by a trigger on each modification
        2. On-demand transformation
           2.1 Each query has to perform the transformation
               2.1.1 Using copy-paste in the source code
               2.1.2 Using a library
               2.1.3 Using a STORED PROCEDURE
           2.2 A view performs the transformation
               2.2.1 A separate view replacing the entire table
               2.2.2 A separate view just adding the date fields for the primary keys

    Am I right to say it's better to recalculate than to store? And would a view be a good solution?

    Read the article

  • Optimizing near-duplicate value search

    - by GApple
    I'm trying to find near-duplicate values in a set of fields in order to allow an administrator to clean them up. There are two criteria that I am matching on:

        - One string is wholly contained within the other, and is at least 1/4 of its length.
        - The strings have an edit distance less than 5% of the total length of the two strings.

    The pseudo-PHP code:

        foreach ($values as $value) {
            foreach ($values as $match) {
                if (
                    (
                        $value['length'] < $match['length']
                        && $value['length'] * 4 > $match['length']
                        && stripos($match['value'], $value['value']) !== false
                    ) || (
                        $match['length'] < $value['length']
                        && $match['length'] * 4 > $value['length']
                        && stripos($value['value'], $match['value']) !== false
                    ) || (
                        abs($value['length'] - $match['length']) * 20 < ($value['length'] + $match['length'])
                        && 0 < ($match['changes'] = levenshtein($value['value'], $match['value']))
                        && $match['changes'] * 20 <= ($value['length'] + $match['length'])
                    )
                ) {
                    $matches[] = &$match;
                }
            }
        }

    I've tried to reduce calls to the comparatively expensive stripos and levenshtein functions where possible, which has reduced the execution time quite a bit. However, as an O(n^2) operation this just doesn't scale to the larger sets of values, and it seems that a significant amount of the processing time is spent simply iterating through the arrays.

    Some properties of a few sets of values being operated on:

        Total   | Strings      | # of matches per string   |          |
        Strings | With Matches | Average | Median | Max    | Time (s) |
        --------+--------------+---------+--------+--------+----------+
        844     | 413          | 1.8     | 1      | 58     | 140      |
        593     | 156          | 1.2     | 1      | 5      | 62       |
        272     | 168          | 3.2     | 2      | 26     | 10       |
        157     | 47           | 1.5     | 1      | 4      | 3.2      |
        106     | 48           | 1.8     | 1      | 8      | 1.3      |
        62      | 47           | 2.9     | 2      | 16     | 0.4      |

    Are there any other things I can do to reduce the time to check the criteria, and more importantly, are there any ways for me to reduce the number of criteria checks required (for example, by pre-processing the input values), since there is such low selectivity?
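
    A language-agnostic pre-processing idea, sketched below in C# with hypothetical names: sort the values by length once, then compare each string only against the window of candidates whose length could still satisfy either criterion. Containment requires the longer string to be under 4x the shorter, and the 5% edit-distance rule is tighter still (roughly 1.1x), so the 4x window bounds both and the inner loop can stop early.

        using System;
        using System.Collections.Generic;
        using System.Linq;

        static class NearDupes
        {
            // Yields only the pairs worth testing with the expensive
            // stripos/levenshtein-style checks; pairs outside the 4x
            // length window can never match either criterion.
            public static IEnumerable<Tuple<string, string>> CandidatePairs(
                IEnumerable<string> values)
            {
                List<string> sorted = values.OrderBy(v => v.Length).ToList();
                for (int i = 0; i < sorted.Count; i++)
                {
                    int maxLen = sorted[i].Length * 4;
                    for (int j = i + 1;
                         j < sorted.Count && sorted[j].Length < maxLen;
                         j++)
                    {
                        yield return Tuple.Create(sorted[i], sorted[j]);
                    }
                }
            }
        }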

    Read the article

  • Optimize Duplicate Detection

    - by Dave Jarvis
    Background: this is an optimization problem. Oracle Forms XML files have elements such as:

        <Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... />

    where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as:

        sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql
        sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql

    I wrote a script to generate a list of exact duplicates using a brute force approach.

    Problem: there are 37,497 files to compare against each other, and it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons? The script will complete in approximately 208 days.

    The comparison script is as follows (note: the case-insensitive flag is lowercase -i; diff's uppercase -I takes a regex argument):

        #!/bin/bash
        echo Loading directory ...
        for i in $(find sql/ -type f -name \*.sql); do
            echo Comparing $i ...
            for j in $(find sql/ -type f -name \*.sql); do
                if [ "$i" = "$j" ]; then continue; fi
                # Case insensitive compare, ignore spaces
                diff -iEbwBaq $i $j > /dev/null
                # 0 = no difference (i.e., duplicate code)
                if [ $? = 0 ]; then
                    echo $i :: $j >> clones.txt
                fi
            done
        done

    Question: how would you optimize the script so that checking for cloned code is a few orders of magnitude faster?

    System constraints: using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed -- algorithms or solutions in other languages are welcome. Thank you!
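
    Since the asker welcomes other languages, a sketch of an O(n) approach in C#: canonicalize each file once (case-folded, whitespace stripped, roughly mirroring the diff flags), hash the canonical form, and bucket files by hash so every file is read exactly once instead of 37,496 times. Pairs inside a bucket can still be verified with diff afterwards; that candidate list will be tiny.

        using System;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Security.Cryptography;
        using System.Text;

        static class CloneFinder
        {
            public static void Main()
            {
                var groups = new Dictionary<string, List<string>>();
                foreach (var path in Directory.EnumerateFiles(
                             "sql", "*.sql", SearchOption.AllDirectories))
                {
                    // Canonical form: lower-cased with all whitespace removed.
                    var canon = new string(File.ReadAllText(path)
                        .ToLowerInvariant()
                        .Where(c => !char.IsWhiteSpace(c))
                        .ToArray());

                    string key;
                    using (var sha = SHA256.Create())
                        key = Convert.ToBase64String(
                            sha.ComputeHash(Encoding.UTF8.GetBytes(canon)));

                    List<string> bucket;
                    if (!groups.TryGetValue(key, out bucket))
                        groups[key] = bucket = new List<string>();
                    bucket.Add(path);
                }

                // Any bucket with more than one file is a clone set.
                foreach (var clones in groups.Values.Where(g => g.Count > 1))
                    Console.WriteLine(string.Join(" :: ", clones));
            }
        }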

    Read the article

  • MySQL variables - SET @var

    - by Lizard
    I am attempting to create a MySQL snippet that will analyse a table and remove duplicate entries (duplicates are based on two fields, not the entire record). I have the following code, which works when I hard-code the variables into the queries, but when I take them out and use them as variables I get MySQL errors. Below is the script:

        SET @tblname = 'mytable';
        SET @fieldname = 'myfield';
        SET @concat1 = 'checkfield1';
        SET @concat2 = 'checkfield2';

        ALTER TABLE @tblname ADD `tmpcheck` VARCHAR(255) NOT NULL;

        UPDATE @tblname SET `tmpcheck` = CONCAT(@concat1, '-', @concat2);

        CREATE TEMPORARY TABLE `tmp_table` (
            `tmpfield` VARCHAR(100) NOT NULL
        ) ENGINE = MYISAM;

        INSERT INTO `tmp_table` (`tmpfield`)
        SELECT @fieldname FROM @tblname
        GROUP BY `tmpcheck`
        HAVING (COUNT(`tmpcheck`) > 1);

        DELETE FROM @tblname WHERE @fieldname IN (SELECT `tmpfield` FROM `tmp_table`);

        ALTER TABLE @tblname DROP `tmpcheck`;

    I am getting the following error:

        #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@tblname ADD `tmpcheck` VARCHAR( 255 ) NOT NULL' at line 1

    Is this because I can't use a variable for a table name? What else could be wrong, or how would I get around this issue? Thanks in advance.

    Read the article

  • Displaying a message after adding duplicate records in database

    - by user1770370
    I wrote a program in C# WinForms with SQL Server, using LINQ to SQL. I use a user control instead of a form. In my user control I put three textboxes: txtStartNumber, txtEndNumber, txtQuantity. The user fills in the values and, when the button is clicked, the program inserts records according to the value of txtQuantity. When a duplicate number is generated, I want it to be skipped rather than added to the database, and a message displayed. How do I do this? Should I write the check in the code-behind, or enforce it in a stored procedure or a trigger?

        private void btnSave_Click(object sender, EventArgs e)
        {
            long from = Convert.ToInt64(txt_barcode_f.Text);
            long to = Convert.ToInt64(txt_barcode_t.Text);
            long quantity = Convert.ToInt64(to - from);
            int card_Type_ID = Convert.ToInt32(cmb_BracodeType.SelectedValue);
            long[] arrCardNum = new long[(to - from)];
            arrCardNum[0] = from;
            for (long i = from; i < to; i++)
            {
                for (int j = 0; j < (to - from); j++)
                {
                    arrCardNum[j] = from + j;
                    string r = arrCardNum[j].ToString();
                    sp.SaveCards(r, 2, card_Type_ID, SaveDate, 2);
                }
            }
        }

    The stored procedure:

        ALTER PROCEDURE dbo.SaveCards
            @Barcode_Num int,
            @Card_Status_ID int,
            @Card_Type_ID int,
            @SaveDate varchar(10),
            @Save_User_ID int
        AS
        BEGIN
            INSERT INTO [Parking].[dbo].[TBL_Cards]
                ([Barcode_Num], [Card_Status_ID], [Card_Type_ID], [Save_User_ID])
            VALUES
                (@Barcode_Num, @Card_Status_ID, @Card_Type_ID, @Save_User_ID)
        END
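
    A sketch of the client-side option, with the DataContext and table names assumed from the stored procedure above; a UNIQUE constraint on Barcode_Num plus a try/catch around the insert remains the more reliable server-side guard:

        // Inside btnSave_Click. Requires: using System.Collections.Generic;
        // using System.Linq; using System.Windows.Forms;
        var skipped = new List<long>();
        using (var db = new ParkingDataContext())   // hypothetical DataContext
        {
            for (long n = from; n < to; n++)
            {
                if (db.TBL_Cards.Any(c => c.Barcode_Num == n))
                {
                    skipped.Add(n);                 // duplicate: do not insert
                    continue;
                }
                sp.SaveCards(n.ToString(), 2, card_Type_ID, SaveDate, 2);
            }
        }
        if (skipped.Count > 0)
            MessageBox.Show("Skipped duplicate numbers: " + string.Join(", ", skipped));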

    Read the article

  • find a duplicate series in SQL

    - by SomeMiscGuy
    I have a table with three columns, containing a variable number of records per series keyed off the first column, which is a foreign key. I am trying to determine whether I can detect when an entire series is duplicated across multiple rows:

        declare @finddupseries table
        (
            portid int,
            asset_id int,
            allocation float
        );

        INSERT INTO @finddupseries
        SELECT 250,  6, 0.05   UNION ALL
        SELECT 250, 66, 0.8    UNION ALL
        SELECT 250,  2, 0.105  UNION ALL
        SELECT 250,  4, 0.0225 UNION ALL
        SELECT 250,  5, 0.0225 UNION ALL
        SELECT 251, 13, 0.6    UNION ALL
        SELECT 251,  2, 0.3    UNION ALL
        SELECT 251,  5, 0.1    UNION ALL
        SELECT 252, 13, 0.8    UNION ALL
        SELECT 252,  2, 0.15   UNION ALL
        SELECT 252,  5, 0.05   UNION ALL
        SELECT 253, 13, 0.4    UNION ALL
        SELECT 253,  2, 0.45   UNION ALL
        SELECT 253,  5, 0.15   UNION ALL
        SELECT 254,  6, 0.05   UNION ALL
        SELECT 254, 66, 0.8    UNION ALL
        SELECT 254,  2, 0.105  UNION ALL
        SELECT 254,  4, 0.0225 UNION ALL
        SELECT 254,  5, 0.0225

        select * from @finddupseries

    The records for portid 250 and 254 match. Is there any way I can write a query to detect this?

    Edit: yes, the entire series must match. Also, a way to determine which portid it matched would be helpful, as the actual table has around 10k records. Thanks!

    Read the article
