Search Results

Search found 5 results on 1 page for 'agrep'.


  • Create a unique ID by fuzzy matching of names (via agrep using R)

    - by tbrambor
    Using R, I am trying to match people's names in a dataset structured by year and city. Due to some spelling mistakes, exact matching is not possible, so I am trying to use agrep() to fuzzy match names. A sample chunk of the dataset is structured as follows:

        df <- data.frame(matrix(
          c("1200013","1200013","1200013","1200013","1200013","1200013","1200013","1200013",
            "1996","1996","1996","1996","2000","2000","2004","2004",
            "AGUSTINHO FORTUNATO FILHO","ANTONIO PEREIRA NETO","FERNANDO JOSE DA COSTA",
            "PAULO CEZAR FERREIRA DE ARAUJO","PAULO CESAR FERREIRA DE ARAUJO",
            "SEBASTIAO BOCALOM RODRIGUES","JOAO DE ALMEIDA","PAULO CESAR FERREIRA DE ARAUJO"),
          ncol=3, dimnames=list(seq(1:8), c("citycode","year","candidate"))
        ))

    The neat version:

          citycode year candidate
        1 1200013  1996 AGUSTINHO FORTUNATO FILHO
        2 1200013  1996 ANTONIO PEREIRA NETO
        3 1200013  1996 FERNANDO JOSE DA COSTA
        4 1200013  1996 PAULO CEZAR FERREIRA DE ARAUJO
        5 1200013  2000 PAULO CESAR FERREIRA DE ARAUJO
        6 1200013  2000 SEBASTIAO BOCALOM RODRIGUES
        7 1200013  2004 JOAO DE ALMEIDA
        8 1200013  2004 PAULO CESAR FERREIRA DE ARAUJO

    I'd like to check, in each city separately, whether there are candidates appearing in several years. E.g. in the example, PAULO CEZAR FERREIRA DE ARAUJO / PAULO CESAR FERREIRA DE ARAUJO appears more than once (with a spelling mistake). Each candidate across the entire dataset should be assigned a unique numeric candidate ID. The dataset is fairly large (5500 cities, approx. 100K entries), so reasonably efficient code would be helpful. Any suggestions as to how to implement this?
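
    The grouping described in this question can be sketched as follows. This is a minimal illustration in Python rather than R, and it swaps agrep() for the standard library's difflib.SequenceMatcher. The function name assign_candidate_ids and the 0.9 similarity threshold are assumptions made for the example, not part of the original question. Names are compared only within the same city, which keeps the number of comparisons small.

```python
from difflib import SequenceMatcher


def assign_candidate_ids(rows, threshold=0.9):
    """Assign one numeric ID per (fuzzily matched) candidate within a city.

    rows: iterable of (citycode, year, candidate) tuples.
    threshold: assumed similarity cutoff in [0, 1]; tune it on real data.
    Returns a list of IDs aligned with the input rows.
    """
    next_id = 1
    seen_by_city = {}  # citycode -> list of (representative name, id)
    ids = []
    for citycode, _year, name in rows:
        known = seen_by_city.setdefault(citycode, [])
        match_id = None
        for representative, cid in known:
            # ratio() is 1.0 for identical strings and lower for dissimilar ones
            if SequenceMatcher(None, representative, name).ratio() >= threshold:
                match_id = cid
                break
        if match_id is None:
            # First time this (approximate) name is seen in this city
            match_id = next_id
            next_id += 1
            known.append((name, match_id))
        ids.append(match_id)
    return ids


rows = [
    ("1200013", "1996", "AGUSTINHO FORTUNATO FILHO"),
    ("1200013", "1996", "ANTONIO PEREIRA NETO"),
    ("1200013", "1996", "FERNANDO JOSE DA COSTA"),
    ("1200013", "1996", "PAULO CEZAR FERREIRA DE ARAUJO"),
    ("1200013", "2000", "PAULO CESAR FERREIRA DE ARAUJO"),
    ("1200013", "2000", "SEBASTIAO BOCALOM RODRIGUES"),
    ("1200013", "2004", "JOAO DE ALMEIDA"),
    ("1200013", "2004", "PAULO CESAR FERREIRA DE ARAUJO"),
]
candidate_ids = assign_candidate_ids(rows)
print(candidate_ids)
```

    Each new name is compared against one representative spelling per known candidate in its city, so the first spelling seen acts as the reference. A stricter or looser threshold changes which spellings are merged, so it is worth checking a sample of merged names by hand.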


  • Approximate string matching with a letter confusion matrix?

    - by zigglenaut
    I'm trying to model a phonetic recognizer that has to isolate instances of words (strings of phones) out of a long stream of phones that doesn't have gaps between words. The stream of phones may have been poorly recognized, with letter substitutions, insertions, and deletions, so I will have to do approximate string matching.

    However, I want the matching to be phonetically motivated, e.g. "m" and "n" are phonetically similar, so the substitution cost of "m" for "n" should be small compared to, say, "m" and "k". So, if I'm searching for [mein] "main", it would match the letter sequence [meim] "maim" with, say, cost 0.1, whereas it would match the letter sequence [meik] "make" with, say, cost 0.7. Similarly, there are differing costs for inserting or deleting each letter. I can supply a confusion matrix that, for each letter pair (x,y), gives the cost of substituting x with y, where x and y are any letter or the empty string.

    I know that there are tools available that do approximate matching, such as agrep, but as far as I can tell, they do not take a confusion matrix as input. That is, the cost of any insertion, substitution, or deletion is 1. My question is: are there any open-source tools already available that can do approximate matching with confusion matrices, and if not, what is a good algorithm that I can implement to accomplish this?
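
    One candidate for the "good algorithm" asked about here is the standard dynamic-programming approach to approximate substring matching, often attributed to Sellers, with per-symbol weighted costs in place of unit costs. The sketch below is in Python. The function names, the dictionary layout of the confusion matrix, and the default cost of 1.0 for unlisted pairs are assumptions made for illustration. Only the 0.1 and 0.7 example costs come from the question.

```python
def make_costs(confusion, default_sub=1.0, default_indel=1.0):
    """Build cost functions from a confusion dict keyed by (expected, observed).

    The empty string "" stands for "no symbol": ("n", "") is the cost of
    deleting "n", and ("", "k") is the cost of inserting "k". Pairs missing
    from the dict fall back to the (assumed) default costs.
    """
    def sub(p, s):
        return 0.0 if p == s else confusion.get((p, s), default_sub)

    def delete(p):
        return confusion.get((p, ""), default_indel)

    def insert(s):
        return confusion.get(("", s), default_indel)

    return sub, delete, insert


def best_match_cost(pattern, stream, confusion):
    """Minimum weighted edit cost of matching pattern to any substring of stream."""
    sub, delete, insert = make_costs(confusion)

    # prev[i]: cheapest alignment of pattern[:i] with a substring of the
    # stream that ends just before the current stream position.
    prev = [0.0]
    for p in pattern:
        prev.append(prev[-1] + delete(p))
    best = prev[-1]

    for s in stream:
        cur = [0.0]  # cost 0 in row 0: a match may start anywhere in the stream
        for i, p in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + sub(p, s),   # p recognized as s (or exact match)
                prev[i] + insert(s),       # s is an extra symbol in the stream
                cur[i - 1] + delete(p),    # p is missing from the stream
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


# Example costs from the question: "n" heard as "m" is cheap, as "k" is expensive.
confusion = {("n", "m"): 0.1, ("n", "k"): 0.7}
cost_maim = best_match_cost("mein", "xxmeimxx", confusion)
cost_make = best_match_cost("mein", "xxmeikxx", confusion)
print(cost_maim, cost_make)
```

    Running time is proportional to len(pattern) * len(stream), and only two columns of the table are kept in memory. To locate words rather than just score them, the loop could also record the stream position at which the last entry of cur falls below a chosen cost limit.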


  • Space (and pipe sign) works on occasion only

    - by Timo Riikonen
    I have an issue: when I type the pipe sign "|" followed by a space, I sometimes get the wrong type of space (\240) and my command fails. The issue persists across different shells. How can I fix this? I am using a Finnish keyboard layout.

        timo@timo-i7-ubuntu:~$ ps -ef | grep ruby
        timo 7169 2633 0 12:12 pts/2 00:00:00 ruby1.9.1 /usr/local/bin/rails new admin4
        timo 8736 26515 0 14:22 pts/4 00:00:00 grep --color=auto ruby
        timo@timo-i7-ubuntu:~$ ps -ef | grep ruby
        No command ' grep' found, did you mean:
         Command 'igrep' from package 'openimageio-tools' (universe)
         Command 'dgrep' from package 'debian-goodies' (main)
         Command 'rgrep' from package 'grep' (main)
         Command 'zgrep' from package 'gzip' (main)
         Command 'zgrep' from package 'zutils' (universe)
         Command 'sgrep' from package 'sgrep' (universe)
         Command 'lgrep' from package 'lv' (universe)
         Command 'egrep' from package 'grep' (main)
         Command 'ngrep' from package 'ngrep' (universe)
         Command 'grep' from package 'grep' (main)
         Command 'agrep' from package 'agrep' (multiverse)
         Command 'pgrep' from package 'procps' (main)
         Command 'xgrep' from package 'xgrep' (universe)
         Command 'vgrep' from package 'atfs' (universe)
         Command 'fgrep' from package 'grep' (main)
         grep: command not found
        timo@timo-i7-ubuntu:~$ cat pipecom
        ps -ef | grep rails
        timo@timo-i7-ubuntu:~$ cat pipecom2
        ps -ef | grep rails
        timo@timo-i7-ubuntu:~$ ./pipecom
        timo 7169 2633 0 12:12 pts/2 00:00:00 ruby1.9.1 /usr/local/bin/rails new admin4
        timo 8777 8775 0 14:26 pts/4 00:00:00 grep rails
        timo@timo-i7-ubuntu:~$ ./pipecom2
        ./pipecom2: line 1: $'\302\240grep': command not found
        timo@timo-i7-ubuntu:~$ diff -w pipecom pipecom2
        1c1
        < ps -ef | grep rails
        ---
        > ps -ef | grep rails
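
    The bytes \302\240 in the error message are the octal form of the UTF-8 encoding of U+00A0 (no-break space), which looks like an ordinary space on screen. A small Python sketch for locating and replacing that character in a script's text is shown below. The function names are hypothetical, and the sample string only imitates the contents of pipecom2. The sketch does not address why the keyboard produces the character in the first place.

```python
NBSP = "\u00a0"  # no-break space; UTF-8 bytes 0xC2 0xA0, i.e. octal \302\240


def find_nbsp(text):
    """Return (line, column) positions, both 1-based, of every no-break space."""
    return [
        (lineno, col)
        for lineno, line in enumerate(text.splitlines(), start=1)
        for col, ch in enumerate(line, start=1)
        if ch == NBSP
    ]


def replace_nbsp(text):
    """Replace every no-break space with an ordinary ASCII space."""
    return text.replace(NBSP, " ")


# Imitation of the failing script: a no-break space follows the pipe sign.
bad_command = "ps -ef |" + NBSP + "grep rails\n"
positions = find_nbsp(bad_command)
fixed_command = replace_nbsp(bad_command)
print(positions)
print(bad_command.encode("utf-8"))
```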


  • How to search a text file for strings between two tokens in Ubuntu terminal and save the output?

    - by Blue
    How can I search a text file for this pattern in Ubuntu terminal and save the output as a text file? I'm looking for everything between the string "abc" and the string "cde" in a long list of data. For example:

        blah blah abc fkdljgn cde blah blah blah blah blah blah abc skdjfn cde blah

    In the example above I would be looking for an output such as this:

        fkdljgn skdjfn

    It is important that I can also save the data output as a text file. Can I use grep or agrep and if so, what is the format?
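
    As an alternative to grep or agrep, the extraction asked about here can be sketched with a short Python script that can be run from the terminal. The function names and the output file name matches.txt are assumptions made for illustration, and the demo writes into a temporary directory so that it leaves nothing behind.

```python
import re
import tempfile
from pathlib import Path


def extract_between(text, start, end):
    """Return every shortest run of text found between start and end tokens."""
    pattern = re.compile(
        re.escape(start) + r"\s*(.*?)\s*" + re.escape(end),
        re.DOTALL,  # allow a match to span line breaks
    )
    return pattern.findall(text)


def save_matches(matches, path):
    """Write one match per line to path."""
    Path(path).write_text("\n".join(matches) + "\n", encoding="utf-8")


sample = ("blah blah abc fkdljgn cde blah blah blah blah "
          "blah blah abc skdjfn cde blah")
matches = extract_between(sample, "abc", "cde")

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "matches.txt"  # hypothetical output file name
    save_matches(matches, out_path)
    saved = out_path.read_text(encoding="utf-8")

print(matches)
```

    For a real file, the sample string would be replaced by the file's contents, for example Path("data.txt").read_text(), where data.txt is a placeholder name.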


  • CodePlex Daily Summary for Sunday, November 21, 2010

    Popular Releases

      - MDownloader: MDownloader-0.15.24.6966: Fixed Updater; Fixed minor bugs.
      - Smith Html Editor: Smith Html Editor V0.75: The first public release.
      - MiniTwitter: 1.59: MiniTwitter 1.59 [...] User Streams [...]
      - .NET Extensions - Extension Methods Library for C# and VB.NET: Release 2011.01: Added new extensions for object: object.CountLoopsToNull. Added new extensions for DateTime: DateTime.IsWeekend, DateTime.AddWeeks. Added new extensions for string: string.Repeat, string.IsNumeric, string.ExtractDigits, string.ConcatWith, string.ToGuid, string.ToGuidSave. Added new extensions for Exception: Exception.GetOriginalException. Added new extensions for Stream: Stream.Write (overload). And other new methods ... Release as of dotnetpro 01/2011.
      - Code Sample from Microsoft: Visual Studio 2010 Code Samples 2010-11-19: Code samples for Visual Studio 2010.
      - Prism Training Kit: Prism Training Kit 4.0: Release Notes: This is an updated version of the Prism Training Kit that targets Prism 4.0 and adds labs for some of the new features of Prism 4.0. This release consists of a Training Kit with labs on the following topics: Modularity, Dependency Injection, Bootstrapper, UI Composition, Communication, MEF, Navigation. Note: Take into account that this is a Beta version. If you find any bugs please report them in the Issue Tracker. Prerequisites: Visual Studio 2010, Microsoft Word 2...
      - Free language translator and file converter: Free Language Translator 2.2: Starting with version 2.0, the translator underwent a major redesign that uses MEF-based plugins and .NET 4.0. I've also fixed some bugs and added support for translating subtitles that can show up in video media players. Version 2.1 shows the context menu 'Translate' in Windows Explorer on right click. Version 2.2 has links to start the media file with its associated subtitle. Download the zip file and expand it in a temporary location on your local disk. At a minimum, you should uninstal...
      - Free Silverlight & WPF Chart Control - Visifire: Visifire SL and WPF Charts v3.6.4 Released: Hi, today we are releasing Visifire 3.6.4 with a few bug fixes: * Multi-line labels were getting clipped while exploding the last DataPoint in Funnel and Pyramid charts. * The ClosestPlotDistance property in Axis was not behaving as expected. * In DateTime Axis, Chart threw an exception on mouse click over PlotArea if there were no DataPoints present in Chart. * ToolTip was not disappearing while changing the DataSource property of the DataSeries in real time. * Chart threw exception ...
      - Microsoft SQL Server Product Samples: Database: AdventureWorks 2008R2 SR1: Sample Databases for Microsoft SQL Server 2008R2 (SR1). This release is dedicated to the sample databases that ship for Microsoft SQL Server 2008R2. See Database Prerequisites for SQL Server 2008R2 for feature configurations required for installing the sample databases. See Installing SQL Server 2008R2 Databases for step-by-step installation instructions. The SR1 release contains minor bug fixes to the installer used to create the sample databases. There are no changes to the databases them...
      - VidCoder: 0.7.2: Fixed duplicated subtitles when running multiple encodes off of the same title.
      - Craig's Utility Library: Craig's Utility Library Code 2.0: This update contains a number of changes, added functionality, and bug fixes: Added transaction support to SQLHelper. Added linked/embedded resource ability to EmailSender. Updated List to take into account new functions. Added better support for MAC address in WMI classes. Fixed parsing in Reflection class when dealing with sub classes. Fixed bug in SQLHelper when replacing the Command that is a select after doing a select. Fixed issue in SQL Server helper with regard to generati...
      - MFCMAPI: November 2010 Release: Build: 6.0.0.1023. Full release notes at SGriffin's blog. If you just want to run the tool, get the executable. If you want to debug it, get the symbol file and the source. The 64 bit build will only work on a machine with Outlook 2010 64 bit installed. All other machines should use the 32 bit build, regardless of the operating system.
      - DotNetNuke® Community Edition: 05.06.00: Major Highlights: Added automatic portal alias creation for single portal installs. Updated the file manager upload page to allow the user to upload multiple files without returning to the file manager page. Fixed issue with Event Log Email Notifications. Fixed issue where Telerik HTML Editor was unable to upload files to secure or database folder. Fixed issue where registration page is not set correctly during an upgrade. Fixed issue where Sendmail stripped HTML and links from emails...
      - mVu Mobile Viewer: mVu Mobile Viewer 0.7.10.0: Tube8 fix.
      - EPPlus-Create advanced Excel 2007 spreadsheets on the server: EPPlus 2.8.0.1: New Features: Improved chart support. Different chart-type series on the same chart. Support for secondary axis and a lot of new properties. Better styling. Encryption and Workbook protection. Table support. Import csv files. Array formulas. ...and a lot of bugfixes.
      - AutoLoL: AutoLoL v1.4.2: Added support for more clients (French and Russian). Settings are now stored separately for each user on a computer. Auto Login is much faster now. Auto Login now detects and handles caps lock state properly.
      - TailspinSpyworks - WebForms Sample Application: TailspinSpyworks-v0.9: Contains a number of bug fixes and additional tutorial steps as well as complete database implementation details.
      - ASP.NET MVC Project Awesome (jQuery Ajax helpers): 1.3 and demos: It contains a rich set of helpers (controls) that you can use to build highly responsive and interactive Ajax-enabled Web applications. These helpers include Autocomplete, AjaxDropdown, Lookup, Confirm Dialog, Popup Form and Pager, tested on mozilla, safari, chrome, opera, ie 9b/8/7/6. New stuff in 1.3: Autocomplete helper. Autocomplete and AjaxDropdown can have parentId and be filled with data depending on the value of the parent. PopupForm besides Content("ok") on success can also return J...
      - Nearforums - ASP.NET MVC forum engine: Nearforums v4.1: Version 4.1 of the ASP.NET MVC forum engine, with great improvements: TinyMCE added as visual editor for messages (removed CKEditor). Integrated AntiSamy for cleaner HTML in user posts and more protection against potential injections. Admin status page: a page for the site admin to check the current status of the configuration / db / etc. View Roadmap for more details.
      - UltimateJB: UltimateJB 2.01 PL3 KakaRoto + PSNYes by EvilSperm: Here is a release many have been eagerly awaiting: - The PSNYes version, for playing on PSN with a jailbroken PS3. - For now PSNYes is only available for PS3s on firmware 3.41!!! - The PL3 KAKAROTO version includes his latest changes and prepares for the integration of firmware 3.30!!! Conclusion: - UltimateJB PSNYes => enables use of PSN; only compatible with 3.41. - ultimateJB DEFAULT => no PSN, but available for PS3s ...

    New Projects

      - 1600hours: 1600hours project made in C++.
      - aoleDownload: Aole Series Download
      - Bills and Cash Flow: Bills and Cash Flow is a simple multi-tenant application to track bills and view cash flow.
      - CUDAagrep: CUDAagrep, a fast CUDA implementation of the agrep algorithm for approximate DNA/RNA sequence matching.
      - DNN5 Simple Ticketing Module: This is a simple DNN module that accepts trouble tickets and creates a knowledge base for a company.
      - EntityOH: Dynamic Entities ORM
      - Fxcop ASP.NET Security Rules: This is a set of code analysis rules aimed at analyzing ASP.NET and ASP.NET MVC security against best practices. The rules can be used by Visual Studio 10 Ultimate or FxCop v10 standalone.
      - Head First Design Patterns - Code Examples in C#: This project consists of code examples from the book Head First Design Patterns by Eric and Elizabeth Freeman, ported to C#.
      - HTML5 Media Player (Video / Audio): A .NET implementation of the VideoJS and AudioJS open source projects with video and audio support for HTML5. Excellent for use with iPod, iPad, iPhone, etc.
      - Keyword Auction Simulator: This is a project for simulating keyword auctions like AdWords.
      - mAdcOW Office Add-Ins: A collection of handy Office 2010 add-ins.
      - Manga to Epub: Manga to Epub allows you to convert a bunch of images to a single "epub" file, readable on your reader. It handles most image types as well as several archive formats. You have multiple customization options, such as trimming the images in order to remove white borders.
      - Mapua Career Ramp Up: A joint endeavor with Philippine IT industry leaders and the Mapua School of Information Technology to build an online collaborative database system to ramp up graduating students on their careers as future IT professionals.
      - minami: Minami is a project that focuses on stability and features. It is developed in C++.
      - minami-dev: Description to come later.
      - Mobile RPG: Mobile RPG is five ATtiny85 microcontrollers playing their own RPG characters with a primary MCU acting as GM. It's a fun exercise in autonomous role playing.
      - NetSnoop: NetSnoop allows everyone to get a quick overview of all the current connections on their workstation.
      - nGso: GSO algorithm implementation based on http://www.springerlink.com/content/y065470472612847/fulltext.pdf (Glowworm swarm optimization for simultaneous capture of multiple local optima of multimodal functions, K.N. Krishnanand · D. Ghose).
      - OpenID Starter Kit for ASP.NET MVC: OpenID Starter Kit for ASP.NET MVC is used to jump-start building your ASP.NET MVC web application with an OpenID login system. It is also a good educational resource if you want to learn how to implement OpenID in ASP.NET MVC.
      - Orchard Contact Us Module: Add a contact us page to your Orchard site using this module.
      - Persian Scheduler and Calendar Control: This is a Jalali (Persian or shamsi) calendar and scheduler control in Silverlight. The name 'Jalali' was chosen in honor of 'Hakim omar khayyam', the founder of the Jalali calendar. This is under the license of the 'Barid New Systems' company.
      - Popfly Metadata Generator: Creates Metadata for New project.
      - PurpleStoat: A modular, extensible Silverlight application shell using Prism, Unity and the Enterprise Library, written in C#. It includes a WCF service which provides AuthZ and logging services to the shell, which are also available to the modules.
      - QL Config Compare Tool: The QL Config Compare Tool enables you to compare two QuakeLive configs. It creates a detailed overview of the differences and is able to save statistics.
      - SQL PHI Identifier: SQL PHI Identifier is an auditing tool for DBAs in a healthcare environment to help identify which databases/tables might hold protected health information (PHI). Using this information a DBA can then take the necessary steps to secure that data adequately.
      - Sqlite ORM: Sqlite ORM is at present a simple class-to-table mapper for Sqlite databases. Tables are created on demand and designed to be future-proof for sharding. Code has 100% unit test coverage.
      - Test shop: Test shop
      - VarMerger: [...] (Add-In) [...] MS Word 2007 [...]
      - Visual Studio Add-In For creating Vista Gadget: The absence of tools in Visual Studio that can help developers create Vista gadgets is strange and disappointing, in my opinion. I want to show you some tools that can help you develop Vista gadgets using only the Visual Studio 2008 or 2010 IDE.
      - Vocal Remover - VST Plugin: VST plugin that removes vocals from songs using the M/S system trick with EQ on the mid signal. Source in C++. IDE: Visual Studio 2010 Express Edition. LIB: Steinberg VST SDK 2.4.
      - Windows Phone 7 To Go: A project with demos for Windows Phone 7 features.
      - Winware: Winware is not only an Entity Framework, but beyond.
      - XTengine: XTengine makes it easier for XNA developers to develop in a compositional manner. You'll no longer have to write specific game classes with deep hierarchies or hardcode level loading. It's developed in C# with XNA 4.0, with WP7 in mind.

