Search Results

Search found 5 results on 1 page for 'harshsinghal'.

  • Essential skills of a Data Scientist

    - by harshsinghal
    I would like to know more about the relevant skills in a Data Scientist's arsenal and, with new technologies appearing every day, how one picks the essentials. A few ideas germane to this discussion: Knowing SQL and using a database such as MySQL or PostgreSQL served well until the advent of NoSQL and other non-relational databases; MongoDB, CouchDB, etc. are becoming popular for working with web-scale data. Knowing a statistics tool like R is enough for analysis, but building applications may require adding Java, Python, and similar languages to the list. Data now comes as text, URLs, and multimedia, to name a few, and each comes with its own paradigms for manipulation. What about cluster computing, parallel computing, the cloud, Amazon EC2, and Hadoop? OLS regression now has artificial neural networks, random forests, and other relatively exotic machine learning/data mining algorithms for company. Thoughts?

  • Creating a local CRAN repository

    - by harshsinghal
    I would like to create a local CRAN repository such that users in my company can install packages from it and the system admins can update the local repo periodically. Access to the CRAN mirrors is currently denied. Is there a simple way to do this? Thank you for your time.
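
    A CRAN-style repository is essentially a directory tree whose src/contrib folder holds source tarballs (named <package>_<version>.tar.gz) plus a PACKAGES index that install.packages() reads. In R the supported way to build that index is tools::write_PACKAGES(); the Python sketch below only illustrates what the index contains. It assumes the repository serves source packages only, and that copying the subset of DESCRIPTION fields listed in INDEX_FIELDS (a hypothetical name, as are parse_dcf and write_packages_index) is enough for dependency resolution.

```python
import tarfile
from pathlib import Path

# Assumption: these DESCRIPTION fields are enough for install.packages() to resolve dependencies
INDEX_FIELDS = ("Package", "Version", "Depends", "Imports", "LinkingTo",
                "Suggests", "License", "NeedsCompilation")


def parse_dcf(text):
    """Parse one DCF record (the format of an R DESCRIPTION file) into a dict."""
    fields, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Indented lines continue the previous field's value
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            key = key.strip()
            fields[key] = value.strip()
    return fields


def write_packages_index(contrib_dir):
    """Write a plain-text PACKAGES index for the source tarballs in contrib_dir."""
    contrib = Path(contrib_dir)
    records = []
    for tarball in sorted(contrib.glob("*.tar.gz")):
        # Source tarballs are named <package>_<version>.tar.gz and unpack into <package>/
        package = tarball.name.split("_", 1)[0]
        with tarfile.open(tarball, "r:gz") as tar:
            raw = tar.extractfile(f"{package}/DESCRIPTION").read()
        desc = parse_dcf(raw.decode("utf-8"))
        records.append("\n".join(f"{k}: {desc[k]}" for k in INDEX_FIELDS if k in desc))
    # Records are separated by a blank line, as in CRAN's own PACKAGES files
    (contrib / "PACKAGES").write_text("\n\n".join(records) + "\n", encoding="utf-8")
    return len(records)
```

    Once the index exists, users could install from it with something like install.packages("pkg", repos = "file:///path/to/repo", type = "source"), where the path (a placeholder) is the folder that contains src/contrib; the admins would re-run the indexing step after each periodic update.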

  • n-grams from text in PostgreSQL

    - by harshsinghal
    I am looking to create n-grams from a text column in PostgreSQL. I currently split the data (sentences) in a text column on whitespace into an array:

        select regexp_split_to_array(sentenceData, E'\\s+') from tableName;

    Once I have this array, how do I go about creating a loop to find the n-grams and writing each one to a row in another table? Using unnest I can obtain all the elements of all the arrays on separate rows, and maybe I can then think of a way to get n-grams from a single column, but I'd lose the sentence boundaries, which I wish to preserve.

    Sample SQL code for PostgreSQL to emulate the above scenario:

        create table tableName(sentenceData text);
        INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');
        INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');
        INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');
        select regexp_split_to_array(sentenceData, E'\\s+') from tableName;
        select unnest(regexp_split_to_array(sentenceData, E'\\s+')) from tableName;
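
    The per-sentence loop being asked about can be sketched in Python, assuming the sentences have already been fetched as one string per row. Each sentence is split on whitespace and n-grams are built only from that sentence's own tokens, so none crosses a sentence boundary; the sentence id and position are kept so every n-gram maps to one row of an output table. The helper name ngrams_per_sentence is hypothetical. Inside PostgreSQL the same idea could be expressed by slicing each array over generate_series() subscripts instead of flattening it with unnest().

```python
def ngrams_per_sentence(sentences, n):
    """Return (sentence_id, position, ngram) rows; no n-gram spans two sentences."""
    rows = []
    for sentence_id, sentence in enumerate(sentences, start=1):
        # str.split() with no argument splits on runs of whitespace, like the regex \s+,
        # but also drops the empty token a leading space would produce
        tokens = sentence.split()
        # A sentence with k tokens yields k - n + 1 n-grams (none if k < n)
        for i in range(len(tokens) - n + 1):
            rows.append((sentence_id, i + 1, " ".join(tokens[i:i + n])))
    return rows


if __name__ == "__main__":
    sample = ["This is a long sentence",
              "I am currently doing grammar, hitting this monster book btw!"]
    for row in ngrams_per_sentence(sample, 2):
        print(row)
```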

  • Calling software modules (Java, Perl, etc.) from R

    - by harshsinghal
    I've recently started using R for Natural Language Processing tasks and find that many of the applications I need are available in Java and Perl. For example, a few Perl modules compute distance measures between words by querying WordNet. I am aware of the R wordnet package, but it does not perform the tasks these CPAN modules do. There are also many Java packages for NLP that I'd like to use from within R. I know of rJava, RSPerl, and the simple system command, among others, but I'd like more examples of how to call Java and Perl applications from R.

    Recently I tried capturing the console output of a Perl script:

        cat('print "Hello World\n";', file = "hello.pl")
        system(command = "c:\\Perl64\\bin\\perl hello.pl")
        system(command = paste(Sys.getenv("COMSPEC"), "/c", "C:\\Perl64\\bin\\perl hello.pl"))

    Neither of the system commands above showed 'Hello World' in the R console. I've used system before to run Perl scripts for tasks whose console output I didn't need to capture. Any hints, or pointers to more extensive sources of information, would be highly appreciated. Thank you.
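
    Capturing console output means reading the child process's standard output rather than letting it go to a separate console. In R that is what system(..., intern = TRUE) and system2(..., stdout = TRUE) are for: both return the output as a character vector (and Windows paths inside R strings need doubled backslashes or forward slashes). The same idea is sketched in Python below with subprocess.run; the Perl call is replaced by a small Python child process so the sketch runs anywhere, and run_and_capture is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return its standard output as a list of lines."""
    # A list of arguments bypasses the shell, so no shell quoting is needed
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in for ["perl", "hello.pl"]: a child Python process printing the same greeting
    print(run_and_capture([sys.executable, "-c", 'print("Hello World")']))
```

    On a machine with Perl on the PATH, the equivalent call would be run_and_capture(["perl", "hello.pl"]); check=True makes a non-zero exit status raise an error instead of silently returning empty output.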

  • Document/Scripts management for R code

    - by harshsinghal
    Hi useRs, I am looking for a solution that lets me keep track of the many R scripts I create for various projects and purposes. Some scripts are easily traced to specific projects, whereas others are "convenience" functions written to serve a set of tasks. Is there a way to create a central DB and query it to find the scripts that best match what I need? I could build such a system manually on top of a DBMS, but is anyone aware of a software tool (ideally FOSS), either general-purpose or specific to R, that does this?
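
    One way to build the central, queryable index described above is to scan a folder of scripts into SQLite, storing each script's text and the function names it defines, and then search both with LIKE (case-insensitive for ASCII in SQLite). The sketch assumes scripts end in .R and live in one folder per project; the table layout and the names build_index and find_scripts are hypothetical, and the regular expression only recognizes simple "name <- function(" or "name = function(" definitions.

```python
import re
import sqlite3
from pathlib import Path

# Simple definitions only: `name <- function(` or `name = function(` at the start of a line
FUNC_DEF = re.compile(
    r"^\s*([A-Za-z.][A-Za-z0-9._]*)\s*(?:<-|=)\s*function\s*\(", re.MULTILINE
)


def build_index(root, db_path=":memory:"):
    """Scan every *.R file under root into a SQLite database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scripts (path TEXT PRIMARY KEY, project TEXT, body TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS functions (name TEXT, path TEXT)")
    for script in sorted(Path(root).rglob("*.R")):
        text = script.read_text(encoding="utf-8", errors="replace")
        path = str(script)
        project = script.parent.name  # assumption: one folder per project
        conn.execute(
            "INSERT OR REPLACE INTO scripts VALUES (?, ?, ?)", (path, project, text)
        )
        # Drop stale entries so re-indexing an existing database does not duplicate rows
        conn.execute("DELETE FROM functions WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO functions VALUES (?, ?)",
            [(name, path) for name in FUNC_DEF.findall(text)],
        )
    conn.commit()
    return conn


def find_scripts(conn, keyword):
    """Return paths of scripts whose text or defined function names contain keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT DISTINCT s.path FROM scripts s "
        "LEFT JOIN functions f ON f.path = s.path "
        "WHERE s.body LIKE ? OR f.name LIKE ? ORDER BY s.path",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

    Pointing db_path at a file (for example a placeholder scripts.db) keeps the index between sessions, and re-running build_index picks up new or changed scripts; entries for deleted files would still need pruning.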
