Search Results

Search found 7058 results on 283 pages for 'job'.


  • How many programming jobs are there that require German/French language ?

    - by HJ-INCPP
    Hello, I want to improve my chances of getting a job (entry-level programming) by learning another language. How many jobs are there that exclusively require French, German, or English? Which is better to learn (more/better jobs): French or German? Is it worth it (or should I learn another programming language instead :D)? Thank you. P.S. I live in Romania, and I (think I) know English.


  • Guide to migrate from delayed_job to resque?

    - by Eli
    Does anyone have or know of a guide for how to migrate from Delayed Job to Resque? I can't find any on Google and figure I must not be the first one doing this. Just a general list of changes that need to be made and things to watch out for would be great.


  • As a programmer, what are some telltale signs that you're about to get fired or laid off?

    - by plaureano
    If you have ever been fired from a job, did you notice anything different about the behavior of your peers or upper management just before your termination? What are some common signs to look for among your coworkers and project manager(s) that would indicate your position is severely at risk? EDIT: My instincts were right, and I opted to resign rather than face termination. I guess when you have that "gut feeling" that something is about to happen, it's a strong sign that you should be heading for the exit...


  • Online job-searching is tedious. Help me automate it.

    - by ehsanul
    Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, the result is usually wrong. This forces you to wade through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings and save the URLs of just those jobs that don't require years of experience. I don't need help writing the scraper to get the HTML bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult, as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions. In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved. Next, I can safely exclude a job that says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable to exclude jobs. But then I realized some jobs say they'll take 0-2 years of experience, which matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". I can handle that too, but it makes me wonder what other patterns I'm not thinking of, and whether I'm excluding many jobs by mistake. That's what brings me here: to find a better way to do this than regexes, if there is one. I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a Bayesian filter, maybe?
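
The rules described in the question can be sketched as a small two-stage classifier in Python. This is only an illustration under stated assumptions: the function name classify, the KEEP_PATTERNS list, and the exact phrasings covered are hypothetical choices, not something from the original post. Keep rules run before the exclusion rule, and anything unmatched is kept, which biases the filter toward fewer false negatives.

```python
import re

# Phrases that suggest a low experience bar (illustrative, not exhaustive).
# They are checked first so that "0-3 years" is not thrown out by the
# exclusion rule below.
KEEP_PATTERNS = [
    re.compile(r"entry[\s-]?level"),
    re.compile(r"\b0\s*(?:-|to)\s*\d+\+?\s*years?"),
    re.compile(r"\b(?:less|fewer)\s+than\s+\d+\s+years?"),
]

# The exclusion rule from the question, loosened slightly to allow "5+ years"
# and "5 year": a standalone 3-9 or 10-19 followed by "year(s)".
EXCLUDE_PATTERN = re.compile(r"\b(?:[3-9]|1\d)\+?\s*years?")


def classify(posting: str) -> str:
    """Return "keep" or "exclude" for the text of one job posting."""
    text = posting.lower()
    if any(pattern.search(text) for pattern in KEEP_PATTERNS):
        return "keep"
    if EXCLUDE_PATTERN.search(text):
        return "exclude"
    # No experience requirement detected: keep the posting, so that unknown
    # phrasings cost a manual look rather than a missed job.
    return "keep"


if __name__ == "__main__":
    for sample in (
        "Must have 5 years experience in Java",
        "We will take 0-2 years of experience",
        "Fewer than 2 years of experience required",
    ):
        print(classify(sample), "<-", sample)
```

Logging which pattern fired for each posting would make it easier to spot phrasings the rules miss before deciding whether a trained filter is worth the effort.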


  • Replication Services as ETL extraction tool

    - by jorg
    In my last blog post I explained the principles of Replication Services and the possibilities it offers in a BI environment. One of the possibilities I described was the use of snapshot replication as an ETL extraction tool:

    “Snapshot Replication can also be useful in BI environments. If you don’t need a near real-time copy of the database, you can choose to use this form of replication. Besides being an alternative to Transactional Replication, it can be used to stage data so it can be transformed and moved into the data warehousing environment afterwards. In many solutions I have seen developers create multiple SSIS packages that simply copy data from one or more source systems to a staging database that serves as the source for the ETL process. The creation of these packages takes a lot of (boring) time, while Replication Services can do the same in minutes. It is possible to filter out columns and/or records and it can even apply schema changes automatically, so I think it offers enough features here. I don’t know how the performance will be and if it really works as well for this purpose as I expect, but I want to try this out soon!”

    Well, I have tried it out and I must say it worked well. I was able to let Replication Services do the work in a fraction of the time it would cost me to do the same in SSIS. What I did was the following:

    Configure snapshot replication for some AdventureWorks tables. This was quite simple and straightforward.
    Create an SSIS package that executes the snapshot replication on demand and waits for its completion. This is something that you can’t do with out-of-the-box functionality.

    While configuring the snapshot replication, two SQL Agent jobs are created: one for the creation of the snapshot and one for the distribution of the snapshot. Unfortunately these jobs are asynchronous, which means that when you execute them they immediately report back whether the job started successfully or not; they do not wait for completion and report its result afterwards. So I had to create an SSIS package that executes the jobs and waits for their completion before the rest of the ETL process continues. Fortunately I was able to create the SSIS package with the desired functionality. I have made a step-by-step guide that will help you configure the snapshot replication and I have uploaded the SSIS package you need to execute it.

    Configure snapshot replication

    The first step is to create a publication on the database you want to replicate.
    Connect to SQL Server Management Studio, right-click Replication and choose New Publication…
    The New Publication Wizard appears; click Next.
    Choose your “source” database and click Next.
    Choose Snapshot publication and click Next.
    You can now select tables and other objects that you want to publish. Expand Tables and select the tables that are needed in your ETL process.
    In the next screen you can add filters on the selected tables, which can be very useful. Think about selecting only the last x days of data, for example. It’s possible to filter out rows and/or columns. In this example I did not apply any filters.
    Schedule the Snapshot Agent to run at a desired time. By doing this a SQL Agent job is created, which we need to execute from an SSIS package later on.
    Next you need to set the security settings for the Snapshot Agent. Click on the Security Settings button.
    In this example I ran the Agent under the SQL Server Agent service account. This is not recommended as a security best practice. Fortunately there is an excellent article on TechNet which tells you exactly how to set up the security for replication services. Read it here and make sure you follow the guidelines!
    On the next screen choose to create the publication at the end of the wizard.
    Give the publication a name (SnapshotTest) and complete the wizard.
    The publication is created and the articles (tables in this case) are added.

    Now that the publication has been created successfully, it’s time to create a new subscription for this publication.
    Expand the Replication folder in SSMS, right-click Local Subscriptions and choose New Subscriptions.
    The New Subscription Wizard appears.
    Select the publisher on which you just created your publication and select the database and publication (SnapshotTest).
    You can now choose where the Distribution Agent should run. If it runs at the distributor (push subscriptions) it causes extra processing overhead. If you use a separate server for your ETL process and databases, choose to run each agent at its subscriber (pull subscriptions) to reduce the processing overhead at the distributor.
    Of course we need a database for the subscription, and fortunately the wizard can create it for you. Choose New database.
    Give the database the desired name, set the desired options and click OK.
    You can now add multiple SQL Server subscribers, which is not necessary in this case but can be very useful.
    You now need to set the security settings for the Distribution Agent. Click the … button.
    Again, in this example I ran the Agent under the SQL Server Agent service account. Read the security best practices here.
    Click Next.
    Make sure you create a synchronization job schedule again. This job is also necessary in the SSIS package later on.
    Initialize the subscription at first synchronization.
    Select the first box to create the subscription when finishing this wizard.
    Complete the wizard by clicking Finish. The subscription will be created.

    In SSMS you can see that a new database has been created: the subscriber. There are no tables or other objects in the database yet because the replication jobs have not run yet.
    Now expand the SQL Server Agent, go to Jobs and search for the job that creates the snapshot. Rename this job to “CreateSnapshot”.
    Now search for the job that distributes the snapshot. Rename this job to “DistributeSnapshot”.

    Create an SSIS package that executes the snapshot replication

    We now need an SSIS package that will take care of the execution of both jobs. The CreateSnapshot job needs to execute and finish before the DistributeSnapshot job runs. After the DistributeSnapshot job has started, the package needs to wait until it is finished before the package execution finishes. The Execute SQL Server Agent Job Task is designed to execute SQL Agent jobs from SSIS. Unfortunately this SSIS task only executes the job and reports back whether the job started successfully or not; it does not report whether the job actually completed with success or failure. This is because these jobs are asynchronous.

    The SSIS package I’ve created does the following:
    It runs the CreateSnapshot job.
    It checks every 5 seconds, using a for loop, whether the job is completed.
    When the CreateSnapshot job is completed, it starts the DistributeSnapshot job.
    And again it waits until the snapshot is delivered before the package finishes successfully.

    Quite simple, and the package is ready to use as a standalone extract mechanism. After executing the package, the replicated tables are added to the subscriber database and are filled with data.

    Download the SSIS package here (SSIS 2008)

    Conclusion

    In this example I only replicated 5 tables; I could create an SSIS package that does the same in approximately the same amount of time. But if I replicated all 70+ AdventureWorks tables, I would save a lot of time and boring work! With Replication Services you also benefit from the fact that schema changes are applied automatically, which means your entire extract phase won’t break. Because a snapshot is created using the bcp utility (bulk copy) it’s also quite fast, so the performance will be quite good.

    A disadvantage of using snapshot replication as an extraction tool is the limitation on source systems: you can only choose SQL Server or Oracle databases to act as a publisher. So if you plan to build an extract phase for your ETL process that will involve a lot of tables, think about Replication Services. It would save you a lot of time, and thanks to the Extract SSIS package I’ve created it fits perfectly into your usual SSIS ETL process.
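
The control flow of the package described above (start a job, poll every 5 seconds until it is done, then start the next job) can be sketched outside SSIS as well. The Python below is a minimal illustration, not the author's package: start_job and is_finished are hypothetical callables that the caller must supply. Against SQL Server they would presumably wrap msdb.dbo.sp_start_job and a status query such as msdb.dbo.sp_help_job, but that wiring is an assumption and is not shown here. The timeout is also an addition that the original post does not mention.

```python
import time
from typing import Callable


def run_job_and_wait(
    job_name: str,
    start_job: Callable[[str], None],
    is_finished: Callable[[str], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Start an asynchronous job, then block until it reports completion."""
    start_job(job_name)
    waited = 0.0
    while not is_finished(job_name):
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"{job_name} did not finish within {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


def run_snapshot_extract(
    start_job: Callable[[str], None],
    is_finished: Callable[[str], bool],
    **options,
) -> None:
    """Run the two renamed Agent jobs strictly one after the other."""
    for job_name in ("CreateSnapshot", "DistributeSnapshot"):
        run_job_and_wait(job_name, start_job, is_finished, **options)
```

Passing the sleep function in as a parameter keeps the loop testable without real delays. A production version would also need to distinguish a job that finished successfully from one that finished with an error, which this sketch does not attempt.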


  • Should we tell our expected and current CTC just in mail before interview?

    - by jitendra
    Should we tell our expected and current CTC just in mail before interview? I have read in many resume advice articles "never put salary info in resume", but these days every company asks for expected CTC and only then interviews. What should I give in reply to this type of mail, where the company is asking for expected and current CTC before the interview? Can they appoint me directly, without an interview? Should I ask the company any other questions before giving my expected and current CTC? Hi, This is Mikel from london I found your resume on a job portal and it's very good .We have very urgent requirements @ london. Requirement1 : Senior Web Designer Experience: min4+yrs Skills:HTML,Adobe Photoshop, Javascript,CSS, Dreamweaver,Accessibility Etc.. If you looking for change just forward your latest resume to [email protected] along with these details Contact Number: Current CTC: Expected CTC: Notice Period: Current Location: Like to Relocate to London (Y/N):


  • Made an interview mistake. Should I try to correct after the fact?

    - by AT Developer
    Ever been in a situation where you were in an interview, and realized immediately afterwards (after the nervousness wore off) that you did something wrong? I had a phone interview today. I was asked an n-ary tree problem, and coded an algorithm that used extra space, then a different algorithm with no space overhead. However, my solution was inefficient, since I traversed the tree top-down rather than bottom-up. The interviewer said I did a good job, but I'm still wondering if he noticed and marked me down for my choice of implementation. Should I follow up with an email correcting myself, or just let it go and avoid making things worse?


  • How to find two most distant points?

    - by depesz
    This is a question that I was asked in a job interview some time ago, and I still can't figure out a sensible answer. The question is: you are given a set of points (x,y). Find the 2 points most distant from each other. For example, for points (0,0), (1,1), (-8,5), the most distant are (1,1) and (-8,5), because the distance between them is larger than both (0,0)-(1,1) and (0,0)-(-8,5). The obvious approach is to calculate the distances between all pairs of points and find the maximum. The problem is that it is O(n^2), which makes it prohibitively expensive for large datasets. There is an approach that first finds the points that are on the boundary and then calculates distances only for them, on the premise that there will be fewer points on the boundary than "inside", but it's still expensive, and it degrades to O(n^2) in the worst case, when every point lies on the boundary. I tried to search the web, but didn't find any sensible answer, although this might simply be my lack of search skills.
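
One well-known way to avoid the quadratic pass over the boundary points is to build the convex hull and then sweep it with the rotating calipers technique. The Python below is a sketch of that idea, not an answer taken from the original thread; the function names are illustrative. It assumes points are (x, y) tuples and works with squared distances to avoid square roots. Sorting dominates the cost at O(n log n), and the caliper sweep itself is linear in the hull size.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (twice the signed triangle area)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Monotone-chain hull, counter-clockwise, without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def farthest_pair(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return two points of the input that are farthest from each other."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 2:
        raise ValueError("need at least two distinct points")
    if n == 2:
        return hull[0], hull[1]
    best_d = -1.0
    best = (hull[0], hull[1])
    j = 1
    for i in range(n):
        nxt = hull[(i + 1) % n]
        # Advance j while the next hull vertex lies farther from edge (i, i+1).
        while abs(cross(hull[i], nxt, hull[(j + 1) % n])) > abs(
            cross(hull[i], nxt, hull[j])
        ):
            j = (j + 1) % n
        # The farthest pair is among the edge endpoints and their antipodal vertex.
        for p in (hull[i], nxt):
            d = dist2(p, hull[j])
            if d > best_d:
                best_d, best = d, (p, hull[j])
    return best
```

For the example in the question, farthest_pair([(0, 0), (1, 1), (-8, 5)]) returns the points (1, 1) and (-8, 5) in some order.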


  • rails wiki site - article edit highlighting/strikethrough with htmldiff maxes cpu

    - by mark
    Hi, I'm implementing a wiki-style site and want to highlight changes made to articles between successive versions. Using htmldiff to highlight changes works great, except that it is rather CPU-intensive. I'm using the awesome vestal_versions plugin for versioning. So how best to handle this? I considered having an on_create callback on version creation that creates a delayed job, which processes the article with htmldiff and then stores the result (in the version table row). If this is a good approach, how can I extend vestal_versions without touching the gem? Or maybe there would be a better approach. Any advice is much appreciated. :)


  • How much do politics/office intrigue interfere with your day to day tasks at work?

    - by Michael Dorgan
    I'm currently blessed to be employed at a location where politics are pretty much nonexistent and management overhead is nearly nil. As I've only worked at this one location for my entire, lengthy career, I have very little frame of reference outside of an occasional Dilbert comic or offhand comment from others about just how badly office politics and management interference get in the way of getting your code done elsewhere. While I'm not actively looking for a new job, this one point has made me quite reluctant to even look seriously elsewhere. My question is, just how much are politics a way of life in larger companies - in or out of the game industry - and how much does it affect your day-to-day satisfaction?


  • Any way to have delayed_job execute some run-once code at startup and use across all jobs?

    - by Rob Cameron
    So I've got a delayed_job task that pushes some info to an XMPP server. Ideally you create a connection to XMPP once and then constantly push data to it, rather than creating a new connection every time you have some data to send. Is there any kind of facility in delayed_job for running a sort of 'setup' method when a worker starts, have it set some instance variables (like the XMPP connection object) that can then be used by all the jobs that come up? It's okay if each worker runs its own setup method. I just don't want every job (thousands per day) connecting to the XMPP server from scratch every time. Thanks for any help!


  • Would you tell your prospective boss your SO username?

    - by Sebi
    Today I met a friend who also uses Stack Overflow. He had a job interview today at a small business, and during the interview the prospective boss asked him how he makes sure he's always up to date on technical questions and what he does to find a solution to a problem he can't solve on his own. Besides some magazines, journals, books and blogs, my friend also mentioned Stack Overflow. The prospective boss seemed very interested in that and asked him if he could tell him his username. It appears that was the most difficult question of the whole interview ;) Would you tell your prospective boss your username? On the pro side, the boss sees that you're very involved in your field and community, but on the other hand it is a really private thing, and you can't post anymore in threads like "what was the worst working environment?" My friend circumnavigated this question with a rather lame answer (more or less: I use autologin, that's why I have to check the username later at home, I'll maybe send you an email).


  • Working for free

    - by truncate
    Finances are making me take an extended period off from my college education. In my current state, I don't feel fully qualified to be employed by an iPhone software company. While I work on getting things back together, I'd like to try to work for a software company for free in my local area (I'm going to college out of state and have to move back as well). The economy has forced employers to be very picky about who they hire, if they hire at all. Since I'd like to continue refining my abilities, I was wondering what the consensus is on working for free. It can't be considered an internship, as I would no longer be in school... I guess an apprenticeship is more appropriate. Like I said, I don't think I'm qualified to be paid for my services, and I don't want to be. I just don't know how to ask, or if it's even appropriate to ask them to show me how to develop software in the real world. My thinking is that they would be willing to get some work done for free, and if I prove myself, they could hire me. If not, there was no major loss. They get some free development, and lose a bit of time helping show me the ropes. I get either a job or valuable experience that I need. The other alternative is that I try to work things out by myself on the iPhone platform, but that sounds terrifying. I appreciate any input the community has to offer.


  • delayed_job :run_at is not working. all jobs set to run at current time

    - by jtwg
    I have installed the collectiveidea fork for delayed_job at git://github.com/collectiveidea/delayed_job.git but cannot get it to accept :run_at.

    From my Gemfile:

        gem 'rails', '3.2.2'
        gem 'delayed_job_active_record'

    When I try it in the console:

        1.9.2-p318 :005 > Time.now
         => 2012-03-24 10:20:34 -0700
        1.9.2-p318 :006 > User.delay.new :run_at => 5.days.from_now
        SQL (0.1ms) BEGIN
        SQL (1.6ms) INSERT INTO `delayed_jobs` (`attempts`, `created_at`, `failed_at`, `handler`, `last_error`, `locked_at`, `locked_by`, `priority`, `run_at`, `updated_at`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["attempts", 0], ["created_at", Sat, 24 Mar 2012 17:20:36 UTC +00:00], ["failed_at", nil], ["handler", "--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/class 'User'\nmethod_name: :new\nargs:\n- :run_at: 2012-03-29 17:20:36.876374000Z\n"], ["last_error", nil], ["locked_at", nil], ["locked_by", nil], ["priority", 0], ["run_at", Sat, 24 Mar 2012 17:20:36 UTC +00:00], ["updated_at", Sat, 24 Mar 2012 17:20:36 UTC +00:00]]
        (2.7ms) COMMIT
         => #<Delayed::Backend::ActiveRecord::Job id: 17, priority: 0, attempts: 0, handler: "--- !ruby/object:Delayed::PerformableMethod\nobject:...", last_error: nil, run_at: "2012-03-24 17:20:36", locked_at: nil, failed_at: nil, locked_by: nil, created_at: "2012-03-24 17:20:36", updated_at: "2012-03-24 17:20:36">

    I see there is some UTC offset in the runtime, but based on Time.now, I can tell run_at is not going forward by 5 days.

        "run_at", Sat, 24 Mar 2012 17:20:36 UTC +00:00

    Any ideas?


  • Applying for .net jobs as a "self learner"

    - by DeanMc
    Hi All, I have recently started applying for .NET jobs. I currently work in a sales role with a large telco. I found out quite late that I like programming, and by then I had bought my house and made commitments that mean college is not an option. What I would like to know is, is it harder to get a junior job as a self learner? I have gotten a few enquiries regarding my CV but nothing concrete yet. I try to be involved in projects as I get the chance and tend to put up any worthwhile projects as I develop them. Some examples of my work are: A Xaml lexer and parser: http://www.xlight.mendhak.com A font obfuscation tool: http://www.silverlightforums.com/showthread.php?1516-Font-Obsfucation-Tool-ALPHA A tagger for m4a: http://projectaudiophile.codeplex.com/SourceControl/list/changesets I, of course, think that these are great examples of my work, but that is my opinion based on self learning. The other query is how much should I actually know? I've never used linked lists but I know that strings are immutable and I understand what that means. I am only touching on T-SQL but I understand things like how properties function in IL (as two standard methods :) ). I suppose I understand a lot of concepts, but specific features need some looking up to implement as I may not know the syntax off the top of my head.


  • Mgmt wants to re-title my position: Any help...? [closed]

    - by JohnFlyTN
    Management here wants to re-title my position, since I'm doing quite a bit of different work than was originally planned. They want my input. After a quick glance over my skill set and job duties, what would we need to describe this position as? I'll just list things I'm at least proficient in, I will not list things I have a passing knowledge of. About me : ~10 years software development. Languages : C, C++, Perl, PHP, C#, TCL, Unix shell scripting, SQL (TSQL, PLSQL) Systems : MS-Dos, Windows 3.1 to 7 for client, NT 4 to 2008 for server, OS/2, IBM MVS & z/OS, Linux ( multiple distros), AIX Current position: I do all sorts of in-house software. The range is single user apps to large systems spanning multiple OS's. One of the larger projects I've designed and coded is about 100k lines of C#, and a database where I have been the sole designer and maintainer. I have near total freedom to design as I see fit, restraints are usually budgetary. Skills required to replace me in my current role: Windows and Unix admin, Database design, .NET up to 3.5 (C#, ASP.NET), C++, Perl, good skills in designing large and efficient data processing systems. Given this small level of information what would you see this as being titled? (is more information required to render a decision?)


  • Why learning new things is not important on a job hunt? [closed]

    - by IAdapter
    I have just finished my job hunt. I think it was about 40 job interviews; I like to travel and get to know many companies. One thing I did not like is that they don't care about new technologies. I think only 2 people asked me about new stuff in the Java world. Most of them care whether I know Java (certification and many years of experience are not enough for them, they need to test me). For example, at IBM they only cared which IBM products I know. Have I ever used any custom extensions of WebSphere? I don't understand those questions. If I learn new frameworks every day, then I can learn whatever technology they have very fast. So why does it matter if I have ever used those "great" custom extensions of WebSphere? After those 40 interviews I have no reason to learn any new framework, because I see that they don't care. Why don't those "developers" ask questions about new technologies? Have they been at those companies so long that they don't care about new stuff?


  • How does a CS student negotiate in/after a job interview?

    - by Billy ONeal
    Alright, I've gotten to the second step in the interview process. At this point I'm working under the assumption that I might be offered a position -- flying my butt to Redmond would be quite an expense if they weren't at least considering me for something (*crosses fingers*). So, if one is offered a position, how should a CS student negotiate? I've heard a few strategies about dealing with software companies when you are being considered for a hire, but most of them are considering the developer in a powerful position. In such examinations, (s)he has lots of job experience, and may even be overqualified for what the employer is looking for. (s)he is part of a small job market of qualified developers, because 99% of applications companies receive are from those who are woefully under qualified. I'm in a completely different position. I think I compare favorably to most of my fellow students, and I have been a programmer for almost 10 years, but often I still feel green compared to most of my coworkers. I'm in a position where the employer holds most of the chips; they'd be doing me quite a favor by hiring me. I think this scenario is considerably different than the targets for most of the advice I've seen. Above all, I don't want to be such a prick negotiating that it damages my chances to actually operate in a position, even if it means not negotiating at all. How should one approach a scenario like this? P.S. If this is off topic feel free to close it -- I think it's borderline and I'm of the opinion that it's better to ask and be closed than not ask at all ;)


  • Will an online degree get you a job that requires "CS or equivalent 4-year degree"? [on hold]

    - by qel
    I'm a nerdy slacker type who didn't get my life together till I was 30. I've had a real job for a couple years doing C#/SQL. I've gotten several raises, but I'm making less than most developers, and the atmosphere is ... not positive. Looking for a new job, I think my applications get thrown out because I don't have a degree. And I want to finish a Bachelor's just to feel like less of a loser. I have a lot of college credits from 1996-2003 and a low GPA, so I don't know if that's worth much. An online degree looks like a good option, but I just don't know what I should be looking at for online schools because they all look like fake degrees. If they had programs equivalent to a real Comp Sci degree, I don't think they would have weird sounding names like they do. University of Phoenix has a B.S./Information Technology-Software Engineering. DeVry has a B.S./Computer Engineering Technology program. But that's not CS, and most other things I see have even more fake-sounding names. Are these useless degrees? Some people say DeVry and UoP are acceptable, some people say they're a joke. I have enough experience now, though, that maybe all I'm missing is being able to check the box that I have a 4-year degree. Harvard Extension seems like a real degree, even if it isn't a real Harvard degree, but I'd have to live there at least 3 months, which kinda defeats the purpose of an online degree fitting around work.


  • Is there a website that scrapes job postings to determine the popularity of web technologies? [closed]

    - by dB'
    I'm often in a position where I need to choose between a number of web technologies. These technologies might be programming languages, or web application frameworks, or types of databases, or some other kind of toolkit used by programmers. More often than not, after doing some research, I end up with a list of contenders that are all equally viable. They're all powerful enough to solve my problem, they're all popular and well supported, and they're all equally familiar/unfamiliar to me. There's no obvious rationale by which to choose between them. Still, I need to pick one, so at this point I usually ask myself a hypothetical question: which one of these technologies, if I invest in learning it, would be most helpful to me in a job search? Where can I go on the internet to answer this question? Is there a website/service that scrapes the texts of worldwide job postings and would allow me to compare, say, the number of employers looking for expertise in technology x vs. technology y? (Where x and y are Rails vs. Django, Java vs. Python, Brainfuck vs. LOLCode, etc.)


  • How do I tell my parents that landing a job is what actually counts?

    - by shovonr
    On one side, I just want to get a degree with a 3.0 GPA. On the other side, my parents want more than just a 3. Now here's the thing. I program with a passion. I spend day and night programming. And I ace all my programming courses. However, I do terrible on all my elective courses -- such as writing, history, and all that stuff -- which only leaves me with a 3.1 to 3.2 GPA. And my parents want more. They think that university is like high school, where you need super-stellar grades to get to the next level. But they don't realize that good enough grades will land me a job. And they don't realize that a programmer needs to practice to become good at programming, and that having good skills is what will land a job in a nice software development company. Thankfully, though, they don't threaten to beat me with a baseball bat or anything like that. They just occasionally give me the little "tsk-tsk". But even that little "tsk-tsk" makes me feel guilty for opening up an IDE. And on top of that, I procrastinate because of that feeling of guilt. So now, I want to come clean with them. I want to know what's a good way to do that. [Edit] OK, so now, I realized, I should aim for higher grades, as some have suggested below.


  • What does N years of experience with a language really mean?

    - by marcgg
    I've been looking at job descriptions since I'm graduating soon and looking for a job, and what always comes back - I'm not teaching you anything - is the "N years of experience in this language". It has been discussed in this question: say you work professionally with Ruby for 2 years, but during these two years you also did some C# and PHP and were actually coding in Ruby 50% of the time. Do you say you have 1 year of experience in Ruby? 2 years? Another issue that hasn't been reviewed in the other post is "non-professional experience". I'll give you a personal example: I've been working with Ruby on Rails since 2004 while at school. I did a lot of personal projects and school projects using this technology. I also used Rails in 2 6-month internships. Do I have 5 years of Rails experience (2004-now)? Do I have 1 year (2 internships)? Do I have nothing? I feel like I don't deserve the credit for 5 years, because in the first years I wasn't working a lot with Rails, but since last year I have launched some websites and invested myself a lot in this technology, and just saying 1 year doesn't really reflect how well I know the technology... Another example: I learned C++ at school and did 1 big project with it (2-3 months of work and a semester of classes). I never used it in a company, but I'd be able to be productive fairly quickly if I had to work on a C++ project, and I have a good grasp of the concepts. Do I have no experience? 3 months? 6 months? ... something else? What I'm really trying to do is to find a way to present my skill set in a way that is compliant with what recruiters expect. I also don't want to end up at an interview that would go something like this... Recruiter (finding out the horrible truth): Oh but you said that you had 2 years of experience with this when you have none! / slaps me in the face / Me (in pain): Oh! The irony! Recruiter (yelling): Get out of my office / calls security, punches me in the throat /

    Read the article

  • Building Simple Workflows in Oozie

    - by dan.mcclary
    Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want to codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me the flexibility of Hive's query language when I want it, but lets me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. 
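Since a new partition is needed each time another year of data arrives, the statement can be templated rather than retyped. The following is a minimal sketch, not the author's code: the helper name `add_partition_statement` is hypothetical, and the `yr=<year>` subdirectory layout under the table's location is an assumption.

```python
def add_partition_statement(year, table="historic_weather",
                            root="/user/oracle/weather/historic"):
    """Return HiveQL that registers one year's directory as a partition.

    Assumption: each year's files live under <root>/yr=<year>.
    """
    year = int(year)  # rejects anything that is not a plain integer year
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (yr='{year}') "
        f"LOCATION '{root}/yr={year}';"
    )
```

Deriving both the partition value and the directory from the same argument keeps the two from drifting apart when the statement is reused for later years.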
Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of its long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing a SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2010'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive statements using what's, somewhat obviously, referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called ncdc_parse.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can run all the same actions as workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. 
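Going back to the parsing step for a moment: a TRANSFORM script receives rows on standard input and hands tab-separated columns back on standard output. The sketch below shows the general shape such a fixed-width parser might take. It is not the article's ncdc_parser.py, and every offset in FIELDS is a hypothetical placeholder, since the real field widths are not given here.

```python
# HYPOTHETICAL layout: (column name, start offset, end offset).
# The real NCDC field widths are not stated in the article, so these
# numbers are placeholders for illustration only.
FIELDS = [
    ("stn", 0, 6),
    ("wban", 7, 12),
    ("weather_year", 14, 18),
    ("weather_month", 18, 20),
    ("weather_day", 20, 22),
    ("temp", 24, 30),
    ("dewp", 35, 41),
    ("weather", 132, 138),
]

def parse_line(line, fields=FIELDS):
    """Slice one fixed-width record into a list of stripped field values."""
    return [line[start:end].strip() for _, start, end in fields]

def transform(lines, fields=FIELDS):
    """Yield one tab-separated output row per non-empty input line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield "\t".join(parse_line(line, fields)) + "\n"
```

To run it under TRANSFORM, the script would end with something like `sys.stdout.writelines(transform(sys.stdin))` after an `import sys`; keeping the slicing in plain functions makes it easy to test against a few sample records before it ever touches the cluster.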
The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the Hive parameters. Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS. That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have a different hive-default.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml, hive-default.xml (make sure this file contains your metastore connection data), and ncdc_parse.hql. Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. 
All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My Pig script registers the Weka jar and a chunk of Jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the Pig script, because Pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the Pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. 
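As an aside on the workflow definition so far: every <start>, <ok>, and <error> transition has to name a node that actually exists, and that is easy to get wrong while actions are being added one at a time. The sketch below is a hypothetical local helper, not part of Oozie, that lists transitions pointing at undefined nodes. It only looks at <start> and at <ok>/<error> children, so fork, join, and decision nodes are outside its scope, and the example Hive action is trimmed for brevity.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def dangling_transitions(workflow_xml):
    """Return sorted (node, target) pairs whose target is not a defined node."""
    root = ET.fromstring(workflow_xml)
    defined = set()
    transitions = []
    for node in root:
        if local(node.tag) == "start":
            transitions.append(("start", node.get("to")))
            continue
        name = node.get("name")
        if name:
            defined.add(name)
        for child in node:
            if local(child.tag) in ("ok", "error"):
                transitions.append((name, child.get("to")))
    return sorted(t for t in transitions if t[1] not in defined)

# The workflow as it stands before the Pig action is added.
EXAMPLE = """<workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1">
  <start to="ParseNCDCData"/>
  <action name="ParseNCDCData">
    <hive xmlns="uri:oozie:hive-action:0.2"><script>ncdc_parse.hql</script></hive>
    <ok to="WeatherMan"/>
    <error to="end"/>
  </action>
  <end name="end"/>
</workflow-app>"""
```

Calling `dangling_transitions(EXAMPLE)` reports the transition to WeatherMan, which disappears once an action with that name is present in the file.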
In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath piece is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: validate our workflow.xml, copy our working directory to HDFS, submit our job to the Oozie server, and run our workflow. Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? 
All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie job -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run". This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.
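If the submit, start, and run calls end up being scripted, it can help to build the command lines in one place. The following is a small sketch under that assumption; `oozie_job_command` is a hypothetical helper name, and the function only assembles the arguments for the `oozie job` calls shown in the article without executing anything.

```python
def oozie_job_command(oozie_url, action, target):
    """Build the argument list for one `oozie job` call.

    "submit" and "run" take the path to job.properties as target;
    "start" takes the job ID printed by an earlier submit.
    """
    base = ["oozie", "job", "-oozie", oozie_url]
    if action in ("submit", "run"):
        return base + ["-config", target, "-" + action]
    if action == "start":
        return base + ["-start", target]
    raise ValueError(f"unsupported action: {action}")
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths and URLs.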

    Read the article
