Search Results

Search found 2157 results on 87 pages for 'sequential workflow'.

Page 11/87 | < Previous Page | 7 8 9 10 11 12 13 14 15 16 17 18  | Next Page >

  • Sharepoint 2010 start workflow programmatically error

    - by user522429
    I have a workflow associated with a content type. I try to kick it off from code, from within an event receiver on the same content type, so that when an item is updated and a certain condition holds (status = "ready for review"), I start the workflow.

        // This line does find the workflow association
        var assoc = properties.Web.ContentTypes["Experiment Document Set"]
            .WorkflowAssociations.GetAssociationByName("Experiment Review Workflow", CultureInfo.CurrentUICulture);

        // I had tried this line, from something I found online, but it returns null
        //var assoc = properties.Web.WorkflowAssociations.GetAssociationByName("Experiment Review Workflow", CultureInfo.CurrentUICulture);

        var result = properties.Web.Site.WorkflowManager.StartWorkflow(
            properties.ListItem, assoc, string.Empty, SPWorkflowRunOptions.Synchronous);

    The last line gives me this error: System.ArgumentException: Workflow failed to start because the workflow is associated with a content type that does not exist in a list. Before re-starting the workflow, the content type must be added to the list. To check this, I looked at the content type of the list item being updated, and it is correct: properties.ListItem.ContentType.Name returns "Experiment Document Set".

    So basically I have a workflow associated with the content type "Experiment Document Set". When I try to start the workflow from an event receiver on "Experiment Document Set", I get an error saying the content type "Experiment Document Set" does not exist in the list, which doesn't make sense.
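    One avenue worth checking, sketched below: a list carries its own copy of the site content type, and the error text hints that the association has to be found on that list-level copy rather than on the web-level content type. This is a hedged sketch reusing the question's names (it assumes the usual Microsoft.SharePoint, Microsoft.SharePoint.Workflow and System.Globalization namespaces), not a confirmed fix.

        public override void ItemUpdated(SPItemEventProperties properties)
        {
            // Look up the content type on the list itself, not on the web.
            SPList list = properties.ListItem.ParentList;
            SPContentType listCt = list.ContentTypes["Experiment Document Set"];
            if (listCt != null)
            {
                SPWorkflowAssociation assoc = listCt.WorkflowAssociations
                    .GetAssociationByName("Experiment Review Workflow", CultureInfo.CurrentUICulture);
                if (assoc != null)
                {
                    properties.Web.Site.WorkflowManager.StartWorkflow(
                        properties.ListItem, assoc, string.Empty,
                        SPWorkflowRunOptions.Synchronous);
                }
            }
        }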

    Read the article

  • B-trees, databases, sequential inputs, and speed.

    - by IanC
    I know from experience that B-trees have awful performance when data is added to them sequentially (regardless of the direction). However, when data is added randomly, best performance is obtained. This is easy to demonstrate with the likes of an RB-tree: sequential writes cause a maximum number of tree rebalances to be performed. I know very few databases use binary trees, using n-order balanced trees instead, but I logically assume they suffer a similar fate to binary trees when it comes to sequential inputs.

    This sparked my curiosity. If this is so, then one could deduce that writing sequential IDs (such as with IDENTITY(1,1)) would cause multiple rebalances of the tree to occur. I have seen many posts argue against GUIDs because "these will cause random writes". I never use GUIDs, but it struck me that this "bad" point was in fact a good point. So I decided to test it. Here is my code:

        SET ANSI_NULLS ON
        GO
        SET QUOTED_IDENTIFIER ON
        GO
        CREATE TABLE [dbo].[T1](
            [ID] [int] NOT NULL CONSTRAINT [T1_1] PRIMARY KEY CLUSTERED ([ID] ASC),
            [Data] [char](300) NULL
        )
        GO
        CREATE TABLE [dbo].[T2](
            [ID] [uniqueidentifier] NOT NULL CONSTRAINT [T2_1] PRIMARY KEY CLUSTERED ([ID] ASC),
            [Data] [char](300) NULL
        )
        GO

        declare @i int, @t1 datetime, @t2 datetime, @t3 datetime, @c char(300)

        set @t1 = GETDATE()
        set @i = 1
        while @i < 2000
        begin
            insert into T2 values (NEWID(), @c)
            set @i = @i + 1
        end

        set @t2 = GETDATE()
        WAITFOR delay '0:0:10'
        set @t3 = GETDATE()

        set @i = 1
        while @i < 2000
        begin
            insert into T1 values (@i, @c)
            set @i = @i + 1
        end

        select DATEDIFF(ms, @t1, @t2) AS [GUID], DATEDIFF(ms, @t3, getdate()) AS [Int]

        drop table T1
        drop table T2

    Note that I am not subtracting any time for the creation of the GUID, nor for the row's considerably larger size. The results on my machine were as follows:

    Int: 17,340 ms
    GUID: 6,746 ms

    This means that in this test, random inserts of 16 bytes were almost 3 times faster than sequential inserts of 4 bytes. Would anyone like to comment on this? P.S. I get that this isn't a question; it's an invitation to discussion, and that is relevant to learning optimum programming.

    Read the article

  • Sequential coupling in code

    - by dotnetdev
    Hi, Is sequential coupling (http://en.wikipedia.org/wiki/Sequential_coupling) really a bad thing in code? Although it's an anti-pattern, the only risk I see is calling methods in the wrong order, and documentation of an API/class library with this anti-pattern should take care of that. What other problems arise from sequentially coupled code? Also, it seems this pattern could easily be fixed by using a facade. Thanks

    Read the article

  • Changing an MSSQL clustered index field from containing "random" GUIDs to sequential GUIDs - how will this affect existing data?

    - by Eyvind
    We have an MSSQL database in which all the primary keys are GUIDs (uniqueidentifiers). The GUIDs are produced on the client (in C#), and we are considering changing the client to generate sequential (comb) GUIDs instead of just using Guid.NewGuid(), to improve db performance. If we do this, how will this affect installations that already have data with "random" GUIDs as clustered PKs? Can anything be done (short of changing all the PK values) to rebuild the indexes to avoid further fragmentation and bad insert performance? Please give explicit and detailed answers if you can; I am a C# developer at heart and not all too familiar with all the intricacies of SQL Server. Thanks!
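    For reference, here is a minimal sketch of a "comb"-style generator in C#, modeled on the widely known NHibernate Guid.Comb approach (the class name is illustrative): the trailing six bytes, which SQL Server weighs most heavily when ordering uniqueidentifiers, are replaced with a timestamp so new values sort roughly in insert order.

        using System;

        static class CombGuid
        {
            public static Guid NewComb()
            {
                byte[] guidBytes = Guid.NewGuid().ToByteArray();
                DateTime baseDate = new DateTime(1900, 1, 1);
                DateTime now = DateTime.Now;

                // Days since the base date, plus time of day in 1/300ths of a
                // second (the resolution of SQL Server's datetime type).
                TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
                TimeSpan msecs = now.TimeOfDay;

                byte[] daysBytes = BitConverter.GetBytes(days.Days);
                byte[] msecsBytes = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

                // Reverse to big-endian so SQL Server's byte ordering sorts them.
                Array.Reverse(daysBytes);
                Array.Reverse(msecsBytes);

                Array.Copy(daysBytes, daysBytes.Length - 2, guidBytes, guidBytes.Length - 6, 2);
                Array.Copy(msecsBytes, msecsBytes.Length - 4, guidBytes, guidBytes.Length - 4, 4);

                return new Guid(guidBytes);
            }
        }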

    Read the article

  • Filling cells with sequential numbers in an Excel (2003) macro

    - by Fred Hamilton
    I need to fill an Excel column with a sequential series, in this case from -500 to 1000. I've got a macro to do it, but it takes a lot of lines for something that seems like it should be a single function call, something like FillRange(A2:A1502, -500, 1000, 1). But if that function exists, I can't find it. Is the following as simple and elegant as it gets?

        'Draw X axis scale
        Cells(1, 1).Value = "mV"
        Cells(2, 1).Value = -500
        Cells(3, 1).Value = -499
        Cells(4, 1).Value = -498
        Dim selection1 As Range, selection2 As Range
        Set selection1 = Sheet1.Range("A2:A4")
        Set selection2 = Sheet1.Range("A2:A1502")
        selection1.AutoFill Destination:=selection2

    Read the article

  • What technologies are appropriate for a human workflow system?

    - by CCw
    I'm researching various workflow architectures and the options are overwhelming. The workflow system I am creating will be almost completely human-driven; very little, if any, asynchronous activity will take place. One possibility is to simply use an RDBMS with a task table, and use stored procedures to enforce synchronized access to each task. This seems very simple, but I'm having a hard time coming up with reasons why I might need a heavier solution. If my system has ~500 concurrent users and there is very little in the way of automated or asynchronous tasks, should I even consider the various workflow patterns/packages out there, like Mule, BPEL/SOA, Spring Web Flow, etc.?
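    For the RDBMS-only route, a hedged sketch of how task claiming could stay safe under concurrency (table, column and method names are illustrative, not from the question): let the database hand each worker the next available task atomically, so two users can never claim the same one.

        using System;
        using System.Data.SqlClient;

        class TaskQueue
        {
            // TOP (1) + OUTPUT claims one 'Ready' task atomically; READPAST
            // skips rows locked by other workers instead of blocking on them.
            const string ClaimSql = @"
                UPDATE TOP (1) Tasks WITH (ROWLOCK, READPAST)
                SET Status = 'InProgress', ClaimedBy = @user
                OUTPUT inserted.TaskId
                WHERE Status = 'Ready';";

            public static int? ClaimNextTask(string connectionString, string user)
            {
                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(ClaimSql, conn))
                {
                    cmd.Parameters.AddWithValue("@user", user);
                    conn.Open();
                    object id = cmd.ExecuteScalar();  // null when nothing is ready
                    return id == null ? (int?)null : (int)id;
                }
            }
        }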

    Read the article

  • Explicitly persist state in Workflow 4.0 rather than everything.

    - by jlafay
    I have run into an issue with the SQL instance store attached to a running WorkflowApplication. When I exit my application, I call Unload() on the WF app to persist it. I didn't think about it at design time, but it does make sense: it is persisting an argument that was passed into the WorkflowApplication constructor when it was instanced. When the application runs, everything in the workflow works as expected; when I call Unload(), I get an unhandled exception stating that the argument is not serializable and needs [DataContractAttribute]. What's passed into the workflow is my application's custom logger object, which I wrote so that the WF can log to disk in a uniform way. How do I prevent the workflow app from persisting this one argument while persisting everything else? I'm sure something can be done with extensions, but I'm having a hard time finding info on them or finding persistence examples for my scenario.
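    Extensions do look like the right lever here. A hedged sketch of that route (type names are hypothetical stand-ins for the custom logger): hand the logger to the host as an extension instead of a workflow input argument. Extensions are supplied by the host each time the instance is loaded, so they are not serialized into the instance store on Unload().

        using System;
        using System.Activities;

        // Stand-in for the custom logger described in the question.
        public class AppLogger
        {
            public void Write(string message) { Console.WriteLine(message); }
        }

        // Activities resolve the logger from the context at runtime.
        public sealed class LogStep : CodeActivity
        {
            protected override void Execute(CodeActivityContext context)
            {
                AppLogger logger = context.GetExtension<AppLogger>();
                logger.Write("step reached");
            }
        }

        public static class Host
        {
            public static void Run()
            {
                var app = new WorkflowApplication(new LogStep());
                app.Extensions.Add(new AppLogger());  // host service, not persisted
                app.Run();
            }
        }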

    Read the article

  • What is a reasonable workflow for designing webapps?

    - by Evan Plaice
    It has been a while since I have done any substantial web development and I'd like to take advantage of the latest practices, but I'm struggling to visualize the workflow needed to incorporate everything. Here's what I'm looking to use:

    CakePHP framework
    jsmin (JavaScript Minify)
    SASS (Syntactically Awesome StyleSheets)
    Git

    CakePHP: Pretty self-explanatory; make modifications and update the source.

    jsmin: When you modify a script, do you manually run jsmin to output the new minified code, or would it be better to run a pre-commit hook that automatically generates minified outputs of the JavaScript files that have changed? Assume that I have no knowledge of implementing commit hooks.

    SASS: I really like what SASS has to offer, but I'm also aware that SASS code isn't supported by browsers by default, so at some point the SASS code needs to be transformed to normal CSS. At what point in the workflow is this done?

    Git: I'm terrified to admit it, but the last time I did any substantial web development I didn't use SCM source control (i.e., I did use "source control", but it consisted of a very detailed change log with backups). I have since had plenty of experience using Git (as well as Mercurial and SVN) for desktop development, but I'm wondering how best to implement it for web development. Is it common practice to host a remote repository on the web host so I can push changes directly to the production server, or is there some cross-platform (Windows/Linux) tool that makes it easy to upload only changed files to the production server? Are there web hosting companies that make it easy to implement a remote repository? Do I need SSH access, etc.? I already know how to accomplish this on my own testing server with a remote repository and a separate remote tracking branch, but I've never done it on a remote production web host before, so I'm not aware of the options yet.

    Extra: I was considering implementing a JavaScript framework where the separate JavaScript files used on a page are compiled into a single file for each page on the production server, to limit the number of file downloads needed per page. Does something like this already exist? Is there an open source project out in the wild that implements something similar that I could use and contribute to? Considering how paranoid web devs are about performance (and the fact that the number of file requests on a website is a big performance hit), I'm guessing some wizard hacker on the net has already addressed this issue.

    Read the article

  • Workflow Adapter/Connector Pair

    The Workflow Adapter/Connector pair is a set of custom WF activities for business-to-business logical connectivity based on an interface contract. The connectivity handles invoking and consuming workflows, Remoting objects, and WCF services transparently, driven by configuration.

    Read the article

  • Custom activity designers in Workflow Foundation 3.5: How do they work?

    - by stakx
    Intent of this post: I realise that Workflow Foundation is not extremely popular on StackOverflow and that there will probably be few answers, or none at all. This post is intended as a resource for people trying to customise workflow activities' appearance through custom designer classes.

    Goals: I am attempting to create a custom designer class for workflow activities to achieve the following:

    Make activities look less technical. For example, I don't necessarily want to see the internal object name as the activity's "title"; instead, I'd like to see something more descriptive.

    Display the values of certain properties beneath the title text. I would like to see some properties' values directly underneath the title so that I don't need to look somewhere else (namely, at the Properties window).

    Provide custom drop areas and draw custom internal arrows. As an example, I would like to be able to have custom drop areas in very specific places.

    What I found out so far: I created a custom designer class deriving from SequentialActivityDesigner as follows:

        [Designer(typeof(PlainDesigner))]
        public partial class SomeActivity : CompositeActivity { ... }

        class PlainDesigner : SequentialActivityDesigner { ... }

    Through overriding some properties and the OnPaint method, I found out about the following correspondences between the properties and how the activity will be displayed:

    Figure 1. Relationship between some properties of a SequentialActivityDesigner and the displayed activity.

    Possible solutions for goal #1 (make activities look less technical) and goal #2 (display values of properties beneath title text): The displayed title can be changed through the Title property. If more room is required to display additional information beneath the title, the TitleHeight property can be increased (i.e., override the property and make it return base.TitleHeight + n, where n is some positive integer). Then override the OnPaint method and draw additional text in the area reserved through TitleHeight.

    Open questions: What are the connectors, connections, and connection points used for? They seem to be necessary, but for what purpose? While the drop targets can be got through the GetDropTargets method, it seems that this is not necessarily where the designer will actually place dropped activities. When an activity is dragged across a workflow, the designer displays little green plus signs where activities can be dropped; how does it figure out the locations of these plus signs? How does the designer figure out where to draw connector lines and arrows?
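    Building on the Title/TitleHeight/OnPaint observations above, a minimal hedged sketch of such a designer (WF 3.5, System.Workflow.ComponentModel.Design; the caption and the "Status: Draft" text are placeholder values, not part of the original post):

        using System.Drawing;
        using System.Workflow.ComponentModel.Design;

        class PlainDesigner : SequentialActivityDesigner
        {
            private const int ExtraHeight = 16;

            // Goal #1: a friendlier caption than the internal object name.
            public override string Title
            {
                get { return "Review Experiment"; }
            }

            // Goal #2: reserve extra room beneath the title for property values.
            protected override int TitleHeight
            {
                get { return base.TitleHeight + ExtraHeight; }
            }

            protected override void OnPaint(ActivityDesignerPaintEventArgs e)
            {
                base.OnPaint(e);
                // Draw a property value in the strip reserved under the title.
                Rectangle title = this.TextRectangle;
                Rectangle strip = new Rectangle(title.Left, title.Bottom + 2,
                                                this.Bounds.Width - 8, ExtraHeight);
                e.Graphics.DrawString("Status: Draft", e.DesignerTheme.Font,
                                      Brushes.Gray, strip);
            }
        }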

    Read the article

  • How do you combine "Revision Control" with "WorkFlow" for R?

    - by Tal Galili
    Hello all, I remember coming across R users writing that they use "revision control" (e.g. "source control"), and I am curious to know: how do you combine revision control with your statistical analysis workflow? Two (very) interesting discussions talk about how to deal with the workflow, but neither of them refers to the revision control element:

    http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs
    http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing

    A long update to the question: Following some of the people's answers, and Dirk's question in the comment, I would like to direct my question a bit more. After reading the Wiki article about revision control (which I was previously not familiar with), it was clear to me that when using revision control, what one does is build a development structure of one's code. This structure either leads to a "final product" or to several branches. When building something like, let's say, a website, there is usually one end product you work towards (the website), with some prototypes along the way.

    But when doing statistical analysis, the work (in my view) is different. Sometimes you know where you want to get to, but more often you explore: explore cleaning the dataset, explore different methods for statistical analysis, and ask various questions of your data (and I am writing this knowing how Frank Harrell and other experienced statisticians feel about data dredging). That is why the workflow question for statistical programming is (in my view) a serious and deep question, raising many issues. The simpler ones are technical:

    Which revision control software do you use (and why)?
    Which IDE do you use (and why)?

    The more interesting questions are about the work process:

    How do you structure your files? What do you keep as a separate file and what as a revision? Or, asking in a different way: what should be a "branch" and what should be a "sub-project" in your code? For example: when starting to explore your data, should a plot be created and then erased because it didn't lead anywhere (but kept as a revision), or should there be a backup file of that path? How you resolve this tension was my initial curiosity.

    The second question is "what might I be missing?". What rules (of thumb) should one follow so as to avoid common pitfalls when doing statistical programming with version control?

    In my intuition, I feel that statistical programming is inherently different from software development (I am writing this without being a real expert in statistical programming, and even less so in software development). That's why I am unsure which of the lessons I have read here about version control would be applicable. Thanks a lot, Tal

    Read the article

  • Windows Workflow Foundation in .NET4

    Windows Workflow Foundation (WF4) in .NET 4 is designed to make it easier for new developers to learn, addresses a wider range of customer scenarios, and is more efficient. WF is a programming model for composing application logic and coordinating execution, allowing developers to abstract complicated code while leveraging a set of runtime services. Activities are the building blocks that are composed together to build workflows. The runtime provides the ability to save the state...
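    As a minimal illustration of that composition model (a hedged sketch using the stock WF4 types from System.Activities; the two-step workflow itself is made up):

        using System.Activities;
        using System.Activities.Statements;

        class Program
        {
            static void Main()
            {
                // Activities are composed into a workflow...
                Activity workflow = new Sequence
                {
                    Activities =
                    {
                        new WriteLine { Text = "Order received" },
                        new WriteLine { Text = "Order approved" }
                    }
                };

                // ...and the runtime coordinates its execution.
                WorkflowInvoker.Invoke(workflow);
            }
        }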

    Read the article

  • High level project workflow

    - by user775060
    We are a small software company trying our hand at our second game. Since our first game's process was a living nightmare (we used a web-development workflow), I have decided to educate myself on how to manage a game project at a high level. How does your process work, from idea to launch? Preferably in situations where a team needs to cooperate. I've seen these two links, which are useful in a way, but I was wondering if there are better/more comprehensive ways to do this: http://www.goodcontroller.com/blog/?p=136 http://gogogic.wordpress.com/2009/02/09/symbol6-how-we-created-an-iphone-game/ All input would be infinitely appreciated.

    Read the article

  • How to set up a simple Subversion workflow

    - by Milen Bilyanov
    I am trying to set up a simple SVN workflow at home. I am new to Subversion (and programming), so I have been reading the official PDF documentation, but I am still not sure how to set up my repository. I work mainly with Python, Bash and RSL (RenderMan Shading Language). I already have a /dev structure on my disk, like this: http://imageshack.us/f/708/devstructure.png/ And I have a /site structure that links to my /dev folder: http://imageshack.us/f/651/sitestructure.png/ Obviously, starting to use SVN will change this approach. So, when setting up my SVN repository for the work I do in my /dev folder: should I set up a separate repository for each different programming platform, and where exactly should I place my repository? Thanks.

    Read the article

  • Which VCS is more applicable for our workflow?

    - by Thomas Mancini
    Currently we have code stored on a shared network drive and do not use any kind of VCS. The code on the shared network drive is always backed up. We would like to keep things as close to how they are now as possible while using some kind of VCS software. I am envisioning a centralized workflow, with each developer having a local copy of the code on his/her machine. We don't do any branching or work offline. Typically, when we spin off a new version, we just copy the current working directory to a new directory; I believe we would continue doing this and simply create a repository for the new version. I would rather not get into an argument over which VCS is better; I'm just hoping to get some opinions on which is best suited to and most applicable for what we are trying to do.

    Read the article

  • How should my local git workflow work?

    - by Anonymous -
    At home, I have a server running some software (on a LAMP stack, but only accessible internally). I have another machine and a laptop that I both use for developing said software. What is the best workflow for me? Should I have a repository on my local server and create live, staging and development branches; check out the development branch on my laptop/development PC to work on; commit it back when I'm done; then merge the development branch into the staging branch for testing, before further merging into the live branch? Would I then simply check out the production branch to /www/var/ on my server? Or am I thinking/going about this all wrong? Thanks.

    Read the article

  • GPG Workflow in 11.04

    - by Ross Bearman
    At work we handle the transfer of small bits of sensitive data with GPG, usually posted on a secure internal website. Until Firefox 4 was released we used FireGPG for inline decryption; however, the IPC libraries that it relied upon are no longer present in FF4, making it unusable, and it will no longer install in FF5. Currently I manually paste the GPG blocks into a text file, then use the Nautilus context-menu plugin or the command line to decrypt the contents of the file. When we're handling large numbers of these small files throughout the day, this becomes a real chore. I've looked around but can't seem to find much information on useful GPG clients for Ubuntu. A client that allowed me to paste in a GPG block and instantly decrypt it, and also paste in plaintext and easily encrypt it for multiple recipients, would be ideal. So my question is: does this exist? I can't seem to find anything with obvious Google searches, so hopefully someone here can help or offer an alternative workflow.

    Read the article

  • Should I use a workflow engine?

    - by Fernando
    I need to add some new features to a PHP application that follows the steps of an order. A process creates some orders; an order goes to confirmation; then, if approved, it is sent to a provider; later the provider confirms that it can deliver the order; a request is made to the provider; and so on... I need to record when every step is made and send notifications. Also, some steps have an estimated time, and if that time elapses I need to send notifications so everybody knows about the delay. When a process starts it has a predefined set of steps, but in the middle the user should be able to create new sub-steps, and delete or skip future steps. Should I use a workflow engine? Which ones do you suggest (free/open-source only)?

    Read the article

  • A declarative workflow does not start automatically after you install Windows SharePoint Services 3.

    - by ira
    That's the title of KB947284, actually. I was recently involved in a SharePoint project and ran into this problem where my workflow does not start automatically; if I run the workflow manually, it's just fine. I found KB947284. Apparently the cause of this problem is the installation of WSS 3 SP1. I did what it said in the Resolution section, but it didn't work. The Resolution says to set the application pool account to a different user account, but different in what way? What kind of user account will work? I changed the user account, but both the old and the new account are in the same administrators group on the server where SharePoint is installed. Oh! I also found a copy of KB947284, and one comment stated that there is already a hotfix (KB956057). I've read the issues it fixes but could not find anything about this workflow problem. Could anyone please tell me how it's supposed to be done? Thanks in advance.

    Read the article

  • Flexible design - customizable entity model, UI and workflow

    - by Ngm
    Hi All, I want to achieve the following in the software I am building:

    1. Customizable entity model
    2. Customizable UI
    3. Customizable workflow

    I have thought about an approach to achieve this; I want you to review it and make suggestions:

    Entity objects should be plain objects and will hold just data.
    Separate the entity model and DB schema by using a framework (like NHibernate?). This will allow easy modification of entity objects.
    Business logic to fetch/modify entities has to be granular enough that it can be invoked as part of the workflow.
    Business objects should not hold any state, and hence will contain only static methods.
    The workflow will decide, depending upon the "state" of an entity/entities, which methods on the business object(s) to invoke.
    The workflow should obtain the results of the processing and then pass the business objects on to the appropriate UI screen.
    The UI screen has to contain instructions about how to display a given entity/entities. Possibly the UI has to be generated dynamically based on a set of UI instructions (like XUL).

    What do you think about this approach? Suggest which existing frameworks (like NHibernate, Windows Workflow) fit into this model, so that I will not spend time coding these frameworks myself. Also, is there any ASP.NET framework that can generate dynamic ASP.NET AJAX pages based on a set of UI instructions (like Mozilla XUL)? I have recently been exploring Apache Ofbiz and was impressed by its ability to customize most areas of the application: UI, workflow, entities. Is there any similar application (not necessarily an ERP system) developed in C#/.NET which offers a similar level of customization? I am looking for examples of applications developed in C# that are highly customizable in terms of UI, workflow and entity model.
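    To make the layering concrete, a small hedged sketch of the structure described above (all names are illustrative): a plain entity holding only data, a stateless business class exposing granular static methods, and a workflow deciding from the entity's state which method to invoke.

        // Plain entity: data only, no behavior.
        public class Order
        {
            public int Id { get; set; }
            public string State { get; set; }
        }

        // Stateless business logic: granular static methods only.
        public static class OrderLogic
        {
            public static void Confirm(Order order) { order.State = "Confirmed"; }
            public static void Ship(Order order)    { order.State = "Shipped"; }
        }

        // The workflow inspects the entity's state and picks the method.
        public static class OrderWorkflow
        {
            public static void Advance(Order order)
            {
                if (order.State == "New") OrderLogic.Confirm(order);
                else if (order.State == "Confirmed") OrderLogic.Ship(order);
                // The result would then be handed to the appropriate UI screen.
            }
        }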

    Read the article

  • Can I use a specific model from within a behavior in CakePHP?

    - by Paul Willy
    I'm trying to write a behavior that will give my models access to a simple workflow engine I've devised. The workflow engine itself works as a CakePHP model, with workflow data stored in the database just as any other model data is stored. Basically what I want to do is have the behavior use the workflow model whenever an action is called on the base model. For example, if the edit() action is executed for Posts, then the Post (with the behavior attached) will trigger the workflow behavior with its own model name, action, and id as arguments (e.g. [Post, edit, 1]). Then the behavior will invoke the functionality of the Workflow model, which has a record for what to do when edit is run on Posts (e.g. send e-mail to users who are subscribed to that post) and will carry that out. My question is, what is the proper way to invoke model/controller methods from within the behavior? The model to be used from within the behavior will always be Workflow, but the behavior should be usable from basically any model (aside from Workflow itself). I know I could run SQL queries directly from the behavior, but of course this is not the Cake way :-) Or, am I going about this in the wrong way? I want to store a certain amount of logic in the database so that it is easily configurable by different users, and not have endless configuration checks within the model/controller logic itself so that workflow steps can be easily added/changed/removed in the future.

    Read the article

  • Building Simple Workflows in Oozie

    - by dan.mcclary
    Introduction

    More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want to codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site.

    Hive Actions: Prepping for Pig

    In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie.

    I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me the flexibility of Hive's query language when I want it, but lets me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code.

        CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2)
        PARTITIONED BY (yr string)
        STORED AS ...
        LOCATION '/user/oracle/weather/historic';

    As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather records are broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing a SerDe or a parser for the transformation is simple. Once that's done, I want to select the columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access.

        ALTER TABLE historic_weather ADD IF NOT EXISTS
        PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011';

        INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history'
        SELECT w.stn, w.wban, w.weather_year, w.weather_month,
               w.weather_day, w.temp, w.dewp, w.weather
        FROM (
            FROM historic_weather
            SELECT TRANSFORM(...)
            USING '/path/to/hive/filters/ncdc_parser.py'
            AS stn, wban, weather_year, weather_month, weather_day,
               temp, dewp, weather
        ) w;

    Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called ncdc_parse.hql.

    Starting Our Workflow

    Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions as workflow jobs, but they can be started automatically, either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum workflow XML defines a name, a starting point, and an end point:

        <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1">
            <start to="ParseNCDCData"/>
            <end name="end"/>
        </workflow-app>

    To this we need to add an action, and within that we'll specify the Hive parameters. Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure.

        <action name="ParseNCDCData">
            <hive xmlns="uri:oozie:hive-action:0.2">
                <job-tracker>localhost:8021</job-tracker>
                <name-node>localhost:8020</name-node>
                <configuration>
                    <property>
                        <name>oozie.hive.defaults</name>
                        <value>/user/oracle/weather_ooze/hive-default.xml</value>
                    </property>
                </configuration>
                <script>ncdc_parse.hql</script>
            </hive>
            <ok to="WeatherMan"/>
            <error to="end"/>
        </action>

    There are a couple of things to note here:

    1. I have to give the FQDN (or IP) and port of my JobTracker and NameNode.
    2. I have to include a hive-default.xml file.
    3. I have to include a script file.
    4. The hive-default.xml and script file must be stored in HDFS.

    That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-default.xml files on different clusters (e.g. MySQL- or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need into it as you add actions to workflow.xml. At this point, our local directory should contain:

        workflow.xml
        hive-default.xml (make sure this file contains your metastore connection data)
        ncdc_parse.hql

    Adding Pig to the Ooze

    Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows:

        <action name="WeatherMan">
            <pig>
                <job-tracker>localhost:8021</job-tracker>
                <name-node>localhost:8020</name-node>
                <script>weather_train.pig</script>
            </pig>
            <ok to="end"/>
            <error to="end"/>
        </action>

    Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My Pig script registers the Weka JAR and a chunk of Jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the Pig script, because Pig attempts to load Jython files from the directory in which the script executes. However, that's not where our Weka JAR goes.

    While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the Pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding JARs to the distributed cache from Oozie's Pig Cookbook.

    Making the Workflow Work

    We've got a workflow defined and have collected all the components we'll need to run it. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows:

        nameNode=hdfs://localhost:8020
        jobTracker=localhost:8021
        queueName=default
        weatherRoot=weather_ooze
        mapreduce.jobtracker.kerberos.principal=foo
        dfs.namenode.kerberos.principal=foo
        oozie.libpath=${nameNode}/user/oozie/share/lib
        oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot}
        outputDir=weather-ooze

    While some of the pieces of the properties file are familiar (e.g., the JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName); we're simply using them to simplify the directives for the Oozie job. The oozie.libpath piece is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of JARs necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed, and write the output to the output directory.

    We're finally ready to submit our job! After all that work we only need to do a few more things:

    1. Validate our workflow.xml
    2. Copy our working directory to HDFS
    3. Submit our job to the Oozie server
    4. Run our workflow

    Let's do them in order. First, validate the workflow:

        oozie validate workflow.xml

    Next, copy the working directory up to HDFS:

        hadoop fs -put working_dir /user/oracle/working_dir

    Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument:

        oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit

    We've submitted the job, but we don't see any activity on the JobTracker. All we got was this funny bit of output:

        14-20120525161321-oozie-oracle

    This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job:

        oozie job -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle

    Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run". This will prep and run the workflow immediately.

    Takeaway

    So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

    Read the article
