I was asked to build a periodic job that runs a data-aggregation MapReduce (MR) job and uploads the result to HBase. The job should be triggered based on:

1. a period of time
2. start running when the input folder exists
3. start running when the input folder contains a _SUCCESS file

So I defined 4 steps in my workflow.

Step 1: Based on the job requirements, check whether the input folder exists and contains a _SUCCESS file.

<decision name="upload-decision">
    <switch>
        <case to="create-csv">
            ${fs:exists(startFlag)}
        </case>
        <default to="end"/>
    </switch>
</decision>

Step 2: Run the MR job as an Oozie java action. The prepare section deletes the _SUCCESS file and the MR job output folder.

<action name="create-csv">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete p...
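Returning to the check in Step 1: the excerpt does not show how startFlag is defined, so the following is only a sketch of how both input conditions could be spelled out explicitly in the decision node. It assumes a hypothetical workflow parameter named inputDir that holds the input folder path; fs:exists and concat are standard Oozie workflow EL functions.

```xml
<!-- Sketch only: inputDir is a hypothetical parameter, not taken from the original workflow -->
<decision name="upload-decision">
    <switch>
        <!-- Continue only if the input folder exists AND holds the _SUCCESS marker -->
        <case to="create-csv">
            ${fs:exists(inputDir) and fs:exists(concat(inputDir, '/_SUCCESS'))}
        </case>
        <!-- Otherwise finish the workflow without running the MR job -->
        <default to="end"/>
    </switch>
</decision>
```

Since a _SUCCESS file cannot exist without its parent folder, a single fs:exists check on the marker path would be enough in practice; the two-part condition only makes the requirements easier to read.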
I am happy to share architecture and technical solutions to system challenges, as well as framework usage examples.