Feb 18, 2019
# VNGRS Challange

The credentials and paths referenced throughout this document:

    accessKey  = "AKIAJS6BVS43QIKU7CWQ"
    secretKey  = "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"
    schemePath = "s3://vngrsbucket/schema.csv"
    inputPath  = "s3://vngrsbucket/2019/02/13/16/*"
    jarPath    = "s3://vngrsbucket/spark-1.0-SNAPSHOT-jar-with-dependencies.jar"
    outputPath = "s3://vngrsbucket/reports/"
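All of the paths above are `s3://` URIs, which the AWS SDK consumes as a separate bucket name and object key. The helper below is a hypothetical sketch (not part of the project) showing how such a URI can be split with the standard `java.net.URI` class:

```java
import java.net.URI;

// Hypothetical helper (not the project's actual code): splits an s3:// URI
// into the bucket and key components that AWS SDK calls expect.
public class S3Path {
    public final String bucket;
    public final String key;

    public S3Path(String s3Uri) {
        URI uri = URI.create(s3Uri);
        if (!"s3".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not an s3:// URI: " + s3Uri);
        }
        this.bucket = uri.getHost();   // e.g. "vngrsbucket"
        // Drop the leading "/" so the key matches what the SDK expects.
        this.key = uri.getPath().isEmpty() ? "" : uri.getPath().substring(1);
    }

    public static void main(String[] args) {
        S3Path p = new S3Path("s3://vngrsbucket/schema.csv");
        System.out.println(p.bucket + " / " + p.key);  // vngrsbucket / schema.csv
    }
}
```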
## Before Use
The VngrsChallange repository contains the user interface and the AWS service controller.
The SparkProcess repository processes the .csv files and writes the generated reports to S3 storage.
# How to use

First, either download the provided executable jar from the GitHub repository or build it from the source code.
Then run it from the CLI:
>java -jar VngrsChallange-1.0-SNAPSHOT-jar-with-dependencies.jar

## Main Page
Select an option from 1 to 4; any other input exits the program.
Selection 1 -> Uploads the given data to an already created S3 bucket using the Firehose service
Selection 2 -> Same as Selection 1, but with default inputs
Selection 3 -> Runs Spark on the local machine and generates 2 reports according to the given task
Selection 4 -> Same as Selection 3, but with default inputs

Sample printout of the program:
>VNGRS Challange
>---System Menu---
>1. Upload Data to S3
>2. Upload Data to S3 without given input
>3. Generate Report
>4. Generate Report without given input
>Otherwise Exit

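The menu loop above can be sketched as a small dispatch over the user's input. This is a minimal sketch, not the project's actual code, and the handler labels it returns are assumptions:

```java
import java.util.Scanner;

// Minimal sketch of the menu dispatch implied by the printout above.
// The returned handler labels are hypothetical, not the project's real API.
public class SystemMenu {
    public static String dispatch(String choice) {
        switch (choice) {
            case "1": return "uploadDataToS3(withInput)";
            case "2": return "uploadDataToS3(defaults)";
            case "3": return "generateReport(withInput)";
            case "4": return "generateReport(defaults)";
            default:  return "exit";   // any other input exits the program
        }
    }

    public static void main(String[] args) {
        System.out.println("VNGRS Challange");
        System.out.println("---System Menu---");
        System.out.println("1. Upload Data to S3");
        System.out.println("2. Upload Data to S3 without given input");
        System.out.println("3. Generate Report");
        System.out.println("4. Generate Report without given input");
        System.out.println("Otherwise Exit");
        Scanner in = new Scanner(System.in);
        if (in.hasNextLine()) {
            System.out.println("-> " + dispatch(in.nextLine().trim()));
        }
    }
}
```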
## Upload Data to S3

This method first retrieves the list of active Firehose delivery streams, then asks the user to choose a delivery stream and the path of the data file.
These inputs are not required for Selection 2.

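One detail worth noting when uploading through Firehose: the SDK's `putRecordBatch` call accepts at most 500 records per request, so a whole data file has to be chunked before it is sent. The batching sketch below is an assumption about how the upload could be structured (the actual project code and the SDK call itself are not shown):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not the project's actual code): Firehose's PutRecordBatch API
// accepts at most 500 records per call, so input lines must be chunked
// into batches before each putRecordBatch request.
public class FirehoseBatcher {
    static final int MAX_RECORDS_PER_BATCH = 500;

    public static List<List<String>> toBatches(List<String> records) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += MAX_RECORDS_PER_BATCH) {
            int end = Math.min(i + MAX_RECORDS_PER_BATCH, records.size());
            batches.add(records.subList(i, end));
        }
        return batches;
    }
}
```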
## Generate Reports
This method creates an EMR cluster with Spark and Hadoop installed, then submits a Spark job to it. Once the Spark job finishes, the cluster is terminated and the reports are written under s3://vngrsbucket/reports/. The credentials and paths (accessKey, secretKey, schemePath, inputPath, jarPath, and outputPath) must be supplied to the program.
These inputs are not required for Selection 4.

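An EMR Spark step ultimately boils down to a `spark-submit` argument list that points at the jar and passes the configured paths through. The sketch below shows one plausible shape for that list; the exact order and flags expected by the SparkProcess jar are assumptions, not the project's documented interface:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the argument list an EMR Spark step could pass to
// the report job. The argument order expected by the real SparkProcess jar
// is an assumption.
public class SparkStepArgs {
    public static List<String> build(String jarPath, String schemePath,
                                     String inputPath, String outputPath) {
        return Arrays.asList(
            "spark-submit",
            "--deploy-mode", "cluster",
            jarPath,      // s3://vngrsbucket/spark-1.0-SNAPSHOT-jar-with-dependencies.jar
            schemePath,   // s3://vngrsbucket/schema.csv
            inputPath,    // s3://vngrsbucket/2019/02/13/16/*
            outputPath    // s3://vngrsbucket/reports/
        );
    }
}
```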
# Architecture
```mermaid
graph LR
A[VngrsChallange] -- Upload Data --> B((Firehose))
A -- Generate Report --> C(EMR)
C -- Request SparkProcess --> E[Spark Process]
E -- Process .csv files --> C
C -- Sends generated reports --> D
B --> D{S3 Bucket}
C -- Request Input Data --> D
D -- Send Input Data --> C
```