ChunJun generic configuration
Configuration file structure details
A complete ChunJun configuration script consists of two parts:
- Content
  - Content describes the input source and output source of a task, including reader and writer.
- Setting
  - Setting describes the overall settings of a task, including speed, errorLimit, metricPluginConf, restore, log, and dirty.
The overall structure is as follows:
```json
{
  "job" : {
    "content" : [{
      "reader" : {},
      "writer" : {}
    }],
    "setting" : {
      "speed" : {},
      "errorLimit" : {},
      "metricPluginConf" : {},
      "restore" : {},
      "log" : {},
      "dirty" : {}
    }
  }
}
```
Section | Name | Description | Required
---|---|---|---
content | reader | Reader plugin detailed configuration | required
content | writer | Writer plugin detailed configuration | required
setting | speed | Rate limiting | optional
setting | errorLimit | Dirty data tolerance control | optional
setting | metricPluginConf | Metric plugin configuration | optional
setting | restore | Task type and breakpoint-continuation configuration | optional
setting | log | Log file configuration | optional
setting | dirty | Dirty data storage configuration | optional
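For orientation, a minimal end-to-end script that fills in this skeleton might look as follows. The connector names and parameters used here (streamreader, streamwriter, sliceRecordCount, print) are illustrative only; consult each connector's documentation for its actual options.

```json
{
  "job": {
    "content": [{
      "reader": {
        "name": "streamreader",
        "parameter": {
          "column": [
            { "name": "id", "type": "int" },
            { "name": "name", "type": "string" }
          ],
          "sliceRecordCount": [100]
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```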
Content
Reader
The reader is used to configure the input source of the data, that is, where the data comes from. The specific configuration is as follows:
"reader" : {
"name" : "xxreader",
"parameter" : {
......
}
}
Name | Description | Required
---|---|---
name | Reader connector name. For details, see the documentation of each connector | required
parameter | Reader connector configuration parameters. For details, see the documentation of each connector | required
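As a concrete sketch, a reader for a JDBC source could look like the following. The connector name mysqlreader and the parameter names (username, password, column, connection) follow the common JDBC-reader shape but are shown only for illustration; the authoritative parameter list is in each connector's documentation.

```json
"reader": {
  "name": "mysqlreader",
  "parameter": {
    "username": "root",
    "password": "****",
    "column": ["id", "name"],
    "connection": [{
      "jdbcUrl": ["jdbc:mysql://localhost:3306/test"],
      "table": ["user"]
    }]
  }
}
```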
Writer
The writer is used to configure the output source of the data, that is, where the data is written. The specific configuration is as follows:
"writer" : {
"name" : "xxwriter",
"parameter" : {
......
}
}
Name | Description | Required
---|---|---
name | Writer connector name. For details, see the documentation of each connector | required
parameter | Writer connector configuration parameters. For details, see the documentation of each connector | required
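Correspondingly, a minimal writer sketch that simply prints the records it receives (the streamwriter connector and its print parameter are illustrative; check the connector's documentation):

```json
"writer": {
  "name": "streamwriter",
  "parameter": {
    "print": true
  }
}
```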
Setting
Speed
speed is used to set the parallelism and the rate limit of a job. The specific configuration is as follows:
"speed" : {
"channel": 1,
"readerChannel": -1,
"writerChannel": -1,
"bytes": 0,
"rebalance" : true
}
Name | Description | Required | Default | DataType
---|---|---|---|---
channel | Parallelism of the job | optional | 1 | Integer
readerChannel | Source (reader) parallelism; -1 means the channel value is used | optional | -1 | Integer
writerChannel | Sink (writer) parallelism; -1 means the channel value is used | optional | -1 | Integer
bytes | A value greater than 0 enables byte-rate limiting | optional | 0 | Long
rebalance | Whether to force a rebalance; enabling it incurs a performance cost | optional | false | Boolean
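For example, the following setting (values are illustrative) runs the job with three parallel channels and enables byte-rate limiting at 1048576 bytes; confirm the exact unit and time window of the limit against your ChunJun version.

```json
"speed": {
  "channel": 3,
  "bytes": 1048576
}
```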
ErrorLimit
errorLimit is used to configure error-tolerance control for data reads and writes while the task is running. The specific configuration is as follows:
"errorLimit" : {
"record": 100,
"percentage": 10.0
}
Name | Description | Required | Default | DataType |
---|---|---|---|---|
record | Error threshold. When the number of error records exceeds this threshold, the task fails | optional | 0 | Integer |
percentage | Error ratio threshold. When the error ratio exceeds this threshold, the task fails | optional | 0.0 | Double |
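For example, with the (illustrative) values below, the task fails as soon as more than 10 records error out, or as soon as the error ratio exceeds 1.0 percent:

```json
"errorLimit": {
  "record": 10,
  "percentage": 1.0
}
```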
MetricPluginConf
metricPluginConf is used to configure the ChunJun metric reporter.
It is currently applied only in JDBC connectors: the StartLocation and EndLocation metrics are sent to the specified data source at the end of the job.
Prometheus and MySQL are currently supported. The specific configuration is as follows:
Prometheus
The Prometheus reporter relies on pushgateway to interact with Prometheus:
"metricPluginConf" : {
"pluginName": "promethus"
}
The Prometheus connection information needs to be configured in flink-conf.yaml:
```yaml
metrics.reporter.promgateway.host: 127.0.0.1
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: testjob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
```
Name | Description | Required | Default |
---|---|---|---|
metrics.reporter.promgateway.host | pushGateway host | required | none |
metrics.reporter.promgateway.port | pushGateway port | required | 0 |
metrics.reporter.promgateway.jobName | Job name | optional | none
metrics.reporter.promgateway.randomJobNameSuffix | Whether to add a random suffix to the job name to prevent job name duplication | optional | false |
metrics.reporter.promgateway.deleteOnShutdown | Whether to delete indicator information after the job is complete | optional | true |
MySQL
The target table must have at least two string fields, metric_name and metric_value, which store the metric name and metric value respectively. The specific configuration is as follows:
"metricPluginConf" : {
"pluginName": "promethus"
"pluginProp": {
"jdbcUrl":"",
"database":"",
"table":"",
"username":"",
"password":"",
"properties":{
}
}
}
Name | Description | Required | Default | DataType
---|---|---|---|---
jdbcUrl | MySQL JDBC URL | required | none | String
database | MySQL database name | optional | none | String
table | MySQL table name | required | none | String
username | MySQL username | required | none | String
password | MySQL password | required | none | String
properties | Extra MySQL connection properties | optional | none | Map
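A filled-in sketch with hypothetical connection values (the host, database, and table names are placeholders):

```json
"metricPluginConf": {
  "pluginName": "mysql",
  "pluginProp": {
    "jdbcUrl": "jdbc:mysql://localhost:3306/chunjun",
    "database": "chunjun",
    "table": "chunjun_metrics",
    "username": "root",
    "password": "****"
  }
}
```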
Restore
restore configures the synchronization task type (offline synchronization or real-time collection) and the breakpoint-continuation (resumable transfer) behavior. The specific configuration is as follows:
"restore" : {
"isStream" : false,
"isRestore" : false,
"restoreColumnName" : "",
"restoreColumnIndex" : 0
}
Name | Description | Required | Default | DataType
---|---|---|---|---
isStream | Whether the task is a real-time collection task | optional | false | Boolean
isRestore | Whether to enable resumable transfer from a breakpoint | optional | false | Boolean
restoreColumnName | Name of the field used to resume from a breakpoint | required when isRestore is true | none | String
restoreColumnIndex | Index of the field used to resume from a breakpoint | required when isRestore is true | none | Integer
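For example, an offline job that resumes from a breakpoint, keyed on an illustrative monotonically increasing id column at index 0:

```json
"restore": {
  "isStream": false,
  "isRestore": true,
  "restoreColumnName": "id",
  "restoreColumnIndex": 0
}
```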
Log
log is used to configure the ChunJun log file. The specific configuration is as follows:
"log" : {
"isLogger": false,
"level" : "info",
"path" : "/tmp/dtstack/flinkx/",
"pattern":""
}
Name | Description | Required | Default | DataType
---|---|---|---|---
isLogger | Whether to save log records | optional | false | Boolean
level | Log level | optional | info | String
path | Path where logs are saved on the server | optional | /tmp/dtstack/flinkx/ | String
pattern | Log format | optional | log4j: `%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n`<br />logback: `%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n` | String
Dirty
dirty is used to configure where dirty data is stored in HDFS. It is usually used in conjunction with errorLimit. The specific configuration is as follows:
"dirty" : {
"path" : "xxx",
"hadoopConfig" : {
......
}
}
Name | Description | Required | Default | DataType
---|---|---|---|---
path | Path where dirty data is saved | required | none | String
hadoopConfig | Hadoop configuration | required | none | Map
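A filled-in sketch with hypothetical values; fs.defaultFS is a standard Hadoop property, and the HDFS address and path shown are placeholders:

```json
"dirty": {
  "path": "/chunjun/dirty_data",
  "hadoopConfig": {
    "fs.defaultFS": "hdfs://localhost:9000"
  }
}
```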