ChunJun generic configuration

Configuration file structure details

A complete ChunJun configuration script consists of two parts:

  • Content
    • Describes the input source and output source of a task, i.e., the reader and writer
  • Setting
    • Configures the overall task environment, including speed, errorLimit, metricPluginConf, restore, log, and dirty

The overall structure is as follows:

{
  "job": {
    "content": [{
      "reader": {},
      "writer": {}
    }],
    "setting": {
      "speed": {},
      "errorLimit": {},
      "metricPluginConf": {},
      "restore": {},
      "log": {},
      "dirty": {}
    }
  }
}
| | Name | Description | Required |
| --- | --- | --- | --- |
| content | reader | Detailed configuration of the reader plugin | required |
| | writer | Detailed configuration of the writer plugin | required |
| setting | speed | Rate limiting | optional |
| | errorLimit | Dirty data tolerance control | optional |
| | metricPluginConf | Metric plugin configuration | optional |
| | restore | Task type and breakpoint-resume configuration | optional |
| | log | Log file configuration | optional |
| | dirty | Dirty data storage configuration | optional |
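
For reference, a minimal end-to-end job could look like the sketch below. It assumes the stream test connectors (streamreader / streamwriter); the parameter bodies are illustrative, so consult each connector's documentation for the authoritative fields.

{
  "job": {
    "content": [{
      "reader": {
        "name": "streamreader",
        "parameter": {
          "column": [{ "name": "id", "type": "int" }],
          "sliceRecordCount": ["100"]
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}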

Content

Reader

The Reader configures the input source of the data, that is, where the data comes from. The specific configuration is as follows:

"reader" : {
"name" : "xxreader",
"parameter" : {
......
}
}
| Name | Description | Required |
| --- | --- | --- |
| name | Reader connector name. For details, see the documentation of each connector | required |
| parameter | Reader connector configuration parameters. For details, see the documentation of each connector | required |
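
As a concrete illustration, a MySQL reader might be configured roughly as follows. The parameter layout is a sketch of the JDBC-style connector configuration, and the host, database, table, and credentials are placeholders; see the MySQL connector documentation for the authoritative fields.

"reader": {
  "name": "mysqlreader",
  "parameter": {
    "username": "root",
    "password": "******",
    "column": [
      { "name": "id", "type": "int" },
      { "name": "name", "type": "varchar" }
    ],
    "connection": [{
      "jdbcUrl": ["jdbc:mysql://localhost:3306/test"],
      "table": ["orders"]
    }]
  }
}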

Writer

The Writer configures the output destination of the data, that is, where the data is written to. The specific configuration is as follows:

"writer" : {
"name" : "xxwriter",
"parameter" : {
......
}
}
| Name | Description | Required |
| --- | --- | --- |
| name | Writer connector name. For details, see the documentation of each connector | required |
| parameter | Writer connector configuration parameters. For details, see the documentation of each connector | required |

Setting

speed

speed is used to set the parallelism and rate limit of a job. The configuration is as follows:

"speed" : {
"channel": 1,
"readerChannel": -1,
"writerChannel": -1,
"bytes": 0,
"rebalance" : true
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| channel | Parallelism of the job | optional | 1 | Integer |
| readerChannel | Source parallelism; -1 means the channel value is used | optional | -1 | Integer |
| writerChannel | Sink parallelism; -1 means the channel value is used | optional | -1 | Integer |
| bytes | A value greater than 0 enables byte-based rate limiting | optional | 0 | Long |
| rebalance | Whether to force a rebalance; enabling it costs performance | optional | false | Boolean |
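
As a sketch, the following configuration reads with four parallel channels while writing with two, and sets bytes to a positive value to enable rate limiting (the concrete values here are illustrative):

"speed": {
  "channel": 2,
  "readerChannel": 4,
  "writerChannel": 2,
  "bytes": 1048576,
  "rebalance": false
}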

ErrorLimit

errorLimit is used to configure error tolerance for data reads and writes while the task is running. The specific configuration is as follows:

"errorLimit" : {
"record": 100,
"percentage": 10.0
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| record | Error record threshold. The task fails when the number of error records exceeds this value | optional | 0 | Integer |
| percentage | Error ratio threshold. The task fails when the ratio of error records exceeds this value | optional | 0.0 | Double |

For example, with record set to 100, the task fails as soon as the 101st error record appears.

MetricPluginConf

metricPluginConf is used to configure the ChunJun metric reporter.

It currently applies only to JDBC connectors: the StartLocation and EndLocation metrics are sent to the specified data source when the job ends.

Prometheus and MySQL are currently supported. The specific configuration is as follows:

Prometheus

The Prometheus reporter relies on PushGateway to interact with Prometheus.

"metricPluginConf" : {
"pluginName": "promethus"
}

The Prometheus connection information must be configured in flink-conf.yaml:

metrics.reporter.promgateway.host: 127.0.0.1
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: testjob
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
| Name | Description | Required | Default |
| --- | --- | --- | --- |
| metrics.reporter.promgateway.host | PushGateway host | required | none |
| metrics.reporter.promgateway.port | PushGateway port | required | 0 |
| metrics.reporter.promgateway.jobName | Job name | optional | none |
| metrics.reporter.promgateway.randomJobNameSuffix | Whether to append a random suffix to the job name to prevent duplicate job names | optional | false |
| metrics.reporter.promgateway.deleteOnShutdown | Whether to delete metric information after the job completes | optional | true |

MySQL

The target table must contain at least two String fields, metric_name and metric_value, which record the metric name and metric value respectively. The specific configuration is as follows:

"metricPluginConf" : {
"pluginName": "promethus"
"pluginProp": {
"jdbcUrl":"",
"database":"",
"table":"",
"username":"",
"password":"",
"properties":{
}
}
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| jdbcUrl | MySQL JDBC URL | required | none | String |
| database | MySQL database name | required | none | String |
| table | MySQL table name | required | none | String |
| username | MySQL username | required | none | String |
| password | MySQL password | required | none | String |
| properties | Extra MySQL connection properties | optional | none | Map |
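
A filled-in sketch might look as follows; the JDBC URL, database, table, and credentials are placeholders for illustration:

"metricPluginConf": {
  "pluginName": "mysql",
  "pluginProp": {
    "jdbcUrl": "jdbc:mysql://localhost:3306/chunjun_metrics",
    "database": "chunjun_metrics",
    "table": "job_metrics",
    "username": "metric_user",
    "password": "******",
    "properties": {}
  }
}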

Restore

restore configures the synchronization task type (offline synchronization or real-time collection) and the Flink restart strategy. The specific configuration is as follows:

"restore" : {
"isStream" : false,
"isRestore" : false,
"restoreColumnName" : "",
"restoreColumnIndex" : 0
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| isStream | Whether the task is a real-time collection task | optional | false | Boolean |
| isRestore | Whether to enable breakpoint resume (resumable transfer) | optional | false | Boolean |
| restoreColumnName | Name of the breakpoint-resume column | required | none | String |
| restoreColumnIndex | Index of the breakpoint-resume column | required | none | Integer |
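
For example, a real-time task that resumes from a column named id might be sketched as below; the column name and index are illustrative, and (based on the field descriptions above) are assumed to refer to a column in the reader configuration:

"restore": {
  "isStream": true,
  "isRestore": true,
  "restoreColumnName": "id",
  "restoreColumnIndex": 0
}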

Log

log is used to configure the ChunJun log file. The details are as follows:

"log" : {
"isLogger": false,
"level" : "info",
"path" : "/tmp/dtstack/flinkx/",
"pattern":""
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| isLogger | Whether to save log records | optional | false | Boolean |
| level | Log level | optional | info | String |
| path | Server path where logs are saved | optional | /tmp/dtstack/flinkx/ | String |
| pattern | Log format | optional | log4j: %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n; logback: %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n | String |

Dirty

dirty is used to configure how dirty data is stored in HDFS. It is usually used together with errorLimit. The configuration is as follows:

"dirty" : {
"path" : "xxx",
"hadoopConfig" : {
......
}
}
| Name | Description | Required | Default | DataType |
| --- | --- | --- | --- | --- |
| path | Path where dirty data is saved | required | none | String |
| hadoopConfig | Hadoop configuration | required | none | Map |
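
A filled-in sketch, assuming a non-HA HDFS cluster (the NameNode address and path are placeholders), might be:

"dirty": {
  "path": "/user/chunjun/dirty_data",
  "hadoopConfig": {
    "fs.defaultFS": "hdfs://namenode:9000"
  }
}

hadoopConfig takes standard Hadoop client properties; an HA cluster would supply the dfs.nameservices family of settings instead of a single fs.defaultFS address.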