
Druid (S3 direct integration)

2021. 11. 5.

https://groups.google.com/forum/#!msg/druid-user/vpAOj9KIoTg/EkivfryCBgAJ

When running as described in the thread above, the indexing job keeps trying to read the default AWS credentials instead of the S3 settings configured in Druid. On inspection, the cause is a dependency mismatch: org.apache.hadoop:hadoop-aws:3.0.0-alpha1 depends on aws-java-sdk 1.10.6, while org.apache.hadoop:hadoop-aws:2.7.2 depends on aws-java-sdk 1.7.4.
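A quick way to confirm which aws-java-sdk each hadoop-aws version pulls in is to list the jars that pull-deps (step 1 below) downloads next to it; this assumes the default hadoop-dependencies layout, where each version directory also holds the transitive dependencies:

$ ls hadoop-dependencies/hadoop-aws/3.0.0-alpha1/ | grep aws-java-sdk
$ ls hadoop-dependencies/hadoop-aws/2.7.2/ | grep aws-java-sdk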


Changes from the original setup:

                 Original    Changed
aws-java-sdk     1.10.21     1.10.21
hadoop-client    2.3.0       2.7.2
hadoop-aws       (none)      3.0.0-alpha1

Combinations tested:

aws-java-sdk    hadoop-client    hadoop-aws      Result
1.10.21         2.7.2            3.0.0-alpha1    OK
1.10.21         2.3.0            3.0.0-alpha1    OK
1.7.4           2.7.2            2.7.2           X (workaround)

* Build from the 0.9.0 source (the resulting build is versioned 0.9.1):

  - indexing-hadoop/src/main/java/io/druid/indexer/JobHelper.java

    Add case "s3a": at line 401 (see the sketch below).
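From memory of the 0.9.x source, line 401 falls inside a switch on the output URI scheme that builds the segment loadSpec, and s3a is missing from it, so s3a paths hit the default error case. A rough sketch of the change (abbreviated, not a verbatim diff; the actual file may differ slightly):

// io/druid/indexer/JobHelper.java -- sketch only
switch (indexOutURI.getScheme()) {
  case "hdfs":
    loadSpec = ImmutableMap.<String, Object>of("type", "hdfs", "path", indexOutURI.toString());
    break;
  case "s3a":  // added: handle s3a the same way as the existing s3/s3n cases
  case "s3":
  case "s3n":
    loadSpec = ImmutableMap.<String, Object>of(
        "type", "s3_zip",
        "bucket", indexOutURI.getHost(),
        "key", indexOutURI.getPath().substring(1)  // drop the leading '/'
    );
    break;
  default:
    throw new IAE("Unknown file system scheme [%s]", indexOutURI.getScheme());
}

After the change, rebuild the distribution with the usual Maven build (mvn clean package -DskipTests).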

1. Run the following command from the downloaded source directory:

java -cp "lib/*" -Ddruid.extensions.directory="extensions" -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" io.druid.cli.Main tools pull-deps --no-default-hadoop -h "org.apache.hadoop:hadoop-client:2.7.2" -h "org.apache.hadoop:hadoop-aws:3.0.0-alpha1"

* If the command fails, download the jars manually and copy them into the matching paths (see the layout below).
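For reference, pull-deps should leave a layout like the one below under the installation directory (contents illustrative); each version directory holds the artifact's jar plus its transitive dependencies, and this is where manually downloaded jars would go:

hadoop-dependencies/
├── hadoop-client/
│   └── 2.7.2/           # hadoop-client jar + transitive dependencies
└── hadoop-aws/
    └── 3.0.0-alpha1/    # hadoop-aws jar + aws-java-sdk jars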

2. cat conf/_common/common.runtime.properties

# add the s3 extension (see the note after this file)

druid.extensions.loadList=["druid-s3-extensions"]

 

# set deep storage to S3

druid.storage.type=s3

druid.storage.bucket=your-bucket

druid.storage.baseKey=druid/segments

druid.s3.accessKey=XXXXXXXXXXXXXX

druid.s3.secretKey=XXXXXXXXXXXXXX
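One caution on the loadList line above: druid.extensions.loadList is the complete list of extensions Druid loads, so any extensions the cluster already uses must stay in it. A purely illustrative example with a metadata-storage extension alongside S3:

druid.extensions.loadList=["druid-s3-extensions", "mysql-metadata-storage"]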

3. Set the peon hadoop coordinates on the MiddleManager

$ cat conf/druid/middleManager/runtime.properties

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.3.0", "org.apache.hadoop:hadoop-aws:3.0.0-alpha1"]

 

4. Start the node with hadoop-aws on the classpath

- add hadoop-dependencies/hadoop-aws/3.0.0-alpha1/* to the classpath. The command below does this for the MiddleManager; the Historical node needs the same addition (see the variant after it).

$ nohup java `cat conf/druid/middleManager/jvm.config | xargs` -cp conf/druid/_common:conf/druid/middleManager:lib/*:hadoop-dependencies/hadoop-aws/3.0.0-alpha1/* io.druid.cli.Main server middleManager > ~/druid/log/middleManager.log 2>&1&
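An analogous command for the Historical node, assuming the standard conf layout (paths illustrative):

$ nohup java `cat conf/druid/historical/jvm.config | xargs` -cp conf/druid/_common:conf/druid/historical:lib/*:hadoop-dependencies/hadoop-aws/3.0.0-alpha1/* io.druid.cli.Main server historical > ~/druid/log/historical.log 2>&1 &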

5. Add the S3 access/secret keys to the index task file

"jobProperties" : {
    "fs.s3.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.endpoint" : "s3.ap-northeast-2.amazonaws.com",
    "fs.s3a.access.key" : "XXXXXXX",
    "fs.s3a.secret.key" : "XXXXXXXXX",
    "mapreduce.job.classloader": "true"
}

 

6. Add hadoopDependencyCoordinates to the index task file (placed as in the skeleton below)

"hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.2", "org.apache.hadoop:hadoop-aws:3.0.0-alpha1"]
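For orientation, this is where the pieces from steps 5 and 6 sit in a Hadoop index task: hadoopDependencyCoordinates is a top-level field, and the jobProperties block from step 5 goes under tuningConfig. Everything else is trimmed to placeholders:

{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:2.7.2", "org.apache.hadoop:hadoop-aws:3.0.0-alpha1"],
  "spec" : {
    "dataSchema" : { ... },
    "ioConfig" : { ... },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : { ... }
    }
  }
}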

 

7. Add jets3t properties

 : required for the Historical node to access files over s3a. jets3t reads this file from the classpath, and conf/_common is already on the classpath in the start command above. Note that ap-northeast-2 (Seoul) accepts only Signature Version 4 requests, hence the signature setting.

$ cat conf/_common/jets3t.properties

s3service.s3-endpoint = s3.ap-northeast-2.amazonaws.com

storage-service.request-signature-version=AWS4-HMAC-SHA256

 
