Hive : Uploading query results to an S3 server, and creating a database table from files on an S3 server


달나라 노트


CosmosProject 2021. 2. 26. 00:38


Let's look at how to work with an S3 server from Hive.

----- Hive database -> S3 server -----
drop table if exists test_schema.test_table;
create external table test_schema.test_table (
    col_1     bigint,
    col_2     string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
-- row format delimited fields terminated by ','
-- lines terminated by '\n'
stored as textfile
location 's3://root_dir/test_dir/'
;


----- Insert data -----
insert overwrite table test_schema.test_table
select  col_1
        , col_2
from origin_schema.origin_table
;

In Redshift, the approach was to unload the query result to the S3 server as a file, and then copy that file to build a Redshift database table.

Hive works a little differently.

The query above is how Hive uploads data to the S3 server.

----- Hive database -> S3 server -----
drop table if exists test_schema.test_table;
create external table test_schema.test_table (
    col_1     bigint,
    col_2     string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
-- row format delimited fields terminated by ','
-- lines terminated by '\n'
stored as textfile
location 's3://root_dir/test_dir/'
;

In Hive, you first create an external table on a directory in S3.

row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

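This part determines how the files created on the S3 server are delimited.

Specifying row format serde with OpenCSVSerde applies the settings needed for CSV automatically: by default, fields are separated by commas, quoted with double quotes, and escaped with backslashes. Note that OpenCSVSerde treats every column as string when reading, regardless of the declared column type.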

-- row format delimited fields terminated by ','
-- lines terminated by '\n'

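If you would rather not rely on the serde, you can remove the row format serde line and uncomment the lines above to state the delimiters explicitly: a comma between columns and a newline (\n) between rows.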
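As a middle ground between the two options above, OpenCSVSerde also accepts its delimiter settings explicitly through serdeproperties. The following is a minimal sketch, assuming you want tab-separated output; the table name test_table_tsv and the directory test_dir_tsv are hypothetical and not part of the original example.

```sql
-- Hypothetical table and directory names, used only for illustration.
drop table if exists test_schema.test_table_tsv;
create external table test_schema.test_table_tsv (
    col_1     string,
    col_2     string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
    "separatorChar" = "\t",   -- field separator (default is a comma)
    "quoteChar"     = "\"",   -- quote character (default is a double quote)
    "escapeChar"    = "\\"    -- escape character (default is a backslash)
)
stored as textfile
location 's3://root_dir/test_dir_tsv/'
;
```

The columns are declared as string here because that is how OpenCSVSerde exposes them anyway.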

stored as textfile
location 's3://root_dir/test_dir/'

This part specifies the file format to store the data in (plain text here) and the S3 path where the files are written.
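To check which format and path Hive actually recorded for the table, describe formatted can be run once the table exists. This is only a quick sanity check, and the exact layout of the output depends on the Hive version.

```sql
-- Show the metadata Hive stored for the external table.
-- In the output, look for the Location, Table Type (EXTERNAL_TABLE),
-- SerDe Library, and InputFormat / OutputFormat rows.
describe formatted test_schema.test_table;
```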

----- Insert data -----
insert overwrite table test_schema.test_table
select  col_1
        , col_2
from origin_schema.origin_table
;

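You can then insert the result of a query into the external table created on the S3 server (test_schema.test_table), as shown above.

The external table holds no data of its own. The insert writes the rows as files under the S3 path given in location, so the data is available both through the table and as files on the S3 server. Because the statement is insert overwrite, any data already in the table is replaced.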
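The same mechanism works in the other direction: an external table pointed at an S3 directory that already contains files makes those files queryable, and a create table ... as select can then copy them into a regular table. The sketch below is written under assumptions: the directory s3://root_dir/input_dir/ and the tables input_table and loaded_table are hypothetical, the files are assumed to be comma-separated with two columns, and dfs -ls is a Hive CLI command that may not be permitted in every environment.

```sql
-- 1) List the files written by the insert (Hive CLI command).
dfs -ls s3://root_dir/test_dir/;

-- 2) Read the rows back through the external table.
select  col_1
        , col_2
from test_schema.test_table
limit 10
;

-- 3) Hypothetical: CSV files already exist under s3://root_dir/input_dir/.
--    Defining an external table over that directory exposes them as rows.
drop table if exists test_schema.input_table;
create external table test_schema.input_table (
    col_1     string,
    col_2     string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile
location 's3://root_dir/input_dir/'
-- uncomment if every file starts with a header row
-- tblproperties ("skip.header.line.count" = "1")
;

-- 4) Copy the rows into a regular (managed) table.
--    OpenCSVSerde returns strings, so cast where a numeric type is needed.
create table test_schema.loaded_table as
select  cast(col_1 as bigint) as col_1
        , col_2
from test_schema.input_table
;
```

Dropping an external table removes only its metadata and leaves the S3 files in place, which is why this pattern is a reasonable fit for files shared with other systems.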
