Posts tagged 'hive' (page 6)

List: hive (43)

달나라 노트

Hive : hive.mapred.mode (doing a full data scan in Hive: querying every partition)

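In Hive's strict mode, a query on a partitioned table must specify a partition condition, because a full data scan can hurt performance. If you leave the partition condition out, the query fails with an error. In practice, though, you sometimes do need a full table scan without any partition condition.

set hive.mapred.mode = nonstrict;

In that case, setting hive.mapred.mode to nonstrict as shown above allows a full table scan even on a partitioned table. Note: an option like this takes effect for the current session simply by running the statement, the same way you would run a query. Apache Hive document = https://cwiki.ap..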
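To make the difference concrete, here is a toy Python model of what the setting changes: in strict mode a query on a partitioned table with no partition predicate is rejected, while nonstrict mode falls back to reading every partition. The names partitions_to_scan and StrictModeError and the date-style partition values are hypothetical; this only illustrates the behavior described above and is not Hive's implementation.

```python
from typing import Callable, List, Optional


class StrictModeError(Exception):
    """Hypothetical error standing in for Hive's strict-mode rejection."""


def partitions_to_scan(
    all_partitions: List[str],
    partition_filter: Optional[Callable[[str], bool]] = None,
    mode: str = "strict",
) -> List[str]:
    """Toy model: which partitions a query would read under hive.mapred.mode.

    partition_filter is None when the query has no partition condition.
    """
    if partition_filter is None:
        if mode == "strict":
            # strict: refuse to scan a partitioned table without a partition predicate
            raise StrictModeError("no partition predicate on a partitioned table")
        # nonstrict: fall back to a full scan over every partition
        return list(all_partitions)
    # with a partition predicate, only matching partitions are read (partition pruning)
    return [p for p in all_partitions if partition_filter(p)]


partitions = ["2021-06-01", "2021-06-02", "2021-06-03"]
print(partitions_to_scan(partitions, lambda dt: dt == "2021-06-02"))  # ['2021-06-02']
print(partitions_to_scan(partitions, mode="nonstrict"))               # all three partitions
```

The point of the model is the cost difference: with a partition condition only one partition is read, while the nonstrict full scan touches all of them, which is exactly why strict mode blocks it by design.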

SQL/Apache Hive 2021. 6. 4. 02:15

Hive : hive.exec.reducers (controlling how much data each reducer gets)

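When you work with large data in Hive, reducers can end up needing more memory than they have.

Reducer memory
set hive.exec.reducers.bytes.per.reducer = 256000000;

Despite the heading, this setting does not allocate memory to a reducer. It sets how many bytes of input each reducer should handle, and Hive uses it to decide how many reducers to launch. The default is 256MB (256,000,000 bytes) in Hive 0.14.0 and later. The example above simply sets that default; lowering the value spreads the same data across more reducers, so each one processes (and has to hold) less data, while raising it means fewer, busier reducers. The memory each reducer container actually receives is configured separately (for MapReduce, via mapreduce.reduce.memory.mb).

Reducer max
set hive.exec.reducers.max = 128;

The reducer max option sets the maximum number of reducers a single Hive job can use. ..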
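A minimal Python sketch of how the two settings interact, assuming the simplified rule "one reducer per bytes.per.reducer of input, rounded up, capped at reducers.max". It ignores an explicitly set reducer count, Tez auto-parallelism, and how Hive actually measures input size, and the default of 1009 for hive.exec.reducers.max is an assumption taken from the Hive configuration docs for 0.14.0 and later. The function name estimate_reducers is hypothetical.

```python
import math


def estimate_reducers(
    total_input_bytes: int,
    bytes_per_reducer: int = 256_000_000,  # hive.exec.reducers.bytes.per.reducer
    max_reducers: int = 1009,              # hive.exec.reducers.max (assumed default)
) -> int:
    """Simplified estimate of the reducer count derived from these two settings."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # one reducer per bytes_per_reducer of input, rounded up, and at least one
    reducers = max(1, math.ceil(total_input_bytes / bytes_per_reducer))
    # never exceed the configured cap
    return min(max_reducers, reducers)


print(estimate_reducers(10_000_000_000))                     # 40
print(estimate_reducers(10_000_000_000, 128_000_000))        # 79
print(estimate_reducers(100_000_000_000, max_reducers=128))  # 128
```

Halving bytes.per.reducer roughly doubles the reducer count (40 to 79 for 10GB of input), so each reducer sees about half as much data; with 100GB of input the estimate would be 391, and the max setting of 128 caps it.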

SQL/Apache Hive 2021. 6. 4. 02:09

Redshift : unload & copy (uploading query results to an S3 server in CSV format; creating a database table from files loaded from the S3 server)

Redshift -> S3
-- Redshift -> S3
unload('   ----- (query that extracts the data to upload to the S3 server)
    select col1
         , col2
    from test_table_1
    -- where col3 in (''valid'')   ----- (because the whole query is wrapped in single quotes, string literals inside it must be wrapped in two single quotes)
')
to 's3://root_dir/test_dir/'   ----- (S3 path where the results of the query inside unload are saved)
iam_role 'credentials'         ----- (credential for accessing the S3 server; iam_role expects an IAM role ARN)
csv                            ----- (save in CSV format)
delimiter ','                  ---..
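The quoting rule above is easy to get wrong by hand, so here is a small Python sketch that builds the UNLOAD text from a plain SELECT by doubling every single quote inside it. The helper build_unload is hypothetical; it only produces the statement string (running it requires a Redshift connection, which is not shown), it does not escape the S3 path or role, and the option list simply mirrors the post's example.

```python
def build_unload(query: str, s3_prefix: str, iam_role: str, delimiter: str = ",") -> str:
    """Hypothetical helper: wrap a SELECT in a Redshift UNLOAD statement.

    UNLOAD receives the query as a single-quoted string literal, so every single
    quote inside the query is doubled ('valid' becomes ''valid'').
    """
    escaped = query.replace("'", "''")
    return "\n".join([
        f"unload('{escaped}')",
        f"to '{s3_prefix}'",
        f"iam_role '{iam_role}'",
        "csv",
        f"delimiter '{delimiter}';",
    ])


stmt = build_unload(
    "select col1, col2 from test_table_1 where col3 in ('valid')",
    "s3://root_dir/test_dir/",
    "credentials",  # placeholder as in the post; a real statement needs an IAM role ARN
)
print(stmt)
```

Doing the escaping in one place means the inner query can be written and tested normally, with quotes doubled only at the moment it is embedded in UNLOAD.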

SQL/Redshift 2021. 6. 3. 01:47

Hive : first_value, last_value : getting the first and last values (window function)

first_value and last_value can be used as window functions.

first_value([column_name]) over(partition by [column_name] order by [column_name] rows between ~~ and ~~)
last_value([column_name]) over(partition by [column_name] order by [column_name] rows between ~~ and ~~)

As the example above shows, they are written like this. Reading it piece by piece: partition by [column_name] = split the rows into partitions by this column; order by [column_name] = sort by this column, then first_va..
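To see what the frame clause changes, here is a pure-Python model of first_value/last_value over a ROWS frame. The function first_last_values, its whole_partition flag, and the visits data are hypothetical; it assumes a frame of either rows between unbounded preceding and unbounded following (whole_partition=True) or rows between unbounded preceding and current row (whole_partition=False), and it is not Hive's implementation.

```python
from itertools import groupby
from operator import itemgetter


def first_last_values(rows, partition_key, order_key, value_key, whole_partition=True):
    """Model of first_value(value) / last_value(value)
    over (partition by partition_key order by order_key
          rows between unbounded preceding and X),
    where X is 'unbounded following' if whole_partition else 'current row'.
    Returns (row, first, last) tuples in partition/order sequence.
    """
    results = []
    # sort by partition, then by the ORDER BY column (groupby needs adjacent keys)
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        partition = list(group)
        for i, row in enumerate(partition):
            # the window frame visible to this row
            frame = partition if whole_partition else partition[: i + 1]
            results.append((row, frame[0][value_key], frame[-1][value_key]))
    return results


visits = [
    {"user": "a", "ts": 2, "page": "cart"},
    {"user": "a", "ts": 1, "page": "home"},
    {"user": "a", "ts": 3, "page": "pay"},
    {"user": "b", "ts": 5, "page": "home"},
    {"user": "b", "ts": 4, "page": "search"},
]
for row, first, last in first_last_values(visits, "user", "ts", "page"):
    print(row["user"], row["page"], first, last)
```

With the whole-partition frame, every row of user a gets first = home and last = pay; with whole_partition=False, last just tracks the current row. This matters in practice: per the Hive windowing documentation, when ORDER BY is given without a window clause the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is why last_value without an explicit frame often seems to return the current row. This model covers ROWS frames only; RANGE frames also treat rows with equal order values as peers.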

SQL/Apache Hive 2021. 6. 3. 00:16
