반응형
Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | |||
5 | 6 | 7 | 8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 | 16 | 17 | 18 |
19 | 20 | 21 | 22 | 23 | 24 | 25 |
26 | 27 | 28 | 29 | 30 | 31 |
Tags
- gas
- matplotlib
- PySpark
- SQL
- Github
- numpy
- Redshift
- GIT
- Mac
- google apps script
- Kotlin
- django
- string
- PostgreSQL
- c#
- math
- Google Excel
- hive
- Apache
- Tkinter
- Excel
- 파이썬
- Java
- array
- Google Spreadsheet
- Python
- PANDAS
- dataframe
- list
Archives
- Today
- Total
달나라 노트
Python Hive : access hive and run query using python 본문
Python/Python ETC
Python Hive : access hive and run query using python
CosmosProject 2020. 12. 23. 00:43728x90
반응형
from pyhive import hive
def query_execute_hive(query, result_cols):
hive_connection = hive.Connection(
host='host_name_url',
port=10000 # port_number,
username='hive_user_id',
password='hive_user_password',
database='default',
auth='LDAP'
)
cursor = hive_connection.cursor()
raw_query_list = query.split(';\n')
raw_query_list = [str(x).strip() for x in raw_query_list]
raw_query_list = list(filter(None, raw_query_list))
query_list = [str(x) + ';' for x in raw_query_list]
for q in query_list:
cursor.execute(q[:-1])
try:
output = cursor.fetchall()
except:
output = None
df_output = pd.DataFrame(output, columns=result_cols)
return df_output
query = '''
create temporary table test_table as
select col1
, col2
from test_table
where 1=1
and registered_dt >= 20201012
;
select *
from test_table
;
'''
result_columns = ['col1', 'col2']
df_query_result = query_execute_hive(query, result_columns)
You can run query in hive on python code.
from pyhive import hive
def query_execute_hive(query, result_cols):
hive_connection = hive.Connection(
host='host_name_url',
port=10000 # port_number,
username='hive_user_id',
password='hive_user_password',
database='default',
auth='LDAP'
)
cursor = hive_connection.cursor()
...
The above code connects you to server.
from pyhive import hive
def query_execute_hive(query, result_cols):
...
raw_query_list = query.split(';\n')
raw_query_list = [str(x).strip() for x in raw_query_list]
raw_query_list = list(filter(None, raw_query_list))
query_list = [str(x) + ';' for x in raw_query_list]
...
query = '''
create temporary table test_table as
select col1
, col2
from test_table
where 1=1
and registered_dt >= 20201012
;
select *
from test_table
;
'''
...
Above code will split your query based on semicolon and make a python list containing each query.
(The reason why I used filter function in list on 'list(filter(None, raw_query_list))' is to filtering any element I don't want to maintain as part of query. This is not necessary.)
from pyhive import hive
def query_execute_hive(query, result_cols):
...
for q in query_list:
cursor.execute(q[:-1])
...
Run queries one by one using for loop.
from pyhive import hive
def query_execute_hive(query, result_cols):
...
try:
output = cursor.fetchall()
except:
output = None
df_output = pd.DataFrame(output, columns=result_cols)
return df_output
...
Get query result using 'cursor.fetchall()'.
And make the result as DataFrame. Here, the result column list you put as a parameter('result_cols') will be column names of the result DataFrame.
728x90
반응형
'Python > Python ETC' 카테고리의 다른 글
Python : Web 페이지에서 원하는 태그의 정보 추출하기 (1) | 2021.01.23 |
---|---|
Python : Web page의 HTML 코드를 가져오기. (Web Crawling, 크롤링, 웹 크롤링) (0) | 2021.01.23 |
Python tistory api : 앱 등록 & Access token 발급 (0) | 2021.01.04 |
Python에서 로또 당첨 번호 가져오기 (2) | 2021.01.04 |
Python psycopg2 : access redshift and run query using python, python으로 redshift에서 쿼리 돌리기 (0) | 2020.12.23 |
Comments