달나라 노트

Python Hive : access hive and run query using python 본문

Python/Python ETC

Python Hive : access hive and run query using python

CosmosProject 2020. 12. 23. 00:43
728x90
반응형

 

 

from pyhive import hive

def query_execute_hive(query, result_cols):
    hive_connection = hive.Connection(
        host='host_name_url',
        port=10000 # port_number,
        username='hive_user_id',
        password='hive_user_password',
        database='default',
        auth='LDAP'
    )
    cursor = hive_connection.cursor()

    raw_query_list = query.split(';\n')
    raw_query_list = [str(x).strip() for x in raw_query_list]
    raw_query_list = list(filter(None, raw_query_list))
    query_list = [str(x) + ';' for x in raw_query_list]

    for q in query_list:
        cursor.execute(q[:-1])

    try:
        output = cursor.fetchall()
    except:
        output = None

    df_output = pd.DataFrame(output, columns=result_cols)

    return df_output



query = '''
create temporary table test_table as
select	col1
        , col2
from test_table
where 1=1
and registered_dt >= 20201012
;

select	*
from test_table
;
'''



result_columns = ['col1', 'col2']
df_query_result = query_execute_hive(query, result_columns)


You can run query in hive on python code.

 

 

 

 

 

 

from pyhive import hive

def query_execute_hive(query, result_cols):
    hive_connection = hive.Connection(
        host='host_name_url',
        port=10000 # port_number,
        username='hive_user_id',
        password='hive_user_password',
        database='default',
        auth='LDAP'
    )
    cursor = hive_connection.cursor()

    ...
    
    
    

The above code connects you to server.

 

 

 

 

 

from pyhive import hive

def query_execute_hive(query, result_cols):
    ...

    raw_query_list = query.split(';\n')
    raw_query_list = [str(x).strip() for x in raw_query_list]
    raw_query_list = list(filter(None, raw_query_list))
    query_list = [str(x) + ';' for x in raw_query_list]

    ...



query = '''
create temporary table test_table as
select	col1
        , col2
from test_table
where 1=1
and registered_dt >= 20201012
;

select	*
from test_table
;
'''

...


Above code will split your query based on semicolon and make a python list containing each query.

(The reason why I used filter function in list on 'list(filter(None, raw_query_list))' is to filtering any element I don't want to maintain as part of query. This is not necessary.)

 

 

 

 

 

 

 

 

 

from pyhive import hive

def query_execute_hive(query, result_cols):
    ...

    for q in query_list:
        cursor.execute(q[:-1])
    
    ...


Run queries one by one using for loop.

 

 

 

 

 

 

 

 

from pyhive import hive

def query_execute_hive(query, result_cols):
    ...

    try:
        output = cursor.fetchall()
    except:
        output = None

    df_output = pd.DataFrame(output, columns=result_cols)

    return df_output
    
    ...


Get query result using 'cursor.fetchall()'.

And make the result as DataFrame. Here, the result column list you put as a parameter('result_cols') will be column names of the result DataFrame.

 

 

 

 

728x90
반응형
Comments