Sometimes you would want to load maps which contain maps into Hive. I mean, this structure
map <string,map<string,string>>
Hive allows you this. In fact, it allows even deeper levels of mapping. However, the question is, how do you tell it where your inner maps end, since this is not one of the parameters in the LOAD DATA INPATH statement. Well, there is an undocumented default, and that is, '\004' and '\005' for inner maps.
Here is how your data has to be formatted (using an image, to show the non-ascii separators)
and this is how you define your table
CREATE EXTERNAL TABLE map_table
(
complex_map map <string, map<string,string>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\003'
MAP KEYS TERMINATED BY '\002'
STORED AS TextFile;
and load it as follows
LOAD DATA INPATH "/mydata" OVERWRITE INTO TABLE map_table;
and you can have the following queries:
hive> select * from map_table;
Result:
{"key1":{"innerkey11":"innervalue11","innerkey21":"innervalue21"},"key2":{"innerkey12":"innervalue12","innerkey22":"innervalue22"}}
also
hive> select complex_map["key1"] from map_table;
Result:
{"innerkey11":"innervalue11","innerkey21":"innervalue21"}
and even
hive> select complex_map["key1"]["innerkey11"] from map_table;
Result:
innervalue11
Now, it is true, this uses the default coding for inner-level maps for Hive load, and that's not documented, but it IS the coding which is unlikely to change.
Credits: this post talks about it, and it was brought to my attention by Steven Wong of Netflix.
No comments:
Post a Comment