ClickHouse Dictionary

Conceptually, a ClickHouse dictionary can be thought of as a key -> value structure. The source of a dictionary can be a local table, a local or remote file, an HTTP(S) resource, or another DBMS.

ClickHouse dictionaries:

Are held fully or partially in memory.
Can be refreshed at set intervals or dynamically.
Can be created through config.xml or with a DDL query.
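The refresh behaviour listed above can be observed and triggered by hand. The following is a minimal sketch, assuming a dictionary called dict_name already exists (the name is a placeholder, not something defined in this post):

```sql
-- Show the configured lifetime window and the time of the last successful refresh.
SELECT name, status, lifetime_min, lifetime_max, last_successful_update_time
FROM system.dictionaries
WHERE name = 'dict_name';

-- Force an immediate reload instead of waiting for the lifetime to expire.
SYSTEM RELOAD DICTIONARY dict_name;
```

Running the SELECT before and after the reload should show last_successful_update_time moving forward.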

Creating a Dictionary with DDL

This is the recommended method: nothing extra has to be written to a configuration file, and the dictionary can be queried like a table or a view.

CREATE DICTIONARY dict_name
(
    ... -- attributes
)
PRIMARY KEY ... -- complex or single key configuration
SOURCE(...) -- Source configuration
LAYOUT(...) -- Memory layout configuration
LIFETIME(...) -- Lifetime of dictionary in memory
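To make the skeleton above concrete, here is a minimal sketch with every clause filled in. The table country_codes, the dictionary country_dict, and their columns are hypothetical names chosen for illustration only:

```sql
-- Hypothetical source table holding the key -> value pairs.
CREATE TABLE country_codes
(
    id UInt64,
    code String,
    name String
)
ENGINE = MergeTree
ORDER BY id;

-- Dictionary over that table: single UInt64 key, kept fully in memory,
-- refreshed at a random moment between 300 and 600 seconds.
CREATE DICTIONARY country_dict
(
    id UInt64,
    code String DEFAULT '',
    name String DEFAULT ''
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'country_codes'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- The dictionary can be read like a table ...
SELECT * FROM country_dict LIMIT 5;

-- ... or through the dictGet function.
SELECT dictGet('country_dict', 'name', toUInt64(1));
```

The DEFAULT expressions define what dictGet returns for a key that is missing from the source.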

Creating a Dictionary with a Configuration File

If you are on ClickHouse Cloud, use DDL instead. On a self-managed server, the configuration file looks like this:

<clickhouse>
    <comment>An optional element with any content. Ignored by the ClickHouse server.</comment>

    <!--Optional element. File name with substitutions-->
    <include_from>/etc/metrika.xml</include_from>


    <dictionary>
        <!-- Dictionary configuration. -->
        <!-- There can be any number of dictionary sections in a configuration file. -->
    </dictionary>

</clickhouse>

How a Dictionary Is Stored in Memory

There are several ways to do this. The recommended layouts are flat, hashed, and complex_key_hashed. Cache-based layouts are generally not recommended because of potentially poor performance and the difficulty of choosing suitable parameters.

With XML:

<clickhouse>
    <dictionary>
        ...
        <layout>
            <layout_type>
                <!-- layout settings -->
            </layout_type>
        </layout>
        ...
    </dictionary>
</clickhouse>

With DDL:

CREATE DICTIONARY (...)
...
LAYOUT(LAYOUT_TYPE(param value)) -- layout settings
...

Layout Types (In-Memory Storage Methods)

flat: The dictionary contents are stored in memory as flat arrays. The key value is limited by max_array_size (default 500,000); if a larger key is found while the dictionary is being created, ClickHouse raises an error and the dictionary is not created. The initial size of the arrays is controlled by initial_array_size (default 1,024). The XML configuration looks like this:

...
<layout>
  <flat>
    <initial_array_size>50000</initial_array_size>
    <max_array_size>5000000</max_array_size>
  </flat>
</layout>
....

The equivalent DDL is shown below. The dictionary key is of type UInt64.

.....
LAYOUT(FLAT(INITIAL_ARRAY_SIZE 50000 MAX_ARRAY_SIZE 5000000))

hashed

The whole dictionary is stored in memory. There is no limit on the number of keys; the only limit is your memory capacity.

...
<layout>
  <hashed />
</layout>
...

or with DDL:

...
LAYOUT(HASHED())

When a hashed dictionary is large, setting the number of shards to a value greater than 1 in the layout configuration makes the dictionary load in parallel:

...
<layout>
  <hashed>
    <!-- If shards is greater than 1 (default is `1`) the dictionary will load
         data in parallel, useful if you have huge amount of elements in one
         dictionary. -->
    <shards>10</shards>

    <!-- Size of the backlog for blocks in parallel queue.

         Since the bottleneck in parallel loading is rehash, and so to avoid
         stalling because of thread is doing rehash, you need to have some
         backlog.

         10000 is good balance between memory and speed.
         Even for 10e10 elements and can handle all the load without starvation. -->
    <shard_load_queue_backlog>10000</shard_load_queue_backlog>

    <!-- Maximum load factor of the hash table, with greater values, the memory
         is utilized more efficiently (less memory is wasted) but read/performance
         may deteriorate.

         Valid values: [0.5, 0.99]
         Default: 0.5 -->
    <max_load_factor>0.5</max_load_factor>
  </hashed>
</layout>
....

or with DDL:

LAYOUT(HASHED([SHARDS 1] [SHARD_LOAD_QUEUE_BACKLOG 10000] [MAX_LOAD_FACTOR 0.5]))

MAX_LOAD_FACTOR trades memory efficiency against lookup speed. A higher value uses memory more efficiently (less is wasted) but read performance may deteriorate; a lower value wastes more memory but keeps lookups fast.
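The effect of these settings can be checked after loading. Below is a sketch, assuming a large source table named big_lookup with columns id and value already exists (both the table and the dictionary name big_lookup_dict are hypothetical), and assuming a ClickHouse version recent enough to accept MAX_LOAD_FACTOR for the hashed layout:

```sql
-- Sharded hashed dictionary with a denser hash table than the default.
CREATE DICTIONARY big_lookup_dict
(
    id UInt64,
    value String DEFAULT ''
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'big_lookup'))
LAYOUT(HASHED(SHARDS 8 MAX_LOAD_FACTOR 0.9))
LIFETIME(MIN 600 MAX 900);

-- Dictionaries are loaded lazily by default, so force the load first.
SYSTEM RELOAD DICTIONARY big_lookup_dict;

-- Inspect how many elements were loaded, how full the hash table is,
-- how much memory it takes, and how long the load took.
SELECT
    name,
    element_count,
    load_factor,
    formatReadableSize(bytes_allocated) AS memory,
    loading_duration
FROM system.dictionaries
WHERE name = 'big_lookup_dict';
```

Recreating the dictionary with different SHARDS and MAX_LOAD_FACTOR values and comparing memory and loading_duration is a simple way to pick values for a given data set.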

sparse_hashed, complex_key_hashed, and complex_key_sparse_hashed work similarly to hashed. The sparse variants use less memory at the cost of more CPU, and the complex_key variants accept composite keys.

hashed_array: with this layout the whole dictionary is also kept in memory. Each attribute is stored in an array.

XML configuration:

...
<layout>
  <hashed_array>
  </hashed_array>
</layout>
....

With DDL:

....
LAYOUT(HASHED_ARRAY([SHARDS 1]))

range_hashed

This is the hashed method described above, extended so that each key can carry a range (typically a date range). The range_min and range_max elements define the bounds of the range:

...
<layout>
    <range_hashed>
        <!-- Strategy for overlapping ranges (min/max). Default: min (return a matching range with the min(range_min -> range_max) value) -->
        <range_lookup_strategy>min</range_lookup_strategy>
    </range_hashed>
</layout>
<structure>
    <id>
        <name>advertiser_id</name>
    </id>
    <range_min>
        <name>discount_start_date</name>
        <type>Date</type>
    </range_min>
    <range_max>
        <name>discount_end_date</name>
        <type>Date</type>
    </range_max>
    ...

With DDL:

CREATE DICTIONARY discounts_dict (
    advertiser_id UInt64,
    discount_start_date Date,
    discount_end_date Date,
    amount Float64
)
PRIMARY KEY advertiser_id
SOURCE(CLICKHOUSE(TABLE 'discounts'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(RANGE_HASHED(range_lookup_strategy 'max'))
RANGE(MIN discount_start_date MAX discount_end_date)

These dictionaries can be used inside any query through the dictGet function. For a range_hashed dictionary, pass the key together with the value to match against the range:

dictGet('dict_name', 'attr_name', id, date)

Example:

SELECT dictGet('discounts_dict', 'amount', 1, '2022-10-20'::Date);
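To try the lookup above end to end, the source table needs some rows. The following is a sketch under assumptions: the table definition and the two sample rows are made up for illustration, with column names chosen to match discounts_dict. The table must exist before the dictionary is first loaded.

```sql
-- Hypothetical source table matching the attributes of discounts_dict.
CREATE TABLE discounts
(
    advertiser_id UInt64,
    discount_start_date Date,
    discount_end_date Date,
    amount Float64
)
ENGINE = MergeTree
ORDER BY advertiser_id;

-- Illustrative rows: one discount period per advertiser.
INSERT INTO discounts VALUES (1, '2022-10-01', '2022-10-31', 0.1);
INSERT INTO discounts VALUES (2, '2022-11-01', '2022-11-30', 0.2);

-- Pick up the new rows without waiting for the lifetime to expire.
SYSTEM RELOAD DICTIONARY discounts_dict;

-- Date inside advertiser 1's range: the stored amount is returned.
SELECT dictGet('discounts_dict', 'amount', 1, toDate('2022-10-20')) AS in_range;

-- Date outside every range for advertiser 1: the attribute's default is returned.
SELECT dictGet('discounts_dict', 'amount', 1, toDate('2022-12-01')) AS out_of_range;
```

When several ranges for the same key contain the date, range_lookup_strategy decides which one is used.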

Below is an example of an internal dictionary, followed by an example of how it can be used in place of a JOIN.

CREATE TABLE test.taxi_zone ON CLUSTER mycluster
(
  `LocationID` UInt16 DEFAULT 0,
  `Borough` String,
  `Zone` String,
  `service_zone` String
)
ENGINE = ReplicatedMergeTree
PRIMARY KEY LocationID;

CREATE DICTIONARY taxi_zone_dictionary3 ON CLUSTER mycluster
(
  `LocationID` UInt16 DEFAULT 0,
  `Borough` String,
  `Zone` String,
  `service_zone` String
)
PRIMARY KEY LocationID
SOURCE(CLICKHOUSE(TABLE 'taxi_zone'))
LIFETIME(MIN 1 MAX 100)
LAYOUT(HASHED_ARRAY());



-- Query using a JOIN


SELECT
    count(1) AS total,
    Borough
FROM test.trips
INNER JOIN test.taxi_zone_dictionary3 ON toUInt64(trips.pickup_nyct2010_gid) = taxi_zone_dictionary3.LocationID
WHERE (dropoff_nyct2010_gid = 132) OR (dropoff_nyct2010_gid = 138)
GROUP BY Borough
ORDER BY total DESC

Query id: 4084660a-936c-44cd-b291-0b6b23f93cab

┌─total─┬─Borough───────┐
│  7053 │ Manhattan     │
│  6828 │ Brooklyn      │
│  4458 │ Queens        │
│  2670 │ Bronx         │
│   554 │ Staten Island │
│    53 │ EWR           │
└───────┴───────────────┘

6 rows in set. Elapsed: 0.013 sec. Processed 2.00 million rows, 4.01 MB (153.21 million rows/s., 306.87 MB/s.)
Peak memory usage: 5.28 MiB.



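-- Query using the dictionary function.
-- Unlike the INNER JOIN above, dictGetOrDefault keeps the rows whose key is
-- missing from the dictionary and labels them 'Unknown', hence the extra row.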

SELECT
    count(1) AS total,
    dictGetOrDefault('test.taxi_zone_dictionary3', 'Borough', toUInt64(pickup_nyct2010_gid), 'Unknown') AS borough_name
FROM test.trips
WHERE (dropoff_nyct2010_gid = 132) OR (dropoff_nyct2010_gid = 138)
GROUP BY borough_name
ORDER BY total DESC

Query id: 7a978508-e5b1-4634-a070-d52deef7f205

┌─total─┬─borough_name──┐
│ 23683 │ Unknown       │
│  7053 │ Manhattan     │
│  6828 │ Brooklyn      │
│  4458 │ Queens        │
│  2670 │ Bronx         │
│   554 │ Staten Island │
│    53 │ EWR           │
└───────┴───────────────┘

7 rows in set. Elapsed: 0.009 sec. Processed 2.00 million rows, 4.00 MB (212.43 million rows/s., 424.85 MB/s.)
Peak memory usage: 5.28 MiB.
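The two queries above do not return the same rows, because the INNER JOIN drops trips whose pickup key has no match. A sketch of a dictionary-based query that should mirror the JOIN, assuming LocationID is unique in the source table, filters with dictHas:

```sql
SELECT
    count(1) AS total,
    dictGet('test.taxi_zone_dictionary3', 'Borough', toUInt64(pickup_nyct2010_gid)) AS borough_name
FROM test.trips
WHERE ((dropoff_nyct2010_gid = 132) OR (dropoff_nyct2010_gid = 138))
  -- Keep only the trips whose pickup key exists in the dictionary,
  -- which is what the INNER JOIN does implicitly.
  AND dictHas('test.taxi_zone_dictionary3', toUInt64(pickup_nyct2010_gid))
GROUP BY borough_name
ORDER BY total DESC;
```

If the 'Unknown' bucket is wanted in the report, the dictGetOrDefault form is the better fit; with a JOIN the same result would need a LEFT JOIN plus explicit handling of the unmatched rows.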
