2024-04-26

Trying Meta's "Llama 3" with llama-cpp-python and LocalAI, which provide OpenAI API-compatible servers

What is this post for?

Meta has released Llama 3.

Meta、無料で商用可の新LLM「Llama 3」、ほぼすべてのクラウドでアクセス可能に - ITmedia NEWS

It looks like Llama 3 can run on llama-cpp-python and LocalAI, both of which ship an OpenAI API-compatible server, so I decided to try it.

Llama 3

Llama 3 is an LLM published by Meta.

Meta Llama 3

Introducing Meta Llama 3: The most capable openly available LLM to date

It comes in two sizes, 8B and 70B parameters, and each size is available as a base model and as an instruction-tuned model.

I want to use this model with llama-cpp-python and LocalAI.

First, llama.cpp already supports it.

Added llama-3 chat template by DifferentialityDevelopment · Pull Request #6751 · ggerganov/llama.cpp · GitHub

llama-cpp-python supports it as well.

Add Llama-3 chat format by andreabak · Pull Request #1371 · abetlen/llama-cpp-python · GitHub

For LocalAI, it looks like supplying a prompt template is enough.

How to run llama3? · mudler LocalAI · Discussion #2076 · GitHub

So, let's try it.

Rather than the original models, this time I will use the following quantized model in GGUF format.

QuantFactory/Meta-Llama-3-8B-Instruct-GGUF · Hugging Face

Environment

Here is the environment used this time.

$ python3 --version
Python 3.10.12


$ pip3 --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

Download the model

Download the model from here.

QuantFactory/Meta-Llama-3-8B-Instruct-GGUF · Hugging Face

The model is about 5 GB.

$ curl -L 'https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf?download=true' -o Meta-Llama-3-8B-Instruct.Q4_K_M.gguf


$ ll -h Meta-Llama-3-8B-Instruct.Q4_K_M.gguf
-rw-rw-r-- 1 xxxxx xxxxx 4.6G  4月 25 00:15 Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

Trying it with llama-cpp-python

First, let's try llama-cpp-python.

Install it.

$ pip3 install 'llama-cpp-python[server]'

Versions, including dependencies.

$ pip3 list
Package           Version
----------------- -------
annotated-types   0.6.0
anyio             4.3.0
click             8.1.7
diskcache         5.6.3
exceptiongroup    1.2.1
fastapi           0.110.2
h11               0.14.0
idna              3.7
Jinja2            3.1.3
llama_cpp_python  0.2.64
MarkupSafe        2.1.5
numpy             1.26.4
pip               22.0.2
pydantic          2.7.1
pydantic_core     2.18.2
pydantic-settings 2.2.1
python-dotenv     1.0.1
PyYAML            6.0.1
setuptools        59.6.0
sniffio           1.3.1
sse-starlette     2.1.0
starlette         0.37.2
starlette-context 0.3.6
typing_extensions 4.11.0
uvicorn           0.29.0

Start the server. The --chat_format llama-3 option is required.

$ python3 -m llama_cpp.server --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --chat_format llama-3

Let's run it. I'll ask the model to introduce itself.

$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8000/v1/chat/completions -d \
    '{"messages": [{"role": "user", "content": "Could you introduce yourself?"}]}' | jq
{
  "id": "chatcmpl-ff1221b5-5555-4a32-9c1b-c3c2818efc02",
  "object": "chat.completion",
  "created": 1713972674,
  "model": "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "I'd be happy to introduce myself.\n\nI am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but rather a computer program designed to simulate conversation and answer questions to the best of my ability based on the knowledge and data I've been trained on.\n\nI'm constantly learning and improving my responses based on user interactions, so please bear with me if I don't always get it right at first. My goal is to assist and provide helpful information to those who interact with me, while also making our conversation as engaging and natural as possible.\n\nWhat would you like to talk about or ask?",
        "role": "assistant"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 138,
    "total_tokens": 154
  }
}

real    1m9.754s
user    0m0.051s
sys     0m0.003s
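The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000. The helper names (build_chat_request, extract_reply, ask) are my own and are not part of llama-cpp-python.

```python
import json
import urllib.request


def build_chat_request(base_url, content, model=None):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {"messages": [{"role": "user", "content": content}]}
    if model is not None:
        # LocalAI selects the model by name; llama-cpp-python serves a single model.
        payload["model"] = model
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a parsed chat.completion response."""
    return response_body["choices"][0]["message"]["content"]


def ask(base_url, content, model=None, timeout=600):
    """Send one user message and return the reply (needs a running server)."""
    request = build_chat_request(base_url, content, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Example call, assuming the server from this post is running:
# print(ask("http://localhost:8000", "Could you introduce yourself?"))
```

The generous timeout is deliberate, since a single reply took over a minute in my environment.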

I also tried a Japanese prompt, but on top of taking considerably longer, the reply did not come back in Japanese...

$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8000/v1/chat/completions -d \
    '{"messages": [{"role": "user", "content": "あなたの自己紹介をしてください"}]}' | jq
{
  "id": "chatcmpl-cd219a91-a85c-4ceb-ae8c-4b49a28cd881",
  "object": "chat.completion",
  "created": 1713972772,
  "model": "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
  "choices": [
    {
      "index": 0,
      "message": {
        "content": "Nice to meet you! 😊\n\nMy name is LLaMA, and I'm a large language model AI trained by Meta AI that can understand and respond to human input in a conversational manner. My primary function is to assist users with information queries, provide helpful responses, and even engage in creative conversations.\n\nHere are some interesting facts about me:\n\n1. **Language skills**: I'm fluent in multiple languages, including English, Japanese, Spanish, French, German, Italian, Chinese, and many more! 🌎\n2. **Knowledge base**: My training data consists of a massive corpus of text from the internet, which allows me to provide accurate answers to a wide range of questions.\n3. **Conversational abilities**: I can understand natural language processing (NLP) and respond accordingly, making it feel like you're having a conversation with a human! 💬\n4. **Creative capabilities**: I can generate text, poetry, stories, dialogues, and even entire scripts!\n5. **Continuous learning**: My training is ongoing, so I'm always improving my understanding of language and updating my knowledge base.\n\nI'm here to help answer your questions, provide information, or simply chat about any topic you're interested in! What would you like to talk about? 🤔",
        "role": "assistant"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 262,
    "total_tokens": 281
  }
}

real    2m42.950s
user    0m0.042s
sys     0m0.018s

The model does seem to understand the question, though.

Let's stick to English when using Llama 3.
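As a rough sanity check on the two runs above, the usage and time outputs can be turned into throughput. This is only a back-of-the-envelope sketch: it divides completion_tokens by the wall-clock "real" time, which also includes prompt processing, so the numbers are approximate.

```python
def tokens_per_second(completion_tokens, wall_seconds):
    """Approximate generation throughput from a response's usage and wall time."""
    return completion_tokens / wall_seconds


# Values copied from the two responses and `time` outputs above.
english = tokens_per_second(138, 1 * 60 + 9.754)
japanese = tokens_per_second(262, 2 * 60 + 42.950)

print(f"English prompt:  {english:.2f} tokens/s")
print(f"Japanese prompt: {japanese:.2f} tokens/s")
```

Both runs land somewhere between 1.5 and 2 tokens per second, so much of the extra time in the Japanese run seems to come from the longer answer (262 tokens versus 138) rather than from the prompt language alone.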

By the way, --chat_format llama-3 refers to the chat format defined here.

https://github.com/abetlen/llama-cpp-python/blob/v0.2.64/llama_cpp/llama_chat_format.py#L929-L946

As for these tokens,

    _roles = dict(
        system="<|start_header_id|>system<|end_header_id|>\n\n",
        user="<|start_header_id|>user<|end_header_id|>\n\n",
        assistant="<|start_header_id|>assistant<|end_header_id|>\n\n",
    )
    _begin_token = "<|begin_of_text|>"
    _sep = "<|eot_id|>"

they are described here.

Meta Llama 3 | Model Cards and Prompt formats
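To make the role of these tokens concrete, here is a simplified sketch of how a message list might be flattened into a single Llama 3 prompt string. It is my own illustration based on the token values quoted above, not the actual code in llama_chat_format.py, and the function name render_llama3_prompt is hypothetical.

```python
# Token values copied from the snippet quoted above.
ROLE_HEADERS = {
    "system": "<|start_header_id|>system<|end_header_id|>\n\n",
    "user": "<|start_header_id|>user<|end_header_id|>\n\n",
    "assistant": "<|start_header_id|>assistant<|end_header_id|>\n\n",
}
BEGIN_TOKEN = "<|begin_of_text|>"
SEPARATOR = "<|eot_id|>"


def render_llama3_prompt(messages):
    """Flatten OpenAI-style messages into one prompt string (illustrative only)."""
    prompt = BEGIN_TOKEN
    for message in messages:
        # Each turn is: role header, content, end-of-turn token.
        prompt += ROLE_HEADERS[message["role"]] + message["content"] + SEPARATOR
    # End with an open assistant header so the model writes the next turn.
    return prompt + ROLE_HEADERS["assistant"]


print(render_llama3_prompt([{"role": "user", "content": "Could you introduce yourself?"}]))
```

Without the right chat format, the server would wrap messages in different markers than the ones the instruction-tuned model was trained on, which is presumably why the option matters.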

Trying it with LocalAI

Next, let's try LocalAI.

Download the binary.

$ curl -LO https://github.com/mudler/LocalAI/releases/download/v2.12.4/local-ai-avx2-Linux-x86_64
$ chmod a+x local-ai-avx2-Linux-x86_64
$ ./local-ai-avx2-Linux-x86_64 --version
LocalAI version v2.12.4 (0004ec8be3ca150ce6d8b79f2991bfe3a9dc65ad)

Place the quantized Llama 3 model in the models directory.

$ tree models
models
└── Meta-Llama-3-8B-Instruct.Q4_K_M.gguf

0 directories, 1 file

Prepare a configuration file.

local-ai-config.yaml

- name: llama-3-8b-instruct
  backend: llama-cpp
  mmap: true
  context_size: 8192
  f16: true
  stopwords:
    - <|im_end|>
    - <dummy32000>
    - "<|eot_id|>"
  parameters:
    model: Meta-Llama-3-8B-Instruct.Q4_K_M.gguf
  template:
    chat_message: |
      <|start_header_id|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}<|end_header_id|>

      {{ if .FunctionCall -}}
      Function call:
      {{ else if eq .RoleName "tool" -}}
      Function response:
      {{ end -}}
      {{ if .Content -}}
      {{.Content -}}
      {{ else if .FunctionCall -}}
      {{ toJson .FunctionCall -}}
      {{ end -}}
      <|eot_id|>
    function: |
      <|start_header_id|>system<|end_header_id|>

      You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
      <tools>
      {{range .Functions}}
      {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
      {{end}}
      </tools>
      Use the following pydantic model json schema for each tool call you will make:
        {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
        Function call:
    chat: |
      <|begin_of_text|>{{.Input }}
      <|start_header_id|>assistant<|end_header_id|>
    completion: |
      {{.Input}}
    usage: |
      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "llama-3-8b-instruct",
          "messages": [{"role": "user", "content": "How are you doing?"}],
          "temperature": 0.1
      }'

The configuration is mostly a prompt template for Llama 3, which I wrote with reference to the following.

How to run llama3? · mudler LocalAI · Discussion #2076 · GitHub

models(llama3): add llama3 to embedded models by mudler · Pull Request #2074 · mudler/LocalAI · GitHub

https://github.com/mudler/LocalAI/blob/48d0aa2f6da0b1c039fa062e61facf5e6191420e/embedded/models/llama3-instruct.yaml

Start the server.

$ ./local-ai-avx2-Linux-x86_64 --config-file local-ai-config.yaml --models-path models --threads 4

Check that it works.

$ time curl -s -XPOST -H 'Content-Type: application/json' localhost:8080/v1/chat/completions -d \
    '{"model": "llama-3-8b-instruct", "messages": [{"role": "user", "content": "Could you introduce yourself?"}]}' | jq
{
  "created": 1714056935,
  "object": "chat.completion",
  "id": "059aff63-0d6a-4d29-ba9c-02b4a467f03a",
  "model": "llama-3-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I'd be happy to introduce myself!\n\nI'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but I'm designed to simulate conversation and answer questions to the best of my ability. I can provide information on a wide range of topics, and I'm constantly learning and improving my responses.\n\nI don't have personal experiences or emotions like humans do, but I'm here to help you with any questions or topics you'd like to discuss. I'm happy to chat and provide information to the best of my ability."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

real    0m46.588s
user    0m0.043s
sys     0m0.006s

The initial model load took about three minutes, though...

11:55PM INF Trying to load the model 'Meta-Llama-3-8B-Instruct.Q4_K_M.gguf' with all the available backends: llama-cpp, llama-ggml, gpt4all, bert-embeddings, rwkv, whisper, stablediffusion, tinydream, piper
11:55PM INF [llama-cpp] Attempting to load
11:55PM INF Loading model 'Meta-Llama-3-8B-Instruct.Q4_K_M.gguf' with backend llama-cpp
11:58PM INF [llama-cpp] Loads OK

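I will skip the Japanese check here. I did try it, but it was slow and the reply again came back in English...

That's about it.

Conclusion

I tried Meta's LLM, Llama 3, with llama-cpp-python and LocalAI.

The smallest model is 8B, slightly larger than Llama 2's smallest, but as I half expected, it worked with little fuss, which was nice.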

2024-04-06

Creating fake data with Faker

Python

What is this post for?

While reading the Qdrant documentation, I learned about Faker, a library that generates fake data, so I am giving it a quick try.

Faker

Here are the Faker website and GitHub repository.

Welcome to Faker’s documentation! — Faker 24.4.0 documentation

GitHub - joke2k/faker: Faker is a Python package that generates fake data for you.

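Faker is a library for creating fake data. Its intended uses include setting up a database, performance testing, and creating anonymized data.

This post covers the Python Faker, but implementations seem to exist in several languages.

The repository for the PHP Faker has been archived.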

GitHub - fzaninotto/Faker: Faker is a PHP library that generates fake data for you

From here on, I look at the Python version of Faker.

Usage looks like this: create a Faker instance, then call the method that corresponds to the fake data you want.

from faker import Faker
fake = Faker()

fake.name()
# 'Lucy Cechtelar'

fake.address()
# '426 Jordy Lodge
#  Cartwrightshire, SC 88120-6700'

fake.text()
# 'Sint velit eveniet. Rerum atque repellat voluptatem quia rerum. Numquam excepturi
#  beatae sint laudantium consequatur. Magni occaecati itaque sint et sit tempore. Nesciunt
#  amet quidem. Iusto deleniti cum autem ad quia aperiam.
#  A consectetur quos aliquam. In iste aliquid et aut similique suscipit. Consequatur qui
#  quaerat iste minus hic expedita. Consequuntur error magni et laboriosam. Aut aspernatur
#  voluptatem sit aliquam. Dolores voluptatum est.
#  Aut molestias et maxime. Fugit autem facilis quos vero. Eius quibusdam possimus est.
#  Ea quaerat et quisquam. Deleniti sunt quam. Adipisci consequatur id in occaecati.
#  Et sint et. Ut ducimus quod nemo ab voluptatum.'

These methods (name, address, and text in the example above) are called "fakes", and many of the fakes are packaged as providers.

Here is the list of providers.

Standard Providers — Faker 24.4.0 documentation

There are community-maintained providers as well.

Community Providers — Faker 24.4.0 documentation

You can also write your own providers or customize existing ones.

Localization is supported too. With ja_JP, it looks like Japanese fake data can be generated.

from faker import Faker
fake = Faker(['it_IT', 'en_US', 'ja_JP'])
for _ in range(10):
    print(fake.name())

# 鈴木 陽一
# Leslie Moreno
# Emma Williams
# 渡辺 裕美子
# Marcantonio Galuppi
# Martha Davis
# Kristen Turner
# 中津川 春香
# Ashley Castillo
# 山田 桃子

Localization

It can also be used as a command-line tool.

Command line usage

That's enough introduction; let's start using it.

Environment

Here is the environment used this time.

$ python3 --version
Python 3.10.12


$ pip3 --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)

Using Faker

Now, let's use Faker.

First, install it.

$ pip3 install Faker

The current version is 24.4.0.

$ pip3 list
Package         Version
--------------- -----------
Faker           24.4.0
pip             22.0.2
python-dateutil 2.9.0.post0
setuptools      59.6.0
six             1.16.0

Using it as a command-line tool

Let's start by using it as a command-line tool.

Version.

$ faker --version
faker 24.4.0

Help.

$ faker --help
usage: faker [-h] [--version] [-v] [-o output] [-l LOCALE] [-r REPEAT] [-s SEP] [--seed SEED] [-i [INCLUDE ...]] [fake] [fake argument ...]

faker version 24.4.0

positional arguments:
  fake                  name of the fake to generate output for (e.g. profile)
  fake argument         optional arguments to pass to the fake (e.g. the profile fake takes an optional list of comma separated field names as the first argument)

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         show INFO logging events instead of CRITICAL, which is the default. These logging events provide insight into localization of specific providers.
  -o output             redirect output to a file
  -l LOCALE, --lang LOCALE
                        specify the language for a localized provider (e.g. de_DE)
  -r REPEAT, --repeat REPEAT
                        generate the specified number of outputs
  -s SEP, --sep SEP     use the specified separator after each output
  --seed SEED           specify a seed for the random generator so that results are repeatable. Also compatible with 'repeat' option
  -i [INCLUDE ...], --include [INCLUDE ...]
                        list of additional custom providers to user, given as the import path of the module containing your Provider class (not the provider class itself)

supported locales:

  ar_AA, ar_AE, ar_BH, ar_EG, ar_JO, ar_PS, ar_SA, az_AZ, bg_BG, bn_BD, bs_BA, cs_CZ, da_DK, de, de_AT, de_CH, de_DE, dk_DK, el_CY, el_GR, en, en_AU, en_BD, en_CA, en_GB, en_IE, en_IN, en_NZ, en_PH, en_TH, en_US, es, es_AR, es_CA, es_CL, es_CO, es_ES, es_MX, et_EE, fa_IR, fi_FI, fil_PH, fr_BE, fr_CA, fr_CH, fr_FR, fr_QC, ga_IE, he_IL, hi_IN, hr_HR, hu_HU, hy_AM, id_ID, it_CH, it_IT, ja_JP, ka_GE, ko_KR, la, lb_LU, lt_LT, lv_LV, mt_MT, ne_NP, nl_BE, nl_NL, no_NO, or_IN, pl_PL, pt_BR, pt_PT, ro_RO, ru_RU, sk_SK, sl_SI, sq_AL, sv_SE, ta_IN, th, th_TH, tl_PH, tr_TR, tw_GH, uk_UA, vi_VN, zh_CN, zh_TW, zu_ZA

  Faker can take a locale as an optional argument, to return localized data. If
  no locale argument is specified, the factory falls back to the user's OS
  locale as long as it is supported by at least one of the providers.
     - for this user, the default locale is ja_JP.

  If the optional argument locale and/or user's default locale is not available
  for the specified provider, the factory falls back to faker's default locale,
  which is en_US.

examples:

  $ faker address
  968 Bahringer Garden Apt. 722
  Kristinaland, NJ 09890

  $ faker -l de_DE address
  Samira-Niemeier-Allee 56
  94812 Biedenkopf

  $ faker profile ssn,birthdate
  {'ssn': u'628-10-1085', 'birthdate': '2008-03-29'}

  $ faker -r=3 -s=";" name
  Willam Kertzmann;
  Josiah Maggio;
  Gayla Schmitt;

The help output also lists the supported locales.

supported locales:

  ar_AA, ar_AE, ar_BH, ar_EG, ar_JO, ar_PS, ar_SA, az_AZ, bg_BG, bn_BD, bs_BA, cs_CZ, da_DK, de, de_AT, de_CH, de_DE, dk_DK, el_CY, el_GR, en, en_AU, en_BD, en_CA, en_GB, en_IE, en_IN, en_NZ, en_PH, en_TH, en_US, es, es_AR, es_CA, es_CL, es_CO, es_ES, es_MX, et_EE, fa_IR, fi_FI, fil_PH, fr_BE, fr_CA, fr_CH, fr_FR, fr_QC, ga_IE, he_IL, hi_IN, hr_HR, hu_HU, hy_AM, id_ID, it_CH, it_IT, ja_JP, ka_GE, ko_KR, la, lb_LU, lt_LT, lv_LV, mt_MT, ne_NP, nl_BE, nl_NL, no_NO, or_IN, pl_PL, pt_BR, pt_PT, ro_RO, ru_RU, sk_SK, sl_SI, sq_AL, sv_SE, ta_IN, th, th_TH, tl_PH, tr_TR, tw_GH, uk_UA, vi_VN, zh_CN, zh_TW, zu_ZA

For example, let's try name.

$ faker name
木村 陽一

Japanese came out right away...

Looking closely, the help says that when no locale is specified, the OS locale is used.

  Faker can take a locale as an optional argument, to return localized data. If
  no locale argument is specified, the factory falls back to the user's OS
  locale as long as it is supported by at least one of the providers.
     - for this user, the default locale is ja_JP.

To specify the locale explicitly, use -l or --lang.

$ faker -l ja_JP name
高橋 聡太郎


$ faker --lang en_US name
Benjamin Shea

For the rest of this post, I will specify it explicitly.

With -r, the fake is generated the specified number of times.

$ faker -l ja_JP -r 3 address
大阪府大島町上高野10丁目8番13号 パレス湯本塩原922

長野県長生郡白子町四番町6丁目21番6号

佐賀県羽村市北上野36丁目12番3号

-s specifies the separator. The blank lines bothered me, so I set it to an empty string.

$ faker -l ja_JP -r 3 -s '' address
京都府我孫子市東神田25丁目13番17号
青森県大田区押上12丁目22番17号
島根県荒川区北上野25丁目25番4号 上吉羽パレス925

-o writes the output to a file.

$ faker -l ja_JP -r 10 -s '' -o isbn13-10.txt  isbn13

Result.

isbn13-10.txt

978-0-7743-8714-9
978-1-72817-220-0
978-1-61194-154-8
978-0-9625246-4-6
978-0-344-76991-7
978-0-252-38021-1
978-0-9788212-0-3
978-1-66603-286-4
978-1-9815-2440-2
978-1-103-64848-1

Depending on the provider, the output can look like this.

$ faker -l ja_JP profile
{'job': '公務員', 'company': '有限会社長谷川保険', 'ssn': '871-33-9808', 'residence': '新潟県八千代市六番町2丁目1番7号 柿木沢新田パーク717', 'current_location': (Decimal('-62.3285515'), Decimal('-22.976681')), 'blood_group': 'O-', 'website': ['https://www.nakamura.com/', 'https://ito.jp/', 'http://nakamura.com/'], 'username': 'fyamaguchi', 'name': '吉田 零', 'sex': 'M', 'address': '長野県青梅市高輪36丁目19番18号 パーク橋場885', 'mail': 'momokokondo@gmail.com', 'birthdate': datetime.date(2018, 6, 29)}

In this case, you can also restrict the output to the fields you specify.

$ faker -l ja_JP profile job,company
{'job': '司法書士', 'company': '合同会社松田保険'}

Also, for some providers, passing the provider name itself does not work.

## This fails
$ faker -l ja_JP person
Traceback (most recent call last):
  File "/path/to/site-packages/faker/generator.py", line 92, in get_formatter
    return getattr(self, formatter)
AttributeError: 'Generator' object has no attribute 'person'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/site-packages/faker/cli.py", line 97, in print_doc
    print(fake.format(provider_or_field, *args), end="", file=output)
  File "/path/to/site-packages/faker/generator.py", line 88, in format
    return self.get_formatter(formatter)(*args, **kwargs)
  File "/path/to/site-packages/faker/generator.py", line 98, in get_formatter
    raise AttributeError(msg)
AttributeError: Unknown formatter 'person' with locale 'ja_JP'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/bin/faker", line 8, in <module>
    sys.exit(execute_from_command_line())
  File "/path/to/site-packages/faker/cli.py", line 291, in execute_from_command_line
    command.execute()
  File "/path/to/site-packages/faker/cli.py", line 265, in execute
    print_doc(
  File "/path/to/site-packages/faker/cli.py", line 99, in print_doc
    raise ValueError(f'No faker found for "{provider_or_field}({args})"')
ValueError: No faker found for "person([])"


## This works
$ faker -l ja_JP name
佐藤 涼平

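This is because what you specify on the command line is a fake, not a provider.

name is a fake that belongs to the person provider. Provider names and fake names do not necessarily match, so rather than the list of providers, it is better to look at the fakes inside each provider.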

faker.providers.person — Faker 24.6.0 documentation
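The traceback above suggests that the generator looks a fake up by attribute name and fails when nothing with that name exists. The following toy model, with hypothetical class names of my own (ToyPersonProvider, ToyGenerator), illustrates that idea using only the standard library. It is a sketch of the lookup behavior, not Faker's actual implementation.

```python
import random


class ToyPersonProvider:
    """Hypothetical stand-in for a provider; its public methods become fakes."""

    NAMES = ("Sato", "Suzuki", "Takahashi")

    def name(self):
        return random.choice(self.NAMES)


class ToyGenerator:
    """Toy generator that exposes each provider method under the method's name."""

    def add_provider(self, provider):
        for attr_name in dir(provider):
            if attr_name.startswith("_"):
                continue
            attr = getattr(provider, attr_name)
            if callable(attr):
                # Registered under the method name, not the provider name.
                setattr(self, attr_name, attr)

    def format(self, fake_name):
        try:
            fake = getattr(self, fake_name)
        except AttributeError:
            raise AttributeError(f"Unknown formatter {fake_name!r}") from None
        return fake()


generator = ToyGenerator()
generator.add_provider(ToyPersonProvider())

print(generator.format("name"))  # works: "name" is a registered fake

try:
    generator.format("person")  # fails: "person" is only the provider's name
except AttributeError as error:
    print(error)
```

In this model the provider is just a container, so only the method names are ever visible to the lookup.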

That's enough about command-line usage.

Using it from a program

Next, let's use it from a program.

I wrote the following.

app.py

from faker import Faker

faker = Faker("ja_JP")

header = ["name", "address", "phone-number", "job"]

print(" | ".join(header))

for _ in range(10):
    print(
        " | ".join([faker.name(), faker.address(), faker.phone_number(), faker.job()])
    )

Execution result.

$ python3 app.py
name | address | phone-number | job
高橋 修平 | 島根県四街道市神明内1丁目7番9号 元浅草コーポ173 | 090-6292-3048 | 音楽家
小林 加奈 | 山口県八王子市上広谷14丁目23番3号 | 73-4556-0885 | 裁判官
山田 浩 | 栃木県稲城市羽折町22丁目19番6号 シャルム西浅草447 | 080-4384-5952 | コピーライター
斎藤 あすか | 大阪府武蔵村山市北上野16丁目1番20号 東浅草アーバン995 | 070-2098-5528 | 電気工事士
池田 篤司 | 宮城県山武郡芝山町箭坪24丁目5番5号 コート戸島217 | 080-9504-4444 | 医師
橋本 千代 | 群馬県山武郡横芝光町上高野35丁目26番8号 上高野コート456 | 65-8558-5395 | モデル
吉田 太一 | 熊本県香取市蟇沼24丁目16番5号 | 080-1606-0085 | 公務員
渡辺 千代 | 長崎県日野市中小来川8丁目3番17号 クレスト一ツ橋795 | 070-7744-8111 | 気象予報士
山田 和也 | 青森県武蔵村山市蔵前26丁目26番16号 | 66-1338-0212 | ゲームクリエイター
高橋 康弘 | 高知県横浜市鶴見区天神島2丁目27番19号 | 18-2240-9622 | 救急救命士

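That gives a rough feel for it.

Beyond this, reading the provider documentation should be enough.

Bonus

I was a bit curious, so let's look at the implementation.

Providers appear to live in per-locale directories.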

https://github.com/joke2k/faker/blob/v24.4.0/faker/providers/person/ja_JP/__init__.py

Reading the source code should show what kind of data can be generated.

    first_name_female_pairs = (
        ("明美", "アケミ", "Akemi"),
        ("あすか", "アスカ", "Asuka"),
        ("香織", "カオリ", "Kaori"),
        ("加奈", "カナ", "Kana"),
        ("くみ子", "クミコ", "Kumiko"),
        ("さゆり", "サユリ", "Sayuri"),
        ("知実", "サトミ", "Satomi"),
        ("千代", "チヨ", "Chiyo"),
        ("直子", "ナオコ", "Naoko"),
        ("七夏", "ナナミ", "Nanami"),
        ("花子", "ハナコ", "Hanako"),
        ("春香", "ハルカ", "Haruka"),
        ("真綾", "マアヤ", "Maaya"),
        ("舞", "マイ", "Mai"),
        ("美加子", "ミカコ", "Mikako"),
        ("幹", "ミキ", "Miki"),
        ("桃子", "モモコ", "Momoko"),
        ("結衣", "ユイ", "Yui"),
        ("裕美子", "ユミコ", "Yumiko"),
        ("陽子", "ヨウコ", "Yoko"),
        ("里佳", "リカ", "Rika"),
    )

    # for backwards compatibility
    first_names_female = tuple(map(itemgetter(0), first_name_female_pairs))
    first_kana_names_female = tuple(map(itemgetter(1), first_name_female_pairs))
    first_romanized_names_female = tuple(map(itemgetter(2), first_name_female_pairs))

    first_name_male_pairs = (
        ("晃", "アキラ", "Akira"),
        ("篤司", "アツシ", "Atsushi"),
        ("治", "オサム", "Osamu"),
        ("和也", "カズヤ", "Kazuya"),
        ("京助", "キョウスケ", "Kyosuke"),
        ("健一", "ケンイチ", "Kenichi"),
        ("修平", "シュウヘイ", "Shohei"),
        ("翔太", "ショウタ", "Shota"),
        ("淳", "ジュン", "Jun"),
        ("聡太郎", "ソウタロウ", "Sotaro"),
        ("太一", "タイチ", "Taichi"),
        ("太郎", "タロウ", "Taro"),
        ("拓真", "タクマ", "Takuma"),
        ("翼", "ツバサ", "Tsubasa"),
        ("智也", "トモヤ", "Tomoya"),
        ("直樹", "ナオキ", "Naoki"),
        ("直人", "ナオト", "Naoto"),
        ("英樹", "ヒデキ", "Hideki"),
        ("浩", "ヒロシ", "Hiroshi"),
        ("学", "マナブ", "Manabu"),
        ("充", "ミツル", "Mituru"),
        ("稔", "ミノル", "Minoru"),
        ("裕樹", "ユウキ", "Yuki"),
        ("裕太", "ユウタ", "Yuta"),
        ("康弘", "ヤスヒロ", "Yasuhiro"),
        ("陽一", "ヨウイチ", "Yoichi"),
        ("洋介", "ヨウスケ", "Yosuke"),
        ("亮介", "リョウスケ", "Ryosuke"),
        ("涼平", "リョウヘイ", "Ryohei"),
        ("零", "レイ", "Rei"),
    )

    # for backwards compatibility
    first_names_male = tuple(map(itemgetter(0), first_name_male_pairs))
    first_kana_names_male = tuple(map(itemgetter(1), first_name_male_pairs))
    first_romanized_names_male = tuple(map(itemgetter(2), first_name_male_pairs))

    # for backwards compatibility
    first_names = first_names_male + first_names_female
    first_kana_names = first_kana_names_male + first_kana_names_female
    first_romanized_names = first_romanized_names_male + first_romanized_names_female

    first_name_pairs = first_name_male_pairs + first_name_female_pairs

    last_name_pairs = OrderedDict(
        (
            (("佐藤", "サトウ", "Sato"), 366803.0),
            (("鈴木", "スズキ", "Suzuki"), 321135),
            (("高橋", "タカハシ", "Takahashi"), 266782),
            (("田中", "タナカ", "Tanaka"), 245821),
            (("伊藤", "イトウ", "Ito"), 203357),
            (("渡辺", "ワタナベ", "Watanabe"), 200504),
            (("山本", "ヤマモト", "Yamamoto"), 200134),
            (("中村", "ナカムラ", "Nakamura"), 195219),
            (("小林", "コバヤシ", "Kobayashi"), 191819),
            (("加藤", "カトウ", "Kato"), 160283),
            (("吉田", "ヨシダ", "Yoshida"), 154461),
            (("山田", "ヤマダ", "Yamada"), 151675),
            (("佐々木", "ササキ", "Sasaki"), 135927),
            (("山口", "ヤマグチ", "Yamaguchi"), 119501),
            (("松本", "マツモト", "Matsumoto"), 116490),
            (("井上", "イノウエ", "Inoue"), 111287),
            (("木村", "キムラ", "Kimura"), 107446),
            (("林", "ハヤシ", "Hayashi"), 101826),
            (("斎藤", "サイトウ", "Saito"), 101774),
            (("清水", "シミズ", "Shimizu"), 97826),
            (("山崎", "ヤマザキ", "Yamazaki"), 90781),
            (("阿部", "アベ", "Abe"), 86833),
            (("森", "モリ", "Mori"), 86507),
            (("池田", "イケダ", "Ikeda"), 84860),
            (("橋本", "ハシモト", "Hashimoto"), 82836),
            (("山下", "ヤマシタ", "Yamashita"), 80588),
            (("石川", "イシカワ", "Ishikawa"), 77471),
            (("中島", "ナカジマ", "Nakajima"), 74106),
            (("前田", "マエダ", "Maeda"), 72930),
            (("藤田", "フジタ", "Fujita"), 72375),
            (("後藤", "ゴトウ", "Goto"), 71629),
            (("小川", "オガワ", "Ogawa"), 71179),
            (("岡田", "オカダ", "Okada"), 70347),
            (("長谷川", "ハセガワ", "Hasegawa"), 69201),
            (("村上", "ムラカミ", "Murakami"), 68606),
            (("近藤", "コンドウ", "Kondo"), 68297),
            (("石井", "イシイ", "Ishii"), 67079),
            (("遠藤", "エンドウ", "Endo"), 62620),
            (("斉藤", "サイトウ", "Saito"), 62540),
            (("坂本", "サカモト", "Sakamoto"), 62308),
            (("青木", "アオキ", "Aoki"), 59516),
            (("藤井", "フジイ", "Fujii"), 59204),
            (("西村", "ニシムラ", "Nishimura"), 58821),
            (("福田", "フクダ", "Fukuda"), 58714),
            (("太田", "オオタ", "Ota"), 58439),
            (("三浦", "ミウラ", "Miura"), 58006),
            (("藤原", "フジワラ", "Fujiwara"), 57742),
            (("松田", "マツダ", "Matsuda"), 55883),
            (("岡本", "オカモト", "Okamoto"), 55539),
            (("中川", "ナカガワ", "Nakagawa"), 55221),
        )
    )

https://github.com/joke2k/faker/blob/v24.4.0/faker/providers/person/ja_JP/__init__.py#L11-L138
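The numbers attached to each last name look like relative frequencies. Assuming they act as sampling weights (an assumption on my part, I have not traced the code path), the effect can be sketched with the standard library's random.choices. The function name pick_last_name is hypothetical, and only the first three entries of the table are used.

```python
import random
from collections import Counter, OrderedDict

# First three entries copied from last_name_pairs above.
last_name_weights = OrderedDict(
    (
        (("佐藤", "サトウ", "Sato"), 366803.0),
        (("鈴木", "スズキ", "Suzuki"), 321135),
        (("高橋", "タカハシ", "Takahashi"), 266782),
    )
)


def pick_last_name(rng):
    """Pick one (kanji, kana, romaji) triple, treating the values as weights."""
    triples = list(last_name_weights.keys())
    weights = list(last_name_weights.values())
    return rng.choices(triples, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same counts
counts = Counter(pick_last_name(rng)[2] for _ in range(10_000))
print(counts.most_common())
```

Under that assumption, names with larger numbers such as 佐藤 show up more often than those further down the table.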

Note that the set of available locales seems to differ from provider to provider.

For example, person has many locales, but

https://github.com/joke2k/faker/tree/v24.4.0/faker/providers/person

barcode has only five.

https://github.com/joke2k/faker/tree/v24.4.0/faker/providers/barcode

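Conclusion

I tried Faker, a library that generates fake data.

I didn't know something like this existed. There seem to be implementations in several languages, and it looks handy. At least the Python version also appears to be customizable.

Worth remembering.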

CLOVER🍀

That was when it all began.
