Notepad

Doing machine learning with Python and Julia

Building ubuntu16 + python3.6 + mecab (mecab-ipadic-neologd) with Docker

The Dockerfile below installs Python 3.6 on Ubuntu and sets things up so that MeCab can be used through mecab-python3.

# base image 
FROM ubuntu:16.04

# language conf
ENV LANG=C.UTF-8
ENV TZ=Asia/Tokyo
ENV PYTHONIOENCODING=utf-8
ENV PYTHONUNBUFFERED=1

# update ubuntu
RUN apt-get update -y && \
    apt-get upgrade -y && \
    apt-get dist-upgrade -y

# install python3.6
RUN apt-get install -y software-properties-common
RUN add-apt-repository ppa:jonathonf/python-3.6

RUN apt-get update \
  && apt-get install python3.6 python3.6-dev python3-pip make curl git sudo cron -y \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* \
  && cd /usr/local/bin \
  && ln -s /usr/bin/python3.6 python

# install mecab from github
WORKDIR /opt
RUN git clone https://github.com/taku910/mecab.git
WORKDIR /opt/mecab/mecab
RUN ./configure --enable-utf8-only \
  && make \
  && make check \
  && make install \
  && ldconfig

WORKDIR /opt/mecab/mecab-ipadic
RUN ./configure --with-charset=utf8 \
 && make \
 && make install

# mecab-ipadic-neologd
# If image size is a concern, comment out the lines below to use MeCab with only the lighter ipadic dictionary.
WORKDIR /opt
RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
WORKDIR /opt/mecab-ipadic-neologd
RUN ./bin/install-mecab-ipadic-neologd -n -y


# install package in python
WORKDIR /opt/backend
COPY requirements.txt /opt/backend/requirements.txt

RUN python -m pip install pip --upgrade \
 && python -m pip install -r requirements.txt
COPY . /opt/backend

If requirements.txt contains

...
mecab-python3
...

then MeCab can be used from Python.
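Before opening a shell, it can help to confirm that the binding actually got installed. Here is a minimal sketch that only checks whether the module is importable. `has_module` is a hypothetical helper name of mine, not part of mecab-python3.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # mecab-python3 exposes a top-level module named "MeCab".
    print("MeCab available:", has_module("MeCab"))
```

Running this with the container's `python` should report whether the `pip install -r requirements.txt` step picked up mecab-python3.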

$ docker run -it [name] python manage.py shell
Python 3.6.7 (default, Oct 25 2018, 09:16:13)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import MeCab
>>> tagger = MeCab.Tagger("-Owakati")
>>> sentence = "mecabの環境構築は結構はまりやすい"
>>> out = tagger.parse(sentence)
>>> out
'mecab の 環境 構築 は 結構 はまり やすい \n'
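
Note that `MeCab.Tagger("-Owakati")` uses whatever dictionary mecabrc points to. As far as I know, the NEologd installer does not change that default, so NEologd has to be selected explicitly with MeCab's `-d` option. Below is a rough sketch of how that could look. It assumes the dictionary ended up in `mecab-ipadic-neologd` under the directory printed by `mecab-config --dicdir`, which I have not verified for this exact image. The helper names (`neologd_dir`, `tagger_args`, `to_tokens`, `tokenize_with_neologd`) are hypothetical.

```python
import subprocess


def neologd_dir(dicdir=None):
    """Return the assumed NEologd dictionary path.

    If `dicdir` is not given, ask `mecab-config --dicdir` for the base
    dictionary directory (this needs MeCab to be installed).
    """
    if dicdir is None:
        dicdir = subprocess.check_output(
            ["mecab-config", "--dicdir"], universal_newlines=True
        ).strip()
    # Assumption: NEologd is installed as a subdirectory of the base dicdir.
    return dicdir.rstrip("/") + "/mecab-ipadic-neologd"


def tagger_args(dicdir=None):
    """Build the argument string passed to MeCab.Tagger."""
    return "-Owakati -d " + neologd_dir(dicdir)


def to_tokens(wakati_output):
    """Split wakati output into a list of tokens."""
    # str.split() with no arguments also drops the trailing space and newline.
    return wakati_output.split()


def tokenize_with_neologd(sentence):
    """Tokenize `sentence` with the NEologd dictionary (run inside the container)."""
    import MeCab  # provided by mecab-python3

    tagger = MeCab.Tagger(tagger_args())
    return to_tokens(tagger.parse(sentence))
```

Inside the container, `tokenize_with_neologd(sentence)` can be called from the same shell. If the dictionary lives somewhere else, pass that base directory to `tagger_args` instead of relying on `mecab-config`.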