
Incremental synchronization from SMB with smbprotocol on Linux: NTLM authentication and log control

Rogelio Guerra Riverón
Author
Building my own web infrastructure from scratch. Here I document each step: servers, networks, containers and everything that comes along.

The Problem

Recently I needed to sync an SMB share from a NAS with my Linux server. The obvious solution would be smbclient or mount -t cifs, but I wanted:

  1. Incremental synchronization (only new or modified files)
  2. Detect files deleted from the share
  3. Control NTLM authentication directly from code
  4. Silence the obscene amount of logs that smbprotocol spits out

The Python smbprotocol library covers all of this, but I found little documentation on how to put the pieces together. Here’s my solution.

Initial Setup

Install the dependencies:

pip install smbprotocol sqlalchemy pydantic python-dotenv

The basic idea: maintain a SQLite database with one record per synchronized file (name, size, MD5 hash, modification time and sync time). Each run compares the current contents of the share with the database and processes only the differences.
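The comparison step can be sketched without any SMB or database code. This is a minimal illustration, assuming both sides are reduced to a plain `{filename: size}` mapping; the function name `diff_snapshots` and the sample file names are made up for the example.

```python
from typing import Dict, List, Tuple


def diff_snapshots(
    remote: Dict[str, int], known: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {filename: size} maps and return (new, modified, deleted)."""
    new = sorted(name for name in remote if name not in known)
    modified = sorted(
        name for name in remote if name in known and remote[name] != known[name]
    )
    deleted = sorted(name for name in known if name not in remote)
    return new, modified, deleted


# What the share reports now vs. what the database recorded last time
remote = {"a.txt": 10, "b.txt": 25, "d.txt": 7}
known = {"a.txt": 10, "b.txt": 20, "c.txt": 5}

print(diff_snapshots(remote, known))
# → (['d.txt'], ['b.txt'], ['c.txt'])
```

Only `d.txt` and `b.txt` would be downloaded, and `c.txt` would be removed locally. `a.txt` is untouched.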

Silencing smbprotocol logs

This is critical. Without controlling it, the library fills your console with debug messages:

import logging

# Silence smbprotocol: only warnings and errors get through
logging.getLogger('smbprotocol').setLevel(logging.WARNING)
logging.getLogger('smbprotocol.connection').setLevel(logging.WARNING)
logging.getLogger('smbprotocol.session').setLevel(logging.WARNING)

# Your own logger, with timestamps and levels
logger = logging.getLogger('sync_smb')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

This reduces logs to something reasonable. Without this, each operation generates 50 lines of garbage.
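A side note on why this works: Python loggers form a hierarchy by dotted name, and a logger without its own level takes the level of its nearest configured ancestor. The sketch below only uses the standard `logging` module; `quiet_library` is a hypothetical helper name, not part of smbprotocol.

```python
import logging


def quiet_library(name: str, level: int = logging.WARNING) -> None:
    """Set the level on a library's top-level logger."""
    logging.getLogger(name).setLevel(level)


quiet_library("smbprotocol")

# A child logger that was never configured explicitly
child = logging.getLogger("smbprotocol.connection")

# The child takes its effective level from the "smbprotocol" parent
print(child.getEffectiveLevel() == logging.WARNING)
print(child.isEnabledFor(logging.DEBUG))
```

So setting the level on `smbprotocol` alone already covers its submodules. Setting it on the child loggers as well is harmless and makes the intent explicit.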

SQLite Database Structure

from datetime import datetime
from sqlalchemy import create_engine, Column, String, DateTime, Integer
from sqlalchemy.orm import declarative_base, sessionmaker  # SQLAlchemy 1.4+

Base = declarative_base()

class SyncedFile(Base):
    """One row per file already copied from the share"""
    __tablename__ = 'synced_files'

    filename = Column(String, primary_key=True)
    md5_hash = Column(String)         # MD5 of the content at download time
    file_size = Column(Integer)       # Size in bytes as reported by the share
    last_modified = Column(DateTime)  # Remote modification time
    sync_timestamp = Column(DateTime, default=datetime.utcnow)

engine = create_engine('sqlite:///smb_sync.db')
Base.metadata.create_all(engine)  # Creates the table on first run
Session = sessionmaker(bind=engine)  # Factory for database sessions

NTLM Connection

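import uuid

from smbprotocol.connection import Connection
# Aliased so it does not overwrite the SQLAlchemy Session factory
from smbprotocol.session import Session as SMBSession
from smbprotocol.tree import TreeConnect

username = "DOMAIN\\user"  # NETBIOS\user format
password = "password"
host = "192.168.x.x"
share = "shared"

# Basic connection: negotiate the SMB dialect over TCP port 445
connection = Connection(uuid.uuid4(), host, 445)
connection.connect()

# Authenticate. The default auth_protocol is "negotiate";
# pass auth_protocol="ntlm" to force NTLM explicitly.
smb_session = SMBSession(connection, username, password)
smb_session.connect()

# Connect to the share through its UNC path
tree = TreeConnect(smb_session, f"\\\\{host}\\{share}")
tree.connect()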

NTLM is negotiated automatically. You don’t need to do anything special, but make sure the username uses the correct DOMAIN\user format.
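Since python-dotenv is already in the dependency list, the credentials don't have to live in the script. The following is a sketch under assumptions: the variable names `SMB_USERNAME`, `SMB_PASSWORD`, `SMB_HOST` and `SMB_SHARE`, the `SmbSettings` class and the demo values are all hypothetical. In a real script, calling `load_dotenv()` from python-dotenv before `load_settings()` would populate `os.environ` from a `.env` file.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SMB_USERNAME", "SMB_PASSWORD", "SMB_HOST", "SMB_SHARE")  # assumed names


@dataclass(frozen=True)
class SmbSettings:
    username: str
    password: str
    host: str
    share: str

    @property
    def unc_path(self) -> str:
        """UNC path of the share, e.g. \\\\host\\share."""
        return f"\\\\{self.host}\\{self.share}"


def load_settings(env: Mapping[str, str] = os.environ) -> SmbSettings:
    """Build the settings from environment variables, failing early if any is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return SmbSettings(*(env[key] for key in REQUIRED))


# Demo with an explicit mapping instead of the real environment
demo_env = {
    "SMB_USERNAME": "DOMAIN\\user",
    "SMB_PASSWORD": "secret",
    "SMB_HOST": "nas.example",
    "SMB_SHARE": "shared",
}
settings = load_settings(demo_env)
print(settings.unc_path)
```

The resulting `settings.username`, `settings.password` and `settings.unc_path` can then be passed to the session and tree objects in place of the hard-coded strings.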

Incremental Synchronization

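import hashlib
from datetime import timezone
from pathlib import Path

from smbprotocol.exceptions import SMBResponseException
from smbprotocol.file_info import FileInformationClass
from smbprotocol.header import NtStatus
from smbprotocol.open import (
    CreateDisposition, CreateOptions, DirectoryAccessMask, FileAttributes,
    FilePipePrinterAccessMask, ImpersonationLevel, Open, ShareAccess,
)

CHUNK_SIZE = 64 * 1024  # Bytes requested per SMB read

def to_naive_utc(dt):
    """Normalize a timestamp to naive UTC, the form SQLite returns"""
    return dt.astimezone(timezone.utc).replace(tzinfo=None) if dt.tzinfo else dt

def list_remote_files():
    """Return {filename: {'size', 'modified'}} for the files in the share root"""
    directory = Open(tree, "")  # An empty name opens the root of the share
    directory.create(
        ImpersonationLevel.Impersonation,
        DirectoryAccessMask.FILE_LIST_DIRECTORY,
        FileAttributes.FILE_ATTRIBUTE_DIRECTORY,
        ShareAccess.FILE_SHARE_READ | ShareAccess.FILE_SHARE_WRITE,
        CreateDisposition.FILE_OPEN,
        CreateOptions.FILE_DIRECTORY_FILE,
    )
    remote_files = {}
    while True:  # Each call returns one batch of entries
        try:
            batch = directory.query_directory("*", FileInformationClass.FILE_DIRECTORY_INFORMATION)
        except SMBResponseException as exc:
            if exc.status == NtStatus.STATUS_NO_MORE_FILES:
                break  # The server reports the end of the listing as a status
            raise
        for entry in batch:
            if entry['file_attributes'].get_value() & FileAttributes.FILE_ATTRIBUTE_DIRECTORY:
                continue  # Skip folders for now
            filename = entry['file_name'].get_value().decode('utf-16-le')
            remote_files[filename] = {
                'size': entry['end_of_file'].get_value(),
                'modified': to_naive_utc(entry['last_write_time'].get_value()),
            }
    directory.close()
    return remote_files

def download_file(filename, dest: Path, size: int):
    """Stream a remote file to dest and return the MD5 of its content"""
    remote = Open(tree, filename)
    remote.create(
        ImpersonationLevel.Impersonation,
        FilePipePrinterAccessMask.GENERIC_READ,
        FileAttributes.FILE_ATTRIBUTE_NORMAL,
        ShareAccess.FILE_SHARE_READ,
        CreateDisposition.FILE_OPEN,
        CreateOptions.FILE_NON_DIRECTORY_FILE,
    )
    md5, offset = hashlib.md5(), 0
    with dest.open('wb') as out:
        while offset < size:
            chunk = remote.read(offset, min(CHUNK_SIZE, size - offset))
            if not chunk:
                break  # Defensive: stop on an empty read
            out.write(chunk)
            md5.update(chunk)
            offset += len(chunk)
    remote.close()
    return md5.hexdigest()

def sync_smb_share(local_path: Path):
    db = Session()  # SQLAlchemy session from the sessionmaker defined earlier
    remote_files = list_remote_files()
    local_files = {f.filename: f for f in db.query(SyncedFile).all()}

    # Download new or modified files (size or modification time changed)
    for filename, info in remote_files.items():
        known = local_files.get(filename)
        if (known is not None and known.file_size == info['size']
                and known.last_modified == info['modified']):
            continue  # Unchanged since the last run
        logger.info(f"Downloading: {filename}")
        md5 = download_file(filename, local_path / filename, info['size'])

        # Update the database
        record = known or SyncedFile(filename=filename)
        record.md5_hash = md5
        record.file_size = info['size']
        record.last_modified = info['modified']
        db.merge(record)
        db.commit()

    # Detect files deleted from the share
    for filename in local_files:
        if filename not in remote_files:
            logger.warning(f"File deleted on remote: {filename}")
            (local_path / filename).unlink(missing_ok=True)
            db.query(SyncedFile).filter_by(filename=filename).delete()
            db.commit()

    db.close()
    connection.disconnect()  # Also closes the SMB session and the tree connection

if __name__ == "__main__":
    sync_smb_share(Path("/mnt/sync"))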
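The sync loop stores an MD5 per file but never reads it back. One possible use, sketched here as an assumption rather than as part of the original script, is to check that a local copy still matches what was downloaded. `file_md5` and `matches_recorded_hash` are hypothetical helper names, and the demo runs against a temporary file instead of the real sync directory.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a local file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path: Path, recorded_md5: str) -> bool:
    """True if the file exists and its content still has the recorded MD5."""
    return path.is_file() and file_md5(path) == recorded_md5


# Demo on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    recorded = file_md5(sample)
    print(recorded)  # → 5d41402abc4b2a76b9719d911017c592
    intact = matches_recorded_hash(sample, recorded)
    sample.write_bytes(b"hello, modified locally")
    tampered = matches_recorded_hash(sample, recorded)
    print(intact, tampered)  # → True False
```

In the real script, `recorded` would come from the `md5_hash` column of the matching `SyncedFile` row, and a mismatch could trigger a fresh download.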

Automation with cron

0 */4 * * * /usr/bin/python3 /opt/sync_smb/sync.py >> /var/log/smb_sync.log 2>&1

This runs the sync every four hours, on the hour, and appends both standard output and errors to /var/log/smb_sync.log.
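If a run ever takes longer than the cron interval, two copies of the script could write to the same files and database at once. A common guard on Linux is an advisory lock file; the sketch below uses the standard `fcntl` module, and both the `single_instance` name and the lock path are assumptions for illustration.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path: str):
    """Yield True if this process got the lock, False if another run holds it."""
    fh = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if already taken
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        fh.close()  # Closing the file releases the lock


# Demo with a lock file in the temp directory
lock_file = os.path.join(tempfile.gettempdir(), "sync_smb_demo.lock")
with single_instance(lock_file) as acquired:
    print("lock acquired:", acquired)
```

In the sync script, the call to `sync_smb_share(...)` would sit inside the `with` block and be skipped when `acquired` is false.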

Conclusion

With this setup you process only changes, control NTLM authentication without weird tricks, and have readable logs. The SQLite database is efficient even with thousands of files.

I’ve used this in production for months without issues.

