The Problem#
Recently I needed to sync an SMB share from a NAS with my Linux server. The obvious solution would be smbclient or mount -t cifs, but I wanted:
- Incremental synchronization (only new or modified files)
- Detect files deleted from the share
- Control NTLM authentication directly from code
- Silence the obscene amount of logs that smbprotocol spits out
The Python smbprotocol library solved all of this, but I found very little documentation on how to do it properly. Here’s my solution.
Initial Setup#
Install the dependencies:
pip install smbprotocol sqlalchemy pydantic python-dotenv

The basic idea: maintain a SQLite database with a record of all synchronized files (name, MD5 hash, timestamp). Each execution compares the current share with the database and processes only changes.
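
The comparison step can be sketched on its own, without any SMB or database code. This is a minimal illustration that assumes both sides have already been reduced to `{filename: size}` dictionaries; the function name `diff_listing` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_listing(
    remote: Dict[str, int],
    recorded: Dict[str, int],
) -> Tuple[List[str], List[str], List[str]]:
    """Compare {filename: size} maps and return (new, modified, deleted) names."""
    # Present on the share but never recorded
    new = sorted(name for name in remote if name not in recorded)
    # Present on both sides, but the size differs
    modified = sorted(
        name for name in remote
        if name in recorded and remote[name] != recorded[name]
    )
    # Recorded earlier, but no longer on the share
    deleted = sorted(name for name in recorded if name not in remote)
    return new, modified, deleted


remote = {"a.txt": 10, "b.txt": 25, "c.txt": 7}
recorded = {"a.txt": 10, "b.txt": 20, "old.txt": 3}
print(diff_listing(remote, recorded))
# (['c.txt'], ['b.txt'], ['old.txt'])
```

Only the names in the first two lists need to be downloaded, and the third list drives the cleanup of local copies.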
Silencing smbprotocol logs#
This is critical. Without controlling it, the library fills your console with debug messages:
import logging
# Silence smbprotocol
logging.getLogger('smbprotocol').setLevel(logging.WARNING)
logging.getLogger('smbprotocol.connection').setLevel(logging.WARNING)
logging.getLogger('smbprotocol.session').setLevel(logging.WARNING)

# Your own logger
logger = logging.getLogger('sync_smb')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

This reduces logs to something reasonable. Without this, each operation generates around 50 lines of debug output.
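
Python loggers inherit their effective level from the nearest ancestor that has one set, so the level on the top-level `smbprotocol` logger also covers child loggers that are not configured explicitly. The short check below illustrates this; it assumes the library names its loggers after its modules, and the child name `smbprotocol.open` is used only as an example.

```python
import logging

# Set the level once on the package-level logger
logging.getLogger("smbprotocol").setLevel(logging.WARNING)

# A child logger that was never configured directly
child = logging.getLogger("smbprotocol.open")

print(child.level == logging.NOTSET)                 # True: no level of its own
print(child.getEffectiveLevel() == logging.WARNING)  # True: inherited from the parent
print(child.isEnabledFor(logging.DEBUG))             # False: debug records are dropped
```

The explicit lines for `smbprotocol.connection` and `smbprotocol.session` are therefore harmless but only matter if something else has set a lower level on those loggers.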
SQLite Database Structure#
from datetime import datetime
from sqlalchemy import create_engine, Column, String, DateTime, Integer
# SQLAlchemy 1.4+; older versions import declarative_base from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class SyncedFile(Base):
    """One row per file already copied from the share."""
    __tablename__ = 'synced_files'
    filename = Column(String, primary_key=True)
    md5_hash = Column(String)
    file_size = Column(Integer)
    last_modified = Column(DateTime)
    sync_timestamp = Column(DateTime, default=datetime.utcnow)

engine = create_engine('sqlite:///smb_sync.db')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

NTLM Connection#
import uuid
from smbprotocol.connection import Connection
# Aliased so it does not shadow the SQLAlchemy Session factory defined earlier
from smbprotocol.session import Session as SMBSession
from smbprotocol.tree import TreeConnect

username = "DOMAIN\\user"  # NETBIOS\user format
password = "password"
host = "192.168.x.x"
share = "shared"

# Basic connection on the standard SMB port
connection = Connection(uuid.uuid4(), host, 445)
connection.connect()

# Authentication happens during session setup
smb_session = SMBSession(connection, username, password)
smb_session.connect()

# Attach to the share itself
tree = TreeConnect(smb_session, f"\\\\{host}\\{share}")
tree.connect()

NTLM is negotiated automatically. You don’t need to do anything special, but make sure you use the correct DOMAIN\user format.
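
Hardcoding credentials in the script is fine for a quick test but awkward for anything long-lived. One option is to read them from environment variables, as in this minimal sketch. The variable names (`SMB_USERNAME`, `SMB_PASSWORD`, `SMB_HOST`, `SMB_SHARE`) and the `SmbSettings` class are assumptions for illustration, not something smbprotocol defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SmbSettings:
    username: str
    password: str
    host: str
    share: str


def load_settings(env: Mapping[str, str] = os.environ) -> SmbSettings:
    """Build the settings from environment-style variables (names are hypothetical)."""
    required = ("SMB_USERNAME", "SMB_PASSWORD", "SMB_HOST", "SMB_SHARE")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return SmbSettings(
        username=env["SMB_USERNAME"],
        password=env["SMB_PASSWORD"],
        host=env["SMB_HOST"],
        share=env["SMB_SHARE"],
    )


# Demo with an explicit mapping instead of the real environment
demo_env = {
    "SMB_USERNAME": "DOMAIN\\user",
    "SMB_PASSWORD": "secret",
    "SMB_HOST": "nas.local",
    "SMB_SHARE": "shared",
}
settings = load_settings(demo_env)
print(settings.host, settings.share)
```

With python-dotenv installed, calling `load_dotenv()` before `load_settings()` would populate `os.environ` from a `.env` file, which keeps the password out of the source.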
Incremental Synchronization#
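import hashlib
from pathlib import Path
from sqlalchemy.orm import Session as DbSession
from smbprotocol.open import (
    Open, CreateDisposition, CreateOptions, DirectoryAccessMask, FileAttributes,
    FileInformationClass, FilePipePrinterAccessMask, ImpersonationLevel, ShareAccess,
)

CHUNK_SIZE = 65536  # 64 KiB per read request

def get_file_hash(file_data):
    """Compute the MD5 of a bytes object."""
    return hashlib.md5(file_data).hexdigest()

def list_remote_files():
    """Return {filename: {'size', 'modified'}} for the files in the share root."""
    directory = Open(tree, "")
    directory.create(
        ImpersonationLevel.Impersonation,
        DirectoryAccessMask.GENERIC_READ,
        FileAttributes.FILE_ATTRIBUTE_DIRECTORY,
        ShareAccess.FILE_SHARE_READ | ShareAccess.FILE_SHARE_WRITE,
        CreateDisposition.FILE_OPEN,
        CreateOptions.FILE_DIRECTORY_FILE,
    )
    remote_files = {}
    # One call returns a single batch of entries; very large directories may need repeated calls
    for entry in directory.query_directory("*", FileInformationClass.FILE_DIRECTORY_INFORMATION):
        if entry['file_attributes'].get_value() & FileAttributes.FILE_ATTRIBUTE_DIRECTORY:
            continue  # Skip folders for now (this also skips "." and "..")
        name = entry['file_name'].get_value().decode('utf-16-le')
        remote_files[name] = {
            'size': entry['end_of_file'].get_value(),
            # Stored as a naive datetime so it compares cleanly with the SQLite value
            'modified': entry['last_write_time'].get_value().replace(tzinfo=None),
        }
    directory.close()
    return remote_files

def download_file(filename):
    """Read a remote file in chunks and return its contents as bytes."""
    remote = Open(tree, filename)
    remote.create(
        ImpersonationLevel.Impersonation,
        FilePipePrinterAccessMask.GENERIC_READ,
        FileAttributes.FILE_ATTRIBUTE_NORMAL,
        ShareAccess.FILE_SHARE_READ,
        CreateDisposition.FILE_OPEN,
        CreateOptions.FILE_NON_DIRECTORY_FILE,
    )
    chunks = []
    offset = 0
    while offset < remote.end_of_file:
        chunk = remote.read(offset, min(CHUNK_SIZE, remote.end_of_file - offset))
        if not chunk:
            break
        chunks.append(chunk)
        offset += len(chunk)
    remote.close()
    return b"".join(chunks)

def sync_smb_share(local_path: Path):
    local_path.mkdir(parents=True, exist_ok=True)
    remote_files = list_remote_files()
    with DbSession(engine) as db:
        local_files = {f.filename: f for f in db.query(SyncedFile).all()}

        # Download new or modified files
        for filename, info in remote_files.items():
            known = local_files.get(filename)
            if (known is None or known.file_size != info['size']
                    or known.last_modified != info['modified']):
                logger.info(f"Downloading: {filename}")
                content = download_file(filename)
                (local_path / filename).write_bytes(content)
                # Update the database record
                db.merge(SyncedFile(
                    filename=filename,
                    md5_hash=get_file_hash(content),
                    file_size=info['size'],
                    last_modified=info['modified'],
                ))
                db.commit()

        # Detect files deleted from the share
        for filename in local_files:
            if filename not in remote_files:
                logger.warning(f"File deleted on remote: {filename}")
                (local_path / filename).unlink(missing_ok=True)
                db.query(SyncedFile).filter_by(filename=filename).delete()
                db.commit()

    tree.disconnect()

if __name__ == "__main__":
    sync_smb_share(Path("/mnt/sync"))

Automation with cron#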
0 */4 * * * /usr/bin/python3 /opt/sync_smb/sync.py >> /var/log/smb_sync.log 2>&1

This syncs every 4 hours.
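
cron does not check whether the previous run has finished, so a slow sync could overlap with the next one. One way to guard against that is an exclusive file lock, sketched below with the standard-library `fcntl` module (Unix only). The helper name `single_instance` and the lock file location are assumptions for illustration.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path: str):
    """Hold an exclusive, non-blocking lock on lock_path while the body runs."""
    handle = open(lock_path, "w")
    try:
        # Raises BlockingIOError immediately if another holder has the lock
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        handle.close()  # Closing the file releases the lock


# Hypothetical lock location for the demo
lock_path = os.path.join(tempfile.gettempdir(), "sync_smb.lock")

with single_instance(lock_path):
    try:
        # A second attempt while the first lock is held fails right away
        with single_instance(lock_path):
            print("second run started")
    except BlockingIOError:
        print("already running")
```

In the sync script, the call to `sync_smb_share(...)` would go inside the `with single_instance(...)` block, and a `BlockingIOError` would simply end the run.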
Conclusion#
With this setup you process only changes, control NTLM authentication without weird tricks, and have readable logs. The SQLite database is efficient even with thousands of files.
I’ve used this in production for months without issues.