10 Commits

Author SHA1 Message Date
f5d5703e49 fix(scraper): updated _getproduitslist lookup
Following a rework of the UI and the backend, the structure of the JSON data sent by the web page was simplified.

Old structure:

- `"props"->"pageProps"->"initialReduxState"->"categ"->"content"->"produits"`

New structure (illustrated in the sketch below):

- `"props"->"pageProps"->"produits"`
2026-03-27 21:47:06 +01:00
888defb6b6 rework: updated the documentation and removed the English version 2026-03-09 19:10:06 +01:00
734e3898e9 add: started writing README.md 2026-03-09 14:35:57 +01:00
4bb3112dd0 add: created out.csv file (main) and added README 2026-03-09 14:16:05 +01:00
54e4b7860b add: 3rd milestone PDF 2026-03-06 22:54:10 +01:00
b865a59aba add: cleaning.md and updated scraper.md 2026-03-06 21:56:51 +01:00
Loïc GUEZO
fde1f36148 Enhance workflow with Python setup and docs build
Added Python setup and documentation build steps to workflow.
2026-03-06 21:41:34 +01:00
6fbb36ea37 Merge branch 'jalon2-loic' of https://github.com/guezoloic/millesima_projetS6 2026-03-06 21:36:48 +01:00
Loïc GUEZO
bcacd7a915 Merge pull request #11 from guezoloic/jalon2-loic
Jalon2 loic
2026-03-06 21:09:52 +01:00
Loïc GUEZO
d182e08f9b Merge pull request #10 from guezoloic/jalon2-loic
Jalon2 loic
2026-03-04 12:52:47 +01:00
10 changed files with 175 additions and 10 deletions

.github/workflows/static.yml

@@ -0,0 +1,58 @@
# Simple workflow for deploying static content to GitHub Pages
name: Deploy static content to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Single deploy job since we're just deploying
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          # Install the project in editable mode with the doc extras
          pip install -e ".[doc]"
      - name: Setup Pages
        uses: actions/configure-pages@v5
      - name: Build Documentation
        run: mkdocs build
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          # Upload the built documentation site
          path: './site'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

README.md

@@ -1 +1,37 @@
# millesima_projetS6
# Millesima AI Engine 🍷
> A **University of Paris-Est Créteil (UPEC)** Semester 6 project.
## Documentation
- 🇫🇷 [Version Française](https://guezoloic.github.io/millesima-ai-engine)
> Note: only the French version is available for now.
---
## Installation
> Make sure you have **Python 3.10+** installed.
1. **Clone the repository:**
```bash
git clone https://github.com/your-username/millesima-ai-engine.git
cd millesima-ai-engine
```
2. **Set up a virtual environment:**
```bash
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
```
3. **Install dependencies:**
```bash
pip install -e .
```
## Usage
### 1. Data Extraction (Scraping)
To fetch the latest wine data from Millesima:
```bash
python3 src/scraper.py
```
> Note: fetching all the data can take a while, depending on the catalog size.

docs/cleaning.md

@@ -0,0 +1,17 @@
# Cleaning
## Contents
[TOC]
---
## Class `Cleaning`
::: src.cleaning.Cleaning
    options:
      heading_level: 3
      members:
        - __init__
        - getVins
        - drop_empty_appellation
        - fill_missing_scores
        - encode_appellation
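For orientation, a minimal usage sketch of this class, modelled on the chained calls in `main()` of `src/cleaning.py`; the `out.csv` input filename is an assumption:

```python
from src.cleaning import Cleaning

# Assumed input file; main() actually takes the filename from argv[1].
cleaning = Cleaning("out.csv")

# Chain the cleaning steps, then export the cleaned data.
cleaning.drop_empty_appellation() \
    .fill_missing_scores() \
    .encode_appellation() \
    .getVins() \
    .to_csv("clean.csv", index=False)
```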

docs/index.md

@@ -1 +1,16 @@
# Millesima
The goal of this project is to study, using machine-learning methods, the impact of different criteria (critic scores, appellation) on the price of a wine. To do so, we rely on the Millesima website (https://www.millesima.fr/), which has the advantage of having no protection against bots. Out of respect for the site's host, we will keep the number of requests to a strict minimum. In particular, we will make sure the code works before scraping the entire site, so as to avoid repeated runs.
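To keep the request rate low, one simple approach is to pause between page fetches. A minimal sketch (the helper name, delay value and timeout are illustrative, not the project's actual scraper code):

```python
import time

import requests

BASE_URL = "https://www.millesima.fr"  # site targeted by the project
DELAY_SECONDS = 2.0                    # illustrative pause between requests


def polite_get(path: str) -> requests.Response:
    """Fetch one page, then sleep so the host is not hammered."""
    response = requests.get(f"{BASE_URL}{path}", timeout=30)
    time.sleep(DELAY_SECONDS)
    return response
```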
## Project
<div style="text-align: center;">
<object
data="/millesima-ai-engine/projet.pdf"
type="application/pdf"
width="100%"
height="1000px"
>
<p>Your browser cannot display this PDF.
<a href="/millesima-ai-engine/projet.pdf">Click here to download it.</a></p>
</object>
</div>

Binary file not shown.

docs/scraper.md

@@ -1,3 +1,31 @@
# Scraper
::: scraper.Scraper
## Contents
[TOC]
---
## Class `Scraper`
::: scraper.Scraper
    options:
      members:
        - __init__
        - getvins
        - getjsondata
        - getresponse
        - getsoup
      heading_level: 4
## Class `_ScraperData`
::: scraper._ScraperData
    options:
      members:
        - __init__
        - getdata
        - appellation
        - parker
        - robinson
        - suckling
        - prix
        - informations
      heading_level: 4
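For context, a hedged sketch of how these two classes relate, based only on the member names listed above; the no-argument constructor and the `"bordeaux"` sub-directory are assumptions:

```python
from scraper import Scraper  # module path as used by the directives above

# Assumption: Scraper can be constructed without arguments.
scraper = Scraper()

# getjsondata() returns a _ScraperData wrapper whose getdata() exposes the
# page's embedded JSON as a dict (see the Scraper excerpt at the end of this diff).
data = scraper.getjsondata("bordeaux").getdata()

# getvins() is assumed to return the scraped wine records.
vins = scraper.getvins()
```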


@@ -1,4 +0,0 @@
# _ScraperData
::: scraper._ScraperData

mkdocs.yml

@@ -1,4 +1,5 @@
site_name: "Projet Millesima S6"
site_url: "https://github.guezoloic.com/millesima-ai-engine/"
theme:
  name: "material"
@@ -7,6 +8,11 @@ plugins:
  - search
  - mkdocstrings
extra:
  generator: false
copyright: "Loïc GUEZO & Chahrazad DAHMANI UPEC S6 2026"
markdown_extensions:
  - admonition
  - pymdownx.details

src/cleaning.py

@@ -99,7 +99,11 @@ def main() -> None:
    filename = argv[1]
    cleaning: Cleaning = Cleaning(filename)
    _ = cleaning.drop_empty_appellation().fill_missing_scores().encode_appellation()
    cleaning.drop_empty_appellation() \
        .fill_missing_scores() \
        .encode_appellation() \
        .getVins() \
        .to_csv("clean.csv", index=False)
if __name__ == "__main__":

src/scraper.py

@@ -377,13 +377,18 @@ class Scraper:
        try:
            data: dict[str, object] = self.getjsondata(subdir).getdata()
            for element in ["initialReduxState", "categ", "content"]:
                data = cast(dict[str, object], data.get(element))
            # The site changed the way it stores its data.
            #
            # for element in ["initialReduxState", "categ", "content"]:
            #     data = cast(dict[str, object], data.get(element))
            # print(data)
            products: list[dict[str, Any]] = cast(
                list[dict[str, Any]], data.get("products")
            )
            print(products)
            return products
        except (JSONDecodeError, HTTPError):