2 votes
0 answers
89 views

How to correctly calculate distance and similarity for each step in hierarchical clustering (Ward.D2)?

I am grouping my data using the ward.D2 hierarchical clustering method in R. I need to calculate the distance and similarity for each step, from 2 to 20 clusters. Similarity is calculated using the ...
pnlp's user avatar
  • 35
1 vote
1 answer
146 views

Neo4j vector similarity function

I'm trying to understand the difference between the vector.similarity.cosine Cypher function and the gds.similarity.cosine function in Neo4j. According to the Neo4j documentation, both are used to ...
Gal Shubeli's user avatar
1 vote
1 answer
178 views

Rapidfuzz giving no matches but Fuzzywuzzy does

I have been developing a matching system which matches the rows of the client and our central database depending on similarity. I have used a hybrid approach where I needed to somehow map the Company, ...
Prabhjit Singh's user avatar
1 vote
0 answers
148 views

How to compute text–image similarity under local inference with generative vision-language models (e.g. Qwen2.5-VL, Gemma 3)?

I’ve been working with Qwen2.5-VL and Gemma3 locally, and I need to measure the similarity between text and image embeddings—similar to CLIP/SigLIP—but I’m resource-limited and can’t spin up ...
H.H's user avatar
  • 11
1 vote
3 answers
91 views

In sql, group by using similar group_name

How can I perform a GROUP BY in SQL when the group_name values are similar but not exactly the same? In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "...
Ahamad's user avatar
  • 1
0 votes
0 answers
86 views

Use terra to calculate relative similarity of raster values between areas inside and outside of a group of polygons in R

This question builds on a helpful solution provided for calculating uniqueness across a SpatRaster -- Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-...
Sean Basquill's user avatar
0 votes
1 answer
184 views

Using terra in R to calculate & map (dis)similarity and uniqueness across cells of a large GDM-based raster

I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of ...
Sean Basquill's user avatar
0 votes
0 answers
36 views

Near Similarity and duplication detection

I have a ticketing system where people create tickets for their issues. When someone is trying to create a new ticket I have to search my Elasticsearch index to identify whether a similar ticket is already ...
Ramji's user avatar
  • 75
0 votes
1 answer
67 views

how to compare lists with their frequency

I'm interested in finding similarities between two lists. I have the count of duplicates in the first column, and the pattern is in the second column. What would be the most logical way to compare ...
rollTHERoad's user avatar
0 votes
0 answers
97 views

wrong similarity search result on chroma db

I am appending each row in csv file into chromadb with format, such as below; #Acme1 prod #Acme1 line, AC1 #Acme2 prod #Acme2 line, AC2 embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-...
Orkun Gedik's user avatar
0 votes
1 answer
69 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows, df['text']. I also have 5 words, and for each one of them I want to know how much it represents the text (between 0 and 1). Every score will be in df["word1&...
rafine's user avatar
  • 469
1 vote
1 answer
56 views

Visualize species occurrence similarities of 5x5 grid-cells and soil samples in R

I have two data frames: ref_df, containing for each row information about species and the latitude and longitude where they were recorded, and sample_df, with sample names as rows and species names as columns,...
Elie Tièche's user avatar
0 votes
1 answer
72 views

similarity from word to sentence after doing words Embedding

I have a dataframe with 1000 text rows. I ran word2vec on it. Now I want to create a new field which gives me the distance from each sentence to a word that I choose, let's say the word "king". I ...
rafine's user avatar
  • 469
3 votes
1 answer
61 views

How can one obtain the "correct" embedding layer in BERT?

I want to utilize BERT to assess the similarity between two pieces of text: from transformers import AutoTokenizer, AutoModel import torch import torch.nn.functional as F import numpy as np tokenizer ...
Beitian Ma's user avatar
1 vote
0 answers
31 views

Ideal method to do image similarity comparison between two binary images - Tactile Maps

I have a task for image similarity comparisons between two binary images of tactile maps (maps for the visually impaired to map out an area). The goal is to have an output score of how similar the two ...
Michael Liang's user avatar
0 votes
0 answers
27 views

Compare two blob column values to find percentage of similarity

How can I compare two blob columns in the same table? For example: PIC1 contains a picture of a bridge taken from 5 meter distance, whereas PIC2 contains a picture of the same bridge taken from 10 meter ...
padjee's user avatar
  • 265
1 vote
1 answer
49 views

Fetch rows from PostgreSQL with rearranged words similar to a given string

I want to retrieve all rows from a PostgreSQL database that contain sentences similar to a provided string. The sentences in the database can have their words in any order (rearranged). How can I ...
Shivam Baldha's user avatar
1 vote
0 answers
36 views

Directed Graph Edit Distance Computation Issue Using AStar Algorithm in Graph-Matching-Toolkit

I am comparing pairs of identical directed graphs represented in GXL format using the Graph-Matching-Toolkit(https://github.com/dzambon/graph-matching-toolkit). Since the graphs are identical, I ...
NoName Su's user avatar
0 votes
1 answer
35 views

Error in reconstruction function of SA attack algorithm for CB systems

I try to execute the attached code in the following link: https://github.com/biometricsecurity/Preimage-attack-on-BTP-template The code aims to measure the similarity attacks at CB systems. I have ...
GeGe's user avatar
  • 1
1 vote
1 answer
41 views

How to create a list of comparisons with an agentset?

The agents (called farms) in my model have peers (an agentset of farms they have links to). Each farm has an attribute called rotation, which is a list of values between 0 and 1 (same length for all ...
Bartosz Bartkowski's user avatar
0 votes
1 answer
121 views

Implementation of Angular Metric for Shape Similarity (AMSS) with Python

Is there any existing optimized Python implementation of Angular Metric for Shape Similarity (AMSS)? Otherwise, could I approximate it by considering the derivative DTW and using cosine similarity ...
hfaila's user avatar
  • 1
1 vote
0 answers
47 views

Mismatch between Milvus Id and Filename

I am trying to build a small image search program and using Milvus as a database to store my embeddings, on trying to retrieve the result by matching the vector embeddings with the embeddings obtained ...
lazy panda's user avatar
0 votes
1 answer
480 views

Why are my dimensions different when using OpenAi embeddings in Python?

I have a single Python function that I am using to embed JSON objects of different lengths. The issue I am having is that, somehow, the dimensions are different when comparing the vectors and I ...
Ken Tola's user avatar
1 vote
0 answers
56 views

How can I store boolean array of size 3000 efficiently in milvus?

I have a boolean array which represents store-availability of retail products across 3000 different stores. so, my schema looks like below: product_id = FieldSchema( name="product_id", ...
Mohamed Niyaz's user avatar
0 votes
0 answers
232 views

How can I perform accurate vector search for complex objects?

I have objects that have many attributes, example: item = { 'id': 123, 'name': 'Keyboard', 'price': 12, 'url': 'example.com', 'description': 'a keyboard that etc...', 'details': { 'color': '...
AbdulmohsenA's user avatar
1 vote
0 answers
85 views

How can one output n vectors with unique metadata in a query with ChromaDB?

Following on the example here, one way to create a query of the collection from ChromaDB with filtering by a given type of metadata (i.e. "source_type") is results = collection.query( ...
user18959's user avatar
0 votes
1 answer
191 views

Plot upper triangle correlation matrix with similarity scores using ggplot

I have a dataframe as given below: The table only has values from the upper triangle of a matrix. I want to plot a correlation plot (correlogram) where the colours show the correlation and size ...
Kamalika Ray's user avatar
0 votes
0 answers
118 views

Multi-attribute similarity search across millions of records based on criteria

Problem description: I am trying to perform an efficient multi-attribute similarity search across millions of records in a database. However, my process requires a hierarchical order of criteria for ...
AK2001's user avatar
  • 1
1 vote
0 answers
79 views

Is it possible to compare multiple line graphs to give a sort of 'similarity rating'?

I am trying to measure data from a smartphone ambient light sensor (ALS). My goal is to be able to look at the data and infer the location of the device. To do this my plan is ...
nosilak0's user avatar
1 vote
2 answers
145 views

similarity between two numpy arrays based on shape but not distance

import matplotlib.pyplot as plt import numpy as np from numpy.linalg import norm def cosine_similarity(arr1:np.ndarray, arr2:np.ndarray)->float: dot_product = np.dot(arr1, arr2) magnitude =...
Prashant's user avatar
  • 947
1 vote
2 answers
136 views

How to detect if two sentences are similar, not in meaning, but in syllables/words?

Here are some examples of the types of sentences that need to be considered "similar" there was a most extraordinary noise going on shrinking rapidly she soon made out there was a most ...
BLOCKCRAFT 2.0's user avatar
1 vote
0 answers
644 views

Langchain FAISS | Any solutions or alternatives for similarity search on vector DBs for slightly repetitive short words with numerics?

So basically I am trying to search a cell line vector data base that has entries that look like this using langchain: ID: 253F1 AC: CVCL_B513 SY: NA OX: NCBI_TaxID=9606; ! Homo sapiens (Human) CA: ...
Nicholas Piccaro's user avatar
-2 votes
1 answer
46 views

I have plots of points that I extract from an image. How can I determine a similarity measure between two different plots? [closed]

Each point has an x, y, and size. For example these should result in similar: Plot 1-A: Plot 1-B: And these should not result in similar: Plot 2-A: Plot 3-A: Are there any algorithms or ways to ...
vnnsnnd's user avatar
0 votes
2 answers
186 views

Shared triples between two knowledge graphs

I want to compare two semantic Knowledge Graphs, to see if they have any triple in common, using cypher. MATCH (n1)-[r1]-(c1) MATCH (n2)-[r2]-(c2) WHERE r1.filePath = "../data/graph1.json" ...
biowhat's user avatar
  • 17
0 votes
1 answer
75 views

record matching/similarity calculation for numbers and characters

I have a dataframe structured the following way with much more rows and columns: Report_ID Block_ID Number Character 1 1 5 A 2 1 3 A 3 1 2 B 4 2 10 A 5 2 11 B 6 2 100 C 7 3 2 D 8 3 #NA A 9 3 8 D 10 3 ...
Marius's user avatar
  • 3
1 vote
2 answers
97 views

VBA collect consecutive similar cells in the row

I have a list of nonconformities that appeared at different times with different products. I need to find similar problems. I have already sorted the list. Now I need to get a new sheet with similar rows with ...
Andrei Samoilov's user avatar
2 votes
1 answer
355 views

Textual similarity between two tags in Nodejs

I want to rate the similarity between two tags. For example the words technology, computer and chip should have high similarity, a word like food should be low similarity. Given the recent ...
Sir hennihau's user avatar
  • 1,874
2 votes
0 answers
38 views

Get similarity within a column based on another column

I have a table with three columns: Source, Target, Similarity. The first two are strings, the last one is a float. This table came about by comparing source elements and target elements and finding ...
Mitsarien's user avatar
0 votes
1 answer
2k views

SSIM (Structural similarity index measure) performance

I have a reference image A and 2 target images B and C , I tried to measure the SSIM as follows : (from a human vision perception A & B are from the same class) and A & C from different ...
Ziri's user avatar
  • 736
1 vote
0 answers
128 views

Similarity search in a python database using rdkit

How can I run a similarity search in a database so that the output is a table of the molecules that passed a specific threshold? I tried this query = sql.SQL(""" SELECT *, ...
Vincent_chem's user avatar
0 votes
0 answers
72 views

How to quickly query similar text through postgresql?

I need to query the top few texts that are most similar based on the input content. The table structure is as follows: create table documents ( id bigserial primary ...
accbear's user avatar
  • 23
0 votes
1 answer
79 views

Trying to skip columns in loop if requirement isn't satisfied

I have this python code trying to find similarities between brands based on the types of products they sell and at what price point. One issue I'm running into is price, sales, and number of products ...
krizik's user avatar
  • 1
0 votes
2 answers
107 views

Creating a similarity matrix with jagged arrays

I have a dataframe as such. id action enc Cell 1 run,swim,walk 1,2,3 Cell 2 swim,climb,surf,gym 2,4,5,6 Cell 3 jog,run] 7,1 This table goes on for roughly 30k rows. After gathering all these actions, ...
Jacob's user avatar
  • 3
1 vote
0 answers
46 views

Find proportion of two vectors/arrays which overlap

I have two vectors, for example: a = 25,26,37,36,27,33,104,44,40,49,45,48,50,55,56,59,54,57,105,64,73,76,72,67,68,71,78,82,77,79,86,84,83,85,91,92,96,97,102,101,93,98,99,100,94,95,88,87,65,66,90,89,80,...
purecobalt's user avatar
-3 votes
2 answers
3k views

Meaning behind 'thefuzz' / 'rapidfuzz' similarity metric when comparing strings

When using thefuzz in Python to calculate a simple ratio between two strings, a result of 0 means they are totally different while a result of 100 represents a 100% match. What do intermediate results ...
David Shaw's user avatar
1 vote
0 answers
44 views

Algorithm to generate top N% of permutations with most dissimilarities

Let us consider a set of data with, say, 10 values. We need to estimate its characteristics by the Monte Carlo method, that is, with a large number of randomly generated permuted sets. If we'd consider ...
Stan's user avatar
  • 8,778
1 vote
1 answer
96 views

Maximum jaccard similarity in igraph

From the igraph documentation: "The Jaccard similarity coefficient of two vertices is the number of common neighbors divided by the number of vertices that are neighbors of at least one of the ...
Zachary's user avatar
  • 381
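
The definition quoted in the excerpt above (common neighbors divided by vertices that are neighbors of at least one of the two) can be sketched in plain Python. This is only an illustration, not igraph's implementation: the function name `jaccard_similarity`, the dict-of-sets graph representation, and the sample graph are all assumptions made for the example.

```python
def jaccard_similarity(adjacency, u, v):
    """Jaccard similarity of vertices u and v, computed from neighbor sets.

    `adjacency` is a hypothetical dict mapping each vertex to the set of
    its neighbors in an undirected graph.
    """
    neighbors_u = set(adjacency[u])
    neighbors_v = set(adjacency[v])
    union = neighbors_u | neighbors_v
    if not union:
        # Two isolated vertices: define the similarity as 0 to avoid 0/0.
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Small undirected sample graph (an assumption for illustration).
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# "a" and "b" share one neighbor ("c") out of four distinct neighbors.
print(jaccard_similarity(adjacency, "a", "b"))  # 0.25
```

Note that in this sketch two adjacent vertices are not counted as their own neighbors, so each contributes the other to the union without adding to the intersection.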
0 votes
2 answers
1k views

Langchain: ChromaDB: Not able to retrieve large numbers of PDF files vector database from Chroma persistence directory

My programme is chatting with PDF files in a directory. Surprisingly the code works if there are 5 PDF files in the directory of 1 page each. But it doesn't work when there are 1000 files of 1 page each. It ...
Rajeshwar Singh Jenwar's user avatar
2 votes
2 answers
104 views

Cosine similarity between each two rows in a dataframe

I have a data frame called text with two columns, year and text. Find the dput output below for an example: text <- structure(list(year = 2000:2007, text = c("I went to McDonald's and they ...
fsure's user avatar
  • 335
1 vote
1 answer
99 views

How to group redundant entries of a dictionary

I have a dictionary of entries like this: { 'A': { 'HUE_SAT': 1, 'GROUP_INPUT': 1, 'GROUP_OUTPUT': 1 }, 'D': { 'HUE_SAT': 1, 'GROUP_INPUT': 1, ...
Don Cheadle's user avatar
