c# - What algorithm maps each element of set A to its best match in set B?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:10:59

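I have two sets of strings, A and B.

What I want to do looks similar to what a search engine might have to do to build an index for keyword searches, but the application is mapping entities from one dataset to another, where the keys are different but similar.

Update: since my sample data below doesn't seem to get the problem across, I have pasted the actual data at the end, though I'm not sure whether it is too long to be helpful.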

    A            B
    -------      --------
    Foo          Foo
    Bar          Bar - US
    Bat          bat
    Bing         Bingo
    Zep          Zee
                 zepplin
                 Bars

I'd like to go through each item of A and match it to an item of B.

Result:

Foo -> Foo
Bar -> Bar - US
Bat -> bat
Bing -> Bingo
Zep -> zepplin

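I'm wondering whether there is already an off-the-shelf way to do this. I remember once reading something about Bayesian methods in the context of extracting summary sentences from a paragraph, but I don't know whether that can be applied here.

I assume it would need some input that defines heuristics, but that seems to make it rather complicated.

Real data sample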

declare @A table (Name varchar(500))
declare @B table (FullName varchar(500))
insert into @A values ('AccuQuote')
insert into @A values ('Adchemy')
insert into @A values ('Affiliate Marketing Solutions')
insert into @A values ('Affinitas GmbH')
insert into @A values ('Alliance Health Networks')
insert into @A values ('Allied Van Lines')
insert into @A values ('Ascentive')
insert into @A values ('Astroway')
insert into @A values ('Astroway EUR')
insert into @A values ('Astroway UK')
insert into @A values ('B2E Marketing ')
insert into @A values ('Babylon')
insert into @A values ('Be2')
insert into @A values ('BeClose')
insert into @A values ('Bid Cactus')
insert into @A values ('Bidz.com')
insert into @A values ('BigPoint')
insert into @A values ('Bloomspot')
insert into @A values ('Borderless')
insert into @A values ('Brands 4 Friends')
insert into @A values ('Build My Move')
insert into @A values ('Buywithme')
insert into @A values ('Carchex')
insert into @A values ('Career Education Corporation')
insert into @A values ('Chilay Leads')
insert into @A values ('ClubeFashion')
insert into @A values ('Cole Haan')
insert into @A values ('Digital Performance')
insert into @A values ('Digital Target')
insert into @A values ('dLife')
insert into @A values ('EliteMate')
insert into @A values ('Elogia')
insert into @A values ('Encore')
insert into @A values ('Eskupina/Cdate')
insert into @A values ('Experian')
insert into @A values ('Fandango')
insert into @A values ('Funstage')
insert into @A values ('Game Tap')
insert into @A values ('GameDuell GmbH')
insert into @A values ('Gaylord Security')
insert into @A values ('Geico (precise Auto Quote)')
insert into @A values ('Global Test Market / GMI Euro')
insert into @A values ('Gold Star Events')
insert into @A values ('Guthy-Renker LLC')
insert into @A values ('HealthPlanOne')
insert into @A values ('Hifficiency')
insert into @A values ('HLG Solutions')
insert into @A values ('HotChalk')
insert into @A values ('HP AU/NZ')
insert into @A values ('HP UK')
insert into @A values ('IMVU')
insert into @A values ('InnoGames')
insert into @A values ('InsWeb Corporation')
insert into @A values ('Internet Brands')
insert into @A values ('Internet Order/Pimsleur')
insert into @A values ('JAG Method')
insert into @A values ('Kid Robot')
insert into @A values ('LexisNexis')
insert into @A values ('Lieferheld GmbH')
insert into @A values ('Life Line Screening')
insert into @A values ('Lovefilm')
insert into @A values ('LoveFilm GBP')
insert into @A values ('Marathon Data Systems')
insert into @A values ('Maximiles')
insert into @A values ('Medizine')
insert into @A values ('Meetic')
insert into @A values ('Mercury Media')
insert into @A values ('Merkle')
insert into @A values ('Mighty Net')
insert into @A values ('MyCityDeal EUR')
insert into @A values ('MyCityDeal GBP')
insert into @A values ('NARS')
insert into @A values ('New Peak Media')
insert into @A values ('Next Level Entertainment')
insert into @A values ('NPD Group')
insert into @A values ('Nutrasource')
insert into @A values ('Offer Shot')
insert into @A values ('OneTechnologies')
insert into @A values ('Pipeline Success')
insert into @A values ('Quinstreet')
insert into @A values ('Quinstreet / Surehits')
insert into @A values ('Quoteshound')
insert into @A values ('Radley & Co')
insert into @A values ('Red Ventures')
insert into @A values ('RentTheRunway')
insert into @A values ('Research Now')
insert into @A values ('Saban')
insert into @A values ('Savingstar')
insert into @A values ('Scholastic')
insert into @A values ('Scorebig')
insert into @A values ('SD&P')
insert into @A values ('ServiceMaster Brands')
insert into @A values ('Shermans Travels')
insert into @A values ('Shoebuy.com/Bagsbuy.com/FloraFlora')
insert into @A values ('Simplyink- Private')
insert into @A values ('Source Interlink Media - Automotive.com')
insert into @A values ('Spark Networks')
insert into @A values ('Terra Matrix')
insert into @A values ('The LASIK Vision Institute, LLC')
insert into @A values ('The Scooter Store')
insert into @A values ('Tickets Now')
insert into @A values ('Totsy.com')
insert into @A values ('Trafford Consulting')
insert into @A values ('Tranzact Media')
insert into @A values ('Tree.com')
insert into @A values ('Unirush')
insert into @A values ('United Sample')
insert into @A values ('Universal McCann')
insert into @A values ('Vinyl Interactive')
insert into @A values ('Vistaprint')
insert into @A values ('Vistaprint US')
insert into @A values ('Zamano')
insert into @A values ('Aaron A. the Advertiser')
insert into @A values ('Age of Learning ')
insert into @A values ('BrainyBaby')
insert into @A values ('Chrome Bags')
insert into @A values ('Datamark')
insert into @A values ('default')
insert into @A values ('Dish System')
insert into @A values ('Eminata')
insert into @A values ('Emma Stine')
insert into @A values ('Everyday Health')
insert into @A values ('Gate 1 Travel')
insert into @A values ('Hebrew Senior Life')
insert into @A values ('Itt Tech ')
insert into @A values ('Jan pro of Austin ')
insert into @A values ('Jan pro of Sacramento ')
insert into @A values ('KGB')
insert into @A values ('KupiKupon')
insert into @A values ('Lotto Elite')
insert into @A values ('Optical Express')
insert into @A values ('Personalization Mall')
insert into @A values ('PrintPal')
insert into @A values ('Prodege LLC')
insert into @A values ('Sixt')
insert into @A values ('StayFriends')
insert into @A values ('Urban Rivals')
insert into @A values ('Wpromote')
insert into @A values ('Besser Betreut')
insert into @A values ('ConnectionEngine')
insert into @A values ('CouponCoupon')
insert into @A values ('Coupons.com')
insert into @A values ('Everything Legal')
insert into @A values ('Gamigo')
insert into @A values ('Legacy Learning')
insert into @A values ('NFIB')
insert into @A values ('Noatel')
insert into @A values ('Termbusters')
insert into @A values ('Tioga Downs')
insert into @A values ('Alice.com')
insert into @A values ('BeRuby')
insert into @A values ('Betreut')
insert into @A values ('BidRivals')
insert into @A values ('Eye Buy Now')
insert into @A values ('Globe Life')
insert into @A values ('JP Austin')
insert into @A values ('JP Sacramento')
insert into @A values ('Lumos Labs')
insert into @A values ('Marketing Craze')
insert into @A values ('Pinney Insurance')
insert into @A values ('Simple Tuition')
insert into @A values ('T33ZE')
insert into @A values ('Tax Defense')
insert into @A values ('Ultradiamond')
insert into @A values ('UltraDiamonds')
insert into @A values ('World Commerce')
insert into @A values ('Emma Stein')
insert into @A values ('EseMarketing')
insert into @A values ('Heritage Resorts and Hotels')
insert into @A values ('Jan Pro Austin ')
insert into @A values ('Jan Pro Sacramento')
insert into @A values ('Jelly Belly')
insert into @A values ('JRM Management')
insert into @A values ('Lead Click Media')
insert into @A values ('Lumosity')
insert into @A values ('Military.com')
insert into @A values ('MobiKlix')
insert into @A values ('Monster ')
insert into @A values ('Monster Worldwide')
insert into @A values ('Nielsen')
insert into @A values ('Progrexion')
insert into @A values ('Studs-up')
insert into @A values ('Webjuice')
insert into @A values ('YouGov')
insert into @A values ('Betterment')
insert into @A values ('Credit Sesame')
insert into @A values ('Cupid PLC')
insert into @A values ('DebtManagers')
insert into @A values ('Education Dynamics')
insert into @A values ('Envision/Accurix')
insert into @A values ('Fortune Builders')
insert into @A values ('Fosina Marketing')
insert into @A values ('Fubar')
insert into @A values ('InterCall')
insert into @A values ('MayYeung')
insert into @A values ('OHP Direct')
insert into @A values ('SCB Media')
insert into @A values ('SmartQuote')
insert into @A values ('Western Wats')
insert into @A values ('Yves Rocher')
insert into @A values ('Anyhouse Exterminators')
insert into @A values ('Assicurazione')
insert into @A values ('Bigdeal.com')
insert into @A values ('Credit.com')
insert into @A values ('Cross Digital UK')
insert into @A values ('Direct Partners')
insert into @A values ('Flightline UK')
insert into @A values ('Lifescript')
insert into @A values ('LightSpeed')
insert into @A values ('Little Star Media ')
insert into @A values ('Match.com')
insert into @A values ('NAPW')
insert into @A values ('Planet49')
insert into @A values ('T33ZE/Specs Optics/')
insert into @A values ('Target Direct')
insert into @A values ('Any House Exterminating services')
insert into @A values ('Bidooka')
insert into @A values ('Christophe Danhier')
insert into @A values ('Direct Agents Creative ')
insert into @A values ('eCircle')
insert into @A values ('Ecombuffet')
insert into @A values ('Elite Clicks Media ')
insert into @A values ('Hebrew Seniorlife')
insert into @A values ('InternetOne')
insert into @A values ('Jan-Pro of Sacramento')
insert into @A values ('LEC Connect')
insert into @A values ('NewStream')
insert into @A values ('Platnium Y & E/ EZ Carpet')
insert into @A values ('Scorelluxe')
insert into @A values ('Sir Alistair Rai')
insert into @A values ('SmartDate EUR')
insert into @A values ('Afaze')
insert into @A values ('Defender Direct')
insert into @A values ('eCGlobal')
insert into @A values ('Entertainment Shopping')
insert into @A values ('Gold Clerk')
insert into @A values ('HP DE')
insert into @A values ('Insurance Agents')
insert into @A values ('Insurance-ITSOL')
insert into @A values ('Kelly Brady')
insert into @A values ('Midasplayer')
insert into @A values ('Prime Gaming')
insert into @A values ('Sign-post')
insert into @A values ('Singlesnet')
insert into @A values ('SmartDate USD')
insert into @A values ('Zoosk')
insert into @A values ('2Tor')
insert into @A values ('Adaptive')
insert into @A values ('Art.com')
insert into @A values ('Direct Brands')
insert into @A values ('EZ Carpet')
insert into @A values ('First Impression Interactive')
insert into @A values ('Funspire')
insert into @A values ('GMI')
insert into @A values ('Jan Pro Raleigh')
insert into @A values ('Mindspark')
insert into @A values ('PAMLI Capital Management')
insert into @A values ('Reliaquote')
insert into @A values ('Runge Moving')
insert into @A values ('Scholastic - Creative ')
insert into @A values ('Web2Carz')
insert into @B values ('Affinitas GmbH')
insert into @B values ('Assicurazione.it S.r.l.')
insert into @B values ('Astroway Ltd - Unit 2605')
insert into @B values ('Astroway Ltd - Unit 2605 UK')
insert into @B values ('Astrum Online Entertainment/ Mail. RU')
insert into @B values ('be2 GmbH')
insert into @B values ('BeRuby/ Maruby Internet')
insert into @B values ('Betreut.de')
insert into @B values ('Brands 4 Friends')
insert into @B values ('Clube Fashion')
insert into @B values ('Complaint Handling Services Limited')
insert into @B values ('Cross Digital/INTERACTIVE AVENUE')
insert into @B values ('Digital North - DNA')
insert into @B values ('Digital Performance')
insert into @B values ('Direct Agents - EURO')
insert into @B values ('Direct Agents - GBP')
insert into @B values ('Direct Agents, Inc. - US Transfer')
insert into @B values ('eCircle GmbH')
insert into @B values ('Ecircle Ltd. UK')
insert into @B values ('eProspects')
insert into @B values ('Eskupina/Cdate')
insert into @B values ('Everything Legal')
insert into @B values ('Flightline.co.uk')
insert into @B values ('Frogster Online Gaming GmbH')
insert into @B values ('FunStage')
insert into @B values ('GameDuell - Especial')
insert into @B values ('GameDuell GmbH')
insert into @B values ('Gamigo AG')
insert into @B values ('Global Test Market / GMI')
insert into @B values ('Greentube I.E.S. GmbH/Funstage')
insert into @B values ('Groupon-Especial')
insert into @B values ('Groupon Gmbh')
insert into @B values ('Groupon Gmbh:AE - Groupon FZ-LLC')
insert into @B values ('Groupon Gmbh:AT - Groupon AT GmbH')
insert into @B values ('Groupon Gmbh:AU - Stardeal Pty Ltd')
insert into @B values ('Groupon Gmbh:BE - Groupon S.P.R.L')
insert into @B values ('Groupon Gmbh:BR - Groupon Serviços Digitais Ltda.')
insert into @B values ('Groupon Gmbh:CH - Groupon CH GmbH')
insert into @B values ('Groupon Gmbh:ES - Groupon Spain SL')
insert into @B values ('Groupon Gmbh:FI - CityDeal Oy')
insert into @B values ('Groupon Gmbh:FR - Groupon France SAS')
insert into @B values ('Groupon Gmbh:IE - Groupon-CityDeal Ireland Ltd.')
insert into @B values ('Groupon Gmbh:IL - Grouper Social Shopping Ltd.')
insert into @B values ('Groupon Gmbh:IN - Friday Media (P) Ltd.')
insert into @B values ('Groupon Gmbh:IT - Groupon S.r.l.')
insert into @B values ('Groupon Gmbh:NL - Groupon B.V.')
insert into @B values ('Groupon Gmbh:NO - CityDeal AS')
insert into @B values ('Groupon Gmbh:NZ - Groupon New Zealand Ltd')
insert into @B values ('Groupon Gmbh:PH - Beeconomic Philippines Inc.')
insert into @B values ('Groupon Gmbh:PL - Groupon Sp. z o.o.')
insert into @B values ('Groupon Gmbh:RO - Groupon Internet SRL')
insert into @B values ('Groupon Gmbh:SE - MyCityDeal AB')
insert into @B values ('Groupon Gmbh:SG - Beeconomic Singapore Pte. Ltd.')
insert into @B values ('Groupon Gmbh:TR - Groupon Bilisim Pazarlama Hizmetleri')
insert into @B values ('Groupon Gmbh:ZA - Twangoo South Africa Pty (LTD)')
insert into @B values ('Grumbl Media')
insert into @B values ('Heritage Resorts and Hotels')
insert into @B values ('Hifficiency/AdRoi')
insert into @B values ('HP DE')
insert into @B values ('HP Enterprise Services UK Ltd')
insert into @B values ('HP Enterprise Services UK Ltd:HP AU')
insert into @B values ('InnoGames')
insert into @B values ('Just a Game GmbH')
insert into @B values ('KGB UK')
insert into @B values ('King.com')
insert into @B values ('Lieferheld')
insert into @B values ('LIGHTSPEED RESEARCH')
insert into @B values ('Little Star Media')
insert into @B values ('LOVEFiLM Deutschland GmbH 1')
insert into @B values ('LOVEFiLM International Limited')
insert into @B values ('Marketing Craze')
insert into @B values ('Maximiles UK')
insert into @B values ('Meetic')
insert into @B values ('MobiKlix Ltd')
insert into @B values ('MoneyNet/Sterling Business Consultants')
insert into @B values ('MyCityDeal')
insert into @B values ('MyTheresa.com')
insert into @B values ('NeoPoint Technologies')
insert into @B values ('Next Idea GMBH')
insert into @B values ('Nivoria Online Marketing Agency')
insert into @B values ('Noatel')
insert into @B values ('Optical Express')
insert into @B values ('Optical Express - DE')
insert into @B values ('Psychonomics')
insert into @B values ('SCBmedia')
insert into @B values ('SD&P Online Media Group')
insert into @B values ('Shoebuy UK')
insert into @B values ('Skillstar.com')
insert into @B values ('Smartdate')
insert into @B values ('Stay Friends')
insert into @B values ('Survey Sampling International-GBP payment')
insert into @B values ('Survey Sampling International / SSI')
insert into @B values ('Terra Matrix Media')
insert into @B values ('Twistbox Entertainment/AMV Holding Ltd.')
insert into @B values ('Urban Rivals')
insert into @B values ('Virtual World Direct Limited.')
insert into @B values ('Vistaprint-Germany')
insert into @B values ('VISTAPRINT ESPAÑA S.L')
insert into @B values ('Zamano')
insert into @B values ('ZED Germany GmbH')

Best answer

See my answer here.

Before trying to find something that can do it, you need to define what you want to do.

What counts as a "match"?

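  • If it's just a case-insensitive CONTAINS, then it's trivial.
  • If you need to account for or exclude certain punctuation, just strip it from each dataset before comparing.
  • Matching abbreviations? For example, you might use a lookup that maps inc to incorporated (you appear to be working with company names).
  • Misspellings: look at Levenshtein and other edit distance algorithms.
  • Phonetics / "sounds like": look into Soundex and other phonetic algorithms.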
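
As a rough illustration of the first four options, here is a minimal Python sketch (Python rather than C#, simply to keep it short). The `ABBREVIATIONS` table and the function names `normalize`, `contains_match`, and `levenshtein` are made up for this example and are not from any library; treat it as a starting point, not a finished matcher.

```python
import re

# Hypothetical abbreviation lookup; extend it for your own data.
ABBREVIATIONS = {"inc": "incorporated", "corp": "corporation", "ltd": "limited"}


def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, expand known abbreviations."""
    words = re.sub(r"[^\w\s]", " ", name.casefold()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def contains_match(a: str, b: str) -> bool:
    """Case- and punctuation-insensitive CONTAINS, checked in both directions."""
    na, nb = normalize(a), normalize(b)
    # Guard against empty strings, which are a substring of everything.
    return bool(na) and bool(nb) and (na in nb or nb in na)


def levenshtein(s: str, t: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete from s
                            curr[j - 1] + 1,      # insert into s
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]
```

With these definitions, `contains_match("Bar", "Bar - US")` is true, and `levenshtein("Emma Stine", "Emma Stein")` is small enough to flag the pair as a likely typo. Note that a plain substring test also matches "Bar" against "Bars", so it tends toward false positives on short names.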

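Do you prefer false positives or false negatives? How accurate do you need to be: is this a first-pass filter after which a small remainder can be handled manually, or does it need to be an automated process that gets it right every time?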
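
One way to make that trade-off explicit is to score every pair and use two thresholds: accept high scores automatically and queue the middle band for manual review. The sketch below uses Python's standard `difflib.SequenceMatcher` as the similarity measure. The function names `best_match` and `triage` and the threshold values 0.9 and 0.6 are arbitrary assumptions for illustration, not recommendations.

```python
from difflib import SequenceMatcher


def best_match(name, candidates):
    """Return (candidate, score) with the highest similarity ratio in [0, 1].

    Assumes candidates is non-empty.
    """
    scored = [(c, SequenceMatcher(None, name.casefold(), c.casefold()).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def triage(set_a, set_b, accept=0.9, review=0.6):
    """Split A into auto-accepted matches, matches needing review, and unmatched."""
    accepted, needs_review, unmatched = {}, {}, []
    for name in set_a:
        candidate, score = best_match(name, set_b)
        if score >= accept:
            accepted[name] = candidate       # confident enough to automate
        elif score >= review:
            needs_review[name] = candidate   # plausible; hand to a human
        else:
            unmatched.append(name)           # probably no counterpart in B
    return accepted, needs_review, unmatched
```

Raising `accept` reduces false positives at the cost of a longer review queue; lowering `review` reduces false negatives. On the toy data from the question, this particular measure prefers "Bars" over "Bar - US" for "Bar", which shows that the scoring function itself encodes a definition of "match" and has to be chosen to fit the data.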

It can be as complicated as you want it to be.

Regarding "c# - What algorithm maps each element of set A to its best match in set B?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8073658/
